OK, go easy on me: I'm very new to POE, and to state machines and event
loops and all that fancy stuff. But I have written a nonblocking
server/client application before, so I understand some of the issues.
So here's the setup: I have some very large files that I'm parsing with
Parse::RecDescent. These files consist of multiple records. I'd like to
be able to get access to those records as they are parsed, instead of
waiting for Parse::RecDescent to completely finish parsing the file. To
accomplish this, I'm only going to feed P::RD one record at a time, but I
need to be able to "pause" P::RD while I extract the data it's parsed so
far, and feed in the new text to be parsed in the next round.
So, to repeat with a little bit of example code. Here's the "end-user"
code:
use MyParser;
my $p = MyParser->new($somefile);
while (my $record = $p->next_record()) {
    # do something with $record
}
exit;
Ahh, so simple to be the end-user.
Note that there's no reason for the end-user code to actually traverse the
entire document - they could exit at any time. I guess I could have a
$p->done() method to somehow signal the parser to stop parsing anything
else.
Here's the code for MyParser.pm:
package MyParser;
use strict;
use warnings;
use Parse::RecDescent;
use Carp;    # imports croak()
my $grammar = q{ };   # as shown below, but I'll read it in from a file,
                      # or even better, once I know what I'm doing
                      # I can precompile it into its own module
                      # and simply use it.
sub new {
    my $class = shift;
    $class = ref $class || $class;
    my $self = {};
    $self->{parser} = Parse::RecDescent->new($grammar);
    $self->{filename} = shift
        or croak("Must supply filename to MyParser::new()");
    open($self->{filehandle}, '<', $self->{filename})
        or croak("Cannot open file '$self->{filename}': $!");
    my $text;
    {
        local $/ = "\n//\n";                     # record separator
        $text = readline($self->{filehandle});   # only hand the first record off
    }
    # somehow use POE to "fork" this off
    # (startrule stands for the grammar's top-level rule, "file" below):
    $self->{parser}->startrule($text, 1, $self);
    # aye, there's the rub; without POE, we won't get here until
    # Parse::RecDescent completely finishes its parse:
    return bless $self, $class;
}
sub next_record {
    my $self = shift;
    # somehow use POE to collect the record that P::RD has already made
    # for us.
    # somehow use POE to signal to the parser to go grab another chunk of
    # text, and start reading it.
}
And here's the Parse::RecDescent grammar:
file: record[$arg[0]](s?) eofile
    { # OK, if we get here, we're all done, and all we need to do is
      # clean up a bit ...
      # ... done cleaning up.
    }
record: rule1 rule2 rule3 yadda yadda yadda
    { # OK, I've read a record, and I have all the data.
      # I'd like to stop here until someone calls next_record(),
      # at which point, I'll hand them the data I already have
      # and start on the next record (if present).
      # Oh, but what if someone has already called next_record()
      # before I even got here (since parsing each record is in fact
      # nontrivial, and may take a little while)? Then I'll
      # immediately hand them the data and get started on the next
      # record (if present).
      # Before starting on the next record, we'll need to read
      # more text from the file:
      { local $/ = "\n//\n";
        $text .= readline($arg[0]->{filehandle});
      }
    }
eofile: /\Z/
And if you've read this far, I thank you for your patience. Now, what
crucial piece of documentation do I need to go study in order to figure
out how to fill in the gaps in my code (and thinking)?
Thanks,
-Aaron
P.S. Oh, I am aware that I could make my life incredibly easier by, in
fact, *not* using POE, and simply doing something like this:
sub next_record {
    my $self = shift;
    my $text;
    { local $/ = "\n//\n";
      $text = readline($self->{filehandle});
    }
    my $record;
    $record = $self->{parser}->startrule($text) if $text;
    return $record;
}
and then simply changing the P::RD grammar to only include the record
production, i.e. starting a new parse for every record rather than
pausing/restarting a single parse and trying to get records out midway
through the global parse.
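To make that simpler approach concrete, here is a minimal, self-contained sketch. It is only an illustration under some assumptions: the RecordReader package, the parse callback, and the done() method are hypothetical names, and the callback is a stand-in for the real per-record Parse::RecDescent call, so the sketch runs with core Perl alone.

```perl
package RecordReader;   # hypothetical name, for illustration only
use strict;
use warnings;
use Carp;

sub new {
    my ($class, $fh, $parse) = @_;
    croak("Must supply a filehandle and a parse callback")
        unless $fh && ref $parse eq 'CODE';
    return bless { fh => $fh, parse => $parse, done => 0 }, $class;
}

# Read one "\n//\n"-terminated chunk and parse only that chunk.
sub next_record {
    my $self = shift;
    return if $self->{done};
    my $text;
    {
        local $/ = "\n//\n";                 # record separator
        $text = readline($self->{fh});
        chomp $text if defined $text;        # strip the trailing separator
    }
    unless (defined $text) {                 # end of file
        $self->{done} = 1;
        return;
    }
    return $self->{parse}->($text);
}

# Let the caller stop early; later next_record() calls return nothing.
sub done { $_[0]{done} = 1; return }

package main;

my $data = "id: 1\nname: foo\n//\nid: 2\nname: bar\n//\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# Stand-in for the real per-record parse: collect "key: value" lines.
my $reader = RecordReader->new($fh, sub {
    my $text = shift;
    my %rec;
    for my $line (split /\n/, $text) {
        $rec{$1} = $2 if $line =~ /^(\w+):\s*(.*)$/;
    }
    return \%rec;
});

my @names;
while (my $record = $reader->next_record()) {
    push @names, $record->{name};
}
print join(",", @names), "\n";   # prints "foo,bar"
```

In the real module the callback would presumably just wrap the parser call on the record production. Note that nothing is parsed ahead of time here: each record is parsed only when next_record() is called, which is exactly the trade-off against the POE idea.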
But then I wouldn't get to learn POE, now would I?
Thanks again.