Ok, go easy on me, I'm very new to POE, and to state machines and event
loops and all that fancy stuff.  But I have written a nonblocking
server/client application before, so I understand some of the issues.

So here's the setup: I have some very large files that I'm parsing with
Parse::RecDescent.  These files consist of multiple records.  I'd like to
be able to get access to those records as they are parsed, instead of
waiting for Parse::RecDescent to completely finish parsing the file.  To
accomplish this, I'm only going to feed P::RD one record at a time, but I
need to be able to "pause" P::RD while I extract the data it's parsed so
far, and feed in the new text to be parsed in the next round.
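The record-at-a-time part does not need POE at all. If Perl's input record separator `$/` is set to the record separator, each `readline()` call returns one whole record instead of one line. This is a minimal sketch. It uses hypothetical sample data read through an in-memory filehandle in place of the real file.

```perl
use strict;
use warnings;

# Hypothetical sample input: three records, each ending in "\n//\n".
my $data = "ID a\nfoo\n//\nID b\nbar\n//\nID c\nbaz\n//\n";

# An in-memory filehandle stands in for the real (very large) file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    # With $/ set to the separator, each readline() call returns one
    # whole record (separator included) instead of a single line.
    local $/ = "\n//\n";
    while (defined(my $chunk = readline($fh))) {
        push @records, $chunk;    # this is where P::RD would get one record
    }
}
close $fh;

printf "%d records; first is %s", scalar @records, $records[0];
```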

So, to restate that with a little example code, here's the "end-user"
code:

use MyParser;

my $p = MyParser->new($somefile);
while(my $record = $p->next_record()) {
    # do something with $record
}
exit;

Ahh, so simple to be the end-user.

Note that there's no reason for the end-user code to actually traverse the
entire document; the caller could stop at any time.  I guess I could add a
$p->done() method to signal the parser to stop parsing anything
else.
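A `done()` method could be as simple as a flag plus closing the filehandle. The sketch below is hypothetical. `RecordReader` is a stand-in for MyParser that returns raw record text instead of running Parse::RecDescent, so it shows only the early-stop behaviour.

```perl
package RecordReader;    # hypothetical stand-in for MyParser
use strict;
use warnings;

sub new {
    my ($class, $fh) = @_;
    return bless { filehandle => $fh, done => 0 }, $class;
}

# Return the next record's raw text, or undef at end of input or after done().
sub next_record {
    my $self = shift;
    return if $self->{done};
    my $text = do { local $/ = "\n//\n"; readline($self->{filehandle}) };
    return $text;
}

# Stop early: mark the reader finished and release the file.
sub done {
    my $self = shift;
    return if $self->{done};
    $self->{done} = 1;
    close($self->{filehandle});
    return;
}

package main;

my $data = "rec1\n//\nrec2\n//\nrec3\n//\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $p     = RecordReader->new($fh);
my $first = $p->next_record;    # "rec1\n//\n"
$p->done;                       # the caller decides it has seen enough
my $after = $p->next_record;    # undef: nothing more is read
print defined $after ? "more\n" : "stopped\n";
```

If a real parse were in progress, done() would also need to discard any half-finished parser state. That part is not shown here.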

Here's the code for MyParser.pm:

package MyParser;

use strict;
use warnings;
use Carp qw(croak);
use Parse::RecDescent;

my $grammar = q{ }; # the grammar below; I'll read it in from a file,
                    # or even better, once I know what I'm doing
                    # I can precompile it into its own module
                    # and simply use it.

sub new {

    my $class = shift;
    $class = ref($class) || $class;

    my $self = {};
    $self->{parser} = Parse::RecDescent->new($grammar);
    my $filename = shift
        or croak("Must supply filename to MyParser::new()");
    $self->{filename} = $filename;
    open($self->{filehandle}, '<', $filename)
        or croak("Cannot open file '$filename': $!");

    my $text;
    {
        local $/ = "\n//\n"; # record separator
        # readline() is needed here: <$self->{filehandle}> would be
        # parsed as a glob, not as a read from the filehandle
        $text = readline($self->{filehandle}); # only hand the first record off
    }

    # somehow use POE to "fork" this off:
    $self->{parser}->startrule($text, 1, $self);

    # aye, there's the rub; without POE, this won't happen until
    # Parse::RecDescent completely finishes its parse:

    return bless $self, $class;

}

sub next_record {

    my $self = shift;

    # somehow use POE to collect the record that P::RD has already made
    # for us.

    # somehow use POE to signal to the parser to go grab another chunk of
    # text, and start reading it.

}

1; # a module must return a true value when it is loaded
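One way to structure next_record() is as a pull-driven queue. Parsed records wait in a buffer, and more input is parsed only when the buffer is empty. This sketch assumes a hypothetical `parse_one` callback standing in for a per-record Parse::RecDescent call. Here it just upper-cases the text so the code runs on its own. `QueuedParser` and `step` are also illustrative names, not POE or P::RD APIs.

```perl
package QueuedParser;    # hypothetical stand-in for MyParser
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless {
        fh        => $args{fh},         # filehandle to read records from
        parse_one => $args{parse_one},  # hypothetical per-record parser
        queue     => [],                # parsed records not yet handed out
    }, $class;
}

# One unit of work: read one record's text and parse it.
# Returns false once the input is exhausted.
sub step {
    my $self = shift;
    my $text = do { local $/ = "\n//\n"; readline($self->{fh}) };
    return 0 unless defined $text;
    push @{ $self->{queue} }, $self->{parse_one}->($text);
    return 1;
}

# Hand out a buffered record, parsing more input only when none is waiting.
sub next_record {
    my $self = shift;
    while (!@{ $self->{queue} }) {
        $self->step or return;    # end of input: undef in scalar context
    }
    return shift @{ $self->{queue} };
}

package main;

my $data = "a\n//\nb\n//\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $p = QueuedParser->new(fh => $fh, parse_one => sub { uc $_[0] });
my @got;
while (defined(my $record = $p->next_record)) {
    push @got, $record;
}
print scalar(@got), " records\n";
```

Under POE, step() corresponds roughly to the work one event handler would do before yielding to the next step. POE handlers run to completion. A single startrule() call over the whole file therefore cannot be paused partway through. The parse has to be cut into per-record steps like these before the event loop can interleave anything with it.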

And here's the Parse::RecDescent grammar:

file: record(s?)[$arg[0]] eofile
        { # OK, if we get here, we're all done, and all we need to do is
          # clean up a bit ...

          # ... done cleaning up.
        }

record: rule1 rule2 rule3 yadda yadda yadda
        { # OK, I've read a record, and I have all the data.
          # I'd like to stop here until someone calls next_record(),
          # at which point, I'll hand them the data I already have
          # and start on the next record (if present).

          # Oh, but what if someone has already called next_record()
          # before I even got here (since parsing each record is in fact
          # nontrivial and may take a little while)?  Then I'll
          # immediately hand them the data and get started on the next
          # record (if present).

          # Before starting on the next record, we'll need to read
          # more text from the file:
          { local $/ = "\n//\n";
            my $more = readline($arg[0]->{filehandle});
            $text .= $more if defined $more;    # undef at end of file
          }
        }

eofile: /\Z/
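The record action describes two cases. Either the record is ready before anyone asks for it, or someone asks before the record is ready. Together these amount to a small rendezvous between producer and consumer. The following is a hypothetical sketch in plain Perl. `Rendezvous`, `put` and `get` are illustrative names. `get` takes a callback because code running inside an event loop can't block while it waits for the next record.

```perl
package Rendezvous;    # hypothetical helper, not part of POE or P::RD
use strict;
use warnings;

sub new {
    my $class = shift;
    return bless { ready => [], waiting => [] }, $class;
}

# Producer side: call this when a record has been fully parsed.
sub put {
    my ($self, $record) = @_;
    if (my $callback = shift @{ $self->{waiting} }) {
        $callback->($record);                # someone already asked: hand it over now
    }
    else {
        push @{ $self->{ready} }, $record;   # nobody asked yet: keep it for later
    }
    return;
}

# Consumer side: a callback-style next_record().
sub get {
    my ($self, $callback) = @_;
    if (@{ $self->{ready} }) {
        $callback->(shift @{ $self->{ready} });   # record is already parsed
    }
    else {
        push @{ $self->{waiting} }, $callback;    # record not ready: wait for put()
    }
    return;
}

package main;

my $r = Rendezvous->new;
my @log;
$r->put('rec1');                            # parsed before anyone asked
$r->get(sub { push @log, "got $_[0]" });    # served immediately
$r->get(sub { push @log, "got $_[0]" });    # asked before rec2 exists
$r->put('rec2');                            # the waiting callback fires now
print "$_\n" for @log;
```

In a POE program, put() would be called when a record finishes parsing, and get() would be called by whatever wants the records.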

And if you've read this far, I thank you for your patience.  Now, what
crucial piece of documentation do I need to go study in order to figure
out how to fill in the gaps in my code (and thinking)?

Thanks,

-Aaron

P.S. Oh, I am aware that I could make my life incredibly easier by, in
fact, *not* using POE, and simply doing something like this:

sub next_record {

    my $self = shift;
    my $text;

    { local $/ = "\n//\n";
      $text = readline($self->{filehandle}); # <$self->{...}> would be a glob
    }
    my $record;
    $record = $self->{parser}->startrule($text) if $text;

    return $record;
}

and then simply changing the P::RD grammar to include only the record
production, i.e., start a new parse for every record rather than
pausing and restarting a single parse and trying to get records out
midway through it.

But then I wouldn't get to learn POE, now would I?

Thanks again.
