Re: [ruby.parslet] The right way to parse multiline files..

Nigel Thorne Mon, 12 May 2014 03:58:14 -0700

Parsers for this already exist.
    https://github.com/kbarber/ruby-iptables


However... as a learning exercise...

\s matches ' \t\n\r\f' ... which, as you can see, includes "new line".
(\S is the opposite: any non-whitespace character.)
This means :space (and so :space?) consumes new lines... which is breaking
your parser.
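
To make the whitespace point concrete, here is a small sketch in plain Ruby regular expressions (no Parslet involved). The sample string is made up for illustration; it only shows that \s crosses a line break while an explicit [ \t] class does not.

```ruby
# \s is the whitespace class (it includes "\n"); \S is its complement.
sample = "-A INPUT \n-A OUTPUT"   # hypothetical two-line input

crossing  = sample[/INPUT(\s+)/, 1]     # whitespace after INPUT, using \s
same_line = sample[/INPUT([ \t]+)/, 1]  # same spot, spaces and tabs only

p crossing                 # => " \n"  (the newline was swallowed)
p same_line                # => " "    (stops before the newline)
p "\n".match?(/\s/)        # => true
p "\n".match?(/\S/)        # => false

# In a Parslet grammar the equivalent idea would be a rule along the lines of
#   rule(:space) { match('[ \t]').repeat(1) }
# so that only :eol is allowed to consume "\n".
```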

Check here https://gist.github.com/NigelThorne/c05a3b85cb67bf5866eb for a
patched version of your parser.

This is not complete, as it doesn't handle this sort of thing:

-A INPUT -p sctp -m sctp --chunk-types all DATA
-A INPUT -p sctp -m sctp --chunk-types all DATA:U
-A INPUT -p sctp -m sctp ! --chunk-types all DATA:U

I expect there is a finite set of flags that can be passed to these
commands, in which case you would probably be better served by looking for
them explicitly. That way you know what format the parameters following
each flag will take.

Note: I had to add a "text" rule. It too is not complete. I assume in
reality the text can include escaped characters.

Anyway.. it should take you a step forward.

All the best,
Nigel

---
"No man is an island... except Philip"


On Mon, May 12, 2014 at 9:27 AM, Ashley Penney <[email protected]> wrote:

> This was definitely helpful at prodding me to go back and try again.  My
> current parser looks like:
>
> class IpParser < Parslet::Parser
>   root(:firewall)
>
>   rule(:firewall)      { table.repeat(1).as(:firewall) >> eol }
>   rule(:table)         { (name.as(:name) >> (chain.repeat(1)).as(:chains) >>
>                          rule.repeat(1).as(:rules)).as(:table) >> commit }
>   rule(:name)          { star >> line >> eol }
>   rule(:chain)         { (colon >> word.as(:name) >> space >>
>                           word.as(:policy) >> space >>
>                          left_bracket >> integers.as(:packet_counter) >>
>                          colon >> integers.as(:byte_counter) >>
>                          right_bracket >> eol).as(:chain) }
>   rule(:rule)          { rule_piece.repeat(1) >> eol }
>   rule(:rule_piece)    { argument >> space? >> negation? >> space? >>
>                          word? >> space? >> negation? }
>
>   rule(:commit)        { str('COMMIT') >> eol }
>   rule(:star)          { str('*') }
>   rule(:line)          { match['^\n'].repeat(1) }
>   rule(:eol)           { match["\n"] }
>   rule(:colon)         { str(':') }
>   rule(:word)          { match['\S'].repeat(1) }
>   rule(:word?)         { word.maybe }
>   rule(:space)         { match('\s').repeat(1) }
>   rule(:space?)        { space.maybe }
>   rule(:dash)          { str('-') }
>   rule(:left_bracket)  { str('[') }
>   rule(:right_bracket) { str(']') }
>   rule(:integers)      { match['0-9'].repeat(1) }
>   rule(:negation)      { str('!') }
>   rule(:negation?)     { negation.maybe }
>   rule(:argument)      { str('-') >> word }
> end
>
> Along with:
>
> stuff = File.read('./example').gsub(/^#.*\n/, '')
> pp parse(stuff)
>
> This fails with:
>
> Failed to match sequence (firewall:(TABLE{1, }) EOL) at line 1 char 1.
> `- Expected at least 1 of TABLE at line 1 char 1.
>    `- Failed to match sequence (table:(name:NAME chains:(CHAIN{1, }) rules:(RULE{1, })) COMMIT) at line 1 char 1.
>       `- Failed to match sequence (name:NAME chains:(CHAIN{1, }) rules:(RULE{1, })) at line 7 char 1.
>          `- Expected at least 1 of RULE at line 7 char 1.
>             `- Failed to match sequence (RULE_PIECE{1, } EOL) at line 8 char 4.
>                `- Failed to match [\n] at line 8 char 4.
>
> I'm struggling with the rule(:rule_piece) rule.  The trouble is there are a
> number of valid rule pieces, and I can't see an easy way to delimit
> them.  Given something like:
>
> -a blah ! thing ! thing -b
>
> My understanding is if I did something like (str('-') >>
> word).as(:argument) >> something >> str('-') I would then have the problem
> that it would begin parsing past the -.  I think what I need is a way to
> "rewind" a character.  I've tried:
>
>   rule(:rule_piece)    { argument >> nondash >> dash.present? }
>   rule(:nondash)       { match['^-'].repeat(1) }
>
> But this causes other weird problems where it seems to match too far and
> fail on line 20 of the `example` file in my repo.  Is there a better way to
> handle this problem of "argument >> stuff >> until you see a dash"?
>
>
> On Sun, May 11, 2014 at 12:43 PM, Torsten Ruger <[email protected]> wrote:
>
>> Hi Ashley,
>> I am new too, but I was online, so here are my two pennies:
>>
>> Your file seems to have a clear structure, so I would definitely go for
>> the "all in one" approach, not line by line as you seem to be doing.
>> You talk about breaking up and sub-parsers. Breaking up is good, but
>> the pieces are all just rules (not parsers). You just need one parser, with
>> one root (which will probably be a repeat of the sections you have).
>> And like you sort of suggested in 1), you just create rules that parse the
>> sub-content. Those are made up of rules that parse their own sub-content,
>> until you are down to strings.
>>
>> The "gobbling up" problem comes from not having clear ends to the rules.
>> Rules are greedy and will read as much as they can, so if you write
>> any.repeat, it will consume the rest of the input.
>> The good news is that your content has clear delimiters, so you just parse
>> up to them. For example, you'd have a rule for a line that parses those
>> option strings and is delimited by a newline.
>> And you could have a rule that parses lines until "commit".
>>
>> Hope that helps
>>
>> Torsten
>>
>>
>>   Ashley Penney <[email protected]>
>>  May 11, 2014, 18:18
>> --
>> Ashley Penney
>> [email protected]
>> Module Engineer
>>
>> *Join us at PuppetConf 2014**, September 23-24 in San Francisco
>> - http://puppetconf.com <http://puppetconf.com/>*
>>
>>
>
>
>
