RE: Problems with skip directive and error messages.

Orton, Yves Thu, 26 Jun 2003 00:50:37 -0700

> I originally did this, and it worked . . . until you get an error in the
> config file, at which point the line numbers were wrong, the surrounding
> context was wrong, and it was much more difficult to tell where the
> problem was.  Also I'm stubborn, and having implemented comment skipping
> and \newline continuation with lex and yacc I figured I'd be able to do
> it with P::RD :)


Yeah, I hadn't thought of line-number issues. Good point.

> 
> > > I've tried various combinations of backslashes in case it's an
> > > interpolation issue, but that hasn't helped.  At this stage I'm stuck.
> > 
> > You'll kick yourself. It works just fine with the regex:
> > 
> > $Parse::RecDescent::skip=qr/(?:\s*#.*\n|\s*\\\s*\n|\s*)*/;
> 
> This will match " \\          \n" though, whereas I want it to match
> only "      \\\n".  Inserting the \s* does seem to solve some problem
> with having that many backslashes in a row though.

Yeah, I'm still trying to figure that one out. But you could change it to

$Parse::RecDescent::skip=qr/(?:\s*#.*\n|\s*[\\\\]\r?\n|\s*)*/;

to make it less tolerant (although I personally think it's silly to care if
the line ends "\\        \n", if only because the motto is to be strict
with what you emit and lenient with what you receive).
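To make "strict" versus "lenient" concrete, here is a small sketch that tries the two backslash-newline alternatives as ordinary Perl regexes, outside Parse::RecDescent and ignoring any extra evaluation pass. The sample strings and variable names are just for illustration.

```perl
use strict;
use warnings;

# The continuation alternatives from the two skip patterns, anchored so the
# whole sample must match. These are plain Perl regexes: no extra pass.
my $lenient = qr/\A\s*\\\s*\n\z/;   # any whitespace allowed between "\" and the newline
my $strict  = qr/\A\s*\\\r?\n\z/;   # only an optional CR allowed before the newline

my %samples = (
    'plain'           => "   \\\n",
    'crlf'            => "   \\\r\n",
    'trailing spaces' => "   \\      \n",
);

for my $name (sort keys %samples) {
    my $text = $samples{$name};
    printf "%-16s lenient=%-3s strict=%s\n", $name,
        ($text =~ $lenient ? 'yes' : 'no'),
        ($text =~ $strict  ? 'yes' : 'no');
}
```

Only the "trailing spaces" sample tells them apart: the lenient pattern accepts it, the strict one rejects it.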

> 
> > $Parse::RecDescent::skip=qr/(?:\s*#.*\n|\s*[\\]\s*\n|\s*)*/;
> 
> I tried the character class too, with no luck :(

Of course: the double interpolation leaves it as

/[\].../

which is an invalid regex (the \] escapes the closing bracket, so the
character class never ends).
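A small simulation of that effect. The helper `collapse_backslashes` is hypothetical, not Parse::RecDescent's actual code: it assumes the extra pass behaves like a single-quoted (`q{}`) string, turning each `\\` into `\`. The collapsed text is recompiled inside `eval` so a broken pattern shows up as `undef` instead of killing the script.

```perl
use strict;
use warnings;

# Hypothetical model of the extra evaluation pass described in this thread:
# stringify the qr//, then collapse every "\\" to a single "\", as a
# single-quoted (q{}) string would. Not Parse::RecDescent's actual code.
sub collapse_backslashes {
    my ($re) = @_;
    (my $str = "$re") =~ s/\\\\/\\/g;
    return $str;
}

# Recompile the collapsed text; returns undef if it is no longer a valid regex.
sub recompile {
    my ($str) = @_;
    return eval { qr/$str/ };
}

my $one_level = recompile(collapse_backslashes(qr/[\\]/));     # text becomes [\]
my $two_level = recompile(collapse_backslashes(qr/[\\\\]/));   # text becomes [\\]

printf "qr/[\\\\]/ after the pass: %s\n",
    defined $one_level ? 'compiles' : 'no longer compiles';
printf "qr/[\\\\\\\\]/ after the pass: %s\n",
    (defined $two_level && "\\" =~ $two_level) ? 'still matches a backslash' : 'broken';
```

Under this model the singly escaped class turns into the unterminated `[\]`, while the doubly escaped one lands on `[\\]`, which still matches exactly one backslash.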

> > Also, is # ONLY used for comments? I would change the regex to be
> > 
> > $Parse::RecDescent::skip=qr/\s*(?:(?!<[\\\\])#.*\n|[\\\\]\s*\n)*/;
> 
> Eh, too many backslashes!  Need to study this.  Having to use '[\\\\]'
> seems wrong though - too many levels of interpolation.

Well, it is getting interpreted twice: first in the qr//, then again in
the eval when that qr is stringified.
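One model that fits the symptoms reported in this thread (an assumption, not a reading of Parse::RecDescent's source) is a second pass that collapses each `\\` to `\`. Under that model, this sketch shows why `\\\n` stops matching a backslash-newline while `\\\s*\n` appears to keep working; `collapse_backslashes` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical model of the extra pass (an assumption, not Parse::RecDescent's
# code): stringify the qr//, collapse each "\\" to "\", and recompile.
sub collapse_backslashes {
    my ($re) = @_;
    (my $str = "$re") =~ s/\\\\/\\/g;
    return qr/$str/;
}

my $continuation = "\\\n";                            # backslash, then newline

my $bare        = collapse_backslashes(qr/\\\n/);     # now reads \\n: backslash, letter "n"
my $with_s_star = collapse_backslashes(qr/\\\s*\n/);  # now reads \\s*\n: backslash, "s"*, newline

printf "qr/\\\\\\n/ still matches backslash-newline: %s\n",
    $continuation =~ /\A$bare\z/ ? 'yes' : 'no';
printf "qr/\\\\\\s*\\n/ still matches backslash-newline: %s\n",
    $continuation =~ /\A$with_s_star\z/ ? 'yes' : 'no';
```

In this model the `\s*` only helps by accident: after the pass it has become `s*` (zero or more letter s), which happily matches nothing before the newline.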

> 
> '(?!<[\\\\])' - should that be '(?<![\\\\])', zero-width negative
> look-behind assertion?  

Yup, that's the zero-width lookbehind. Sorry about the typo; in my test code
I actually have

$Parse::RecDescent::skip=qr/\s*(?:(?<![\\\\])#.*\n|[\\\\]\s*\n)*/;

> Otherwise I can't see it in perlre.  So the
> regex will match an unescaped # or a literal \ followed by newline,
> yeah?  Ok, I think I get it.
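The lookbehind part can be tried on its own. This is a plain-Perl sketch, outside Parse::RecDescent, so no extra pass is involved and `[\\\\]` simply means "a backslash"; `$comment` is an illustrative name.

```perl
use strict;
use warnings;

# Comment alternative from the skip pattern, used as a plain Perl regex:
# an unescaped "#" starts a comment, "\#" does not.
my $comment = qr/(?<![\\\\])#.*\n/;

for my $text ("value # a comment\n", "value \\# not a comment\n") {
    my $verdict = $text =~ $comment ? 'comment' : 'no comment';
    (my $shown = $text) =~ s/\n\z//;
    print "$shown => $verdict\n";
}
```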

Well, here's the \r? version, commented:

$Parse::RecDescent::skip=qr/
                            \s*              # match optional leading spaces
                            (?:              # followed by 0 or more of one of
                               (?<![\\\\])   #    not preceded by a backslash
                               \#            #    a comment symbol
                               .*            #    followed by 0 or more non-newlines
                               \n            #    followed by a newline
                            |                # or
                               [\\\\]        #    a backslash
                               \r?           #    optional carriage return for win32 types
                               \n            #    newline
                            )*               # zero or more times.
                           /x;
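Here is the commented pattern (comments shortened) exercised on a small sample, once as written and once after a hypothetical backslash-collapsing pass that stands in for Parse::RecDescent's extra evaluation (an assumption, not its actual code). The sample puts `foo` directly after the continuation because the pattern only allows whitespace before the first comment or continuation.

```perl
use strict;
use warnings;

# Copy of the commented skip pattern, held in an ordinary variable.
my $skip = qr/
                \s*              # optional leading whitespace
                (?:              # then zero or more of:
                   (?<![\\\\])   #    not preceded by a backslash
                   \#            #    a comment symbol
                   .*            #    the rest of the line
                   \n            #    and its newline
                |                # or
                   [\\\\]        #    a backslash
                   \r?           #    optional carriage return
                   \n            #    newline
                )*
               /x;

# Hypothetical model of the extra pass: collapse each "\\" to "\".
(my $collapsed_src = "$skip") =~ s/\\\\/\\/g;
my $collapsed = qr/$collapsed_src/;

my $text = "   # a comment\n\\\nfoo";

for my $pair ([ 'as written' => $skip ], [ 'after the pass' => $collapsed ]) {
    my ($label, $re) = @$pair;
    (my $rest = $text) =~ s/\A$re//;   # strip whatever the skip pattern consumes
    print "$label: remaining text is '$rest'\n";
}
```

Both versions leave just `foo`, which is the point of the `[\\\\]` spelling: it means the same thing before and after the extra pass.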

> 
> > It looks like the problem is that the qr() in $PRD::skip gets
> > stringified, and then evalled. This is a disastrous circumstance, as the
> > eval destroys all of the benefit of using qr(), and is responsible for
> > (?!<\\) becoming (?!<\), which causes all kinds of problems. Even worse
> > is that \\\s* becomes \\s*, which when evalled becomes \s*, which of
> > course will never match "\\  " (or shouldn't; I'm not so sure about
> > what Damian is doing here.)
> 
> I had problems using comments and whitespace in regexes too, even though
> they started with '(?x-ism:'

I haven't looked closely at what qr() munging Damian is up to. I suspect it
could be complex.

> 
> > This applies whether or not you use qr() or another form of quoting. The
> > solution is as I did above: use [\\\\] instead of \\. This is OK because
> > /[\\\\]/ matches the same thing as /\\/, but when it gets double
> > interpolated it becomes /[\\]/, which of course is also the same as /\\/.
> 
> Yeah, icky.  I guess I didn't quite go far enough with my escaping.

Heh, sometimes you gotta do tricks, eh? :-)

> 
> > Why? The grammar says to first compare "yellow" against 'global_lines',
> > and to accept that it won't match. It doesn't (as it does not begin with
> > "foo", "bar", "burger", "chips" or "pizza"), so the 'global_lines' rule
> > is satisfied and it goes on to match "yellow" against 'backup'. 'backup'
> > requires that the string start with the literal "backup", which certainly
> > doesn't match, so it complains of the fact, quite correctly.
> 
> That and the code problem are due to posting after 12 hours in work,
> without dinner.  Should have tested it first, sorry.

Heh. Been there, got stung too. Didn't bother with the t-shirt. ;-)

> 
> > Which shows that the subrule 'global_lines' was considered to match (0
> > times), which indicates that it will then try to match "yellow" against
> > the next subrule of 'config'.
> 
> Yup, that's what I see, and why the error can be ignored.  That's why I
> want a <reallycommit> that can't be backtracked past.

But I'm not sure you really need it...

> 
> > AFAICT "yellow custard" doesn't ever cause a commit to fire. As such when it
> 
> > > backup_line:        "yellow"      <commit>        boolean
> 
> Shouldn't the parser match "yellow", then <commit>, then fail to match
> boolean, then reach the <error?> ?

No. 'backup_line' is only reached via 'backup', and the first part of the
'backup' rule consists of string literals ('backup' '{') which 'yellow'
doesn't match, so the parser never even gets to backup_line.

my $grammar = <<'GRAMMAR';
config:         global_lines(?) backup /\s*/

global_lines:     "foo"         <commit>        boolean
                | "bar"         <commit>        string
                | common

boolean:          /\b(yes|true|on|1)\b/
                | /\b(no|false|off|0)\b/
                | <error>

string:         /\S+/

backup:         "backup" "{" backup_line(s?) "}"

backup_line:      "yellow"      <commit>        boolean
                | "blue"        <commit>        string
                | common

common:           "burger"      <commit>        boolean
                | "chips"       <commit>        boolean
                | "pizza"       <commit>        string
                | <error>
GRAMMAR

For "yellow custard" we get something like the following traversal:

config
global_lines 
common [fail] 
(back to config) [global_lines matches 0 times successfully]
backup [fail "yellow" ne "backup"]
(back to config) [backup fails] [config fails]

End result: it never even got to backup_line to match the yellow against the
yellow!

> After spending a few more hours at it, and trying pretty much every
> possible combination of <error>, <error?>, <reject>, <commit>, etc, I've
> given up on getting error messages.

As I said, I'm pretty sure it never even gets to the point in the grammar
where the messages you are expecting would be generated.

> The parser fails; I can figure out where the error is like so: split the
> original text, strip comments, split the remaining text, calculate the
> line number, print a warning and the first line.
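A minimal sketch of that fallback, simplified by skipping the comment-stripping step. It assumes you have both the original text and the unparsed remainder, and that the remainder is a suffix of the original (Parse::RecDescent can leave the unconsumed text behind if you pass the input by reference; check its documentation). `error_location` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: given the full input and the unparsed remainder
# (assumed to be a suffix of the input), return the 1-based line number of
# the offending text and that line's content.
sub error_location {
    my ($original, $remaining) = @_;
    my $consumed = substr($original, 0, length($original) - length($remaining));

    # Whitespace at the front of the remainder sits before the offending
    # token, so count its newlines too and skip past it.
    my ($gap) = $remaining =~ /\A(\s*)/;
    my $line  = 1 + ($consumed =~ tr/\n//) + ($gap =~ tr/\n//);
    my ($first) = split /\n/, substr($remaining, length $gap), 2;
    return ($line, defined $first ? $first : '');
}

my $config    = "foo yes\n# a comment\nbackup {\nyellow custard\n}\n";
my $remaining = "yellow custard\n}\n";

my ($line, $first) = error_location($config, $remaining);
warn "config error at line $line: $first\n";
```

With the sample above this warns `config error at line 4: yellow custard`. Counting newlines in the consumed prefix avoids the line-number drift you get when skipped comments are stripped before counting.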

Gah. I'll grant you that an alternate strategy may be better, but in this
case you are misunderstanding the grammar you are parsing, and thus are
banging your head against an essentially non-existent wall. (Yeah, I hate it
when people say that to me too :-)

> 
> Now that I've given up and removed the extra error stuff, I'm getting
> error messages:
>      ERROR (line -96): Invalid boolean: Was expecting /yes|true|on|1/,
>                        or /no|false|off|0/
> They've still got the negative line numbers though.

That one I left well alone, for good reason. :-) I haven't got the foggiest
what's up with that one, and frankly I'm getting over the flu today, and even
looking at P::RD's code this morning made my headache worse. Damian is one
crazy Perl programmer! (One day I'll be that crazy too, muhahahahah.)

> But now it's not parsing the backup section properly.  Oh wait, it is :)
> Any time you use autoactions, don't forget to put {1} after rules which
> shouldn't get autoactions, as they have a nasty tendency to fail
> otherwise.

Uh-huh. In fact, my tendency is to use {1} as the autoaction and to
explicitly provide the others. I find that usually matches my thinking
better, especially as I tend to return things in what seem to be strange
ways. (Try returning undef as a matched value in the tree: it's a frigging
nightmare. My best solution was to create a special object and then use isa
tests. Blech. There needs to be a way to return a false value from a code
block in a rule but have the rule be considered a match.)
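A plain-Perl sketch of that sentinel workaround, outside any grammar: a rule action would return a blessed placeholder (always true, so the rule still counts as a match), and the code walking the result turns it back into `undef`. The package `Hypothetical::Undef` and the helper `unwrap` are made-up names for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical placeholder meaning "this rule matched, but its value is undef".
# A blessed reference is always true, so returning one keeps the rule a match.
package Hypothetical::Undef;
sub new { my ($class) = @_; return bless {}, $class }

package main;

# Turn placeholders back into undef when walking the parse result.
sub unwrap {
    my ($value) = @_;
    return undef if blessed($value) && $value->isa('Hypothetical::Undef');
    return $value;
}

my $matched   = Hypothetical::Undef->new;   # what a rule action might return
my $unwrapped = unwrap($matched);

print "rule matched\n"   if $matched;               # the placeholder is true
print "value is undef\n" unless defined $unwrapped; # but it unwraps to undef
my $plain = unwrap('pizza');                        # ordinary values pass through
print "$plain\n";
```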

> Thanks for the suggestions - I've tried them with some success :)

No problem. Glad to be of assistance.

> Time to give it a rest and get the rest of the script sorted.

:-)

Once you stop wanting to murder me for the above comments, please let me
know how it worked out.

Also, do you use Perlmonks.org at all? It's a good resource site if you want
more points of view.

Yves
