> When lexing 4096, we'd reset @no_regex to false, and we'd parse / 23 /
> as a regex.
> Essentially, no_regex_after has 3 distinct values:
>        * true, meaning we can't lex a regex until we're told we can
>        * false, meaning we can now lex a regex
>        * nil, meaning leave no_regex in whatever mode it was
>
> I know this is kludgy like the whole patch. I'd favor another approach
> if I had one.
>
> > I recognise that the behavior isn't the same and suspect that it has
> > something to do with
> >
> >         ')' => :RPAREN,
> > -        '=' => :EQUALS,
> > +        '=' => { :name => :EQUALS, :no_regex_after => true },
> >         '+=' => :APPENDS,
> >
> > but I'm not clear on the purpose.  We're setting @no_regex to true
> > when we see an :EQUALS and then leaving it unchanged until we see
> > a :CASE, :NODE, etc.  Why not take a more restrictive stance and only
> > allow regular expressions directly after :CASE, :NODE, etc.?
>
> I don't remember exactly, but I went the other way because there was a
> case that wouldn't work easily (I tried at least 3 different
> alternatives yesterday in a small amount of time so things are now
> blurred in my mind).
>

I figured as much.  If you remember the case, I'd be interested in knowing
what it is; I may try to doodle something together as that's often the best
way to see where the cliffs and pitfalls are in a problem like this.
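
In the meantime, just to check that I'm reading the mechanics right, I
take the flag handling to boil down to something like this (a sketch
only; token_spec here stands for entries like the :EQUALS one above):

    # token_spec is e.g. { :name => :EQUALS, :no_regex_after => true }
    #   true  -> forbid lexing a regex from here on
    #   false -> allow lexing a regex again
    #   nil   -> leave @no_regex however it already was
    value = token_spec[:no_regex_after]
    @no_regex = value unless value.nil?

So '=' shuts regexes off until some later token (CASE, NODE, etc.)
explicitly turns them back on, and tokens without a :no_regex_after
entry don't touch the flag at all.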

> > I guess my inclination would have been to (as I noted on the ticket)
> > make the test a property of the :REGEX token type (using a general
> > "applicable" test that didn't refer to the REGEX rules), and let it
> > look at the most recently generated token(s) to make the judgement
> > rather than scattering the information on all the various token types.
> >  But it may be that I'm missing some subtlety of the syntax that makes
> > that infeasible.
>
> My feeling is that either way will fail. There are too many places where
> a / can be mis-lexed as a regex.
>

So what if DIV is the default?  I was under the impression (perhaps
mistaken) that there was a relatively small number of places a regular
expression could occur (directly after NODE, directly after CASE, etc.) and
everywhere else we could assume it was DIV because REGEXP wouldn't be legal.
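
Concretely, I'm picturing something along these lines in the lexer
(untested sketch; scan_regex, emit and @last_token are made-up names,
and the list of tokens that may precede a regex is a guess):

    # DIV is the default; only try a regex in the handful of spots
    # where one can legally appear.
    REGEX_ALLOWED_AFTER = [:NODE, :CASE]   # plus whatever else turns up

    def lex_slash
      if REGEX_ALLOWED_AFTER.include?(@last_token)
        scan_regex          # attempt to lex /.../ here
      else
        emit(:DIV)          # everywhere else '/' is division
      end
    end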


> The more I think about it, the more I think the parser should parse the
> regex, and the lexer shouldn't do anything about those.
>
> Maybe with an untested grammar like this:
>
> regex: DIV regex_content DIV
> regex_content: regex_content regex_string
>               | LPAREN regex_content RPAREN
>               | regex_content STAR
>               | regex_content QMARK
>               | LBRACE character_class RBRACE
>               ...
>
> character_class: string
>
> Of course the lexer would have to provide some support, especially it
> should lex spaces between / /...
>
> What do you think?
>

I thought about this and it gets goofy around the edges.  Take:

    node /www(\d)\.case\.edu/ {...

and think about the token stream this would produce. The parser would have
to de-tokenize CASE, for one thing, and I'm not sure how well the lexer
would like the rest.
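
Roughly (guessing at the token names, and glossing over whatever the
lexer makes of the backslashes) I'd expect something like:

    NODE DIV NAME("www") LPAREN ... RPAREN ... CASE ... NAME("edu") DIV LBRACE

where the "case" in the middle of the hostname comes back as the CASE
keyword rather than as regex text.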

Interactive lexing (where the parser asks the lexer for either a token or a
character, with the ability to unget) would be cleaner.
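
Something along these lines, say (sketch only, names invented):

    # The parser asks for tokens as usual, but when it decides it is
    # reading a regex it can drop down to raw characters, with the
    # ability to push one back.
    class InteractiveLexer
      def initialize(input)
        @input = input.dup
        @ungot = []
      end

      def next_char
        return @ungot.pop unless @ungot.empty?
        @input.slice!(0, 1)    # returns "" once the input runs out
      end

      def unget(char)
        @ungot.push(char)
      end

      # next_token would sit on top of next_char/unget, so both views
      # share the same position in the input.
    end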

-- Markus
