On Wed, 2009-07-01 at 16:32 -0500, Luke Kanies wrote:
> On Jul 1, 2009, at 4:25 PM, Brice Figureau wrote:
> 
> >
> > On 29/06/09 21:27, Luke Kanies wrote:
> >> On Jun 28, 2009, at 10:10 AM, Brice Figureau wrote:
> >>
> >>> On 28/06/09 15:47, Luke Kanies wrote:
> >>>> I have two questions about this:
> >>>>
> >>>> 1) Any idea how this affects performance?
> >>> Interesting question. So I ran puppetdoc (which incidently uses the
> >>> parser to parse every manifest) on the Days of Wonder manifests  
> >>> (about
> >>> 100 manifests, spread in about 50 modules).
> >>>
> >>> With this patch, I get on an average of 3 run after a blank run:
> >>> 6.084s
> >>> Without the patch: 6.022s
> >>> So with the patch we're 1.1% slower with the patch.
> >>
> >> I'd be interested in the results using a class instead of a hash; I
> >> expect it'll be a bit slower yet.
> >
> > Unlike what you and I were thinking, it's a a little bit faster:  
> > 6.069s,
> > so a slowdown of 0.8%.
> 
> Nice.
> 
> >
> >>>> 2) Is it reasonable to just create an instance of a new class here,
> >>>> or
> >>>> maybe a struct, rather than passing a hash?  The hash is built the
> >>>> same every time (although obviously with different data), which
> >>>> implies it makes sense to use a class.
> >>> Yes, you're right, I'll rework this patch.
> >>
> >> If it's much slower, it's probably not worth it.  They're all
> >> ephemeral, so maybe not a biggie.
> >>
> >>>> It'd be best if we could just return the instance to the parser,
> >>>> rather than [:token, ...], but we'd probably need to find some  
> >>>> way to
> >>>> make the token evaluate appropriately for the grammar.
> >>>> That is, it might work to add a 'to_s' method that returns the  
> >>>> token
> >>>> string.
> >>> Unfortunately I don't think racc will support this, because it
> >>> explecitily wants an array of a Symbol and token object:
> >>> http://i.loveruby.net/en/projects/racc/doc/parser.html
> >>
> >> It's probably worth testing.  E.g., we pass symbols but it might work
> >> equivalently with strings, and if it does, it's using some kind of
> >> converter method.  All we have to do is implement that converter  
> >> method.
> >
> > I tried everything I could, even reading 25% of the ruby C code and  
> > Racc
> > C code, and I can't find something that works.
> >
> > The issue is that the racc generator creates the token array like  
> > this:
> > racc_token_table = {
> >  false => 0,
> >  Object.new => 1,
> >  :LBRACK => 2,
> >  :DQTEXT => 3,
> > ...
> > }
> >
> > It then loop around the lexer and on each lexed token it refers to  
> > this
> > hash to get the correct token number (which will be used as index in
> > other tables...).
> >
> > I tried to return a class that mimics a symbol (ie implemented hash  
> > and
> > eql? to match Symbol ones) but it doesn't work. It never succeed in
> > finding the token index.
> > Even though the TokenValue.hash==Symbol.Hash, TokenValue.eql? is never
> > called.
> > Reading the C hash code doesn't give me more inputs: basically the  
> > hash
> > lookup is done by computing the hash, getting the right bucket,
> > traversing it trying to find an equal key.
> > Equality is treated specially, if we face int, Symbols or Strings on
> > both side (for performance reason, the equality is derived directly),
> > but on regular object, a.eql?(b) is called.
> > So I monkey patched Symbol.eql? to see if it would be called, but it
> > doesn't <g>
> >
> > I think I'll stop there and keep return a Symbol as the token.
> 
> Seems reasonable.

Yes. Unfortunately now I know much more about MRI internals than I
wanted :-)

> >
> > Note that it works with token as string if you define in the grammar  
> > the
> > tokens as string (ie enclosed in ""), so no conversion happen for  
> > sure.
> 
> Are you saying that if we defined the lexing tokens as strings instead  
> of symbols then we can return a single object instead an array of the  
> token plus the value?

No, the array is mandatory.
Looking at racc C code, you'll see that they weren't very inventive or
helpful with us: what they accept is very strict.
I was just meaning that if you want to have token as strings instead of
symbols, your grammar must refer to them as string and not token:

token "NAME"
instead of 
token NAME

and everywhere you use the token, you have to use "NAME" instead of
NAME.
Except if you use a conversion table to go back to symbols, but what's
the point.

Anyway, the more I look to racc, the more I think they shouldn't have
stopped the development so early :-(
Unfortunately we're pretty tied to racc, and there is not many available
other solutions (since a long time ago when I was adding the if
expression stuff I thought we could rewrite the parser with a more
recent parser generator ala Treetop, Dhaka or Grammar
(http://rubyforge.org/projects/grammar/), but I never got further than
the idea).

> If that's the case, that seems like the right move.  I think that's  
> much cleaner than the array.
> 
> If not, then, well, we stick with the array.  It just seems very C-like.

Yes, we have to stick with the array.

I'll post an updated patch later this week.
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to