On Wed, 2009-07-01 at 16:32 -0500, Luke Kanies wrote: > On Jul 1, 2009, at 4:25 PM, Brice Figureau wrote: > > > > > On 29/06/09 21:27, Luke Kanies wrote: > >> On Jun 28, 2009, at 10:10 AM, Brice Figureau wrote: > >> > >>> On 28/06/09 15:47, Luke Kanies wrote: > >>>> I have two questions about this: > >>>> > >>>> 1) Any idea how this affects performance? > >>> Interesting question. So I ran puppetdoc (which incidently uses the > >>> parser to parse every manifest) on the Days of Wonder manifests > >>> (about > >>> 100 manifests, spread in about 50 modules). > >>> > >>> With this patch, I get on an average of 3 run after a blank run: > >>> 6.084s > >>> Without the patch: 6.022s > >>> So with the patch we're 1.1% slower with the patch. > >> > >> I'd be interested in the results using a class instead of a hash; I > >> expect it'll be a bit slower yet. > > > > Unlike what you and I were thinking, it's a a little bit faster: > > 6.069s, > > so a slowdown of 0.8%. > > Nice. > > > > >>>> 2) Is it reasonable to just create an instance of a new class here, > >>>> or > >>>> maybe a struct, rather than passing a hash? The hash is built the > >>>> same every time (although obviously with different data), which > >>>> implies it makes sense to use a class. > >>> Yes, you're right, I'll rework this patch. > >> > >> If it's much slower, it's probably not worth it. They're all > >> ephemeral, so maybe not a biggie. > >> > >>>> It'd be best if we could just return the instance to the parser, > >>>> rather than [:token, ...], but we'd probably need to find some > >>>> way to > >>>> make the token evaluate appropriately for the grammar. > >>>> That is, it might work to add a 'to_s' method that returns the > >>>> token > >>>> string. > >>> Unfortunately I don't think racc will support this, because it > >>> explecitily wants an array of a Symbol and token object: > >>> http://i.loveruby.net/en/projects/racc/doc/parser.html > >> > >> It's probably worth testing. E.g., we pass symbols but it might work > >> equivalently with strings, and if it does, it's using some kind of > >> converter method. All we have to do is implement that converter > >> method. > > > > I tried everything I could, even reading 25% of the ruby C code and > > Racc > > C code, and I can't find something that works. > > > > The issue is that the racc generator creates the token array like > > this: > > racc_token_table = { > > false => 0, > > Object.new => 1, > > :LBRACK => 2, > > :DQTEXT => 3, > > ... > > } > > > > It then loop around the lexer and on each lexed token it refers to > > this > > hash to get the correct token number (which will be used as index in > > other tables...). > > > > I tried to return a class that mimics a symbol (ie implemented hash > > and > > eql? to match Symbol ones) but it doesn't work. It never succeed in > > finding the token index. > > Even though the TokenValue.hash==Symbol.Hash, TokenValue.eql? is never > > called. > > Reading the C hash code doesn't give me more inputs: basically the > > hash > > lookup is done by computing the hash, getting the right bucket, > > traversing it trying to find an equal key. > > Equality is treated specially, if we face int, Symbols or Strings on > > both side (for performance reason, the equality is derived directly), > > but on regular object, a.eql?(b) is called. > > So I monkey patched Symbol.eql? to see if it would be called, but it > > doesn't <g> > > > > I think I'll stop there and keep return a Symbol as the token. > > Seems reasonable.
Yes. Unfortunately now I know much more about MRI internals than I wanted :-) > > > > Note that it works with token as string if you define in the grammar > > the > > tokens as string (ie enclosed in ""), so no conversion happen for > > sure. > > Are you saying that if we defined the lexing tokens as strings instead > of symbols then we can return a single object instead an array of the > token plus the value? No, the array is mandatory. Looking at racc C code, you'll see that they weren't very inventive or helpful with us: what they accept is very strict. I was just meaning that if you want to have token as strings instead of symbols, your grammar must refer to them as string and not token: token "NAME" instead of token NAME and everywhere you use the token, you have to use "NAME" instead of NAME. Except if you use a conversion table to go back to symbols, but what's the point. Anyway, the more I look to racc, the more I think they shouldn't have stopped the development so early :-( Unfortunately we're pretty tied to racc, and there is not many available other solutions (since a long time ago when I was adding the if expression stuff I thought we could rewrite the parser with a more recent parser generator ala Treetop, Dhaka or Grammar (http://rubyforge.org/projects/grammar/), but I never got further than the idea). > If that's the case, that seems like the right move. I think that's > much cleaner than the array. > > If not, then, well, we stick with the array. It just seems very C-like. Yes, we have to stick with the array. I'll post an updated patch later this week. -- Brice Figureau My Blog: http://www.masterzen.fr/ --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Puppet Developers" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en -~----------~----~----~----~------~----~------~--~---
