On Jul 2, 2009, at 2:19 AM, Brice Figureau wrote: > > On Wed, 2009-07-01 at 16:32 -0500, Luke Kanies wrote: >> On Jul 1, 2009, at 4:25 PM, Brice Figureau wrote: >> >>> >>> On 29/06/09 21:27, Luke Kanies wrote: >>>> On Jun 28, 2009, at 10:10 AM, Brice Figureau wrote: >>>> >>>>> On 28/06/09 15:47, Luke Kanies wrote: >>>>>> I have two questions about this: >>>>>> >>>>>> 1) Any idea how this affects performance? >>>>> Interesting question. So I ran puppetdoc (which incidently uses >>>>> the >>>>> parser to parse every manifest) on the Days of Wonder manifests >>>>> (about >>>>> 100 manifests, spread in about 50 modules). >>>>> >>>>> With this patch, I get on an average of 3 run after a blank run: >>>>> 6.084s >>>>> Without the patch: 6.022s >>>>> So with the patch we're 1.1% slower with the patch. >>>> >>>> I'd be interested in the results using a class instead of a hash; I >>>> expect it'll be a bit slower yet. >>> >>> Unlike what you and I were thinking, it's a a little bit faster: >>> 6.069s, >>> so a slowdown of 0.8%. >> >> Nice. >> >>> >>>>>> 2) Is it reasonable to just create an instance of a new class >>>>>> here, >>>>>> or >>>>>> maybe a struct, rather than passing a hash? The hash is built >>>>>> the >>>>>> same every time (although obviously with different data), which >>>>>> implies it makes sense to use a class. >>>>> Yes, you're right, I'll rework this patch. >>>> >>>> If it's much slower, it's probably not worth it. They're all >>>> ephemeral, so maybe not a biggie. >>>> >>>>>> It'd be best if we could just return the instance to the parser, >>>>>> rather than [:token, ...], but we'd probably need to find some >>>>>> way to >>>>>> make the token evaluate appropriately for the grammar. >>>>>> That is, it might work to add a 'to_s' method that returns the >>>>>> token >>>>>> string. >>>>> Unfortunately I don't think racc will support this, because it >>>>> explecitily wants an array of a Symbol and token object: >>>>> http://i.loveruby.net/en/projects/racc/doc/parser.html >>>> >>>> It's probably worth testing. E.g., we pass symbols but it might >>>> work >>>> equivalently with strings, and if it does, it's using some kind of >>>> converter method. All we have to do is implement that converter >>>> method. >>> >>> I tried everything I could, even reading 25% of the ruby C code and >>> Racc >>> C code, and I can't find something that works. >>> >>> The issue is that the racc generator creates the token array like >>> this: >>> racc_token_table = { >>> false => 0, >>> Object.new => 1, >>> :LBRACK => 2, >>> :DQTEXT => 3, >>> ... >>> } >>> >>> It then loop around the lexer and on each lexed token it refers to >>> this >>> hash to get the correct token number (which will be used as index in >>> other tables...). >>> >>> I tried to return a class that mimics a symbol (ie implemented hash >>> and >>> eql? to match Symbol ones) but it doesn't work. It never succeed in >>> finding the token index. >>> Even though the TokenValue.hash==Symbol.Hash, TokenValue.eql? is >>> never >>> called. >>> Reading the C hash code doesn't give me more inputs: basically the >>> hash >>> lookup is done by computing the hash, getting the right bucket, >>> traversing it trying to find an equal key. >>> Equality is treated specially, if we face int, Symbols or Strings on >>> both side (for performance reason, the equality is derived >>> directly), >>> but on regular object, a.eql?(b) is called. >>> So I monkey patched Symbol.eql? to see if it would be called, but it >>> doesn't <g> >>> >>> I think I'll stop there and keep return a Symbol as the token. >> >> Seems reasonable. > > Yes. Unfortunately now I know much more about MRI internals than I > wanted :-) > >>> >>> Note that it works with token as string if you define in the grammar >>> the >>> tokens as string (ie enclosed in ""), so no conversion happen for >>> sure. >> >> Are you saying that if we defined the lexing tokens as strings >> instead >> of symbols then we can return a single object instead an array of the >> token plus the value? > > No, the array is mandatory. > Looking at racc C code, you'll see that they weren't very inventive or > helpful with us: what they accept is very strict. > I was just meaning that if you want to have token as strings instead > of > symbols, your grammar must refer to them as string and not token: > > token "NAME" > instead of > token NAME > > and everywhere you use the token, you have to use "NAME" instead of > NAME. > Except if you use a conversion table to go back to symbols, but what's > the point.
I see. > > Anyway, the more I look to racc, the more I think they shouldn't have > stopped the development so early :-( > Unfortunately we're pretty tied to racc, and there is not many > available > other solutions (since a long time ago when I was adding the if > expression stuff I thought we could rewrite the parser with a more > recent parser generator ala Treetop, Dhaka or Grammar > (http://rubyforge.org/projects/grammar/), but I never got further than > the idea). Yeah, I've thought the same thing a couple of times. > >> If that's the case, that seems like the right move. I think that's >> much cleaner than the array. >> >> If not, then, well, we stick with the array. It just seems very C- >> like. > > Yes, we have to stick with the array. > > I'll post an updated patch later this week. Ok. I'm mid-move still (actually flying to pdx tomorrow), but next week I'll finally have bandwidth again. -- I have learned to use the word 'impossible' with the greatest caution. -- Wernher von Braun --------------------------------------------------------------------- Luke Kanies | http://reductivelabs.com | http://madstop.com --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Puppet Developers" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en -~----------~----~----~----~------~----~------~--~---
