On Jul 2, 2009, at 2:19 AM, Brice Figureau wrote:

>
> On Wed, 2009-07-01 at 16:32 -0500, Luke Kanies wrote:
>> On Jul 1, 2009, at 4:25 PM, Brice Figureau wrote:
>>
>>>
>>> On 29/06/09 21:27, Luke Kanies wrote:
>>>> On Jun 28, 2009, at 10:10 AM, Brice Figureau wrote:
>>>>
>>>>> On 28/06/09 15:47, Luke Kanies wrote:
>>>>>> I have two questions about this:
>>>>>>
>>>>>> 1) Any idea how this affects performance?
>>>>> Interesting question. So I ran puppetdoc (which incidently uses  
>>>>> the
>>>>> parser to parse every manifest) on the Days of Wonder manifests
>>>>> (about
>>>>> 100 manifests, spread in about 50 modules).
>>>>>
>>>>> With this patch, I get on an average of 3 run after a blank run:
>>>>> 6.084s
>>>>> Without the patch: 6.022s
>>>>> So with the patch we're 1.1% slower with the patch.
>>>>
>>>> I'd be interested in the results using a class instead of a hash; I
>>>> expect it'll be a bit slower yet.
>>>
>>> Unlike what you and I were thinking, it's a a little bit faster:
>>> 6.069s,
>>> so a slowdown of 0.8%.
>>
>> Nice.
>>
>>>
>>>>>> 2) Is it reasonable to just create an instance of a new class  
>>>>>> here,
>>>>>> or
>>>>>> maybe a struct, rather than passing a hash?  The hash is built  
>>>>>> the
>>>>>> same every time (although obviously with different data), which
>>>>>> implies it makes sense to use a class.
>>>>> Yes, you're right, I'll rework this patch.
>>>>
>>>> If it's much slower, it's probably not worth it.  They're all
>>>> ephemeral, so maybe not a biggie.
>>>>
>>>>>> It'd be best if we could just return the instance to the parser,
>>>>>> rather than [:token, ...], but we'd probably need to find some
>>>>>> way to
>>>>>> make the token evaluate appropriately for the grammar.
>>>>>> That is, it might work to add a 'to_s' method that returns the
>>>>>> token
>>>>>> string.
>>>>> Unfortunately I don't think racc will support this, because it
>>>>> explecitily wants an array of a Symbol and token object:
>>>>> http://i.loveruby.net/en/projects/racc/doc/parser.html
>>>>
>>>> It's probably worth testing.  E.g., we pass symbols but it might  
>>>> work
>>>> equivalently with strings, and if it does, it's using some kind of
>>>> converter method.  All we have to do is implement that converter
>>>> method.
>>>
>>> I tried everything I could, even reading 25% of the ruby C code and
>>> Racc
>>> C code, and I can't find something that works.
>>>
>>> The issue is that the racc generator creates the token array like
>>> this:
>>> racc_token_table = {
>>> false => 0,
>>> Object.new => 1,
>>> :LBRACK => 2,
>>> :DQTEXT => 3,
>>> ...
>>> }
>>>
>>> It then loop around the lexer and on each lexed token it refers to
>>> this
>>> hash to get the correct token number (which will be used as index in
>>> other tables...).
>>>
>>> I tried to return a class that mimics a symbol (ie implemented hash
>>> and
>>> eql? to match Symbol ones) but it doesn't work. It never succeed in
>>> finding the token index.
>>> Even though the TokenValue.hash==Symbol.Hash, TokenValue.eql? is  
>>> never
>>> called.
>>> Reading the C hash code doesn't give me more inputs: basically the
>>> hash
>>> lookup is done by computing the hash, getting the right bucket,
>>> traversing it trying to find an equal key.
>>> Equality is treated specially, if we face int, Symbols or Strings on
>>> both side (for performance reason, the equality is derived  
>>> directly),
>>> but on regular object, a.eql?(b) is called.
>>> So I monkey patched Symbol.eql? to see if it would be called, but it
>>> doesn't <g>
>>>
>>> I think I'll stop there and keep return a Symbol as the token.
>>
>> Seems reasonable.
>
> Yes. Unfortunately now I know much more about MRI internals than I
> wanted :-)
>
>>>
>>> Note that it works with token as string if you define in the grammar
>>> the
>>> tokens as string (ie enclosed in ""), so no conversion happen for
>>> sure.
>>
>> Are you saying that if we defined the lexing tokens as strings  
>> instead
>> of symbols then we can return a single object instead an array of the
>> token plus the value?
>
> No, the array is mandatory.
> Looking at racc C code, you'll see that they weren't very inventive or
> helpful with us: what they accept is very strict.
> I was just meaning that if you want to have token as strings instead  
> of
> symbols, your grammar must refer to them as string and not token:
>
> token "NAME"
> instead of
> token NAME
>
> and everywhere you use the token, you have to use "NAME" instead of
> NAME.
> Except if you use a conversion table to go back to symbols, but what's
> the point.

I see.

>
> Anyway, the more I look to racc, the more I think they shouldn't have
> stopped the development so early :-(
> Unfortunately we're pretty tied to racc, and there is not many  
> available
> other solutions (since a long time ago when I was adding the if
> expression stuff I thought we could rewrite the parser with a more
> recent parser generator ala Treetop, Dhaka or Grammar
> (http://rubyforge.org/projects/grammar/), but I never got further than
> the idea).

Yeah, I've thought the same thing a couple of times.

>
>> If that's the case, that seems like the right move.  I think that's
>> much cleaner than the array.
>>
>> If not, then, well, we stick with the array.  It just seems very C- 
>> like.
>
> Yes, we have to stick with the array.
>
> I'll post an updated patch later this week.

Ok.  I'm mid-move still (actually flying to pdx tomorrow), but next  
week I'll finally have bandwidth again.

-- 
I have learned to use the word 'impossible' with the greatest caution.
     -- Wernher von Braun
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to