Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Frank Shearar Thu, 26 Apr 2012 02:48:56 -0700

On 26 April 2012 03:35, Levente Uzonyi <[email protected]> wrote:
> On Wed, 25 Apr 2012, Levente Uzonyi wrote:
>
>> On Wed, 25 Apr 2012, Frank Shearar wrote:
>>
>>> On 20 April 2012 15:15, Frank Shearar <[email protected]> wrote:
>>>>
>>>> On 20 April 2012 03:51, Levente Uzonyi <[email protected]> wrote:
>>>>>
>>>>> On Thu, 19 Apr 2012, Frank Shearar wrote:
>>>>>
>>>>>> I found a serious bug in parsing numbers with negative exponents. It
>>>>>> was completey broken, in fact, parsing 1e-1 as 10, not 1 / 10. Anyway.
>>>>>> This version fixes that, and adds a bunch of tests demonstrating that
>>>>>> number parsing will return rationals if it can.
>>>>>>
>>>>>> It's significantly slower than Squeak's SqNumberParser:
>>>>>>
>>>>>> Time millisecondsToRun: [100000 timesRepeat: [SqNumberParser parse:
>>>>>> '1234567890']] => 466
>>>>>>
>>>>>> Time millisecondsToRun: [100000 timesRepeat: [PPSmalltalkNumberParser
>>>>>> parse: '1234567890']] => 32082
>>>>>>
>>>>>> I've attached a MessageTally spying on the latter: I've not much skill
>>>>>> in reading these, but nothing leaps out at me as being obviously
>>>>>> awful.
>>>>>
>>>>>
>>>>>
>>>>> Didn't check the code, just the tally, and I think that
>>>>> PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>digitsBase: is
>>>>> begging
>>>>> for optimization. It's probably also the cause of the high amount of
>>>>> garbage
>>>>> which causes significant amount of time spent with garbage collection.
>>>>> It's also interesting is that the finalization process does so much
>>>>> work,
>>>>> there may be something wrong with your image.
>>>>
>>>>
>>>> Thanks for taking a look, Levente.
>>>>
>>>> I'd expect digitsBase: to dominate the running costs, given that we're
>>>> parsing numbers.
>>>>
>>>> I do make a large number of throwaway "immutable" values with a
>>>> Builder-like pattern... in PPSmalltalkNumberParser >>
>>>> #makeNumberFrom:base:. That, I would imagine, could explain the
>>>> garbage?
>>>>
>>>> If I may, what do you look for when reading the MessageTally? How do
>>>> you tell, for instance, that there's excessive garbage production?
>>>> That the incremental GCs take 7ms? (I'm reading Andreas' comments on
>>>> http://wiki.squeak.org/squeak/4210 again.)
>>>
>>>
>>> Levente, you're quite right: #digitsBase: has now been optimised even
>>> more, reducing the time taken to run my benchmark
>>>
>>> MessageTally spyOn: [Time millisecondsToRun: [100000 timesRepeat:
>>> [PPSmalltalkNumberParser parse: '1234567890']]]
>>>
>>> from ~32 seconds to ~16 seconds. (Memoising was the answer:
>>> #digitsBase: is effectively a higher-order production and, like OMeta,
>>> PPCompositeParser doesn't memoise those. A simple class var dictionary
>>> solves that problem.
>>
>>
>> Creating a new parser (PPSmalltalkNumberParser) takes quite a while, if
>> you extract the parser creation, then you'll get closer to the performance
>> limit of PetitParser:
>>
>> | p |
>> p := PPSmalltalkNumberParser new.
>> [ Time millisecondsToRun: [100000 timesRepeat: [p parse: '1234567890']] ]
>> timeProfile. "==> 2594"
>>
>> There are still places where you create parsers on the fly, like
>> PPSmalltalkNumberGrammar #>> number. These should be avoided.
>
>
> I hacked the code a bit and got the following results:
>
> (1 to: 5) collect: [ :run |
>
>        | p |
>        p := PPSmalltalkNumberParser new.
>        Smalltalk garbageCollect.
>        [ 1 to: 100000 do: [ :i | p parse: '1234567890' ] ] timeToRun ].
>
> #(1198 1209 1225 1221 1204)
>
> It's ~6 times slower than SqNumberParser and I don't really think it can get
> much closer to it. The mcz is available here:
> http://leves.web.elte.hu/squeak/PetitSmalltalk-ul.62.mcz


Ah excellent! Thanks! I see you eagerly initialise the number parsers
where I'd gone the lazy/memoise route. Any performance implications I
should be aware of, other than of course that eager initialisation
means keeping 36 parsers around when maybe three or four of those will
see regular use? (decimal, hex, binary, octal) For instance, I guess
the running cost of Dictionary >> #at:ifAbsentPut: is greater than
Array >> #at: because the latter just uses a primitive.

Lastly, I added PetitSmalltalk-fbs.64 to
http://ss3.gemstone.com/ss/Scratchpad-fbs/, based on your .62. It just
normalises how signs work. Why I didn't write the exponent sign as a
string from the beginning (like I do the number's sign), I don't know.
At any rate, that normalises NumberMaker's API (it always takes
Strings) and keeps the cheaper x = '-' rather than #($- '-') includes:
x, and makes the tests all pass :)

And now I'm going to add a feature request for SS3, asking for a
RESTful interface to the repositories!

frank

> Levente
>
>
>>
>>
>> Levente
>>
>>
>>>
>>> frank
>>>
>>>> frank
>>>>
>>>>> Levente
>>>>>
>>>>>
>>>>>>
>>>>>> frank
>>>>>>
>>>>>> On 14 September 2011 20:26, Frank Shearar <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 3 September 2011 19:35, Nicolas Cellier
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> 2011/9/3 Frank Shearar <[email protected]>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 September 2011 18:50, Lukas Renggli <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think it is a good idea to have the number parser separate,
>>>>>>>>>> after
>>>>>>>>>> all it might also make sense to use it separately.
>>>>>>>>>>
>>>>>>>>>> It seems that the new Smalltalk grammar is significantly slower.
>>>>>>>>>> The
>>>>>>>>>> benchmark PPSmalltalkClassesTests class>>#benchmark: that uses the
>>>>>>>>>> source code of the collection hierarchy and does not especially
>>>>>>>>>> target
>>>>>>>>>> number literals runs 30% slower.
>>>>>>>>>>
>>>>>>>>>> Also I see that "Number readFrom: ..." is still used within the
>>>>>>>>>> grammar. This seems to be a bit strange, no?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes: it's a double-parse, which is a bit lame. First, we parse the
>>>>>>>>> literal with PPSmalltalkNumberParser, which ensures that the thing
>>>>>>>>> given to Number class >> #readFrom: is a well-formed token (so, in
>>>>>>>>> particular, Squeak's Number doesn't get to see anything other than
>>>>>>>>> a
>>>>>>>>> well-formed token).
>>>>>>>>>
>>>>>>>>> It sounds like you're happy with the basic concept, so maybe I
>>>>>>>>> should
>>>>>>>>> remove the Number class >> #readFrom: stuff, see if I can't remove
>>>>>>>>> the
>>>>>>>>> performance issues, and resubmit the patch.
>>>>>>>>>
>>>>>>>>> frank
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, a NumberParser is essentially parsing, and this duplication
>>>>>>>> sounds
>>>>>>>> useless.
>>>>>>>> The main feature of interest in NumberParser that I consider a
>>>>>>>> requirement and should find its equivalence in a PetitNumberParser
>>>>>>>> is:
>>>>>>>> - round a decimal representation to nearest Float
>>>>>>>> It's simple, just convert a Fraction asFloat in a single final step
>>>>>>>> to
>>>>>>>> avoid cumulating round off errors - see
>>>>>>>> #makeFloatFromMantissa:exponent:base:
>>>>>>>>
>>>>>>>> The second feature of interest in NumberParser is the ability to
>>>>>>>> parser LargeInteger efficiently by avoiding (10 * largeValue +
>>>>>>>> digitValue) loops, and replacing them with a log(n) cost.
>>>>>>>> This would be a simple thing to implement in a functional language.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hopefully this won't offend your sensibilities too much :). It does,
>>>>>>> in fact, use 10* loops - I wrote an experimental "front half * rear
>>>>>>> half" recursion, which was slower in my benchmarks.
>>>>>>>
>>>>>>> This version has the grammar and parser doing no string->number
>>>>>>> conversion at all. PPSmalltalkNumberMaker supplies a number of
>>>>>>> utility
>>>>>>> methods designed to stop one from making malformed numbers. It also
>>>>>>> supplies a builder interface that the parser uses to construct
>>>>>>> numbers.
>>>>>>>
>>>>>>> frank
>>>>>>>
>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>>>> Lukas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3 September 2011 17:18, Frank Shearar <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3 September 2011 15:56, Lukas Renggli <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3 September 2011 16:51, Frank Shearar
>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Lukas,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I haven't :) mainly because I'm unsure where to put it - is
>>>>>>>>>>>>> there
>>>>>>>>>>>>> perhaps a PP Inbox, or shall I just post the merged version, or
>>>>>>>>>>>>> what's
>>>>>>>>>>>>> your preference? (How about an mcd between my merge and PP's
>>>>>>>>>>>>> head?)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Just put the .mcz at some public URL (dropbox, squeak source,
>>>>>>>>>>>> ...)
>>>>>>>>>>>> or
>>>>>>>>>>>> attach it to a mail.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ah, great - here it is. You'll see I've written the grammar as a
>>>>>>>>>>> separate class. That was really more to make what I'd done more
>>>>>>>>>>> obvious and to minimise the change to PPSmalltalkGrammar, but
>>>>>>>>>>> perhaps
>>>>>>>>>>> it's not a bad idea anyway: it's easy to see the number literal
>>>>>>>>>>> subgrammar.
>>>>>>>>>>>
>>>>>>>>>>> frank
>>>>>>>>>>>
>>>>>>>>>>>> Lukas
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Lukas Renggli
>>>>>>>>>>>> www.lukas-renggli.ch
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Lukas Renggli
>>>>>>>>>> www.lukas-renggli.ch
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>

Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Reply via email to