Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Eliot Miranda Wed, 25 Apr 2012 14:11:25 -0700

On Wed, Apr 25, 2012 at 1:11 PM, Levente Uzonyi <[email protected]> wrote:


> On Fri, 20 Apr 2012, Frank Shearar wrote:
>
>  On 20 April 2012 03:51, Levente Uzonyi <[email protected]> wrote:
>>
>>> On Thu, 19 Apr 2012, Frank Shearar wrote:
>>>
>>>  I found a serious bug in parsing numbers with negative exponents. It
>>>> was completey broken, in fact, parsing 1e-1 as 10, not 1 / 10. Anyway.
>>>> This version fixes that, and adds a bunch of tests demonstrating that
>>>> number parsing will return rationals if it can.
>>>>
>>>> It's significantly slower than Squeak's SqNumberParser:
>>>>
>>>> Time millisecondsToRun: [100000 timesRepeat: [SqNumberParser parse:
>>>> '1234567890']] => 466
>>>>
>>>>
>>>> Time millisecondsToRun: [100000 timesRepeat: [PPSmalltalkNumberParser
>>>> parse: '1234567890']] => 32082
>>>>
>>>> I've attached a MessageTally spying on the latter: I've not much skill
>>>> in reading these, but nothing leaps out at me as being obviously
>>>> awful.
>>>>
>>>
>>>
>>> Didn't check the code, just the tally, and I think that
>>> PPSmalltalkNumberParser(**PPSmalltalkNumberGrammar)>>**digitsBase: is
>>> begging
>>> for optimization. It's probably also the cause of the high amount of
>>> garbage
>>> which causes significant amount of time spent with garbage collection.
>>> It's also interesting is that the finalization process does so much work,
>>> there may be something wrong with your image.
>>>
>>
>> Thanks for taking a look, Levente.
>>
>> I'd expect digitsBase: to dominate the running costs, given that we're
>> parsing numbers.
>>
>
> I finally checked the code and there's plenty of space for optimization.
> Note that the code can't be loaded into Squeak, because there's an invalid
> symbol #__gen__binding, and some methods with nil category.
>
>
>
>> I do make a large number of throwaway "immutable" values with a
>> Builder-like pattern... in PPSmalltalkNumberParser >>
>> #makeNumberFrom:base:. That, I would imagine, could explain the
>> garbage?
>>
>
> The current garbage collector is not optimal for large images and large
> amount of garbage, so you should try avoid creating it in performance
> critial parts of your code.
>
>
>
>> If I may, what do you look for when reading the MessageTally? How do
>> you tell, for instance, that there's excessive garbage production?
>>
>
> In your tally GC time was 20% of total time and another 20% for the
> finalization process. These numbers should be much lower, usually less than
> 1%.
>
>
>  That the incremental GCs take 7ms? (I'm reading Andreas' comments on
>> http://wiki.squeak.org/squeak/**4210 
>> <http://wiki.squeak.org/squeak/4210>again.)
>>
>
> 7ms for an incremental GC is also a bit high, it should be around 1-2ms.


Careful.  Absolute GC times (as any run-time) depend on the machine.  On a
slow machine GC runs slow...


>
>
>
> Levente
>
>
>
>> frank
>>
>>  Levente
>>>
>>>
>>>
>>>> frank
>>>>
>>>> On 14 September 2011 20:26, Frank Shearar <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> On 3 September 2011 19:35, Nicolas Cellier
>>>>> <nicolas.cellier.aka.nice@**gmail.com<[email protected]>>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> 2011/9/3 Frank Shearar <[email protected]>:
>>>>>>
>>>>>>>
>>>>>>> On 3 September 2011 18:50, Lukas Renggli <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I think it is a good idea to have the number parser separate, after
>>>>>>>> all it might also make sense to use it separately.
>>>>>>>>
>>>>>>>> It seems that the new Smalltalk grammar is significantly slower. The
>>>>>>>> benchmark PPSmalltalkClassesTests class>>#benchmark: that uses the
>>>>>>>> source code of the collection hierarchy and does not especially
>>>>>>>> target
>>>>>>>> number literals runs 30% slower.
>>>>>>>>
>>>>>>>> Also I see that "Number readFrom: ..." is still used within the
>>>>>>>> grammar. This seems to be a bit strange, no?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes: it's a double-parse, which is a bit lame. First, we parse the
>>>>>>> literal with PPSmalltalkNumberParser, which ensures that the thing
>>>>>>> given to Number class >> #readFrom: is a well-formed token (so, in
>>>>>>> particular, Squeak's Number doesn't get to see anything other than a
>>>>>>> well-formed token).
>>>>>>>
>>>>>>> It sounds like you're happy with the basic concept, so maybe I should
>>>>>>> remove the Number class >> #readFrom: stuff, see if I can't remove
>>>>>>> the
>>>>>>> performance issues, and resubmit the patch.
>>>>>>>
>>>>>>> frank
>>>>>>>
>>>>>>>
>>>>>> Yes, a NumberParser is essentially parsing, and this duplication
>>>>>> sounds
>>>>>> useless.
>>>>>> The main feature of interest in NumberParser that I consider a
>>>>>> requirement and should find its equivalence in a PetitNumberParser is:
>>>>>> - round a decimal representation to nearest Float
>>>>>> It's simple, just convert a Fraction asFloat in a single final step to
>>>>>> avoid cumulating round off errors - see
>>>>>> #makeFloatFromMantissa:**exponent:base:
>>>>>>
>>>>>> The second feature of interest in NumberParser is the ability to
>>>>>> parser LargeInteger efficiently by avoiding (10 * largeValue +
>>>>>> digitValue) loops, and replacing them with a log(n) cost.
>>>>>> This would be a simple thing to implement in a functional language.
>>>>>>
>>>>>
>>>>>
>>>>> Hopefully this won't offend your sensibilities too much :). It does,
>>>>> in fact, use 10* loops - I wrote an experimental "front half * rear
>>>>> half" recursion, which was slower in my benchmarks.
>>>>>
>>>>> This version has the grammar and parser doing no string->number
>>>>> conversion at all. PPSmalltalkNumberMaker supplies a number of utility
>>>>> methods designed to stop one from making malformed numbers. It also
>>>>> supplies a builder interface that the parser uses to construct
>>>>> numbers.
>>>>>
>>>>> frank
>>>>>
>>>>>  Nicolas
>>>>>>
>>>>>>  Lukas
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3 September 2011 17:18, Frank Shearar <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 September 2011 15:56, Lukas Renggli <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3 September 2011 16:51, Frank Shearar <[email protected]
>>>>>>>>>> >
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Lukas,
>>>>>>>>>>>
>>>>>>>>>>> I haven't :) mainly because I'm unsure where to put it - is there
>>>>>>>>>>> perhaps a PP Inbox, or shall I just post the merged version, or
>>>>>>>>>>> what's
>>>>>>>>>>> your preference? (How about an mcd between my merge and PP's
>>>>>>>>>>> head?)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Just put the .mcz at some public URL (dropbox, squeak source, ...)
>>>>>>>>>> or
>>>>>>>>>> attach it to a mail.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ah, great - here it is. You'll see I've written the grammar as a
>>>>>>>>> separate class. That was really more to make what I'd done more
>>>>>>>>> obvious and to minimise the change to PPSmalltalkGrammar, but
>>>>>>>>> perhaps
>>>>>>>>> it's not a bad idea anyway: it's easy to see the number literal
>>>>>>>>> subgrammar.
>>>>>>>>>
>>>>>>>>> frank
>>>>>>>>>
>>>>>>>>>  Lukas
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Lukas Renggli
>>>>>>>>>> www.lukas-renggli.ch
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lukas Renggli
>>>>>>>> www.lukas-renggli.ch
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>


-- 
best,
Eliot

Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Reply via email to