On 25 April 2012 21:11, Levente Uzonyi <[email protected]> wrote:
> On Fri, 20 Apr 2012, Frank Shearar wrote:
>
>> On 20 April 2012 03:51, Levente Uzonyi <[email protected]> wrote:
>>>
>>> On Thu, 19 Apr 2012, Frank Shearar wrote:
>>>
>>>> I found a serious bug in parsing numbers with negative exponents. It
>>>> was completely broken, in fact, parsing 1e-1 as 10, not 1 / 10. Anyway.
>>>> This version fixes that, and adds a bunch of tests demonstrating that
>>>> number parsing will return rationals if it can.
>>>>
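To make the negative-exponent case concrete, here is a minimal workspace sketch of turning a mantissa and a decimal exponent into an exact number. It is not the PPSmalltalkNumberParser code itself, and the variable names are made up for illustration.

```smalltalk
| mantissa exponent value |
mantissa := 1.
exponent := -1.
"Scale by a power of ten; divide for negative exponents so the result stays exact."
value := exponent >= 0
	ifTrue: [mantissa * (10 raisedTo: exponent)]
	ifFalse: [mantissa / (10 raisedTo: exponent negated)].
value   "=> (1/10), a Fraction, rather than 10"
```

Dividing two Integers answers a Fraction when the result is not whole, which is what lets the parser return rationals where it can.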
>>>> It's significantly slower than Squeak's SqNumberParser:
>>>>
>>>> Time millisecondsToRun: [100000 timesRepeat: [SqNumberParser parse:
>>>> '1234567890']] => 466
>>>>
>>>> Time millisecondsToRun: [100000 timesRepeat: [PPSmalltalkNumberParser
>>>> parse: '1234567890']] => 32082
>>>>
>>>> I've attached a MessageTally spying on the latter: I've not much skill
>>>> in reading these, but nothing leaps out at me as being obviously
>>>> awful.
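For anyone wanting to produce a comparable tally, something along these lines should do in a Squeak workspace. The repeat count here is arbitrary and is not necessarily the one used for the attached report.

```smalltalk
"Profile the PetitParser-based number parser and open the tally report."
MessageTally spyOn: [
	10000 timesRepeat: [PPSmalltalkNumberParser parse: '1234567890']]
```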
>>>
>>>
>>>
>>> Didn't check the code, just the tally, and I think that
>>> PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>digitsBase: is begging
>>> for optimization. It's probably also the cause of the large amount of
>>> garbage, which causes a significant amount of time to be spent on garbage
>>> collection. It's also interesting that the finalization process does so
>>> much work; there may be something wrong with your image.
>>
>>
>> Thanks for taking a look, Levente.
>>
>> I'd expect digitsBase: to dominate the running costs, given that we're
>> parsing numbers.
>
>
> I finally checked the code and there's plenty of room for optimization.
> Note that the code can't be loaded into Squeak, because there's an invalid
> symbol #__gen__binding, and some methods with nil category.
Indeed: you might recall I've mentioned some incompatibilities between
Pharo and Squeak where PetitParser is concerned: Symbol >> isBinary
doesn't exist, you MUST have Scanner prefAllowUnderscoreSelectors:
true and Scanner allowUnderscoreAsAssignment: false, you need to
fiddle with RB so you can load AST-Core and AST-Compiler, and so on.
It's _possible_ (I wrote/am writing the number grammar in a Squeak
image), and once I've addressed _this_ incompatibility, I'll address
the others!
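As a sketch of the Scanner setup mentioned above, assuming both class-side setters exist in your image under exactly these names:

```smalltalk
"Evaluate in a workspace before loading PetitParser into Squeak."
Scanner prefAllowUnderscoreSelectors: true.    "allow selectors containing underscores"
Scanner allowUnderscoreAsAssignment: false.    "stop treating _ as assignment"
```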
>> I do make a large number of throwaway "immutable" values with a
>> Builder-like pattern... in PPSmalltalkNumberParser >>
>> #makeNumberFrom:base:. That, I would imagine, could explain the
>> garbage?
>
>
> The current garbage collector is not optimal for large images and large
> amounts of garbage, so you should try to avoid creating it in
> performance-critical parts of your code.
I foolishly forgot to add the message tally. You'll notice the absence
of the weak finalization process: that might have been because the
image predated the recent weak finalization thrashing fix.
frank
>> If I may, what do you look for when reading the MessageTally? How do
>> you tell, for instance, that there's excessive garbage production?
>
>
> In your tally GC time was 20% of total time and another 20% for the
> finalization process. These numbers should be much lower, usually less than
> 1%.
>
>
>> That the incremental GCs take 7ms? (I'm reading Andreas' comments on
>> http://wiki.squeak.org/squeak/4210 again.)
>
>
> 7ms for an incremental GC is also a bit high; it should be around 1-2ms.
>
>
> Levente
>
>
>>
>> frank
>>
>>> Levente
>>>
>>>
>>>>
>>>> frank
>>>>
>>>> On 14 September 2011 20:26, Frank Shearar <[email protected]>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 3 September 2011 19:35, Nicolas Cellier
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> 2011/9/3 Frank Shearar <[email protected]>:
>>>>>>>
>>>>>>>
>>>>>>> On 3 September 2011 18:50, Lukas Renggli <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I think it is a good idea to have the number parser separate, after
>>>>>>>> all it might also make sense to use it separately.
>>>>>>>>
>>>>>>>> It seems that the new Smalltalk grammar is significantly slower. The
>>>>>>>> benchmark PPSmalltalkClassesTests class>>#benchmark:, which uses the
>>>>>>>> source code of the collection hierarchy and does not especially
>>>>>>>> target number literals, runs 30% slower.
>>>>>>>>
>>>>>>>> Also I see that "Number readFrom: ..." is still used within the
>>>>>>>> grammar. This seems to be a bit strange, no?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes: it's a double-parse, which is a bit lame. First, we parse the
>>>>>>> literal with PPSmalltalkNumberParser, which ensures that the thing
>>>>>>> given to Number class >> #readFrom: is a well-formed token (so, in
>>>>>>> particular, Squeak's Number doesn't get to see anything other than a
>>>>>>> well-formed token).
>>>>>>>
>>>>>>> It sounds like you're happy with the basic concept, so maybe I should
>>>>>>> remove the Number class >> #readFrom: stuff, see if I can't remove the
>>>>>>> performance issues, and resubmit the patch.
>>>>>>>
>>>>>>> frank
>>>>>>>
>>>>>>
>>>>>> Yes, a NumberParser is essentially parsing, and this duplication sounds
>>>>>> useless.
>>>>>> The main feature of interest in NumberParser, which I consider a
>>>>>> requirement and which should have an equivalent in a PetitNumberParser, is:
>>>>>> - round a decimal representation to the nearest Float
>>>>>> It's simple: just convert a Fraction asFloat in a single final step to
>>>>>> avoid accumulating round-off errors - see
>>>>>> #makeFloatFromMantissa:exponent:base:
>>>>>>
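A minimal sketch of the "single final step" idea described above: build the exact value first, then convert once. The variable names are hypothetical, and this is not the body of #makeFloatFromMantissa:exponent:base:. It relies on Fraction>>asFloat answering the nearest Float.

```smalltalk
| mantissa exponent exact |
mantissa := 12345.        "digits of 12.345 with the decimal point removed"
exponent := -3.           "decimal exponent implied by the point's position"
"Keep everything exact: Integer arithmetic answers an Integer or a Fraction."
exact := exponent >= 0
	ifTrue: [mantissa * (10 raisedTo: exponent)]
	ifFalse: [mantissa / (10 raisedTo: exponent negated)].
"Round exactly once, at the very end."
exact asFloat
```

The alternative, multiplying or dividing a Float by 10 once per digit, rounds at every step, which is where the accumulated error would come from.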
>>>>>> The second feature of interest in NumberParser is the ability to
>>>>>> parse LargeInteger efficiently by avoiding (10 * largeValue +
>>>>>> digitValue) loops, and replacing them with a log(n) cost.
>>>>>> This would be a simple thing to implement in a functional language.
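One way to read the "avoid 10 * largeValue + digitValue loops" suggestion is a divide-and-conquer conversion: split the digit string in half, convert each half, and combine them with a single large multiplication. The block below is an illustrative sketch only. The name parseDigits and the cutoff of 9 digits are assumptions, and no speed claim is made for it.

```smalltalk
| parseDigits |
parseDigits := [:digits :base | | mid high low |
	digits size <= 9
		ifTrue: ["Short run: the plain loop stays in SmallInteger range for base 10."
			digits inject: 0 into: [:acc :ch | acc * base + ch digitValue]]
		ifFalse: [
			mid := digits size // 2.
			high := parseDigits value: (digits first: mid) value: base.
			low := parseDigits value: (digits allButFirst: mid) value: base.
			"Shift the front half left by the number of digits in the rear half."
			high * (base raisedTo: digits size - mid) + low]].
parseDigits value: '123456789012345678901234567890' value: 10
```

The recursion depth grows with the logarithm of the digit count, and most of the arithmetic on large values happens in a few big multiplications rather than one per digit.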
>>>>>
>>>>>
>>>>>
>>>>> Hopefully this won't offend your sensibilities too much :). It does,
>>>>> in fact, use 10* loops - I wrote an experimental "front half * rear
>>>>> half" recursion, which was slower in my benchmarks.
>>>>>
>>>>> This version has the grammar and parser doing no string->number
>>>>> conversion at all. PPSmalltalkNumberMaker supplies a number of utility
>>>>> methods designed to stop one from making malformed numbers. It also
>>>>> supplies a builder interface that the parser uses to construct
>>>>> numbers.
>>>>>
>>>>> frank
>>>>>
>>>>>> Nicolas
>>>>>>
>>>>>>>> Lukas
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3 September 2011 17:18, Frank Shearar <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 September 2011 15:56, Lukas Renggli <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3 September 2011 16:51, Frank Shearar <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Lukas,
>>>>>>>>>>>
>>>>>>>>>>> I haven't :) mainly because I'm unsure where to put it - is there
>>>>>>>>>>> perhaps a PP Inbox, or shall I just post the merged version, or what's
>>>>>>>>>>> your preference? (How about an mcd between my merge and PP's head?)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Just put the .mcz at some public URL (dropbox, squeak source, ...) or
>>>>>>>>>> attach it to a mail.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ah, great - here it is. You'll see I've written the grammar as a
>>>>>>>>> separate class. That was really more to make what I'd done more
>>>>>>>>> obvious and to minimise the change to PPSmalltalkGrammar, but perhaps
>>>>>>>>> it's not a bad idea anyway: it's easy to see the number literal
>>>>>>>>> subgrammar.
>>>>>>>>>
>>>>>>>>> frank
>>>>>>>>>
>>>>>>>>>> Lukas
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Lukas Renggli
>>>>>>>>>> www.lukas-renggli.ch
- 16181 tallies, 16206 msec.
**Tree**
--------------------------------
Process: (40s) 80032: nil
--------------------------------
66.5% {10777ms} PPSmalltalkNumberParser class(PPCompositeParser class)>>parse:startingAt:
66.2% {10728ms} PPSmalltalkNumberParser class(PPCompositeParser class)>>newStartingAt:
66.2% {10728ms} PPSmalltalkNumberParser(PPCompositeParser)>>initializeStartingAt:
27.0% {4376ms} PPSmalltalkNumberParser(PPCompositeParser)>>productionNames
|8.2% {1329ms} Dictionary>>at:put:
| |3.9% {632ms} Association class>>key:value:
| | |3.3% {535ms} Association>>key:value:
| |2.3% {373ms} primitives
| |1.9% {308ms} Dictionary(HashedCollection)>>atNewIndex:put:
| | 1.8% {292ms} Dictionary(HashedCollection)>>grow
|7.2% {1167ms} ByteString(String)>>asSymbol
| |6.9% {1118ms} Symbol class>>intern:
| | 6.6% {1070ms} Symbol class>>lookup:
| | 6.5% {1053ms} WeakSet>>like:
| | 6.4% {1037ms} WeakSet>>scanFor:
| | 5.6% {908ms} ByteSymbol(Symbol)>>=
| | 5.0% {810ms} primitives
|4.4% {713ms} Array(SequenceableCollection)>>includes:
| |2.7% {438ms} primitives
| |1.7% {276ms} Array(SequenceableCollection)>>indexOf:
| | 1.6% {259ms} Array(SequenceableCollection)>>indexOf:ifAbsent:
|1.9% {308ms} PPSmalltalkNumberParser class(Behavior)>>allInstVarNames
| |1.4% {227ms} PPSmalltalkNumberGrammar class(Behavior)>>allInstVarNames
| | 1.1% {178ms} PPCompositeParser class(Behavior)>>allInstVarNames
|1.8% {292ms} PPSmalltalkNumberParser class(PPCompositeParser class)>>ignoredNames
| |1.5% {243ms} PPCompositeParser class(Behavior)>>allInstVarNames
| | 1.1% {178ms} PPDelegateParser class(Behavior)>>allInstVarNames
|1.3% {211ms} Array(SequenceableCollection)>>collect:
10.1% {1637ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>exponent
|8.5% {1378ms} PPPredicateObjectParser class>>anyOf:
| 5.0% {810ms} ByteString(Object)>>printString
| |4.5% {729ms} ByteString(Object)>>printStringLimitedTo:
| | 3.0% {486ms} String class(SequenceableCollection class)>>streamContents:limitedTo:
| | |2.0% {324ms} LimitedWriteStream class(PositionableStream class)>>on:
| | | 1.8% {292ms} primitives
| | 1.3% {211ms} ByteString(String)>>printOn:
| | 1.3% {211ms} ByteString(String)>>storeOn:
| 2.9% {470ms} ByteString(String)>>,
| 2.2% {357ms} primitives
6.6% {1070ms} PPDelegateParser class(PPParser class)>>named:
|6.4% {1037ms} PPDelegateParser(PPParser)>>name:
| 5.0% {810ms} PPDelegateParser(PPParser)>>propertyAt:put:
| |4.1% {664ms} Dictionary>>at:put:
| | 2.4% {389ms} Association class>>key:value:
| | |1.9% {308ms} primitives
| | 1.1% {178ms} primitives
| 1.4% {227ms} primitives
4.7% {762ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>sign
|4.3% {697ms} Character>>asParser
| 4.2% {681ms} PPLiteralObjectParser class(PPLiteralParser class)>>on:
| 2.7% {438ms} Character(Object)>>printString
| 2.7% {438ms} Character(Object)>>printStringLimitedTo:
| 2.3% {373ms} String class(SequenceableCollection class)>>streamContents:limitedTo:
| 1.7% {276ms} LimitedWriteStream class(PositionableStream class)>>on:
| 1.5% {243ms} primitives
4.6% {745ms} PPSmalltalkNumberParser>>decimalNumber
|3.8% {616ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>decimalNumber
| 2.1% {340ms} Character>>asParser
| 2.1% {340ms} PPLiteralObjectParser class(PPLiteralParser class)>>on:
| 1.1% {178ms} Character(Object)>>printString
| 1.1% {178ms} Character(Object)>>printStringLimitedTo:
4.3% {697ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>signedDigits
|4.1% {664ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>signedDigitsBase:
| 2.8% {454ms} Character>>asParser
| 2.8% {454ms} PPLiteralObjectParser class(PPLiteralParser class)>>on:
| 1.7% {276ms} Character(Object)>>printString
| 1.6% {259ms} Character(Object)>>printStringLimitedTo:
| 1.2% {194ms} String class(SequenceableCollection class)>>streamContents:limitedTo:
3.1% {502ms} Dictionary>>keysAndValuesDo:
|2.3% {373ms} Dictionary>>associationsDo:
2.2% {357ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>scale
|2.0% {324ms} Character>>asParser
| 2.0% {324ms} PPLiteralObjectParser class(PPLiteralParser class)>>on:
| 1.1% {178ms} Character(Object)>>printString
| 1.1% {178ms} Character(Object)>>printStringLimitedTo:
1.2% {194ms} PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>number
1.1% {178ms} BlockClosure>>asParser
1.0% {162ms} PPPluggableParser class>>on:
7.9% {1280ms} PPDelegateParser>>parseOn:
|7.9% {1280ms} PPActionParser>>parseOn:
| 7.8% {1264ms} PPSequenceParser>>parseOn:
| 6.4% {1037ms} PPDelegateParser>>parseOn:
| |6.2% {1005ms} PPSequenceParser>>parseOn:
| | 5.4% {875ms} PPFlattenParser>>parseOn:
| | 5.0% {810ms} PPPossessiveRepeatingParser>>parseOn:
| | 3.5% {567ms} PPPredicateObjectParser>>parseOn:
| | 2.6% {421ms} PPFailure class>>message:at:
| | 2.6% {421ms} PPFailure>>initializeMessage:at:
| 1.2% {194ms} PPOptionalParser>>parseOn:
6.9% {1118ms} PPAndParser>>parseOn:
|6.9% {1118ms} PPSequenceParser>>parseOn:
| 5.3% {859ms} PPFlattenParser>>parseOn:
| |5.0% {810ms} PPDelegateParser>>parseOn:
| | 4.9% {794ms} PPFlattenParser>>parseOn:
| | 4.4% {713ms} PPPossessiveRepeatingParser>>parseOn:
| | 2.7% {438ms} PPPredicateObjectParser>>parseOn:
| | 1.9% {308ms} PPFailure class>>message:at:
| | 1.8% {292ms} PPFailure>>initializeMessage:at:
| 1.0% {162ms} PPOptionalParser>>parseOn:
4.1% {664ms} Character>>asParser
4.1% {664ms} PPLiteralObjectParser class(PPLiteralParser class)>>on:
2.5% {405ms} Character(Object)>>printString
|2.4% {389ms} Character(Object)>>printStringLimitedTo:
| 2.0% {324ms} String class(SequenceableCollection class)>>streamContents:limitedTo:
| 1.2% {194ms} LimitedWriteStream class(PositionableStream class)>>on:
1.0% {162ms} ByteString(String)>>,
7.1% {1151ms} Array(SequenceableCollection)>>includes:
7.0% {1134ms} Array(SequenceableCollection)>>indexOf:
6.8% {1102ms} Array(SequenceableCollection)>>indexOf:ifAbsent:
5.1% {827ms} Array(SequenceableCollection)>>indexOf:startingAt:ifAbsent:
|3.2% {519ms} Character>>=
|1.9% {308ms} primitives
1.7% {276ms} primitives
4.0% {648ms} PPSmalltalkNumberParser>>makeNumberFrom:base:
3.3% {535ms} PPSmalltalkNumberMaker>>makeNumber
3.1% {502ms} PPSmalltalkNumberMaker>>makeInteger
3.0% {486ms} PPSmalltalkNumberMaker>>toInteger:
3.0% {486ms} PPSmalltalkNumberMaker>>toInteger:base:
1.7% {276ms} Character>>digitValue
**Leaves**
5.8% {940ms} LimitedWriteStream class(PositionableStream class)>>on:
5.6% {908ms} ByteSymbol(Symbol)>>=
5.2% {843ms} ByteString(String)>>,
4.9% {794ms} PPFailure>>initializeMessage:at:
3.8% {616ms} Association>>key:value:
3.7% {600ms} Character>>=
3.4% {551ms} Dictionary>>at:put:
3.2% {519ms} PPSequenceParser class(PPListParser class)>>withAll:
2.8% {454ms} Array(SequenceableCollection)>>includes:
2.5% {405ms} Association class>>key:value:
2.4% {389ms} PPSmalltalkNumberGrammar class(ClassDescription)>>instVarNames
2.3% {373ms} Array(SequenceableCollection)>>indexOf:ifAbsent:
2.3% {373ms} Array(SequenceableCollection)>>indexOf:startingAt:ifAbsent:
2.3% {373ms} Dictionary>>associationsDo:
2.1% {340ms} PPOptionalParser(PPDelegateParser)>>setParser:
1.8% {292ms} ByteString class(String class)>>new:
1.7% {276ms} Character>>digitValue
1.6% {259ms} PPLiteralObjectParser class(PPLiteralParser class)>>on:message:
1.5% {243ms} LimitedWriteStream>>nextPut:
1.4% {227ms} PPDelegateParser(PPParser)>>name:
1.3% {211ms} PPPossessiveRepeatingParser>>parseOn:
1.3% {211ms} Array(SequenceableCollection)>>collect:
1.3% {211ms} LimitedWriteStream>>setLimit:limitBlock:
1.2% {194ms} PPSequenceParser>>parseOn:
1.1% {178ms} PPFlattenParser>>create:start:stop:
1.0% {162ms} PPPredicateObjectParser class>>anyOf:
**Memory**
old +0 bytes
young -312,908 bytes
used -312,908 bytes
free +312,908 bytes
**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 807 totalling 3,660ms (23.0% uptime), avg 5.0ms
tenures 0
root table 0 overflows
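
Reading the tree above, roughly two thirds of the samples (66.2%) sit under PPCompositeParser>>initializeStartingAt:, which suggests that the class-side parse: builds a fresh parser on every call. A variation of the benchmark that builds the parser once, outside the timed loop, might separate construction cost from parsing cost. This is an untested sketch: it assumes PPSmalltalkNumberParser new answers a reusable instance and that the instance responds to parse:, as PetitParser parsers normally do.

```smalltalk
| parser |
"Build the grammar once, outside the timed loop."
parser := PPSmalltalkNumberParser new.
Time millisecondsToRun: [
	100000 timesRepeat: [parser parse: '1234567890']]
```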