On 26 April 2012 16:01, Levente Uzonyi <[email protected]> wrote: > On Thu, 26 Apr 2012, Frank Shearar wrote: > >> On 26 April 2012 03:35, Levente Uzonyi <[email protected]> wrote: >>> >>> On Wed, 25 Apr 2012, Levente Uzonyi wrote: >>> >>>> On Wed, 25 Apr 2012, Frank Shearar wrote: >>>> >>>>> On 20 April 2012 15:15, Frank Shearar <[email protected]> wrote: >>>>>> >>>>>> >>>>>> On 20 April 2012 03:51, Levente Uzonyi <[email protected]> wrote: >>>>>>> >>>>>>> >>>>>>> On Thu, 19 Apr 2012, Frank Shearar wrote: >>>>>>> >>>>>>>> I found a serious bug in parsing numbers with negative exponents. It >>>>>>>> was completey broken, in fact, parsing 1e-1 as 10, not 1 / 10. >>>>>>>> Anyway. >>>>>>>> This version fixes that, and adds a bunch of tests demonstrating >>>>>>>> that >>>>>>>> number parsing will return rationals if it can. >>>>>>>> >>>>>>>> It's significantly slower than Squeak's SqNumberParser: >>>>>>>> >>>>>>>> Time millisecondsToRun: [100000 timesRepeat: [SqNumberParser parse: >>>>>>>> '1234567890']] => 466 >>>>>>>> >>>>>>>> Time millisecondsToRun: [100000 timesRepeat: >>>>>>>> [PPSmalltalkNumberParser >>>>>>>> parse: '1234567890']] => 32082 >>>>>>>> >>>>>>>> I've attached a MessageTally spying on the latter: I've not much >>>>>>>> skill >>>>>>>> in reading these, but nothing leaps out at me as being obviously >>>>>>>> awful. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Didn't check the code, just the tally, and I think that >>>>>>> PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>digitsBase: is >>>>>>> begging >>>>>>> for optimization. It's probably also the cause of the high amount of >>>>>>> garbage >>>>>>> which causes significant amount of time spent with garbage >>>>>>> collection. >>>>>>> It's also interesting is that the finalization process does so much >>>>>>> work, >>>>>>> there may be something wrong with your image. >>>>>> >>>>>> >>>>>> >>>>>> Thanks for taking a look, Levente. >>>>>> >>>>>> I'd expect digitsBase: to dominate the running costs, given that we're >>>>>> parsing numbers. >>>>>> >>>>>> I do make a large number of throwaway "immutable" values with a >>>>>> Builder-like pattern... in PPSmalltalkNumberParser >> >>>>>> #makeNumberFrom:base:. That, I would imagine, could explain the >>>>>> garbage? >>>>>> >>>>>> If I may, what do you look for when reading the MessageTally? How do >>>>>> you tell, for instance, that there's excessive garbage production? >>>>>> That the incremental GCs take 7ms? (I'm reading Andreas' comments on >>>>>> http://wiki.squeak.org/squeak/4210 again.) >>>>> >>>>> >>>>> >>>>> Levente, you're quite right: #digitsBase: has now been optimised even >>>>> more, reducing the time taken to run my benchmark >>>>> >>>>> MessageTally spyOn: [Time millisecondsToRun: [100000 timesRepeat: >>>>> [PPSmalltalkNumberParser parse: '1234567890']]] >>>>> >>>>> from ~32 seconds to ~16 seconds. (Memoising was the answer: >>>>> #digitsBase: is effectively a higher-order production and, like OMeta, >>>>> PPCompositeParser doesn't memoise those. A simple class var dictionary >>>>> solves that problem. >>>> >>>> >>>> >>>> Creating a new parser (PPSmalltalkNumberParser) takes quite a while, if >>>> you extract the parser creation, then you'll get closer to the >>>> performance >>>> limit of PetitParser: >>>> >>>> | p | >>>> p := PPSmalltalkNumberParser new. >>>> [ Time millisecondsToRun: [100000 timesRepeat: [p parse: '1234567890']] >>>> ] >>>> timeProfile. "==> 2594" >>>> >>>> There are still places where you create parsers on the fly, like >>>> PPSmalltalkNumberGrammar #>> number. These should be avoided. >>> >>> >>> >>> I hacked the code a bit and got the following results: >>> >>> (1 to: 5) collect: [ :run | >>> >>> | p | >>> p := PPSmalltalkNumberParser new. >>> Smalltalk garbageCollect. >>> [ 1 to: 100000 do: [ :i | p parse: '1234567890' ] ] timeToRun ]. >>> >>> #(1198 1209 1225 1221 1204) >>> >>> It's ~6 times slower than SqNumberParser and I don't really think it can >>> get >>> much closer to it. The mcz is available here: >>> http://leves.web.elte.hu/squeak/PetitSmalltalk-ul.62.mcz >> >> >> Ah excellent! Thanks! I see you eagerly initialise the number parsers >> where I'd gone the lazy/memoise route. Any performance implications I >> should be aware of, other than of course that eager initialisation > > > There's only one perdicate parser (instead of two + a choice parser) and a > String is passed as a collection to the PPPredicateParser, so #includes: can > use a fast primitive. > > >> means keeping 36 parsers around when maybe three or four of those will >> see regular use? (decimal, hex, binary, octal) For instance, I guess >> the running cost of Dictionary >> #at:ifAbsentPut: is greater than >> Array >> #at: because the latter just uses a primitive. > > > Lazy initialization of the parsers is also possible, but running the tests > will initialize 35 of them. I wonder if holding the parsers in a class > variable has any side effects. Anyway, I uploaded a new version which has > lazy initialization: > http://leves.web.elte.hu/squeak/PetitSmalltalk-ul.65.mcz
Er, yes. That should have been obvious. I feel a bit silly now! In which case there's nothing to decide either way. I'm unhappy using a class-side cache: it means consuming memory forever when you only need (some of) the 35 parsers during parsing. But PPCompositeParser works by initialising ALL instvars to PPDelegateParsers, so it's a bit complicated putting non-parser things in the instvars. It worked when I memoised the original digit parsers (so when I moved DigitParsers -> digitParsers) but broke when I applied the same technique elsewhere. Presumably I was lucky the first time round, and it only worked because of where in the initialisation I initialised the dictionary. If I could browbeat the grammar to use an instvar to hold the digit parsers it would make a lot of sense to switch to lazily instantiating the number parsers: you're likely only to instantiate at most four of them at a time, unless you're being very tricky with your arithmetic. frank > Levente > > >> >> Lastly, I added PetitSmalltalk-fbs.64 to >> http://ss3.gemstone.com/ss/Scratchpad-fbs/, based on your .62. It just >> normalises how signs work. Why I didn't write the exponent sign as a >> string from the beginning (like I do the number's sign), I don't know. >> At any rate, that normalises NumberMaker's API (it always takes >> Strings) and keeps the cheaper x = '-' rather than #($- '-') includes: >> x, and makes the tests all pass :) >> >> And now I'm going to add a feature request for SS3, asking for a >> RESTful interface to the repositories! >> >> frank >> >>> Levente >>> >>> >>>> >>>> >>>> Levente >>>> >>>> >>>>> >>>>> frank >>>>> >>>>>> frank >>>>>> >>>>>>> Levente >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> frank >>>>>>>> >>>>>>>> On 14 September 2011 20:26, Frank Shearar <[email protected]> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3 September 2011 19:35, Nicolas Cellier >>>>>>>>> <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2011/9/3 Frank Shearar <[email protected]>: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 3 September 2011 18:50, Lukas Renggli <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think it is a good idea to have the number parser separate, >>>>>>>>>>>> after >>>>>>>>>>>> all it might also make sense to use it separately. >>>>>>>>>>>> >>>>>>>>>>>> It seems that the new Smalltalk grammar is significantly slower. >>>>>>>>>>>> The >>>>>>>>>>>> benchmark PPSmalltalkClassesTests class>>#benchmark: that uses >>>>>>>>>>>> the >>>>>>>>>>>> source code of the collection hierarchy and does not especially >>>>>>>>>>>> target >>>>>>>>>>>> number literals runs 30% slower. >>>>>>>>>>>> >>>>>>>>>>>> Also I see that "Number readFrom: ..." is still used within the >>>>>>>>>>>> grammar. This seems to be a bit strange, no? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yes: it's a double-parse, which is a bit lame. First, we parse >>>>>>>>>>> the >>>>>>>>>>> literal with PPSmalltalkNumberParser, which ensures that the >>>>>>>>>>> thing >>>>>>>>>>> given to Number class >> #readFrom: is a well-formed token (so, >>>>>>>>>>> in >>>>>>>>>>> particular, Squeak's Number doesn't get to see anything other >>>>>>>>>>> than >>>>>>>>>>> a >>>>>>>>>>> well-formed token). >>>>>>>>>>> >>>>>>>>>>> It sounds like you're happy with the basic concept, so maybe I >>>>>>>>>>> should >>>>>>>>>>> remove the Number class >> #readFrom: stuff, see if I can't >>>>>>>>>>> remove >>>>>>>>>>> the >>>>>>>>>>> performance issues, and resubmit the patch. >>>>>>>>>>> >>>>>>>>>>> frank >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes, a NumberParser is essentially parsing, and this duplication >>>>>>>>>> sounds >>>>>>>>>> useless. >>>>>>>>>> The main feature of interest in NumberParser that I consider a >>>>>>>>>> requirement and should find its equivalence in a PetitNumberParser >>>>>>>>>> is: >>>>>>>>>> - round a decimal representation to nearest Float >>>>>>>>>> It's simple, just convert a Fraction asFloat in a single final >>>>>>>>>> step >>>>>>>>>> to >>>>>>>>>> avoid cumulating round off errors - see >>>>>>>>>> #makeFloatFromMantissa:exponent:base: >>>>>>>>>> >>>>>>>>>> The second feature of interest in NumberParser is the ability to >>>>>>>>>> parser LargeInteger efficiently by avoiding (10 * largeValue + >>>>>>>>>> digitValue) loops, and replacing them with a log(n) cost. >>>>>>>>>> This would be a simple thing to implement in a functional >>>>>>>>>> language. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hopefully this won't offend your sensibilities too much :). It >>>>>>>>> does, >>>>>>>>> in fact, use 10* loops - I wrote an experimental "front half * rear >>>>>>>>> half" recursion, which was slower in my benchmarks. >>>>>>>>> >>>>>>>>> This version has the grammar and parser doing no string->number >>>>>>>>> conversion at all. PPSmalltalkNumberMaker supplies a number of >>>>>>>>> utility >>>>>>>>> methods designed to stop one from making malformed numbers. It also >>>>>>>>> supplies a builder interface that the parser uses to construct >>>>>>>>> numbers. >>>>>>>>> >>>>>>>>> frank >>>>>>>>> >>>>>>>>>> Nicolas >>>>>>>>>> >>>>>>>>>>>> Lukas >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 3 September 2011 17:18, Frank Shearar >>>>>>>>>>>> <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 3 September 2011 15:56, Lukas Renggli <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 3 September 2011 16:51, Frank Shearar >>>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Lukas, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I haven't :) mainly because I'm unsure where to put it - is >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> perhaps a PP Inbox, or shall I just post the merged version, >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> what's >>>>>>>>>>>>>>> your preference? (How about an mcd between my merge and PP's >>>>>>>>>>>>>>> head?) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Just put the .mcz at some public URL (dropbox, squeak source, >>>>>>>>>>>>>> ...) >>>>>>>>>>>>>> or >>>>>>>>>>>>>> attach it to a mail. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Ah, great - here it is. You'll see I've written the grammar as >>>>>>>>>>>>> a >>>>>>>>>>>>> separate class. That was really more to make what I'd done more >>>>>>>>>>>>> obvious and to minimise the change to PPSmalltalkGrammar, but >>>>>>>>>>>>> perhaps >>>>>>>>>>>>> it's not a bad idea anyway: it's easy to see the number literal >>>>>>>>>>>>> subgrammar. >>>>>>>>>>>>> >>>>>>>>>>>>> frank >>>>>>>>>>>>> >>>>>>>>>>>>>> Lukas >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Lukas Renggli >>>>>>>>>>>>>> www.lukas-renggli.ch >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Lukas Renggli >>>>>>>>>>>> www.lukas-renggli.ch >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>>> >>> >> >
