On Wed, Apr 25, 2012 at 1:11 PM, Levente Uzonyi <[email protected]> wrote:
> On Fri, 20 Apr 2012, Frank Shearar wrote: > > On 20 April 2012 03:51, Levente Uzonyi <[email protected]> wrote: >> >>> On Thu, 19 Apr 2012, Frank Shearar wrote: >>> >>> I found a serious bug in parsing numbers with negative exponents. It >>>> was completey broken, in fact, parsing 1e-1 as 10, not 1 / 10. Anyway. >>>> This version fixes that, and adds a bunch of tests demonstrating that >>>> number parsing will return rationals if it can. >>>> >>>> It's significantly slower than Squeak's SqNumberParser: >>>> >>>> Time millisecondsToRun: [100000 timesRepeat: [SqNumberParser parse: >>>> '1234567890']] => 466 >>>> >>>> >>>> Time millisecondsToRun: [100000 timesRepeat: [PPSmalltalkNumberParser >>>> parse: '1234567890']] => 32082 >>>> >>>> I've attached a MessageTally spying on the latter: I've not much skill >>>> in reading these, but nothing leaps out at me as being obviously >>>> awful. >>>> >>> >>> >>> Didn't check the code, just the tally, and I think that >>> PPSmalltalkNumberParser(**PPSmalltalkNumberGrammar)>>**digitsBase: is >>> begging >>> for optimization. It's probably also the cause of the high amount of >>> garbage >>> which causes significant amount of time spent with garbage collection. >>> It's also interesting is that the finalization process does so much work, >>> there may be something wrong with your image. >>> >> >> Thanks for taking a look, Levente. >> >> I'd expect digitsBase: to dominate the running costs, given that we're >> parsing numbers. >> > > I finally checked the code and there's plenty of space for optimization. > Note that the code can't be loaded into Squeak, because there's an invalid > symbol #__gen__binding, and some methods with nil category. > > > >> I do make a large number of throwaway "immutable" values with a >> Builder-like pattern... in PPSmalltalkNumberParser >> >> #makeNumberFrom:base:. That, I would imagine, could explain the >> garbage? >> > > The current garbage collector is not optimal for large images and large > amount of garbage, so you should try avoid creating it in performance > critial parts of your code. > > > >> If I may, what do you look for when reading the MessageTally? How do >> you tell, for instance, that there's excessive garbage production? >> > > In your tally GC time was 20% of total time and another 20% for the > finalization process. These numbers should be much lower, usually less than > 1%. > > > That the incremental GCs take 7ms? (I'm reading Andreas' comments on >> http://wiki.squeak.org/squeak/**4210 >> <http://wiki.squeak.org/squeak/4210>again.) >> > > 7ms for an incremental GC is also a bit high, it should be around 1-2ms. Careful. Absolute GC times (as any run-time) depend on the machine. On a slow machine GC runs slow... > > > > Levente > > > >> frank >> >> Levente >>> >>> >>> >>>> frank >>>> >>>> On 14 September 2011 20:26, Frank Shearar <[email protected]> >>>> wrote: >>>> >>>>> >>>>> On 3 September 2011 19:35, Nicolas Cellier >>>>> <nicolas.cellier.aka.nice@**gmail.com<[email protected]>> >>>>> wrote: >>>>> >>>>>> >>>>>> 2011/9/3 Frank Shearar <[email protected]>: >>>>>> >>>>>>> >>>>>>> On 3 September 2011 18:50, Lukas Renggli <[email protected]> wrote: >>>>>>> >>>>>>>> >>>>>>>> I think it is a good idea to have the number parser separate, after >>>>>>>> all it might also make sense to use it separately. >>>>>>>> >>>>>>>> It seems that the new Smalltalk grammar is significantly slower. The >>>>>>>> benchmark PPSmalltalkClassesTests class>>#benchmark: that uses the >>>>>>>> source code of the collection hierarchy and does not especially >>>>>>>> target >>>>>>>> number literals runs 30% slower. >>>>>>>> >>>>>>>> Also I see that "Number readFrom: ..." is still used within the >>>>>>>> grammar. This seems to be a bit strange, no? >>>>>>>> >>>>>>> >>>>>>> >>>>>>> Yes: it's a double-parse, which is a bit lame. First, we parse the >>>>>>> literal with PPSmalltalkNumberParser, which ensures that the thing >>>>>>> given to Number class >> #readFrom: is a well-formed token (so, in >>>>>>> particular, Squeak's Number doesn't get to see anything other than a >>>>>>> well-formed token). >>>>>>> >>>>>>> It sounds like you're happy with the basic concept, so maybe I should >>>>>>> remove the Number class >> #readFrom: stuff, see if I can't remove >>>>>>> the >>>>>>> performance issues, and resubmit the patch. >>>>>>> >>>>>>> frank >>>>>>> >>>>>>> >>>>>> Yes, a NumberParser is essentially parsing, and this duplication >>>>>> sounds >>>>>> useless. >>>>>> The main feature of interest in NumberParser that I consider a >>>>>> requirement and should find its equivalence in a PetitNumberParser is: >>>>>> - round a decimal representation to nearest Float >>>>>> It's simple, just convert a Fraction asFloat in a single final step to >>>>>> avoid cumulating round off errors - see >>>>>> #makeFloatFromMantissa:**exponent:base: >>>>>> >>>>>> The second feature of interest in NumberParser is the ability to >>>>>> parser LargeInteger efficiently by avoiding (10 * largeValue + >>>>>> digitValue) loops, and replacing them with a log(n) cost. >>>>>> This would be a simple thing to implement in a functional language. >>>>>> >>>>> >>>>> >>>>> Hopefully this won't offend your sensibilities too much :). It does, >>>>> in fact, use 10* loops - I wrote an experimental "front half * rear >>>>> half" recursion, which was slower in my benchmarks. >>>>> >>>>> This version has the grammar and parser doing no string->number >>>>> conversion at all. PPSmalltalkNumberMaker supplies a number of utility >>>>> methods designed to stop one from making malformed numbers. It also >>>>> supplies a builder interface that the parser uses to construct >>>>> numbers. >>>>> >>>>> frank >>>>> >>>>> Nicolas >>>>>> >>>>>> Lukas >>>>>>>> >>>>>>>> >>>>>>>> On 3 September 2011 17:18, Frank Shearar <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On 3 September 2011 15:56, Lukas Renggli <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 3 September 2011 16:51, Frank Shearar <[email protected] >>>>>>>>>> > >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Lukas, >>>>>>>>>>> >>>>>>>>>>> I haven't :) mainly because I'm unsure where to put it - is there >>>>>>>>>>> perhaps a PP Inbox, or shall I just post the merged version, or >>>>>>>>>>> what's >>>>>>>>>>> your preference? (How about an mcd between my merge and PP's >>>>>>>>>>> head?) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Just put the .mcz at some public URL (dropbox, squeak source, ...) >>>>>>>>>> or >>>>>>>>>> attach it to a mail. >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Ah, great - here it is. You'll see I've written the grammar as a >>>>>>>>> separate class. That was really more to make what I'd done more >>>>>>>>> obvious and to minimise the change to PPSmalltalkGrammar, but >>>>>>>>> perhaps >>>>>>>>> it's not a bad idea anyway: it's easy to see the number literal >>>>>>>>> subgrammar. >>>>>>>>> >>>>>>>>> frank >>>>>>>>> >>>>>>>>> Lukas >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Lukas Renggli >>>>>>>>>> www.lukas-renggli.ch >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Lukas Renggli >>>>>>>> www.lukas-renggli.ch >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>> >> >> > -- best, Eliot
