Re: [Pharo-dev] OpalCompiler evaluate speed

Nicolas Cellier Tue, 21 Nov 2017 13:50:35 -0800

2017-11-21 14:19 GMT+01:00 Nicolas Cellier <
[email protected]>:


> I have an ArbitraryPrecisionFloatTests doing an exhaustive test for
> printing and reevaluating all positive half precision floats.
>
> That's about 2^15, or approximately 32k, loop iterations, each evaluating a
> snippet like
>
>     (ArbitraryPrecisionFloat readFrom: '1.123' readStream numBits: 10)
>
> The test was naively written with Compiler evaluate: and was using the
> legacy Compiler.
>
> If I rewrite it with self class compiler evaluate: the test times out.
> Let's see what increase is necessary:
>
>     [ ArbitraryPrecisionFloatTest new testPrintAndEvaluate  ] timeToRun.
>     -> 3s with legacy Compiler
>     -> 14s with OpalCompiler
>
> It's not unexpected that intermediate representation (IR) reification has
> a cost, but here the 4.5x is a bit too much...
> This test did account for 1/4 of total test duration already (3s out of
> 12s).
> With Opal, the total test duration doubles... (14s out of 23s)
>
> So let's analyze the hot spot with:
>
>     MessageTally spyOn: [ ArbitraryPrecisionFloatTest new testPrintAndEvaluate ].
>
> (I didn't use AndreasSystemProfiler because its output seems a bit garbled;
> no matter, since the primitives do not account for that much, a MessageTally
> will do the job)
>
> I first see a hot spot which does not seem that necessary:
>
>       |    |24.6% {3447ms} RBMethodNode(RBProgramNode)>>formattedCode
>
> From the comments I understand that AST-based stuff requires a pattern
> (DoIt) and an explicit return (^), but this expensive formatting seems too
> much for just evaluating. I think that we should change that.
>
> Then comes:
>
>       |    |20.7% {2902ms} RBMethodNode>>generate:
>
> which is split in two halves, AST->IR and IR->bytecode
>
>       |    |  |9.3% {1299ms} RBMethodNode>>generateIR
>
>       |    |  |  |11.4% {1596ms} IRMethod>>generate:
>
> But then I see this cost a 2nd time, which also leaves room for progress:
>
>       |                |10.9% {1529ms} RBMethodNode>>generateIR
>
>       |                |  |12.9% {1814ms} IRMethod>>generate:
>
> The first is in RBMethodNode>>generateWithSource, the second in
> OpalCompiler>>compile
>
> Last comes the parse time (sourceCode -> AST)
>
>       |                  13.2% {1858ms} OpalCompiler>>parse
>
> Along with semantic analysis
>
>       |                  6.0% {837ms} OpalCompiler>>doSemanticAnalysis
>
> -----------------------------------
>
> For comparison, the legacy Compiler decomposes into:
>
>       |        |61.5% {2223ms} Parser>>parse:class:category:noPattern:context:notifying:ifFail:
>
> which more or less covers parse time + semantic analysis time.
> That means that Opal does a fair job at this stage.
>
> Then, the direct AST->byteCode phase is:
>
>      |      16.9% {609ms} MethodNode>>generate
>
> IR costs almost 5x on this phase, but we know it's the price to pay for
> the additional features that it potentially offers. If only we did it
> once...
>
> And that's all for the legacy one...
>
> --------------------------------------
>
> This little exercise shows that a 2x acceleration of OpalCompiler evaluate
> seems achievable:
> - simplify the uselessly expensive formatted code
> - generate bytecodes once, not twice
>
> Then it will be a bit more than 2x slower than legacy, which is a better
> trade-off for the superior features yet to come, potentially brought by Opal.
>
> It would be interesting to carry out the same analysis on method compilation.
>

Digging further, here is what I find:

compile sends generate: and answers a CompiledMethod.
translate sends compile but throws the CompiledMethod away, and just answers
the AST.
Most senders of translate will also send generate: (thus we generate: twice
quite often, losing a 2x factor in compilation).
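
The pattern can be sketched in a playground with plain blocks standing in for the real methods. This is only an illustration, not Opal code: parse, generate, compile and translate here are hypothetical blocks that merely mirror the selectors named above, and the strings are placeholders for the AST and the CompiledMethod.

```smalltalk
| generateCount parse generate compile translate ast |
generateCount := 0.

"Hypothetical stand-ins, not Opal code: parse builds a fake AST,
 generate counts how many times it is called."
parse := [ :source | 'AST(' , source , ')' ].
generate := [ :anAst |
	generateCount := generateCount + 1.
	'bytecode for ' , anAst ].

"compile generates and answers the fake method."
compile := [ :source | generate value: (parse value: source) ].

"translate compiles, discards the result, and answers only the fake AST
 (re-parsing here is just a shortcut of the sketch)."
translate := [ :source |
	compile value: source.
	parse value: source ].

"A typical sender of translate then generates again."
ast := translate value: '3 + 4'.
generate value: ast.

generateCount "evaluates to 2: two generations for one snippet"
```

If translate kept the result of the first generation, or its senders did not generate again, the counter would stay at 1.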

A 2x gain is a huge gain when installing big code bases, especially if the
custom is to throw the image away and rebuild it.
No matter if a bot does the job, it does it for twice as many watts and in
the end, we're waiting for the bot.

However, before changing anything, further clarification is required:
translate does one more thing, it catches ReparseAfterSourceEditing and
retries compilation (once).
So my question: are there some cases when generate: will cause
ReparseAfterSourceEditing?
That could happen in the generation phase if some bytecode limit is exceeded,
and an interactive handler corrects the code...
I did not see any such condition, but the code base is huge...
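
To make the concern concrete, here is a minimal sketch of the catch-and-retry-once behaviour, again with hypothetical blocks rather than the real methods. A plain Error stands in for ReparseAfterSourceEditing, and the fake compile fails on its first attempt only; whether the real generate: can ever signal it is exactly the open question.

```smalltalk
| attempts compile translate result |
attempts := 0.

"Hypothetical compile: signals on the first attempt only, as if the
 source had been edited interactively, then succeeds."
compile := [
	attempts := attempts + 1.
	attempts = 1 ifTrue: [ Error new signal: 'source edited' ].
	'CompiledMethod' ].

"Hypothetical translate: catches the signal and retries compilation once.
 A plain Error stands in for ReparseAfterSourceEditing."
translate := [ [ compile value ] on: Error do: [ :ex | compile value ] ].

result := translate value.
attempts "evaluates to 2: the first attempt failed, the retry succeeded"
```

If the signal can only come from the stages before generate:, the retry does not need to wrap the second generation.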
