On 22 November 2017 at 05:49, Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com> wrote:

>
>
> 2017-11-21 14:19 GMT+01:00 Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com>:
>
>> I have an ArbitraryPrecisionFloatTests doing an exhaustive test for
>> printing and reevaluating all positive half precision floats.
>>
>> That's a loop of about 2^15, or approximately 32k, iterations, each evaluating snippets like
>>
>>     (ArbitraryPrecisionFloat readFrom: '1.123' readStream numBits: 10)
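>>
>> A rough illustrative sketch of the shape of that loop (not the actual test
>> code; positiveHalfPrecisionFloats stands for the ~32k enumerated values):
>>
>>     positiveHalfPrecisionFloats do: [:f | | source |
>>         "rebuild the source snippet from the printed value, then reevaluate it"
>>         source := '(ArbitraryPrecisionFloat readFrom: ''' , f printString ,
>>             ''' readStream numBits: 10)'.
>>         self assert: (Compiler evaluate: source) = f]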
>>
>> The test was naively written with Compiler evaluate: and was using the
>> legacy Compiler.
>>
>> If I rewrite it as self class compiler evaluate:, the test times out.
>> Let's see how big the increase is:
>>
>>     [ ArbitraryPrecisionFloatTest new testPrintAndEvaluate  ] timeToRun.
>>     -> 3s with legacy Compiler
>>     -> 14s with OpalCompiler
>>
>> It's not unexpected that intermediate representation (IR) reification has
>> a cost, but here the roughly 4.5x slowdown is a bit too much...
>> This test already accounted for 1/4 of the total test duration (3s out of
>> 12s).
>> With Opal, the total test duration doubles... (14s out of 23s)
>>
>> So let's analyze the hot spot with:
>>
>>     MessageTally  spyOn: [ ArbitraryPrecisionFloatTest new
>> testPrintAndEvaluate  ].
>>
>> (I didn't use AndreasSystemProfiler because its output seems a bit garbled;
>> no matter, since the primitives do not account for that much, a MessageTally
>> will do the job)
>>
>> I first see a hot spot which does not seem that necessary:
>>
>>       |    |24.6% {3447ms} RBMethodNode(RBProgramNode)>>formattedCode
>>
>> From the comments I understand that the AST-based machinery requires a
>> method pattern (DoIt) and an explicit return (^), but this expensive
>> formatting seems too much for just evaluating. I think that we should
>> change that.
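>>
>> For illustration only (assumed shape, not necessarily Opal's exact
>> wrapping): conceptually the snippet gets wrapped into a one-shot DoIt
>> method such as
>>
>>     DoIt
>>         ^ ArbitraryPrecisionFloat readFrom: '1.123' readStream numBits: 10
>>
>> and formattedCode then pretty-prints that method back to source, even
>> though the result is evaluated only once.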
>>
>> Then comes:
>>
>>       |    |20.7% {2902ms} RBMethodNode>>generate:
>>
>> which is split in two halves, AST->IR and IR->bytecode
>>
>>       |    |  |9.3% {1299ms} RBMethodNode>>generateIR
>>
>>       |    |  |  |11.4% {1596ms} IRMethod>>generate:
>>
>> But then I see this cost a second time, which also leaves room for progress:
>>
>>       |                |10.9% {1529ms} RBMethodNode>>generateIR
>>
>>       |                |  |12.9% {1814ms} IRMethod>>generate:
>>
>> The first is in RBMethodNode>>generateWithSource, the second in
>> OpalCompiler>>compile
>>
>> Last comes the parse time (sourceCode -> AST)
>>
>>       |                  13.2% {1858ms} OpalCompiler>>parse
>>
>> Along with semantic analysis
>>
>>       |                  6.0% {837ms} OpalCompiler>>doSemanticAnalysis
>>
>> -----------------------------------
>>
>> For comparison, the legacy Compiler decomposes into:
>>
>>       |        |61.5% {2223ms} Parser>>parse:class:category:noPattern:context:notifying:ifFail:
>>
>> which more or less covers parse time + semantic analysis time.
>> That means that Opal does a fair job at this stage.
>>
>> Then, the direct AST->byteCode phase is:
>>
>>      |      16.9% {609ms} MethodNode>>generate
>>
>> IR costs almost 5x on this phase, but we know it's the price to pay for
>> the additional features that it potentially offers. If only we did it
>> once...
>>
>> And that's all for the legacy one...
>>
>> --------------------------------------
>>
>> This little exercise shows that a 2x acceleration of OpalCompiler
>> evaluate seems achievable:
>> - simplify the uselessly expensive formatted code
>> - generate bytecodes once, not twice
>>
>> Then it will be a bit more than 2x slower than legacy, which is a better
>> trade for the yet-to-come superior features potentially brought by Opal.
>>
>> It would be interesting to carry out the same analysis on method compilation.
>>
>
> Digging further here is what I find:
>
> compile sends generate: and answers a CompiledMethod.
> translate sends compile but throws the CompiledMethod away, and just answers
> the AST.
> Most senders of translate will also send generate: (thus we generate: twice
> quite often, losing a 2x factor in compilation).
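>
> A sketch of the wasteful pattern (the selectors are the ones named above;
> the calling shape and the trailer argument are illustrative assumptions):
>
>     ast := compiler translate.
>         "translate calls compile, which already ran generate: once,
>          then discards the CompiledMethod and answers only the AST"
>     method := ast generate: trailer.
>         "the sender then runs generate: a second time"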
>
> A 2x gain is a huge gain when installing big code bases, especially if the
> custom is to throw the image away and reconstruct it.
> Even if a bot does the job, it does it for twice as many watts, and in the
> end we're waiting for the bot.
>
> However, before changing anything, further clarification is required:
> translate does one more thing, it catches ReparseAfterSourceEditing and
> retries compilation (once).
> So my question: are there cases where generate: will cause
> ReparseAfterSourceEditing?
>

I don't know the full answer about other cases, but I can provide the
background on why ReparseAfterSourceEditing was added.

IIRC, a few years ago, with the move to an AST based system, there was a
problem with syntax highlighting: the AST referenced its original source,
which caused highlighting offsets once that source was modified in the
editor. Trying to work backwards from the modified source to update every
AST element's source location proved an intractable problem.
The workaround I found was to move only in a forward direction, regenerating
the AST from source on every keystroke.
Performance was acceptable, so this became the permanent solution.

I don't have access to an image to check, but you should find
ReparseAfterSourceEditing raised in only one location, near the editor's
#changed:.
Maybe it should be activated only for interactively modified code, and
disabled/bypassed when bulk loading code.
For testing purposes, commenting it out should not harm the system, just
produce visual artifacts in syntax highlighting.
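
Purely as a sketch of what such a guard could look like (the isInteractive
test and its placement are assumptions I can't verify without an image):

    "only trigger a re-parse when the source was edited interactively"
    self isInteractive
        ifTrue: [ ReparseAfterSourceEditing signal ]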



> That could happen in the generation phase if some bytecode limit is exceeded
> and interactive handling corrects the code...
> I did not see any such condition, but the code base is huge...
>

At worst, the impact should only be a temporary visual artifact, corrected
on the next keystroke
(unless ReparseAfterSourceEditing has been adopted for some purpose other
than its original one, but I'd guess not).

cheers -ben
