2017-11-23 23:37 GMT+01:00 Clément Bera <[email protected]>:

> Hi Nicolas.
>
> Just some comments:
>
> Another thing you can try is to remove the allocation of Opal's IR. It
> seems people use the IR only through the IRBuilder, so the API can be kept
> but it can generate bytecode directly instead of IR then bytecode. Removing
> those allocations would speed things up. It means merging IRFix /
> IRTranslator / IRBytecodeGenerator somehow and having the IRBuilder API
> directly call the new resulting merged class.
>

Yes, I was thinking of this optimization: use IR only when we need advanced
tools/instrumentation.
> Another thing is that when Opal became the default compiler, I evaluated
> the speed and saw it was slower, but when loading large projects it seemed
> loading time was dominated by Monticello / source reading / source loading
> and the compilation time was overall not that significant (< 20% of time).
> I don't know if this is still the case with Git. I have problems currently
> when editing some large methods (it seems in the IDE the method is compiled
> at each keystroke...) and when doing OpalCompiler recompileAll (which I do
> often since I edit bytecode sets), but otherwise the performance of Opal
> seems to be OK. Evaluation performance may be relevant in some cases, but
> I've never found such cases outside of the IDE in production.
>
> Best !
>
> Clement
>

Right, my case was generating the strings, so the Compiler was the main
contributor. For the more general case, it should be benchmarked. I was
thinking of a tool for checking the example comments
"some code example >>> some result".

> On Thu, Nov 23, 2017 at 9:41 PM, Nicolas Cellier <
> [email protected]> wrote:
>
>>
>>
>> 2017-11-22 0:31 GMT+01:00 Ben Coman <[email protected]>:
>>
>>>
>>>
>>> On 22 November 2017 at 05:49, Nicolas Cellier <
>>> [email protected]> wrote:
>>>
>>>>
>>>>
>>>> 2017-11-21 14:19 GMT+01:00 Nicolas Cellier <
>>>> [email protected]>:
>>>>
>>>>> I have an ArbitraryPrecisionFloatTest doing an exhaustive test for
>>>>> printing and reevaluating all positive half-precision floats.
>>>>>
>>>>> That's about 2^15, or approximately a 32k loop, which evaluates
>>>>> snippets like
>>>>>
>>>>> (ArbitraryPrecisionFloat readFrom: '1.123' readStream numBits: 10)
>>>>>
>>>>> The test was naively written with Compiler evaluate: and was using
>>>>> the legacy Compiler.
>>>>>
>>>>> If I rewrite it as self class compiler evaluate:, the test times out.
>>>>> Let's see what increase is necessary:
>>>>>
>>>>> [ ArbitraryPrecisionFloatTest new testPrintAndEvaluate ]
>>>>> timeToRun.
>>>>> -> 3s with legacy Compiler
>>>>> -> 14s with OpalCompiler
>>>>>
>>>>> It's not unexpected that intermediate representation (IR) reification
>>>>> has a cost, but here the 4.5x is a bit too much...
>>>>> This test already accounted for 1/4 of the total test duration (3s out
>>>>> of 12s).
>>>>> With Opal, the total test duration doubles... (14s out of 23s)
>>>>>
>>>>> So let's analyze the hot spot with:
>>>>>
>>>>> MessageTally spyOn: [ ArbitraryPrecisionFloatTest new
>>>>> testPrintAndEvaluate ].
>>>>>
>>>>> (I didn't use AndreasSystemProfiler because the output seems a bit
>>>>> garbled; no matter, since the primitives do not account for that much,
>>>>> a MessageTally will do the job)
>>>>>
>>>>> I first see a hot spot which does not seem that necessary:
>>>>>
>>>>> | |24.6% {3447ms} RBMethodNode(RBProgramNode)>>formattedCode
>>>>>
>>>>> From the comments I understand that AST-based stuff requires a pattern
>>>>> (DoIt) and an explicit return (^), but this expensive formatting seems
>>>>> too much for just evaluating. I think that we should change that.
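As an aside, the formatting cost can be confirmed in isolation with a quick
sketch like this (RBParser parseExpression: and formattedCode are the
standard Pharo messages; the 32768 count just mirrors the test loop):

```smalltalk
"Measure the cost of formattedCode alone, outside the compiler pipeline.
The iteration count mirrors the ~2^15 evaluations of the test."
| ast |
ast := RBParser parseExpression:
	'(ArbitraryPrecisionFloat readFrom: ''1.123'' readStream numBits: 10)'.
[ 32768 timesRepeat: [ ast formattedCode ] ] timeToRun.
```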
>>>>>
>>>>> Then comes:
>>>>>
>>>>> | |20.7% {2902ms} RBMethodNode>>generate:
>>>>>
>>>>> which is split in two halves, AST->IR and IR->bytecode:
>>>>>
>>>>> | | |9.3% {1299ms} RBMethodNode>>generateIR
>>>>> | | | |11.4% {1596ms} IRMethod>>generate:
>>>>>
>>>>> But then I see this cost a 2nd time, which also leaves room for
>>>>> progress:
>>>>>
>>>>> | |10.9% {1529ms} RBMethodNode>>generateIR
>>>>> | | |12.9% {1814ms} IRMethod>>generate:
>>>>>
>>>>> The first is in RBMethodNode>>generateWithSource, the second in
>>>>> OpalCompiler>>compile
>>>>>
>>>>> Last comes the parse time (sourceCode -> AST):
>>>>>
>>>>> | 13.2% {1858ms} OpalCompiler>>parse
>>>>>
>>>>> Along with semantic analysis:
>>>>>
>>>>> | 6.0% {837ms} OpalCompiler>>doSemanticAnalysis
>>>>>
>>>>> -----------------------------------
>>>>>
>>>>> For comparison, the legacy Compiler decomposes into:
>>>>>
>>>>> | |61.5% {2223ms} Parser>>parse:class:category:noPattern:context:notifying:ifFail:
>>>>>
>>>>> which more or less covers parse time + semantic analysis time.
>>>>> That means that Opal does a fair job at this stage.
>>>>>
>>>>> Then, the direct AST->bytecode phase is:
>>>>>
>>>>> | 16.9% {609ms} MethodNode>>generate
>>>>>
>>>>> IR costs almost 5x on this phase, but we know it's the price to pay
>>>>> for the additional features that it potentially offers. If only we
>>>>> would do it once...
>>>>>
>>>>> And that's all for the legacy one...
>>>>>
>>>>> --------------------------------------
>>>>>
>>>>> This little exercise shows that a 2x acceleration of OpalCompiler
>>>>> evaluate seems achievable:
>>>>> - simplify the uselessly expensive code formatting
>>>>> - generate bytecodes once, not twice
>>>>>
>>>>> Then it will be a bit more than 2x slower than legacy, which is a
>>>>> better trade for the superior features potentially brought by Opal.
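A rough sketch of the "generate once" idea, as a memoization on the AST node
(the compiledMethodCache variable and generateCached: selector are
hypothetical names, not existing Opal API):

```smalltalk
"Hypothetical memoization so that translate and compile share one
bytecode generation instead of triggering IRMethod>>generate: twice.
compiledMethodCache would be a new instance variable of RBMethodNode."
RBMethodNode >> generateCached: aTrailer
	^ compiledMethodCache ifNil: [
		compiledMethodCache := self generate: aTrailer ]
```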
>>>>>
>>>>> It would be interesting to carry out the same analysis on method
>>>>> compilation.
>>>>>
>>>>
>>>> Digging further, here is what I find:
>>>>
>>>> compile sends generate: and answers a CompiledMethod.
>>>> translate sends compile but throws the CompiledMethod away, and just
>>>> answers the AST.
>>>> Most senders of translate will also generate: (thus we generate: twice
>>>> quite often, losing a 2x factor in compilation).
>>>>
>>>> A 2x gain is a huge gain when installing big code bases, especially if
>>>> the custom is to throw the image away and reconstruct.
>>>> No matter if a bot does the job, it does it for twice as many watts,
>>>> and in the end we're waiting for the bot.
>>>>
>>>> However, before changing anything, further clarification is required:
>>>> translate does one more thing, it catches ReparseAfterSourceEditing and
>>>> retries compilation (once).
>>>> So my question: are there some cases where generate: will cause
>>>> ReparseAfterSourceEditing?
>>>>
>>>
>>> I don't know the full answer about other cases, but I can provide the
>>> background on why ReparseAfterSourceEditing was added.
>>>
>>> IIRC, a few years ago with the move to an AST-based system, there was a
>>> problem with syntax highlighting where the AST referenced its original
>>> source, which caused highlighting offsets when the referenced source was
>>> modified in the editor.
>>> Trying to work backwards from modified source to update all AST
>>> elements' source locations proved an intractable problem.
>>> The workaround I found was to move only in a forward direction,
>>> regenerating the AST from source on every keystroke.
>>> Performance was acceptable, so this became the permanent solution.
>>>
>>> I don't have access to an image to check, but you should find
>>> ReparseAfterSourceEditing raised in only one location, near editor
>>> #changed:
>>> Maybe this should activate only for interactively modified code, and be
>>> disabled/bypassed for bulk code loading.
>>> For testing purposes, commenting it out should not harm the system, just
>>> produce visual artifacts in syntax highlighting.
>>>
>>>
>>>
>>>> That could happen in the generation phase if some bytecode limit is
>>>> exceeded and an interactive handling corrects the code...
>>>> I did not see any such condition, but the code base is huge...
>>>>
>>>
>>> At worst, the impact should only be a temporary visual artifact,
>>> corrected on the next keystroke.
>>> (unless ReparseAfterSourceEditing has been adopted for other than its
>>> original purpose, but I'd guess not)
>>>
>>> cheers -ben
>>>
>>
>> Hi Ben,
>> Thanks for the information.
>> We must keep ReparseAfterSourceEditing; it does its job.
>>
>> But it just sounds like we have an inversion:
>>
>> translate (source code -> AST) does call compile (source
>> code -> AST -> bytecode in a CompiledMethod)
>>
>> I would expect the other way around: if we want to compile, we need to
>> translate first.
>> If we want to translate, we don't really need to compile, unless there's
>> a hidden reason...
>> Thus my question: is there a hidden reason?
>>
>
> --
> Clément Béra
> Pharo consortium engineer
> https://clementbera.wordpress.com/
> Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
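P.S. The layering I would expect could look roughly like this (a sketch
only: the selector bodies are illustrative, not the current Opal
implementation; the single ReparseAfterSourceEditing retry is preserved):

```smalltalk
"Sketch: translate stops at the AST; compile builds on translate,
so bytecode is generated exactly once per compilation."
OpalCompiler >> translate
	^ [ self parse ]
		on: ReparseAfterSourceEditing
		do: [ :ex | self parse ]

OpalCompiler >> compile
	^ self translate generate: CompiledMethodTrailer empty
```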
