THE PROBLEM:

In today's #parrotsketch, Allison stated that Parrot 2.0 has an explicit goal to have not only pure-PIR code in production, but HLL code as well. Right now that's not feasible, partly because HLL code runs VERY slowly. This is not the minor kind of slow that can be resolved by iterating on fixes that each gain a few percent until they add up to a 2-3x improvement. We're talking Real Slow -- Rakudo, for example, is 1-2 orders of magnitude too slow for production use on my various workloads. While (non-trivial) pure PIR code doesn't exactly scream, it is at least fast enough for *some* production tasks; it's the vast gap between PIR and HLL performance that bothers me for now.
(Yes, making non-trivial PIR execute 1-2 orders of magnitude faster would do the trick for several of my use cases, because it would carry HLL performance "along for the ride". But I doubt that jump will happen until Parrot truly goes JIT with a vengeance, the way current browser JavaScript engines do, and methinks that's not a reasonable 2.0 goal. Plus, there'd still be all that extra HLL performance being wasted when it could instead be winning us kudos and karma.)

THE ANALYSIS:

A few weeks ago, I was able to spend some time tracking down why this huge performance gap exists, at least for my code running in Rakudo. I suspect that other HLLs face similar problems, and I'd love to hear what the rest of you have found. In any case, I found a number of issues, summarized in this list:

1. Every Perl 6 scope becomes a PIR sub.
2. Rakudo scopes are extra-heavy.
3. PCT has no optimization passes.

THE DETAILS:

1. Every Perl 6 scope becomes a PIR sub.

This turns inner loops into massive numbers of PIR sub calls, killing performance. Colloquial Perl 6 code has a LOT of small scopes and implicit loops, which makes the problem even worse. It is my understanding that this is fundamentally a limitation of Parrot's lexical scope handling; scopes and subs are currently inseparable. If that remains the case (hopefully for sound architectural reasons), then it may be valuable to detect cases where a scope can be optimized away entirely. This is especially true because:

2. Rakudo scopes are extra-heavy.

In theory, every Perl 6 scope introduces a new lexical $_, $!, and $/, and must perform some exception housekeeping as well. A clever compiler could optimize these away in some cases, and luckily should find those optimizations most easily just where they matter most -- in tight loops. In practice, since Rakudo does not do these optimizations yet, the PIR implementing the visible Perl 6 code in a small scope is often dwarfed by the boilerplate surrounding it.
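To make the shape of that problem concrete, here is a toy cost model -- Python used purely as runnable pseudocode, since the real thing is PIR. The op names and counts below are illustrative assumptions about what a naive per-scope prologue/epilogue costs, not Parrot's actual generated code:

```python
# Toy cost model (NOT real Parrot output): count the abstract ops a
# naive compiler spends per scope entry, assuming each Perl 6 block
# allocates fresh $_, $!, $/ lexicals and does exception housekeeping.

def scope_boilerplate_ops():
    """Per-scope housekeeping ops, per the analysis above (illustrative)."""
    return [
        "new_lexpad",       # fresh lexical pad for the block
        "store_lex $_",     # topic variable
        "store_lex $!",     # error variable
        "store_lex $/",     # match variable
        "push_eh handler",  # exception housekeeping on entry...
        "pop_eh",           # ...and teardown on exit
    ]

def user_ops_for_loop_body():
    """Ops for the visible user code in a tight loop body, e.g. `$sum += $_`."""
    return ["find_lex $sum", "find_lex $_", "add", "store_lex $sum"]

iterations = 1_000
boilerplate = len(scope_boilerplate_ops()) * iterations
useful = len(user_ops_for_loop_body()) * iterations
print(f"boilerplate ops: {boilerplate}, useful ops: {useful}")
print(f"overhead ratio: {boilerplate / useful:.1f}x")
```

Even in this charitable model the housekeeping outnumbers the user's actual work on every iteration; a smaller loop body makes the ratio worse, not better.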
This boilerplate could be optimized in a number of ways; for example, a new op could be written that creates all three special lexicals at once. Still, there's quite a bit of boilerplate there; it's unlikely to become truly smokin' fast through micro-optimization alone.

3. PCT has no optimization passes.

Right now, PCT follows a notionally simple set of rules to convert PAST to POST to PIR, and does not attempt significant optimization at any point. This results in PIR code that a human would regard as silly -- consecutive loads of the same lexical, boxing and unboxing of constants, performing unchanging operations inside loops instead of hoisting them outside, and so on.

I had a couple of paragraphs here arguing why this should be solved with optimization passes added specifically to PCT, rather than to the HLLs or the PIR compiler, but Tene assures me that was already the long-term plan. The important part of my message then becomes: it's time to get started. By 2.0, we should have some optimizations ready to go. It's not necessary to implement the most cutting-edge optimization algorithms -- joining the 80's ought to be sufficient. :-)

THE TASKS:

Recollecting the thoughts from above, here are some individual tasks that could be turned into tickets and/or release goals:

1. Add past_optimize and post_optimize passes to PCT. Start with just a single useful optimization in each.

2. Make it easy to plug in additional optimizations, in a way that allows people to work in parallel on different optimizations without unduly blocking on each other.

3. Analyze the PIR output from HLL compiles of a number of scripts in different task domains, with different coding styles, and of various lengths. Collect a list of common optimization targets (e.g. "redundant loads of the same data"), and create tickets for implementing optimization plugins for them. Potentially coder-parallel.

4. Implement the tickets created in task #3. Should be easily parallelized over multiple Parrot hackers.

5. Analyze and micro-optimize Rakudo's per-scope boilerplate. Focus first on optimizations that other HLLs may be able to reuse for their own particular boilerplate hell.

6. Reduce the overhead of PIR sub calls. I believe Allison is already working on refactoring that will help with this, but I suspect more improvement will be possible even after her work is merged.

7. Allow lexical scopes to be separate from PIR subs. If sub calls become really cheap, this will be unnecessary; if they remain expensive, it should be considered.

-'f

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
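P.S. To make task #1 concrete, here's a sketch of the kind of single, simple optimization that could seed a post_optimize pass: eliminating redundant loads of the same lexical. Python is again used as runnable pseudocode -- PCT itself would implement this over POST nodes in PIR/NQP, and every name below is illustrative, not an existing PCT API:

```python
# Illustrative sketch, not PCT code: drop a find_lex whose result is
# already available from an identical earlier load, invalidating the
# cached load whenever that lexical is stored to.  `ops` is a flat
# list of (opcode, lexical_name) tuples standing in for POST ops.

def eliminate_redundant_loads(ops):
    cached = set()  # lexical names with a still-valid earlier load
    out = []
    for opcode, name in ops:
        if opcode == "find_lex":
            if name in cached:
                continue            # reuse the earlier load; emit nothing
            cached.add(name)
            out.append((opcode, name))
        elif opcode == "store_lex":
            cached.discard(name)    # the cached value is now stale
            out.append((opcode, name))
        else:
            cached.clear()          # be conservative across unknown ops
            out.append((opcode, name))
    return out

before = [
    ("find_lex", "$x"),
    ("find_lex", "$x"),   # redundant: nothing invalidated $x
    ("store_lex", "$x"),
    ("find_lex", "$x"),   # NOT redundant: $x was just stored
]
after = eliminate_redundant_loads(before)
print(len(before), "->", len(after))   # 4 -> 3
```

Even a pass this dumb -- straight-line only, conservative around anything it doesn't understand -- would clean up the "consecutive loads of the same lexical" silliness described above, and gives task #2's plugin interface something real to hang off of.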
