I believe there may be a difference in assumptions/premises. Let's see if we can tease those apart. I also realize I am to blame for not having had stated what my plans here were - which is quite possibly the reason for these differences :)
When we started this effort, we had no data to indicate this new algorithm has any value in the context of Turbofan. We had hints - LLVM's claims, and Mozilla's (the latter being perhaps the closest to our scenario). That's why I prioritized my work to first getting to numbers showing feasibility: can this reg allocator produce better code quality than the linear one, for a reasonable compile time cost. I think this is the main difference between the plan I'm executing on and the one you're proposing - if I understand it correctly: I believe your proposal starts from the premise that the greedy algorithm *will* replace the linear one; I start from the premise that we have (or had) no evidence that that should be the case, and so we need to convince ourselves of that first. I suppose this is a good place to ask: does this seem like a reasonable starting point, or is there something I'm missing perhaps? Currently, as per my last checkin, we have a sketch of the implementation behind a flag- it's *almost* passing all the tests (what's keeping it red is an assertion in 2 tests). I felt it served its purpose, though, and didn't want to spend time green-ing an implementation I was going to change anyway. This means we're basically at "step 2", however, given the difference in premises, it follows that my next steps are a bit diverging from what you're suggesting: My next step (which is what I'm working through now, this current CL is a stepping stone) is getting to a place where we have the data indicating code quality is indeed better, and drive changes in the Turbofan codebase based on the scenarios the new allocator is presenting. That means continuing measurements comparing it with LS, and driving the right heuristics in. At the same time, the problem of stability is there every time we change heuristics, because we have no reason to believe that a new way of splitting won't actually cause some security hole. I don't think we can realistically stabilize and then optimize, I think it needs to happen iteratively. That's why I want to start exercising the code as early as possible with the security team, so that I understand the mechanics of fuzz testing and start detecting bugs early... but this (fuzzing) will be very likely a continuous effort. This is also why I'd find beneficial to flip the switch on the algorithm in Canary. Of course, if we can't see code quality improvements, then why bother continuing :) The next step after we have strong evidence of code quality value (and fuzz/Canary issues fixed) is to look at opportunities for better compile time. I suspect we won't beat LS there. The value, by design, of LS, is fast compile time, but only reasonable code quality. This is not the trade-off greedy is making. True, LLVM reported better compile time, but it appears it was due to cleanup they meant to do - so we don't know if a reimplemented, clean LS wouldn't have had even better compile time there. This is likely another tripwire: we may find out that we can't find a better trade-off than linear gives us already. To be clear, I'm not suggesting punting compile time work until then - hence the work in this CL for more appropriate data structures for certain usecases. I do think that the "win" will be on code quality, and the compile time will be more of a "this is the best we think we can do (while still not beating LS)", which may be fine for an essentially AoT scenario - but we need the data to reason about this tradeoff. I wouldn't aggressively pursue changes to the live range model and so on until they became dictated by the nature of the algorithm - unless your point 4 suggests a full side-by-side coexistence of the 2 for a while? That may actually work, but even then I'd want to see if and how much a change is actually necessary (meaning, I want to see the turbo stats or a profiler motivating any change). Come to think of it, what I'm presenting is not so much different from the plan you outlined, it's more that I want to approach it iteratively, with more frequent trip wires, which I suppose is natural, given my stated assumptions. Coming back to more immediate topics, I'm not sure where the worry around the size of the CL come from. It's still behind a flag, so it shouldn't constitute any more of a risk than it currently is (from the last change) - which is to say, none. True, we're exploring turning it on in Canary, but I am more confident in hitting the prerequisites (green bots + fuzz tests) with this CL than with the prototype I had before. Production is still having the flag off, just like today. In the framework outlined above, this CL is a stepping stone towards collecting data supporting the value of replacing the allocator. I have a subsequent change that builds on this, after which I should be able to collect perf data, drive heuristics in, understand the motivation and extent for any further changes to other parts of the model, get fuzz / Canary crash reports, and see how all that turns back to affecting or not perf. To address 2 topics in 1 reply :) Jaroslav - the design LLVM has is motivated by having 2 variants of the allocator, one that backtracks and one that is faster, less optimal, and doesn't backtrack. I could see some further factoring possible in the current implementation, for maintainability reasons, at least, unless we also wanted to explore the 2 variant LLVM design; however, at this point, (given all the above) since I believe the priority is getting code quality in a good place, then getting the compile time in a good place, I'd punt too much such work until later. Actually, if you were making a suggestion outside the timeline of this CL, then yes, I absolutely agree. On Thu, Jun 11, 2015 at 1:15 AM <[email protected]> wrote: > This is actually a big CL, which I'm not sure we can get into landable > state > soonish. I suspect that we could proceed a lot faster if we split the work > into > manageable parts, something along these lines: > > 1.) Do the minimal amount of work required to make the "greedy allocator > bot" > green (i.e. passing all our current tests). > 2.) Implement the basic algorithm (as mentioned in > http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html) > without > any fancy heuristics, but with the interface introduced by LLVM to make the > splitting/spilling decisions. > 3.) Stabilize the basic algorithm and do measurements against LS. > 4.) Based on that, split the basic allocator from LS and use appropriate > data > structures instead of trying to share all datastructures with LS. > 5.) Start implementing the greedy heuristics on top of the basic allocator, > just > like LLVM did. > 6.) Weaponize. > > WDYT? > > https://codereview.chromium.org/1157663007/ > -- -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
