Re: [Mono-dev] [PATCH] A fastpath dead code elimination
Hi, The general JIT changes in the patch look harmless to me. There is a problem tough: the patch makes tests/marhal2.exe crash when run with -O=all. Zoltan On 11/15/05, Massimiliano Mantione [EMAIL PROTECTED] wrote: The attached patch implements deadce, but without using SSA. Instead, a simple alias analysis pass takes care of understanding when local variables [may] change value, and liveness computation has been modified to use this aliasing info if present. The alias analysis pass has O(n) complexity (n = code size), it is just a linear sweep on the list of instructions. Then, deadce operates one BB at a time, scanning the code linearly and using the liveness bits as start/end conditions (so it is O(n) as well). I positioned the deadce pass just between the liveness computation and the linear scan register allocator, so that: - it can reuse the liveness computation already computed for the linear regalloc, without requiring a new pass, and - it is downstream in the compilation pipeline, so it has the possibility to catch as many dead instructions as possible. The name fastpath deadce comes from Patrik Torstensson :-) He called it like this on IRC because it is on the fast compilation path (the one without SSA). But now, without other technical stuff, some numbers... First of all, the JIT overhead: the time to JIT mscorlib.dll with various compilation options, in seconds (time mono --compile-all): Options -all -all,ssa JIT time 1.05 1.35 Here we see that just computing SSA has a 30% overhead with respect to a compilation with no optimizations at all. Options default ssa deadce ssa,deadce JIT time 1.32 1.55 1.451.6 And here we see that the overhead of fastpath deadce on a default compilation is almost 10%, while ssa deadce has 21% (and note that in practice, since ssa does not work with aliasing, the new fastpath deadce is applied to more methods). So, the JIT overhead is reasonably small. And finally, some lovely benchmark... I tried SciMark, and here are the results: (MFlops, just the composite score) default205.92 -O=ssa,deadce 210.61 -O=deadce 207.50 -O=consprop,copyprop,deadce207.72 -O=consprop,copyprop,deadce,inline 209.82 -O=inline 213.63 -O=deadce,inline 214.52 So, the idea is that fastpath deadce has some effect :-) It is not as effective as ssa deadce, but it is useful anyway and most of all it does not incur the same JIT time overhead. BTW, I spent a *lot* of time trying to be sure that those numbers are accurate. I had to kill evolution, gaim, beagle, the wireless card, and carefully monitor system load otherwise the results were erratic (just of a few percent, but this is what we're looking at). I then executed the benchmark several (10) times for every case and took the *best* score for each (with the idea that it is the one where other things interfered less), and anyway all the best scores were almost identical. A funny thing: these are the results I have on a SciMark.exe compiled with MS .NET 2.0: default: 136.65 -O=consprop,copyprop,deadce: 137.52 As you can see, they are *low*!!! I still have to understand what the 2.0 csc does exactly, but it looks a real disaster for us :-( Watch out this quirk when doing/examining benchmarks! OK, that's all... please approve the patch :-) We can always refine it later... Ciao, Massi ___ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list ___ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] [PATCH] A fastpath dead code elimination
On Thu, 2005-11-17 at 13:23 +0100, Zoltan Varga wrote: The general JIT changes in the patch look harmless to me. There is a problem tough: the patch makes tests/marhal2.exe crash when run with -O=all. That looks more SSAPRE than the patch (or, better, it happens if you specify -O=ssapre and disappears otherwise). SSAPRE and fastdce *cannot* happen togather (if SSAPRE is in, then SSA is built, and in the patch the JIT falls back to SSA deadce). Anyway, I still have to fully debug this, but this doesn't seem to involve fastdce here. I'll investigate more... Thanks, Massi ___ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] [PATCH] A fastpath dead code elimination
Hello, The alias analysis pass has O(n) complexity (n = code size), it is just a linear sweep on the list of instructions. Then, deadce operates one BB at a time, scanning the code linearly and using the liveness bits as start/end conditions (so it is O(n) as well). Is there is a threshold that will disable the optimization from running? Miguel ___ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] [PATCH] A fastpath dead code elimination
On Tue, 2005-11-15 at 15:14 -0500, Miguel de Icaza wrote: Hello, The alias analysis pass has O(n) complexity (n = code size), it is just a linear sweep on the list of instructions. Then, deadce operates one BB at a time, scanning the code linearly and using the liveness bits as start/end conditions (so it is O(n) as well). Is there is a threshold that will disable the optimization from running? Of course :-) If you do not provide the deadce flag, it will not be used. And even if we'll decide to enable it by default, it will be enough to specify -O=-deadce to switch it off. I didn't explain much of how the patch is coded, and/or how it can be used, so here are some more details. With this patch applied, -O=deadce activates fastpath deadce, unless some other flag (for now either ssapre or abcrem) required ssa. In this case, the traditional ssa deadce is used. However, I added one new flag: ssa, which forces ssa construction. If this is used, then deadce will default to the ssa one. Note, however, that if deadce has been specified, and ssa is in some way required (at least one of ssapre, abcrem or ssa was specified), it can *still* happen that some method will fall back to use fastpath deadce: this happens if ssa was disabled because of aliasing. Now, I *know* this sounds confusing at first. Here's the rationale: the idea is that the JIT should do what the user means with the flags. So, if the user asks deadce, the JIT does its best to do deadce. This is the same thing that happens now for consprop and copyprop, which have both ssa and fast versions, which are not equally effective but are called the same. And as soon as I'll commit the HSSA patch, there will be one hssa flag as well (at least it works so in my tree). So one will be able to choose betwen fastpath, ssa and hssa deadce, using -O=deadce, -O=ssa,deadce or -O=hssa,deadce. Of course in the long run ssa will go away and will be replaced by hssa, but at least for testing in between this can work. The alternative would be having one different flag for each kind of optimization, like fastdeadce, hssadeadce... However, this way the number of flags will get too high (and also ugly) with hssa. And consprop and copyprop already work this way (we use the same flags for the ssa and fast versions), so I decided go go on with the same philosophy. About the implementation... there's not much to say, the algorithms are classical, no rocket science here ;-) I tried not too mess up too much the liveness code, which is for sure the more impacted by this patch. I made so that the same code works both with and without aliasing information. When aliasing has been computed, it is possible to obtain one list of affected variables for each MonoInst, and liveness uses that to understand the effect of the inst on each variable. On the other hand, without aliasing info, the code relied on the presence of MONO_SSA_* flags to see what each instruction was doing and each instruction could have effect on just *one* local. In this case I decided to keep one list node as local variable, and use it to build a one element list of affected variables. This way all the logic of liveness has been transformed to use those affected variables lists, and a small preamble takes the real list if aliasing has been computed, or builds the one element list otherwise. So the patch is a bit hard to read, but it's not so terrible IMHO. And yes, I know that I'm slowing liveness computation a bit this way, but it's just a bit, and we have the advantage of not forking the liveness code! If anything isn't clear, just ask ;-) Ciao, Massi ___ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] [PATCH] A fastpath dead code elimination
On Tue, 2005-11-15 at 15:14 -0500, Miguel de Icaza wrote: Hello, The alias analysis pass has O(n) complexity (n = code size), it is just a linear sweep on the list of instructions. Then, deadce operates one BB at a time, scanning the code linearly and using the liveness bits as start/end conditions (so it is O(n) as well). Is there is a threshold that will disable the optimization from running? Massi's code is inside: if (cfg-opt MONO_OPT_LINEARS) { We already have: if ((cfg-num_varinfo 2000) !mono_compile_aot) { /* * we disable some optimizations if there are too many variables * because JIT time may become too expensive. The actual number needs * to be tweaked and eventually the non-linear algorithms should be fixed. */ cfg-opt = ~ (MONO_OPT_LINEARS | MONO_OPT_COPYPROP | MONO_OPT_CONSPROP); cfg-disable_ssa = TRUE; } Obviously though, that comment about tuning still applies. -- Ben ___ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list