On Fri, May 20, 2011 at 10:50:49AM -0700, Richard Henderson wrote:
> On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> > This series implements some basic machine-independent optimizations. They
> > simplify code and allow liveness analysis do it's work better.
> >
> > Suppose we have following ARM code:
> >
> >     movw r12, #0xb6db
> >     movt r12, #0xdb6d
> >
> > In TCG before optimizations we'll have:
> >
> >     movi_i32 tmp8,$0xb6db
> >     mov_i32 r12,tmp8
> >     mov_i32 tmp8,r12
> >     ext16u_i32 tmp8,tmp8
> >     movi_i32 tmp9,$0xdb6d0000
> >     or_i32 tmp8,tmp8,tmp9
> >     mov_i32 r12,tmp8
> >
> > And after optimizations we'll have this:
> >
> >     movi_i32 r12,$0xdb6db6db
> >
> > Here are performance evaluation results on SPEC CPU2000 integer tests in
> > user-mode emulation on x86_64 host. There were 5 runs of each test on
> > reference data set. The tables below show runtime in seconds for all these
> > runs.
>
> I totally agree that this sort of optimization is needed in TCG. Essentially
> all RISC guests have the same problem. When emulating one RISC upon another,
> the problem may be exacerbated. E.g. Sparc on PPC -- sparc will use a 21/11
> bit split of the constant, ppc will use a 16/16 split of the constant, which
> results in 3 insns in the generated code where 2 would do.
>
> You should be aware of prior work in this area by Aurelien Jarno:
>
> git://git.aurel32.net/qemu.git tcg-optimizations
>
> Given that's now 2 years old, and doesn't seem to be progressing, I hope your
> patch series can get things going again...

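For reference, the transformation in the quoted example can be reproduced with a minimal Python sketch. This is not QEMU's optimizer or anything from the patch series: the tuple encoding of ops and the function names fold_constants and remove_dead are hypothetical, only the four opcodes from the quoted dump are handled, and every op is assumed to be free of side effects within a single straight-line block. It only illustrates why constant folding lets a liveness-style pass drop the intermediate ops.

```python
# Toy model of the TCG dump quoted above: (opcode, dest, *sources).
# Sources are temp/register names (str) or immediates (int).
OPS = [
    ("movi_i32", "tmp8", 0xB6DB),
    ("mov_i32", "r12", "tmp8"),
    ("mov_i32", "tmp8", "r12"),
    ("ext16u_i32", "tmp8", "tmp8"),
    ("movi_i32", "tmp9", 0xDB6D0000),
    ("or_i32", "tmp8", "tmp8", "tmp9"),
    ("mov_i32", "r12", "tmp8"),
]

MASK32 = 0xFFFFFFFF


def fold_constants(ops):
    """Forward pass: rewrite ops whose inputs are all known constants as movi_i32."""
    known = {}  # name -> constant value currently held
    out = []
    for opc, dest, *srcs in ops:
        value = None
        if opc == "movi_i32":
            value = srcs[0] & MASK32
        elif all(s in known for s in srcs):
            vals = [known[s] for s in srcs]
            if opc == "mov_i32":
                value = vals[0]
            elif opc == "ext16u_i32":
                value = vals[0] & 0xFFFF
            elif opc == "or_i32":
                value = vals[0] | vals[1]
        if value is None:
            # Unknown result: forget any constant previously held by dest.
            known.pop(dest, None)
            out.append((opc, dest, *srcs))
        else:
            known[dest] = value
            out.append(("movi_i32", dest, value))
    return out


def remove_dead(ops):
    """Backward pass: drop ops whose destination is never read afterwards."""
    # Assumption: names not starting with "tmp" are guest registers that
    # stay live at the end of the block; "tmp*" temporaries are dead there.
    live = {op[1] for op in ops if not op[1].startswith("tmp")}
    kept = []
    for opc, dest, *srcs in reversed(ops):
        if dest not in live:
            continue  # result is overwritten or unused later
        live.discard(dest)
        live.update(s for s in srcs if isinstance(s, str))
        kept.append((opc, dest, *srcs))
    kept.reverse()
    return kept


optimized = remove_dead(fold_constants(OPS))
```

On the quoted sequence, `optimized` ends up holding the single op `("movi_i32", "r12", 0xdb6db6db)`, matching the "after optimizations" dump. Whether the extra pass pays for itself in translation time is a separate question that this sketch does not address.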
I basically stopped working on constant propagation because, while the TCG
code looked nicer, the resulting code was always slower.

Since the discussion about TCG_AREG0, I have started working on register
allocation again (see the first patch series I sent about that). I hope to
have something ready by the end of the weekend.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net