Thanks, applied all.
On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuz...@ispras.ru> wrote:
> This series implements some basic machine-independent optimizations. They
> simplify code and allow liveness analysis to do its work better.
>
> Suppose we have the following ARM code:
>
> movw r12, #0xb6db
> movt r12, #0xdb6d
>
> In TCG before optimizations we'll have:
>
> movi_i32 tmp8,$0xb6db
> mov_i32 r12,tmp8
> mov_i32 tmp8,r12
> ext16u_i32 tmp8,tmp8
> movi_i32 tmp9,$0xdb6d0000
> or_i32 tmp8,tmp8,tmp9
> mov_i32 r12,tmp8
>
> And after optimizations we'll have this:
>
> movi_i32 r12,$0xdb6db6db
>
> Here are performance evaluation results on SPEC CPU2000 integer tests in
> user-mode emulation on an x86_64 host. Each test was run 5 times on the
> reference data set. The tables below show the runtime in seconds of every
> run, the median of the five runs and, for the optimized builds, the gain
> (relative reduction of the median runtime).
>
> ARM guest without optimizations:
> Test name           #1        #2        #3        #4        #5    Median
> 164.gzip      1408.891  1402.323  1407.623  1404.955  1405.396  1405.396
> 175.vpr        1245.31  1248.758  1247.936  1248.534  1247.534  1247.936
> 176.gcc        912.561   809.481   847.057   912.636   912.544   912.544
> 181.mcf        198.384   197.841   199.127   197.976    197.29   197.976
> 186.crafty    1545.881  1546.051  1546.002  1545.927  1545.945  1545.945
> 197.parser    3779.954  3779.878   3779.79   3779.94   3779.88   3779.88
> 252.eon       2563.168  2776.152  2776.395  2776.577  2776.202  2776.202
> 253.perlbmk   2591.781  2504.078   2507.07  2591.337  2463.401   2507.07
> 256.bzip2     1306.197  1304.639  1184.853  1305.141  1305.606  1305.141
> 300.twolf     2918.984  2918.926   2918.93   2918.97  2918.914   2918.93
>
> ARM guest with optimizations:
> Test name           #1        #2        #3        #4        #5    Median    Gain
> 164.gzip      1401.198  1376.337  1401.117   1401.23  1401.246  1401.198   0.30%
> 175.vpr       1247.964  1151.468   1247.76  1154.419  1242.017  1242.017   0.47%
> 176.gcc        896.882   918.546   918.297   851.465    918.39   918.297  -0.63%
> 181.mcf         198.19   197.399   198.421   198.663   198.312   198.312  -0.17%
> 186.crafty    1520.425  1520.362  1520.477  1520.445  1520.957  1520.445   1.65%
> 197.parser    3770.943  3770.927  3770.578  3771.048  3770.904  3770.927   0.24%
> 252.eon       2752.371  2752.111  2752.005  2752.214  2752.109  2752.111   0.87%
> 253.perlbmk   2577.462  2578.588  2493.567  2578.571  2578.318  2578.318  -2.84%
> 256.bzip2     1296.198  1271.128  1296.044  1296.321  1296.147  1296.147   0.69%
> 300.twolf     2888.984  2889.023  2889.225  2889.039   2889.05  2889.039   1.02%
>
>
> x86_64 guest without optimizations:
> Test name           #1        #2        #3        #4        #5    Median
> 164.gzip       857.654   857.646   857.678   798.119   857.675   857.654
> 175.vpr        959.265   959.207   959.185   959.461   959.332   959.265
> 176.gcc        625.722   637.257   646.638   646.614    646.56    646.56
> 181.mcf        221.666   220.194   220.079   219.868     221.5   220.194
> 186.crafty    1129.531  1129.739  1129.573  1129.588  1129.624  1129.588
> 197.parser    1809.517  1809.516  1809.386  1809.477  1809.427  1809.477
> 253.perlbmk   1774.944  1776.046  1769.865  1774.052  1775.236  1774.944
> 254.gap       1061.033  1061.158  1061.064  1061.047   1061.01  1061.047
> 255.vortex    1871.261  1914.144  1914.057  1914.086  1914.127  1914.086
> 256.bzip2      918.916  1011.828  1011.819   1012.11  1011.932  1011.828
> 300.twolf     1332.797   1330.56  1330.687  1330.917  1330.602  1330.687
>
> x86_64 guest with optimizations:
> Test name           #1        #2        #3        #4        #5    Median    Gain
> 164.gzip       806.198   854.159   854.184   854.168   854.187   854.168   0.41%
> 175.vpr        955.905    950.86   955.876   876.397   955.957   955.876   0.35%
> 176.gcc        641.663   640.189    641.57   641.552   641.514   641.552   0.77%
> 181.mcf        217.619   218.627   218.699   217.977   216.955   217.977   1.01%
> 186.crafty    1123.909  1123.852  1123.917  1123.781  1123.805  1123.852   0.51%
> 197.parser     1813.94  1814.643  1815.286  1814.445   1813.72  1814.445  -0.27%
> 253.perlbmk   1791.536  1795.642    1793.0  1797.486  1791.401    1793.0  -1.02%
> 254.gap       1070.605  1070.216  1070.637  1070.168  1070.491  1070.491  -0.89%
> 255.vortex    1918.764  1918.573  1917.411  1918.287  1918.735  1918.573  -0.23%
> 256.bzip2     1017.179  1017.083  1017.283  1016.913  1017.189  1017.179  -0.53%
> 300.twolf     1321.072  1321.109  1321.019  1321.072  1321.004  1321.072   0.72%
>
> The ARM guests for 254.gap and 255.vortex and the x86_64 guest for 252.eon
> do not work under QEMU for some unrelated reason, so they are omitted above.
>
> Changes:
> v1 -> v2
> - State and Vals arrays merged into an array of structures.
> - Added reference counting of a temp's copies. This helps to reset a temp's
> state faster in most cases.
> - Do not perform copy propagation through operations with the
> TCG_OPF_CALL_CLOBBER or TCG_OPF_SIDE_EFFECTS flag.
> - Split some expression simplifications into an independent switch.
> - Let the compiler handle signed shifts and sign/zero extensions in its
> implementation-defined way.
>
> v2 -> v3
> - Elements of an equivalence class are placed in a doubly linked circular
> list, so it's easier to choose a new representative.
> - The CASE_OP_32_64 macro is used to reduce the number of #ifdefs.
> Checkpatch is not happy about this change, but I do not think spaces would
> be appropriate here.
> - Some constraints during copy propagation are relaxed.
> - Functions tcg_opt_gen_mov and tcg_opt_gen_movi are introduced to reduce
> code duplication.
>
> Kirill Batuzov (6):
> Add TCG optimizations stub
> Add copy and constant propagation.
> Do constant folding for basic arithmetic operations.
> Do constant folding for boolean operations.
> Do constant folding for shift operations.
> Do constant folding for unary operations.
>
> Makefile.target |   2 +-
> tcg/optimize.c  | 568 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> tcg/tcg.c       |   6 +
> tcg/tcg.h       |   3 +
> 4 files changed, 578 insertions(+), 1 deletions(-)
> create mode 100644 tcg/optimize.c
>
> --
> 1.7.4.1