On Wed, Oct 31, 2012 at 7:04 AM, Yeongkyoon Lee <yeongkyoon....@samsung.com> wrote:
> Here is the 8th version of the series optimizing TCG qemu_ld/st code
> generation.

Thanks, applied all.

>
> v8:
> - Rebase
>
> v7:
> - Rebase and fix typos
>
> v6:
> - Remove the extra return-address argument from the MMU helpers;
>   instead, embed the fast path address in the slow path for the helpers
>   to use
> - Replace some bitwise operations with structure bitfields
> - Rename the function that handles finalization of TB code generation
>
> v5:
> - Remove RFC tag
>
> v4:
> - Remove the CONFIG_SOFTMMU precondition from configure
> - Instead, add CONFIG_SOFTMMU conditions to the TCG sources
> - Remove some unnecessary comments
>
> v3:
> - Support CONFIG_TCG_PASS_AREG0
>   (expected to gain more performance than the other configurations)
> - Remove the configure option "--enable-ldst-optimization"
> - Make the optimization the default on i386 and x86_64 hosts
> - Fix some typos and run checkpatch.pl before committing
> - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
> - Test linux-user-test-0.3
>
> v2:
> - Follow the QEMU patch submission rules
>
> v1:
> - Initial commit request
>
> I think the code generated from qemu_ld/st IRs is relatively heavy: up
> to 12 instructions for the TLB hit case on an i386 host.
> This patch series improves the code quality of TCG qemu_ld/st IRs by
> reducing jumps and improving locality.
> The main idea is simple and is already described in the comments in
> tcg-target.c: separate the slow path (TLB miss case) and generate it at
> the end of the TB.
>
> For example, the code generated from qemu_ld changes as follows.
> Before:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) Jump to next code (6)
> (5) TLB miss case: call MMU helper
> (6) ... (next code)
>
> After:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) ... (next code)
> ...
> (5) TLB miss case: call MMU helper
> (6) Jump to (8)
> (7) [embedded addr of (4)] <- never executed but read by MMU helpers
> (8) Return to next code (4)
>
> The following are some performance results measured on qemu 1.0.
> Although there was some measurement error, the improvements were not
> negligible.
>
> * EEMBC CoreMark (before -> after)
> - Guest: i386, Linux (Tizen platform)
> - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
> - Results: 1135.6 -> 1179.9 (+3.9%)
>
> * nbench (before -> after)
> - Guest: i386, Linux (linux-0.2.img included in QEMU source)
> - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
> - Results
>   . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
>   . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
>   . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)
>
> Summarized features:
> - The changes are wrapped by the macro "CONFIG_QEMU_LDST_OPTIMIZATION"
>   and are enabled by default on i386/x86_64 hosts
> - Forcibly removing the macro causes a compilation error on i386/x86_64
>   hosts
> - No implementations for hosts other than i386/x86_64 yet
>
> In addition, I have tried to remove the code that calls the MMU helpers
> for the TLB miss case from the end of the TB, but have not found a good
> solution yet.
> In my opinion, TLB hit case performance could degrade if that calling
> code were removed, because runtime parameters such as the data, the mmu
> index and the return address would then have to be set up in registers
> or on the stack even though they are not used in the TLB hit case.
> This remains an open issue.
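The "After" layout can be modelled with a small C sketch. It is a toy, not QEMU code: the opcodes, `struct ldst_label`, `emit_qemu_ld()`, `finalize_slow_paths()`, `mmu_helper()` and `build_demo_tb()` are hypothetical names, and the "code buffer" holds pseudo-instructions rather than x86 bytes. The fast path only records a label for its miss branch; after the TB's fast code is emitted, every recorded slow path (5)-(8) is appended, and the toy helper recovers the fast-path address (4) from the word embedded after the jump, roughly as described above for the real MMU helpers.

```c
#include <assert.h>

/* Toy opcodes standing in for host instructions (hypothetical, not TCG). */
enum op { OP_TLB_CHECK, OP_BR_MISS, OP_LOAD, OP_OTHER, OP_CALL_HELPER,
          OP_JMP, OP_EMBED_ADDR, OP_EXIT };

struct insn { enum op op; int arg; };

/* A pending slow path recorded while the fast path is emitted. */
struct ldst_label {
    int branch_pos;  /* index of the OP_BR_MISS to patch later */
    int return_pos;  /* fast-path address (4): first insn after the load */
};

#define MAX_CODE 64
#define MAX_LABELS 8

struct tb {
    struct insn code[MAX_CODE];
    int len;
    struct ldst_label labels[MAX_LABELS];
    int nb_labels;
};

static int last_fast_path_pc = -1;  /* what the toy helper recovered */

static int emit(struct tb *s, enum op op, int arg)
{
    s->code[s->len].op = op;
    s->code[s->len].arg = arg;
    return s->len++;
}

/* Fast path only: (1) TLB check, (2) branch on miss, (3) load, then fall through to (4). */
static void emit_qemu_ld(struct tb *s)
{
    struct ldst_label *l = &s->labels[s->nb_labels++];
    emit(s, OP_TLB_CHECK, 0);
    l->branch_pos = emit(s, OP_BR_MISS, -1);  /* target unknown yet */
    emit(s, OP_LOAD, 0);
    l->return_pos = s->len;                   /* address of (4) */
}

/* At the end of the TB, append (5)-(8) for every recorded label. */
static void finalize_slow_paths(struct tb *s)
{
    for (int i = 0; i < s->nb_labels; i++) {
        struct ldst_label *l = &s->labels[i];
        s->code[l->branch_pos].arg = s->len;    /* patch (2) to jump to (5) */
        emit(s, OP_CALL_HELPER, 0);             /* (5) */
        emit(s, OP_JMP, s->len + 2);            /* (6) skip the data word */
        emit(s, OP_EMBED_ADDR, l->return_pos);  /* (7) never executed */
        emit(s, OP_JMP, l->return_pos);         /* (8) back to (4) */
    }
}

/* Toy MMU helper: ret indexes the (6) jump that follows the call, and the
   embedded fast-path address sits right after that jump. */
static void mmu_helper(const struct tb *s, int ret)
{
    assert(s->code[ret + 1].op == OP_EMBED_ADDR);
    last_fast_path_pc = s->code[ret + 1].arg;
}

/* Executes the toy TB and returns the number of executed instructions. */
static int run(const struct tb *s, int tlb_hit)
{
    int pc = 0, steps = 0;
    while (s->code[pc].op != OP_EXIT) {
        const struct insn *in = &s->code[pc];
        steps++;
        switch (in->op) {
        case OP_BR_MISS:     pc = tlb_hit ? pc + 1 : in->arg; break;
        case OP_JMP:         pc = in->arg; break;
        case OP_CALL_HELPER: mmu_helper(s, pc + 1); pc++; break;
        case OP_EMBED_ADDR:  return -1;  /* data word: must never run */
        default:             pc++; break;
        }
    }
    return steps;
}

/* One qemu_ld, some "next code", the TB exit, then the deferred slow paths. */
static void build_demo_tb(struct tb *s)
{
    emit_qemu_ld(s);
    emit(s, OP_OTHER, 0);  /* (4) next code */
    emit(s, OP_EXIT, 0);   /* end of the TB's fast code */
    finalize_slow_paths(s);
}
```

After `build_demo_tb(&s)` on a zeroed `struct tb`, `run(&s, 1)` executes 4 toy instructions with no jump on the hit path, while `run(&s, 0)` executes 6 and leaves `last_fast_path_pc` at 3, the index of (4). The point of the design is that the rarely taken miss code no longer sits between the load and the next instruction.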
>
> Yeongkyoon Lee (3):
>   configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
>     optimization
>   tcg: Add extended GETPC mechanism for MMU helpers with ldst
>     optimization
>   tcg: Optimize qemu_ld/st by generating slow paths at the end of a
>     block
>
>  configure             |    6 +
>  exec-all.h            |   36 +++++
>  exec.c                |   11 ++
>  softmmu_template.h    |   16 +-
>  tcg/i386/tcg-target.c |  404 ++++++++++++++++++++++++++++++++++---------------
>  tcg/tcg.c             |   12 ++
>  tcg/tcg.h             |   30 ++++
>  7 files changed, 381 insertions(+), 134 deletions(-)
>
> --
> 1.7.9.5
>