Re: [Qemu-devel] Expansion Ratio Issue

2014-06-03 Thread Sergey Fedorov
On 29.05.2014 13:04, Peter Maydell wrote:
 No, we don't in general have any benchmarking of TCG
 codegen. I think if we did do benchmarking we'd be interested
 in performance benchmarking -- code expansion ratio doesn't
 seem like a very interesting thing to measure to me.

Hi,

I plan to experiment with TCG performance benchmarking and then try to
implement some optimizations. Are there any suggestions on how to perform
such benchmarking? Which tests seem appropriate for this task? I think
the benchmarking should reflect real TCG use cases, so what are the most
typical use cases for TCG? It seems that system and user modes may differ
in this respect.

Appreciate any help.

Thanks,
Sergey.



[Qemu-devel] Expansion Ratio Issue

2014-05-29 Thread Chaos Shu
Hi all

 

I'm new to the list and have recently been digging into QEMU's source code.
A few things confuse me, so I'll simply list them:

 

1.   Are there any benchmarks that look at the quality of TCG code
generation, measured by code expansion ratio? I've read that the ratio may
be 4 or 5 for x86 to MIPS, that is to say 1 x86 insn becomes 4 or 5 MIPS
insns. Is that the industry level or an average level? Has any official
report been published?

2.   I've noticed that when Apple moved from PowerPC to x86, they
developed software named Rosetta, which Apple describes as successful.
Is it the same as QEMU? Is anything known about its internals?

3.   Assume that we only want x86 to ARM. Could we strip out the
micro-operations and translate insn to insn, such as a move in x86 to a
move in ARM: insn-level translation rather than insn-op-insn? I think
someone must have tried this; does anyone have news of it?

4.   Why does QEMU use only one TCG runtime? I found a project named PQEMU
that once tried to make TCG run on multiple cores, but it's out of date and
had some commercial issues. Is there any project trying to make this work?

 

Because I'm really new to QEMU, there may be many results or achievements
I don't know about. If anyone can provide some tips about the things
mentioned above, I'd appreciate it very much.

 

Thanks

Chaos



Re: [Qemu-devel] Expansion Ratio Issue

2014-05-29 Thread Peter Maydell
On 29 May 2014 08:58, Chaos Shu chaos.s...@live.com wrote:
 1.   Are there any benchmarks that look at the quality of TCG code
 generation, measured by code expansion ratio? I've read that the ratio may
 be 4 or 5 for x86 to MIPS, that is to say 1 x86 insn becomes 4 or 5 MIPS
 insns. Is that the industry level or an average level? Has any official
 report been published?

No, we don't in general have any benchmarking of TCG
codegen. I think if we did do benchmarking we'd be interested
in performance benchmarking -- code expansion ratio doesn't
seem like a very interesting thing to measure to me.

 2.   I've noticed that when Apple moved from PowerPC to x86, they
 developed software named Rosetta, which Apple describes as successful.
 Is it the same as QEMU? Is anything known about its internals?

It's a similar concept, though as I understand it it focused
on doing translation for a single application (like QEMU's
linux-user mode, not like our system emulation mode). I have
no idea about its internal design.

 3.   Assume that we only want x86 to ARM. Could we strip out the
 micro-operations and translate insn to insn, such as a move in x86 to a
 move in ARM: insn-level translation rather than insn-op-insn? I think
 someone must have tried this; does anyone have news of it?

Certainly if you started from scratch with the intention of
doing a more specifically targeted design (and in particular
if you wanted to do single-application translation as your core
focus rather than as a bolt-on extension to system emulation)
you could probably get better performance than QEMU. QEMU
generally aims to be a general-purpose project, though.

Personally I would (even if doing only x86-to-ARM) still
include an intermediate representation of some form: the
history of compiler design shows that it has a lot of utility.

 4.   Why does QEMU use only one TCG runtime? I found a project named PQEMU
 that once tried to make TCG run on multiple cores, but it's out of date and
 had some commercial issues. Is there any project trying to make this work?

Not that I currently know of. Truly parallel TCG execution
of multiple guest cores is a hard problem, especially if you
want to produce maintainable solid code that can be included
upstream, rather than just enough of a prototype to demonstrate
proof of concept and run some simple benchmarks for an academic
paper.

thanks
-- PMM



Re: [Qemu-devel] Expansion Ratio Issue

2014-05-29 Thread Alex Bennée

Peter Maydell peter.mayd...@linaro.org writes:

 On 29 May 2014 08:58, Chaos Shu chaos.s...@live.com wrote:
 1.   Are there any benchmarks that look at the quality of TCG code
 generation, measured by code expansion ratio? I've read that the ratio may
 be 4 or 5 for x86 to MIPS, that is to say 1 x86 insn becomes 4 or 5 MIPS
 insns. Is that the industry level or an average level? Has any official
 report been published?

 No, we don't in general have any benchmarking of TCG
 codegen. I think if we did do benchmarking we'd be interested
 in performance benchmarking -- code expansion ratio doesn't
 seem like a very interesting thing to measure to me.

Not to mention that raw instruction counts are probably misleading
compared to the effect you can get from instruction ordering and cache
behaviour.

 2.   I've noticed that when Apple moved from PowerPC to x86, they
 developed software named Rosetta, which Apple describes as successful.
 Is it the same as QEMU? Is anything known about its internals?

 It's a similar concept, though as I understand it it focused
 on doing translation for a single application (like QEMU's
 linux-user mode, not like our system emulation mode). I have
 no idea about its internal design.

Rosetta was based on QuickTransit from Transitive (since bought by IBM).
It's broadly analogous to QEMU's linux-user mode emulation although it
attempted a more complete separation between translated and native
processes. In the QEMU world all the processes can see each other,
which can cause issues if they expect a certain endianness in
wire formats.

The biggest difference internally is that the translator was IR based: it
built up a DAG of operations which it manipulated/optimised much like a
compiler does before generating the final target code. QEMU instead
generates a simple set of TCG ops for each instruction which, after a
little optimisation, spits out a set of target instructions.
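
To make the per-instruction flow above concrete, here is a minimal toy
sketch in Python. It is not QEMU or TCG code: the guest instructions, IR
op names and "host_" instruction strings are all invented for
illustration. It lowers each guest instruction to a few simple ops, runs
one tiny optimisation, emits one host instruction per remaining op, and
reports the resulting expansion ratio.

```python
# Toy model only: instruction names and lowering rules are invented for
# illustration; they are not real TCG ops or QEMU code.


def lower_to_ir(guest_insns):
    """Expand each guest instruction into a short list of simple IR ops."""
    ir = []
    for op, dst, src in guest_insns:
        if op == "mov":
            ir.append(("mov", dst, src))
        elif op == "add_to_mem":
            # A read-modify-write on memory becomes three simple ops.
            ir.append(("load", "tmp", dst))
            ir.append(("add", "tmp", src))
            ir.append(("store", dst, "tmp"))
        else:
            raise ValueError(f"unknown guest instruction: {op}")
    return ir


def optimise(ir):
    """A deliberately tiny pass: drop moves of a register onto itself."""
    return [o for o in ir if not (o[0] == "mov" and o[1] == o[2])]


def emit_host(ir):
    """Emit one (hypothetical) host instruction per remaining IR op."""
    return [f"host_{op} {a}, {b}" for op, a, b in ir]


def translate(guest_insns):
    """Guest instructions -> IR ops -> optimised IR -> host instructions."""
    return emit_host(optimise(lower_to_ir(guest_insns)))


guest_block = [
    ("mov", "r1", "r1"),           # self-move, removed by optimise()
    ("add_to_mem", "0x10", "r2"),  # expands to load/add/store
    ("mov", "r3", "r1"),
]
host_block = translate(guest_block)
expansion_ratio = len(host_block) / len(guest_block)
print(host_block)
print(f"expansion ratio: {expansion_ratio:.2f}")
```

A DAG-based translator of the kind described for QuickTransit would
replace the flat list in optimise() with a graph over the whole block,
which gives the optimiser more to work with than this per-op view.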

QuickTransit was certainly pretty high performance compared to what else
was out there at the time. From memory it implemented a number of other
features to get this speed:

 * Group blocks - hot paths of basic blocks would be regenerated as a
   single block.
 * Block cache - it would cache previous translations to avoid a heavy
   start-up cost.
 * Native binding - it could optionally detect special code paths in
   translated code and replace them with a direct binding to native
   code (e.g. call the native memcpy rather than translate the guest
   version).
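
Two of the features listed above, the block cache and native binding,
can be sketched roughly as follows. This is an illustrative Python
sketch under assumptions of my own: the class and function names are
hypothetical, the cache is in-memory only (persisting translations
across runs, which the start-up remark suggests, is left out), and
nothing here reflects QuickTransit's or QEMU's actual design.

```python
# Illustrative sketch only: names and structure are assumptions, not
# QuickTransit's or QEMU's actual design.


class BlockCache:
    """Cache translated blocks by guest address so each is translated once."""

    def __init__(self, translate_block):
        self._translate_block = translate_block  # callable: guest pc -> block
        self._blocks = {}
        self.translations = 0  # how many times we actually had to translate

    def lookup(self, pc):
        block = self._blocks.get(pc)
        if block is None:
            block = self._translate_block(pc)
            self._blocks[pc] = block
            self.translations += 1
        return block


def native_copy(dst, src, n):
    """Host-side stand-in for a native memcpy, working on bytearrays."""
    dst[:n] = src[:n]
    return dst


# Hypothetical table of guest symbols that have a native replacement.
NATIVE_BINDINGS = {"memcpy": native_copy}


def resolve_call(symbol, cache, pc):
    """Prefer a native binding for a known symbol; otherwise fall back
    to the cached translation of the guest code at pc."""
    native = NATIVE_BINDINGS.get(symbol)
    if native is not None:
        return native
    return cache.lookup(pc)


cache = BlockCache(lambda pc: f"translated block at {pc:#x}")
cache.lookup(0x1000)
cache.lookup(0x1000)        # second lookup hits the cache
print(cache.translations)   # prints 1
```

With this shape, resolve_call("memcpy", cache, pc) returns the native
function without translating anything, while an unbound symbol goes
through the cache as usual.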


 3.   Assume that we only want x86 to ARM. Could we strip out the
 micro-operations and translate insn to insn, such as a move in x86 to a
 move in ARM: insn-level translation rather than insn-op-insn? I think
 someone must have tried this; does anyone have news of it?

 Certainly if you started from scratch with the intention of
 doing a more specifically targeted design (and in particular
 if you wanted to do single-application translation as your core
 focus rather than as a bolt-on extension to system emulation)
 you could probably get better performance than QEMU. QEMU
 generally aims to be a general-purpose project, though.

 Personally I would (even if doing only x86-to-ARM) still
 include an intermediate representation of some form: the
 history of compiler design shows that it has a lot of utility.

 4.   Why does QEMU use only one TCG runtime? I found a project named PQEMU
 that once tried to make TCG run on multiple cores, but it's out of date and
 had some commercial issues. Is there any project trying to make this work?

In the linux-user case you can utilise multiple cores by running multiple
instances of QEMU (which is handy for package-building type
tasks). However, fixing QEMU for fully multi-threaded translation is a
hard task. You may even find you don't get that much from it, as ideally
you should spend more time running translated code than doing the
translation.


 Not that I currently know of. Truly parallel TCG execution
 of multiple guest cores is a hard problem, especially if you
 want to produce maintainable solid code that can be included
 upstream, rather than just enough of a prototype to demonstrate
 proof of concept and run some simple benchmarks for an academic
 paper.


 thanks
 -- PMM

-- 
Alex Bennée