Re: [Qemu-devel] emulated ARM performance vs real processor ?

2011-09-04 Thread Antti P Miettinen
Julien Heyman bidsom...@gmail.com writes:
 Hi,

 I was wondering if anyone had some data regarding the relative performance of
 any given ARM board emulated in QEMU versus the real thing. Yes, I do know
 this depends a lot on the host PC running qemu, but some ballpark/example
 figures would help. Say, I emulate a 400 MHz ARM9 processor on a Core2Duo
 laptop @ 2 GHz, what kind of performance/timing ratio should I expect, one way
 or the other? For example, for boot time.
 I have no idea whether the overhead of emulation is over-compensated by the
 huge processing power of the host compared to the real HW target, and by which
 factor.

 Regards,
 Julien


Taking a look at:

http://adt.cs.upb.de/quf/quf2011_proceedings.pdf

page 20 (24th page in the PDF), figure 1b, the noprof bars, I'd expect a
2 GHz host to be, on average, faster than the native target. The emulation
speed depends on how core-intensive vs memory-intensive your workload
is. Workloads that are memory-bound in the target (e.g. gzip ASCII
compression) can be emulated much faster (e.g. by a factor of two) than
core-bound workloads (e.g. mcrypt encryption).
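
A rough way to see this split on a given setup is to time one core-bound
and one memory-bound loop, both on the real board and under QEMU, and then
compare the two ratios. The sketch below is only an illustration and is not
taken from the paper: the workload choices (MD5 over a small buffer, copying
a large buffer), the function names and all sizes are assumptions.

```python
import hashlib
import time


def time_call(fn, *args):
    """Return wall-clock seconds taken by one call to fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def core_bound(rounds=2000, block_size=4096):
    """Hash a small buffer repeatedly; the data should stay in cache."""
    block = bytes(block_size)
    digest = hashlib.md5()
    for _ in range(rounds):
        digest.update(block)
    return digest.hexdigest()


def memory_bound(size_mib=16, passes=4):
    """Copy a large buffer repeatedly; mostly streams through memory."""
    src = bytearray(size_mib * 1024 * 1024)
    total = 0
    for _ in range(passes):
        dst = bytes(src)  # full copy of the buffer
        total += len(dst)
    return total


if __name__ == "__main__":
    print(f"core-bound:   {time_call(core_bound):.4f} s")
    print(f"memory-bound: {time_call(memory_bound):.4f} s")
```

Run the same script on the target and in the emulator; if the observation
above holds for your workload mix, the emulated/native ratio should look
better for the memory-bound line than for the core-bound one.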

--
http://www.iki.fi/~ananaza/




Re: [Qemu-devel] emulated ARM performance vs real processor ?

2011-09-04 Thread Peter Maydell
On 4 September 2011 18:42, Antti P Miettinen anan...@iki.fi wrote:
 The emulation
 speed depends on how core intensive vs memory intensive your workload
 is. Workloads that are memory bound in the target (e.g. gzip ASCII
 compression) can be emulated much faster (e.g. factor of two) than core
 bound workloads (e.g. mcrypt encryption).

Another factor is that if the workload makes heavy use of floating point
or SIMD instructions (VFP and Neon) then QEMU will do comparatively worse
than for a pure integer workload, because we have to emulate all the
fp calculations in software.
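
To get a feel for this on a particular image, one option is to time the
same reduction over integers and over floats, on the board and under QEMU.
This is a minimal Python sketch with placeholder names and sizes of my own
choosing; interpreter overhead will dilute the gap compared with a compiled
VFP/Neon kernel, so treat the result as a hint rather than a measurement.

```python
import time


def timed_sum(values):
    """Return (result, seconds) for summing values with the built-in sum()."""
    start = time.perf_counter()
    result = sum(values)
    return result, time.perf_counter() - start


def compare_int_vs_float(n=200_000):
    """Time the same reduction over n ints and over n floats."""
    ints = list(range(n))
    floats = [float(i) for i in ints]
    int_result, int_s = timed_sum(ints)
    float_result, float_s = timed_sum(floats)
    return {
        "int_result": int_result,
        "int_s": int_s,
        "float_result": float_result,
        "float_s": float_s,
    }


if __name__ == "__main__":
    r = compare_int_vs_float()
    print(f"int sum:   {r['int_s']:.4f} s")
    print(f"float sum: {r['float_s']:.4f} s")
```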

-- PMM



Re: [Qemu-devel] emulated ARM performance vs real processor ?

2011-09-02 Thread David Gilbert
On 1 September 2011 08:32, Julien Heyman bidsom...@gmail.com wrote:
 Hi,

 I was wondering if anyone had some data regarding the relative performance
 of any given ARM board emulated in QEMU versus the real thing. Yes, I do
 know this depends a lot on the host PC running qemu, but some
 ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor
 on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I
 expect, one way or the other? For example, for boot time.
 I have no idea whether the overhead of emulation is over-compensated by the
 huge processing power of the host compared to the real HW target, and by
 which factor.

Comparing performance is always a bit tricky, and I don't really have a
solid set of benchmarks ready to run, but to give some numbers:

1) Boot times
   Comparing the Linaro 11.08 Ubuntu desktop images, time to boot to desktop:

   Real Panda board (dual-core A9 at 1 GHz, 1 GB RAM, running off SD card)
     - 2 minutes to desktop
   QEMU vexpress (2x A9 cores, 1 GB RAM, emulated SD card, running on a
   Core2 Duo T9400 2.53 GHz laptop)
     - 3 minutes to desktop

   (The times are scarily close to exact minutes - timeout somewhere?)
   Now, QEMU system mode only ever uses one host core when emulating
   multiple cores, so there is a factor of 2 disadvantage there, but on the
   plus side the memory bandwidth and disk speed of the host are probably
   much higher than the Panda's.

2) Simple md5sum benchmark
   As a really simple benchmark, the test:

   time (dd if=/dev/zero bs=1024k count=1000 | md5sum)

   Panda board
     - 14.5s real, 10.7s user, 3.8s system
   Emulated Overo board (single A8 processor, on the same laptop as above)
     - 41s real, 24.7s user, 16.4s system
   User mode emulated
     - 14.2s real, 14s user, 0.5s system
   Native on x86 host
     - 3.2s real, 2.5s user, 1.2s system

So, that's two sets of pretty bogus, simple dummy benchmarks!

I suppose one observation is that the boot time isn't that bad compared
to the real (different) hardware, and the user mode emulation was
comparable to the Panda, but the system emulation on a simple test seems
a lot slower.

These things will vary wildly depending on what your benchmark is, but as
a summary I'd say that the ARM system mode emulation is fast enough to
use interactively but, CPU-wise, is noticeably slower than user mode
emulation.
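
For anyone who wants the md5sum figures above as ratios, here is a small
Python sketch. The "real" times are copied from the results above; the
dictionary keys and the function name are just labels made up for this
example.

```python
# Wall-clock ("real") seconds from the md5sum test above.
REAL_SECONDS = {
    "panda": 14.5,
    "qemu_system_overo": 41.0,
    "qemu_user": 14.2,
    "native_x86": 3.2,
}


def ratio(config, baseline, times=REAL_SECONDS):
    """How many times longer `config` took than `baseline`."""
    return times[config] / times[baseline]


if __name__ == "__main__":
    for name in REAL_SECONDS:
        print(
            f"{name:18s} "
            f"{ratio(name, 'panda'):6.2f}x Panda  "
            f"{ratio(name, 'native_x86'):6.2f}x native"
        )
```

On these numbers system emulation takes roughly 2.8 times as long as the
Panda, while user mode emulation is about level with it and roughly 4.4
times slower than native x86. It is a single, crude data point, so the
ratios should not be generalised.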

Dave



Re: [Qemu-devel] emulated ARM performance vs real processor ?

2011-09-02 Thread Julien Heyman
Thanks Dave.
I use system emulation, and my main concern is just to know that the
actual board will run faster than the emulation. So based on your example,
and even though my target board (mini2440) is nowhere near as fast as a Panda
board, this should be the case by a comfortable margin. Now, as I am
focusing on boot time, the time to read from flash (i.e. much faster in the
emulated context than on the real flash) will counter-balance this a lot.
Hopefully these two factors will even out and what I measure now will not be
dramatically different from what I will get on the real board, but... we'll
see.

Regards,
Julien




Re: [Qemu-devel] emulated ARM performance vs real processor ?

2011-09-02 Thread David Gilbert
On 2 September 2011 17:04, Julien Heyman bidsom...@gmail.com wrote:
 Thanks Dave.
 I use system emulation, and my main concern is just to know that the
 actual board will run faster than the emulation. So based on your example,
 and even though my target board (mini2440) is nowhere near as fast as a Panda
 board, this should be the case by a comfortable margin.

OK, but be careful - you will occasionally trip over something where the
emulation of it is particularly dire and the real board might be faster;
for example with the default flags SD card writes can be a factor of 10 slower
than real hardware, so relying on the real hardware always being faster
is dangerous. You'll probably get similar CPU emulation artefacts where
there are some instructions that are particularly nasty to emulate but
really cheap on the hardware.
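
One way to check whether storage writes are such a case is to run the same
small write test on the board and in the emulator. Below is a minimal
Python sketch; the function name, the sizes and the choice to fsync at the
end are assumptions for illustration, not a reproduction of how the factor
of 10 above was measured.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, total_mib=8, chunk_kib=64):
    """Write total_mib of zeros to a temp file in directory; return MiB/s."""
    chunk = bytes(chunk_kib * 1024)
    chunks = (total_mib * 1024) // chunk_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    # Guard against a zero reading from a very coarse clock.
    return total_mib / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Point this at a directory on the SD card you care about.
    print(f"{write_throughput_mib_s(tempfile.gettempdir()):.1f} MiB/s")
```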

Dave



Re: [Qemu-devel] emulated ARM performance vs real processor ?

2011-09-02 Thread M P
On Fri, Sep 2, 2011 at 5:04 PM, Julien Heyman bidsom...@gmail.com wrote:
 Thanks Dave.
 I use system emulation, and my main concern is just to know that the
 actual board will run faster than the emulation. So based on your example,
 and even though my target board (mini2440) is nowhere near as fast as a Panda
 board, this should be the case by a comfortable margin. Now, as I am
 focusing on boot time, the time to read from flash (i.e. much faster in the
 emulated context than on the real flash) will counter-balance this a lot.
 Hopefully these two factors will even out and what I measure now will not be
 dramatically different than what I will get on the real board, but...we'll
 see.

I wrote the mini2440 support for qemu and used it a LOT, and it can
pretty easily emulate at full speed on a Core2. Some stuff is a bit
slower, but most is quite a bit faster somehow.
Note that it emulates more than an ARMv4T, so if the code you run is not
compiled properly, it might just work in qemu and fail miserably on
the real hardware.
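
A cheap sanity check for this is to look at the architecture recorded in
the binary's ARM build attributes, which `readelf -A` prints as a
`Tag_CPU_arch` line. The Python sketch below wraps that; the helper names
are made up, and the name of the readelf binary is an assumption (a cross
toolchain may ship it under a prefixed name). It only sees what the
attributes claim for that one file, so it will not catch hand-written
assembly or libraries built for a newer core, and binaries without an
attribute section simply return None.

```python
import re
import subprocess


def parse_cpu_arch(readelf_output):
    """Extract the Tag_CPU_arch value from `readelf -A` output, or None."""
    match = re.search(
        r"^\s*Tag_CPU_arch:\s*(\S+)", readelf_output, re.MULTILINE
    )
    return match.group(1) if match else None


def binary_cpu_arch(path, readelf="readelf"):
    """Run `readelf -A` on path and return its Tag_CPU_arch, or None."""
    result = subprocess.run(
        [readelf, "-A", path], check=True, capture_output=True, text=True
    )
    return parse_cpu_arch(result.stdout)


def too_new_for_armv4t(arch):
    """True if an architecture is recorded and it is newer than ARMv4T."""
    return arch is not None and arch not in {"Pre-v4", "v4", "v4T"}
```

Usage would be along the lines of
`too_new_for_armv4t(binary_cpu_arch("path/to/binary"))`, run over everything
that goes into the image.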

Michael






[Qemu-devel] emulated ARM performance vs real processor ?

2011-09-01 Thread Julien Heyman
 Hi,

I was wondering if anyone had some data regarding the relative performance
of any given ARM board emulated in QEMU versus the real thing. Yes, I do
know this depends a lot on the host PC running qemu, but some
ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor
on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I
expect, one way or the other? For example, for boot time.
I have no idea whether the overhead of emulation is over-compensated by the
huge processing power of the host compared to the real HW target, and by
which factor.

Regards,
Julien