Re: [Qemu-devel] emulated ARM performance vs real processor ?
Julien Heyman bidsom...@gmail.com writes:

> Hi, I was wondering if anyone had some data regarding the relative performance of any given ARM board emulated in QEMU versus the real thing. Yes, I do know this depends a lot on the host PC running qemu, but some ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I expect, one way or the other? For example, for boot time. I have no idea whether the overhead of emulation is over-compensated by the huge processing power of the host compared to the real HW target, and by which factor.
>
> Regards, Julien

Taking a look at http://adt.cs.upb.de/quf/quf2011_proceedings.pdf page 20 (24th page in the PDF), figure 1b, the noprof bars, I'd expect a 2GHz host to be on average faster than the native target.

The emulation speed depends on how core intensive vs memory intensive your workload is. Workloads that are memory bound in the target (e.g. gzip ASCII compression) can be emulated much faster (e.g. factor of two) than core bound workloads (e.g. mcrypt encryption).

--
http://www.iki.fi/~ananaza/
Re: [Qemu-devel] emulated ARM performance vs real processor ?
On 4 September 2011 18:42, Antti P Miettinen anan...@iki.fi wrote:

> The emulation speed depends on how core intensive vs memory intensive your workload is. Workloads that are memory bound in the target (e.g. gzip ASCII compression) can be emulated much faster (e.g. factor of two) than core bound workloads (e.g. mcrypt encryption).

Another factor is that if the workload makes heavy use of floating point or SIMD instructions (VFP and Neon) then QEMU will do comparatively worse than for a pure integer workload, because we have to emulate all the fp calculations in software.

-- PMM
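One way to see how the workload mix matters is to run the same small script natively on the board and again under QEMU, then compare the per-workload ratios. The following is only a rough sketch: the three workloads (zlib compression of ASCII text standing in for gzip, SHA-256 standing in for an encryption-style integer workload, and a floating-point loop) and the names `best_time` and `run_workloads` are illustrative choices, not anything measured in this thread, and Python interpreter overhead will blur the differences.

```python
import hashlib
import math
import time
import zlib


def best_time(fn, repeats=3):
    """Return the fastest of `repeats` wall-clock runs of fn(), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def run_workloads(size=1 << 20):
    """Time three illustrative workloads on `size` bytes of ASCII text."""
    line = b"the quick brown fox jumps over the lazy dog\n"
    text = (line * (size // len(line) + 1))[:size]
    return {
        # Compression of ASCII text (stand-in for the gzip example).
        "compress_ascii": best_time(lambda: zlib.compress(text)),
        # Integer-heavy hashing (stand-in for a core-bound workload).
        "hash_sha256": best_time(lambda: hashlib.sha256(text).digest()),
        # Floating-point arithmetic, which QEMU emulates in software.
        "float_math": best_time(
            lambda: sum(math.sqrt(i) * 1.5 for i in range(size // 16))
        ),
    }


if __name__ == "__main__":
    for name, seconds in run_workloads().items():
        print(f"{name:>15}: {seconds:.4f}s")
```

Dividing each emulated time by the matching native time gives one slowdown factor per workload; if the points above hold, the factors should differ noticeably rather than being one constant.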
Re: [Qemu-devel] emulated ARM performance vs real processor ?
On 1 September 2011 08:32, Julien Heyman bidsom...@gmail.com wrote:

> Hi, I was wondering if anyone had some data regarding the relative performance of any given ARM board emulated in QEMU versus the real thing. Yes, I do know this depends a lot on the host PC running qemu, but some ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I expect, one way or the other? For example, for boot time. I have no idea whether the overhead of emulation is over-compensated by the huge processing power of the host compared to the real HW target, and by which factor.

Comparing performance is always a bit tricky, and I've not really got a solid set of benchmarks ready to run, but to give some numbers:

1) Boot times

Comparing the Linaro 11.08 Ubuntu desktop images, time to boot to desktop:

  Real Panda board (dual core A9 at 1GHz, 1GB RAM, running off SD card) - 2 minutes to desktop
  QEMU vexpress (2xA9 core, 1GB RAM, emulated SD card, running on a Core2 Duo T9400 2.53GHz laptop) - 3 minutes to desktop

(The times are scarily close to exact minutes - timeout somewhere?)

Now, QEMU system mode only ever uses one host core when emulating multiple cores, so there is a factor 2 disadvantage there, but on the plus side the memory bandwidth of the host and the disk speed are probably much higher than the Panda's.

2) Simple md5sum benchmark

As a really simple benchmark, the test:

  time (dd if=/dev/zero bs=1024k count=1000 | md5sum)

  Panda board - 14.5s real, 10.7s user, 3.8s system
  Emulated Overo board (single A8 processor on same laptop as above) - 41s real, 24.7s user, 16.4s system
  User mode emulated - 14.2s real, 14s user, 0.5s system
  Native on x86 host - 3.2s real, 2.5s user, 1.2s system

So, that's two sets of pretty bogus dummy simple benchmarks!

I suppose one observation is that the boot time isn't that bad compared to the real (different) hardware, the user mode emulation was comparable to the Panda, but the system emulation on a simple test seems a lot slower.

These things will vary wildly depending on what your benchmark is; but as a summary I'd say that the ARM system mode emulation is fast enough to use interactively but CPU-wise is noticeably slower than user mode emulation.

Dave
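To put the md5sum figures above side by side, here is a minimal sketch that turns the quoted wall-clock times into ratios. The dictionary keys and the helper name `ratios_vs` are illustrative labels only; the numbers are the "real" times from the message above.

```python
# Wall-clock ("real") times in seconds, copied from the md5sum benchmark above.
REAL_SECONDS = {
    "panda": 14.5,          # real Panda board
    "overo_system": 41.0,   # QEMU system-mode emulated Overo
    "user_mode": 14.2,      # QEMU user-mode emulation
    "native_x86": 3.2,      # native on the x86 host
}


def ratios_vs(baseline, timings=REAL_SECONDS):
    """Return each timing divided by the baseline timing (>1 means slower)."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


if __name__ == "__main__":
    for name, ratio in ratios_vs("panda").items():
        print(f"{name:>13}: {ratio:.2f}x the Panda wall-clock time")
```

On these numbers, system-mode emulation takes roughly 2.8 times as long as the Panda and roughly 12.8 times as long as the native x86 run, while user-mode emulation lands within a few percent of the Panda.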
Re: [Qemu-devel] emulated ARM performance vs real processor ?
Thanks Dave.

I use system emulation, and my main concern is just to know that the actual board will run faster than the emulation. So based on your example, and even though my target board (mini2440) is nowhere near as fast as a Panda board, this should be the case by a comfortable margin.

Now, as I am focusing on boot time, the time to read from flash (i.e. much faster in the emulated context than on the real flash) will counter-balance this a lot. Hopefully these two factors will even out and what I measure now will not be dramatically different from what I will get on the real board, but...we'll see.

Regards, Julien
Re: [Qemu-devel] emulated ARM performance vs real processor ?
On 2 September 2011 17:04, Julien Heyman bidsom...@gmail.com wrote:

> Thanks Dave. I use system emulation, and my main concern is just to know that the actual board will run faster than the emulation. So based on your example, and even though my target board (mini2440) is nowhere as fast as a Panda board, this should be the case by a comfortable margin.

OK, but be careful - you will occasionally trip over something where the emulation of it is particularly dire and the real board might be faster; for example with the default flags SD card writes can be a factor of 10 slower than real hardware, so relying on the real hardware always being faster is dangerous. You'll probably get similar CPU emulation artefacts where there are some instructions that are particularly nasty to emulate but really cheap on the hardware.

Dave
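A quick way to check whether storage writes are one of those dire cases is to time the same synced write inside the emulated guest and on the real board. This is a rough sketch, not a rigorous I/O benchmark: the function name `write_throughput_mib_s`, the 16 MiB default and the 64 KiB chunk size are arbitrary illustrative choices.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, total_mib=16, chunk_kib=64):
    """Write total_mib of zeros to a temp file in `directory`; return MiB/s."""
    chunk = bytes(chunk_kib * 1024)
    n_chunks = (total_mib * 1024) // chunk_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            # Force the data out to the (emulated or real) device before
            # stopping the clock, so the page cache does not hide the cost.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mib / elapsed


if __name__ == "__main__":
    # "." is a placeholder; point it at a directory on the SD card under test.
    print(f"{write_throughput_mib_s('.'):.1f} MiB/s")
```

Running it with the same arguments in both environments gives two MiB/s figures whose ratio can be compared against the factor of 10 mentioned above.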
Re: [Qemu-devel] emulated ARM performance vs real processor ?
On Fri, Sep 2, 2011 at 5:04 PM, Julien Heyman bidsom...@gmail.com wrote:

> Thanks Dave. I use system emulation, and my main concern is just to know that the actual board will run faster than the emulation. So based on your example, and even though my target board (mini2440) is nowhere as fast as a Panda board, this should be the case by a comfortable margin.
>
> Now, as I am focusing on boot time, the time to read from flash (i.e. much faster in the emulated context than on the real flash) will counter-balance this a lot. Hopefully these two factors will even out and what I measure now will not be dramatically different than what I will get on the real board, but...we'll see.

I wrote the mini2440 support for qemu and used it a LOT, and it can pretty easily emulate full speed on a core2. Some stuff is a bit slower, but most is quite a bit faster somehow.

Note that it emulates more than an ARMv4T, so if the code you run is not compiled properly, it might just work in qemu, and fail miserably on the real hardware.

Michael
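One way to catch the "works in qemu, fails on the board" case before flashing is to look at the architecture a binary was built for. The sketch below assumes GNU binutils' `readelf` is on the PATH and that `readelf -A` prints a `Tag_CPU_arch` line for ARM EABI binaries; the helper names are hypothetical. It only reflects the build attributes the toolchain recorded, so hand-written assembly or mislabeled libraries can still slip through.

```python
import re
import subprocess
import sys


def parse_cpu_arch(readelf_output):
    """Extract the Tag_CPU_arch value (e.g. 'v4T') from `readelf -A` text."""
    match = re.search(r"Tag_CPU_arch:\s*(\S+)", readelf_output)
    return match.group(1) if match else None


def binary_cpu_arch(path):
    """Run `readelf -A` on an ARM ELF file and return its Tag_CPU_arch value."""
    result = subprocess.run(
        ["readelf", "-A", path],
        capture_output=True, text=True, check=True,
    )
    return parse_cpu_arch(result.stdout)


if __name__ == "__main__":
    # Usage: python check_arch.py path/to/arm-binary
    print(binary_cpu_arch(sys.argv[1]))
```

Anything reported as newer than v4T would be a candidate for running under qemu but failing on the mini2440.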
[Qemu-devel] emulated ARM performance vs real processor ?
Hi,

I was wondering if anyone had some data regarding the relative performance of any given ARM board emulated in QEMU versus the real thing. Yes, I do know this depends a lot on the host PC running qemu, but some ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I expect, one way or the other? For example, for boot time. I have no idea whether the overhead of emulation is over-compensated by the huge processing power of the host compared to the real HW target, and by which factor.

Regards, Julien