Fri, 11 Jan 2019 at 12:52, Peter Maydell <peter.mayd...@linaro.org>:
>
> On Thu, 10 Jan 2019 at 19:33, Matwey V. Kornilov
> <matwey.korni...@gmail.com> wrote:
> > I am running the same application compiled for aarch64 and armv7l on
> > an x86_64 platform using the qemu-user-linux tools.
> >
> > I see a dramatic performance difference (30 times) between the emulated
> > architectures: aarch64 runs for ~4 minutes, armv7l runs for ~2 hours.
> > I do understand that CPU architecture emulation is an inherently slow
> > thing, but my question is about the difference.
> >
> > How could I debug this to understand the reason for such a big
> > difference? I've already tried to run stress-ng compiled for these two
> > architectures, but it shows the same performance per second.
> >
> > I am running qemu 2.11, should I try another version?
>
> Yes, do try 3.1 -- we have done some overall TCG performance
> improvements.

Indeed, for me qemu-arm from master runs for 4 minutes where 2.11 runs
for 2 hours. It is an impressive improvement.

>
> For a big difference between target architectures like that,
> I would try starting by using some host performance tools on
> the two runs to see where all the time is being taken in
> the armv7l guest run -- is it all in translated guest code,
> or is there more time (proportionally) spent in particular
> parts of the QEMU C code? Does the armv7l version do
> many more or different syscalls (check with the QEMU -strace
> option)?
>
> Also you should check performance on h/w 32 bit vs
> 64-bit Arm if you can, to confirm that it's not just
> that the guest application runs much slower there.
> (If you don't have the arm hardware you could at least
> check x86 32-bit vs 64-bit.)
>
> thanks
> -- PMM

--
With best regards,
Matwey V. Kornilov
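
For reference, the syscall comparison suggested above (running both guests
with the QEMU -strace option) could be scripted roughly as follows. This is
only a minimal sketch, not something from the thread: it assumes that -strace
writes one line per syscall to stderr in the form "<pid> <name>(<args>) = <ret>",
and the binary names (qemu-arm, qemu-aarch64) and guest paths (./app-armv7l,
./app-aarch64) are hypothetical placeholders. Lines that do not match the
assumed shape, such as the guest's own stderr output, are ignored.

```python
import re
import subprocess
from collections import Counter

# ASSUMPTION: each -strace line looks like "<pid> <syscall>(<args>) = <ret>".
SYSCALL_RE = re.compile(r"^\s*\d+\s+([A-Za-z_][A-Za-z0-9_]*)\(")


def count_syscalls(lines):
    """Count syscall names in an iterable of -strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def strace_counts(qemu_binary, guest_program, *guest_args):
    """Run a guest under qemu-user with -strace and count its syscalls."""
    proc = subprocess.run(
        [qemu_binary, "-strace", guest_program, *guest_args],
        stdout=subprocess.DEVNULL,   # guest stdout is not needed here
        stderr=subprocess.PIPE,      # the trace is read from stderr
        text=True,
        errors="replace",
    )
    return count_syscalls(proc.stderr.splitlines())


def biggest_differences(a, b, top=10):
    """Return (name, count_a, count_b) tuples, largest count gap first."""
    diffs = [(name, a[name], b[name]) for name in set(a) | set(b)]
    diffs.sort(key=lambda t: (-abs(t[1] - t[2]), t[0]))
    return diffs[:top]


if __name__ == "__main__":
    # Hypothetical binaries and guest paths; adjust to the local setup.
    arm32 = strace_counts("qemu-arm", "./app-armv7l")
    arm64 = strace_counts("qemu-aarch64", "./app-aarch64")
    for name, n32, n64 in biggest_differences(arm32, arm64):
        print(f"{name:20s} armv7l={n32:8d} aarch64={n64:8d}")
```

A large gap in one syscall between the two runs would point at the QEMU
syscall path rather than at translated guest code; if the counts are similar,
a host profiler on the slow run is the more useful next step.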