> On Feb 18, 2020, at 12:10 PM, BALATON Zoltan <bala...@eik.bme.hu> wrote:
>
> While other targets take advantage of using host FPU to do floating
> point computations, this was disabled for PPC target because always
> clearing exception flags before every FP op made it slightly slower
> than emulating everything with softfloat. To emulate some FPSCR bits,
> clearing of fp_status may be necessary (unless these could be handled
> e.g. using FP exceptions on host but there's no API for that in QEMU
> yet) but preserving at least the inexact flag makes hardfloat usable
> and faster than softfloat. Since most clients don't actually care
> about this flag, we can gain some speed trading some emulation
> accuracy.
>
> This patch implements a simple way to keep the inexact flag set for
> hardfloat while still allowing to revert to softfloat for workloads
> that need more accurate albeit slower emulation. (Set hardfloat
> property of CPU, i.e. -cpu name,hardfloat=false for that.) There may
> still be room for further improvement but this seems to increase
> floating point performance. Unfortunately the softfloat case is slower
> than before this patch so this patch only makes sense if the default
> is also set to enable hardfloat.
>
> Because of the above this patch at the moment is mainly for testing
> different workloads to evaluate how viable would this be in practice.
> Thus, RFC and not ready for merge yet.
>
> Signed-off-by: BALATON Zoltan <bala...@eik.bme.hu>
> ---
> v2: use different approach to avoid needing if () in
> helper_reset_fpstatus() but this does not seem to change overhead
> much, also make it a single patch as adding the hardfloat option is
> only a few lines; with this we can use same value at other places where
> float_status is reset and maybe enable hardfloat for a few more places
> for a little more performance but not too much. With this I got:
<snip>

Thank you for working on this. It is about time we had a better FPU.

I applied your patch over David Gibson's ppc-for-5.0 branch. It applied
cleanly and compiled easily.

Tests were done on a Mac OS X 10.4.3 VM. The CPU was set to G3.

I did several tests and here are my results:

With hard float:
- The USB audio device does not produce any sound.
- Converting a MIDI file to AAC in iTunes happens at 0.4x (faster than
  soft float :) ).

For my FPSCR test program, 21 tests failed. The high number is because
the inexact exception is being set in situations where it should not be
set.

With soft float:
- Some sound can be heard from the USB audio device. It doesn't sound
  good. I had to force quit QuickTime Player because it stopped working.
- Converting a MIDI file to AAC in iTunes happens at 0.3x (slower than
  hard float).
- 13 tests failed with my FPSCR test program.

This patch is a good start. I'm not worried about the Floating Point
Status and Control Register flags being wrong since hardly any software
bothers to check them. I think more optimizations can happen by
simplifying the FPU. As it is now it makes a lot of calls per operation.
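
To illustrate why the hard float run reports inexact where it should
not, here is a small toy model in C. It is only a sketch of the idea
described in the patch text (keep the inexact flag set across resets);
it is not the patch's code and not QEMU's API. Every name in it
(toy_fp_status, toy_init, toy_reset and so on) is made up for this
example, and the single reset_value field is my assumption based on the
v2 note about avoiding an if () in the reset helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit for this toy model; not QEMU's definition. */
enum {
    TOY_FLAG_INEXACT = 1 << 0,
};

/* Toy stand-in for a per-CPU floating point status word. */
typedef struct {
    uint8_t flags;        /* sticky exception flags */
    uint8_t reset_value;  /* value written back on every reset */
} toy_fp_status;

/* Choose the reset value once, so the reset itself needs no branch. */
void toy_init(toy_fp_status *s, bool hardfloat)
{
    s->reset_value = hardfloat ? TOY_FLAG_INEXACT : 0;
    s->flags = s->reset_value;
}

/* Reset before an FP op: keeps only what reset_value preserves. */
void toy_reset(toy_fp_status *s)
{
    s->flags = s->reset_value;
}

/* An exact operation such as 1.0 + 2.0 raises no new flags. */
void toy_exact_op(toy_fp_status *s)
{
    (void)s;
}

/* What a guest would observe as the inexact bit after the op. */
bool toy_guest_sees_inexact(const toy_fp_status *s)
{
    return (s->flags & TOY_FLAG_INEXACT) != 0;
}
```

In this model, after toy_init(&s, true), a toy_reset() followed by
toy_exact_op() still leaves toy_guest_sees_inexact() returning true,
which matches the kind of failure my FPSCR test program reports. With
toy_init(&s, false) the same sequence reports no inexact result.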