* Richard Henderson (r...@twiddle.net) wrote:

Have you considered contributing something similar to this to glibc?
I filed https://sourceware.org/bugzilla/show_bug.cgi?id=19920 a while
back suggesting it would be useful to have it in libc, to be used by
things other than just qemu.
Dave

> Changes from v2 to v3:
>
>   * Unit testing.  This includes having x86 attempt all versions of
>     the accelerator that will run on the hardware.  Thus an avx2 host
>     will run the basic test 5 times (1.5 sec on my laptop).
>
>   * Drop the ppc and aarch64 specializations.  I have improved the
>     basic integer version to the point that those vectorized versions
>     are not a win.
>
>     In the case of my aarch64 mustang, the integer version is 4 times
>     faster than the neon version that I'm deleting.  With effort I was
>     able to rewrite the neon version to come to within a factor of 1.1,
>     but it remained slower than the integer version.  To be fair, gcc6
>     makes very good use of ldp, so the integer path is *also* loading
>     16 bytes per insn.
>
>     I can forward my standalone aarch64 benchmark if anyone is interested.
>
>     Note however that at least the avx2 acceleration is still very much
>     a win, being about 3 times faster on my laptop.  Of course, it's
>     handling 4 times as much data per loop as the integer version, so
>     one can still see the overhead caused by using vector insns.
>
>     For grins I wrote an avx512 version, if someone has a Skylake upon
>     which to test and benchmark.  That requires additional configure
>     checks, so I didn't bother to include it here.
>
> r~
>
>
> Richard Henderson (9):
>   cutils: Move buffer_is_zero and subroutines to a new file
>   cutils: Remove SPLAT macro
>   cutils: Export only buffer_is_zero
>   cutils: Rearrange buffer_is_zero acceleration
>   cutils: Add test for buffer_is_zero
>   cutils: Add generic prefetch
>   cutils: Rewrite x86 buffer zero checking
>   cutils: Remove aarch64 buffer zero checking
>   cutils: Remove ppc buffer zero checking
>
>  configure                 |  21 +--
>  include/qemu/cutils.h     |   3 +-
>  migration/ram.c           |   2 +-
>  migration/rdma.c          |   5 +-
>  tests/Makefile.include    |   3 +
>  tests/test-bufferiszero.c |  78 +++++++++++
>  util/Makefile.objs        |   1 +
>  util/bufferiszero.c       | 332 ++++++++++++++++++++++++++++++++++++++++++++++
>  util/cutils.c             | 244 ----------------------------------
>  9 files changed, 423 insertions(+), 266 deletions(-)
>  create mode 100644 tests/test-bufferiszero.c
>  create mode 100644 util/bufferiszero.c
>
> --
> 2.7.4
>

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
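For readers following the thread, here is a minimal C sketch of the two ideas in the cover letter: a plain integer zero check that inspects 16 bytes per iteration, and a unit test that runs every available variant over the same inputs. This is not the code from util/bufferiszero.c or tests/test-bufferiszero.c. The names buffer_zero_bytewise, buffer_zero_int, run_tests and the variants table are hypothetical, and no vector (SSE2/AVX2/neon) versions are included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reference version: inspect one byte at a time. */
static bool buffer_zero_bytewise(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical integer version: OR two 64-bit words together per
 * iteration (16 bytes, the same amount an aarch64 ldp of two X
 * registers loads) and test the result.  memcpy keeps unaligned
 * loads well-defined; whether the compiler emits ldp is not guaranteed.
 */
static bool buffer_zero_int(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        uint64_t a, b;
        memcpy(&a, p + i, 8);
        memcpy(&b, p + i + 8, 8);
        if (a | b) {
            return false;
        }
    }
    /* Check any tail shorter than 16 bytes one byte at a time. */
    for (; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

typedef bool (*zero_fn)(const void *, size_t);

/* Every variant listed here is run against the same inputs. */
static const struct {
    const char *name;
    zero_fn fn;
} variants[] = {
    { "bytewise", buffer_zero_bytewise },
    { "int", buffer_zero_int },
};

/*
 * For each variant, start offset 0..7 and length: check an all-zero
 * region, then a single non-zero byte at each position.  Bytes outside
 * the region are set to 0xff, so a variant that reads past either end
 * of the region (within buf) would report a false "non-zero".
 * Returns the number of failures.
 */
int run_tests(void)
{
    static unsigned char buf[136];
    int failures = 0;

    for (size_t v = 0; v < sizeof(variants) / sizeof(variants[0]); v++) {
        for (size_t off = 0; off < 8; off++) {
            for (size_t len = 0; off + len <= sizeof(buf); len++) {
                unsigned char *p = buf + off;

                memset(buf, 0xff, sizeof(buf));
                memset(p, 0, len);
                if (!variants[v].fn(p, len)) {
                    printf("%s: off %zu len %zu: zeros reported non-zero\n",
                           variants[v].name, off, len);
                    failures++;
                }
                for (size_t pos = 0; pos < len; pos++) {
                    p[pos] = 0x80;
                    if (variants[v].fn(p, len)) {
                        printf("%s: off %zu len %zu: byte %zu missed\n",
                               variants[v].name, off, len, pos);
                        failures++;
                    }
                    p[pos] = 0;
                }
            }
        }
    }
    return failures;
}
```

A caller would check `run_tests() == 0` from its own `main`. On x86, a harness in this style would presumably add SSE2/AVX2 entries to the table only when the host supports them (for example via GCC's `__builtin_cpu_supports("avx2")`), which is how "attempt all versions of the accelerator that will run on the hardware" could be arranged; that part is left out of this sketch.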