On 1 July 2016 at 23:07, Richard Henderson <r...@twiddle.net> wrote:
> On 06/30/2016 06:45 AM, Peter Maydell wrote:
>> On 29 June 2016 at 09:47, <vija...@cavium.com> wrote:
>>> From: Vijay <vija...@cavium.com>
>>>
>>> Use Neon instructions to perform zero checking of
>>> buffer. This helps in reducing total migration time.
>>
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 5830a68..4779403 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
>>>  #define SPLAT(p) _mm_set1_epi8(*(p))
>>>  #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
>>>  #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
>>> +#elif __aarch64__
>>> +#include "arm_neon.h"
>>> +#define VECTYPE uint64x2_t
>>> +#define ALL_EQ(v1, v2) \
>>> +    ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
>>> +     (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
>>> +#define VEC_OR(v1, v2) ((v1) | (v2))
>>
>> Should be '#elif defined(__aarch64__)'. I have made this
>> tweak and put this patch in target-arm.next.
>
> Consider
>
> #define VECTYPE uint32x4_t
> #define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)
Sounds good. Vijay, could you benchmark that variant, please?

thanks
-- PMM
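[For readers following along: below is a minimal, portable sketch of the zero-check pattern the patch implements. It mirrors the VECTYPE/VEC_OR/ALL_EQ macro structure from util/cutils.c but substitutes a plain uint64_t "vector" so it compiles on any host; the actual patch plugs in a NEON uint64x2_t (or uint32x4_t with vmaxvq_u32, per Richard's suggestion) on aarch64. The function name and the simplifying length assumption are mine, not QEMU's.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-ins for the per-architecture macros in util/cutils.c.
 * On aarch64 the patch defines VECTYPE as a 128-bit NEON type instead. */
#define VECTYPE        uint64_t
#define VEC_OR(v1, v2) ((v1) | (v2))
#define ALL_EQ(v1, v2) ((v1) == (v2))

/* Returns true if every byte in buf[0..len) is zero.
 * For brevity this sketch assumes len is a multiple of sizeof(VECTYPE);
 * the real code handles the head/tail separately. */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const VECTYPE zero = 0;
    VECTYPE acc = 0;
    size_t i;

    for (i = 0; i < len / sizeof(VECTYPE); i++) {
        VECTYPE v;
        /* memcpy sidesteps alignment and aliasing issues of direct casts */
        memcpy(&v, (const char *)buf + i * sizeof(VECTYPE), sizeof(v));
        acc = VEC_OR(acc, v);   /* accumulate any set bits from each chunk */
    }
    return ALL_EQ(acc, zero);   /* true only if no bit was ever set */
}
```

The OR-accumulate structure is what makes the vector version cheap: the per-chunk work is a single wide OR, and the (relatively expensive) ALL_EQ reduction runs only once at the end rather than per chunk.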