Hello Matthieu!

I have looked at this in place of Anders, and as far as I can tell this is not
an arm64 issue but an arm (32-bit) issue. Even on arm, __ARM_FEATURE_UNALIGNED
is 1, so unaligned 2- and 4-byte accesses are handled by the hardware; it seems
the problem only occurs when the size equals 8.
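As a minimal sketch of that reasoning, the hypothetical helper below
(arm32_copy_needs_memcpy is not an lttng-ust function) decides which copies
must fall back to memcpy on 32-bit ARM. It assumes __ARM_FEATURE_UNALIGNED is
set, so single halfword and word accesses (LDRH/STRH, LDR/STR) tolerate
misalignment while 64-bit accesses may not. Requiring full 8-byte alignment for
8-byte copies is a conservative assumption made here, not a statement of the
exact architectural rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a copy of `len` bytes between `dest`
 * and `src` should go through memcpy() on a 32-bit ARM target that defines
 * __ARM_FEATURE_UNALIGNED.
 */
static bool arm32_copy_needs_memcpy(const void *dest, const void *src,
				    size_t len)
{
	/* OR the addresses so one mask tests the alignment of both. */
	uintptr_t addr_bits = (uintptr_t) dest | (uintptr_t) src;

	switch (len) {
	case 1:
	case 2:
	case 4:
		/* Single byte/halfword/word accesses tolerate misalignment. */
		return false;
	case 8:
		/* 64-bit accesses may fault when unaligned: check (conservatively). */
		return (addr_bits & 7) != 0;
	default:
		/* Other sizes are handled by memcpy() anyway. */
		return true;
	}
}
```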

In addition, I did some performance testing of lttng_inline_memcpy by
extracting it into a simple test program. It appears that overall performance
improves on arm, arm64, arm (32-bit) running on arm64 hardware, and x86-64.
However, it also appears that on arm, when the copy ends up in memcpy, the old
code, which calls memcpy directly, is actually slightly faster.
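For anyone wanting to reproduce this kind of comparison, here is a minimal
sketch of such a harness, assuming a C11 compiler. It is not the test program
used above, and copy_under_test and bench_copy are hypothetical names: the body
of copy_under_test would be replaced by the extracted lttng_inline_memcpy (or a
direct memcpy call for the old behaviour). Comparing misalign = 0 against
misalign = 1 for each size separates the aligned path from the fallback path.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Placeholder for the routine under test, e.g. an extracted copy of
 * lttng_inline_memcpy(); here it simply forwards to memcpy(). */
static void copy_under_test(void *dest, const void *src, size_t len)
{
	memcpy(dest, src, len);
}

/*
 * Returns the average CPU time in nanoseconds per call for copies of `len`
 * bytes at byte offset `misalign` from an 8-byte-aligned base, or -1.0 on
 * invalid arguments. Calling through a volatile function pointer keeps the
 * compiler from inlining the copy or hoisting it out of the loop.
 */
static double bench_copy(size_t len, size_t misalign, unsigned long iters)
{
	static _Alignas(8) unsigned char src[72];
	static _Alignas(8) unsigned char dest[72];
	void (*volatile fn)(void *, const void *, size_t) = copy_under_test;
	clock_t start, end;
	unsigned long i;

	if (len > 64 || misalign > 7 || iters == 0)
		return -1.0;
	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char) i;

	start = clock();
	for (i = 0; i < iters; i++)
		fn(dest + misalign, src + misalign, len);
	end = clock();

	return (double) (end - start) * 1e9 / CLOCKS_PER_SEC / (double) iters;
}
```

clock() measures processor time with coarse resolution, so iteration counts
need to be large enough for the differences to show.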

Skipping the memcpy fallback on arm for unaligned copies of sizes 2 and 4
improves performance further, and setting
LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS to 1 yields the best performance
on arm64.
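A minimal sketch of these variants, assuming GCC or Clang (for the
__may_alias__ attribute). sketch_inline_memcpy and SKETCH_UNALIGNED_2_4_OK are
hypothetical names, not the lttng-ust implementation, and
LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS is defaulted to 0 here only if
the build does not define it. On 32-bit ARM with __ARM_FEATURE_UNALIGNED, 2-
and 4-byte copies skip the alignment check and the memcpy fallback; 8-byte
copies are still checked; setting the efficient-unaligned macro to 1 skips all
checks. A misaligned typed access is undefined behaviour in ISO C, so the
unchecked paths rely on the target and compiler tolerating it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: treat the architecture as lacking efficient unaligned access
 * unless the build defines this lttng-ust macro. */
#ifndef LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 0
#endif

/* Hypothetical: 1 on 32-bit ARM with __ARM_FEATURE_UNALIGNED, where single
 * 2- and 4-byte loads/stores tolerate misalignment. */
#if defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)
#define SKETCH_UNALIGNED_2_4_OK 1
#else
#define SKETCH_UNALIGNED_2_4_OK 0
#endif

/* GCC/Clang extension: let these types alias any object, as char does. */
typedef uint16_t __attribute__((__may_alias__)) sketch_u16;
typedef uint32_t __attribute__((__may_alias__)) sketch_u32;
typedef uint64_t __attribute__((__may_alias__)) sketch_u64;

/* True if both pointers are aligned to len (a power of two). */
static inline int sketch_aligned(const void *dest, const void *src, size_t len)
{
	return (((uintptr_t) dest | (uintptr_t) src) & (len - 1)) == 0;
}

/* Hypothetical sketch of the copy policy, not the lttng-ust implementation. */
static inline void sketch_inline_memcpy(void *dest, const void *src, size_t len)
{
	const int fast_any = LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS;
	const int fast_2_4 = fast_any || SKETCH_UNALIGNED_2_4_OK;

	switch (len) {
	case 1:
		*(uint8_t *) dest = *(const uint8_t *) src;
		return;
	case 2:
		if (fast_2_4 || sketch_aligned(dest, src, 2)) {
			*(sketch_u16 *) dest = *(const sketch_u16 *) src;
			return;
		}
		break;
	case 4:
		if (fast_2_4 || sketch_aligned(dest, src, 4)) {
			*(sketch_u32 *) dest = *(const sketch_u32 *) src;
			return;
		}
		break;
	case 8:
		/* Unaligned 64-bit accesses can fault on 32-bit ARM. */
		if (fast_any || sketch_aligned(dest, src, 8)) {
			*(sketch_u64 *) dest = *(const sketch_u64 *) src;
			return;
		}
		break;
	default:
		break;
	}
	/* Other sizes, and unaligned copies the target cannot do directly. */
	memcpy(dest, src, len);
}
```

Because fast_any and fast_2_4 are compile-time constants, the compiler can fold
away the alignment checks that do not apply to the target.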

Micke
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
