On Sat, 27 Jun 2020 01:27:14 +0200
Christian Weisgerber wrote:
> I'm also intrigued by this aside in the PowerPC ISA documentation:
> | Moreover, Load with Update instructions may take longer to execute
> | in some implementations than the corresponding pair of a non-update
> | Load instruction and an Add instruction.
> What does clang generate?
clang likes load/store with update instructions. For example, the
powerpc64 kernel has /sys/lib/libkern/memcpy.c, which copies one byte at a time:
	while (n-- > 0)
		*t++ = *f++;
clang compiles that loop with lbzu and stbu:
	memcpy:       cmpldi r5,0x0
	memcpy+0x4:   beqlr
	memcpy+0x8:   addi r4,r4,-1
	memcpy+0xc:   addi r6,r3,-1
	memcpy+0x10:  mtspr ctr,r5
	memcpy+0x14:  lbzu r5,1(r4)
	memcpy+0x18:  stbu r5,1(r6)
	memcpy+0x1c:  bdnz 0x26cd0d4 (memcpy+0x14)
	memcpy+0x20:  blr
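To make the update forms concrete, here is a toy C model of that loop. It is a sketch, not kernel code: memory is a flat byte array, the "registers" r4/r6 hold offsets into it, and the helper names lbzu(), stbu(), model_memcpy() and model_memcpy_split() are hypothetical. lbzu rD,d(rA) computes EA = rA + d, loads the byte at EA into rD and writes EA back into rA (stbu does the same for a store), which is why clang first backs both pointers up by one. The second function is the "non-update load plus add" pairing the ISA text mentions; the model only shows that both compute the same thing and says nothing about which is faster on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the clang loop (hypothetical helpers, not kernel code).
 * "Registers" hold offsets into mem[], so starting one byte before a
 * buffer is plain integer arithmetic, not an out-of-range pointer.
 */

/* lbzu rD,d(rA): EA = rA + d; rD = MEM(EA, 1); rA = EA */
static uint8_t
lbzu(const uint8_t *mem, long *ra, long d)
{
	*ra += d;		/* update form: EA is written back to rA */
	return mem[*ra];
}

/* stbu rS,d(rA): EA = rA + d; MEM(EA, 1) = rS; rA = EA */
static void
stbu(uint8_t *mem, long *ra, long d, uint8_t rs)
{
	*ra += d;
	mem[*ra] = rs;
}

/* Update form, following the disassembly instruction by instruction. */
static void
model_memcpy(uint8_t *mem, long dst, long src, size_t n)
{
	assert(dst >= 0 && src >= 0);
	if (n == 0)				/* cmpldi r5,0x0; beqlr */
		return;
	long r4 = src - 1;			/* addi r4,r4,-1 */
	long r6 = dst - 1;			/* addi r6,r3,-1 */
	size_t ctr = n;				/* mtspr ctr,r5 */
	do {
		uint8_t r5 = lbzu(mem, &r4, 1);	/* lbzu r5,1(r4) */
		stbu(mem, &r6, 1, r5);		/* stbu r5,1(r6) */
	} while (--ctr != 0);			/* bdnz */
}

/* Same copy with non-update load/store plus explicit adds. */
static void
model_memcpy_split(uint8_t *mem, long dst, long src, size_t n)
{
	assert(dst >= 0 && src >= 0);
	for (size_t ctr = n; ctr != 0; ctr--) {
		uint8_t r5 = mem[src];	/* lbz r5,0(r4) */
		mem[dst] = r5;		/* stb r5,0(r6) */
		src += 1;		/* addi r4,r4,1 */
		dst += 1;		/* addi r6,r6,1 */
	}
}
```

Copying 5 bytes from offset 0 to offset 8 with either function leaves the same array, and n == 0 returns before touching memory, matching the beqlr.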
> I think we should consider dropping this "optimized" memmove.S on
> both powerpc and powerpc64.
I might want to benchmark memmove.S against memmove.c to check if
those unaligned accesses are too slow. First I would have to write
a benchmark.
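A benchmark along those lines might start from something like the sketch below. It is an assumption-laden starting point, not an existing tool: bench_move() and memmove_fn are hypothetical names, and timing uses POSIX clock_gettime(CLOCK_MONOTONIC). Since malloc() returns memory aligned for any object type, offsets of zero give aligned buffers and small nonzero offsets force misaligned source and/or destination. The function takes the copy routine as a pointer, so memmove.S and memmove.c would each have to be built under their own (hypothetical) name and passed in.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*memmove_fn)(void *, const void *, size_t);

/*
 * Hypothetical helper: average nanoseconds per call of
 * fn(dst + doff, src + soff, len).  Nonzero offsets misalign the
 * buffers.  Returns a negative value on bad arguments, allocation
 * failure, or if the final copy does not match the source.
 */
static double
bench_move(memmove_fn fn, size_t len, size_t soff, size_t doff, int iters)
{
	/* volatile: the pointer is reloaded on every call, which keeps
	 * the compiler from treating it as the memmove builtin */
	memmove_fn volatile vfn = fn;
	struct timespec t0, t1;
	unsigned char *src, *dst;
	int ok;

	assert(fn != NULL);
	if (len == 0 || iters <= 0)
		return -1.0;
	src = malloc(len + soff);
	dst = malloc(len + doff);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return -1.0;
	}
	memset(src, 0xa5, len + soff);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iters; i++)
		vfn(dst + doff, src + soff, len);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* sanity check that the routine actually copied the bytes */
	ok = memcmp(dst + doff, src + soff, len) == 0;
	free(src);
	free(dst);
	if (!ok)
		return -1.0;
	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

For example, comparing bench_move(memmove, 4096, 0, 0, 100000) with bench_move(memmove, 4096, 1, 3, 100000) would show the cost of misalignment for whichever memmove libc provides; a real comparison would also want several sizes and repeated runs.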