On Thu, 8 Nov 2012, Mark Kettenis wrote:
>> On Tuesday 21 August 2012, Stefan Fritsch wrote:
>>> On x86, the xchg operation between reg and mem has an implicit lock
>>> prefix, i.e. it is a relatively expensive atomic operation. This is
>>> not needed here.
>> OKs, anyone?
> What you say makes sense, although it might matter only on MP
> (capable) systems.

True, but MP is the norm nowadays.
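
A rough C sketch of the cost argument (not part of the patch; the helper names swap_plain and swap_xchg are made up for illustration). Both swap two argument slots the way the old memcpy stub did, but the first uses only ordinary loads and stores, while the second updates the second slot with an atomic exchange, as the stub's xchg did. It assumes a C11 compiler with <stdatomic.h>; on x86, GCC and Clang typically compile atomic_exchange to xchg, which is implicitly locked whether or not a lock prefix is written.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: swap two slots with plain loads and stores.
 * No locked instruction is needed for this.
 */
void swap_plain(long *a, long *b)
{
	long tmp = *a;	/* load first slot */
	*a = *b;	/* copy second slot into first */
	*b = tmp;	/* store saved value into second */
}

/*
 * Hypothetical helper: the same swap, but the second slot is updated
 * with an atomic exchange, mirroring the old three-instruction stub.
 */
void swap_xchg(long *a, _Atomic long *b)
{
	long tmp = *a;			/* like movl 4(%esp),%ecx */
	tmp = atomic_exchange(b, tmp);	/* like xchg 8(%esp),%ecx */
	*a = tmp;			/* like movl %ecx,4(%esp) */
}
```

Both helpers leave the two slots swapped; the only difference is that the second one asks for an atomic read-modify-write that a single-threaded argument shuffle never needed.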

> If you really want to make things faster, I
> suppose you could change the code into something like
>
> 	pushl	%esi
> 	pushl	%edi
> 	movl	12(%esp),%edi
> 	movl	16(%esp),%esi

That's true. Like this (suggestions for a better label name are
welcome):
--- locore.s
+++ locore.s
@@ -789,7 +789,7 @@ ENTRY(bcopy)
 	pushl	%edi
 	movl	12(%esp),%esi
 	movl	16(%esp),%edi
-	movl	20(%esp),%ecx
+bcopy2:	movl	20(%esp),%ecx
 	movl	%edi,%eax
 	subl	%esi,%eax
 	cmpl	%ecx,%eax		# overlapping?
@@ -827,13 +827,15 @@ ENTRY(bcopy)
 	ret
 
 /*
- * Emulate memcpy() by swapping the first two arguments and calling bcopy()
+ * Emulate memcpy() by loading the first two arguments in reverse order
+ * and jumping into bcopy()
  */
 ENTRY(memcpy)
-	movl	4(%esp),%ecx
-	xchg	8(%esp),%ecx
-	movl	%ecx,4(%esp)
-	jmp	_C_LABEL(bcopy)
+	pushl	%esi
+	pushl	%edi
+	movl	12(%esp),%edi
+	movl	16(%esp),%esi
+	jmp	bcopy2
 
 /*****************************************************************************/
 
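
For comparison, a rough C analogue of what the patch does (hypothetical names copy_core, my_bcopy and my_memcpy; this is not the locore.s code). copy_core stands in for the code at the bcopy2 label: source and destination arrive already in place, so the memcpy entry point only has to hand over its first two arguments in reverse order instead of rewriting them on the stack. The sketch also spells out the overlap test done by the movl/subl/cmpl sequence, assuming a flat address space where subtracting addresses as uintptr_t is meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the code at bcopy2: src and dst are already loaded,
 * only the length is fetched.
 */
static void copy_core(const unsigned char *src, unsigned char *dst, size_t len)
{
	/*
	 * Overlap test, as in "movl %edi,%eax; subl %esi,%eax; cmpl %ecx,%eax":
	 * if (dst - src) < len as an unsigned value, dst starts inside the
	 * source range and a forward copy would overwrite unread bytes.
	 */
	if ((uintptr_t)dst - (uintptr_t)src < len) {
		while (len-- > 0)			/* copy backwards */
			dst[len] = src[len];
	} else {
		for (size_t i = 0; i < len; i++)	/* copy forwards */
			dst[i] = src[i];
	}
}

/* bcopy-style entry point: source first, destination second. */
void my_bcopy(const void *src, void *dst, size_t len)
{
	copy_core(src, dst, len);
}

/*
 * memcpy-style entry point: destination first, source second. It passes
 * its first two arguments to the shared code in reverse order and
 * returns dst, as ISO C memcpy does.
 */
void *my_memcpy(void *dst, const void *src, size_t len)
{
	copy_core(src, dst, len);
	return dst;
}
```

With a buffer holding "abcdef", my_bcopy(buf, buf + 2, 4) takes the backward branch and leaves "ababcd"; on a fresh "abcdef", my_memcpy(buf, buf + 2, 4) takes the forward branch and leaves "cdefef".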