Re: [PATCH v2] string: Use memcpy() within memmove() when we can

2021-01-18 Thread Tom Rini
On Fri, Dec 11, 2020 at 02:59:23PM +0100, Patrick Delaunay wrote:

> A common use of memmove() can be handled by memcpy(). Also memcpy()
> includes an optimization for large sizes: it copies a word at a time. So
> we can get a speed-up by calling memcpy() to handle our move in this case.
> 
> Update memmove() to also call memcpy() if the source doesn't overlap
> the destination (src + count <= dest).
> 
> Signed-off-by: Patrick Delaunay 

Applied to u-boot/master, thanks!

-- 
Tom



[PATCH v2] string: Use memcpy() within memmove() when we can

2020-12-11 Thread Patrick Delaunay
A common use of memmove() can be handled by memcpy(). Also memcpy()
includes an optimization for large sizes: it copies a word at a time. So
we can get a speed-up by calling memcpy() to handle our move in this case.

Update memmove() to also call memcpy() if the source doesn't overlap
the destination (src + count <= dest).

Signed-off-by: Patrick Delaunay 
---
Hi,

V2 of http://patchwork.ozlabs.org/project/uboot/list/?series=216620

This patch saves 38 ms on Kernel Image extraction (7327624 bytes)
from a FIT loaded at 0xC200 on the ARMv7 board STM32MP157C-EV1,
with kernel destination = Load Address: 0xc400,
located after the FIT without overlap, compared with
destination = Load Address: 0xc0008000.

-> 14,332 us vs 52,239 us in the bootstage report

In this case the memmove() function is called from
common/image.c::memmove_wd() to handle overlap.
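
For readers who want to see why a word-at-a-time memcpy() is faster than a
byte loop, here is a minimal sketch of the idea. It is not the lib/string.c
or ARM implementation; the names sketch_word_copy() and
sketch_word_copy_demo() are hypothetical, and the sketch assumes a build
where accessing a buffer through unsigned long pointers is acceptable (for
example with -fno-strict-aliasing).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of a forward, word-at-a-time copy.
 * When both pointers are word-aligned, copy sizeof(unsigned long)
 * bytes per iteration; finish (or fall back) with a byte loop.
 */
static void *sketch_word_copy(void *dest, const void *src, size_t count)
{
	unsigned long *dl = dest;
	const unsigned long *sl = src;
	unsigned char *d8;
	const unsigned char *s8;

	/* Fast path only when dest and src are both word-aligned */
	if ((((uintptr_t)dest | (uintptr_t)src) & (sizeof(*dl) - 1)) == 0) {
		while (count >= sizeof(*dl)) {
			*dl++ = *sl++;
			count -= sizeof(*dl);
		}
	}

	/* Copy the remaining bytes (or everything, if unaligned) */
	d8 = (unsigned char *)dl;
	s8 = (const unsigned char *)sl;
	while (count--)
		*d8++ = *s8++;

	return dest;
}

/* Usage example: copy an aligned array and check the result */
static void sketch_word_copy_demo(void)
{
	unsigned long src[4] = { 1, 2, 3, 4 };
	unsigned long dst[4] = { 0 };

	sketch_word_copy(dst, src, sizeof(src));
	assert(memcmp(dst, src, sizeof(src)) == 0);
}
```

Both loops walk from low to high addresses, which is the forward-copying
property the patch relies on when dest is before src.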

Patrick

Changes in v2:
- Add a comment on potential issue if the memcpy is not doing a
  forward-copying

 lib/string.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/string.c b/lib/string.c
index ae7835f600..73b984123d 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -567,7 +567,19 @@ void * memmove(void * dest,const void *src,size_t count)
 {
 	char *tmp, *s;
 
-	if (dest <= src) {
+	if (dest <= src || (src + count) <= dest) {
+	/*
+	 * Use the fast memcpy implementation (ARCH optimized or lib/string.c) when it is possible:
+	 * - when dest is before src (assuming that memcpy is doing forward-copying)
+	 * - when destination don't overlap the source buffer (src + count <= dest)
+	 *
+	 * WARNING: the first optimisation cause an issue, when __HAVE_ARCH_MEMCPY is defined,
+	 *          __HAVE_ARCH_MEMMOVE is not defined and if the memcpy ARCH-specific
+	 *          implementation is not doing a forward-copying.
+	 *
+	 * No issue today because memcpy is doing a forward-copying in lib/string.c and for ARM32
+	 * architecture; no other arches use __HAVE_ARCH_MEMCPY without __HAVE_ARCH_MEMMOVE.
+	 */
 		memcpy(dest, src, count);
 	} else {
 		tmp = (char *) dest + count;
-- 
2.17.1
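
For illustration, the branch condition in the patch can be tried outside
U-Boot with a small standalone sketch. This is not the lib/string.c code:
sketch_memmove(), forward_copy() and sketch_memmove_demo() are hypothetical
names, and forward_copy() stands in for a memcpy() that is known to copy
forward, since a hosted libc memcpy() gives no such guarantee for
overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memcpy() that always copies forward */
static void *forward_copy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (count--)
		*d++ = *s++;
	return dest;
}

/* Hypothetical sketch mirroring the condition used in the patch */
static void *sketch_memmove(void *dest, const void *src, size_t count)
{
	uintptr_t d = (uintptr_t)dest;
	uintptr_t s = (uintptr_t)src;
	unsigned char *dp;
	const unsigned char *sp;

	/*
	 * Forward copy is safe when dest is at or before src, or when
	 * dest starts at or after the end of the source buffer.
	 */
	if (d <= s || s + count <= d)
		return forward_copy(dest, src, count);

	/* dest lies inside [src, src + count): copy backward */
	dp = (unsigned char *)dest + count;
	sp = (const unsigned char *)src + count;
	while (count--)
		*--dp = *--sp;
	return dest;
}

/* Usage example: shift four bytes right by two within one buffer */
static void sketch_memmove_demo(void)
{
	char buf[8] = "abcdef";

	sketch_memmove(buf + 2, buf, 4);	/* overlap, dest after src */
	assert(memcmp(buf, "ababcd", 6) == 0);
}
```

The first half of the condition is only correct if the copy routine really
works forward, which is the point of the WARNING in the patch comment; the
second half is safe with any copy direction because the buffers are disjoint.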