Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie
On Fri, 17 Sep 2010 08:55:18 +0800, ykzhao wrote:
> On Thu, 2010-09-16 at 15:16 +0800, Miao Xie wrote:
>> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
When the dest and the src do overlap and the memory area is large, memmove of x86_64 is very inefficient, and it led

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread ykzhao
On Thu, 2010-09-16 at 15:16 +0800, Miao Xie wrote:
> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
> >> When the dest and the src do overlap and the memory area is large, memmove of
> >> x86_64 is very inefficient, and it led to bad performance, such as btrfs's file
> >> de

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread George Spelvin
> void *memmove(void *dest, const void *src, size_t count)
> {
> 	if (dest < src) {
> 		return memcpy(dest, src, count);
> 	} else {
> -		char *p = dest + count;
> -		const char *s = src + count;
> -		while (count--)
> -			*

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie
On Thu, 16 Sep 2010 18:47:59 +0800, Miao Xie wrote:
On Thu, 16 Sep 2010 12:11:41 +0200, Andi Kleen wrote:
On Thu, 16 Sep 2010 17:29:32 +0800 Miao Xie wrote:
Ok was a very broken patch. Sorry should have really done some more work on it. Anyways hopefully the corrected version is good for test

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie
On Thu, 16 Sep 2010 12:11:41 +0200, Andi Kleen wrote:
On Thu, 16 Sep 2010 17:29:32 +0800 Miao Xie wrote:
Ok was a very broken patch. Sorry should have really done some more work on it. Anyways hopefully the corrected version is good for testing.
-Andi
title: x86_64/lib: improve the perform

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Andi Kleen
On Thu, 16 Sep 2010 17:29:32 +0800 Miao Xie wrote:
Ok was a very broken patch. Sorry should have really done some more work on it. Anyways hopefully the corrected version is good for testing.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie
On Thu, 16 Sep 2010 10:40:08 +0200, Andi Kleen wrote:
On Thu, 16 Sep 2010 15:16:31 +0800 Miao Xie wrote:
On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
When the dest and the src do overlap and the memory area is large, memmove of x86_64 is very inefficient, and it led to bad per

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Andi Kleen
On Thu, 16 Sep 2010 15:16:31 +0800 Miao Xie wrote:
> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
> >> When the dest and the src do overlap and the memory area is large, memmove of
> >> x86_64 is very inefficient, and it led to bad performance, such as btrfs's file
> >>

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie
On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
When the dest and the src do overlap and the memory area is large, memmove of x86_64 is very inefficient, and it led to bad performance, such as btrfs's file deletion performance. This patch improved the performance of memmove on x86_64

Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-15 Thread Andi Kleen
> When the dest and the src do overlap and the memory area is large, memmove of
> x86_64 is very inefficient, and it led to bad performance, such as btrfs's file
> deletion performance. This patch improved the performance of memmove on x86_64
> by using __memcpy_bwd() instead of byte copy whe

[PATCH] x86_64/lib: improve the performance of memmove

2010-09-15 Thread Miao Xie
When the dest and the src do overlap and the memory area is large, memmove of x86_64 is very inefficient, and it led to bad performance, such as btrfs's file deletion performance. This patch improved the performance of memmove on x86_64 by using __memcpy_bwd() instead of byte copy when doing large
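
For readers following the thread, the idea in the patch description (copy backward in large units instead of one byte at a time when dest overlaps the tail of src) can be illustrated roughly as below. This is only a portable C sketch under assumptions: the names copy_backward(), copy_forward() and sketch_memmove() are made up for this example, and the code is not the patch's __memcpy_bwd(), whose body is not shown in this listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (NOT the patch's __memcpy_bwd()): copy from the end of
 * the buffers toward the start, 8 bytes at a time, then finish the remaining
 * head bytes one by one. Safe when dest > src and the regions overlap. */
static void *copy_backward(void *dest, const void *src, size_t count)
{
    unsigned char *d = (unsigned char *)dest + count;
    const unsigned char *s = (const unsigned char *)src + count;

    while (count >= sizeof(uint64_t)) {
        uint64_t w;
        d -= sizeof w;
        s -= sizeof w;
        memcpy(&w, s, sizeof w);  /* load the whole word into a temporary */
        memcpy(d, &w, sizeof w);  /* then store it; w never overlaps d or s */
        count -= sizeof w;
    }
    while (count--)
        *--d = *--s;
    return dest;
}

/* Hypothetical helper: forward copy in 8-byte units, safe when dest < src. */
static void *copy_forward(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (count >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        count -= sizeof w;
    }
    while (count--)
        *d++ = *s++;
    return dest;
}

/* Pick the copy direction from the relative position of the two buffers,
 * mirroring the dest < src test in the generic memmove quoted in the thread. */
static void *sketch_memmove(void *dest, const void *src, size_t count)
{
    if ((uintptr_t)dest < (uintptr_t)src)
        return copy_forward(dest, src, count);
    if (dest == src)
        return dest;
    return copy_backward(dest, src, count);
}
```

Each 8-byte word is loaded in full before it is stored, so an overlap shorter than one word is still handled correctly in both directions. Whether the real patch uses the same unit size or alignment handling cannot be told from the truncated text above.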