Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie
On Fri, 17 Sep 2010 08:55:18 +0800, ykzhao wrote:
> On Thu, 2010-09-16 at 15:16 +0800, Miao Xie wrote:
>> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
>>>> When the dest and the src do overlap and the memory area is large,
>>>> memmove on x86_64 is very inefficient, and it leads to bad performance,
>>>> such as btrfs's file deletion performance. This patch improves the
>>>> performance of memmove on x86_64 by using __memcpy_bwd() instead of a
>>>> byte-by-byte copy when copying a large memory area (len > 64).
>>>
>>>
>>> I still don't understand why you don't simply use a backwards
>>> string copy (with std) ? That should be much simpler and
>>> hopefully be as optimized for kernel copies on recent CPUs.
>>
>> But according to the comment of memcpy, some CPUs don't support "REP" 
>> instruction,
> 
> Where do you find that the "REP" instruction is not supported on some
> CPUs? The comment in arch/x86/lib/memcpy_64.s only states that some CPUs
> will run faster when using string copy instruction.

Sorry! I misread the comment.

>> so I think we must implement a backwards string copy by some other method
>> for those CPUs. But that implementation is complex, so I wrote it as a
>> function -- __memcpy_bwd().
> 
> Will you please look at the tip/x86 tree (mem branch)? The memory copy on
> x86_64 is already optimized there.

Thanks for the reminder! It is very helpful.
Miao
 
> thanks.
>  Yakui
>>
>> Thanks!
>> Miao
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 



Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread ykzhao
On Thu, 2010-09-16 at 15:16 +0800, Miao Xie wrote:
> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
> >> When the dest and the src do overlap and the memory area is large,
> >> memmove on x86_64 is very inefficient, and it leads to bad performance,
> >> such as btrfs's file deletion performance. This patch improves the
> >> performance of memmove on x86_64 by using __memcpy_bwd() instead of a
> >> byte-by-byte copy when copying a large memory area (len > 64).
> >
> >
> > I still don't understand why you don't simply use a backwards
> > string copy (with std) ? That should be much simpler and
> > hopefully be as optimized for kernel copies on recent CPUs.
> 
> But according to the comment of memcpy, some CPUs don't support "REP" 
> instruction,

Where do you find that the "REP" instruction is not supported on some
CPUs? The comment in arch/x86/lib/memcpy_64.S only states that some CPUs
will run faster when using the string copy instruction.
  
> so I think we must implement a backwards string copy by some other method
> for those CPUs. But that implementation is complex, so I wrote it as a
> function -- __memcpy_bwd().

Will you please look at the tip/x86 tree (mem branch)? The memory copy on
x86_64 is already optimized there.

thanks.
Yakui
> 
> Thanks!
> Miao



[PATCH] Btrfs: add support for mixed data+metadata block groups

2010-09-16 Thread Josef Bacik
There are just a few things that need to be fixed in the kernel to support mixed
data+metadata block groups.  Mostly we just need to make sure that if we are
using mixed block groups, we continue to allocate mixed block groups as we need
them.  We also need to make sure __find_space_info will find our space info if
we search for DATA or METADATA only.  Tested this with xfstests and it works
nicely.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c |   24 +---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 91a0a41..c0a9ad3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -547,7 +547,7 @@ static struct btrfs_space_info *__find_space_info(struct 
btrfs_fs_info *info,
 
rcu_read_lock();
list_for_each_entry_rcu(found, head, list) {
-   if (found->flags == flags) {
+   if (found->flags & flags) {
rcu_read_unlock();
return found;
}
@@ -3267,6 +3267,13 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
*trans,
spin_unlock(&space_info->lock);
 
/*
+* If we have mixed data/metadata chunks we want to make sure we keep
+* allocating mixed chunks instead of individual chunks.
+*/
+   if (space_info->flags != flags)
+   flags = space_info->flags;
+
+   /*
 * if we're doing a data chunk, go ahead and make sure that
 * we keep a reasonable number of metadata chunks allocated in the
 * FS as well.
@@ -4793,6 +4800,7 @@ static noinline int find_free_extent(struct 
btrfs_trans_handle *trans,
bool found_uncached_bg = false;
bool failed_cluster_refill = false;
bool failed_alloc = false;
+   bool use_cluster = true;
u64 ideal_cache_percent = 0;
u64 ideal_cache_offset = 0;
 
@@ -4807,16 +4815,26 @@ static noinline int find_free_extent(struct 
btrfs_trans_handle *trans,
return -ENOSPC;
}
 
+   /*
+* If the space info is for both data and metadata it means we have a
+* small filesystem and we can't use the clustering stuff.
+*/
+   if ((space_info->flags & (BTRFS_BLOCK_GROUP_METADATA |
+ BTRFS_BLOCK_GROUP_DATA)) ==
+  ((BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA)))
+   use_cluster = false;
+
if (orig_root->ref_cows || empty_size)
allowed_chunk_alloc = 1;
 
-   if (data & BTRFS_BLOCK_GROUP_METADATA) {
+   if (data & BTRFS_BLOCK_GROUP_METADATA && use_cluster) {
last_ptr = &root->fs_info->meta_alloc_cluster;
if (!btrfs_test_opt(root, SSD))
empty_cluster = 64 * 1024;
}
 
-   if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD)) {
+   if ((data & BTRFS_BLOCK_GROUP_DATA) && use_cluster &&
+   btrfs_test_opt(root, SSD)) {
last_ptr = &root->fs_info->data_alloc_cluster;
}
 
-- 
1.6.6.1
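The one-character change in __find_space_info above is the crux of the patch: a mixed block group's flags word has both the DATA and METADATA bits set, so an exact `==` comparison against a DATA-only or METADATA-only search misses it, while an intersection test finds it. A tiny stand-alone sketch of the two tests (the flag values here are illustrative; the real BTRFS_BLOCK_GROUP_* constants live in the btrfs headers and may differ):

```c
/* Illustrative flag bits; the real BTRFS_BLOCK_GROUP_* constants are
 * defined in the btrfs headers and may differ. */
enum {
	BLOCK_GROUP_DATA     = 1 << 0,
	BLOCK_GROUP_SYSTEM   = 1 << 1,
	BLOCK_GROUP_METADATA = 1 << 2,
};

/* Old test: exact equality -- misses a mixed (DATA|METADATA) group
 * when searching for DATA or METADATA alone. */
static int match_exact(unsigned found, unsigned search)
{
	return found == search;
}

/* New test: intersection -- a mixed group satisfies a DATA-only or
 * METADATA-only search, which is what the patch needs. */
static int match_any(unsigned found, unsigned search)
{
	return (found & search) != 0;
}
```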



[PATCH] Btrfs: check cache->caching_ctl before returning if caching has started

2010-09-16 Thread Josef Bacik
With free space disk caching we can mark the block group as having started
caching even though we don't have a caching ctl.  This can race with anybody
else who tries to get the caching ctl before we cache (this is very hard to do,
btw).  So instead check to see if cache->caching_ctl is set, and if not return
NULL.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8ef6aa6..91a0a41 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -242,6 +242,12 @@ get_caching_control(struct btrfs_block_group_cache *cache)
return NULL;
}
 
+   /* We're loading it the fast way, so we don't have a caching_ctl. */
+   if (!cache->caching_ctl) {
+   spin_unlock(&cache->lock);
+   return NULL;
+   }
+
ctl = cache->caching_ctl;
atomic_inc(&ctl->count);
spin_unlock(&cache->lock);
-- 
1.6.6.1



[PATCH] Btrfs-progs: add support for mixed data+metadata block groups

2010-09-16 Thread Josef Bacik
So a lot of crazy people (I'm looking at you, MeeGo) want to use btrfs on phones
and such with small devices.  Unfortunately the way we split out metadata/data
chunks makes space usage inefficient for volumes that are smaller than
1 gigabyte.  So add a -M option for mixing metadata+data, and default to this
mixed mode if the filesystem is less than or equal to 1 gigabyte.  I've tested
this with xfstests on a 100MB filesystem and everything is a-ok.

Signed-off-by: Josef Bacik 
---
 btrfs-vol.c  |4 +-
 btrfs_cmds.c |   13 +-
 mkfs.c   |  121 --
 utils.c  |   10 ++--
 utils.h  |2 +-
 5 files changed, 103 insertions(+), 47 deletions(-)

diff --git a/btrfs-vol.c b/btrfs-vol.c
index 8069778..7200bbc 100644
--- a/btrfs-vol.c
+++ b/btrfs-vol.c
@@ -129,7 +129,9 @@ int main(int ac, char **av)
exit(1);
}
if (cmd == BTRFS_IOC_ADD_DEV) {
-   ret = btrfs_prepare_device(devfd, device, 1, &dev_block_count);
+   int mixed = 0;
+
+   ret = btrfs_prepare_device(devfd, device, 1, &dev_block_count, 
&mixed);
if (ret) {
fprintf(stderr, "Unable to init %s\n", device);
exit(1);
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 8031c58..683aec0 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -705,6 +705,7 @@ int do_add_volume(int nargs, char **args)
int devfd, res;
u64 dev_block_count = 0;
struct stat st;
+   int mixed = 0;
 
devfd = open(args[i], O_RDWR);
if (!devfd) {
@@ -727,7 +728,7 @@ int do_add_volume(int nargs, char **args)
continue;
}
 
-   res = btrfs_prepare_device(devfd, args[i], 1, &dev_block_count);
+   res = btrfs_prepare_device(devfd, args[i], 1, &dev_block_count, 
&mixed);
if (res) {
fprintf(stderr, "ERROR: Unable to init '%s'\n", 
args[i]);
close(devfd);
@@ -889,8 +890,14 @@ int do_df_filesystem(int nargs, char **argv)
memset(description, 0, 80);
 
if (flags & BTRFS_BLOCK_GROUP_DATA) {
-   snprintf(description, 5, "%s", "Data");
-   written += 4;
+   if (flags & BTRFS_BLOCK_GROUP_METADATA) {
+   snprintf(description, 15, "%s",
+"Data+Metadata");
+   written += 14;
+   } else {
+   snprintf(description, 5, "%s", "Data");
+   written += 4;
+   }
} else if (flags & BTRFS_BLOCK_GROUP_SYSTEM) {
snprintf(description, 7, "%s", "System");
written += 6;
diff --git a/mkfs.c b/mkfs.c
index 2e99b95..4610f58 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -69,7 +69,7 @@ static u64 parse_size(char *s)
return atol(s) * mult;
 }
 
-static int make_root_dir(struct btrfs_root *root)
+static int make_root_dir(struct btrfs_root *root, int mixed)
 {
struct btrfs_trans_handle *trans;
struct btrfs_key location;
@@ -88,30 +88,47 @@ static int make_root_dir(struct btrfs_root *root)
 0, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
BUG_ON(ret);
 
-   ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
-   &chunk_start, &chunk_size,
-   BTRFS_BLOCK_GROUP_METADATA);
-   BUG_ON(ret);
-   ret = btrfs_make_block_group(trans, root, 0,
-BTRFS_BLOCK_GROUP_METADATA,
-BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-chunk_start, chunk_size);
-   BUG_ON(ret);
+   if (mixed) {
+   ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
+   &chunk_start, &chunk_size,
+   BTRFS_BLOCK_GROUP_METADATA |
+   BTRFS_BLOCK_GROUP_DATA);
+   BUG_ON(ret);
+   ret = btrfs_make_block_group(trans, root, 0,
+BTRFS_BLOCK_GROUP_METADATA |
+BTRFS_BLOCK_GROUP_DATA,
+BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+chunk_start, chunk_size);
+   BUG_ON(ret);
+   printf("Created a data/metadata chunk of size %llu\n", 
chunk_size);
+   } else {
+   ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
+   &chunk_start, &chunk_size,
+   BTRFS_BLOCK_GROUP_METADATA);
+ 

[PATCH] Btrfs: stop trying to shrink delalloc if there are no inodes to reclaim

2010-09-16 Thread Josef Bacik
In very severe ENOSPC cases we can run out of inodes to do delalloc on, which
means we'll just keep looping trying to shrink delalloc.  Instead, if we fail to
shrink delalloc 3 times in a row, break out since we're not likely to make any
progress.  Tested this with a 100MB fs and xfstests test 13.  Before the patch
it would hang the box; with the patch we get -ENOSPC like we should.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8db9330..8ef6aa6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3327,6 +3327,7 @@ static int shrink_delalloc(struct btrfs_trans_handle 
*trans,
u64 reserved;
u64 max_reclaim;
u64 reclaimed = 0;
+   int no_reclaim = 0;
int pause = 1;
int ret;
 
@@ -3343,12 +3344,16 @@ static int shrink_delalloc(struct btrfs_trans_handle 
*trans,
while (1) {
ret = btrfs_start_one_delalloc_inode(root, trans ? 1 : 0);
if (!ret) {
+   if (no_reclaim > 2)
+   break;
+   no_reclaim++;
__set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(pause);
pause <<= 1;
if (pause > HZ / 10)
pause = HZ / 10;
} else {
+   no_reclaim = 0;
pause = 1;
}
 
-- 
1.6.6.1
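The bail-out logic above can be modeled in a few lines of plain C. This is a toy sketch, not the kernel code: drain() and inodes_with_delalloc stand in for the shrink loop and for btrfs_start_one_delalloc_inode()'s found/not-found result, and the sleeping/backoff and reclaim accounting of the real function are omitted.

```c
/* Toy model of the patch's retry policy: give up only after three
 * consecutive passes that found no delalloc inode to flush; any progress
 * resets the counter, mirroring the no_reclaim handling above. */
static int drain(int inodes_with_delalloc)
{
	int no_reclaim = 0;
	int reclaimed = 0;

	while (1) {
		if (inodes_with_delalloc == 0) {	/* nothing to flush */
			if (no_reclaim > 2)
				break;			/* likely -ENOSPC */
			no_reclaim++;
			continue;			/* kernel sleeps here */
		}
		no_reclaim = 0;				/* made progress */
		inodes_with_delalloc--;
		reclaimed++;
	}
	return reclaimed;
}
```

Without the no_reclaim cap, the `continue` branch would spin forever once the inode list is empty, which is exactly the hang the patch fixes.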



Re: [PATCH] Add the "btrfs filesystem label" command

2010-09-16 Thread Goffredo Baroncelli
On Thursday, 16 September, 2010, Mike Fedyk wrote:
> On Mon, Sep 13, 2010 at 12:24 PM, Goffredo Baroncelli
>  wrote:
[...]
> > +
> > +   if(ret != 0)
> > +   {
> > +  fprintf(stderr, "FATAL: the filesystem has to be
> > unmounted\n");
> > +  return -2;
> > +   }
> > +   get_label_unmounted(btrfs_dev);
> > +   return 0;
> > +}
> > +
> > +
> 
> Why can't the label be read while the fs is mounted?  It shouldn't
> hurt anything.  I can read the superblock on my ext3 fs while it's
> mounted...  This is what people have come to expect.

The main reason is that if a filesystem is mounted, the data read from
userspace may be out of date compared with the filesystem's in-memory data.

In the original patch (the one by Morey Roof), there was kernel-space code
that handled this case. In order to simplify the patch, I split the code into
two parts: the user-space part and the kernel-space part.

This is the user-space portion, and because it has to work without the
kernel-space portion, the code fails explicitly when the latter would be
required.

I hope to publish the 2nd patch in the next few weeks.

Regards
G.Baroncelli

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) 
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512


Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread George Spelvin
>  void *memmove(void *dest, const void *src, size_t count)
>  {
>   if (dest < src) {
>   return memcpy(dest, src, count);
>   } else {
> - char *p = dest + count;
> - const char *s = src + count;
> - while (count--)
> - *--p = *--s;
> + return memcpy_backwards(dest, src, count);
>   }
>   return dest;
>  }

Er... presumably, the forward-copy case is somewhat better optimized,
so it should be preferred whenever the areas don't overlap; that is, when
dest is below src, or above it by at least count.  Assuming that size_t
can hold a pointer, one unsigned comparison covers both cases:

	if ((size_t)dest - (size_t)src >= count)
		return memcpy(dest, src, count);
	else
		return memcpy_backwards(dest, src, count);

Or, using GCC's arithmetic on void * extension,

	if ((size_t)(dest - src) >= count)
		... etc.

If src == dest, it doesn't matter which you use.  You could skip the
copy entirely, but presumably that case doesn't arise often enough to
be worth testing for:

	if ((size_t)(dest - src) >= count)
		return memcpy(dest, src, count);
	else if (dest != src)
		return memcpy_backwards(dest, src, count);
	else
		return dest;
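A minimal user-space sketch of that dispatch (my_memmove, forward_copy, and backwards_copy are illustrative names, not the kernel's; the kernel's forward path would be its optimized memcpy): the single unsigned comparison sends every safe case to the forward copy and only the truly overlapping dest-above-src case to the backwards path.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward byte copy, standing in for the kernel's optimized memcpy. */
static void forward_copy(unsigned char *d, const unsigned char *s, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		d[i] = s[i];
}

/* Byte-by-byte backwards copy, safe when dest overlaps src from above. */
static void backwards_copy(unsigned char *d, const unsigned char *s, size_t n)
{
	while (n--)
		d[n] = s[n];
}

/* One unsigned comparison picks the direction: if dest is below src the
 * difference wraps around to a huge value, and if dest is at least count
 * above src the difference is >= count; both cases are safe for a forward
 * copy.  Only the overlapping dest-above-src case must go backwards. */
static void *my_memmove(void *dest, const void *src, size_t count)
{
	if ((uintptr_t)dest - (uintptr_t)src >= count)
		forward_copy(dest, src, count);
	else
		backwards_copy(dest, src, count);
	return dest;
}
```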


[RFC] Removing a subvolume by an ordinary user

2010-09-16 Thread kreij...@libero.it
Hi all,

currently BTRFS doesn't allow an ordinary user to remove a subvolume (or
snapshot). I think that the reason is simple: a subvolume may contain
files/directories owned by other users.
Allowing an ordinary user to remove a subvolume means allowing an ordinary
user to remove files/directories owned by other users. And this is not good.

Moreover BTRFS removes a subvolume asynchronously, so it is not possible to
return an error like "hey, you are trying to remove a file that is not yours!
Don't do it!".

My idea is to add another ioctl that permits removing a subvolume only when
it is empty and its host directory is writable by the user, like a directory.
An option is to allow removing an empty subvolume with the unlink(2) syscall:
no extra tool is needed!
This would solve a lot of problems:
-   It is consistent with the current unlink(2) behavior.
-   The kernel does not have to do complicated checks.
-   There is no need to add another interface to wait for the release of the
space (see the other thread reserving an IOCTL number for details).
The disadvantage is that it could be slower than the current implementation.
Of course I don't want to remove the existing interface; I want only to add
another one.

Comments ? Thoughts ?
Regards
G.Baroncelli 



Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie

On Thu, 16 Sep 2010 18:47:59 +0800, Miao Xie wrote:

On Thu, 16 Sep 2010 12:11:41 +0200, Andi Kleen wrote:

On Thu, 16 Sep 2010 17:29:32 +0800
Miao Xie wrote:


Ok was a very broken patch. Sorry should have really done some more
work on it. Anyways hopefully the corrected version is good for
testing.

-Andi


The test result is following:
Len     Src Unalign     Dest Unalign    Patch applied   Without Patch
---     -----------     ------------    -------------   -------------
8   0   0   0s 421117us 0s 70203us
8   0   3   0s 252622us 0s 42114us
8   0   7   0s 252663us 0s 42111us
8   3   0   0s 252666us 0s 42114us
8   3   3   0s 252667us 0s 42113us
8   3   7   0s 252667us 0s 42112us
32  0   0   0s 252672us 0s 114301us
32  0   3   0s 252676us 0s 114306us
32  0   7   0s 252663us 0s 114300us
32  3   0   0s 252661us 0s 114305us
32  3   3   0s 252663us 0s 114300us
32  3   7   0s 252668us 0s 114304us
64  0   0   0s 252672us 0s 236119us
64  0   3   0s 264671us 0s 236120us
64  0   7   0s 264702us 0s 236127us
64  3   0   0s 270701us 0s 236128us
64  3   3   0s 287236us 0s 236809us
64  3   7   0s 287257us 0s 236123us

According to the above results, the old version is better than the new one when
the memory area is small.

Len     Src Unalign     Dest Unalign    Patch applied   Without Patch
---     -----------     ------------    -------------   -------------
256 0   0   0s 281886us 0s 813660us
256 0   3   0s 332169us 0s 813645us
256 0   7   0s 342961us 0s 813639us
256 3   0   0s 305305us 0s 813634us
256 3   3   0s 386939us 0s 813638us
256 3   7   0s 370511us 0s 814335us
512 0   0   0s 310716us 1s 584677us
512 0   3   0s 456420us 1s 583353us
512 0   7   0s 468236us 1s 583248us
512 3   0   0s 493987us 1s 583659us
512 3   3   0s 588041us 1s 584294us
512 3   7   0s 605489us 1s 583650us
1024    0   0   0s 406971us 3s 123644us
1024    0   3   0s 748419us 3s 126514us
1024    0   7   0s 756153us 3s 127178us
1024    3   0   0s 854681us 3s 130013us
1024    3   3   1s 46828us  3s 140190us
1024    3   7   1s 35886us  3s 135508us

The new version is better when the memory area is large.

Thanks!
Miao





title: x86_64/lib: improve the performance of memmove

Implement the 64bit memmove backwards case using string instructions

Signed-off-by: Andi Kleen 
Signed-off-by: Miao Xie 
---
arch/x86/lib/memcpy_64.S | 29 +
arch/x86/lib/memmove_64.c | 8 
2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index bcbcd1e..9de5e9a 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -141,3 +141,32 @@ ENDPROC(__memcpy)
.byte .Lmemcpy_e - .Lmemcpy_c
.byte .Lmemcpy_e - .Lmemcpy_c
.previous
+
+/*
+ * Copy memory backwards (for memmove)
+ * rdi target
+ * rsi source
+ * rdx count
+ */
+
+ENTRY(memcpy_backwards)
+ CFI_STARTPROC
+ std
+ movq %rdi, %rax
+ movl %edx, %ecx
+ addq %rdx, %rdi
+ addq %rdx, %rsi
+ leaq -8(%rdi), %rdi
+ leaq -8(%rsi), %rsi
+ shrl $3, %ecx
+ andl $7, %edx
+ rep movsq
+ addq $7, %rdi
+ addq $7, %rsi
+ movl %edx, %ecx
+ rep movsb
+ cld
+ ret
+ CFI_ENDPROC
+ENDPROC(memcpy_backwards)
+
diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
index 0a33909..6774fd8 100644
--- a/arch/x86/lib/memmove_64.c
+++ b/arch/x86/lib/memmove_64.c
@@ -5,16 +5,16 @@
#include 
#include 

+extern void * asmlinkage memcpy_backwards(void *dst, const void *src,
+ size_t count);
+
#undef memmove
void *memmove(void *dest, const void *src, size_t count)
{
if (dest < src) {
return memcpy(dest, src, count);
} else {
- char *p = dest + count;
- const char *s = src + count;
- while (count--)
- *--p = *--s;
+ return memcpy_backwards(dest, src, count);
}
return dest;
}
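For readers who don't speak AT&T assembly, here is a C model of what the std/rep movsq sequence above does: copy whole quadwords from the top of the region downwards, then the remaining count % 8 bytes. This is a hedged sketch (memcpy_backwards_model is an illustrative name; the real implementation is the assembly above), but it shows why top-down order makes the copy safe when dest overlaps src from above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C model of the backwards string copy: quadwords from the high end down,
 * then the 0-7 leftover bytes at the low end.  Each unit is read in full
 * before it is written, and writes always stay above later reads, which is
 * what makes this safe when dest overlaps src from above. */
static void *memcpy_backwards_model(void *dst, const void *src, size_t count)
{
	unsigned char *d = (unsigned char *)dst + count;
	const unsigned char *s = (const unsigned char *)src + count;
	size_t words = count / 8;	/* shrl $3, %ecx */
	size_t tail  = count % 8;	/* andl $7, %edx */

	while (words--) {		/* rep movsq, descending */
		uint64_t w;
		d -= 8;
		s -= 8;
		memcpy(&w, s, 8);	/* unaligned-safe load... */
		memcpy(d, &w, 8);	/* ...and store */
	}
	while (tail--)			/* rep movsb, descending */
		*--d = *--s;
	return dst;
}
```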



Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie

On Thu, 16 Sep 2010 12:11:41 +0200, Andi Kleen wrote:

On Thu, 16 Sep 2010 17:29:32 +0800
Miao Xie  wrote:


Ok was a very broken patch. Sorry should have really done some more
work on it. Anyways hopefully the corrected version is good for
testing.

-Andi



title: x86_64/lib: improve the performance of memmove

Implement the 64bit memmove backwards case using string instructions

Signed-off-by: Andi Kleen 
Signed-off-by: Miao Xie 
---
 arch/x86/lib/memcpy_64.S  |   29 +
 arch/x86/lib/memmove_64.c |8 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index bcbcd1e..9de5e9a 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -141,3 +141,32 @@ ENDPROC(__memcpy)
.byte .Lmemcpy_e - .Lmemcpy_c
.byte .Lmemcpy_e - .Lmemcpy_c
.previous
+
+/*
+ * Copy memory backwards (for memmove)
+ * rdi target
+ * rsi source
+ * rdx count
+ */
+
+ENTRY(memcpy_backwards)
+   CFI_STARTPROC
+   std
+   movq %rdi, %rax
+   movl %edx, %ecx
+   addq %rdx, %rdi
+   addq %rdx, %rsi
+   leaq -8(%rdi), %rdi
+   leaq -8(%rsi), %rsi
+   shrl $3, %ecx
+   andl $7, %edx
+   rep movsq
+   addq $7, %rdi
+   addq $7, %rsi
+   movl %edx, %ecx
+   rep movsb
+   cld
+   ret
+   CFI_ENDPROC
+ENDPROC(memcpy_backwards)
+   
diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
index 0a33909..6774fd8 100644
--- a/arch/x86/lib/memmove_64.c
+++ b/arch/x86/lib/memmove_64.c
@@ -5,16 +5,16 @@
 #include 
 #include 
 
+extern void * asmlinkage memcpy_backwards(void *dst, const void *src,
+   size_t count);
+
 #undef memmove
 void *memmove(void *dest, const void *src, size_t count)
 {
if (dest < src) {
return memcpy(dest, src, count);
} else {
-   char *p = dest + count;
-   const char *s = src + count;
-   while (count--)
-   *--p = *--s;
+   return memcpy_backwards(dest, src, count);
}
return dest;
 }
--
1.7.0.1



Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Andi Kleen
On Thu, 16 Sep 2010 17:29:32 +0800
Miao Xie  wrote:


Ok was a very broken patch. Sorry should have really done some more
work on it. Anyways hopefully the corrected version is good for
testing.

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie

On Thu, 16 Sep 2010 10:40:08 +0200, Andi Kleen wrote:

On Thu, 16 Sep 2010 15:16:31 +0800
Miao Xie  wrote:


On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:

When the dest and the src do overlap and the memory area is large,
memmove on x86_64 is very inefficient, and it leads to bad performance,
such as btrfs's file deletion performance. This patch improves the
performance of memmove on x86_64 by using __memcpy_bwd() instead of a
byte-by-byte copy when copying a large memory area (len > 64).



I still don't understand why you don't simply use a backwards
string copy (with std) ? That should be much simpler and
hopefully be as optimized for kernel copies on recent CPUs.


But according to the comment of memcpy, some CPUs don't support "REP"
instruction,


I think you misread the comment. REP prefixes are in all x86 CPUs.
On some very old systems it wasn't optimized very well,
but it probably doesn't make too much sense to optimize for those.
On newer CPUs in fact REP should be usually faster than
an unrolled loop.

I'm not sure how optimized the backwards copy is, but most likely
it is optimized too.

Here's an untested patch that implements backwards copy with string
instructions. Could you run it through your test harness?


Ok, I'll do it.


+
+/*
+ * Copy memory backwards (for memmove)
+ * rdi target
+ * rsi source
+ * rdx count
+ */
+
+ENTRY(memcpy_backwards):


s/://


+   CFI_STARTPROC
+   std
+   movq %rdi, %rax
+   movl %edx, %ecx
+   add  %rdx, %rdi
+   add  %rdx, %rsi


-   add  %rdx, %rdi
-   add  %rdx, %rsi
+   addq  %rdx, %rdi
+   addq  %rdx, %rsi

Besides that, the addresses that %rdi/%rsi point to are past the end of the
memory area that is going to be copied, so we must adjust the addresses to
be correct.
+   leaq -8(%rdi), %rdi
+   leaq -8(%rsi), %rsi


+   shrl $3, %ecx
+   andl $7, %edx
+   rep movsq


The same as above.
+   leaq 8(%rdi), %rdi
+   leaq 8(%rsi), %rsi
+   decq %rsi
+   decq %rdi


+   movl %edx, %ecx
+   rep movsb
+   cld
+   ret
+   CFI_ENDPROC
+ENDPROC(memcpy_backwards)
+   
diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
index 0a33909..6c00304 100644
--- a/arch/x86/lib/memmove_64.c
+++ b/arch/x86/lib/memmove_64.c
@@ -5,16 +5,16 @@
  #include
  #include

+extern void asmlinkage memcpy_backwards(void *dst, const void *src,
+   size_t count);


The type of the return value must be "void *".

Thanks
Miao


+
  #undef memmove
  void *memmove(void *dest, const void *src, size_t count)
  {
	if (dest < src) {
return memcpy(dest, src, count);
} else {
-   char *p = dest + count;
-   const char *s = src + count;
-   while (count--)
-   *--p = *--s;
+   return memcpy_backwards(dest, src, count);
}
return dest;
  }




Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Andi Kleen
On Thu, 16 Sep 2010 15:16:31 +0800
Miao Xie  wrote:

> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
> >> When the dest and the src do overlap and the memory area is large,
> >> memmove on x86_64 is very inefficient, and it leads to bad performance,
> >> such as btrfs's file deletion performance. This patch improves the
> >> performance of memmove on x86_64 by using __memcpy_bwd() instead of a
> >> byte-by-byte copy when copying a large memory area (len > 64).
> >
> >
> > I still don't understand why you don't simply use a backwards
> > string copy (with std) ? That should be much simpler and
> > hopefully be as optimized for kernel copies on recent CPUs.
> 
> But according to the comment of memcpy, some CPUs don't support "REP"
> instruction, 

I think you misread the comment. REP prefixes are in all x86 CPUs.
On some very old systems it wasn't optimized very well,
but it probably doesn't make too much sense to optimize for those.
On newer CPUs in fact REP should be usually faster than 
an unrolled loop.

I'm not sure how optimized the backwards copy is, but most likely
it is optimized too.

Here's an untested patch that implements backwards copy with string
instructions. Could you run it through your test harness?

-Andi


Implement the 64bit memmove backwards case using string instructions

Signed-off-by: Andi Kleen 

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index bcbcd1e..6e8258d 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -141,3 +141,28 @@ ENDPROC(__memcpy)
.byte .Lmemcpy_e - .Lmemcpy_c
.byte .Lmemcpy_e - .Lmemcpy_c
.previous
+
+/* 
+ * Copy memory backwards (for memmove)
+ * rdi target
+ * rsi source
+ * rdx count
+ */
+
+ENTRY(memcpy_backwards):
+   CFI_STARTPROC
+   std
+   movq %rdi, %rax
+   movl %edx, %ecx
+   add  %rdx, %rdi
+   add  %rdx, %rsi
+   shrl $3, %ecx
+   andl $7, %edx
+   rep movsq
+   movl %edx, %ecx
+   rep movsb
+   cld
+   ret
+   CFI_ENDPROC
+ENDPROC(memcpy_backwards)
+   
diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
index 0a33909..6c00304 100644
--- a/arch/x86/lib/memmove_64.c
+++ b/arch/x86/lib/memmove_64.c
@@ -5,16 +5,16 @@
 #include 
 #include 
 
+extern void asmlinkage memcpy_backwards(void *dst, const void *src,
+   size_t count);
+
 #undef memmove
 void *memmove(void *dest, const void *src, size_t count)
 {
if (dest < src) {
return memcpy(dest, src, count);
} else {
-   char *p = dest + count;
-   const char *s = src + count;
-   while (count--)
-   *--p = *--s;
+   return memcpy_backwards(dest, src, count);
}
return dest;
 }



-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] x86_64/lib: improve the performance of memmove

2010-09-16 Thread Miao Xie

On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:

When the dest and the src do overlap and the memory area is large,
memmove on x86_64 is very inefficient, and it leads to bad performance,
such as btrfs's file deletion performance. This patch improves the
performance of memmove on x86_64 by using __memcpy_bwd() instead of a
byte-by-byte copy when copying a large memory area (len > 64).



I still don't understand why you don't simply use a backwards
string copy (with std) ? That should be much simpler and
hopefully be as optimized for kernel copies on recent CPUs.


But according to the comment of memcpy, some CPUs don't support the "REP"
instruction, so I think we must implement a backwards string copy by some
other method for those CPUs. But that implementation is complex, so I wrote
it as a function -- __memcpy_bwd().

Thanks!
Miao