Re: [PATCH] Btrfs: cleanup btrfs_alloc_path()'s caller code

2011-04-11 Thread Tsutomu Itoh
(2011/04/12 7:46), Yoshinori Sano wrote:
> Thank you for your review.
> I modified the previous patch.

Other points still existed. I'm sorry not to point it out at a time.

> 
> Specifically, all the callers that calls the following are modified
> because of the lack of return value check:
> - btrfs_drop_snapshot()
> - btrfs_update_inode()
> - wc->process_func()
> 
> However, BUG_ON() code are increased by this modification.
> 
> Regards,
> 
> Signed-off-by: Yoshinori Sano 
> ---
>  fs/btrfs/dir-item.c |2 +
>  fs/btrfs/extent-tree.c  |   12 +++---
>  fs/btrfs/file-item.c|6 +++-
>  fs/btrfs/file.c |3 +-
>  fs/btrfs/free-space-cache.c |4 ++-
>  fs/btrfs/inode.c|   44 --
>  fs/btrfs/relocation.c   |4 ++-
>  fs/btrfs/root-tree.c|6 +++-
>  fs/btrfs/transaction.c  |9 ++-
>  fs/btrfs/tree-log.c |   24 +++---
>  fs/btrfs/volumes.c  |8 +-
>  11 files changed, 84 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
> index c62f02f..e60bf8e 100644
> --- a/fs/btrfs/dir-item.c
> +++ b/fs/btrfs/dir-item.c
> @@ -142,6 +142,8 @@ int btrfs_insert_dir_item(struct
> btrfs_trans_handle *trans, struct btrfs_root
>   key.offset = btrfs_name_hash(name, name_len);
> 
>   path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
>   path->leave_spinning = 1;
> 
>   data_size = sizeof(*dir_item) + name_len;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f619c3c..b830db8 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -645,7 +645,8 @@ int btrfs_lookup_extent(struct btrfs_root *root,
> u64 start, u64 len)
>   struct btrfs_path *path;
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
>   key.objectid = start;
>   key.offset = len;
>   btrfs_set_key_type(&key, BTRFS_EXTENT_ITEM_KEY);
> @@ -5531,7 +5532,8 @@ static int alloc_reserved_tree_block(struct
> btrfs_trans_handle *trans,
>   u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
> 
>   path->leave_spinning = 1;
>   ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
> @@ -6302,7 +6304,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   int level;
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
> 
>   wc = kzalloc(sizeof(*wc), GFP_NOFS);
>   BUG_ON(!wc);
> @@ -8699,7 +8702,8 @@ int btrfs_remove_block_group(struct
> btrfs_trans_handle *trans,
>   spin_unlock(&cluster->refill_lock);
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
> 
>   inode = lookup_free_space_inode(root, block_group, path);
>   if (!IS_ERR(inode)) {
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index a6a9d4e..097911e 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -281,7 +281,8 @@ int btrfs_lookup_csums_range(struct btrfs_root
> *root, u64 start, u64 end,
>   u16 csum_size = btrfs_super_csum_size(&root->fs_info->super_copy);
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
> 
>   key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
>   key.offset = start;
> @@ -665,7 +666,8 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle 
> *trans,
>   btrfs_super_csum_size(&root->fs_info->super_copy);
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
>   sector_sum = sums->sums;
>  again:
>   next_offset = (u64)-1;
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e621ea5..fe623ea 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -599,7 +599,8 @@ int btrfs_mark_extent_written(struct
> btrfs_trans_handle *trans,
>   btrfs_drop_extent_cache(inode, start, end - 1, 0);
> 
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;

 fs/btrfs/inode.c
  5827 ret = btrfs_mark_extent_written(trans, inode,
  5828 ordered->file_offset,
  5829 ordered->file_offset +
  5830 ordered->len);
  5831 if (ret) {
  5832 err = ret;
  5833 goto out_unlock;
  5834 }

I think that you should insert WARN_ON() or BUG_ON() before 'goto out_unlock'.

>  again:
>   recow = 0;
>   split = start;
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index f56

Re: [PATCH] Btrfs: avoid taking the chunk_mutex in do_chunk_alloc

2011-04-11 Thread liubo
On 04/12/2011 08:30 AM, Josef Bacik wrote:
> Everytime we try to allocate disk space we try and see if we can pre-emptively
> allocate a chunk, but in the common case we don't allocate anything, so there 
> is
> no sense in taking the chunk_mutex at all.  So instead if we are allocating a
> chunk, mark it in the space_info so we don't get two people trying to allocate
> at the same time.  Thanks,
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/ctree.h   |5 +++--
>  fs/btrfs/extent-tree.c |   24 ++--
>  2 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0d00a07..a566780 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -740,10 +740,11 @@ struct btrfs_space_info {
>*/
>   unsigned long reservation_progress;
>  
> - int full;   /* indicates that we cannot allocate any more
> + int full:1; /* indicates that we cannot allocate any more
>  chunks for this space */
> - int force_alloc;/* set if we need to force a chunk alloc for
> + int force_alloc:1;  /* set if we need to force a chunk alloc for
>  this space */
> + int chunk_alloc:1;  /* set if we are allocating a chunk */
>  
>   struct list_head list;
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f619c3c..80c048f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3020,6 +3020,7 @@ static int update_space_info(struct btrfs_fs_info 
> *info, u64 flags,
>   found->bytes_may_use = 0;
>   found->full = 0;
>   found->force_alloc = 0;
> + found->chunk_alloc = 0;
>   *space_info = found;
>   list_add_rcu(&found->list, &info->space_info);
>   atomic_set(&found->caching_threads, 0);
> @@ -3273,10 +3274,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
> *trans,
>  {
>   struct btrfs_space_info *space_info;
>   struct btrfs_fs_info *fs_info = extent_root->fs_info;
> + int wait_for_alloc = 0;
>   int ret = 0;
>  
> - mutex_lock(&fs_info->chunk_mutex);
> -
>   flags = btrfs_reduce_alloc_profile(extent_root, flags);
>  
>   space_info = __find_space_info(extent_root->fs_info, flags);
> @@ -3287,6 +3287,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
> *trans,
>   }
>   BUG_ON(!space_info);
>  
> +again:
>   spin_lock(&space_info->lock);
>   if (space_info->force_alloc)
>   force = 1;
> @@ -3299,9 +3300,27 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
> *trans,
> alloc_bytes)) {
>   spin_unlock(&space_info->lock);
>   goto out;

hmm, the "goto" will lead to problems, cause in "out" clause there is a 
mutex_unlock(), which
we do not have a mutex_lock yet.

thanks,
liubo

> + } else if (space_info->chunk_alloc) {
> + wait_for_alloc = 1;
> + } else {
> + space_info->chunk_alloc = 1;
>   }
>   spin_unlock(&space_info->lock);
>  
> + mutex_lock(&fs_info->chunk_mutex);
> +
> + /*
> +  * The chunk_mutex is held throughout the entirety of a chunk
> +  * allocation, so once we've acquired the chunk_mutex we know that the
> +  * other guy is done and we need to recheck and see if we should
> +  * allocate.
> +  */
> + if (wait_for_alloc) {
> + mutex_unlock(&fs_info->chunk_mutex);
> + wait_for_alloc = 0;
> + goto again;
> + }
> +
>   /*
>* If we have mixed data/metadata chunks we want to make sure we keep
>* allocating mixed chunks instead of individual chunks.
> @@ -3327,6 +3346,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
> *trans,
>   space_info->full = 1;
>   else
>   ret = 1;
> + space_info->chunk_alloc = 0;
>   space_info->force_alloc = 0;
>   spin_unlock(&space_info->lock);
>  out:

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mark internal functions static

2011-04-11 Thread Daniel J Blueman
On 11 April 2011 23:45, Josef Bacik  wrote:
> On 04/11/2011 11:40 AM, Daniel J Blueman wrote:
>>
>> Hi Chris,
>>
>> This didn't make it in before, so updating to 2.6.39-rc2 and resending:
>>
>> Prevent needless exporting of internal functions from compilation
>> units by marking them static.
>>
>
> Looks like you have line wrapping on or something, the page looks mangled.
>  Thanks,

The only way I can solve this in gmail webmail is by attaching the patch.

Is this acceptable? I guess if the mailing list strips patches, I
guess using both may be a get-out-of-jail...

Daniel
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: avoid taking the chunk_mutex in do_chunk_alloc

2011-04-11 Thread Josef Bacik
Everytime we try to allocate disk space we try and see if we can pre-emptively
allocate a chunk, but in the common case we don't allocate anything, so there is
no sense in taking the chunk_mutex at all.  So instead if we are allocating a
chunk, mark it in the space_info so we don't get two people trying to allocate
at the same time.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h   |5 +++--
 fs/btrfs/extent-tree.c |   24 ++--
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d00a07..a566780 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -740,10 +740,11 @@ struct btrfs_space_info {
 */
unsigned long reservation_progress;
 
-   int full;   /* indicates that we cannot allocate any more
+   int full:1; /* indicates that we cannot allocate any more
   chunks for this space */
-   int force_alloc;/* set if we need to force a chunk alloc for
+   int force_alloc:1;  /* set if we need to force a chunk alloc for
   this space */
+   int chunk_alloc:1;  /* set if we are allocating a chunk */
 
struct list_head list;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..80c048f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3020,6 +3020,7 @@ static int update_space_info(struct btrfs_fs_info *info, 
u64 flags,
found->bytes_may_use = 0;
found->full = 0;
found->force_alloc = 0;
+   found->chunk_alloc = 0;
*space_info = found;
list_add_rcu(&found->list, &info->space_info);
atomic_set(&found->caching_threads, 0);
@@ -3273,10 +3274,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
*trans,
 {
struct btrfs_space_info *space_info;
struct btrfs_fs_info *fs_info = extent_root->fs_info;
+   int wait_for_alloc = 0;
int ret = 0;
 
-   mutex_lock(&fs_info->chunk_mutex);
-
flags = btrfs_reduce_alloc_profile(extent_root, flags);
 
space_info = __find_space_info(extent_root->fs_info, flags);
@@ -3287,6 +3287,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
*trans,
}
BUG_ON(!space_info);
 
+again:
spin_lock(&space_info->lock);
if (space_info->force_alloc)
force = 1;
@@ -3299,9 +3300,27 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
*trans,
  alloc_bytes)) {
spin_unlock(&space_info->lock);
goto out;
+   } else if (space_info->chunk_alloc) {
+   wait_for_alloc = 1;
+   } else {
+   space_info->chunk_alloc = 1;
}
spin_unlock(&space_info->lock);
 
+   mutex_lock(&fs_info->chunk_mutex);
+
+   /*
+* The chunk_mutex is held throughout the entirety of a chunk
+* allocation, so once we've acquired the chunk_mutex we know that the
+* other guy is done and we need to recheck and see if we should
+* allocate.
+*/
+   if (wait_for_alloc) {
+   mutex_unlock(&fs_info->chunk_mutex);
+   wait_for_alloc = 0;
+   goto again;
+   }
+
/*
 * If we have mixed data/metadata chunks we want to make sure we keep
 * allocating mixed chunks instead of individual chunks.
@@ -3327,6 +3346,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
*trans,
space_info->full = 1;
else
ret = 1;
+   space_info->chunk_alloc = 0;
space_info->force_alloc = 0;
spin_unlock(&space_info->lock);
 out:
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: cleanup btrfs_alloc_path()'s caller code

2011-04-11 Thread Yoshinori Sano
Thank you for your review.
I modified the previous patch.

Specifically, all the callers that calls the following are modified
because of the lack of return value check:
- btrfs_drop_snapshot()
- btrfs_update_inode()
- wc->process_func()

However, BUG_ON() code are increased by this modification.

Regards,

Signed-off-by: Yoshinori Sano 
---
 fs/btrfs/dir-item.c |2 +
 fs/btrfs/extent-tree.c  |   12 +++---
 fs/btrfs/file-item.c|6 +++-
 fs/btrfs/file.c |3 +-
 fs/btrfs/free-space-cache.c |4 ++-
 fs/btrfs/inode.c|   44 --
 fs/btrfs/relocation.c   |4 ++-
 fs/btrfs/root-tree.c|6 +++-
 fs/btrfs/transaction.c  |9 ++-
 fs/btrfs/tree-log.c |   24 +++---
 fs/btrfs/volumes.c  |8 +-
 11 files changed, 84 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
index c62f02f..e60bf8e 100644
--- a/fs/btrfs/dir-item.c
+++ b/fs/btrfs/dir-item.c
@@ -142,6 +142,8 @@ int btrfs_insert_dir_item(struct
btrfs_trans_handle *trans, struct btrfs_root
key.offset = btrfs_name_hash(name, name_len);

path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
path->leave_spinning = 1;

data_size = sizeof(*dir_item) + name_len;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..b830db8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -645,7 +645,8 @@ int btrfs_lookup_extent(struct btrfs_root *root,
u64 start, u64 len)
struct btrfs_path *path;

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
key.objectid = start;
key.offset = len;
btrfs_set_key_type(&key, BTRFS_EXTENT_ITEM_KEY);
@@ -5531,7 +5532,8 @@ static int alloc_reserved_tree_block(struct
btrfs_trans_handle *trans,
u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;

path->leave_spinning = 1;
ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
@@ -6302,7 +6304,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
int level;

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;

wc = kzalloc(sizeof(*wc), GFP_NOFS);
BUG_ON(!wc);
@@ -8699,7 +8702,8 @@ int btrfs_remove_block_group(struct
btrfs_trans_handle *trans,
spin_unlock(&cluster->refill_lock);

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;

inode = lookup_free_space_inode(root, block_group, path);
if (!IS_ERR(inode)) {
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index a6a9d4e..097911e 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -281,7 +281,8 @@ int btrfs_lookup_csums_range(struct btrfs_root
*root, u64 start, u64 end,
u16 csum_size = btrfs_super_csum_size(&root->fs_info->super_copy);

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;

key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
key.offset = start;
@@ -665,7 +666,8 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
btrfs_super_csum_size(&root->fs_info->super_copy);

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
sector_sum = sums->sums;
 again:
next_offset = (u64)-1;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e621ea5..fe623ea 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -599,7 +599,8 @@ int btrfs_mark_extent_written(struct
btrfs_trans_handle *trans,
btrfs_drop_extent_cache(inode, start, end - 1, 0);

path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 again:
recow = 0;
split = start;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f561c95..101b96c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -522,6 +522,7 @@ int btrfs_write_out_cache(struct btrfs_root *root,
int num_checksums;
int entries = 0;
int bitmaps = 0;
+   int err;
int ret = 0;
bool next_page = false;

@@ -853,7 +854,8 @@ out_free:
BTRFS_I(inode)->generation = 0;
}
kfree(checksums);
-   btrfs_update_inode(trans, root, inode);
+   err = btrfs_update_inode(trans, root, inode);
+   BUG_ON(err);
iput(inode);
return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cc60228..c023053 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -201,9 +201,9 @@ static noinline int insert_inline_extent(struct
btrfs_trans_handle *trans,

Re: v2.6.38-6555-ga44f99c: null pointer dereference on -ENOSPC

2011-04-11 Thread Sergei Trofimovich
On Sun, 3 Apr 2011 00:10:28 +0300
Sergei Trofimovich  wrote:

> > The partition is a physical ~5GB --mixed lzo compressed partition.

I used wrong patch for btrfs-progs to support --mixed, so please treat
it as OOps on horribly corrupted FS.

-- 

  Sergei


signature.asc
Description: PGP signature


[PATCH v4] btrfs: properly handle overlapping areas in memmove_extent_buffer

2011-04-11 Thread Sergei Trofimovich
Fix data corruption caused by memcpy() usage on overlapping data.
I've observed it first when found out usermode linux crash on btrfs.

Сall chain is the following:
[ cut here ]
WARNING: at /home/slyfox/linux-2.6/fs/btrfs/extent_io.c:3900 
memcpy_extent_buffer+0x1a5/0x219()
Call Trace:
6fa39a58:  [<601b495e>] _raw_spin_unlock_irqrestore+0x18/0x1c
6fa39a68:  [<60029ad9>] warn_slowpath_common+0x59/0x70
6fa39aa8:  [<60029b05>] warn_slowpath_null+0x15/0x17
6fa39ab8:  [<600efc97>] memcpy_extent_buffer+0x1a5/0x219
6fa39b48:  [<600efd9f>] memmove_extent_buffer+0x94/0x208
6fa39bc8:  [<600becbf>] btrfs_del_items+0x214/0x473
6fa39c78:  [<600ce1b0>] btrfs_delete_one_dir_name+0x7c/0xda
6fa39cc8:  [<600dad6b>] __btrfs_unlink_inode+0xad/0x25d
6fa39d08:  [<600d7864>] btrfs_start_transaction+0xe/0x10
6fa39d48:  [<600dc9ff>] btrfs_unlink_inode+0x1b/0x3b
6fa39d78:  [<600e04bc>] btrfs_unlink+0x70/0xef
6fa39dc8:  [<6007f0d0>] vfs_unlink+0x58/0xa3
6fa39df8:  [<60080278>] do_unlinkat+0xd4/0x162
6fa39e48:  [<600517db>] call_rcu_sched+0xe/0x10
6fa39e58:  [<600452a8>] __put_cred+0x58/0x5a
6fa39e78:  [<6007446c>] sys_faccessat+0x154/0x166
6fa39ed8:  [<60080317>] sys_unlink+0x11/0x13
6fa39ee8:  [<60016b80>] handle_syscall+0x58/0x70
6fa39f08:  [<60021377>] userspace+0x2d4/0x381
6fa39fc8:  [<60014507>] fork_handler+0x62/0x69
---[ end trace 70b0ca2ef0266b93 ]---

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09302.html

Signed-off-by: Sergei Trofimovich 
Reviewed-by: Josef Bacik 
---
Changes since v3:
- Added Josef's Reviewed-by

Changes since v2:
- Code style cleanup
- 2 versions of patch: BUG_ON and WARN_ON variants,
   _but_ see below why I prefer BUG_ON

Changes since v1:
 
   else
   src_kaddr = dst_kaddr;

 +BUG_ON(abs(src_off - dst_off)<  len);
   memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);  

Too eager BUG_ON. Now used only for src_page == dst_page.
 
 -if (dst_offset<  src_offset) {
 +if (abs(dst_offset - src_offset)>= len) {  

abs() is not a good thing to use un unsigned values. aded helper 
overlapping_areas.

 fs/btrfs/extent_io.c |   14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 20ddb28..10db989 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3885,6 +3885,12 @@ static void move_pages(struct page *dst_page, struct 
page *src_page,
kunmap_atomic(dst_kaddr, KM_USER0);
 }
 
+static inline bool areas_overlap(unsigned long src, unsigned long dst, 
unsigned long len)
+{
+   unsigned long distance = (src > dst) ? src - dst : dst - src;
+   return distance < len;
+}
+
 static void copy_pages(struct page *dst_page, struct page *src_page,
   unsigned long dst_off, unsigned long src_off,
   unsigned long len)
@@ -3892,10 +3898,12 @@ static void copy_pages(struct page *dst_page, struct 
page *src_page,
char *dst_kaddr = kmap_atomic(dst_page, KM_USER0);
char *src_kaddr;
 
-   if (dst_page != src_page)
+   if (dst_page != src_page) {
src_kaddr = kmap_atomic(src_page, KM_USER1);
-   else
+   } else {
src_kaddr = dst_kaddr;
+   BUG_ON(areas_overlap(src_off, dst_off, len));
+   }
 
memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);
kunmap_atomic(dst_kaddr, KM_USER0);
@@ -3970,7 +3978,7 @@ void memmove_extent_buffer(struct extent_buffer *dst, 
unsigned long dst_offset,
   "len %lu len %lu\n", dst_offset, len, dst->len);
BUG_ON(1);
}
-   if (dst_offset < src_offset) {
+   if (!areas_overlap(src_offset, dst_offset, len)) {
memcpy_extent_buffer(dst, dst_offset, src_offset, len);
return;
}
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: only drop the trans_mutex in join_transaction if we have to

2011-04-11 Thread Josef Bacik
I noticed we are dropping the trans_mutex and then immediately re-acquiring it
in the common case in join_transaction.  Instead of doing that, just drop it if
we have to (like we have to commit or end the transaction or something) and
otherwise just keep a hold of it.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/transaction.c |   20 +---
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c571734..cde38f9 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -181,6 +181,7 @@ static struct btrfs_trans_handle *start_transaction(struct 
btrfs_root *root,
struct btrfs_transaction *cur_trans;
int retries = 0;
int ret;
+   bool lock = (type != TRANS_JOIN_NOLOCK);
 
if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
return ERR_PTR(-EROFS);
@@ -189,7 +190,7 @@ again:
if (!h)
return ERR_PTR(-ENOMEM);
 
-   if (type != TRANS_JOIN_NOLOCK)
+   if (lock)
mutex_lock(&root->fs_info->trans_mutex);
if (may_wait_transaction(root, type))
wait_current_trans(root);
@@ -197,15 +198,13 @@ again:
ret = join_transaction(root);
if (ret < 0) {
kmem_cache_free(btrfs_trans_handle_cachep, h);
-   if (type != TRANS_JOIN_NOLOCK)
+   if (lock)
mutex_unlock(&root->fs_info->trans_mutex);
return ERR_PTR(ret);
}
 
cur_trans = root->fs_info->running_transaction;
atomic_inc(&cur_trans->use_count);
-   if (type != TRANS_JOIN_NOLOCK)
-   mutex_unlock(&root->fs_info->trans_mutex);
 
h->transid = cur_trans->transid;
h->transaction = cur_trans;
@@ -217,6 +216,11 @@ again:
 
smp_mb();
if (cur_trans->blocked && may_wait_transaction(root, type)) {
+   /*
+* No need to test for lock since may_wait_transaction will only
+* return 1 if we had to take the trans_mutex anyway.
+*/
+   mutex_unlock(&root->fs_info->trans_mutex);
btrfs_commit_transaction(h, root);
goto again;
}
@@ -225,6 +229,8 @@ again:
ret = btrfs_trans_reserve_metadata(h, root, num_items);
if (ret == -EAGAIN && !retries) {
retries++;
+   if (lock)
+   mutex_unlock(&root->fs_info->trans_mutex);
btrfs_commit_transaction(h, root);
goto again;
} else if (ret == -EAGAIN) {
@@ -236,15 +242,15 @@ again:
}
 
if (ret < 0) {
+   if (lock)
+   mutex_unlock(&root->fs_info->trans_mutex);
btrfs_end_transaction(h, root);
return ERR_PTR(ret);
}
}
 
-   if (type != TRANS_JOIN_NOLOCK)
-   mutex_lock(&root->fs_info->trans_mutex);
record_root_in_trans(h, root);
-   if (type != TRANS_JOIN_NOLOCK)
+   if (lock)
mutex_unlock(&root->fs_info->trans_mutex);
 
if (!current->journal_info && type != TRANS_USERSPACE)
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: avoid taking the trans_mutex in btrfs_end_transaction V2

2011-04-11 Thread Josef Bacik
I've been working on making our O_DIRECT latency not suck and I noticed we were
taking the trans_mutex in btrfs_end_transaction.  So to do this we convert
num_writers and use_count to atomic_t's and just decrement them in
btrfs_end_transaction.  Instead of deleting the transaction from the trans list
in put_transaction we do that in btrfs_commit_transaction() since that's the
only time it actually needs to be removed from the list.  Thanks,

Signed-off-by: Josef Bacik 
---
V1->V2: make put_transaction just free the transaction and then change
btrfs_commit_transaction to do the trans list deletion.

 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/transaction.c |   37 -
 fs/btrfs/transaction.h |4 ++--
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f1d44b..68c84c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,7 +3057,7 @@ static int btrfs_cleanup_transaction(struct btrfs_root 
*root)
btrfs_destroy_pinned_extent(root,
root->fs_info->pinned_extents);
 
-   t->use_count = 0;
+   atomic_set(&t->use_count, 0);
list_del_init(&t->list);
memset(t, 0, sizeof(*t));
kmem_cache_free(btrfs_transaction_cachep, t);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4583008..c571734 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -32,10 +32,8 @@
 
 static noinline void put_transaction(struct btrfs_transaction *transaction)
 {
-   WARN_ON(transaction->use_count == 0);
-   transaction->use_count--;
-   if (transaction->use_count == 0) {
-   list_del_init(&transaction->list);
+   WARN_ON(atomic_read(&transaction->use_count) == 0);
+   if (atomic_dec_and_test(&transaction->use_count)) {
memset(transaction, 0, sizeof(*transaction));
kmem_cache_free(btrfs_transaction_cachep, transaction);
}
@@ -60,14 +58,14 @@ static noinline int join_transaction(struct btrfs_root 
*root)
if (!cur_trans)
return -ENOMEM;
root->fs_info->generation++;
-   cur_trans->num_writers = 1;
+   atomic_set(&cur_trans->num_writers, 1);
cur_trans->num_joined = 0;
cur_trans->transid = root->fs_info->generation;
init_waitqueue_head(&cur_trans->writer_wait);
init_waitqueue_head(&cur_trans->commit_wait);
cur_trans->in_commit = 0;
cur_trans->blocked = 0;
-   cur_trans->use_count = 1;
+   atomic_set(&cur_trans->use_count, 1);
cur_trans->commit_done = 0;
cur_trans->start_time = get_seconds();
 
@@ -88,7 +86,7 @@ static noinline int join_transaction(struct btrfs_root *root)
root->fs_info->running_transaction = cur_trans;
spin_unlock(&root->fs_info->new_trans_lock);
} else {
-   cur_trans->num_writers++;
+   atomic_inc(&cur_trans->num_writers);
cur_trans->num_joined++;
}
 
@@ -145,7 +143,7 @@ static void wait_current_trans(struct btrfs_root *root)
cur_trans = root->fs_info->running_transaction;
if (cur_trans && cur_trans->blocked) {
DEFINE_WAIT(wait);
-   cur_trans->use_count++;
+   atomic_inc(&cur_trans->use_count);
while (1) {
prepare_to_wait(&root->fs_info->transaction_wait, &wait,
TASK_UNINTERRUPTIBLE);
@@ -205,7 +203,7 @@ again:
}
 
cur_trans = root->fs_info->running_transaction;
-   cur_trans->use_count++;
+   atomic_inc(&cur_trans->use_count);
if (type != TRANS_JOIN_NOLOCK)
mutex_unlock(&root->fs_info->trans_mutex);
 
@@ -336,7 +334,7 @@ int btrfs_wait_for_commit(struct btrfs_root *root, u64 
transid)
goto out_unlock;  /* nothing committing|committed */
}
 
-   cur_trans->use_count++;
+   atomic_inc(&cur_trans->use_count);
mutex_unlock(&root->fs_info->trans_mutex);
 
wait_for_commit(root, cur_trans);
@@ -466,18 +464,14 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
wake_up_process(info->transaction_kthread);
}
 
-   if (lock)
-   mutex_lock(&info->trans_mutex);
WARN_ON(cur_trans != info->running_transaction);
-   WARN_ON(cur_trans->num_writers < 1);
-   cur_trans->num_writers--;
+   WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
+   atomic_dec(&cur_trans->num_writers);
 
smp_mb();
if (waitqueue_active(&cur_trans->writer_wait))
wake_up(&cur_trans->writer_wait);
put_transaction(cur_trans);
-   if (lock)
- 

Re: [PATCH] Btrfs: avoid taking the trans_mutex in btrfs_end_transaction

2011-04-11 Thread Chris Mason
Excerpts from Josef Bacik's message of 2011-04-11 15:49:17 -0400:
> I've been working on making our O_DIRECT latency not suck and I noticed we 
> were
> taking the trans_mutex in btrfs_end_transaction.  So to do this we convert
> num_writers and use_count to atomic_t's and just decrement them in
> btrfs_end_transaction.  I got rid of the put_transaction() in
> btrfs_end_transaction() since we will never free the transaction from
> btrfs_end_transaction().  Tested this with xfstests and everything went
> smoothly.  Thanks,

Hmmm, this is smart, there's no reason to take the lock here.  But
there's one little problem:

> @@ -466,19 +465,20 @@ static int __btrfs_end_transaction(struct 
> btrfs_trans_handle *trans,
>  wake_up_process(info->transaction_kthread);
>  }
>  
> -if (lock)
> -mutex_lock(&info->trans_mutex);
>  WARN_ON(cur_trans != info->running_transaction);
> -WARN_ON(cur_trans->num_writers < 1);
> -cur_trans->num_writers--;
> +WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
> +atomic_dec(&cur_trans->num_writers);

We've just decremented num_writers, which could have been the only thing
preventing a commit from finishing.  The entire commit could finish at
any time after this line.

>  
>  smp_mb();
>  if (waitqueue_active(&cur_trans->writer_wait))
>  wake_up(&cur_trans->writer_wait);
> -put_transaction(cur_trans);
> -if (lock)
> -mutex_unlock(&info->trans_mutex);
>  
> +/*
> + * A trans handle will never actually be putting the last reference of a
> + * transaction, so just dec the use count to avoid taking the trans
> + * mutex.
> + */
> +atomic_dec(&cur_trans->use_count);

Which could make this the last and final reference on
cur_trans->use_count.

There's no reason we need the lock in put_transaction, other than
manipulating the list of running transactions.  But I don't see any
reason we can't delete ourselves from the list when the commit is done,
which would let you keep a lock less put_transaction.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix easily get into ENOSPC in mixed case

2011-04-11 Thread Sergei Trofimovich
On Mon, 11 Apr 2011 14:29:21 +0800
liubo  wrote:

> On 04/09/2011 05:55 AM, Sergei Trofimovich wrote:
> > [  100.500011] Call Trace:
> > [  100.500011]  [] vfs_unlink+0x80/0xf0
> > [  100.500011]  [] do_unlinkat+0x173/0x1b0
> > [  100.500011]  [] ? fsnotify_find_inode_mark+0x3b/0x50
> > [  100.500011]  [] ? filp_close+0x61/0x90
> > [  100.500011]  [] sys_unlinkat+0x1d/0x40
> > [  100.500011]  [] system_call_fastpath+0x16/0x1b
> > [  100.500011] Code: 4c 8b 65 e0 48 8b 5d d8 4c 8b 6d e8 4c 8b 75 f0 4c 8b 
> > 7d f8 c9 c3 0f 1f 40 00 4c 89 fe 4c 89 ef e8 05 d0 ff ff 85 c0 74 bb 0f 0b 
> > <0f> 0b 89 c3 eb cd 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
> > [  100.500011] RIP  [] btrfs_unlink+0xd1/0xe0 [btrfs]
> > [  100.500011]  RSP 
> > [  100.525672] ---[ end trace 7e63b9144b7307fe ]---
> > 
> > Looks like I won't be able to test your patch until this thing will go away 
> > first.
> 
> Thanks a lot for testing, though.
> 
> I guess something messed up your btrfs metadata, cause when btrfs_unlink() 
> wanted to remove A,
> it found that A was just missing...
Looks like it's ret = -28 (a ENOSPC). Yes, you are right. Moreover, as Arne 
found out,
I used wrong patch for btrfs-progs to create --mixed filesystems. I set wrong 
bit in
superblock (the 8ULL << 0, not 4ULL << 0) - the COMPRESS_LZO, so my metadata is
really screwed.

Please, disregard all my OOpses reported against --mixed FS. Sorry.

-- 

  Sergei


signature.asc
Description: PGP signature


Mount btrfs segmentation fault

2011-04-11 Thread Leszek Ciesielski
Hi,

after my machine hung during "btrfs filesystem balance /mnt", I am now
unable to mount the filesystem due to a segmentation fault. Is there
any hope of gaining access to the filesystem? What more should I send
to help diagnose the bug? (I am aware that an fs can get corrupted,
but this should never force mount to segfault, am I right?)

At the moment of the crash I was running latest released version
(2.6.38 + 0.19r2), the logs attached were generated with git master.

Regards,

Leszek
[   41.801741] device label home devid 7 transid 24300 /dev/sda8
[   41.816375] device label home devid 5 transid 24300 /dev/sdb8
[   41.840439] device label home devid 6 transid 24300 /dev/sdc8
[   41.848799] device label home devid 7 transid 24300 /dev/sda8
[   41.851987] btrfs: use zlib compression
[   41.852000] btrfs: enabling disk space caching
[   47.150008] [ cut here ]
[   47.150193] kernel BUG at fs/btrfs/extent-tree.c:1399!
[   47.150371] invalid opcode:  [#1] PREEMPT SMP 
[   47.150377] last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
[   47.150377] CPU 1 
[   47.150377] Modules linked in: it87 hwmon_vid vboxnetadp vboxnetflt vboxdrv 
usbhid nvidia(P) i2c_piix4 ehci_hcd ohci_hcd r8169 usbcore k10temp hwmon mii
[   47.150377] 
[   47.150377] Pid: 7864, comm: mount Tainted: PW   2.6.38+ #1 Gigabyte 
Technology Co., Ltd. GA-MA770T-UD3P/GA-MA770T-UD3P
[   47.150377] RIP: 0010:[]  [] 
lookup_inline_extent_backref+0xde/0x3a4
[   47.150377] RSP: 0018:88022ef35798  EFLAGS: 00010202
[   47.150377] RAX: 0001 RBX: 88022bd0af30 RCX: 0003
[   47.150377] RDX: 0001 RSI:  RDI: 88022bd6a880
[   47.150377] RBP: 88022ef35838 R08: 01d4 R09: 88022ef35680
[   47.150377] R10: 88022ef356d8 R11: 88022ef35618 R12: 00b0
[   47.150377] R13: 88020fe57380 R14: 0001 R15: 0007
[   47.150377] FS:  7fd05cbb0740() GS:8800cfb0() 
knlGS:
[   47.150377] CS:  0010 DS:  ES:  CR0: 8005003b
[   47.150377] CR2: 7fc774f95f80 CR3: 000229446000 CR4: 06e0
[   47.150377] DR0:  DR1:  DR2: 
[   47.150377] DR3:  DR6: 0ff0 DR7: 0400
[   47.150377] Process mount (pid: 7864, threadinfo 88022ef34000, task 
8802290dd7c0)
[   47.150377] Stack:
[   47.150377]  8802296cc000  88022ef357c8 
88022ef358e0
[   47.150377]   016098523000 88022ef35838 
88022c8e4800
[   47.150377]  2ef357e8 0296 016098523000 
001000a8
[   47.150377] Call Trace:
[   47.150377]  [] __btrfs_free_extent+0xc3/0x588
[   47.150377]  [] ? T.844+0x12/0x2d
[   47.150377]  [] vfs_kern_mount+0xb8/0x1d1
[   47.150377]  [] do_kern_mount+0x48/0xd8
[   47.150377]  [] do_mount+0x729/0x791
[   47.150377]  [] ? copy_mount_options+0xd6/0x13b
[   47.150377]  [] sys_mount+0x83/0xbd
[   47.150377]  [] system_call_fastpath+0x16/0x1b
[   47.150377] Code: 44 8b 45 a4 48 8b 75 98 48 8d 55 b0 41 b9 01 00 00 00 48 
89 d9 4c 89 ef e8 f0 95 ff ff 83 f8 00 41 89 c6 0f 8c 94 02 00 00 74 04 <0f> 0b 
eb fe 4c 8b 33 8b 73 40 4c 89 f7 e8 f9 ea ff ff 41 89 c7 
[   47.150377] RIP  [] lookup_inline_extent_backref+0xde/0x3a4
[   47.150377]  RSP 
[   47.163919] ---[ end trace 2f498c6a919af78a ]---
[   47.164113] note: mount[7864] exited with preempt_count 3
[   47.164316] BUG: scheduling while atomic: mount/7864/0x1004
[   47.164499] Modules linked in: it87 hwmon_vid vboxnetadp vboxnetflt vboxdrv 
usbhid nvidia(P) i2c_piix4 ehci_hcd ohci_hcd r8169 usbcore k10temp hwmon mii
[   47.165034] Pid: 7864, comm: mount Tainted: P  D W   2.6.38+ #1
[   47.165039] Call Trace:
[   47.165048]  [] ? __schedule_bug+0x57/0x5c
[   47.165056]  [] ? schedule+0xe3/0x695
[   47.165067]  [] ? __cond_resched+0x13/0x1f
[   47.165073]  [] ? _cond_resched+0x16/0x1d
[   47.165083]  [] ? unmap_vmas+0x718/0x78d
[   47.165096]  [] ? exit_mmap+0x82/0xe6
[   47.165106]  [] ? mmput+0x3e/0xe5
[   47.165114]  [] ? exit_mm+0x129/0x136
[   47.165122]  [] ? do_exit+0x211/0x7a3
[   47.165129]  [] ? kmsg_dump+0xe5/0xf4
[   47.165139]  [] ? oops_end+0xb1/0xb9
[   47.165147]  [] ? die+0x55/0x5e
[   47.165156]  [] ? do_trap+0x11c/0x12b
[   47.165164]  [] ? do_invalid_op+0x91/0x9a
[   47.165173]  [] ? lookup_inline_extent_backref+0xde/0x3a4
[   47.165182]  [] ? read_block_for_search+0x97/0x2d1
[   47.165193]  [] ? invalid_op+0x15/0x20
[   47.165203]  [] ? lookup_inline_extent_backref+0xde/0x3a4
[   47.165212]  [] ? lookup_inline_extent_backref+0xd0/0x3a4
[   47.165222]  [] ? __btrfs_free_extent+0xc3/0x588
[   47.165230]  [] ? T.844+0x12/0x2d
[   47.165237]  [] ? __slab_free+0x7f/0x108
[   47.165245]  [] ? get_parent_ip+0x11/0x41
[   47.165253]  [] ? run_clustered_refs+0x695/0x6e5
[   47.165264]  [] ? btrfs_run_delayed_refs+0xcc/0x17c
[   47.165

[PATCH] Btrfs: avoid taking the trans_mutex in btrfs_end_transaction

2011-04-11 Thread Josef Bacik
I've been working on making our O_DIRECT latency not suck and I noticed we were
taking the trans_mutex in btrfs_end_transaction.  So to do this we convert
num_writers and use_count to atomic_t's and just decrement them in
btrfs_end_transaction.  I got rid of the put_transaction() in
btrfs_end_transaction() since we will never free the transaction from
btrfs_end_transaction().  Tested this with xfstests and everything went
smoothly.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/transaction.c |   42 +-
 fs/btrfs/transaction.h |4 ++--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f1d44b..68c84c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,7 +3057,7 @@ static int btrfs_cleanup_transaction(struct btrfs_root 
*root)
btrfs_destroy_pinned_extent(root,
root->fs_info->pinned_extents);
 
-   t->use_count = 0;
+   atomic_set(&t->use_count, 0);
list_del_init(&t->list);
memset(t, 0, sizeof(*t));
kmem_cache_free(btrfs_transaction_cachep, t);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4583008..dbacd2c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -32,9 +32,8 @@
 
 static noinline void put_transaction(struct btrfs_transaction *transaction)
 {
-   WARN_ON(transaction->use_count == 0);
-   transaction->use_count--;
-   if (transaction->use_count == 0) {
+   WARN_ON(atomic_read(&transaction->use_count) == 0);
+   if (atomic_dec_and_test(&transaction->use_count)) {
list_del_init(&transaction->list);
memset(transaction, 0, sizeof(*transaction));
kmem_cache_free(btrfs_transaction_cachep, transaction);
@@ -60,14 +59,14 @@ static noinline int join_transaction(struct btrfs_root 
*root)
if (!cur_trans)
return -ENOMEM;
root->fs_info->generation++;
-   cur_trans->num_writers = 1;
+   atomic_set(&cur_trans->num_writers, 1);
cur_trans->num_joined = 0;
cur_trans->transid = root->fs_info->generation;
init_waitqueue_head(&cur_trans->writer_wait);
init_waitqueue_head(&cur_trans->commit_wait);
cur_trans->in_commit = 0;
cur_trans->blocked = 0;
-   cur_trans->use_count = 1;
+   atomic_set(&cur_trans->use_count, 1);
cur_trans->commit_done = 0;
cur_trans->start_time = get_seconds();
 
@@ -88,7 +87,7 @@ static noinline int join_transaction(struct btrfs_root *root)
root->fs_info->running_transaction = cur_trans;
spin_unlock(&root->fs_info->new_trans_lock);
} else {
-   cur_trans->num_writers++;
+   atomic_inc(&cur_trans->num_writers);
cur_trans->num_joined++;
}
 
@@ -145,7 +144,7 @@ static void wait_current_trans(struct btrfs_root *root)
cur_trans = root->fs_info->running_transaction;
if (cur_trans && cur_trans->blocked) {
DEFINE_WAIT(wait);
-   cur_trans->use_count++;
+   atomic_inc(&cur_trans->use_count);
while (1) {
prepare_to_wait(&root->fs_info->transaction_wait, &wait,
TASK_UNINTERRUPTIBLE);
@@ -205,7 +204,7 @@ again:
}
 
cur_trans = root->fs_info->running_transaction;
-   cur_trans->use_count++;
+   atomic_inc(&cur_trans->use_count);
if (type != TRANS_JOIN_NOLOCK)
mutex_unlock(&root->fs_info->trans_mutex);
 
@@ -336,7 +335,7 @@ int btrfs_wait_for_commit(struct btrfs_root *root, u64 
transid)
goto out_unlock;  /* nothing committing|committed */
}
 
-   cur_trans->use_count++;
+   atomic_inc(&cur_trans->use_count);
mutex_unlock(&root->fs_info->trans_mutex);
 
wait_for_commit(root, cur_trans);
@@ -466,19 +465,20 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
wake_up_process(info->transaction_kthread);
}
 
-   if (lock)
-   mutex_lock(&info->trans_mutex);
WARN_ON(cur_trans != info->running_transaction);
-   WARN_ON(cur_trans->num_writers < 1);
-   cur_trans->num_writers--;
+   WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
+   atomic_dec(&cur_trans->num_writers);
 
smp_mb();
if (waitqueue_active(&cur_trans->writer_wait))
wake_up(&cur_trans->writer_wait);
-   put_transaction(cur_trans);
-   if (lock)
-   mutex_unlock(&info->trans_mutex);
 
+   /*
+* A trans handle will never actually be putting the last reference of a
+   

Re: [PATCH v3] Re: btrfs does not work on usermode linux

2011-04-11 Thread Josef Bacik

On 04/11/2011 03:44 PM, Sergei Trofimovich wrote:

Fix data corruption caused by memcpy() usage on overlapping data.
I've observed it first when found out usermode linux crash on btrfs.


Changes since v2:
- Code style cleanup
- 2 versions of patch: BUG_ON and WARN_ON variants,
   _but_ see below why I prefer BUG_ON

Changes since v1:


else
src_kaddr = dst_kaddr;

+   BUG_ON(abs(src_off - dst_off)<  len);
memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);


Too eager BUG_ON. Now used only for src_page == dst_page.


-   if (dst_offset<  src_offset) {
+   if (abs(dst_offset - src_offset)>= len) {


abs() is not a good thing to use un unsigned values. aded helper 
overlapping_areas.

On Mon, 11 Apr 2011 11:37:57 -0400
Josef Bacik  wrote:


+   {
you will want to turn that into

if (dst_page != src_page) {


done


Also maybe BUG_ON() is a little strong, since the kernel will do this
right, it just screws up UML.  So maybe just do a WARN_ON() so we notice
it.  Thanks,


I'm afaid I didn't understand this part. The commit I've found a deviation
was linux's implementation of memcpy (UML uses it as kernel does). Why the
kernel differs to UML in that respect? Seems I don't know/understand something
fundamental here.
So, if data overlaps - it's a moment before data corruption, thus BUG_ON.

Another thought is (if memcpy semantics differ from standard C's function):
does linux's memcpy guarantee copying direction behaviour?
If it does, then it's really a weird memmove and x86/memcpy_64.S is a bit 
broken.

Attached both patches, I personally like BUG_ON variant.
Pick the one you like more :]

Thanks for the feedback!



Fair enough, BUG_ON() it is.  Repost that version and you can add my

Reviewed-by: Josef Bacik 

Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Re: btrfs does not work on usermode linux

2011-04-11 Thread Niklas Schnelle
I think the problem here is that memcpy beahviour changed in recent
glibc in this regard see here http://lwn.net/Articles/414467/ 

On Mon, 2011-04-11 at 22:44 +0300, Sergei Trofimovich wrote:
> > Fix data corruption caused by memcpy() usage on overlapping data.
> > I've observed it first when found out usermode linux crash on btrfs.  
> 
> Changes since v2:
> - Code style cleanup
> - 2 versions of patch: BUG_ON and WARN_ON variants,
>   _but_ see below why I prefer BUG_ON
> 
> Changes since v1:
> 
> > else
> > src_kaddr = dst_kaddr;
> >  
> > +   BUG_ON(abs(src_off - dst_off) < len);
> > memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);  
> 
> Too eager BUG_ON. Now used only for src_page == dst_page.
> 
> > -   if (dst_offset < src_offset) {
> > +   if (abs(dst_offset - src_offset) >= len) {  
> 
> abs() is not a good thing to use un unsigned values. aded helper 
> overlapping_areas.
> 
> On Mon, 11 Apr 2011 11:37:57 -0400
> Josef Bacik  wrote:
> 
> > +   {
> > you will want to turn that into
> > 
> > if (dst_page != src_page) {
> 
> done
> 
> > Also maybe BUG_ON() is a little strong, since the kernel will do this 
> > right, it just screws up UML.  So maybe just do a WARN_ON() so we notice 
> > it.  Thanks,
> 
> I'm afaid I didn't understand this part. The commit I've found a deviation
> was linux's implementation of memcpy (UML uses it as kernel does). Why the
> kernel differs to UML in that respect? Seems I don't know/understand something
> fundamental here.
> So, if data overlaps - it's a moment before data corruption, thus BUG_ON.
> 
> Another thought is (if memcpy semantics differ from standard C's function):
> does linux's memcpy guarantee copying direction behaviour?
> If it does, then it's really a weird memmove and x86/memcpy_64.S is a bit 
> broken.
> 
> Attached both patches, I personally like BUG_ON variant.
> Pick the one you like more :]
> 
> Thanks for the feedback!
> 



signature.asc
Description: This is a digitally signed message part


[PATCH v3] Re: btrfs does not work on usermode linux

2011-04-11 Thread Sergei Trofimovich
> Fix data corruption caused by memcpy() usage on overlapping data.
> I've observed it first when found out usermode linux crash on btrfs.  

Changes since v2:
- Code style cleanup
- 2 versions of patch: BUG_ON and WARN_ON variants,
  _but_ see below why I prefer BUG_ON

Changes since v1:

>   else
>   src_kaddr = dst_kaddr;
>  
> + BUG_ON(abs(src_off - dst_off) < len);
>   memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);  

Too eager BUG_ON. Now used only for src_page == dst_page.

> - if (dst_offset < src_offset) {
> + if (abs(dst_offset - src_offset) >= len) {  

abs() is not a good thing to use un unsigned values. aded helper 
overlapping_areas.

On Mon, 11 Apr 2011 11:37:57 -0400
Josef Bacik  wrote:

> + {
> you will want to turn that into
> 
> if (dst_page != src_page) {

done

> Also maybe BUG_ON() is a little strong, since the kernel will do this 
> right, it just screws up UML.  So maybe just do a WARN_ON() so we notice 
> it.  Thanks,

I'm afaid I didn't understand this part. The commit I've found a deviation
was linux's implementation of memcpy (UML uses it as kernel does). Why the
kernel differs to UML in that respect? Seems I don't know/understand something
fundamental here.
So, if data overlaps - it's a moment before data corruption, thus BUG_ON.

Another thought is (if memcpy semantics differ from standard C's function):
does linux's memcpy guarantee copying direction behaviour?
If it does, then it's really a weird memmove and x86/memcpy_64.S is a bit 
broken.

Attached both patches, I personally like BUG_ON variant.
Pick the one you like more :]

Thanks for the feedback!

-- 

  Sergei
From aaaf03ebcdee3f65e898016b14bc81c66bfdd31c Mon Sep 17 00:00:00 2001
From: Sergei Trofimovich 
Date: Sun, 10 Apr 2011 23:19:53 +0300
Subject: [PATCH 1/2] btrfs: properly handle overlapping areas in memmove_extent_buffer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix data corruption caused by memcpy() usage on overlapping data.
I've observed it first when found out usermode linux crash on btrfs.

Сall chain is the following:
[ cut here ]
WARNING: at /home/slyfox/linux-2.6/fs/btrfs/extent_io.c:3900 memcpy_extent_buffer+0x1a5/0x219()
Call Trace:
6fa39a58:  [<601b495e>] _raw_spin_unlock_irqrestore+0x18/0x1c
6fa39a68:  [<60029ad9>] warn_slowpath_common+0x59/0x70
6fa39aa8:  [<60029b05>] warn_slowpath_null+0x15/0x17
6fa39ab8:  [<600efc97>] memcpy_extent_buffer+0x1a5/0x219
6fa39b48:  [<600efd9f>] memmove_extent_buffer+0x94/0x208
6fa39bc8:  [<600becbf>] btrfs_del_items+0x214/0x473
6fa39c78:  [<600ce1b0>] btrfs_delete_one_dir_name+0x7c/0xda
6fa39cc8:  [<600dad6b>] __btrfs_unlink_inode+0xad/0x25d
6fa39d08:  [<600d7864>] btrfs_start_transaction+0xe/0x10
6fa39d48:  [<600dc9ff>] btrfs_unlink_inode+0x1b/0x3b
6fa39d78:  [<600e04bc>] btrfs_unlink+0x70/0xef
6fa39dc8:  [<6007f0d0>] vfs_unlink+0x58/0xa3
6fa39df8:  [<60080278>] do_unlinkat+0xd4/0x162
6fa39e48:  [<600517db>] call_rcu_sched+0xe/0x10
6fa39e58:  [<600452a8>] __put_cred+0x58/0x5a
6fa39e78:  [<6007446c>] sys_faccessat+0x154/0x166
6fa39ed8:  [<60080317>] sys_unlink+0x11/0x13
6fa39ee8:  [<60016b80>] handle_syscall+0x58/0x70
6fa39f08:  [<60021377>] userspace+0x2d4/0x381
6fa39fc8:  [<60014507>] fork_handler+0x62/0x69
---[ end trace 70b0ca2ef0266b93 ]---

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09302.html

Signed-off-by: Sergei Trofimovich 
---
 fs/btrfs/extent_io.c |   14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 20ddb28..10db989 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3885,6 +3885,12 @@ static void move_pages(struct page *dst_page, struct page *src_page,
 	kunmap_atomic(dst_kaddr, KM_USER0);
 }
 
+static inline bool areas_overlap(unsigned long src, unsigned long dst, unsigned long len)
+{
+	unsigned long distance = (src > dst) ? src - dst : dst - src;
+	return distance < len;
+}
+
 static void copy_pages(struct page *dst_page, struct page *src_page,
 		   unsigned long dst_off, unsigned long src_off,
 		   unsigned long len)
@@ -3892,10 +3898,12 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
 	char *dst_kaddr = kmap_atomic(dst_page, KM_USER0);
 	char *src_kaddr;
 
-	if (dst_page != src_page)
+	if (dst_page != src_page) {
 		src_kaddr = kmap_atomic(src_page, KM_USER1);
-	else
+	} else {
 		src_kaddr = dst_kaddr;
+		BUG_ON(areas_overlap(src_off, dst_off, len));
+	}
 
 	memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);
 	kunmap_atomic(dst_kaddr, KM_USER0);
@@ -3970,7 +3978,7 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 		   "len %lu len %lu\n", dst_offset, len, dst->len);
 		BUG_ON(1);
 	}
-	if (dst_offset < src_offset) {
+	if (!areas_overlap(src_offset, dst_offset, len)) {
 		memcpy_extent_buffer(dst, dst_offset, src_offset, len);
 		return;
 	}
-- 

Re: [PATCH] btrfs: quasi-round-robin for chunk allocation

2011-04-11 Thread Chris Mason
Excerpts from Arne Jansen's message of 2011-04-11 13:42:49 -0400:
> On 08.02.2011 19:03, Arne Jansen wrote:
> > In a multi device setup, the chunk allocator currently always allocates
> > chunks on the devices in the same order. This leads to a very uneven
> > distribution, especially with RAID1 or RAID10 and an uneven number of
> > devices.
> > This patch always sorts the device before allocating, and allocates the
> > stripes on the devices with the most available space, as long as there
> > is enough space available. In a low space situation, it first tries to
> > maximize striping.
> > The patch also simplifies the allocator and reduces the checks for
> > corner cases. Additionally, alloc_start is now always heeded.
> 
> It's been quite some weeks since this patch has been posted. In the
> meantime several issues popped up that are already addressed by this
> patch.
> Currently multi-device btrfs just don't work as advertised, a 'balance'
> operation does not spread the data evenly over all disks and with
> multiple devices a substantial amount of space can be wasted.
> I'd be happy if someone could have a look at the patch so that chris
> can pull it. Diff did some funny things to it so it's best to just pull
> it in and read the results.

My big complaint about this patch is that it is a feature fix mixed in
with a huge cleanup.  It's hard to differentiate between the two.  Could
you please separate it out so we can review them independently?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: quasi-round-robin for chunk allocation

2011-04-11 Thread Arne Jansen

On 08.02.2011 19:03, Arne Jansen wrote:

In a multi device setup, the chunk allocator currently always allocates
chunks on the devices in the same order. This leads to a very uneven
distribution, especially with RAID1 or RAID10 and an uneven number of
devices.
This patch always sorts the device before allocating, and allocates the
stripes on the devices with the most available space, as long as there
is enough space available. In a low space situation, it first tries to
maximize striping.
The patch also simplifies the allocator and reduces the checks for
corner cases. Additionally, alloc_start is now always heeded.


It's been quite some weeks since this patch has been posted. In the
meantime several issues popped up that are already addressed by this
patch.
Currently multi-device btrfs just don't work as advertised, a 'balance'
operation does not spread the data evenly over all disks and with
multiple devices a substantial amount of space can be wasted.
I'd be happy if someone could have a look at the patch so that chris
can pull it. Diff did some funny things to it so it's best to just pull
it in and read the results.

Thanks,
Arne



Signed-off-by: Arne Jansen
---
  fs/btrfs/super.c   |   25 +++
  fs/btrfs/volumes.c |  469 +---
  fs/btrfs/volumes.h |   16 +--
  3 files changed, 181 insertions(+), 329 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f4e45fd..1664a25 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -864,6 +864,31 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
return 0;
  }

+/* Used to sort the devices by max_avail(descending sort) */
+int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2)
+{
+   if (((struct btrfs_device_info *)dev_info1)->max_avail>
+   ((struct btrfs_device_info *)dev_info2)->max_avail)
+   return -1;
+   else if (((struct btrfs_device_info *)dev_info1)->max_avail<
+((struct btrfs_device_info *)dev_info2)->max_avail)
+   return 1;
+   else
+   return 0;
+}
+
+/*
+ * sort the devices by max_avail, in which max free extent size of each device
+ * is stored.(Descending Sort)
+ */
+static inline void btrfs_descending_sort_devices(
+   struct btrfs_device_info *devices,
+   size_t nr_devices)
+{
+   sort(devices, nr_devices, sizeof(struct btrfs_device_info),
+btrfs_cmp_device_free_bytes, NULL);
+}
+
  /*
   * The helper to calc the free space on the devices that can be used to store
   * file data.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2d2f4c..a868ca4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -859,10 +859,7 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
/* we don't want to overwrite the superblock on the drive,
 * so we make sure to start at an offset of at least 1MB
 */
-   search_start = 1024 * 1024;
-
-   if (root->fs_info->alloc_start + num_bytes<= search_end)
-   search_start = max(root->fs_info->alloc_start, search_start);
+   search_start = max(root->fs_info->alloc_start, 1024ull * 1024);

max_hole_start = search_start;
max_hole_size = 0;
@@ -2255,418 +2252,262 @@ static int btrfs_add_system_chunk(struct 
btrfs_trans_handle *trans,
return 0;
  }

-static noinline u64 chunk_bytes_by_type(u64 type, u64 calc_size,
-   int num_stripes, int sub_stripes)
+/*
+ * sort the devices in descending order by max_avail, total_avail
+ */
+static int btrfs_cmp_device_info(const void *a, const void *b)
  {
-   if (type&  (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP))
-   return calc_size;
-   else if (type&  BTRFS_BLOCK_GROUP_RAID10)
-   return calc_size * (num_stripes / sub_stripes);
-   else
-   return calc_size * num_stripes;
-}
+   const struct btrfs_device_info *di_a = a;
+   const struct btrfs_device_info *di_b = b;

-/* Used to sort the devices by max_avail(descending sort) */
-int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2)
-{
-   if (((struct btrfs_device_info *)dev_info1)->max_avail>
-   ((struct btrfs_device_info *)dev_info2)->max_avail)
+   if (di_a->max_avail>  di_b->max_avail)
return -1;
-   else if (((struct btrfs_device_info *)dev_info1)->max_avail<
-((struct btrfs_device_info *)dev_info2)->max_avail)
+   if (di_a->max_avail<  di_b->max_avail)
return 1;
-   else
-   return 0;
+   if (di_a->total_avail>  di_b->total_avail)
+   return -1;
+   if (di_a->total_avail<  di_b->total_avail)
+   return 1;
+   return 0;
  }

-static int __btrfs_calc_nstripes(struct btrfs_fs_devices *fs_devices, u64 type,
-   

Re: [PATCH] mark internal functions static

2011-04-11 Thread Josef Bacik

On 04/11/2011 11:40 AM, Daniel J Blueman wrote:

Hi Chris,

This didn't make it in before, so updating to 2.6.39-rc2 and resending:

Prevent needless exporting of internal functions from compilation
units by marking them static.



Looks like you have line wrapping on or something, the page looks 
mangled.  Thanks,


Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6.29-rc2] insert_dir_item hitting assertion during log replay

2011-04-11 Thread Daniel J Blueman
On 11 April 2011 23:32, Josef Bacik  wrote:
> On 04/10/2011 04:29 AM, Daniel J Blueman wrote:
>>
>> When rebooting from a crash, thus during log replay on 2.6.29-rc2,
>> btrfs_insert_dir_item caused an assertion failure [1]. The fs was
>> being mounted clear_cache on an SSD.
>>
>> Probably it's not so easy to reproduce, but better to report it...
>>
>
> Do you still have this fs, and does it still panic the same way on mount?
>  Thanks,

I still have this fs, though it didn't panic at next mount. I guess
this creates a case for cooking a script that eg logically disconnects
a block device during activity (hdparm or echo 1 >delete) then
reconnects it for remount...let me know if interested.

Thanks,
  Daniel
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix user annotation in ioctl.c

2011-04-11 Thread Josef Bacik

On 04/11/2011 11:56 AM, Daniel J Blueman wrote:

Fix address space annotation correct in ioctl.c.

Signed-off-by: Daniel J Blueman



Reviewed-by: Josef Bacik 

Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix user annotation in ioctl.c

2011-04-11 Thread Daniel J Blueman
Fix address space annotation correct in ioctl.c.

Signed-off-by: Daniel J Blueman 

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cfc264f..0474ec3 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2287,7 +2287,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
*root, void __user *arg)
struct btrfs_ioctl_space_info space;
struct btrfs_ioctl_space_info *dest;
struct btrfs_ioctl_space_info *dest_orig;
-   struct btrfs_ioctl_space_info *user_dest;
+   struct btrfs_ioctl_space_info __user *user_dest;
struct btrfs_space_info *info;
u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
   BTRFS_BLOCK_GROUP_SYSTEM,
@@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
*root, void __user *arg)
up_read(&info->groups_sem);
}

-   user_dest = (struct btrfs_ioctl_space_info *)
+   user_dest = (struct btrfs_ioctl_space_info __user *)
(arg + sizeof(struct btrfs_ioctl_space_args));

if (copy_to_user(user_dest, dest_orig, alloc_size))
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Re: btrfs does not work on usermode linux

2011-04-11 Thread Josef Bacik

On 04/10/2011 04:58 PM, Sergei Trofimovich wrote:

On Sun, 10 Apr 2011 23:24:03 +0300
Sergei Trofimovich  wrote:


Fix data corruption caused by memcpy() usage on overlapping data.
I've observed it first when found out usermode linux crash on btrfs.


Changes since v1:


else
src_kaddr = dst_kaddr;

+   BUG_ON(abs(src_off - dst_off)<  len);
memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);


Too eager BUG_ON. Now used only for src_page == dst_page.


-   if (dst_offset<  src_offset) {
+   if (abs(dst_offset - src_offset)>= len) {


abs() is not a good thing to use un unsigned values. aded helper 
overlapping_areas.



Very nice catch, one nit


if (dst_page != src_page)
src_kaddr = kmap_atomic(src_page, KM_USER1);
else
+   {
src_kaddr = dst_kaddr;
+   BUG_ON(areas_overlap(src_off, dst_off, len));
+   }

you will want to turn that into

if (dst_page != src_page) {
src_kaddr = kmap_atomic(src_page, KM_USER1);
} else {
src_kaddr = dst_kaddr;
BUG_ON(areas_overlap(src_off, dst_off, len));
}

Also maybe BUG_ON() is a little strong, since the kernel will do this 
right, it just screws up UML.  So maybe just do a WARN_ON() so we notice 
it.  Thanks,


Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mark internal functions static

2011-04-11 Thread Daniel J Blueman
Hi Chris,

This didn't make it in before, so updating to 2.6.39-rc2 and resending:

Prevent needless exporting of internal functions from compilation
units by marking them static.

Signed-off-by: Daniel J Blueman 

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 84d7ca1..6581b37 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -74,7 +74,7 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
  * retake all the spinlocks in the path.  You can safely use NULL
  * for held
  */
-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
struct extent_buffer *held)
 {
int i;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f1d44b..59f2567 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2313,7 +2313,7 @@ static int write_dev_supers(struct btrfs_device *device,
return errors < i ? 0 : -1;
 }

-int write_all_supers(struct btrfs_root *root, int max_mirrors)
+static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
struct list_head *head;
struct btrfs_device *dev;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..5394255 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -75,7 +75,7 @@ static int block_group_bits(struct
btrfs_block_group_cache *cache, u64 bits)
return (cache->flags & bits) == bits;
 }

-void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
+static void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
 {
atomic_inc(&cache->count);
 }
@@ -3586,7 +3586,7 @@ static void block_rsv_add_bytes(struct
btrfs_block_rsv *block_rsv,
spin_unlock(&block_rsv->lock);
 }

-void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
+static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
 struct btrfs_block_rsv *dest, u64 num_bytes)
 {
struct btrfs_space_info *space_info = block_rsv->space_info;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6541339..6370184 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5305,7 +5305,7 @@ out:
return em;
 }

-struct extent_map *btrfs_get_extent_fiemap(struct inode *inode,
struct page *page,
+static struct extent_map *btrfs_get_extent_fiemap(struct inode
*inode, struct page *page,
   size_t pg_offset, u64 start, u64 len,
   int create)
 {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cfc264f..de76a6d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2281,7 +2281,7 @@ static void get_block_group_info(struct
list_head *groups_list,
}
 }

-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
struct btrfs_ioctl_space_args space_args;
struct btrfs_ioctl_space_info space;
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6.29-rc2] insert_dir_item hitting assertion during log replay

2011-04-11 Thread Josef Bacik

On 04/10/2011 04:29 AM, Daniel J Blueman wrote:

When rebooting from a crash, thus during log replay on 2.6.29-rc2,
btrfs_insert_dir_item caused an assertion failure [1]. The fs was
being mounted clear_cache on an SSD.

Probably it's not so easy to reproduce, but better to report it...



Do you still have this fs, and does it still panic the same way on 
mount?  Thanks,


Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Error Handling Failed Devices

2011-04-11 Thread Ian Celing
I've been testing BTRFS and by-and-large the file system runs well. The 
most troubling issue I've run into so far is the inability to 
consistently recover from a failed device. If I simulate a failed device 
and try to balance media across the remaining drives I never get a 
consistent result.


#Sometimes I am able to delete the failed device and re-balance on the 
remaining drives but then the data doesn't balance properly and is 
apparently lost.


#Sometimes I am unable to delete the failed device and I am stuck with 
"Device Missing" message with unaccessible data on that drive.


#Sometimes I am able to delete the failed device. Add a new device. 
Re-balance, but the new array won't mount because of "wrong fs type, bad 
option, bad superblock on /dev/sdx".


I am using the latest kernel as of 2011-04-06. I read in earlier 
messages that this functionality isn't quite working yet as planned - 
That deleting devices really only serves to remove a 'working' disk for 
use elsewhere. I don't find a lot of feedback on this seemingly huge 
problem with the file system.



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balancing start - and stop?

2011-04-11 Thread Helmut Hullen
Hallo, Stephane,

Du meintest am 11.04.11:

>> [780504.726067] btrfs: relocating block group 3917710622720 flags 20
> [...]
>> At this rate, the balancing would be over in about 8 years.
> [...]

> Hurray! The btrfs balance eventually ran through after almost exactly
> 2 weeks. It didn't get down to 0:

Congratulations!

> There hasn't been any change in allocation though:

> # df -h /backup
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda4 8.2T  3.5T  3.2T  53% /backup

> # btrfs fi df /backup
> Data, RAID0: total=3.42TB, used=3.41TB
> System, RAID1: total=16.00MB, used=228.00KB
> Metadata, RAID1: total=28.00GB, used=20.47GB

> Still 1.5TB missing.

Seems to be the same problem I've just mourned about.

Just expand "available" to "at least available". And ignore the value in  
"Data, RAID0: total=3.42TB".

Viele Gruesse!
Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wrong values in "df" and "btrfs filesystem df"

2011-04-11 Thread Helmut Hullen
Hallo, Stephane,

Du meintest am 11.04.11:

> What's the implication of having disks of differing sizes? Does
> that mean that the extra space on larger disks is lost?

Seems to work.
I've tried:

/dev/sda   140 GByte
/dev/sdb   140 GByte
/dev/sdc70 GByte

mkfs.btrfs -d raid0 -m raid1 /dev/sdb1 /dev/sdc1

mounted, more than 140 GByte free

Filled with more than 140 GByte (/dev/sdc1 was full to the brim)

btrfs device add /dev/sda1 ...
btrfs filesystem balance ...

Needed many hours, but then more than 210 GByte were usable.

Filled up to about 220 GByte; /dev/sdc1 was again full to the brim

btrfs device delete /dev/sdc1
umount
mount

All looks as expected, only the 2 bigger devices are seen, and they  
contain the expected files.

And that looks good: my major interest in btrfs is working in that way -  
adding a bigger device, deleting a smaller device.

Kernel 2.6.38.1
btrfs from november 2010


Only the values shown with "df" and "btrfs filesystem df" need getting  
used to; maybe "available" has to be seen as "at least available".

Viele Gruesse!
Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/8] btrfs: Balance progress monitoring

2011-04-11 Thread Hugo Mills
On Mon, Apr 11, 2011 at 07:34:00AM +0200, Helmut Hullen wrote:
> Hallo, Hugo,
> 
> Du meintest am 10.04.11:
> 
> > This patch introduces a basic form of progress monitoring for balance
> > operations, by counting the number of block groups remaining. The
> > information is exposed to userspace by an ioctl.
> 
> Just for curiosity:
> 
> If I remember correct then "btrfs device delete" shows growing and  
> shrinking numbers, resp. on the remaining and on the deleting  
> partition(s).
> 
> Can this patch show the remaining number of block groups too?

   This patch doesn't touch the code in device delete, but once this
patch series is finished and accepted, I'll take a look. We can
probably share a good chunk of the code between the two.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- "How deep will this sub go?" "Oh,  she'll go all the way to ---   
the bottom if we don't stop her."


signature.asc
Description: Digital signature


Re: btrfs balancing start - and stop?

2011-04-11 Thread Stephane Chazelas
2011-04-06 12:43:50 +0100, Stephane Chazelas:
[...]
> The rate is going down. It's now down to about 14kB/s
> 
> [658654.295752] btrfs: relocating block group 3919858106368 flags 20
> [671932.913235] btrfs: relocating block group 3919589670912 flags 20
> [686189.296126] btrfs: relocating block group 3919321235456 flags 20
> [701511.523990] btrfs: relocating block group 391905280 flags 20
> [718591.316339] btrfs: relocating block group 3918784364544 flags 20
> [725567.081031] btrfs: relocating block group 3918515929088 flags 20
> [744415.011581] btrfs: relocating block group 3918247493632 flags 20
> [762365.021458] btrfs: relocating block group 3917979058176 flags 20
> [780504.726067] btrfs: relocating block group 3917710622720 flags 20
[...]
> At this rate, the balancing would be over in about 8 years.
[...]

Hurray! The btrfs balance eventually ran through after almost exactly 2 weeks.
It didn't get down to 0:

[1189505.152717] btrfs: found 60527 extents
[1189505.440565] btrfs: relocating block group 3910731300864 flags 20
[1199805.071045] btrfs: found 60235 extents
[1199805.447821] btrfs: relocating block group 3910462865408 flags 20
[1207914.737372] btrfs: found 58039 extents

iostat reckons 9TB have been written to  disk in the whole
process (4.5TB read from them (!?)).

There hasn't been any change in allocation though:

# df -h /backup
FilesystemSize  Used Avail Use% Mounted on
/dev/sda4 8.2T  3.5T  3.2T  53% /backup
# btrfs fi df /backup
Data, RAID0: total=3.42TB, used=3.41TB
System, RAID1: total=16.00MB, used=228.00KB
Metadata, RAID1: total=28.00GB, used=20.47GB
# btrfs fi show
Label: none  uuid: a0ae35c4-51f2-405f-a4bb-e4f134b1d193
Total devices 3 FS bytes used 3.43TB
devid4 size 2.73TB used 1.17TB path /dev/sdc
devid3 size 2.73TB used 1.17TB path /dev/sdb
devid2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19

Still 1.5TB missing.

-- 
Stephane
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wrong values in "df" and "btrfs filesystem df"

2011-04-11 Thread Arne Jansen
On 11.04.2011 09:29, Stephane Chazelas wrote:
> 2011-04-10 18:13:51 +0800, Miao Xie:
> [...]

> 
> What's the implication of having disks of differing sizes? Does
> that mean that the extra space on larger disks is lost?

Yes. Currently the allocator cannot handle different sizes well,
especially when mirroring is involved. I sent a patch for this
to the list some weeks ago (see quasi-round-robin), but it hasn't
been merged yet.

-Arne

> 
> Thanks,
> Stephane
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wrong values in "df" and "btrfs filesystem df"

2011-04-11 Thread Stephane Chazelas
2011-04-10 18:13:51 +0800, Miao Xie:
[...]
> >> # df /srv/MM
> >>
> >> Filesystem   1K-blocks  Used Available Use% Mounted on
> >> /dev/sdd15846053400 1593436456 2898463184  36% /srv/MM
> >>
> >> # btrfs filesystem df /srv/MM
> >>
> >> Data, RAID0: total=1.67TB, used=1.48TB
> >> System, RAID1: total=16.00MB, used=112.00KB
> >> System: total=4.00MB, used=0.00
> >> Metadata, RAID1: total=3.75GB, used=2.26GB
> >>
> >> # btrfs-show
> >>
> >> Label: MMedia  uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2
> >>Total devices 3 FS bytes used 1.48TB
> >>devid3 size 1.81TB used 573.76GB path /dev/sdb1
> >>devid2 size 1.81TB used 573.77GB path /dev/sde1
> >>devid1 size 1.82TB used 570.01GB path /dev/sdd1
> >>
> >> Btrfs Btrfs v0.19
> >>
> >> 
> >>
> >> "df" shows an "Available" value which isn't related to any real value.  
> > 
> >I _think_ that value is the amount of space not allocated to any
> > block group. If that's so, then Available (from df) plus the three
> > "total" values (from btrfs fi df) should equal the size value from df.
> 
> This value excludes the space that can not be allocated to any block group,
> This feature was implemented to fix the bug df command add the disk space, 
> which
> can not be allocated to any block group forever, into the "Available" value.
> (see the changelog of the commit 6d07bcec969af335d4e35b3921131b7929bd634e)
> 
> This implementation just like fake chunk allocation, but the fake allocation
> just allocate the space from two of these three disks, doesn't spread the
> stripes over all the disks, which has enough space.
[...]

Hi Miao,

would you care to expand a bit on that. In Helmut's case above
where all the drives have at least 1.2TB free, how would there
be un-allocatable space?

What's the implication of having disks of differing sizes? Does
that mean that the extra space on larger disks is lost?

Thanks,
Stephane
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html