Re: Unmountable btrfs - part II

2011-08-08 Thread Jan Schubert
Maciej Piechotka uzytkownik2 at gmail.com writes:

 btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion
 `!(path->slots[0] == 0)' failed.

Maciej, I usually see such assertion failures when switching between
different kernels and different versions of btrfs-progs. Please make sure they
match, i.e. that btrfs-progs was compiled against the currently running kernel.

Just an idea,
Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: space cache generation (...) does not match inode (...)

2011-08-08 Thread Josef Bacik
On 08/06/2011 10:16 PM, Andrew Lutomirski wrote:
 I've always gotten space cache generation warnings, but some time
 after 3.0 they started going nuts.  I get:
 
 space cache generation (14667727114112179905) does not match inode (154185)
 
 and other similar messages (with a huge number and a smaller number)
 at rates higher than one message per ms.  They don't happen
 constantly, but they come in bursts big enough to fill my log buffer.
 

Yeah sorry that's going to happen when you first switch to 3.0.  We
switched the space cache stuff over to using the normal checksumming
code so all old space cache is going to look invalid.  This is nothing
to worry about, it will just end up discarded and re-generated.  Thanks,

Josef


Re: space cache generation (...) does not match inode (...)

2011-08-08 Thread Andrew Lutomirski
On Mon, Aug 8, 2011 at 8:14 AM, Josef Bacik jo...@redhat.com wrote:
 On 08/06/2011 10:16 PM, Andrew Lutomirski wrote:
 I've always gotten space cache generation warnings, but some time
 after 3.0 they started going nuts.  I get:

 space cache generation (14667727114112179905) does not match inode (154185)

 and other similar messages (with a huge number and a smaller number)
 at rates higher than one message per ms.  They don't happen
 constantly, but they come in bursts big enough to fill my log buffer.


 Yeah sorry that's going to happen when you first switch to 3.0.  We
 switched the space cache stuff over to using the normal checksumming
 code so all old space cache is going to look invalid.  This is nothing
 to worry about, it will just end up discarded and re-generated.  Thanks,

Can you put in a rate limit and make the message less alarming?
There's enough log spam from it that I can't see anything else in my
log.

--Andy


 Josef



Re: space cache generation (...) does not match inode (...)

2011-08-08 Thread Josef Bacik
On 08/08/2011 08:17 AM, Andrew Lutomirski wrote:
 On Mon, Aug 8, 2011 at 8:14 AM, Josef Bacik jo...@redhat.com wrote:
 On 08/06/2011 10:16 PM, Andrew Lutomirski wrote:
 I've always gotten space cache generation warnings, but some time
 after 3.0 they started going nuts.  I get:

 space cache generation (14667727114112179905) does not match inode (154185)

 and other similar messages (with a huge number and a smaller number)
 at rates higher than one message per ms.  They don't happen
 constantly, but they come in bursts big enough to fill my log buffer.


 Yeah sorry that's going to happen when you first switch to 3.0.  We
 switched the space cache stuff over to using the normal checksumming
 code so all old space cache is going to look invalid.  This is nothing
 to worry about, it will just end up discarded and re-generated.  Thanks,
 
 Can you put in a rate limit and make the message less alarming?
 There's enough log spam from it that I can't see anything else in my
 log.
 

Yeah I'll do that now, thanks,

Josef


[PATCH] Btrfs: ratelimit the generation printk for the free space cache

2011-08-08 Thread Josef Bacik
A user reported getting spammed by this message when moving to 3.0.  Since we
switched to the normal checksumming infrastructure, all old free space caches
will be wrong and need to be regenerated, so people are likely to see this
message a lot.  Ratelimit it so it doesn't fill up their logs and freak them
out.  Thanks,

Reported-by: Andrew Lutomirski l...@mit.edu
Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/free-space-cache.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 85bbac9..44a6323 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/math64.h>
+#include <linux/ratelimit.h>
 #include "ctree.h"
 #include "free-space-cache.h"
 #include "transaction.h"
@@ -337,11 +338,12 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 
 		gen = addr;
 		if (*gen != BTRFS_I(inode)->generation) {
-			printk(KERN_ERR "btrfs: space cache generation"
-			       " (%llu) does not match inode (%llu)\n",
-			       (unsigned long long)*gen,
-			       (unsigned long long)
-			       BTRFS_I(inode)->generation);
+			printk_ratelimited(KERN_ERR "btrfs: space cache"
+				" generation (%llu) does not match "
+				"inode (%llu)\n",
+				(unsigned long long)*gen,
+				(unsigned long long)
+				BTRFS_I(inode)->generation);
 			kunmap(page);
 			unlock_page(page);
 			page_cache_release(page);
-- 
1.7.5.2
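printk_ratelimited() lets a burst of messages through per time window and drops the rest, which is what keeps a flood of generation mismatches from filling the log. A minimal user-space sketch of that interval/burst idea follows; the struct, its field names and the explicit `now` parameter are assumptions for illustration, not the kernel's ratelimit implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval/burst limiter: allow at most `burst` events per
 * `interval` time units; extra events inside the window are dropped and
 * counted in `missed`. Time is passed in explicitly so the logic is easy
 * to test. */
struct ratelimit {
	long interval;  /* window length, in caller-defined time units */
	int burst;      /* events allowed per window */
	long begin;     /* start of the current window */
	int printed;    /* events allowed in the current window */
	int missed;     /* events suppressed in the current window */
	bool started;   /* false until the first event opens a window */
};

/* Returns true if the event at time `now` may be emitted. */
static bool ratelimit_allow(struct ratelimit *rs, long now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		/* open a new window and reset the counters */
		rs->started = true;
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With `burst = 3`, ten calls inside one window let three through and count seven as missed; the first call after the window expires starts a fresh window.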



Re: [PATCH] Btrfs: fix an oops of log replay

2011-08-08 Thread Andy Lutomirski

On 08/06/2011 04:35 AM, Liu Bo wrote:

When btrfs recovers from a crash, it may hit the oops below:

[ cut here ]
kernel BUG at fs/btrfs/inode.c:4580!
[...]
RIP: 0010:[a03df251]  [a03df251] btrfs_add_link+0x161/0x1c0 [btrfs]
[...]
Call Trace:
  [a03e7b31] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
  [a04054e9] add_inode_ref+0x319/0x3f0 [btrfs]
  [a0407087] replay_one_buffer+0x2c7/0x390 [btrfs]
  [a040444a] walk_down_log_tree+0x32a/0x480 [btrfs]
  [a0404695] walk_log_tree+0xf5/0x240 [btrfs]
  [a0406cc0] btrfs_recover_log_trees+0x250/0x350 [btrfs]
  [a0406dc0] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
  [a03d18b2] open_ctree+0x1442/0x17d0 [btrfs]
[...]

This happens because, while replaying an inode ref item, we forget to
check for old conflicting DIR_ITEM and DIR_INDEX items in the fs/file tree,
so we run into conflicting corner cases that lead to a BUG_ON().

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
  fs/btrfs/tree-log.c |   28 ++++++++++++++++++++++++----
  1 files changed, 24 insertions(+), 4 deletions(-)


This fixes the oops for me.  The bug was a regression in 2.6.39, I believe.

Tested-by: Andy Lutomirski l...@mit.edu

--Andy


[PATCH] Btrfs: set i_size properly when fallocating and we already have an extent

2011-08-08 Thread Josef Bacik
xfstests exposed a problem with preallocate when it fallocates a range that
already has an extent.  We don't set the new i_size properly because we see that
we already have an extent.  This isn't right and we should update i_size if the
space already exists.  With this patch we now pass xfstests 075.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/file.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 658d669..5f8264a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1638,11 +1638,15 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	cur_offset = alloc_start;
 	while (1) {
+		u64 actual_end;
+
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
 		BUG_ON(IS_ERR_OR_NULL(em));
 		last_byte = min(extent_map_end(em), alloc_end);
+		actual_end = min_t(u64, extent_map_end(em), offset + len);
 		last_byte = (last_byte + mask) & ~mask;
+
 		if (em->block_start == EXTENT_MAP_HOLE ||
 		    (cur_offset >= inode->i_size &&
 		     !test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
@@ -1655,6 +1659,16 @@ static long btrfs_fallocate(struct file *file, int mode,
 				free_extent_map(em);
 				break;
 			}
+		} else if (actual_end > inode->i_size &&
+			   !(mode & FALLOC_FL_KEEP_SIZE)) {
+			/*
+			 * We didn't need to allocate any more space, but we
+			 * still extended the size of the file so we need to
+			 * update i_size.
+			 */
+			inode->i_ctime = CURRENT_TIME;
+			i_size_write(inode, actual_end);
+			btrfs_ordered_update_i_size(inode, actual_end, NULL);
 		}
 		free_extent_map(em);
 
-- 
1.7.5.2
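The rule the patch adds can be stated as: when the range is already backed by an extent, the file still grows to `actual_end`, the smaller of the extent end and `offset + len`, unless FALLOC_FL_KEEP_SIZE is set. A small user-space sketch of just that size computation; the function name and `SKETCH_FALLOC_KEEP_SIZE` (standing in for FALLOC_FL_KEEP_SIZE) are hypothetical, and the real code also updates ctime and the ordered i_size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FALLOC_KEEP_SIZE 0x01 /* hypothetical stand-in for FALLOC_FL_KEEP_SIZE */

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Returns the i_size after visiting an existing extent that ends at
 * `extent_end` while fallocating [offset, offset + len). */
static uint64_t size_after_existing_extent(uint64_t i_size, uint64_t extent_end,
					   uint64_t offset, uint64_t len, int mode)
{
	/* actual_end = min(extent end, end of the requested range) */
	uint64_t actual_end = min_u64(extent_end, offset + len);

	if (actual_end > i_size && !(mode & SKETCH_FALLOC_KEEP_SIZE))
		return actual_end;	/* grow the file */
	return i_size;			/* size unchanged */
}
```

For example, with i_size 4096 and an existing extent covering the first 16384 bytes, fallocating 8192 bytes at offset 0 grows the file to 8192, but leaves it at 4096 when KEEP_SIZE is requested.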



[PATCH] btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range()

2011-08-08 Thread Mark Fasheh
Unfortunately it isn't enough to just exit here - the kzalloc() happens in a
loop and the allocated items are added to a linked list whose head is passed
in from the caller.

To fix the BUG_ON() and also provide the semantic that the list passed in is
only modified on success, I create a function-local temporary list that we add
items to. If no error is met, that list is spliced onto the caller's list at the
end of the function. Otherwise the list is walked and all items are freed
before the error value is returned.

I did a simple test on this patch by forcing an error at the kzalloc() point
and verifying that when this hits (git clone seemed to exercise this), the
function returns the proper error. Unfortunately but predictably, we later
hit a BUG_ON(ret) type line that still hasn't been fixed up ;)

Signed-off-by: Mark Fasheh mfas...@suse.com
---
 fs/btrfs/file-item.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b910694..679fbff 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -284,6 +284,7 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
struct btrfs_ordered_sum *sums;
struct btrfs_sector_sum *sector_sum;
struct btrfs_csum_item *item;
+   LIST_HEAD(tmplist);
unsigned long offset;
int ret;
size_t size;
@@ -358,7 +359,10 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 				      MAX_ORDERED_SUM_BYTES(root));
 			sums = kzalloc(btrfs_ordered_sum_size(root, size),
 				       GFP_NOFS);
-			BUG_ON(!sums);
+			if (!sums) {
+				ret = -ENOMEM;
+				goto fail;
+			}
 
 			sector_sum = sums->sums;
 			sums->bytenr = start;
@@ -380,12 +384,19 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 				offset += csum_size;
 				sector_sum++;
 			}
-			list_add_tail(&sums->list, list);
+			list_add_tail(&sums->list, &tmplist);
 		}
 		path->slots[0]++;
 	}
 	ret = 0;
 fail:
+	while (ret < 0 && !list_empty(&tmplist)) {
+		sums = list_entry(&tmplist, struct btrfs_ordered_sum, list);
+		list_del(&sums->list);
+		kfree(sums);
+	}
+	list_splice_tail(&tmplist, list);
+
 	btrfs_free_path(path);
 	return ret;
 }
-- 
1.7.6
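The "build on a private list, then splice on success or free on failure" pattern described above can be sketched in user space as follows. The tiny singly linked list stands in for the kernel's struct list_head, and collect_items() with its `fail_at` parameter is a hypothetical producer used only to force the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Minimal singly linked list standing in for the kernel's list_head. */
struct sum_item {
	int value;
	struct sum_item *next;
};

struct sum_list {
	struct sum_item *head;
	struct sum_item **tail;	/* points at the last `next` field */
};

static void list_init(struct sum_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void list_append(struct sum_list *l, struct sum_item *it)
{
	it->next = NULL;
	*l->tail = it;
	l->tail = &it->next;
}

/* Move all of `src` onto the end of `dst` (like list_splice_tail). */
static void list_splice_tail_sketch(struct sum_list *src, struct sum_list *dst)
{
	if (!src->head)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	list_init(src);
}

/* Hypothetical producer: builds `count` items, with allocation "failing"
 * at index `fail_at` (-1 = never). `out` is modified only on success. */
static int collect_items(int count, int fail_at, struct sum_list *out)
{
	struct sum_list tmp;
	int ret = 0;

	list_init(&tmp);
	for (int i = 0; i < count; i++) {
		struct sum_item *it = (i == fail_at) ? NULL : malloc(sizeof(*it));

		if (!it) {
			ret = -ENOMEM;
			goto fail;
		}
		it->value = i;
		list_append(&tmp, it);
	}
fail:
	if (ret < 0) {
		/* free everything built so far; the caller's list is untouched */
		while (tmp.head) {
			struct sum_item *it = tmp.head;

			tmp.head = it->next;
			free(it);
		}
		return ret;
	}
	list_splice_tail_sketch(&tmp, out);
	return 0;
}
```

When this is written with `<linux/list.h>` instead, the first element of a non-empty list is `list_first_entry(&head, type, member)`, i.e. the entry at `head.next`; the list head itself is not embedded in any element.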



[PATCH] Btrfs: kill unused parts of block_rsv

2011-08-08 Thread Josef Bacik
The priority and refill_used flags are not used anymore, and neither is the
usage counter, so just remove them from btrfs_block_rsv.

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/ctree.h       |    3 ---
 fs/btrfs/extent-tree.c |   23 ++++++-----------------
 fs/btrfs/relocation.c  |    2 --
 3 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6071dab..edc1cf0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -774,9 +774,6 @@ struct btrfs_block_rsv {
u64 reserved;
struct btrfs_space_info *space_info;
spinlock_t lock;
-   atomic_t usage;
-   unsigned int priority:8;
-   unsigned int refill_used:1;
unsigned int full:1;
 };
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc2686c..e01cd8c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3659,8 +3659,6 @@ void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv)
 {
 	memset(rsv, 0, sizeof(*rsv));
 	spin_lock_init(&rsv->lock);
-	atomic_set(&rsv->usage, 1);
-	rsv->priority = 6;
 }
 
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root)
@@ -3681,10 +3679,8 @@ struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root)
 void btrfs_free_block_rsv(struct btrfs_root *root,
 			  struct btrfs_block_rsv *rsv)
 {
-	if (rsv && atomic_dec_and_test(&rsv->usage)) {
-		btrfs_block_rsv_release(root, rsv, (u64)-1);
-		kfree(rsv);
-	}
+	btrfs_block_rsv_release(root, rsv, (u64)-1);
+	kfree(rsv);
 }
 
 int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
@@ -3734,13 +3730,10 @@ int btrfs_block_rsv_check(struct btrfs_trans_handle *trans,
 	if (!ret)
 		return 0;
 
-	if (block_rsv->refill_used) {
-		ret = reserve_metadata_bytes(trans, root, block_rsv,
-					     num_bytes, 0);
-		if (!ret) {
-			block_rsv_add_bytes(block_rsv, num_bytes, 0);
-			return 0;
-		}
+	ret = reserve_metadata_bytes(trans, root, block_rsv, num_bytes, 0);
+	if (!ret) {
+		block_rsv_add_bytes(block_rsv, num_bytes, 0);
+		return 0;
 	}
 
 	if (commit_trans) {
@@ -3859,16 +3852,12 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
 
 	space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM);
 	fs_info->chunk_block_rsv.space_info = space_info;
-	fs_info->chunk_block_rsv.priority = 10;
 
 	space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
 	fs_info->global_block_rsv.space_info = space_info;
-	fs_info->global_block_rsv.priority = 10;
-	fs_info->global_block_rsv.refill_used = 1;
 	fs_info->delalloc_block_rsv.space_info = space_info;
 	fs_info->trans_block_rsv.space_info = space_info;
 	fs_info->empty_block_rsv.space_info = space_info;
-	fs_info->empty_block_rsv.priority = 10;
 
 	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 545b043..aeaed99 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3650,8 +3650,6 @@ int prepare_to_relocate(struct reloc_control *rc)
 	if (ret)
 		return ret;
 
-	rc->block_rsv->refill_used = 1;
-
 	memset(&rc->cluster, 0, sizeof(rc->cluster));
 	rc->search_start = rc->block_group->key.objectid;
 	rc->extents_found = 0;
-- 
1.7.5.2



[PATCH] Btrfs: don't try to commit in btrfs_block_rsv_check

2011-08-08 Thread Josef Bacik
We will try and reserve metadata bytes in btrfs_block_rsv_check and if we cannot
because we have a transaction open it will return EAGAIN, so we do not need to
try and commit the transaction again.

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/extent-tree.c |   29 ++++-------------------------
 1 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e01cd8c..9e602d8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3708,7 +3708,6 @@ int btrfs_block_rsv_check(struct btrfs_trans_handle *trans,
  u64 min_reserved, int min_factor)
 {
u64 num_bytes = 0;
-   int commit_trans = 0;
int ret = -ENOSPC;
 
if (!block_rsv)
@@ -3720,13 +3719,12 @@ int btrfs_block_rsv_check(struct btrfs_trans_handle *trans,
 	if (min_reserved > num_bytes)
 		num_bytes = min_reserved;
 
-	if (block_rsv->reserved >= num_bytes) {
+	if (block_rsv->reserved >= num_bytes)
 		ret = 0;
-	} else {
+	else
 		num_bytes -= block_rsv->reserved;
-		commit_trans = 1;
-	}
 	spin_unlock(&block_rsv->lock);
+
 	if (!ret)
 		return 0;
 
@@ -3736,26 +3734,7 @@ int btrfs_block_rsv_check(struct btrfs_trans_handle *trans,
 		return 0;
 	}
 
-	if (commit_trans) {
-		struct btrfs_space_info *sinfo = block_rsv->space_info;
-
-		if (trans)
-			return -EAGAIN;
-
-		spin_lock(&sinfo->lock);
-		if (sinfo->bytes_pinned < num_bytes) {
-			spin_unlock(&sinfo->lock);
-			return -ENOSPC;
-		}
-		spin_unlock(&sinfo->lock);
-
-		trans = btrfs_join_transaction(root);
-		BUG_ON(IS_ERR(trans));
-		ret = btrfs_commit_transaction(trans, root);
-		return 0;
-	}
-
-	return -ENOSPC;
+	return ret;
 }
 
 int btrfs_block_rsv_migrate(struct btrfs_block_rsv *src_rsv,
-- 
1.7.5.2
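After this patch the check reduces to: succeed if the reserve already holds enough, otherwise try to reserve only the deficit and hand back whatever that attempt returned (for example -EAGAIN when a transaction is open), with no commit fallback. A user-space sketch of that contract, assuming a simple byte pool; try_reserve() is a hypothetical stand-in for reserve_metadata_bytes(), and the min_factor scaling is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct rsv_sketch {
	uint64_t reserved;	/* bytes currently held by the reserve */
};

/* Hypothetical stand-in for reserve_metadata_bytes(): takes `bytes` from
 * `*pool`; when the pool is too small it reports -EAGAIN if a transaction
 * is open (it cannot commit) and -ENOSPC otherwise. */
static int try_reserve(uint64_t *pool, bool trans_open, uint64_t bytes)
{
	if (*pool >= bytes) {
		*pool -= bytes;
		return 0;
	}
	return trans_open ? -EAGAIN : -ENOSPC;
}

static int rsv_check_sketch(struct rsv_sketch *rsv, uint64_t *pool,
			    bool trans_open, uint64_t min_reserved)
{
	uint64_t num_bytes = min_reserved;
	int ret;

	if (rsv->reserved >= num_bytes)
		return 0;
	num_bytes -= rsv->reserved;		/* reserve only the deficit */

	ret = try_reserve(pool, trans_open, num_bytes);
	if (!ret)
		rsv->reserved += num_bytes;	/* like block_rsv_add_bytes() */
	return ret;				/* no commit fallback */
}
```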



[PATCH] Btrfs: optimize how we account for space in truncate

2011-08-08 Thread Josef Bacik
Currently we're starting and stopping a transaction for no real reason, so kill
that and just reserve enough space as if we can truncate all in one transaction.
Also use btrfs_block_rsv_check() for our reserve to minimize the amount of space
we may have to allocate for our slack space.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/inode.c |   58 +++---
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4aa4ea9..808ad07 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6452,6 +6452,7 @@ static int btrfs_truncate(struct inode *inode)
 	struct btrfs_trans_handle *trans;
 	unsigned long nr;
 	u64 mask = root->sectorsize - 1;
+	u64 min_size = btrfs_calc_trans_metadata_size(root, 2);
 
 	ret = btrfs_truncate_page(inode->i_mapping, inode->i_size);
 	if (ret)
@@ -6500,17 +6501,21 @@ static int btrfs_truncate(struct inode *inode)
 	if (!rsv)
 		return -ENOMEM;
 
-	trans = btrfs_start_transaction(root, 4);
+	/*
+	 * 2 for the truncate slack space
+	 * 1 for the orphan item we're going to add
+	 * 1 for the orphan item deletion
+	 * 1 for updating the inode.
+	 */
+	trans = btrfs_start_transaction(root, 5);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
 		goto out;
 	}
 
-	/*
-	 * Reserve space for the truncate process.  Truncate should be adding
-	 * space, but if there are snapshots it may end up using space.
-	 */
-	ret = btrfs_truncate_reserve_metadata(trans, root, rsv);
+	/* Migrate the slack space for the truncate to our reserve */
+	ret = btrfs_block_rsv_migrate(&root->fs_info->trans_block_rsv, rsv,
+				      min_size);
 	BUG_ON(ret);
 
 	ret = btrfs_orphan_add(trans, inode);
@@ -6519,21 +6524,6 @@ static int btrfs_truncate(struct inode *inode)
 		goto out;
 	}
 
-	nr = trans->blocks_used;
-	btrfs_end_transaction(trans, root);
-	btrfs_btree_balance_dirty(root, nr);
-
-	/*
-	 * Ok so we've already migrated our bytes over for the truncate, so here
-	 * just reserve the one slot we need for updating the inode.
-	 */
-	trans = btrfs_start_transaction(root, 1);
-	if (IS_ERR(trans)) {
-		err = PTR_ERR(trans);
-		goto out;
-	}
-	trans->block_rsv = rsv;
-
 	/*
 	 * setattr is responsible for setting the ordered_data_close flag,
 	 * but that is only tested during the last file release.  That
@@ -6555,20 +6545,30 @@ static int btrfs_truncate(struct inode *inode)
 	btrfs_add_ordered_operation(trans, root, inode);
 
 	while (1) {
+		ret = btrfs_block_rsv_check(trans, root, rsv, min_size, 0);
+		if (ret) {
+			/*
+			 * This can only happen with the original transaction we
+			 * started above, every other time we shouldn't have a
+			 * transaction started yet.
+			 */
+			if (ret == -EAGAIN)
+				goto end_trans;
+			err = ret;
+			break;
+		}
+
 		if (!trans) {
-			trans = btrfs_start_transaction(root, 3);
+			/* Just need the 1 for updating the inode */
+			trans = btrfs_start_transaction(root, 1);
 			if (IS_ERR(trans)) {
 				err = PTR_ERR(trans);
 				goto out;
 			}
-
-			ret = btrfs_truncate_reserve_metadata(trans, root,
-							      rsv);
-			BUG_ON(ret);
-
-			trans->block_rsv = rsv;
 		}
 
+		trans->block_rsv = rsv;
+
 		ret = btrfs_truncate_inode_items(trans, root, inode,
 						 inode->i_size,
 						 BTRFS_EXTENT_DATA_KEY);
@@ -6583,7 +6583,7 @@ static int btrfs_truncate(struct inode *inode)
 			err = ret;
 			break;
 		}
-
+end_trans:
 		nr = trans->blocks_used;
 		btrfs_end_transaction(trans, root);
 		trans = NULL;
-- 
1.7.5.2
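The reworked loop tops up the private reserve to `min_size` before every truncate step. Per the comment in the patch, -EAGAIN can only come back while the first transaction (started with 5 units: 2 slack, 1 orphan add, 1 orphan deletion, 1 inode update) is still open, in which case the loop jumps to `end_trans` and retries; later transactions ask for just 1 unit. The toy model below is a hypothetical user-space sketch of that control flow with made-up byte counts, not btrfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a private reserve that must hold at least min_size bytes
 * before each truncate step. All names here are hypothetical. */
struct trunc_model {
	uint64_t reserved;	/* bytes in the truncate's private reserve */
	uint64_t pool;		/* bytes available to refill from */
	bool trans_open;	/* is a transaction currently running? */
	int steps_left;		/* truncate iterations still needed */
	int transactions;	/* transactions started, for inspection */
};

/* Refill the reserve to min_size; refuse with -EAGAIN while a
 * transaction is open (the case the patch handles with "goto end_trans"). */
static int model_rsv_check(struct trunc_model *m, uint64_t min_size)
{
	uint64_t deficit;

	if (m->reserved >= min_size)
		return 0;
	if (m->trans_open)
		return -EAGAIN;
	deficit = min_size - m->reserved;
	if (m->pool < deficit)
		return -ENOSPC;
	m->pool -= deficit;
	m->reserved += deficit;
	return 0;
}

/* step_cost must not exceed min_size. */
static int model_truncate(struct trunc_model *m, uint64_t min_size,
			  uint64_t step_cost)
{
	int err = 0;

	/* The first transaction is already running when the loop starts. */
	m->trans_open = true;
	m->transactions = 1;

	while (m->steps_left > 0) {
		int ret = model_rsv_check(m, min_size);

		if (ret == -EAGAIN) {
			m->trans_open = false;	/* end_trans, then retry */
			continue;
		}
		if (ret) {
			err = ret;
			break;
		}
		if (!m->trans_open) {		/* start a 1-unit transaction */
			m->trans_open = true;
			m->transactions++;
		}
		m->reserved -= step_cost;	/* one truncate step */
		m->steps_left--;
		m->trans_open = false;		/* end_trans */
	}
	return err;
}
```

The design point the model shows is that refilling happens only between transactions, so no step ever has to commit from inside an open one.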



Re: Btrfs slowdown

2011-08-08 Thread Sage Weil
Hi Christian,

Are you still seeing this slowness?

sage


On Wed, 27 Jul 2011, Christian Brunner wrote:
 2011/7/25 Chris Mason chris.ma...@oracle.com:
  Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
  Hi,
 
  we are running a ceph cluster with btrfs as its base filesystem
  (kernel 3.0). At the beginning everything worked very well, but after
  a few days (2-3) things get very slow.
 
  When I look at the object store servers I see heavy disk I/O on the
  btrfs filesystems (disk utilization is between 60% and 100%). I also
  did some tracing on the Ceph object store daemon, but I'm quite
  certain that the majority of the disk I/O is not caused by ceph or
  any other userland process.
 
  When I reboot the system(s) the problems go away for another 2-3 days,
  but after that, it starts again. I'm not sure if the problem is
  related to the kernel warning I reported last week. At least there
  is no temporal relationship between the warning and the slowdown.
 
  Any hints on how to trace this would be welcome.
 
  The easiest way to trace this is with latencytop.
 
  Apply this patch:
 
  http://oss.oracle.com/~mason/latencytop.patch
 
  And then use latencytop -c for a few minutes while the system is slow.
  Send the output here and hopefully we'll be able to figure it out.
 
 I've now installed latencytop. Attached are two output files: The
 first is from yesterday and was created approximately half an hour after
 the boot. The second one is from today, uptime is 19h. The load on the
 system is already rising. Disk utilization is at approximately 50%.
 
 Thanks for your help.
 
 Christian
 


Re: “bio too big” regression and silent data corruption in 3.0

2011-08-08 Thread Alexandre Oliva
On Aug  7, 2011, Alexandre Oliva ol...@lsd.ic.unicamp.br wrote:

 tl;dr version: 3.0 produces “bio too big” dmesg entries and silently
 corrupts data in “meta-raid1/data-single” configurations on disks with
 different max_hw_sectors, where 2.6.38 worked fine.

FWIW, I just got the same problem with 2.6.38.  No idea how I hadn't hit
it before, but it's not a 3.0 regression, just a regular (but IMHO very
serious) bug.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


[PATCH] btrfs: make insert_ptr() void

2011-08-08 Thread Mark Fasheh
insert_ptr() always returns zero, so all the extra error handling can go
away.  This makes it trivial to also make copy_for_split() a void function,
as its only return value came from insert_ptr(). Finally, this all makes the
BUG_ON(ret) in split_leaf() meaningless, so I removed that.
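insert_ptr() can return void because, once its caller guarantees there is room in the parent node, the operation is pure in-memory bookkeeping: shift the key pointers at and after `slot` up by one, write the new key/block pointer pair, and bump nritems (the `btrfs_set_header_nritems(lower, nritems + 1)` line in the diff). A user-space sketch of that shift-and-insert step; the node layout, `SKETCH_NODE_SLOTS`, and the asserts standing in for the caller's guarantees are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NODE_SLOTS 8	/* hypothetical node capacity */

struct key_ptr {
	uint64_t key;		/* simplified stand-in for struct btrfs_disk_key */
	uint64_t blockptr;	/* bytenr of the child block */
};

struct node_sketch {
	int nritems;
	struct key_ptr ptrs[SKETCH_NODE_SLOTS];
};

/* Insert (key, bytenr) at `slot`, shifting slots [slot, nritems) up by one.
 * Nothing here can fail once the preconditions hold, which is why such a
 * function can return void. */
static void insert_ptr_sketch(struct node_sketch *n, uint64_t key,
			      uint64_t bytenr, int slot)
{
	assert(slot >= 0 && slot <= n->nritems);
	assert(n->nritems < SKETCH_NODE_SLOTS);	/* caller guarantees room */

	if (slot != n->nritems)
		memmove(&n->ptrs[slot + 1], &n->ptrs[slot],
			(size_t)(n->nritems - slot) * sizeof(n->ptrs[0]));
	n->ptrs[slot].key = key;
	n->ptrs[slot].blockptr = bytenr;
	n->nritems++;
}
```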

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/ctree.c |   59 -
 1 files changed, 18 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 011cab3..41605ac 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2123,12 +2123,10 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
  *
  * slot and level indicate where you want the key to go, and
  * blocknr is the block the key points to.
- *
- * returns zero on success and < 0 on any error
  */
-static int insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root
-		      *root, struct btrfs_path *path, struct btrfs_disk_key
-		      *key, u64 bytenr, int slot, int level)
+static void insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root
+		       *root, struct btrfs_path *path, struct btrfs_disk_key
+		       *key, u64 bytenr, int slot, int level)
 {
 	struct extent_buffer *lower;
 	int nritems;
struct extent_buffer *lower;
int nritems;
@@ -2152,7 +2150,6 @@ static int insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root
 	btrfs_set_node_ptr_generation(lower, slot, trans->transid);
 	btrfs_set_header_nritems(lower, nritems + 1);
 	btrfs_mark_buffer_dirty(lower);
-	return 0;
 }
 
 /*
@@ -2173,7 +2170,6 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	struct btrfs_disk_key disk_key;
 	int mid;
 	int ret;
-	int wret;
 	u32 c_nritems;
 
 	c = path->nodes[level];
@@ -2230,11 +2226,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(c);
 	btrfs_mark_buffer_dirty(split);
 
-	wret = insert_ptr(trans, root, path, &disk_key, split->start,
-			  path->slots[level + 1] + 1,
-			  level + 1);
-	if (wret)
-		ret = wret;
+	insert_ptr(trans, root, path, &disk_key, split->start,
+		   path->slots[level + 1] + 1, level + 1);
 
 	if (path->slots[level] >= mid) {
 		path->slots[level] -= mid;
@@ -2724,18 +2717,16 @@ out:
  *
  * returns 0 if all went well and < 0 on failure.
  */
-static noinline int copy_for_split(struct btrfs_trans_handle *trans,
-			       struct btrfs_root *root,
-			       struct btrfs_path *path,
-			       struct extent_buffer *l,
-			       struct extent_buffer *right,
-			       int slot, int mid, int nritems)
+static noinline void copy_for_split(struct btrfs_trans_handle *trans,
+				    struct btrfs_root *root,
+				    struct btrfs_path *path,
+				    struct extent_buffer *l,
+				    struct extent_buffer *right,
+				    int slot, int mid, int nritems)
 {
 	int data_copy_size;
 	int rt_data_off;
 	int i;
-	int ret = 0;
-	int wret;
 	struct btrfs_disk_key disk_key;
 
 	nritems = nritems - mid;
@@ -2763,12 +2754,9 @@ static noinline int copy_for_split(struct btrfs_trans_handle *trans,
 	}
 
 	btrfs_set_header_nritems(l, mid);
-	ret = 0;
 	btrfs_item_key(right, &disk_key, 0);
-	wret = insert_ptr(trans, root, path, &disk_key, right->start,
-			  path->slots[1] + 1, 1);
-	if (wret)
-		ret = wret;
+	insert_ptr(trans, root, path, &disk_key, right->start,
+		   path->slots[1] + 1, 1);
 
 	btrfs_mark_buffer_dirty(right);
 	btrfs_mark_buffer_dirty(l);
@@ -2786,8 +2774,6 @@ static noinline int copy_for_split(struct btrfs_trans_handle *trans,
 	}
 
 	BUG_ON(path->slots[0] < 0);
-
-	return ret;
 }
 
 /*
@@ -2976,12 +2962,8 @@ again:
 	if (split == 0) {
 		if (mid <= slot) {
 			btrfs_set_header_nritems(right, 0);
-			wret = insert_ptr(trans, root, path,
-					  &disk_key, right->start,
-					  path->slots[1] + 1, 1);
-			if (wret)
-				ret = wret;
-
+			insert_ptr(trans, root, path, &disk_key, right->start,
+				   path->slots[1] + 1, 1);
 			btrfs_tree_unlock(path->nodes[0]);
 			free_extent_buffer(path->nodes[0]);
 			path->nodes[0] = right;
@@ -2989,12 +2971,8 @@ again:
 			path->slots[1] += 1;
 		} else {
   

Re: “bio too big” regression and silent data corruption in 3.0

2011-08-08 Thread Alexandre Oliva
On Aug  7, 2011, Alexandre Oliva ol...@lsd.ic.unicamp.br wrote:

 2. Removing a partition from the filesystem (say, the external disk)
 didn't relocate “single” block groups as such to other disks, as
 expected.

/me reads some code and resets expectations about RAID0 in btrfs ;-)

update_block_group_flags is what does this.  It doesn't care what was
chosen when the filesystem was created, it just forces RAID0 if more
than 1 disk remains:

/* turn single device chunks into raid0 */
return stripped | BTRFS_BLOCK_GROUP_RAID0;

Is this really intended?  Given my current understanding that RAID0
doesn't mean striping over all disks, but only over two disks, I guess I
might even be interested in it, but...  I still think the user's choice
should be honored, but I don't see where the choice is stored (if it is
at all).
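To make the conversion rules concrete, here is a hypothetical user-space sketch modeled on the behavior discussed in this thread: with more than one writable device, profiles that already use RAID are kept, DUP becomes RAID1, and everything else, including single, becomes RAID0; with one device, RAID0 turns back into single and RAID1/RAID10 into DUP. The `BG_*` names and bit values are stand-ins for the BTRFS_BLOCK_GROUP_* flags, and the one-device and DUP branches are an assumption about the parts of update_block_group_flags not quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical profile bits (stand-ins for BTRFS_BLOCK_GROUP_*). */
#define BG_DATA     (1ULL << 0)
#define BG_METADATA (1ULL << 2)
#define BG_RAID0    (1ULL << 3)
#define BG_RAID1    (1ULL << 4)
#define BG_DUP      (1ULL << 5)
#define BG_RAID10   (1ULL << 6)

static uint64_t convert_profile(uint64_t flags, uint64_t num_devices)
{
	uint64_t raid = BG_RAID0 | BG_RAID1 | BG_RAID10;
	uint64_t base = flags & ~(raid | BG_DUP);	/* block group type bits only */

	if (num_devices == 1) {
		if (flags & BG_RAID0)
			return base;			/* raid0 -> single */
		if (flags & (BG_RAID1 | BG_RAID10))
			return base | BG_DUP;		/* mirroring -> dup */
		return flags;
	}
	if (flags & raid)
		return flags;				/* already raid: keep */
	if (flags & BG_DUP)
		return base | BG_RAID1;			/* dup -> raid1 */
	return base | BG_RAID0;				/* single -> raid0 */
}
```

Since the function only sees the current profile bits, a single profile on a multi-device filesystem is indistinguishable from one that was never chosen explicitly.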


 I wonder, why can't btrfs mark at least mounted partitions as busy, in
 much the same way that swap, md and various filesystems do, to avoid
 such accidental reuses?

Heh.  And *unmark* them when they're removed, too...  As in, it won't
let me create a new filesystem in a partition that was just removed from
a filesystem, if that was the partition listed in /etc/mtab.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: BTRFS partition won't mount

2011-08-08 Thread C Anthony Risinger
On Wed, Aug 3, 2011 at 3:50 PM, Hugo Mills h...@carfax.org.uk wrote:

   Try the instructions on the wiki at [1]. (And please feed back
 and/or fix any issues you have with the instructions -- they're still
 quite new and probably have awkward corners).

 [1] 
 https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21

This worked perfectly for me ... just saved my night from tedious
restoration :-)

I'm on kernel 3.0.1 -- a hard poweroff led to that problem.  I haven't
had any issues for some time ... I'm not sure what the problem was
exactly, but sometimes systemd gets a little twacky and takes a year
to shut down ... guess I got a little impatient :-)

Anyways, thanks for the integration work!

-- 

C Anthony


Re: “bio too big” regression and silent data corruption in 3.0

2011-08-08 Thread Alexandre Oliva
On Aug  7, 2011, Alexandre Oliva ol...@lsd.ic.unicamp.br wrote:

 in very much the same way that it appears to be impossible to go
 back from RAID1 to DUP metadata once you temporarily add a second disk,
 and any metadata block group happens to be allocated before you remove
 it (why couldn't it go back to DUP, rather than refusing the removal
 outright, which prevents even single block groups from being moved?)

Which also appears to be intentional.  The code to support this is right
there in update_block_group_flags, but btrfs_rm_device refuses to let it
do its job, denying the removal attempt right away, without any means to
bypass the test.  Could at least an option to bypass the test be
introduced, through say a mount option, some /sys setting, whatever?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: [PATCH] btrfs: Handle NULL inode return from btrfs_lookup_dentry()

2011-08-08 Thread Tsutomu Itoh
Hi, Mark,

(2011/08/06 1:48), Mark Fasheh wrote:
 Right now in create_snapshot(), we'll BUG() if btrfs_lookup_dentry() returns
 a NULL inode (negative dentry). Getting a negative dentry here probably
 isn't ever expected to happen; however, two things lead me to believe that we
 should trap this anyway:
 
 - I don't see any possibility of serious fs corruption from handling the
   error.  I do wonder though if we could have an orphaned snapshot?  Even
   if we did, that doesn't strike me as needing to crash the machine. (Q:
   Perhaps going read-only is the eventual solution here?)
 
 - It's very trivial to pass an -ENOENT back to userspace as we're pretty
   high up the call path at this point.

I have already posted the same purpose patch. Please look at the following.

 http://marc.info/?l=linux-btrfs&m=130932339824237&w=2

Thanks,
Tsutomu

 
 Signed-off-by: Mark Fasheh mfas...@suse.com
 ---
  fs/btrfs/ioctl.c |    8 +++++---
  1 files changed, 5 insertions(+), 3 deletions(-)
 
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 7cf0133..fc9525f 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -498,14 +498,16 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
   if (ret)
   goto fail;
  
 + ret = 0;
   inode = btrfs_lookup_dentry(dentry->d_parent->d_inode, dentry);
   if (IS_ERR(inode)) {
   ret = PTR_ERR(inode);
   goto fail;
 - }
 - BUG_ON(!inode);
 + } else if (inode == NULL)
 + ret = -ENOENT;
 +
   d_instantiate(dentry, inode);
 - ret = 0;
 +
  fail:
   kfree(pending_snapshot);
   return ret;
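btrfs_lookup_dentry() can hand back three kinds of values: an error pointer, NULL for a negative dentry, or a real inode, and the patch maps NULL to -ENOENT instead of hitting BUG_ON(). A user-space sketch of that three-way check; err_ptr(), ptr_err() and is_err() are hypothetical re-implementations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() convention (error codes encoded in the top 4095 addresses), and check_lookup() is illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095	/* same bound the kernel's err.h helpers use */

static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int is_err(const void *p)
{
	/* error pointers live in the top SKETCH_MAX_ERRNO addresses */
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct inode_sketch {
	int ino;
};

/* Hypothetical result handling in the spirit of the patched
 * create_snapshot(): error pointer -> its errno, NULL -> -ENOENT. */
static int check_lookup(struct inode_sketch *inode)
{
	if (is_err(inode))
		return (int)ptr_err(inode);
	if (inode == NULL)
		return -ENOENT;
	return 0;
}
```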

