Re: new metadata reader/writer locks in integration-test

2011-07-21 Thread Miao Xie
On Thu, 21 Jul 2011 20:53:24 -0400, Chris Mason wrote:
>>>> Hi everyone,
>>>>
>>>> I just rebased Josef's enospc fixes into integration-test, it should fix
>>>> the warnings in extent-tree.c

>>>
>>> Unfortunately, I got the following messages.
>>>
>>>
>>> Jul 21 09:41:22 luna kernel: ------------[ cut here ]------------
>>> Jul 21 09:41:22 luna kernel: WARNING: at fs/btrfs/extent-tree.c:5564 
>>> btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]()
>>> Jul 21 09:41:22 luna kernel: Hardware name: PRIMERGY
>>> Jul 21 09:41:22 luna kernel: Modules linked in: btrfs zlib_deflate crc32c 
>>> libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq 
>>> freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm 
>>> uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt 
>>> iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 
>>> mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas floppy 
>>> pata_acpi ata_generic ata_piix libata scsi_mod [last unloaded: microcode]
>>> Jul 21 09:41:22 luna kernel: Pid: 5517, comm: btrfs-endio-wri Tainted: G
>>> W   2.6.39btrfs-tc1+ #1
>>> Jul 21 09:41:22 luna kernel: Call Trace:
>>> Jul 21 09:41:22 luna kernel: [] 
>>> warn_slowpath_common+0x7f/0xc0
>>> Jul 21 09:41:22 luna kernel: [] 
>>> warn_slowpath_null+0x1a/0x20
>>> Jul 21 09:41:22 luna kernel: [] 
>>> btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] 
>>> insert_reserved_file_extent.clone.0+0x201/0x270 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] 
>>> btrfs_finish_ordered_io+0x2eb/0x360 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] ? 
>>> try_to_del_timer_sync+0x83/0xe0
>>> Jul 21 09:41:22 luna kernel: [] 
>>> btrfs_writepage_end_io_hook+0x50/0xa0 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] 
>>> end_compressed_bio_write+0x86/0xf0 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] bio_endio+0x1d/0x40
>>> Jul 21 09:41:22 luna kernel: [] 
>>> end_workqueue_fn+0xf4/0x130 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] worker_loop+0x13e/0x540 
>>> [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] ? 
>>> btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] ? 
>>> btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
>>> Jul 21 09:41:22 luna kernel: [] kthread+0x96/0xa0
>>> Jul 21 09:41:22 luna kernel: [] 
>>> kernel_thread_helper+0x4/0x10
>>> Jul 21 09:41:22 luna kernel: [] ? 
>>> kthread_worker_fn+0x1a0/0x1a0
>>> Jul 21 09:41:22 luna kernel: [] ? gs_change+0x13/0x13
>>> Jul 21 09:41:22 luna kernel: ---[ end trace 02c1fa3044677043 ]---
>>>
>>
>> a very similar warning here, but without compression involved:
> 
> Ok, these are probably the enospc fixes.  Could you please try bisecting
> out some of Josef's patches?

I bisected and found that the following patch introduced this problem.

commit 97ffc7d564f55787c7d9ea557d5d30d9ecb2f003
Author: Josef Bacik 
Date:   Fri Jul 15 18:29:11 2011 +

Btrfs: don't be as agressive with delalloc metadata reservations

Currently we reserve enough space to COW an entirely full btree for every ex
we have reserved for an inode.  This _sucks_, because you only need to COW o
and then everybody else is ok.  Unfortunately we don't know we'll all be abl
get into the same transaction so that's what we have had to do.  But the glo
reserve holds a reservation large enough to cover a large percentage of all 
metadata currently in the fs.  So all we really need to account for is any n
blocks that we may allocate.  So fix this by
  ……

The reason is that the reservation is calculated incorrectly: the nodes in the
search path may be split and new nodes may be created, but the above patch
didn't reserve space for these new nodes.

The following patch can fix it. Although my test passed, I still need Arne's
verification to make sure it fixes all the reported problems.
Arne, could you test it for me?

Subject: [PATCH] Btrfs: fix wrong calculation of the reservation for the transaction

At worst, Btrfs may split all the nodes in the search path, so we must take
those new nodes into account when we calculate the space that needs to be
reserved.

Signed-off-by: Miao Xie 
---
 fs/btrfs/ctree.h |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d813a67..4f23819 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2133,10 +2133,16 @@ static inline bool btrfs_mixed_space_info(struct 
btrfs_space_info *space_info)
 }
 
 /* extent-tree.c */
+/*
+ * This inline function is used to calc the size of new nodes/leaves that we
+ * may create. At worst, we may split all the nodes in the path and create
+ * two leaves for the insertion of one item.
+ */
 static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_root *root,
 unsigned num_items)
 {
-   return root->leafsize * 3 * num_items;
+   return (root->leafsize * 2 + root->nodesize * (BTRFS_MA
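
As a rough illustration of the accounting described in the patch comment (one
new node for each interior level of the search path, plus two leaves per
inserted item), here is a standalone sketch. MAX_LEVEL, struct tree_geom and
calc_worst_case_bytes() are hypothetical names chosen for this example; the
actual expression in the patch is cut off above and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tree height limit for this sketch (levels 0..MAX_LEVEL-1). */
#define MAX_LEVEL 8

/* Hypothetical stand-in for the block sizes kept in struct btrfs_root. */
struct tree_geom {
	uint32_t leafsize;	/* bytes per leaf (level 0) */
	uint32_t nodesize;	/* bytes per interior node */
};

/*
 * Worst case for inserting one item: every interior node on the search
 * path is split (one new node per level above the leaves) and two new
 * leaves are created.
 */
static uint64_t calc_worst_case_bytes(const struct tree_geom *g,
				      unsigned int num_items)
{
	uint64_t per_item;

	assert(g != NULL);
	per_item = (uint64_t)g->leafsize * 2 +
		   (uint64_t)g->nodesize * (MAX_LEVEL - 1);
	return per_item * num_items;
}
```

Under these assumptions, 4 KiB leaves and nodes give 36864 bytes per item,
compared with 12288 bytes for the old leafsize * 3 formula.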

Re: [PATCH 6/7] btrfs: Don't BUG_ON alloc_path errors in find_next_chunk

2011-07-21 Thread Tsutomu Itoh
(2011/07/22 4:48), Mark Fasheh wrote:
> I also removed the BUG_ON from error return of find_next_chunk in
> init_first_rw_device(). It turns out that the only caller of
> init_first_rw_device() also BUGS on any nonzero return so no actual behavior
> change has occurred here.
> 
> do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk()
> which can now return -ENOMEM. Instead of setting space_info->full on any
> error from btrfs_alloc_chunk() I catch and return every error value _except_
> -ENOSPC. Thanks go to Tsutomu Itoh for pointing that issue out.
> 
> Signed-off-by: Mark Fasheh 
> ---
>  fs/btrfs/extent-tree.c |3 +++
>  fs/btrfs/volumes.c |6 --
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index aa91773..ff339b2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3277,6 +3277,9 @@ again:
>   }
>  
>   ret = btrfs_alloc_chunk(trans, extent_root, flags);
> + if (ret < 0 && ret != -ENOSPC)
> + return ret;
> +

You need mutex_unlock() before return.

Thanks,
Tsutomu
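
The review point above (an early return that skips the unlock) is commonly
handled with a single exit label. The following is only a sketch of that
pattern, not the kernel code: the lock is modeled by a counter, and
sketch_lock(), sketch_unlock() and do_chunk_alloc_sketch() are hypothetical
names for this example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the lock held here and for space_info->full. */
static int lock_depth;
static int space_full;

static void sketch_lock(void)   { lock_depth++; }
static void sketch_unlock(void) { lock_depth--; }

static int do_chunk_alloc_sketch(int alloc_result)
{
	int ret;

	sketch_lock();

	ret = alloc_result;	/* stands in for btrfs_alloc_chunk() */
	if (ret < 0 && ret != -ENOSPC)
		goto out;	/* hard error: leave space_full alone */

	if (ret)
		space_full = 1;	/* remaining failure here is -ENOSPC */
out:
	sketch_unlock();	/* runs on every exit path */
	return ret;
}
```

Routing the error through the label keeps the lock balanced no matter which
branch is taken.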


>   spin_lock(&space_info->lock);
>   if (ret)
>   space_info->full = 1;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 530a2fc..90d956c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1037,7 +1037,8 @@ static noinline int find_next_chunk(struct btrfs_root 
> *root,
>   struct btrfs_key found_key;
>  
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
>  
>   key.objectid = objectid;
>   key.offset = (u64)-1;
> @@ -2663,7 +2664,8 @@ static noinline int init_first_rw_device(struct 
> btrfs_trans_handle *trans,
>  
>   ret = find_next_chunk(fs_info->chunk_root,
> BTRFS_FIRST_CHUNK_TREE_OBJECTID, &chunk_offset);
> - BUG_ON(ret);
> + if (ret)
> + return ret;
>  
>   alloc_profile = BTRFS_BLOCK_GROUP_METADATA |
>   (fs_info->metadata_alloc_profile &


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: new metadata reader/writer locks in integration-test

2011-07-21 Thread Chris Mason
Excerpts from Arne Jansen's message of 2011-07-21 01:46:55 -0400:
> On 21.07.2011 02:48, Tsutomu Itoh wrote:
> > (2011/07/21 2:21), Chris Mason wrote:
> >> Excerpts from Chris Mason's message of 2011-07-19 13:30:22 -0400:
> >>> Hi everyone,
> >>>
> >>> I've pushed out a new integration-test branch, and it includes a new
> >>> reader/writer locking scheme for the btree locks.
> >>>
> >>> We've seen a number of benchmarks dominated by contention on the root
> >>> node lock.  This changes our locks into a simple reader/writer lock.
> >>> They are based on mutexes so that we still take advantage of the mutex
> >>> adaptive spins for write locks (rwsemaphores were much slower).
> >>>
> >>> I'm also sending the individual commits, please do take a look.
> >>
> >> Hi everyone,
> >>
> >> I just rebased Josef's enospc fixes into integration-test, it should fix
> >> the warnings in extent-tree.c
> >>
> > 
> > Unfortunately, I got the following messages.
> > 
> > 
> > Jul 21 09:41:22 luna kernel: ------------[ cut here ]------------
> > Jul 21 09:41:22 luna kernel: WARNING: at fs/btrfs/extent-tree.c:5564 
> > btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]()
> > Jul 21 09:41:22 luna kernel: Hardware name: PRIMERGY
> > Jul 21 09:41:22 luna kernel: Modules linked in: btrfs zlib_deflate crc32c 
> > libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq 
> > freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm 
> > uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt 
> > iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 
> > mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas floppy 
> > pata_acpi ata_generic ata_piix libata scsi_mod [last unloaded: microcode]
> > Jul 21 09:41:22 luna kernel: Pid: 5517, comm: btrfs-endio-wri Tainted: G
> > W   2.6.39btrfs-tc1+ #1
> > Jul 21 09:41:22 luna kernel: Call Trace:
> > Jul 21 09:41:22 luna kernel: [] 
> > warn_slowpath_common+0x7f/0xc0
> > Jul 21 09:41:22 luna kernel: [] 
> > warn_slowpath_null+0x1a/0x20
> > Jul 21 09:41:22 luna kernel: [] 
> > btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] 
> > insert_reserved_file_extent.clone.0+0x201/0x270 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] 
> > btrfs_finish_ordered_io+0x2eb/0x360 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] ? 
> > try_to_del_timer_sync+0x83/0xe0
> > Jul 21 09:41:22 luna kernel: [] 
> > btrfs_writepage_end_io_hook+0x50/0xa0 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] 
> > end_compressed_bio_write+0x86/0xf0 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] bio_endio+0x1d/0x40
> > Jul 21 09:41:22 luna kernel: [] 
> > end_workqueue_fn+0xf4/0x130 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] worker_loop+0x13e/0x540 
> > [btrfs]
> > Jul 21 09:41:22 luna kernel: [] ? 
> > btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] ? 
> > btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
> > Jul 21 09:41:22 luna kernel: [] kthread+0x96/0xa0
> > Jul 21 09:41:22 luna kernel: [] 
> > kernel_thread_helper+0x4/0x10
> > Jul 21 09:41:22 luna kernel: [] ? 
> > kthread_worker_fn+0x1a0/0x1a0
> > Jul 21 09:41:22 luna kernel: [] ? gs_change+0x13/0x13
> > Jul 21 09:41:22 luna kernel: ---[ end trace 02c1fa3044677043 ]---
> > 
> 
> a very similar warning here, but without compression involved:

Ok, these are probably the enospc fixes.  Could you please try bisecting
out some of Josef's patches?

-chris

Re: [PATCH 7/7] btrfs: don't BUG_ON allocation errors in btrfs_drop_snapshot

2011-07-21 Thread Tsutomu Itoh
Hi, Mark,

(2011/07/22 4:48), Mark Fasheh wrote:
> In addition to properly handling allocation failure from btrfs_alloc_path, I
> also fixed up the kzalloc error handling code immediately below it.
> 
> Signed-off-by: Mark Fasheh 
> ---
>  fs/btrfs/extent-tree.c |8 ++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index ff339b2..4cf5257 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6271,10 +6271,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   int level;
>  
>   path = btrfs_alloc_path();
> - BUG_ON(!path);
> + if (!path)
> + return -ENOMEM;
>  
>   wc = kzalloc(sizeof(*wc), GFP_NOFS);
> - BUG_ON(!wc);
> + if (!wc) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
>  
>   trans = btrfs_start_transaction(tree_root, 0);
>   BUG_ON(IS_ERR(trans));

Currently, callers of btrfs_drop_snapshot() ignore its return code,
but btrfs_drop_snapshot() itself catches the error with BUG_ON.

With your modification the error code is returned to the caller, but the
caller still ignores it.
So the error goes undetected, and I don't think that is good.

Thanks,
Tsutomu
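
To make the concern concrete, here is a small sketch of the difference between
a caller that drops the return value and one that checks it. All three
functions are hypothetical stand-ins written for this example; none of them is
the real btrfs code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for btrfs_drop_snapshot(): fails when told to. */
static int drop_snapshot_stub(int simulate_oom)
{
	return simulate_oom ? -ENOMEM : 0;
}

/* Caller that discards the result: the failure is silently lost. */
static int caller_ignoring(int simulate_oom)
{
	drop_snapshot_stub(simulate_oom);
	return 0;
}

/* Caller that checks the result and passes it up. */
static int caller_checked(int simulate_oom)
{
	int ret = drop_snapshot_stub(simulate_oom);

	if (ret < 0)
		return ret;
	return 0;
}
```

With the BUG_ON removed, only the second form lets anyone notice the failure.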


Re: Broken btrfs?

2011-07-21 Thread Jan Schubert
On 07/18/2011 10:29 AM, Jan Schmidt wrote:
> If you are on a 3.0 kernel, get the most current version of btrfs
> tools from Hugo's integration-20110705 branch at
> http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ and do a
> scrub. -Jan 

Thx Jan, I did. This is the result:

scrub status for 03201fc0-7695-4468-9a10-f61ad79f23ca
scrub started at Thu Jul 21 22:27:31 2011 and finished after 787 seconds
total bytes scrubbed: 173.91GB with 2211 errors
error details: csum=2211
corrected errors: 0, uncorrectable errors: 2211

Any advice on what to do now? Should I stick with this filesystem or create a
new one?

The good thing is that running 3.0 no longer crashes the system when
accessing corrupt data; it just prints an I/O error.

TiA,
Jan


Re: [PATCH v5 8/8] btrfs: new ioctls to do logical->inode and inode->path resolving

2011-07-21 Thread Andi Kleen
Jan Schmidt  writes:
> +
> +static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
> + void __user *arg)
> +{
> + int ret = 0;
> + int size;
> + u64 extent_offset;
> + struct btrfs_ioctl_logical_ino_args *loi;
> + struct btrfs_data_container *inodes = NULL;
> + struct btrfs_path *path = NULL;
> + struct btrfs_key key;

This really needs to be root-only for obvious reasons.
The same goes for the ino_path function.

> +
> + loi = memdup_user(arg, sizeof(*loi));
> + if (IS_ERR(loi)) {
> + ret = PTR_ERR(loi);
> + loi = NULL;
> + goto out;
> + }
> +
> + path = btrfs_alloc_path();
> + if (!path) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + size = min(loi->size, 4096);

This is likely a root hole. loi->size is signed! Consider the case
of a negative value being passed in.

Same for the earlier function.
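
The signedness problem can be reproduced outside the kernel. The sketch below
is an illustration under assumptions: MAX_BUF, clamp_size_signed() and
clamp_size_checked() are hypothetical names, and the "checked" variant is one
possible remedy, not necessarily the fix that was applied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_BUF 4096

/* Buggy: a negative user-supplied size is "smaller" than 4096 and survives. */
static int clamp_size_signed(int user_size)
{
	return user_size < MAX_BUF ? user_size : MAX_BUF;
}

/* Safer: reject negatives before clamping, then work in an unsigned type. */
static int clamp_size_checked(int user_size, size_t *out)
{
	assert(out != NULL);
	if (user_size < 0)
		return -EINVAL;
	*out = (size_t)user_size < MAX_BUF ? (size_t)user_size : MAX_BUF;
	return 0;
}
```

Once the negative result of the first function is converted to an unsigned
length, it becomes a very large value, which is what makes the unchecked
version dangerous.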

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only

[PATCH 5/7] btrfs: Don't BUG_ON alloc_path errors in btrfs_balance()

2011-07-21 Thread Mark Fasheh
Dealing with this seems trivial - the only caller of btrfs_balance() is
btrfs_ioctl() which passes the error code directly back to userspace. There
also isn't much state to unwind (if I'm wrong about this point, we can
always safely move the allocation to the top of btrfs_balance() anyway).

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/volumes.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19450bc..530a2fc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2061,8 +2061,10 @@ int btrfs_balance(struct btrfs_root *dev_root)
 
/* step two, relocate all the chunks */
path = btrfs_alloc_path();
-   BUG_ON(!path);
-
+   if (!path) {
+   ret = -ENOMEM;
+   goto error;
+   }
key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
key.offset = (u64)-1;
key.type = BTRFS_CHUNK_ITEM_KEY;
-- 
1.7.5.3


[PATCH 6/7] btrfs: Don't BUG_ON alloc_path errors in find_next_chunk

2011-07-21 Thread Mark Fasheh
I also removed the BUG_ON from error return of find_next_chunk in
init_first_rw_device(). It turns out that the only caller of
init_first_rw_device() also BUGS on any nonzero return so no actual behavior
change has occurred here.

do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk()
which can now return -ENOMEM. Instead of setting space_info->full on any
error from btrfs_alloc_chunk() I catch and return every error value _except_
-ENOSPC. Thanks go to Tsutomu Itoh for pointing that issue out.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/extent-tree.c |3 +++
 fs/btrfs/volumes.c |6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index aa91773..ff339b2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3277,6 +3277,9 @@ again:
}
 
ret = btrfs_alloc_chunk(trans, extent_root, flags);
+   if (ret < 0 && ret != -ENOSPC)
+   return ret;
+
spin_lock(&space_info->lock);
if (ret)
space_info->full = 1;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 530a2fc..90d956c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1037,7 +1037,8 @@ static noinline int find_next_chunk(struct btrfs_root 
*root,
struct btrfs_key found_key;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
key.objectid = objectid;
key.offset = (u64)-1;
@@ -2663,7 +2664,8 @@ static noinline int init_first_rw_device(struct 
btrfs_trans_handle *trans,
 
ret = find_next_chunk(fs_info->chunk_root,
  BTRFS_FIRST_CHUNK_TREE_OBJECTID, &chunk_offset);
-   BUG_ON(ret);
+   if (ret)
+   return ret;
 
alloc_profile = BTRFS_BLOCK_GROUP_METADATA |
(fs_info->metadata_alloc_profile &
-- 
1.7.5.3


[PATCH 7/7] btrfs: don't BUG_ON allocation errors in btrfs_drop_snapshot

2011-07-21 Thread Mark Fasheh
In addition to properly handling allocation failure from btrfs_alloc_path, I
also fixed up the kzalloc error handling code immediately below it.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/extent-tree.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff339b2..4cf5257 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6271,10 +6271,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
int level;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
wc = kzalloc(sizeof(*wc), GFP_NOFS);
-   BUG_ON(!wc);
+   if (!wc) {
+   btrfs_free_path(path);
+   return -ENOMEM;
+   }
 
trans = btrfs_start_transaction(tree_root, 0);
BUG_ON(IS_ERR(trans));
-- 
1.7.5.3


[PATCH 3/7] btrfs: Don't BUG_ON alloc_path errors in btrfs_truncate_inode_items

2011-07-21 Thread Mark Fasheh
I moved the path allocation up a few lines to the top of the function so
that we couldn't get into the state where we've dropped delayed items and
the extent cache but fail due to -ENOMEM.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/inode.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8be7d7a..a0faf7d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3172,6 +3172,11 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
 
BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
 
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+   path->reada = -1;
+
if (root->ref_cows || root == root->fs_info->tree_root)
btrfs_drop_extent_cache(inode, new_size & (~mask), (u64)-1, 0);
 
@@ -3184,10 +3189,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
if (min_type == 0 && root == BTRFS_I(inode)->root)
btrfs_kill_delayed_inode_items(inode);
 
-   path = btrfs_alloc_path();
-   BUG_ON(!path);
-   path->reada = -1;
-
key.objectid = ino;
key.offset = (u64)-1;
key.type = (u8)-1;
-- 
1.7.5.3
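
The idea behind patch 3/7 (do the fallible allocation before any step that
cannot be undone) can be shown in a few lines. This is a userspace sketch;
struct inode_state, alloc_path_stub() and truncate_items_sketch() are
hypothetical names for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical state that the operation tears down as it goes. */
struct inode_state {
	int cache_dropped;
	int delayed_items_killed;
};

/* Hypothetical allocator that can be forced to fail for the demonstration. */
static void *alloc_path_stub(int simulate_oom)
{
	return simulate_oom ? NULL : malloc(16);
}

static int truncate_items_sketch(struct inode_state *st, int simulate_oom)
{
	void *path;

	assert(st != NULL);

	/* Allocate first: failing here leaves *st untouched. */
	path = alloc_path_stub(simulate_oom);
	if (!path)
		return -ENOMEM;

	/* Only now perform the steps that cannot be undone. */
	st->cache_dropped = 1;
	st->delayed_items_killed = 1;

	free(path);
	return 0;
}
```

If the allocation came after the two state changes, an -ENOMEM return would
leave the caller with a half-finished truncate.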


[PATCH 4/7] btrfs: Don't BUG_ON alloc_path errors in btrfs_read_locked_inode

2011-07-21 Thread Mark Fasheh
btrfs_iget() also needed an update so that errors from
btrfs_read_locked_inode() are caught and bubbled back up.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/inode.c |   22 +-
 1 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a0faf7d..8882999 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2518,7 +2518,9 @@ static void btrfs_read_locked_inode(struct inode *inode)
filled = true;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   goto make_bad;
+
path->leave_spinning = 1;
memcpy(&location, &BTRFS_I(inode)->location, sizeof(location));
 
@@ -3973,6 +3975,7 @@ struct inode *btrfs_iget(struct super_block *s, struct 
btrfs_key *location,
 struct btrfs_root *root, int *new)
 {
struct inode *inode;
+   int bad_inode = 0;
 
inode = btrfs_iget_locked(s, location->objectid, root);
if (!inode)
@@ -3982,10 +3985,19 @@ struct inode *btrfs_iget(struct super_block *s, struct 
btrfs_key *location,
BTRFS_I(inode)->root = root;
memcpy(&BTRFS_I(inode)->location, location, sizeof(*location));
btrfs_read_locked_inode(inode);
-   inode_tree_add(inode);
-   unlock_new_inode(inode);
-   if (new)
-   *new = 1;
+   if (!is_bad_inode(inode)) {
+   inode_tree_add(inode);
+   unlock_new_inode(inode);
+   if (new)
+   *new = 1;
+   } else {
+   bad_inode = 1;
+   }
+   }
+
+   if (bad_inode) {
+   iput(inode);
+   inode = ERR_PTR(-ESTALE);
}
 
return inode;
-- 
1.7.5.3
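
Patch 4/7 deals with a reader that returns void, so failure has to be signaled
through the object itself. A minimal userspace sketch of that shape follows;
struct sketch_inode, read_inode_sketch() and iget_sketch() are hypothetical,
and free() merely stands in for iput().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimal inode carrying only a "bad" flag. */
struct sketch_inode {
	bool bad;
};

/* A void reader cannot return an error, so it marks the inode instead. */
static void read_inode_sketch(struct sketch_inode *inode, bool simulate_oom)
{
	if (simulate_oom)
		inode->bad = true;
}

/* Returns 0 and a usable inode via *out, or a negative errno. */
static int iget_sketch(bool simulate_oom, struct sketch_inode **out)
{
	struct sketch_inode *inode = calloc(1, sizeof(*inode));

	assert(out != NULL);
	*out = NULL;
	if (!inode)
		return -ENOMEM;

	read_inode_sketch(inode, simulate_oom);
	if (inode->bad) {
		free(inode);	/* stands in for iput() */
		return -ESTALE;
	}
	*out = inode;
	return 0;
}
```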


[PATCH 2/7] btrfs: Don't BUG_ON alloc_path errors in replay_one_buffer()

2011-07-21 Thread Mark Fasheh
The two ->process_func call sites in tree-log.c which were ignoring a return
code have also been updated to exit gracefully.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/tree-log.c |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4ce8a9f..f3cacc0 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1617,7 +1617,8 @@ static int replay_one_buffer(struct btrfs_root *log, 
struct extent_buffer *eb,
return 0;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
nritems = btrfs_header_nritems(eb);
for (i = 0; i < nritems; i++) {
@@ -1723,7 +1724,9 @@ static noinline int walk_down_log_tree(struct 
btrfs_trans_handle *trans,
return -ENOMEM;
 
if (*level == 1) {
-   wc->process_func(root, next, wc, ptr_gen);
+   ret = wc->process_func(root, next, wc, ptr_gen);
+   if (ret)
+   return ret;
 
path->slots[*level]++;
if (wc->free) {
@@ -1788,8 +1791,11 @@ static noinline int walk_up_log_tree(struct 
btrfs_trans_handle *trans,
parent = path->nodes[*level + 1];
 
root_owner = btrfs_header_owner(parent);
-   wc->process_func(root, path->nodes[*level], wc,
+   ret = wc->process_func(root, path->nodes[*level], wc,
 btrfs_header_generation(path->nodes[*level]));
+   if (ret)
+   return ret;
+
if (wc->free) {
struct extent_buffer *next;
 
-- 
1.7.5.3


[PATCH 1/7] btrfs: don't BUG_ON btrfs_alloc_path() errors

2011-07-21 Thread Mark Fasheh
This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation
failure. All the sites that are fixed in this patch were checked by me to
be fairly trivial to fix because of at least one of two criteria:

 - Callers of the function catch errors from it already so bubbling the
   error up will be handled.
 - Callers of the function might BUG_ON any nonzero return code in which
   case there is no behavior change (but we still got to remove a BUG_ON)

The following functions were updated:

btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group,
btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written,
btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink,
insert_reserved_file_extent, and run_delalloc_nocow

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/extent-tree.c |   12 +---
 fs/btrfs/file-item.c   |7 +--
 fs/btrfs/file.c|3 ++-
 fs/btrfs/inode.c   |   18 +-
 4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 71cd456..aa91773 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -667,7 +667,9 @@ int btrfs_lookup_extent(struct btrfs_root *root, u64 start, 
u64 len)
struct btrfs_path *path;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
+
key.objectid = start;
key.offset = len;
btrfs_set_key_type(&key, BTRFS_EXTENT_ITEM_KEY);
@@ -5494,7 +5496,8 @@ static int alloc_reserved_tree_block(struct 
btrfs_trans_handle *trans,
u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
path->leave_spinning = 1;
ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
@@ -7162,7 +7165,10 @@ int btrfs_remove_block_group(struct btrfs_trans_handle 
*trans,
spin_unlock(&cluster->refill_lock);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
 
inode = lookup_free_space_inode(root, block_group, path);
if (!IS_ERR(inode)) {
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 90d4ee5..f92ff0e 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -282,7 +282,8 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 
start, u64 end,
u16 csum_size = btrfs_super_csum_size(&root->fs_info->super_copy);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
if (search_commit) {
path->skip_locking = 1;
@@ -672,7 +673,9 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
btrfs_super_csum_size(&root->fs_info->super_copy);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
+
sector_sum = sums->sums;
 again:
next_offset = (u64)-1;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fa4ef18..23d1d81 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -855,7 +855,8 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle 
*trans,
btrfs_drop_extent_cache(inode, start, end - 1, 0);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 again:
recow = 0;
split = start;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3601f0a..8be7d7a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1070,7 +1070,8 @@ static noinline int run_delalloc_nocow(struct inode 
*inode,
u64 ino = btrfs_ino(inode);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
nolock = is_free_space_inode(root, inode);
 
@@ -1644,7 +1645,8 @@ static int insert_reserved_file_extent(struct 
btrfs_trans_handle *trans,
int ret;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
path->leave_spinning = 1;
 
@@ -3713,7 +3715,8 @@ static int btrfs_inode_by_name(struct inode *dir, struct 
dentry *dentry,
int ret = 0;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
di = btrfs_lookup_dir_item(NULL, root, path, btrfs_ino(dir), name,
namelen, 0);
@@ -4438,7 +4441,8 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
int owner;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return ERR_PTR(-ENOMEM);
 
inode = new_inode(root->fs_info->sb);
if (!inode) {
@@ -7194,7 +7198,11 @@ static int btrfs_symlink(struct inode *dir, struct 
dentry *dentry,
goto out_unlock;
 

[PATCH 0/7] btrfs: don't BUG_ON btrfs_alloc_path errors v2

2011-07-21 Thread Mark Fasheh
Changelog:
  - Updated patch 6 after review from Tsutomu Itoh

Hi,

The following patches attempt to replace all the paths where we
BUG_ON the return value of btrfs_alloc_path with proper error handling. It's
pretty clear that these places aren't BUGing because of code error. To be
explicit, much of the code is doing something like this:

path = btrfs_alloc_path();
BUG_ON(!path);

which can be fixed by sending -ENOMEM back up the stack instead of the BUG.

The first patch in my series fixes the most trivial sites in one go.
The patches after the 1st fix one (more complicated) site each. In the patch
descriptions I try my best to describe the thought process that went into
each change.

Generally my guiding principle is that we want to "bubble up" some
of the BUG_ON's that can be trapped and handled at a higher level -- the lower
layer has an error and instead of killing the machine, sends it back up the
stack for later handling.
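
The "bubble up" approach can be sketched in userspace as follows. The names
alloc_path_stub(), lookup_sketch() and ioctl_handler_sketch() are hypothetical
and only illustrate the shape of the change; they are not functions from the
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator stub; NULL models btrfs_alloc_path() failing. */
static void *alloc_path_stub(int simulate_oom)
{
	return simulate_oom ? NULL : malloc(16);
}

/* Lowest layer: report the failure instead of halting. */
static int lookup_sketch(int simulate_oom)
{
	void *path = alloc_path_stub(simulate_oom);

	if (!path)
		return -ENOMEM;
	/* ... the tree search would go here ... */
	free(path);
	return 0;
}

/* Middle layer: pass any error up unchanged. */
static int ioctl_handler_sketch(int simulate_oom)
{
	int ret = lookup_sketch(simulate_oom);

	if (ret)
		return ret;
	return 0;
}
```

The top-level caller then decides what to do with -ENOMEM, for example handing
it back to userspace.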

I tested the patches with some kernel builds and snapshot commands.
Please review - comments and feedback are welcome.

The patches can also be had from git:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/btrfs-error-handling.git alloc_path
--Mark


[PATCH] btrfs: Make extent-io callbacks that never fail return void

2011-07-21 Thread Jeff Mahoney
 The set/clear bit and the extent split/merge hooks only ever return 0.

 Changing them to return void simplifies the error handling cases later.

 This patch changes the hook prototypes, the single implementation of each,
 and the functions that call them to return void instead.

 Since all four of these hooks execute under a spinlock, they're necessarily
 simple.
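
 To show why a void hook simplifies the callers, here is a reduced userspace
 sketch. The struct and function names below are hypothetical and only mirror
 the shape of the extent-io code; they are not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

struct tree_sketch;

/* Hypothetical hook table; the hook cannot fail, so it returns void. */
struct tree_ops_sketch {
	void (*set_bit_hook)(struct tree_sketch *tree, int bits);
};

struct tree_sketch {
	const struct tree_ops_sketch *ops;
	int state;
	int hook_calls;
};

/* Example hook that only counts how often it was invoked. */
static void count_hook(struct tree_sketch *tree, int bits)
{
	(void)bits;
	tree->hook_calls++;
}

/* With a void hook the wrapper has no error path to thread through. */
static void set_state_cb_sketch(struct tree_sketch *tree, int bits)
{
	if (tree->ops && tree->ops->set_bit_hook)
		tree->ops->set_bit_hook(tree, bits);
}

static void set_state_bits_sketch(struct tree_sketch *tree, int bits)
{
	assert(tree != NULL);
	set_state_cb_sketch(tree, bits);
	tree->state |= bits;
}
```

 Callers of set_state_bits_sketch() no longer need an "if (err) goto out"
 after each call, which is the simplification the patch is after.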

Signed-off-by: Jeff Mahoney 
---
 fs/btrfs/extent_io.c |   52 +--
 fs/btrfs/extent_io.h |   18 -
 fs/btrfs/inode.c |   26 ++---
 3 files changed, 34 insertions(+), 62 deletions(-)

--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -254,14 +254,14 @@ static void merge_cb(struct extent_io_tr
  *
  * This should be called with the tree lock held.
  */
-static int merge_state(struct extent_io_tree *tree,
-  struct extent_state *state)
+static void merge_state(struct extent_io_tree *tree,
+   struct extent_state *state)
 {
struct extent_state *other;
struct rb_node *other_node;
 
if (state->state & (EXTENT_IOBITS | EXTENT_BOUNDARY))
-   return 0;
+   return;
 
other_node = rb_prev(&state->rb_node);
if (other_node) {
@@ -288,19 +288,13 @@ static int merge_state(struct extent_io_
state = NULL;
}
}
-
-   return 0;
 }
 
-static int set_state_cb(struct extent_io_tree *tree,
+static void set_state_cb(struct extent_io_tree *tree,
 struct extent_state *state, int *bits)
 {
-   if (tree->ops && tree->ops->set_bit_hook) {
-   return tree->ops->set_bit_hook(tree->mapping->host,
-  state, bits);
-   }
-
-   return 0;
+   if (tree->ops && tree->ops->set_bit_hook)
+   tree->ops->set_bit_hook(tree->mapping->host, state, bits);
 }
 
 static void clear_state_cb(struct extent_io_tree *tree,
@@ -326,7 +320,6 @@ static int insert_state(struct extent_io
 {
struct rb_node *node;
int bits_to_set = *bits & ~EXTENT_CTLBITS;
-   int ret;
 
if (end < start) {
printk(KERN_ERR "btrfs end < start %llu %llu\n",
@@ -336,9 +329,7 @@ static int insert_state(struct extent_io
}
state->start = start;
state->end = end;
-   ret = set_state_cb(tree, state, bits);
-   if (ret)
-   return ret;
+   set_state_cb(tree, state, bits);
 
if (bits_to_set & EXTENT_DIRTY)
tree->dirty_bytes += end - start + 1;
@@ -359,13 +350,11 @@ static int insert_state(struct extent_io
return 0;
 }
 
-static int split_cb(struct extent_io_tree *tree, struct extent_state *orig,
+static void split_cb(struct extent_io_tree *tree, struct extent_state *orig,
 u64 split)
 {
if (tree->ops && tree->ops->split_extent_hook)
-   return tree->ops->split_extent_hook(tree->mapping->host,
-   orig, split);
-   return 0;
+   tree->ops->split_extent_hook(tree->mapping->host, orig, split);
 }
 
 /*
@@ -671,23 +660,18 @@ out:
return 0;
 }
 
-static int set_state_bits(struct extent_io_tree *tree,
+static void set_state_bits(struct extent_io_tree *tree,
   struct extent_state *state,
   int *bits)
 {
-   int ret;
int bits_to_set = *bits & ~EXTENT_CTLBITS;
 
-   ret = set_state_cb(tree, state, bits);
-   if (ret)
-   return ret;
+   set_state_cb(tree, state, bits);
if ((bits_to_set & EXTENT_DIRTY) && !(state->state & EXTENT_DIRTY)) {
u64 range = state->end - state->start + 1;
tree->dirty_bytes += range;
}
state->state |= bits_to_set;
-
-   return 0;
 }
 
 static void cache_state(struct extent_state *state,
@@ -779,9 +763,7 @@ hit_next:
goto out;
}
 
-   err = set_state_bits(tree, state, &bits);
-   if (err)
-   goto out;
+   set_state_bits(tree, state, &bits);
 
next_node = rb_next(node);
cache_state(state, cached_state);
@@ -830,9 +812,7 @@ hit_next:
if (err)
goto out;
if (state->end <= end) {
-   err = set_state_bits(tree, state, &bits);
-   if (err)
-   goto out;
+   set_state_bits(tree, state, &bits);
cache_state(state, cached_state);
merge_state(tree, state);
if (last_end == (u64)-1)
@@ -895,11 +875,7 @@ hit_next:
err = split_state(tree, state, prealloc, end + 1);
BUG_ON(err == -EEXIST);
 
-   err = set_state_bits(tree, preall
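
The pattern this patch converges on (optional hooks that cannot fail, so both the hooks and their wrappers return void) can be sketched in user-space C. This is only an illustration: the `demo_*` names and the `hook_calls` counter are hypothetical and are not part of the btrfs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for extent_state and extent_io_tree. */
struct demo_state {
	unsigned int bits;
};

struct demo_tree;

struct demo_tree_ops {
	/* Returns void: the hook runs under the tree lock and cannot fail. */
	void (*set_bit_hook)(struct demo_tree *tree, struct demo_state *state,
			     unsigned int *bits);
};

struct demo_tree {
	const struct demo_tree_ops *ops;	/* may be NULL */
	unsigned long hook_calls;		/* bookkeeping for the demo only */
};

/* Wrapper in the style of set_state_cb(): call the hook only if present. */
static void demo_set_state_cb(struct demo_tree *tree, struct demo_state *state,
			      unsigned int *bits)
{
	if (tree->ops && tree->ops->set_bit_hook)
		tree->ops->set_bit_hook(tree, state, bits);
}

/* With a void hook there is no error to propagate, so this is void too. */
static void demo_set_state_bits(struct demo_tree *tree,
				struct demo_state *state, unsigned int *bits)
{
	assert(tree && state && bits);
	demo_set_state_cb(tree, state, bits);
	state->bits |= *bits;
}

/* Example hook: only counts invocations. */
static void demo_count_hook(struct demo_tree *tree, struct demo_state *state,
			    unsigned int *bits)
{
	(void)state;
	(void)bits;
	tree->hook_calls++;
}
```

Callers no longer need an `err` check after setting bits, which is what the removed `if (err) goto out;` lines in the hunks above reflect.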

WARNING: at fs/btrfs/inode.c:2204

2011-07-21 Thread Christian Brunner
I'm running a Ceph Object Store with 3.0-rc7 and patches from Josef.
Occasionally I get the attached warning.

Everything seems to be working after this warning, but I am concerned...

Thanks,
Christian

[13319.808020] [ cut here ]
[13319.813284] WARNING: at fs/btrfs/inode.c:2204 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[13319.822563] Hardware name: ProLiant DL180 G6
[13319.827586] Modules linked in: btrfs zlib_deflate libcrc32c bonding ipv6 serio_raw pcspkr ghes hed iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(P) hpsa squashfs [last unloaded: scsi_wait_scan]
[13319.851192] Pid: 23617, comm: kworker/6:0 Tainted: P 3.0.0-1.fits.2.el6.x86_64 #1
[13319.860661] Call Trace:
[13319.863433]  [] warn_slowpath_common+0x7f/0xc0
[13319.870172]  [] warn_slowpath_null+0x1a/0x20
[13319.876724]  [] btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]
[13319.884633]  [] commit_fs_roots+0xc5/0x1b0 [btrfs]
[13319.891762]  [] btrfs_commit_transaction+0x3ce/0x840 [btrfs]
[13319.899917]  [] ? dequeue_task_fair+0x20f/0x220
[13319.906726]  [] ? __switch_to+0x12b/0x320
[13319.912943]  [] ? wake_up_bit+0x40/0x40
[13319.918971]  [] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[13319.926775]  [] do_async_commit+0x1f/0x30 [btrfs]
[13319.933825]  [] process_one_work+0x128/0x450
[13319.940419]  [] worker_thread+0x17b/0x3c0
[13319.946670]  [] ? manage_workers+0x220/0x220
[13319.953210]  [] kthread+0x96/0xa0
[13319.958682]  [] kernel_thread_helper+0x4/0x10
[13319.965316]  [] ? kthread_worker_fn+0x1a0/0x1a0
[13319.972183]  [] ? gs_change+0x13/0x13
[13319.978065] ---[ end trace 942778a443791443 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: don't be as agressive with delalloc metadata reservations V2

2011-07-21 Thread Christian Brunner
2011/7/18 Josef Bacik :
> On 07/18/2011 02:11 PM, Josef Bacik wrote:
>> Currently we reserve enough space to COW an entirely full btree for every
>> extent we have reserved for an inode.  This _sucks_, because you only need
>> to COW once, and then everybody else is ok.  Unfortunately we don't know
>> we'll all be able to get into the same transaction so that's what we have
>> had to do.  But the global reserve holds a reservation large enough to cover
>> a large percentage of all the metadata currently in the fs.  So all we
>> really need to account for is any new blocks that we may allocate.  So fix
>> this by
>>
>> 1) Passing to btrfs_alloc_free_block() whether this is a new block or a COW
>> block.  If it is a COW block we use the global reserve, if not we use the
>> trans->block_rsv.
>> 2) Reduce the amount of space we reserve.  Since we don't need to account for
>> cow'ing the tree we can just keep track of new blocks to reserve, which
>> greatly reduces the reservation amount.
>>
>> This makes my basic random write test go from 3 mb/s to 75 mb/s.  I've tested
>> this with my horrible ENOSPC test and it seems to work out fine.  Thanks,
>>
>> Signed-off-by: Josef Bacik 
>> ---
>> V1->V2:
>> -fix a problem reported by Liubo, we need to make sure that we move bytes
>> over for any new extents we may add to the extent tree so we don't get a
>> bunch of warnings.
>> -fix the global reserve to reserve 50% of the metadata space currently used.
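
The rule in point 1 of the quoted description (COW blocks are charged to the global reserve, new blocks to the transaction's reserve) might be sketched as follows. This is a simplified user-space illustration, not the kernel implementation: `demo_rsv`, `demo_pick_rsv` and `demo_charge_block` are hypothetical names, and the real code's behaviour when a reserve runs short is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified reserve: just a byte counter. */
struct demo_rsv {
	uint64_t reserved;
};

/* COW blocks draw on the global reserve, new blocks on the transaction's. */
static struct demo_rsv *demo_pick_rsv(struct demo_rsv *global_rsv,
				      struct demo_rsv *trans_rsv, bool is_cow)
{
	assert(global_rsv && trans_rsv);
	return is_cow ? global_rsv : trans_rsv;
}

/* Charge one block; returns 0 on success, -1 if the chosen reserve is short. */
static int demo_charge_block(struct demo_rsv *global_rsv,
			     struct demo_rsv *trans_rsv, bool is_cow,
			     uint64_t blocksize)
{
	struct demo_rsv *rsv = demo_pick_rsv(global_rsv, trans_rsv, is_cow);

	if (rsv->reserved < blocksize)
		return -1;
	rsv->reserved -= blocksize;
	return 0;
}
```

Under this model, only new blocks consume the per-transaction reservation, which is why the description expects the per-inode reservation to shrink.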

When I run this patch I get a lot of messages like these (V1 seemed to
run fine).

Regards,
Christian

Jul 21 15:25:59 os00 kernel: [   35.411360] [ cut here ]
Jul 21 15:25:59 os00 kernel: [   35.416589] WARNING: at fs/btrfs/extent-tree.c:5564 btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]()
Jul 21 15:25:59 os00 kernel: [   35.427311] Hardware name: ProLiant DL180 G6
Jul 21 15:25:59 os00 kernel: [   35.432326] Modules linked in: btrfs zlib_deflate libcrc32c bonding ipv6 serio_raw pcspkr ghes hed iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(P) hpsa squashfs usb_storage [last unloaded: scsi_wait_scan]
Jul 21 15:25:59 os00 kernel: [   35.456799] Pid: 1876, comm: btrfs-endio-wri Tainted: P 3.0.0-1.fits.4.el6.x86_64 #1
Jul 21 15:25:59 os00 kernel: [   35.466610] Call Trace:
Jul 21 15:25:59 os00 kernel: [   35.469497]  [] warn_slowpath_common+0x7f/0xc0
Jul 21 15:25:59 os00 kernel: [   35.476254]  [] warn_slowpath_null+0x1a/0x20
Jul 21 15:25:59 os00 kernel: [   35.482839]  [] btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.491683]  [] insert_reserved_file_extent.clone.0+0x201/0x270 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.500912]  [] btrfs_finish_ordered_io+0x2eb/0x360 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.508978]  [] ? try_to_del_timer_sync+0x81/0xe0
Jul 21 15:25:59 os00 kernel: [   35.516081]  [] btrfs_writepage_end_io_hook+0x4c/0xa0 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.524340]  [] end_compressed_bio_write+0x86/0xf0 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.532259]  [] bio_endio+0x1d/0x40
Jul 21 15:25:59 os00 kernel: [   35.538034]  [] end_workqueue_fn+0xf4/0x130 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.545384]  [] worker_loop+0x13e/0x540 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.552307]  [] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.560039]  [] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
Jul 21 15:25:59 os00 kernel: [   35.567768]  [] kthread+0x96/0xa0
Jul 21 15:25:59 os00 kernel: [   35.573275]  [] kernel_thread_helper+0x4/0x10
Jul 21 15:25:59 os00 kernel: [   35.579931]  [] ? kthread_worker_fn+0x1a0/0x1a0
Jul 21 15:25:59 os00 kernel: [   35.586816]  [] ? gs_change+0x13/0x13
Jul 21 15:25:59 os00 kernel: [   35.592779] ---[ end trace d87e2733f1e978b8 ]---


[PATCH v5 1/8] btrfs: added helper functions to iterate backrefs

2011-07-21 Thread Jan Schmidt
These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/Makefile  |3 +-
 fs/btrfs/backref.c |  748 
 fs/btrfs/backref.h |   62 +
 fs/btrfs/ioctl.h   |   10 +
 4 files changed, 822 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..c63f649 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+  compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
+  scrub.o
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
new file mode 100644
index 000..477f154
--- /dev/null
+++ b/fs/btrfs/backref.c
@@ -0,0 +1,748 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "disk-io.h"
+#include "backref.h"
+
+struct __data_ref {
+   struct list_head list;
+   u64 inum;
+   u64 root;
+   u64 extent_data_item_offset;
+};
+
+struct __shared_ref {
+   struct list_head list;
+   u64 disk_byte;
+};
+
+static int __inode_info(u64 inum, u64 ioff, u8 key_type,
+   struct btrfs_root *fs_root, struct btrfs_path *path,
+   struct btrfs_key *found_key)
+{
+   int ret;
+   struct btrfs_key key;
+   struct extent_buffer *eb;
+
+   key.type = key_type;
+   key.objectid = inum;
+   key.offset = ioff;
+
+   ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+   if (ret < 0)
+   return ret;
+
+   eb = path->nodes[0];
+   if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+   ret = btrfs_next_leaf(fs_root, path);
+   if (ret)
+   return ret;
+   eb = path->nodes[0];
+   }
+
+   btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
+   if (found_key->type != key.type || found_key->objectid != key.objectid)
+   return 1;
+
+   return 0;
+}
+
+/*
+ * this makes the path point to (inum INODE_ITEM ioff)
+ */
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+   struct btrfs_path *path)
+{
+   struct btrfs_key key;
+   return __inode_info(inum, ioff, BTRFS_INODE_ITEM_KEY, fs_root, path,
+   &key);
+}
+
+static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+   struct btrfs_path *path, int strict,
+   u64 *out_parent_inum,
+   struct extent_buffer **out_iref_eb,
+   int *out_slot)
+{
+   int ret;
+   struct btrfs_key found_key;
+
+   ret = __inode_info(inum, ioff, BTRFS_INODE_REF_KEY, fs_root, path,
+   &found_key);
+
+   if (!ret) {
+   if (out_slot)
+   *out_slot = path->slots[0];
+   if (out_iref_eb)
+   *out_iref_eb = path->nodes[0];
+   if (out_parent_inum)
+   *out_parent_inum = found_key.offset;
+   }
+
+   btrfs_release_path(path);
+   return ret;
+}
+
+/*
+ * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
+ * of the path are separated by '/' and the path is guaranteed to be
+ * 0-terminated. the path is only given within the current file system.
+ * Therefore, it never starts with a '/'. the caller is responsible to provide
+ * "size" bytes in "dest". the dest buffer will be filled backwards. finally,
+ * the start point of the resulting string is returned. this pointer is within
+ * dest, normally.
+ * in case the path buffer would overflow, the pointer is decremented further
+ * as if output was written to the buffer, though no more output is actually
+ * generated. that way, the caller 
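
The backwards-filled buffer described in the comment above can be illustrated with a small user-space sketch. It is not the function from the patch: `demo_build_path` is a hypothetical name, components are assumed to arrive leaf first (as they would when walking from an inode up towards the root), and a signed offset is returned instead of a pointer so that the overflow case stays well defined in portable C. A negative result of -n means n more bytes would have been needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build "top/dir/file" into the tail of dest from components given leaf
 * first. Returns the offset in dest where the string starts. If dest is
 * too small, nothing more is written and the offset keeps decreasing, so
 * the (negative) result tells the caller how many bytes were missing.
 */
static long demo_build_path(char *dest, size_t size,
			    const char *const *names, size_t count)
{
	long pos;
	size_t i;

	assert(dest && size > 0);
	pos = (long)size - 1;
	dest[pos] = '\0';		/* the result is always 0-terminated */

	for (i = 0; i < count; i++) {
		size_t len = strlen(names[i]);

		if (i > 0) {		/* separator between components */
			pos -= 1;
			if (pos >= 0)
				dest[pos] = '/';
		}
		pos -= (long)len;
		if (pos >= 0)
			memcpy(dest + pos, names[i], len);
	}
	return pos;
}
```

With a 32-byte buffer and the components "file.txt", "dir", "top", the string "top/dir/file.txt" starts at offset 15; with an 8-byte buffer the same input yields -9, since 17 bytes would be required.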

[PATCH v5 6/8] btrfs scrub: use int for mirror_num, not u64

2011-07-21 Thread Jan Schmidt
the rest of the code uses int mirror_num, and so should scrub

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 59caf8f..41a0114 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -65,7 +65,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix);
 struct scrub_page {
u64 flags;  /* extent flags */
u64 generation;
-   u64 mirror_num;
+   int mirror_num;
int have_csum;
u8  csum[BTRFS_CSUM_SIZE];
 };
@@ -776,7 +776,7 @@ nomem:
 }
 
 static int scrub_page(struct scrub_dev *sdev, u64 logical, u64 len,
- u64 physical, u64 flags, u64 gen, u64 mirror_num,
+ u64 physical, u64 flags, u64 gen, int mirror_num,
  u8 *csum, int force)
 {
struct scrub_bio *sbio;
@@ -873,7 +873,7 @@ static int scrub_find_csum(struct scrub_dev *sdev, u64 logical, u64 len,
 
 /* scrub extent tries to collect up to 64 kB for each bio */
 static int scrub_extent(struct scrub_dev *sdev, u64 logical, u64 len,
-   u64 physical, u64 flags, u64 gen, u64 mirror_num)
+   u64 physical, u64 flags, u64 gen, int mirror_num)
 {
int ret;
u8 csum[BTRFS_CSUM_SIZE];
@@ -919,7 +919,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
u64 physical;
u64 logical;
u64 generation;
-   u64 mirror_num;
+   int mirror_num;
 
u64 increment = map->stripe_len;
u64 offset;
-- 
1.7.3.4



[PATCH v5 3/8] btrfs scrub: print paths of corrupted files

2011-07-21 Thread Jan Schmidt
While scrubbing, we may encounter various errors. Previously, a logical
address was printed to the log only. Now, all paths belonging to that
address are resolved and printed separately. That should work for hardlinks
as well as reflinks.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |  169 --
 1 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 35099fa..221fd5c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -17,10 +17,12 @@
  */
 
 #include 
+#include 
 #include "ctree.h"
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "backref.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -100,6 +102,19 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_warning {
+   struct btrfs_path   *path;
+   u64 extent_item_size;
+   char    *scratch_buf;
+   char    *msg_buf;
+   const char  *errstr;
+   sector_t    sector;
+   u64 logical;
+   struct btrfs_device *dev;
+   int msg_bufsize;
+   int scratch_bufsize;
+};
+
 static void scrub_free_csums(struct scrub_dev *sdev)
 {
while (!list_empty(&sdev->csum_list)) {
@@ -195,6 +210,143 @@ nomem:
return ERR_PTR(-ENOMEM);
 }
 
+static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   u64 isize;
+   u32 nlink;
+   int ret;
+   int i;
+   struct extent_buffer *eb;
+   struct btrfs_inode_item *inode_item;
+   struct scrub_warning *swarn = ctx;
+   struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_root *local_root;
+   struct btrfs_key root_key;
+
+   root_key.objectid = root;
+   root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root_key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fs_info, &root_key);
+   if (IS_ERR(local_root)) {
+   ret = PTR_ERR(local_root);
+   goto err;
+   }
+
+   ret = inode_item_info(inum, 0, local_root, swarn->path);
+   if (ret) {
+   btrfs_release_path(swarn->path);
+   goto err;
+   }
+
+   eb = swarn->path->nodes[0];
+   inode_item = btrfs_item_ptr(eb, swarn->path->slots[0],
+   struct btrfs_inode_item);
+   isize = btrfs_inode_size(eb, inode_item);
+   nlink = btrfs_inode_nlink(eb, inode_item);
+   btrfs_release_path(swarn->path);
+
+   ipath = init_ipath(4096, local_root, swarn->path);
+   ret = paths_from_inode(inum, ipath);
+
+   if (ret < 0)
+   goto err;
+
+   /*
+* we deliberately ignore the bit ipath might have been too small to
+* hold all of the paths here
+*/
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i)
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu, "
+   "length %llu, links %u (path: %s)\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset,
+   min(isize - offset, (u64)PAGE_SIZE), nlink,
+   ipath->fspath->str[i]);
+
+   free_ipath(ipath);
+   return 0;
+
+err:
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu: path "
+   "resolving failed with ret=%d\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset, ret);
+
+   free_ipath(ipath);
+   return 0;
+}
+
+static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio,
+   int ix)
+{
+   struct btrfs_device *dev = sbio->sdev->dev;
+   struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
+   struct btrfs_path *path;
+   struct btrfs_key found_key;
+   struct extent_buffer *eb;
+   struct btrfs_extent_item *ei;
+   struct scrub_warning swarn;
+   u32 item_size;
+   int ret;
+   u64 ref_root;
+   u8 ref_level;
+   unsigned long ptr = 0;
+   const int bufsize = 4096;
+   u64 extent_offset;
+
+   path = btrfs_alloc_path();
+
+   swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+   swarn.logical = sbio->logical + ix * PAGE_SIZE;
+   swarn.errstr = errstr;
+   swarn.dev = dev;
+   swarn.msg_bufsize = bufsize;
+   swarn.scratch_bufsize = bufsize;
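
The sector and logical address computed for `swarn` above are plain offset arithmetic: page `ix` of the bio lies `ix * PAGE_SIZE` bytes past the bio's start, and `>> 9` converts a byte address into 512-byte sectors. A minimal sketch, assuming 4 KiB pages (the `demo_*` names and `DEMO_PAGE_SIZE` are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* 512-byte sector of page number ix within a bio starting at bio_physical. */
static uint64_t demo_page_sector(uint64_t bio_physical, unsigned int ix)
{
	/* the shift below assumes pages are a whole number of sectors */
	assert(DEMO_PAGE_SIZE % 512 == 0);
	return (bio_physical + (uint64_t)ix * DEMO_PAGE_SIZE) >> 9;
}

/* Logical byte address of page number ix within the same bio. */
static uint64_t demo_page_logical(uint64_t bio_logical, unsigned int ix)
{
	return bio_logical + (uint64_t)ix * DEMO_PAGE_SIZE;
}
```

For example, a bio at physical byte 1048576 puts its fourth page (ix = 3) at sector (1048576 + 12288) / 512 = 2072.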

[PATCH v5 7/8] btrfs scrub: add fixup code for errors on nodatasum files

2011-07-21 Thread Jan Schmidt
This removes a FIXME comment and introduces the first part of nodatasum
fixup: It gets the corresponding inode for a logical address and triggers a
regular readpage for the corrupted sector.

Once we have on-the-fly error correction our error will be automatically
corrected. The correction code is expected to clear the newly introduced
EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead
of "uncorrectable" eventually.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/extent_io.h |1 +
 fs/btrfs/scrub.c |  188 --
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 22bf366..2734fd9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -17,6 +17,7 @@
 #define EXTENT_NODATASUM (1 << 10)
 #define EXTENT_DO_ACCOUNTING (1 << 11)
 #define EXTENT_FIRST_DELALLOC (1 << 12)
+#define EXTENT_DAMAGED (1 << 13)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 41a0114..db09f01 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -22,6 +22,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "transaction.h"
 #include "backref.h"
 
 /*
@@ -89,6 +90,7 @@ struct scrub_dev {
int first_free;
int curr;
atomic_t    in_flight;
+   atomic_t    fixup_cnt;
spinlock_t  list_lock;
wait_queue_head_t   list_wait;
u16 csum_size;
@@ -102,6 +104,14 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_fixup_nodatasum {
+   struct scrub_dev    *sdev;
+   u64 logical;
+   struct btrfs_root   *root;
+   struct btrfs_work   work;
+   int mirror_num;
+};
+
 struct scrub_warning {
struct btrfs_path   *path;
u64 extent_item_size;
@@ -190,12 +200,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev)
 
if (i != SCRUB_BIOS_PER_DEV-1)
sdev->bios[i]->next_free = i + 1;
-else
+   else
sdev->bios[i]->next_free = -1;
}
sdev->first_free = 0;
sdev->curr = -1;
atomic_set(&sdev->in_flight, 0);
+   atomic_set(&sdev->fixup_cnt, 0);
atomic_set(&sdev->cancel_req, 0);
sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
INIT_LIST_HEAD(&sdev->csum_list);
@@ -347,6 +358,151 @@ out:
kfree(swarn.msg_buf);
 }
 
+static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   struct page *page;
+   unsigned long index;
+   struct scrub_fixup_nodatasum *fixup = ctx;
+   int ret;
+   int corrected;
+   struct btrfs_key key;
+   struct inode *inode;
+   u64 end = offset + PAGE_SIZE - 1;
+   struct btrfs_root *local_root;
+
+   key.objectid = root;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fixup->root->fs_info, &key);
+   if (IS_ERR(local_root))
+   return PTR_ERR(local_root);
+
+   key.type = BTRFS_INODE_ITEM_KEY;
+   key.objectid = inum;
+   key.offset = 0;
+   inode = btrfs_iget(fixup->root->fs_info->sb, &key, local_root, NULL);
+   if (IS_ERR(inode))
+   return PTR_ERR(inode);
+
+   ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
+
+   /* set_extent_bit should either succeed or give proper error */
+   WARN_ON(ret > 0);
+   if (ret)
+   return ret < 0 ? ret : -EFAULT;
+
+   index = offset >> PAGE_CACHE_SHIFT;
+
+   page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+   if (!page)
+   return -ENOMEM;
+
+   ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
+   btrfs_get_extent, fixup->mirror_num);
+   wait_on_page_locked(page);
+   corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, NULL);
+
+   if (corrected)
+   WARN_ON(!PageUptodate(page));
+   else
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
+
+   put_page(page);
+   iput(inode);
+
+   if (ret < 0)
+   return ret;
+
+   if (ret == 0 && corrected) {
+   /*
+* we only need to call readpage for one of the inodes belonging
+* to this extent. so make iterate_extent_inodes stop
+
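
The flag protocol used by scrub_fixup_readpage above (mark the range EXTENT_DAMAGED, trigger a read, then check whether the read path cleared the flag) can be sketched in isolation. This is a user-space illustration only: `demo_range` and the two read callbacks are hypothetical, and the repairing callback stands in for the on-the-fly correction code that the commit message says does not exist yet.

```c
#include <assert.h>
#include <stddef.h>

#define EXTENT_DAMAGED (1u << 13)	/* same bit as in the extent_io.h hunk */

/* Hypothetical stand-in for the state bits of one page-sized range. */
struct demo_range {
	unsigned int bits;
};

typedef void (*demo_read_fn)(struct demo_range *range);

/* A read path with error correction clears the flag once it has repaired. */
static void demo_read_with_repair(struct demo_range *range)
{
	range->bits &= ~EXTENT_DAMAGED;
}

/* A read path without correction leaves the flag untouched. */
static void demo_read_plain(struct demo_range *range)
{
	(void)range;
}

/* Returns 1 if the read corrected the range, 0 if it did not. */
static int demo_fixup_range(struct demo_range *range, demo_read_fn read_page)
{
	int corrected;

	assert(range && read_page);
	range->bits |= EXTENT_DAMAGED;		/* ask the read path to repair */
	read_page(range);
	corrected = !(range->bits & EXTENT_DAMAGED);
	if (!corrected)
		range->bits &= ~EXTENT_DAMAGED;	/* do not leak the flag */
	return corrected;
}
```

Either way the flag is gone when the function returns, so a later scrub pass starts from a clean state; only the return value distinguishes "corrected" from "uncorrectable".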

[PATCH v5 4/8] btrfs scrub: bugfix: mirror_num off by one

2011-07-21 Thread Jan Schmidt
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
did not use mirror_num for anything important and that error went unnoticed.
The nodatasum fixup patch of this set depends on a correct mirror_num.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 221fd5c..59caf8f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -452,7 +452,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 * first find a good copy
 */
for (i = 0; i < multi->num_stripes; ++i) {
-   if (i == sbio->spag[ix].mirror_num)
+   if (i + 1 == sbio->spag[ix].mirror_num)
continue;
 
if (scrub_fixup_io(READ, multi->stripes[i].dev->bdev,
@@ -930,21 +930,21 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
offset = map->stripe_len * num;
increment = map->stripe_len * map->num_stripes;
-   mirror_num = 0;
+   mirror_num = 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
int factor = map->num_stripes / map->sub_stripes;
offset = map->stripe_len * (num / map->sub_stripes);
increment = map->stripe_len * factor;
-   mirror_num = num % map->sub_stripes;
+   mirror_num = num % map->sub_stripes + 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
increment = map->stripe_len;
-   mirror_num = num % map->num_stripes;
+   mirror_num = num % map->num_stripes + 1;
} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
increment = map->stripe_len;
-   mirror_num = num % map->num_stripes;
+   mirror_num = num % map->num_stripes + 1;
} else {
increment = map->stripe_len;
-   mirror_num = 0;
+   mirror_num = 1;
}
 
path = btrfs_alloc_path();
-- 
1.7.3.4
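
The convention this patch establishes is that mirror_num is 1-based while stripe indexes stay 0-based, which is why the comparison in scrub_fixup becomes `i + 1 == mirror_num`. A minimal user-space sketch of both halves follows; the enum and the `demo_*` function names are hypothetical and only mirror the branches shown in the hunk.

```c
#include <assert.h>

/* Hypothetical profile tags standing in for the BTRFS_BLOCK_GROUP_* bits. */
enum demo_profile {
	DEMO_SINGLE,
	DEMO_RAID0,
	DEMO_RAID1,
	DEMO_RAID10,
	DEMO_DUP
};

/* 1-based mirror number of stripe index num, following the patched code. */
static int demo_mirror_num(enum demo_profile profile, int num,
			   int num_stripes, int sub_stripes)
{
	switch (profile) {
	case DEMO_RAID10:
		assert(sub_stripes > 0);
		return num % sub_stripes + 1;
	case DEMO_RAID1:
	case DEMO_DUP:
		assert(num_stripes > 0);
		return num % num_stripes + 1;
	default:
		return 1;	/* only one copy exists */
	}
}

/* First 0-based stripe index that is not the bad mirror, or -1 if none. */
static int demo_first_other_copy(int num_stripes, int bad_mirror_num)
{
	int i;

	for (i = 0; i < num_stripes; ++i) {
		if (i + 1 == bad_mirror_num)
			continue;
		return i;
	}
	return -1;
}
```

With two RAID1 stripes, stripe index 1 is mirror 2, and a failure on mirror 2 makes the fixup loop read stripe index 0 instead.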



[PATCH v5 8/8] btrfs: new ioctls to do logical->inode and inode->path resolving

2011-07-21 Thread Jan Schmidt
these ioctls make use of the new functions initially added for scrub. they
return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/ioctl.c |  134 ++
 fs/btrfs/ioctl.h |   19 
 2 files changed, 153 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..5299b40 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -51,6 +51,7 @@
 #include "volumes.h"
 #include "locking.h"
 #include "inode-map.h"
+#include "backref.h"
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2836,6 +2837,135 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
return ret;
 }
 
+static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
+{
+   int ret = 0;
+   int i;
+   unsigned long rel_ptr;
+   int size;
+   struct btrfs_ioctl_ino_path_args *ipa;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_path *path;
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   ipa = memdup_user(arg, sizeof(*ipa));
+   if (IS_ERR(ipa)) {
+   ret = PTR_ERR(ipa);
+   ipa = NULL;
+   goto out;
+   }
+
+   size = min(ipa->size, 4096);
+   ipath = init_ipath(size, root, path);
+   if (IS_ERR(ipath)) {
+   ret = PTR_ERR(ipath);
+   ipath = NULL;
+   goto out;
+   }
+
+   ret = paths_from_inode(ipa->inum, ipath);
+   if (ret < 0)
+   goto out;
+
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
+   rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
+   ipath->fspath->str[i] = (void *)rel_ptr;
+   }
+
+   ret = copy_to_user(ipa->fspath, ipath->fspath, size);
+   if (ret) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+out:
+   btrfs_free_path(path);
+   free_ipath(ipath);
+   kfree(ipa);
+
+   return ret;
+}
+
+static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   struct btrfs_data_container *inodes = ctx;
+
+   inodes->size -= 3 * sizeof(u64);
+   if (inodes->size > 0) {
+   inodes->val[inodes->elem_cnt] = inum;
+   inodes->val[inodes->elem_cnt + 1] = offset;
+   inodes->val[inodes->elem_cnt + 2] = root;
+   inodes->elem_cnt += 3;
+   } else {
+   inodes->elem_missed += 3;
+   }
+
+   return 0;
+}
+
+static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
+   void __user *arg)
+{
+   int ret = 0;
+   int size;
+   u64 extent_offset;
+   struct btrfs_ioctl_logical_ino_args *loi;
+   struct btrfs_data_container *inodes = NULL;
+   struct btrfs_path *path = NULL;
+   struct btrfs_key key;
+
+   loi = memdup_user(arg, sizeof(*loi));
+   if (IS_ERR(loi)) {
+   ret = PTR_ERR(loi);
+   loi = NULL;
+   goto out;
+   }
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   size = min(loi->size, 4096);
+   inodes = init_data_container(size);
+   if (IS_ERR(inodes)) {
+   ret = PTR_ERR(inodes);
+   inodes = NULL;
+   goto out;
+   }
+
+   ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
+
+   if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+   ret = -ENOENT;
+   if (ret < 0)
+   goto out;
+
+   extent_offset = loi->logical - key.objectid;
+   ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
+   extent_offset, build_ino_list, inodes);
+
+   if (ret < 0)
+   goto out;
+
+   ret = copy_to_user(loi->inodes, inodes, size);
+   if (ret)
+   ret = -EFAULT;
+
+out:
+   btrfs_free_path(path);
+   kfree(inodes);
+   kfree(loi);
+
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2893,6 +3023,10 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_tree_search(file, argp);
case BTRFS_IOC_INO_LOOKUP:
return btrfs_ioctl_ino_lookup(file, argp);
+   case BTRFS_IOC_INO_PATHS:
+   return btrfs_ioctl_ino_to_path(root, argp);
+   case BTRFS_IOC_LOGICAL_INO:
+   return btrfs_ioctl_logical_to_ino(root, argp);
case BTRFS_IOC_SPACE_INFO:
return btrfs_ioctl_space_info(root, argp);
case BTRFS_IOC_SYNC:
diff --git a/fs/btrfs/ioctl.h b
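
The bookkeeping done by build_ino_list above (store each result as an inum/offset/root triple while space remains, otherwise only count what was missed) might be sketched as below. This is a simplified variant rather than the patch's code: it checks free slots directly instead of decrementing a byte budget, and `demo_ino_list`, `demo_add_triple` and `DEMO_CAPACITY` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAPACITY 6		/* assumption: room for two triples */

/* Hypothetical fixed-size stand-in for struct btrfs_data_container. */
struct demo_ino_list {
	size_t elem_cnt;	/* u64 slots filled so far */
	size_t elem_missed;	/* u64 slots that did not fit */
	uint64_t val[DEMO_CAPACITY];
};

/* Append one (inum, offset, root) triple, or record that it was dropped. */
static void demo_add_triple(struct demo_ino_list *list, uint64_t inum,
			    uint64_t offset, uint64_t root)
{
	assert(list);
	if (list->elem_cnt + 3 <= DEMO_CAPACITY) {
		list->val[list->elem_cnt] = inum;
		list->val[list->elem_cnt + 1] = offset;
		list->val[list->elem_cnt + 2] = root;
		list->elem_cnt += 3;
	} else {
		list->elem_missed += 3;
	}
}
```

A caller that sees a non-zero `elem_missed` knows the result is incomplete and can retry with a larger buffer.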

[PATCH v5 5/8] btrfs: add mirror_num to extent_read_full_page

2011-07-21 Thread Jan Schmidt
Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +++---
 fs/btrfs/extent_io.h |2 +-
 fs/btrfs/inode.c |2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ac8db5d..b898319 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btree_get_extent);
+   return extent_read_full_page(tree, page, btree_get_extent, 0);
 }
 
 static int btree_releasepage(struct page *page, gfp_t gfp_flags)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..b78f665 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 }
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-   get_extent_t *get_extent)
+   get_extent_t *get_extent, int mirror_num)
 {
struct bio *bio = NULL;
unsigned long bio_flags = 0;
int ret;
 
-   ret = __extent_read_full_page(tree, page, get_extent, &bio, 0,
+   ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num,
  &bio_flags);
if (bio)
-   ret = submit_one_bio(READ, bio, 0, bio_flags);
+   ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a11a92e..22bf366 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64 start, u64 end,
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
gfp_t mask);
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
- get_extent_t *get_extent);
+ get_extent_t *get_extent, int mirror_num);
 int __init extent_io_init(void);
 void extent_io_exit(void);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4a13730..730ee3d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6250,7 +6250,7 @@ int btrfs_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btrfs_get_extent);
+   return extent_read_full_page(tree, page, btrfs_get_extent, 0);
 }
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
-- 
1.7.3.4



[PATCH v5 0/8] Btrfs scrub: print path to corrupted files and trigger nodatasum fixup

2011-07-21 Thread Jan Schmidt
While testing raid-auto-repair patches I'm going to send out later, I just found
the very last bug in my current scrub patch series:

Changelog v4->v5:
- fixed a deadlock when fixup is taking longer while scrub is about to end

Original message follows:

This patch set introduces two new features for scrub. They share the backref
iteration code which is the reason they made it into the same patch set.

The first feature adds printk statements in case scrub finds an error which list
all affected files. You will need patch 1, 2 and 3 for that.

The second feature adds the trigger which will eventually enable us to correct
i/o errors when the affected extent does not have a checksum (nodatasum). You
will need patch 1, 4, 5 and 6 for that.

I tried to apply all patches to the current cmason/for-linus branch and to
Arne's current for-chris branch. They do apply with no errors (some offsets
possible).

The new ioctl()s can be tested from usermode by applying the patch series
[PATCH v2 0/3] Btrfs-progs: add the first "inspect-internal" commands
from this mailing list to the user land tools.

Please review.

Next I'm starting to make up my mind how to implement on-the-fly error
correction correctly. This will enable us to rewrite good data whenever we
encounter a bad copy. I have some preliminary patches already, the stress in the
first sentence is on "correctly". The second feature mentioned in this patch
series will then automatically use that code, too.

Changelog v1->v2:
- Various cleanup, sensible error codes as suggested by David Sterba

Changelog v2->v3:
- evaluation and iteration of shared refs
- support for in-tree refs (v2 iterated inline refs only)
- never call an iterator function without releasing the path
- iterate_irefs now returns -ENOENT in case no refs are found
- some stupid bugs removed where release_path was called too early
- ioctls added to provide new functions to user mode
- bugfixes for cases where search_slot found the very end of a leaf
- bugfix: use right fs root for readpage instead of fs_root->fs_info
- based on current cmason/for-linus

Changelog v3->v4:
- fixed a regression with mirror_num that could prevent error correction
- based on current cmason/for-linus

Please try it and report errors (or confirm there are none, of course). I can
provide a place to pull from if anyone likes.

-Jan

Jan Schmidt (8):
  btrfs: added helper functions to iterate backrefs
  btrfs scrub: added unverified_errors
  btrfs scrub: print paths of corrupted files
  btrfs scrub: bugfix: mirror_num off by one
  btrfs: add mirror_num to extent_read_full_page
  btrfs scrub: use int for mirror_num, not u64
  btrfs scrub: add fixup code for errors on nodatasum files
  btrfs: new ioctls to do logical->inode and inode->path resolving

 fs/btrfs/Makefile|3 +-
 fs/btrfs/backref.c   |  748 ++
 fs/btrfs/backref.h   |   62 +
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +-
 fs/btrfs/extent_io.h |3 +-
 fs/btrfs/inode.c |2 +-
 fs/btrfs/ioctl.c |  134 +
 fs/btrfs/ioctl.h |   29 ++
 fs/btrfs/scrub.c |  414 +---
 10 files changed, 1363 insertions(+), 40 deletions(-)
 create mode 100644 fs/btrfs/backref.c
 create mode 100644 fs/btrfs/backref.h

-- 
1.7.3.4



[PATCH v5 2/8] btrfs scrub: added unverified_errors

2011-07-21 Thread Jan Schmidt
In normal operation, scrub is reading data sequentially in large portions.
In case of an i/o error, we try to find the corrupted area(s) by issuing
page sized read requests. With this commit we increment the
unverified_errors counter if all of the small size requests succeed.

Userland patches carrying such conspicuous events to the administrator should
already be around.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a8d03d5..35099fa 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -201,18 +201,25 @@ nomem:
  * recheck_error gets called for every page in the bio, even though only
  * one may be bad
  */
-static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
+static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
 {
+   struct scrub_dev *sdev = sbio->sdev;
+   u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+
if (sbio->err) {
-   if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
-  (sbio->physical + ix * PAGE_SIZE) >> 9,
+   if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
if (scrub_fixup_check(sbio, ix) == 0)
-   return;
+   return 0;
}
}
 
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.read_errors;
+   spin_unlock(&sdev->stat_lock);
+
scrub_fixup(sbio, ix);
+   return 1;
 }
 
 static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
@@ -382,8 +389,14 @@ static void scrub_checksum(struct btrfs_work *work)
int ret;
 
if (sbio->err) {
+   ret = 0;
for (i = 0; i < sbio->count; ++i)
-   scrub_recheck_error(sbio, i);
+   ret |= scrub_recheck_error(sbio, i);
+   if (!ret) {
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.unverified_errors;
+   spin_unlock(&sdev->stat_lock);
+   }
 
sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
@@ -396,10 +409,6 @@ static void scrub_checksum(struct btrfs_work *work)
bi->bv_offset = 0;
bi->bv_len = PAGE_SIZE;
}
-
-   spin_lock(&sdev->stat_lock);
-   ++sdev->stat.read_errors;
-   spin_unlock(&sdev->stat_lock);
goto out;
}
for (i = 0; i < sbio->count; ++i) {
@@ -420,8 +429,14 @@ static void scrub_checksum(struct btrfs_work *work)
WARN_ON(1);
}
kunmap_atomic(buffer, KM_USER0);
-   if (ret)
-   scrub_recheck_error(sbio, i);
+   if (ret) {
+   ret = scrub_recheck_error(sbio, i);
+   if (!ret) {
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.unverified_errors;
+   spin_unlock(&sdev->stat_lock);
+   }
+   }
}
 
 out:
-- 
1.7.3.4



[BUG] Chunk allocation fails when the system meta-data block group is full

2011-07-21 Thread Miao Xie
Hi, everyone

While reading the chunk allocation code, I found a bug:

  If we allocate lots of meta-data chunks or data chunks and fill up
  the system meta-data block group, then we can never allocate another
  chunk, even though there is lots of free disk space.

This happens because Btrfs does not allocate a new system meta-data chunk when
the old block group is full, so we have no system meta-data space left in
which to store the meta-data of the new chunk.
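
To make the failure mode concrete, here is a toy userspace model. It is only a sketch and not kernel code: struct toy_fs, toy_alloc_chunk(), toy_fill() and all sizes are hypothetical names and numbers chosen for illustration. The one assumption it encodes is the one described above: every new chunk needs one record in the system meta-data space, and that space never grows.

```c
#include <stdbool.h>

/* Toy model, not kernel code: all names and sizes here are hypothetical. */
struct toy_fs {
	unsigned long free_disk;     /* unallocated device space, in MiB */
	unsigned int sys_items_free; /* chunk records that still fit in the
				      * system meta-data block group */
};

/* Mimics the reported behaviour: the system block group is never grown. */
static bool toy_alloc_chunk(struct toy_fs *fs, unsigned long size)
{
	if (fs->free_disk < size)
		return false;	/* genuinely out of disk space */
	if (fs->sys_items_free == 0)
		return false;	/* nowhere to record the new chunk */
	fs->free_disk -= size;
	fs->sys_items_free--;
	return true;
}

/* Allocates chunks until the first failure; returns how many succeeded. */
static unsigned int toy_fill(struct toy_fs *fs, unsigned long size)
{
	unsigned int n = 0;

	while (toy_alloc_chunk(fs, size))
		n++;
	return n;
}
```

Starting from free_disk = 1024 and sys_items_free = 2, toy_fill() with 128 MiB chunks stops after two allocations with 768 MiB still unallocated. That is the situation described above: the allocation fails although the disk is far from full.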

This bug is hard to trigger in normal use, because we need lots of disk
space to allocate enough new chunks to fill the system meta-data block
group. So I used a trick to trigger it:
1. Modify the Btrfs source to exclude most of the free space of the system
   meta-data block group and to reduce the max size of a data chunk. This
   way we can allocate lots of chunks and fill the system meta-data block
   group easily. (See the attached patch)
2. Create a new Btrfs filesystem. (Data profile: single)
3. Mount the new filesystem.
4. Create a large file.
(Oops happened)
[ cut here ]
kernel BUG at fs/btrfs/volumes.c:2602!
[SNIP]
Call Trace:
 [] btrfs_alloc_chunk+0x71/0x84 [btrfs]
 [] do_chunk_alloc+0x28e/0x2f3 [btrfs]
 [] btrfs_reserve_extent+0xfb/0x1c2 [btrfs]
 [] cow_file_range+0x1c0/0x32b [btrfs]
 [] run_delalloc_range+0xb7/0x33f [btrfs]
 [] __extent_writepage+0x1c1/0x5d0 [btrfs]
 [] ? clear_extent_buffer_uptodate+0x85/0x85 [btrfs]
 [] extent_write_cache_pages.clone.0+0x176/0x2ad [btrfs]
 [] extent_writepages+0x3e/0x53 [btrfs]
 [] ? uncompress_inline+0x122/0x122 [btrfs]
 [] btrfs_writepages+0x22/0x24 [btrfs]
 [] do_writepages+0x1c/0x28
 [] writeback_single_inode+0xc2/0x1c3
 [] writeback_sb_inodes+0xcc/0x15a
 [] writeback_inodes_wb+0x10a/0x11c
 [] balance_dirty_pages_ratelimited_nr+0x2f9/0x3fd
 [] __btrfs_buffered_write+0x298/0x315 [btrfs]
 [] ? file_update_time+0xf2/0x10c
 [] btrfs_file_aio_write+0x3c7/0x47e [btrfs]
 [] do_sync_write+0xc6/0x103
 [] ? security_file_permission+0x29/0x2e
 [] vfs_write+0xa9/0x105
 [] sys_write+0x45/0x6c
 [] system_call_fastpath+0x16/0x1b
[SNIP] 
RIP  [] __finish_chunk_alloc+0x176/0x1f8 [btrfs]
 RSP 
---[ end trace 5a55cd7f2763cc4c ]---

If my analysis is right and this bug actually exists, I think we can fix it by
splitting chunk allocation into two steps:

  1. do the chunk allocation and update the in-memory information
  2. update the meta-data and the system meta-data according to all the new
     chunks allocated in the 1st step.

We also split the 1st step into 3 sub-steps:

  1. If we want to allocate a system meta-data chunk, or the old system
     meta-data block group does not have enough free space even though we
     did not ask for a system meta-data chunk, we allocate a new system
     meta-data chunk and update the in-memory system meta-data space
     information.
  2. If we want to allocate a meta-data chunk, or the old meta-data block
     group does not have enough free space even though we did not ask for a
     meta-data chunk, we allocate a new meta-data chunk and update the
     in-memory meta-data space information.
  3. If we want to allocate a data chunk, we allocate a new data chunk.
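
The ordering proposed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not a patch: none of the toy_* names exist in Btrfs, TOY_ITEMS_PER_CHUNK and TOY_LOW_WATER are made-up numbers, and the cost of recording a chunk is simplified to one system item plus one meta-data item.

```c
#include <stdbool.h>

/* Hypothetical model of the proposal; none of these names exist in Btrfs. */
enum toy_type { TOY_SYSTEM, TOY_METADATA, TOY_DATA, TOY_NR_TYPES };

struct toy_space {
	unsigned int free_items[TOY_NR_TYPES]; /* room left per type */
	unsigned int pending[TOY_NR_TYPES];    /* chunks known in memory only */
};

#define TOY_ITEMS_PER_CHUNK 16	/* assumed capacity a new chunk adds */
#define TOY_LOW_WATER 2		/* assumed "not enough free space" limit */

/* Allocate a chunk and update only the in-memory space information. */
static void toy_alloc_in_memory(struct toy_space *s, enum toy_type t)
{
	s->pending[t]++;
	s->free_items[t] += TOY_ITEMS_PER_CHUNK;
}

/* Step 1: sub-steps in system -> meta-data -> data order, memory only. */
static void toy_step1(struct toy_space *s, enum toy_type wanted)
{
	if (wanted == TOY_SYSTEM ||
	    s->free_items[TOY_SYSTEM] < TOY_LOW_WATER)
		toy_alloc_in_memory(s, TOY_SYSTEM);
	if (wanted == TOY_METADATA ||
	    s->free_items[TOY_METADATA] < TOY_LOW_WATER)
		toy_alloc_in_memory(s, TOY_METADATA);
	if (wanted == TOY_DATA)
		toy_alloc_in_memory(s, TOY_DATA);
}

/*
 * Step 2: record every pending chunk. Simplification: each record costs
 * one system item and one meta-data item.
 */
static bool toy_step2(struct toy_space *s)
{
	for (int t = 0; t < TOY_NR_TYPES; t++) {
		while (s->pending[t] > 0) {
			if (s->free_items[TOY_SYSTEM] == 0 ||
			    s->free_items[TOY_METADATA] == 0)
				return false;
			s->free_items[TOY_SYSTEM]--;
			s->free_items[TOY_METADATA]--;
			s->pending[t]--;
		}
	}
	return true;
}
```

In this model, a data chunk request made while the system space is empty first gains a new system chunk in step 1, so step 2 has room to record both chunks. Whether the real code can defer the on-disk updates this cleanly is exactly the open question.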

Does anyone have another good idea for fixing it?

Thanks
Miao

(The patch that makes the bug easy to trigger)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1860fa8..8d4ab87 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7012,6 +7012,12 @@ int btrfs_read_block_groups(struct btrfs_root *root)
 */
exclude_super_stripes(root, cache);
 
+   if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
+   ret = add_excluded_extent(root, cache->key.objectid,
+ cache->key.offset - 4096);
+   BUG_ON(ret);
+   }
+
/*
 * check for two cases, either we are full, and therefore
 * don't need to bother with the caching work since we won't
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19450bc..96c0c5e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2357,8 +2357,10 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
}
 
if (type & BTRFS_BLOCK_GROUP_DATA) {
-   max_stripe_size = 1024 * 1024 * 1024;
-   max_chunk_size = 10 * max_stripe_size;
+// max_stripe_size = 1024 * 1024 * 1024;
+// max_chunk_size = 10 * max_stripe_size;
+   max_stripe_size = 64 * 1024 * 1024;
+   max_chunk_size = 2 * max_stripe_size;
} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
max_stripe_size = 256 * 1024 * 1024;
max_chunk_size = max_stripe_size;

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org