Re: please review snapshot corruption path with delayed metadata insertion

2011-07-07 Thread Tsutomu Itoh
Hi, Chris,

(2011/07/08 5:26), Chris Mason wrote:
> Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
>> Hi, Miao,
>>
>> (2011/06/30 15:32), Miao Xie wrote:
>>> Hi, Itoh-san
>>>
>>> Could you test the following patch to check whether it can fix the bug or 
>>> not?
>>> I have tested it on my x86_64 machine by your test script for two days, it 
>>> worked well.
>>
>> I ran my test script about a day, I was not able to reproduce this BUG.
> 
> Can you please try this patch with the inode_cache option (in addition
> to Miao's code).

Unfortunately, I encountered the following panic.

=

btrfs: relocating block group 17746100224 flags 20
btrfs: relocating block group 12377391104 flags 9
btrfs: found 4181 extents
[ cut here ]
kernel BUG at fs/btrfs/relocation.c:2502!
invalid opcode:  [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: btrfs zlib_deflate crc32c libcrc32c autofs4 sunrpc 8021q 
garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ext3 jbd 
dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg 
pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug 
i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom 
megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last 
unloaded: microcode]

Pid: 26214, comm: btrfs Not tainted 2.6.39btrfs-test5+ #2 FUJITSU-SV  
PRIMERGY/D2399
RIP: 0010:[]  [] do_relocation+0x562/0x590 
[btrfs]
RSP: 0018:8801622519a8  EFLAGS: 00010202
RAX: 0001 RBX: 8800d2754140 RCX: 0001
RDX:  RSI: 8800 RDI: 
RBP: 880162251a78 R08:  R09: 02e9
R10:  R11: 0026 R12: 880161f2fb40
R13: 8800cd81eac0 R14: 880080038000 R15: 
FS:  7f4081d05740() GS:88019fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0033cfea6a60 CR3: 00015d345000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 26214, threadinfo 88016225, task 880161c3eab0)
Stack:
 880191f006d0 8800cd81eac0 880191f005b0 8800cd81eb00
 88016225b000 880079e0 000162251a48 88016225
 880162251a78 880193a26930 000100251a78 880193a26930
Call Trace:
 [] ? block_rsv_add_bytes+0x2b/0x70 [btrfs]
 [] relocate_tree_blocks+0x62b/0x6e0 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] ? add_data_references+0x263/0x280 [btrfs]
 [] relocate_block_group+0x272/0x620 [btrfs]
 [] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] btrfs_balance+0x21a/0x2a0 [btrfs]
 [] ? path_openat+0x101/0x3d0
 [] btrfs_ioctl+0x798/0xd20 [btrfs]
 [] ? handle_mm_fault+0x148/0x270
 [] ? do_page_fault+0x1d8/0x4b0
 [] do_vfs_ioctl+0x9a/0x540
 [] sys_ioctl+0xa1/0xb0
 [] system_call_fastpath+0x16/0x1b
Code: 0f 0b 0f 1f 80 00 00 00 00 eb f7 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 
00 eb f6 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 <0f> 0b eb fe 48 83 7a 
68 00 0f 1f 44 00 00 0f 84 d2 fa ff ff 0f
RIP  [] do_relocation+0x562/0x590 [btrfs]
 RSP 

(gdb) l *do_relocation+0x562
0x6f922 is in do_relocation (fs/btrfs/relocation.c:2502).
2497            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
2498            if (ret < 0) {
2499                    err = ret;
2500                    break;
2501            }
2502            BUG_ON(ret > 0);
2503
2504            if (!upper->eb) {
2505                    upper->eb = path->nodes[upper->level];
2506                    path->nodes[upper->level] = NULL;
(gdb)

> 
> commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> Author: Chris Mason 
> Date:   Thu Jul 7 15:53:12 2011 -0400
> 
> Btrfs: write out free inode cache before taking snapshots
> 
> The btrfs snapshotting code requires that once a root has been
> snapshotted, we don't change it during a commit
> 
> But the free inode cache was changing the roots when it wrote the cache,
> which led to corruptions.
> 
> This fixes things by making sure we write the cache while we are taking
> the snapshot, and that we don't write it again later.
> 
> Signed-off-by: Chris Mason 
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index bf0d615..d594cf7 100644
> -

Re: please review snapshot corruption path with delayed metadata insertion

2011-07-07 Thread Chris Mason
Excerpts from Tsutomu Itoh's message of 2011-07-07 19:51:09 -0400:
> Hi, Chris,
> 
> (2011/07/08 5:26), Chris Mason wrote:
> > Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
> >> Hi, Miao,
> >>
> >> (2011/06/30 15:32), Miao Xie wrote:
> >>> Hi, Itoh-san
> >>>
> >>> Could you test the following patch to check whether it can fix the bug or 
> >>> not?
> >>> I have tested it on my x86_64 machine by your test script for two days, 
> >>> it worked well.
> >>
> >> I ran my test script about a day, I was not able to reproduce this BUG.
> > 
> > Can you please try this patch with the inode_cache option (in addition
> > to Miao's code).
> 
> For clarification:
> 
> Do I only have to apply this patch to 'btrfs-unstable + (current)for-linus',
> or are other patches also necessary?
> 

Hi, sorry that I wasn't clear.  You can apply it to the current
for-linus branch, which has Miao's fix to keep from doing delayed
metadata updates on the relocation inode.

-chris

> Thanks,
> Tsutomu
> 
> > 
> > commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> > Author: Chris Mason 
> > Date:   Thu Jul 7 15:53:12 2011 -0400
> > 
> > Btrfs: write out free inode cache before taking snapshots
> > 
> > The btrfs snapshotting code requires that once a root has been
> > snapshotted, we don't change it during a commit
> > 
> > But the free inode cache was changing the roots when it wrote the cache,
> > which led to corruptions.
> > 
> > This fixes things by making sure we write the cache while we are taking
> > the snapshot, and that we don't write it again later.
> > 
> > Signed-off-by: Chris Mason 
> > 
> > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> > index bf0d615..d594cf7 100644
> > --- a/fs/btrfs/free-space-cache.c
> > +++ b/fs/btrfs/free-space-cache.c
> > @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct 
> > btrfs_free_space_ctl *ctl,
> >  info->bytes = bytes;
> >  
> >  spin_lock(&ctl->tree_lock);
> > +ctl->dirty = 1;
> >  
> >  if (try_merge_free_space(ctl, info, true))
> >  goto link;
> > @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct 
> > btrfs_block_group_cache *block_group,
> >  int ret = 0;
> >  
> >  spin_lock(&ctl->tree_lock);
> > +ctl->dirty = 1;
> >  
> >  again:
> >  info = tree_search_offset(ctl, offset, 0, 0);
> > @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root 
> > *fs_root)
> >  if (entry->bytes == 0)
> >  free_bitmap(ctl, entry);
> >  }
> > +ctl->dirty = 1;
> >  out:
> >  spin_unlock(&ctl->tree_lock);
> >  
> > @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root 
> > *root,
> >  printk(KERN_ERR "btrfs: failed to write free ino cache "
> > "for root %llu\n", root->root_key.objectid);
> >  
> > +/* we write out at transaction commit time, there's no racing. */
> > +if (ret == 0)
> > +ctl->dirty = 0;
> > +
> >  iput(inode);
> >  return ret;
> >  }
> > diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> > index 8f2613f..1e92c93 100644
> > --- a/fs/btrfs/free-space-cache.h
> > +++ b/fs/btrfs/free-space-cache.h
> > @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
> >  int free_extents;
> >  int total_bitmaps;
> >  int unit;
> > +/*
> > + * record if we've changed since written.  This can turn
> > + * into a bit field if we need more flags
> > + */
> > +unsigned long dirty;
> >  u64 start;
> >  struct btrfs_free_space_op *op;
> >  void *private;
> > diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> > index b4087e0..e7c1493 100644
> > --- a/fs/btrfs/inode-map.c
> > +++ b/fs/btrfs/inode-map.c
> > @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
> >  ctl->start = 0;
> >  ctl->private = NULL;
> >  ctl->op = &free_ino_op;
> > +ctl->dirty = 1;
> >  
> >  /*
> >   * Initially we allow to use 16K of ram to cache chunks of
> > @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
> >  if (!btrfs_test_opt(root, INODE_MAP_CACHE))
> >  return 0;
> >  
> > +if (!ctl->dirty)
> > +return 0;
> > +
> >  path = btrfs_alloc_path();
> >  if (!path)
> >  return -ENOMEM;
> > @@ -485,6 +489,24 @@ out:
> >  return ret;
> >  }
> >  
> > +/*
> > + * this tries to save the cache, but if it fails for any reason we clear
> > + * the dirty flag so that it won't be saved again during this commit.
> > + *
> > + * This is used by the snapshotting code to make sure we don't corrupt the
> > + * FS by saving the inode cache after the snapshot is taken.
> > + */
> > +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> > +   struct btrfs_trans_handle *trans)
> > +{
> > +struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> > +int ret;
> > +ret = btrfs_save_ino_cache(root, trans)

Re: please review snapshot corruption path with delayed metadata insertion

2011-07-07 Thread Tsutomu Itoh
Hi, Chris,

(2011/07/08 5:26), Chris Mason wrote:
> Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
>> Hi, Miao,
>>
>> (2011/06/30 15:32), Miao Xie wrote:
>>> Hi, Itoh-san
>>>
>>> Could you test the following patch to check whether it can fix the bug or 
>>> not?
>>> I have tested it on my x86_64 machine by your test script for two days, it 
>>> worked well.
>>
>> I ran my test script about a day, I was not able to reproduce this BUG.
> 
> Can you please try this patch with the inode_cache option (in addition
> to Miao's code).

For clarification:

Do I only have to apply this patch to 'btrfs-unstable + (current)for-linus',
or are other patches also necessary?

Thanks,
Tsutomu

> 
> commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> Author: Chris Mason 
> Date:   Thu Jul 7 15:53:12 2011 -0400
> 
> Btrfs: write out free inode cache before taking snapshots
> 
> The btrfs snapshotting code requires that once a root has been
> snapshotted, we don't change it during a commit
> 
> But the free inode cache was changing the roots when it wrote the cache,
> which led to corruptions.
> 
> This fixes things by making sure we write the cache while we are taking
> the snapshot, and that we don't write it again later.
> 
> Signed-off-by: Chris Mason 
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index bf0d615..d594cf7 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl 
> *ctl,
>   info->bytes = bytes;
>  
>   spin_lock(&ctl->tree_lock);
> + ctl->dirty = 1;
>  
>   if (try_merge_free_space(ctl, info, true))
>   goto link;
> @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct 
> btrfs_block_group_cache *block_group,
>   int ret = 0;
>  
>   spin_lock(&ctl->tree_lock);
> + ctl->dirty = 1;
>  
>  again:
>   info = tree_search_offset(ctl, offset, 0, 0);
> @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
>   if (entry->bytes == 0)
>   free_bitmap(ctl, entry);
>   }
> + ctl->dirty = 1;
>  out:
>   spin_unlock(&ctl->tree_lock);
>  
> @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>   printk(KERN_ERR "btrfs: failed to write free ino cache "
>  "for root %llu\n", root->root_key.objectid);
>  
> + /* we write out at transaction commit time, there's no racing. */
> + if (ret == 0)
> + ctl->dirty = 0;
> +
>   iput(inode);
>   return ret;
>  }
> diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> index 8f2613f..1e92c93 100644
> --- a/fs/btrfs/free-space-cache.h
> +++ b/fs/btrfs/free-space-cache.h
> @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
>   int free_extents;
>   int total_bitmaps;
>   int unit;
> + /*
> +  * record if we've changed since written.  This can turn
> +  * into a bit field if we need more flags
> +  */
> + unsigned long dirty;
>   u64 start;
>   struct btrfs_free_space_op *op;
>   void *private;
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index b4087e0..e7c1493 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
>   ctl->start = 0;
>   ctl->private = NULL;
>   ctl->op = &free_ino_op;
> + ctl->dirty = 1;
>  
>   /*
>* Initially we allow to use 16K of ram to cache chunks of
> @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>   if (!btrfs_test_opt(root, INODE_MAP_CACHE))
>   return 0;
>  
> + if (!ctl->dirty)
> + return 0;
> +
>   path = btrfs_alloc_path();
>   if (!path)
>   return -ENOMEM;
> @@ -485,6 +489,24 @@ out:
>   return ret;
>  }
>  
> +/*
> + * this tries to save the cache, but if it fails for any reason we clear
> + * the dirty flag so that it won't be saved again during this commit.
> + *
> + * This is used by the snapshotting code to make sure we don't corrupt the
> + * FS by saving the inode cache after the snapshot is taken.
> + */
> +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> +struct btrfs_trans_handle *trans)
> +{
> + struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> + int ret;
> + ret = btrfs_save_ino_cache(root, trans);
> +
> + ctl->dirty = 0;
> + return ret;
> +}
> +
>  static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 
> *objectid)
>  {
>   struct btrfs_path *path;
> diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
> index ddb347b..2be060e 100644
> --- a/fs/btrfs/inode-map.h
> +++ b/fs/btrfs/inode-map.h
> @@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
>  int btrfs_find_free_ino(struc

Re: [PATCH v1 0/2] Btrfs-progs: commands "resolve inode" and "resolve logical"

2011-07-07 Thread Goffredo Baroncelli
On 07/07/2011 06:01 PM, Jan Schmidt wrote:
> The kernel patch series just sent (Subject: "Btrfs: scrub: print path to
> corrupted files and trigger nodatasum fixup") introduces two new ioctls to
> do in-kernel filesystem path construction. This series provides the
> corresponding userspace changes, adding two new commands to the btrfs utility:

What is the aim of these commands? They look more like "debug" utilities
than standard commands. If so, they could be put under a new group
called "debug" or "test", or whichever name we decide to use. But
please highlight the fact that these commands aren't for general use.

I suggest using

btrfs debug resolve ...

Or better

btrfs inspect resolve ...

> 
> --
> btrfs resolve inode [-v]  
>   resolves an  to all filesystem paths local to the fs mounted
>   at .
>   -v  print count of returned and missed paths
> 
> btrfs resolve logical [-v] [-P]  
>   resolves a  address to all filesystem paths in the file
>   system mounted at  and all its subvolumes.
>   -v  print count of returned and missed inode/offset/root
>   triples
>   -P  do not resolve the path but stop after finding all
>   inodes at this logical address and print them instead
> --
> 
> These patches are based on Hugo's current integration branch.
> 
> Please try them out and report bugs here. I'll send an update to the manpages
> later.

Please update the man pages at the same time as the code. Developing the
man page together with the code may help to design a "good interface"
(from a user's point of view) and to better explain the aim of the new
commands.


BR
G.Baroncelli


> 
> -Jan
> 
> Jan Schmidt (2):
>   btrfs-list: split list_subvols
>   added ioctls and commands to resolve inodes and logical addresses
> 
>  btrfs-list.c |  139 ++
>  btrfs.c  |   10 +++
>  btrfs_cmds.c |  177 
> ++
>  btrfs_cmds.h |3 +
>  ioctl.h  |   29 ++
>  5 files changed, 323 insertions(+), 35 deletions(-)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 8/8] new ioctls to do logical->inode and inode->path resolving

2011-07-07 Thread Jan Schmidt
Hi,

On 07/08/2011 12:29 AM, Hugo Mills wrote:
>Hi, Jan,
> 
> On Thu, Jul 07, 2011 at 05:48:33PM +0200, Jan Schmidt wrote:
>> these ioctls make use of the new functions initially added for scrub. they
>> return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
>> all paths belonging to an inode (BTRFS_IOC_INO_PATHS).
> 
>I've not read this patch in detail, so I may have missed something,
> but why do we need new ioctls for these functions, when we have
> BTRFS_IOC_TREE_SEARCH, which will allow us to perform the same two
> operations using existing kernel-side infrastructure?
> 
>Hugo.

Note that those ioctls do a lot more than just one tree search. You are
right, we could implement all this with (quite a few)
BTRFS_IOC_TREE_SEARCH ioctls.

Resolving all file system paths for an inode, in particular, needs a
lot of searches. In general, I prefer to keep logic that requires deep
knowledge of the internals of btrfs trees in the kernel. Not to mention
that this way it is safe to run this on a file system under load and
still get consistent results.

Last but not least, if we want to use this code for general error
reporting to the kernel log (e.g. by scrub), we need all the resolving
code in kernel anyway. So I'd like to provide that functionality to user
space, too.

-Jan


Re: [PATCH v3 8/8] new ioctls to do logical->inode and inode->path resolving

2011-07-07 Thread Hugo Mills
   Hi, Jan,

On Thu, Jul 07, 2011 at 05:48:33PM +0200, Jan Schmidt wrote:
> these ioctls make use of the new functions initially added for scrub. they
> return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
> all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

   I've not read this patch in detail, so I may have missed something,
but why do we need new ioctls for these functions, when we have
BTRFS_IOC_TREE_SEARCH, which will allow us to perform the same two
operations using existing kernel-side infrastructure?

   Hugo.

> Signed-off-by: Jan Schmidt 
> ---
>  fs/btrfs/ioctl.c |  134 
> ++
>  fs/btrfs/ioctl.h |   19 
>  2 files changed, 153 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a3c4751..5299b40 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -51,6 +51,7 @@
>  #include "volumes.h"
>  #include "locking.h"
>  #include "inode-map.h"
> +#include "backref.h"
>  
>  /* Mask out flags that are inappropriate for the given type of inode. */
>  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> @@ -2836,6 +2837,135 @@ static long btrfs_ioctl_scrub_progress(struct 
> btrfs_root *root,
>   return ret;
>  }
>  
> +static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user 
> *arg)
> +{
> + int ret = 0;
> + int i;
> + unsigned long rel_ptr;
> + int size;
> + struct btrfs_ioctl_ino_path_args *ipa;
> + struct inode_fs_paths *ipath = NULL;
> + struct btrfs_path *path;
> +
> + path = btrfs_alloc_path();
> + if (!path) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ipa = memdup_user(arg, sizeof(*ipa));
> + if (IS_ERR(ipa)) {
> + ret = PTR_ERR(ipa);
> + ipa = NULL;
> + goto out;
> + }
> +
> + size = min(ipa->size, 4096);
> + ipath = init_ipath(size, root, path);
> + if (IS_ERR(ipath)) {
> + ret = PTR_ERR(ipath);
> + ipath = NULL;
> + goto out;
> + }
> +
> + ret = paths_from_inode(ipa->inum, ipath);
> + if (ret < 0)
> + goto out;
> +
> + for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
> + rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
> + ipath->fspath->str[i] = (void *)rel_ptr;
> + }
> +
> + ret = copy_to_user(ipa->fspath, ipath->fspath, size);
> + if (ret) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> +out:
> + btrfs_free_path(path);
> + free_ipath(ipath);
> + kfree(ipa);
> +
> + return ret;
> +}
> +
> +static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
> +{
> + struct btrfs_data_container *inodes = ctx;
> +
> + inodes->size -= 3 * sizeof(u64);
> + if (inodes->size > 0) {
> + inodes->val[inodes->elem_cnt] = inum;
> + inodes->val[inodes->elem_cnt + 1] = offset;
> + inodes->val[inodes->elem_cnt + 2] = root;
> + inodes->elem_cnt += 3;
> + } else {
> + inodes->elem_missed += 3;
> + }
> +
> + return 0;
> +}
> +
> +static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
> + void __user *arg)
> +{
> + int ret = 0;
> + int size;
> + u64 extent_offset;
> + struct btrfs_ioctl_logical_ino_args *loi;
> + struct btrfs_data_container *inodes = NULL;
> + struct btrfs_path *path = NULL;
> + struct btrfs_key key;
> +
> + loi = memdup_user(arg, sizeof(*loi));
> + if (IS_ERR(loi)) {
> + ret = PTR_ERR(loi);
> + loi = NULL;
> + goto out;
> + }
> +
> + path = btrfs_alloc_path();
> + if (!path) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + size = min(loi->size, 4096);
> + inodes = init_data_container(size);
> + if (IS_ERR(inodes)) {
> + ret = PTR_ERR(inodes);
> + inodes = NULL;
> + goto out;
> + }
> +
> + ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
> +
> + if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
> + ret = -ENOENT;
> + if (ret < 0)
> + goto out;
> +
> + extent_offset = loi->logical - key.objectid;
> + ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
> + extent_offset, build_ino_list, inodes);
> +
> + if (ret < 0)
> + goto out;
> +
> + ret = copy_to_user(loi->inodes, inodes, size);
> + if (ret)
> + ret = -EFAULT;
> +
> +out:
> + btrfs_free_path(path);
> + kfree(inodes);
> + kfree(loi);
> +
> + return ret;
> +}
> +
>  long btrfs_ioctl(struct file *file, unsigned int
>   cmd, unsigned long arg)
>  {
> @@ -2893,6 +3023,10 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_i
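
As an aside on the quoted patch: build_ino_list() fills a fixed-size
container with (inum, offset, root) triples and counts whatever does not
fit in elem_missed. A minimal userspace sketch of that idea follows. The
struct triple_container type and the container_alloc() / container_add()
helpers are made-up names for illustration only; they are not the layout
of struct btrfs_data_container. The sketch compares the remaining space
explicitly before subtracting, so an unsigned counter cannot wrap.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size result container (not the btrfs structure). */
struct triple_container {
	uint32_t bytes_left;	/* payload bytes still available */
	uint32_t elem_cnt;	/* number of u64 values stored */
	uint32_t elem_missed;	/* number of u64 values that did not fit */
	uint64_t val[];		/* inum, offset, root, inum, offset, root, ... */
};

/* Allocate a zeroed container occupying total_bytes including the header. */
static struct triple_container *container_alloc(uint32_t total_bytes)
{
	struct triple_container *c;

	if (total_bytes < sizeof(*c))
		return NULL;
	c = calloc(1, total_bytes);
	if (!c)
		return NULL;
	c->bytes_left = (uint32_t)(total_bytes - sizeof(*c));
	return c;
}

/* Store one triple if it fits, otherwise only account for it. */
static int container_add(struct triple_container *c,
			 uint64_t inum, uint64_t offset, uint64_t root)
{
	const uint32_t need = 3 * sizeof(uint64_t);

	if (c->bytes_left >= need) {
		c->val[c->elem_cnt] = inum;
		c->val[c->elem_cnt + 1] = offset;
		c->val[c->elem_cnt + 2] = root;
		c->elem_cnt += 3;
		c->bytes_left -= need;
	} else {
		c->elem_missed += 3;
	}
	return 0;	/* never abort the iteration, just count misses */
}
```

A caller can compare elem_missed against zero after the iteration to
decide whether to retry with a larger buffer.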

Re: please review snapshot corruption path with delayed metadata insertion

2011-07-07 Thread Chris Mason
Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
> Hi, Miao,
> 
> (2011/06/30 15:32), Miao Xie wrote:
> > Hi, Itoh-san
> > 
> > Could you test the following patch to check whether it can fix the bug or 
> > not?
> > I have tested it on my x86_64 machine by your test script for two days, it 
> > worked well.
> 
> I ran my test script about a day, I was not able to reproduce this BUG.

Can you please try this patch with the inode_cache option (in addition
to Miao's code).

commit d0243d46f7a1e4cd57c74fa14556be65b454687d
Author: Chris Mason 
Date:   Thu Jul 7 15:53:12 2011 -0400

Btrfs: write out free inode cache before taking snapshots

The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit

But the free inode cache was changing the roots when it wrote the cache,
which led to corruptions.

This fixes things by making sure we write the cache while we are taking
the snapshot, and that we don't write it again later.

Signed-off-by: Chris Mason 

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index bf0d615..d594cf7 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
info->bytes = bytes;
 
spin_lock(&ctl->tree_lock);
+   ctl->dirty = 1;
 
if (try_merge_free_space(ctl, info, true))
goto link;
@@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
int ret = 0;
 
spin_lock(&ctl->tree_lock);
+   ctl->dirty = 1;
 
 again:
info = tree_search_offset(ctl, offset, 0, 0);
@@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
if (entry->bytes == 0)
free_bitmap(ctl, entry);
}
+   ctl->dirty = 1;
 out:
spin_unlock(&ctl->tree_lock);
 
@@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
printk(KERN_ERR "btrfs: failed to write free ino cache "
   "for root %llu\n", root->root_key.objectid);
 
+   /* we write out at transaction commit time, there's no racing. */
+   if (ret == 0)
+   ctl->dirty = 0;
+
iput(inode);
return ret;
 }
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 8f2613f..1e92c93 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
int free_extents;
int total_bitmaps;
int unit;
+   /*
+* record if we've changed since written.  This can turn
+* into a bit field if we need more flags
+*/
+   unsigned long dirty;
u64 start;
struct btrfs_free_space_op *op;
void *private;
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index b4087e0..e7c1493 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
ctl->start = 0;
ctl->private = NULL;
ctl->op = &free_ino_op;
+   ctl->dirty = 1;
 
/*
 * Initially we allow to use 16K of ram to cache chunks of
@@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
if (!btrfs_test_opt(root, INODE_MAP_CACHE))
return 0;
 
+   if (!ctl->dirty)
+   return 0;
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -485,6 +489,24 @@ out:
return ret;
 }
 
+/*
+ * this tries to save the cache, but if it fails for any reason we clear
+ * the dirty flag so that it won't be saved again during this commit.
+ *
+ * This is used by the snapshotting code to make sure we don't corrupt the
+ * FS by saving the inode cache after the snapshot is taken.
+ */
+int btrfs_force_save_ino_cache(struct btrfs_root *root,
+  struct btrfs_trans_handle *trans)
+{
+   struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
+   int ret;
+   ret = btrfs_save_ino_cache(root, trans);
+
+   ctl->dirty = 0;
+   return ret;
+}
+
 static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
 {
struct btrfs_path *path;
diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
index ddb347b..2be060e 100644
--- a/fs/btrfs/inode-map.h
+++ b/fs/btrfs/inode-map.h
@@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
 int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid);
 int btrfs_save_ino_cache(struct btrfs_root *root,
 struct btrfs_trans_handle *trans);
-
+int btrfs_force_save_ino_cache(struct btrfs_root *root,
+  struct btrfs_trans_handle *trans);
 int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);
 
 #endif
diff --git a
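
The dirty-flag protocol described in the commit message above can be
sketched in plain C as follows. This is a simplified userspace model
under stated assumptions, not the kernel code: struct cache_ctl,
cache_modify(), cache_save() and cache_force_save() are hypothetical
stand-ins for the free inode cache control structure and for
btrfs_save_ino_cache() / btrfs_force_save_ino_cache(), and the
write-out is simulated by a counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the free inode cache control structure. */
struct cache_ctl {
	unsigned long dirty;	/* non-zero if changed since last write-out */
	int writes;		/* number of simulated write-outs */
	bool fail_next_write;	/* test hook: make the next write-out fail */
};

static void cache_init(struct cache_ctl *ctl)
{
	ctl->dirty = 1;		/* a fresh cache has never been written */
	ctl->writes = 0;
	ctl->fail_next_write = false;
}

/* Any add/remove on the cache marks it dirty. */
static void cache_modify(struct cache_ctl *ctl)
{
	ctl->dirty = 1;
}

/* Normal save: skip if clean, clear the flag only on success. */
static int cache_save(struct cache_ctl *ctl)
{
	if (!ctl->dirty)
		return 0;
	if (ctl->fail_next_write) {
		ctl->fail_next_write = false;
		return -EIO;
	}
	ctl->writes++;
	ctl->dirty = 0;
	return 0;
}

/*
 * Snapshot-time save: try to write, then clear the flag even on failure,
 * so that no later save in the same commit touches the snapshotted root.
 */
static int cache_force_save(struct cache_ctl *ctl)
{
	int ret = cache_save(ctl);

	ctl->dirty = 0;
	return ret;
}
```

The point of the forced variant is that a failed write costs only the
cache contents, whereas a late write after the snapshot would modify a
root that must stay unchanged.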

[PATCH] Btrfs-progs: bugfix: bail out when check_mounted_where returns an error

2011-07-07 Thread Jan Schmidt
Signed-off-by: Jan Schmidt 
---
 scrub.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/scrub.c b/scrub.c
index 22052ed..8270431 100644
--- a/scrub.c
+++ b/scrub.c
@@ -942,6 +942,8 @@ static int scrub_fs_info(int fd, char *path,
  &fs_devices_mnt);
if (!ret)
return -EINVAL;
+   if (ret < 0)
+   return ret;
fi_args->num_devices = 1;
fi_args->max_id = fs_devices_mnt->latest_devid;
i = fs_devices_mnt->latest_devid;
-- 
1.7.3.4
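
The check added here distinguishes three outcomes of the mount probe.
A small sketch of that pattern follows, assuming (from the hunk above)
that zero means "not mounted", a negative value is an errno-style error
and a positive value means success. fs_info_from_probe() is a
hypothetical helper used only for illustration, not a function from
scrub.c.

```c
#include <errno.h>

/*
 * Hypothetical caller of a tri-state probe:
 *   probe_ret  > 0  file system is mounted
 *   probe_ret == 0  file system is not mounted
 *   probe_ret  < 0  the probe itself failed (negative errno)
 */
static int fs_info_from_probe(int probe_ret, int *num_devices)
{
	if (!probe_ret)
		return -EINVAL;		/* nothing mounted to query */
	if (probe_ret < 0)
		return probe_ret;	/* propagate the probe error */

	*num_devices = 1;		/* only reached on success */
	return 0;
}
```

Without the second check, a negative return would fall through and the
caller would go on to use data that the failed probe never filled in.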



[PATCH v1 2/2] added ioctls and commands to resolve inodes and logical addresses

2011-07-07 Thread Jan Schmidt
Two new commands that make use of the new path resolving functions
implemented for scrub, doing the resolving in-kernel. The result for both
commands is a list of files belonging to that inode / logical address.

Signed-off-by: Jan Schmidt 
---
 btrfs-list.c |   35 
 btrfs.c  |   10 +++
 btrfs_cmds.c |  177 ++
 btrfs_cmds.h |3 +
 ioctl.h  |   29 ++
 5 files changed, 254 insertions(+), 0 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index dd685c2..cbf6a08 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -900,3 +900,38 @@ int find_updated_files(int fd, u64 root_id, u64 oldest_gen)
printf("transid marker was %llu\n", (unsigned long long)max_found);
return ret;
 }
+
+char *path_for_root(int fd, u64 root)
+{
+   struct root_lookup root_lookup;
+   struct rb_node *n;
+   char *ret_path = NULL;
+   int ret;
+
+   ret = __list_subvol_search(fd, &root_lookup);
+   if (ret < 0)
+   return ERR_PTR(ret);
+
+   ret = __list_subvol_fill_paths(fd, &root_lookup);
+   if (ret < 0)
+   return ERR_PTR(ret);
+
+   n = rb_last(&root_lookup.root);
+   while (n) {
+   struct root_info *entry;
+   u64 root_id;
+   u64 parent_id;
+   u64 level;
+   char *path;
+   entry = rb_entry(n, struct root_info, rb_node);
+   resolve_root(&root_lookup, entry, &root_id, &parent_id, &level,
+   &path);
+   if (root_id == root)
+   ret_path = path;
+   else
+   free(path);
+   n = rb_prev(n);
+   }
+
+   return ret_path;
+}
diff --git a/btrfs.c b/btrfs.c
index 67d6f6f..86d356b 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -178,6 +178,16 @@ static struct Command commands[] = {
"Remove a device from a filesystem.",
  NULL
},
+   { do_ino_to_path, -2,
+ "resolve inode", "[-v]  \n"
+   "get file system paths for the given inode.",
+ NULL
+   },
+   { do_logical_to_ino, -2,
+ "resolve logical", "[-v] [-P]  \n"
+   "get file system paths for the given logical address.",
+ NULL
+   },
{ 0, 0, 0, 0 }
 };
 
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 0612f34..2db5d31 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -1545,3 +1545,180 @@ int do_df_filesystem(int nargs, char **argv)
 
return 0;
 }
+
+static int __ino_to_path_fd(u64 inum, int fd, int verbose, const char *prepend)
+{
+   int ret;
+   int i;
+   struct btrfs_ioctl_ino_path_args ipa;
+   struct btrfs_data_container *fspath;
+
+   fspath = malloc(4096);
+   if (!fspath)
+   return 1;
+
+   ipa.inum = inum;
+   ipa.size = 4096;
+   ipa.fspath = fspath;
+
+   ret = ioctl(fd, BTRFS_IOC_INO_PATHS, &ipa);
+   if (ret) {
+   printf("ioctl ret=%d, error: %s\n", ret, strerror(errno));
+   goto out;
+   }
+
+   if (verbose)
+   printf("ioctl ret=%d, size=%d, cnt=%d, missed=%d\n", ret,
+   fspath->size, fspath->elem_cnt, fspath->elem_missed);
+
+   for (i = 0; i < fspath->elem_cnt; ++i) {
+   fspath->str[i] += (unsigned long)fspath->str;
+   if (prepend)
+   printf("%s/%s\n", prepend, fspath->str[i]);
+   else
+   printf("%s\n", fspath->str[i]);
+   }
+
+out:
+   free(fspath);
+   return ret;
+}
+
+int do_ino_to_path(int nargs, char **argv)
+{
+   int fd;
+   int verbose = 0;
+
+   optind = 1;
+   while (1) {
+   int c = getopt(nargs, argv, "v");
+   if (c < 0)
+   break;
+   switch (c) {
+   case 'v':
+   verbose = 1;
+   break;
+   default:
+   fprintf(stderr, "invalid arguments for ipath\n");
+   return 1;
+   }
+   }
+   if (nargs - optind != 2) {
+   fprintf(stderr, "invalid arguments for ipath\n");
+   return 1;
+   }
+
+   fd = open_file_or_dir(argv[optind+1]);
+   if (fd < 0) {
+   fprintf(stderr, "ERROR: can't access '%s'\n", argv[optind+1]);
+   return 12;
+   }
+
+   return __ino_to_path_fd(atoll(argv[optind]), fd, verbose,
+   argv[optind+1]);
+}
+
+int do_logical_to_ino(int nargs, char **argv)
+{
+   int ret;
+   int fd;
+   int i;
+   int verbose = 0;
+   int getpath = 1;
+   int bytes_left;
+   struct btrfs_ioctl_logical_ino_args loi;
+   struct btrfs_data_container *inodes;
+   char full_path[4096];
+   char *path_ptr;
+
+   optind = 1;
+   while (1) {
+   in
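
The pointer fix-up in __ino_to_path_fd() above (adding the address of
fspath->str to each str[i]) relies on the kernel side storing offsets
rather than pointers, so that the buffer stays valid after being copied
to another address. A self-contained sketch of that convention follows.
struct path_container, pack_paths() and path_at() are made-up names for
illustration and do not reproduce the layout of struct
btrfs_data_container.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container: a slot table followed by the string bytes. */
struct path_container {
	uint32_t elem_cnt;	/* number of valid slots in str[] */
	uint32_t reserved;	/* keeps str[] 8-byte aligned */
	uint64_t str[];		/* byte offsets measured from the start of str[] */
};

/* Producer side: copy the strings in and record position-independent offsets. */
static int pack_paths(void *buf, size_t buf_size,
		      const char *const *paths, uint32_t n)
{
	struct path_container *c = buf;
	size_t base = offsetof(struct path_container, str);
	size_t used = base + (size_t)n * sizeof(uint64_t);
	uint32_t i;

	if (used > buf_size)
		return -1;	/* not even the slot table fits */
	c->elem_cnt = n;
	c->reserved = 0;
	for (i = 0; i < n; i++) {
		size_t len = strlen(paths[i]) + 1;

		if (len > buf_size - used)
			return -1;	/* string data does not fit */
		memcpy((char *)buf + used, paths[i], len);
		c->str[i] = used - base;
		used += len;
	}
	return 0;
}

/* Consumer side: turn an offset back into a pointer at the new location. */
static const char *path_at(const struct path_container *c, uint32_t i)
{
	return (const char *)c->str + c->str[i];
}
```

The buffer passed to pack_paths() is assumed to be suitably aligned for
the structure, for example memory obtained from malloc(). Because only
offsets are stored, a byte-wise copy of the buffer can be read with
path_at() without any further adjustment.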

[PATCH v1 0/2] Btrfs-progs: commands "resolve inode" and "resolve logical"

2011-07-07 Thread Jan Schmidt
The kernel patch series just sent (Subject: "Btrfs: scrub: print path to
corrupted files and trigger nodatasum fixup") introduces two new ioctls to
do in-kernel filesystem path construction. This series provides the
corresponding userspace changes, adding two new commands to the btrfs utility:

--
btrfs resolve inode [-v]  
resolves an  to all filesystem paths local to the fs mounted
at .
-v  print count of returned and missed paths

btrfs resolve logical [-v] [-P]  
resolves a  address to all filesystem paths in the file
system mounted at  and all its subvolumes.
-v  print count of returned and missed inode/offset/root
triples
-P  do not resolve the path but stop after finding all
inodes at this logical address and print them instead
--

These patches are based on Hugo's current integration branch.

Please try them out and report bugs here. I'll send an update to the manpages
later.

-Jan

Jan Schmidt (2):
  btrfs-list: split list_subvols
  added ioctls and commands to resolve inodes and logical addresses

 btrfs-list.c |  139 ++
 btrfs.c  |   10 +++
 btrfs_cmds.c |  177 ++
 btrfs_cmds.h |3 +
 ioctl.h  |   29 ++
 5 files changed, 323 insertions(+), 35 deletions(-)

-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/2] btrfs-list: split list_subvols

2011-07-07 Thread Jan Schmidt
Split list_subvols into separate functions and do the printing only in the
calling function. This lets us reuse those functions when resolving
logical addresses.

Signed-off-by: Jan Schmidt 
---
 btrfs-list.c |  104 ++---
 1 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 07b179a..dd685c2 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -199,10 +199,9 @@ static int add_root(struct root_lookup *root_lookup,
  * This can't be called until all the root_info->path fields are filled
  * in by lookup_ino_path
  */
-static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_parent)
+static int resolve_root(struct root_lookup *rl, struct root_info *ri,
+   u64 *root_id, u64 *parent_id, u64 *top_id, char **path)
 {
-   u64 top_id;
-   u64 parent_id = 0;
char *full_path = NULL;
int len = 0;
struct root_info *found;
@@ -211,6 +210,7 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_
 * we go backwards from the root_info object and add pathnames
 * from parent directories as we go.
 */
+   *parent_id = 0;
found = ri;
while (1) {
char *tmp;
@@ -234,13 +234,12 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_
 
next = found->ref_tree;
/* record the first parent */
-   if ( parent_id == 0 ) {
-   parent_id = next;
-   }
+   if (*parent_id == 0)
+   *parent_id = next;
 
/* if the ref_tree refers to ourselves, we're at the top */
if (next == found->root_id) {
-   top_id = next;
+   *top_id = next;
break;
}
 
@@ -250,20 +249,15 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_
 */
found = tree_search(&rl->root, next);
if (!found) {
-   top_id = next;
+   *top_id = next;
break;
}
}
-   if (print_parent) {
-   printf("ID %llu parent %llu top level %llu path %s\n",
-  (unsigned long long)ri->root_id, (unsigned long long)parent_id, (unsigned long long)top_id,
-   full_path);
-   } else {
-   printf("ID %llu top level %llu path %s\n",
-  (unsigned long long)ri->root_id, (unsigned long long)top_id,
-   full_path);
-   }
-   free(full_path);
+
+   *root_id = ri->root_id;
+   *parent_id = ri->root_id;
+   *path = full_path;
+
return 0;
 }
 
@@ -560,10 +554,8 @@ build:
return full;
 }
 
-int list_subvols(int fd, int print_parent)
+static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
 {
-   struct root_lookup root_lookup;
-   struct rb_node *n;
int ret;
struct btrfs_ioctl_search_args args;
struct btrfs_ioctl_search_key *sk = &args.key;
@@ -574,9 +566,11 @@ int list_subvols(int fd, int print_parent)
char *name;
u64 dir_id;
int i;
-   int e;
 
-   root_lookup_init(&root_lookup);
+   root_lookup_init(root_lookup);
+   memset(&args, 0, sizeof(args));
+
+   root_lookup_init(root_lookup);
 
memset(&args, 0, sizeof(args));
 
@@ -603,12 +597,8 @@ int list_subvols(int fd, int print_parent)
 
while(1) {
ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
-   e = errno;
-   if (ret < 0) {
-   fprintf(stderr, "ERROR: can't perform the search - %s\n",
-   strerror(e));
+   if (ret < 0)
return ret;
-   }
/* the ioctl returns the number of item it found in nr_items */
if (sk->nr_items == 0)
break;
@@ -629,7 +619,7 @@ int list_subvols(int fd, int print_parent)
name = (char *)(ref + 1);
dir_id = btrfs_stack_root_ref_dirid(ref);
 
-   add_root(&root_lookup, sh->objectid, sh->offset,
+   add_root(root_lookup, sh->objectid, sh->offset,
 dir_id, name, name_len);
}
 
@@ -657,11 +647,15 @@ int list_subvols(int fd, int print_parent)
} else
break;
}
-   /*
-* now we have an rbtree full of root_info objects, but we need to fill
-* in their path names within the subvol that is referencing each one.
-*/
-   n = rb_first(&root_lookup.root);
+
+   return 0;
+}
+
+static int __l

[PATCH v3 3/8] scrub: print paths of corrupted files

2011-07-07 Thread Jan Schmidt
While scrubbing, we may encounter various errors. Previously, a logical
address was printed to the log only. Now, all paths belonging to that
address are resolved and printed separately. That should work for hardlinks
as well as reflinks.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |  169 --
 1 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 35099fa..221fd5c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -17,10 +17,12 @@
  */
 
 #include 
+#include 
 #include "ctree.h"
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "backref.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -100,6 +102,19 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_warning {
+   struct btrfs_path   *path;
+   u64 extent_item_size;
+   char    *scratch_buf;
+   char    *msg_buf;
+   const char  *errstr;
+   sector_t    sector;
+   u64 logical;
+   struct btrfs_device *dev;
+   int msg_bufsize;
+   int scratch_bufsize;
+};
+
 static void scrub_free_csums(struct scrub_dev *sdev)
 {
while (!list_empty(&sdev->csum_list)) {
@@ -195,6 +210,143 @@ nomem:
return ERR_PTR(-ENOMEM);
 }
 
+static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   u64 isize;
+   u32 nlink;
+   int ret;
+   int i;
+   struct extent_buffer *eb;
+   struct btrfs_inode_item *inode_item;
+   struct scrub_warning *swarn = ctx;
+   struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_root *local_root;
+   struct btrfs_key root_key;
+
+   root_key.objectid = root;
+   root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root_key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fs_info, &root_key);
+   if (IS_ERR(local_root)) {
+   ret = PTR_ERR(local_root);
+   goto err;
+   }
+
+   ret = inode_item_info(inum, 0, local_root, swarn->path);
+   if (ret) {
+   btrfs_release_path(swarn->path);
+   goto err;
+   }
+
+   eb = swarn->path->nodes[0];
+   inode_item = btrfs_item_ptr(eb, swarn->path->slots[0],
+   struct btrfs_inode_item);
+   isize = btrfs_inode_size(eb, inode_item);
+   nlink = btrfs_inode_nlink(eb, inode_item);
+   btrfs_release_path(swarn->path);
+
+   ipath = init_ipath(4096, local_root, swarn->path);
+   ret = paths_from_inode(inum, ipath);
+
+   if (ret < 0)
+   goto err;
+
+   /*
+* we deliberately ignore the bit ipath might have been too small to
+* hold all of the paths here
+*/
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i)
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu, "
+   "length %llu, links %u (path: %s)\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset,
+   min(isize - offset, (u64)PAGE_SIZE), nlink,
+   ipath->fspath->str[i]);
+
+   free_ipath(ipath);
+   return 0;
+
+err:
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu: path "
+   "resolving failed with ret=%d\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset, ret);
+
+   free_ipath(ipath);
+   return 0;
+}
+
+static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio,
+   int ix)
+{
+   struct btrfs_device *dev = sbio->sdev->dev;
+   struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
+   struct btrfs_path *path;
+   struct btrfs_key found_key;
+   struct extent_buffer *eb;
+   struct btrfs_extent_item *ei;
+   struct scrub_warning swarn;
+   u32 item_size;
+   int ret;
+   u64 ref_root;
+   u8 ref_level;
+   unsigned long ptr = 0;
+   const int bufsize = 4096;
+   u64 extent_offset;
+
+   path = btrfs_alloc_path();
+
+   swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+   swarn.logical = sbio->logical + ix * PAGE_SIZE;
+   swarn.errstr = errstr;
+   swarn.dev = dev;
+   swarn.msg_bufsize = bufsize;
+   swarn.scratch_bufsize = bufsize;

[PATCH v3 0/8] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup

2011-07-07 Thread Jan Schmidt
This patch set introduces two new features for scrub. They share the backref
iteration code which is the reason they made it into the same patch set.

The first feature adds printk statements that list all affected files when
scrub finds an error. You will need patches 1, 2 and 3 for that.

The second feature adds the trigger that will eventually enable us to correct
i/o errors on extents that do not have a checksum (nodatasum). You will need
patches 1, 4, 5 and 6 for that.

I tried to apply all patches to the current cmason/for-linus branch and to
Arne's current for-chris branch. They do apply with no errors (some offsets
possible).

Please review.

Next, I will work out how to implement on-the-fly error correction correctly,
with the stress on "correctly"; I have some preliminary patches already. This
will enable us to rewrite good data whenever we encounter a bad copy. The
second feature mentioned in this patch series will then automatically use that
code, too.

Changelog v1->v2:
- Various cleanup, sensible error codes as suggested by David Sterba

Changelog v2->v3:
- evaluation and iteration of shared refs
- support for in-tree refs (v2 iterated inline refs only)
- never call an iterator function without releasing the path
- iterate_irefs now returns -ENOENT in case no refs are found
- some stupid bugs removed where release_path was called too early
- ioctls added to provide new functions to user mode
- bugfixes for cases where search_slot found the very end of a leaf
- bugfix: use right fs root for readpage instead of fs_root->fs_info
- based on current cmason/for-linus

A patch series to use the new ioctls from usermode will follow shortly. Please
try it and report errors (or confirm there are none, of course).

-Jan

Jan Schmidt (8):
  added helper functions to iterate backrefs
  scrub: added unverified_errors
  scrub: print paths of corrupted files
  scrub: bugfix: mirror_num off by one
  add mirror_num to extent_read_full_page
  scrub: use int for mirror_num, not u64
  scrub: add fixup code for errors on nodatasum files
  new ioctls to do logical->inode and inode->path resolving

 fs/btrfs/Makefile|3 +-
 fs/btrfs/backref.c   |  748 ++
 fs/btrfs/backref.h   |   62 +
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +-
 fs/btrfs/extent_io.h |3 +-
 fs/btrfs/inode.c |2 +-
 fs/btrfs/ioctl.c |  134 +
 fs/btrfs/ioctl.h |   29 ++
 fs/btrfs/scrub.c |  412 +---
 10 files changed, 1362 insertions(+), 39 deletions(-)
 create mode 100644 fs/btrfs/backref.c
 create mode 100644 fs/btrfs/backref.h

-- 
1.7.3.4



[PATCH v3 1/8] added helper functions to iterate backrefs

2011-07-07 Thread Jan Schmidt
These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/Makefile  |3 +-
 fs/btrfs/backref.c |  748 
 fs/btrfs/backref.h |   62 +
 fs/btrfs/ioctl.h   |   10 +
 4 files changed, 822 insertions(+), 1 deletions(-)
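
A note for readers new to the callback style used by these helpers: the
iterators take a function pointer plus an opaque context pointer and call the
function once per backref with (inum, offset, root). The userspace sketch below
only illustrates that pattern. All demo_* names are hypothetical, and the
convention that a non-zero return value stops the walk is an assumption of the
sketch, not something taken from backref.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one resolved back reference. */
struct demo_ref {
	uint64_t inum;
	uint64_t offset;
	uint64_t root;
};

/* Same argument order as the callbacks in this series. */
typedef int (demo_iterate_fn)(uint64_t inum, uint64_t offset, uint64_t root,
			      void *ctx);

/*
 * Calls fn once per reference. Assumption of this sketch: a non-zero
 * return value from fn stops the walk and is handed back to the caller.
 */
static int demo_iterate_refs(const struct demo_ref *refs, size_t n,
			     demo_iterate_fn *fn, void *ctx)
{
	int ret = 0;
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < n && !ret; i++)
		ret = fn(refs[i].inum, refs[i].offset, refs[i].root, ctx);
	return ret;
}

/* Example context and callback: count the references owned by one root. */
struct demo_count {
	uint64_t root;
	unsigned int hits;
};

static int demo_count_root(uint64_t inum, uint64_t offset, uint64_t root,
			   void *ctx)
{
	struct demo_count *c = ctx;

	(void)inum;
	(void)offset;
	if (root == c->root)
		c->hits++;
	return 0;
}
```

Passing a struct through the void pointer keeps the iterator independent of
what each caller wants to collect, which is why scrub and the ioctls can share
the same walking code.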

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..c63f649 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+  compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
+  scrub.o
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
new file mode 100644
index 000..477f154
--- /dev/null
+++ b/fs/btrfs/backref.c
@@ -0,0 +1,748 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "disk-io.h"
+#include "backref.h"
+
+struct __data_ref {
+   struct list_head list;
+   u64 inum;
+   u64 root;
+   u64 extent_data_item_offset;
+};
+
+struct __shared_ref {
+   struct list_head list;
+   u64 disk_byte;
+};
+
+static int __inode_info(u64 inum, u64 ioff, u8 key_type,
+   struct btrfs_root *fs_root, struct btrfs_path *path,
+   struct btrfs_key *found_key)
+{
+   int ret;
+   struct btrfs_key key;
+   struct extent_buffer *eb;
+
+   key.type = key_type;
+   key.objectid = inum;
+   key.offset = ioff;
+
+   ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+   if (ret < 0)
+   return ret;
+
+   eb = path->nodes[0];
+   if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+   ret = btrfs_next_leaf(fs_root, path);
+   if (ret)
+   return ret;
+   eb = path->nodes[0];
+   }
+
+   btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
+   if (found_key->type != key.type || found_key->objectid != key.objectid)
+   return 1;
+
+   return 0;
+}
+
+/*
+ * this makes the path point to (inum INODE_ITEM ioff)
+ */
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+   struct btrfs_path *path)
+{
+   struct btrfs_key key;
+   return __inode_info(inum, ioff, BTRFS_INODE_ITEM_KEY, fs_root, path,
+   &key);
+}
+
+static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+   struct btrfs_path *path, int strict,
+   u64 *out_parent_inum,
+   struct extent_buffer **out_iref_eb,
+   int *out_slot)
+{
+   int ret;
+   struct btrfs_key found_key;
+
+   ret = __inode_info(inum, ioff, BTRFS_INODE_REF_KEY, fs_root, path,
+   &found_key);
+
+   if (!ret) {
+   if (out_slot)
+   *out_slot = path->slots[0];
+   if (out_iref_eb)
+   *out_iref_eb = path->nodes[0];
+   if (out_parent_inum)
+   *out_parent_inum = found_key.offset;
+   }
+
+   btrfs_release_path(path);
+   return ret;
+}
+
+/*
+ * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
+ * of the path are separated by '/' and the path is guaranteed to be
+ * 0-terminated. the path is only given within the current file system.
+ * Therefore, it never starts with a '/'. the caller is responsible to provide
+ * "size" bytes in "dest". the dest buffer will be filled backwards. finally,
+ * the start point of the resulting string is returned. this pointer is within
+ * dest, normally.
+ * in case the path buffer would overflow, the pointer is decremented further
+ * as if output was written to the buffer, though no more output is actually
+ * generated. that way, the caller 
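
The comment above describes filling the destination buffer backwards. As a
rough userspace sketch of that idea (not the kernel function): names are
supplied leaf first, each one is copied in front of what is already there, and
the position keeps moving left even when nothing more fits. The sketch tracks a
signed index instead of a pointer so that it never forms a pointer before the
start of the buffer; that is a deliberate deviation, and demo_build_path is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Builds "parent/.../leaf" at the end of dest, working backwards.
 * names[0] is the leaf, names[n - 1] the topmost directory.
 *
 * Returns the index in dest where the string starts. A negative return
 * value means the buffer was too small by that many bytes; in that case
 * the contents of dest are incomplete and must not be used.
 */
static long demo_build_path(char *dest, long size, const char **names,
			    size_t n)
{
	long pos = size - 1;
	size_t i;

	assert(dest != NULL && size > 0);
	dest[pos] = '\0';

	for (i = 0; i < n; i++) {
		long len = (long)strlen(names[i]);

		if (i > 0) {
			/* separator between this parent and its child */
			pos -= 1;
			if (pos >= 0)
				dest[pos] = '/';
		}
		pos -= len;
		if (pos >= 0)
			memcpy(dest + pos, names[i], (size_t)len);
	}
	return pos;
}
```

With a 32 byte buffer and the names "file", "sub", "dir" the string
"dir/sub/file" starts at index 19. With an 8 byte buffer the function returns
-5, telling the caller how many more bytes would have been needed.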

[PATCH v3 8/8] new ioctls to do logical->inode and inode->path resolving

2011-07-07 Thread Jan Schmidt
These ioctls make use of the new functions initially added for scrub. They
return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/ioctl.c |  134 ++
 fs/btrfs/ioctl.h |   19 
 2 files changed, 153 insertions(+), 0 deletions(-)
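
A note on the buffer layout, as a userspace sketch rather than the kernel code:
before copy_to_user, btrfs_ioctl_ino_to_path rewrites every entry of
fspath->str into an offset relative to the start of the str array, and the
caller is expected to add the base address back. The struct and helpers below
(demo_container, demo_pack, demo_path) are hypothetical stand-ins, not the
definitions from ioctl.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the container handed back by the ioctl. */
struct demo_container {
	uint32_t size;		/* total bytes in the buffer */
	uint32_t elem_cnt;	/* number of valid entries in val[] */
	uint64_t val[];		/* offsets first, string bytes behind them */
};

/*
 * Producer side: place each path behind the val[] slots and record its
 * position as an offset relative to the start of val[]. Offsets stay
 * valid when the buffer is copied to a different address.
 */
static struct demo_container *demo_pack(const char **paths, uint32_t n)
{
	size_t bytes = sizeof(struct demo_container) + n * sizeof(uint64_t);
	struct demo_container *c;
	char *base;
	char *cursor;
	uint32_t i;

	for (i = 0; i < n; i++)
		bytes += strlen(paths[i]) + 1;

	c = calloc(1, bytes);
	if (!c)
		return NULL;

	base = (char *)c->val;
	cursor = base + n * sizeof(uint64_t);
	for (i = 0; i < n; i++) {
		size_t len = strlen(paths[i]) + 1;

		memcpy(cursor, paths[i], len);
		c->val[i] = (uint64_t)(cursor - base);
		cursor += len;
	}
	c->size = (uint32_t)bytes;
	c->elem_cnt = n;
	return c;
}

/* Consumer side: turn a relative offset back into a usable pointer. */
static const char *demo_path(const struct demo_container *c, uint32_t i)
{
	assert(i < c->elem_cnt);
	return (const char *)c->val + c->val[i];
}
```

Storing offsets instead of pointers is what makes the buffer meaningful on both
sides of the user/kernel boundary; a raw kernel pointer would be useless to the
caller.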

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..5299b40 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -51,6 +51,7 @@
 #include "volumes.h"
 #include "locking.h"
 #include "inode-map.h"
+#include "backref.h"
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2836,6 +2837,135 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
return ret;
 }
 
+static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
+{
+   int ret = 0;
+   int i;
+   unsigned long rel_ptr;
+   int size;
+   struct btrfs_ioctl_ino_path_args *ipa;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_path *path;
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   ipa = memdup_user(arg, sizeof(*ipa));
+   if (IS_ERR(ipa)) {
+   ret = PTR_ERR(ipa);
+   ipa = NULL;
+   goto out;
+   }
+
+   size = min(ipa->size, 4096);
+   ipath = init_ipath(size, root, path);
+   if (IS_ERR(ipath)) {
+   ret = PTR_ERR(ipath);
+   ipath = NULL;
+   goto out;
+   }
+
+   ret = paths_from_inode(ipa->inum, ipath);
+   if (ret < 0)
+   goto out;
+
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
+   rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
+   ipath->fspath->str[i] = (void *)rel_ptr;
+   }
+
+   ret = copy_to_user(ipa->fspath, ipath->fspath, size);
+   if (ret) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+out:
+   btrfs_free_path(path);
+   free_ipath(ipath);
+   kfree(ipa);
+
+   return ret;
+}
+
+static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   struct btrfs_data_container *inodes = ctx;
+
+   inodes->size -= 3 * sizeof(u64);
+   if (inodes->size > 0) {
+   inodes->val[inodes->elem_cnt] = inum;
+   inodes->val[inodes->elem_cnt + 1] = offset;
+   inodes->val[inodes->elem_cnt + 2] = root;
+   inodes->elem_cnt += 3;
+   } else {
+   inodes->elem_missed += 3;
+   }
+
+   return 0;
+}
+
+static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
+   void __user *arg)
+{
+   int ret = 0;
+   int size;
+   u64 extent_offset;
+   struct btrfs_ioctl_logical_ino_args *loi;
+   struct btrfs_data_container *inodes = NULL;
+   struct btrfs_path *path = NULL;
+   struct btrfs_key key;
+
+   loi = memdup_user(arg, sizeof(*loi));
+   if (IS_ERR(loi)) {
+   ret = PTR_ERR(loi);
+   loi = NULL;
+   goto out;
+   }
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   size = min(loi->size, 4096);
+   inodes = init_data_container(size);
+   if (IS_ERR(inodes)) {
+   ret = PTR_ERR(inodes);
+   inodes = NULL;
+   goto out;
+   }
+
+   ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
+
+   if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+   ret = -ENOENT;
+   if (ret < 0)
+   goto out;
+
+   extent_offset = loi->logical - key.objectid;
+   ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
+   extent_offset, build_ino_list, inodes);
+
+   if (ret < 0)
+   goto out;
+
+   ret = copy_to_user(loi->inodes, inodes, size);
+   if (ret)
+   ret = -EFAULT;
+
+out:
+   btrfs_free_path(path);
+   kfree(inodes);
+   kfree(loi);
+
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -2893,6 +3023,10 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_tree_search(file, argp);
case BTRFS_IOC_INO_LOOKUP:
return btrfs_ioctl_ino_lookup(file, argp);
+   case BTRFS_IOC_INO_PATHS:
+   return btrfs_ioctl_ino_to_path(root, argp);
+   case BTRFS_IOC_LOGICAL_INO:
+   return btrfs_ioctl_logical_to_ino(root, argp);
case BTRFS_IOC_SPACE_INFO:
return btrfs_ioctl_space_info(root, argp);
case BTRFS_IOC_SYNC:
diff --git a/fs/btrfs/ioctl.h b
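
For readers following build_ino_list above, here is a minimal userspace sketch
of the same bookkeeping idea: each hit is stored as an (inum, offset, root)
triple while there is room, and hits that no longer fit are only counted. The
fixed capacity and all demo_* names are assumptions made for the illustration;
the patch itself tracks the remaining size in bytes instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 6	/* hypothetical capacity: room for two triples */

struct demo_inodes {
	uint32_t elem_cnt;	/* u64 slots filled so far */
	uint32_t elem_missed;	/* u64 slots that did not fit */
	uint64_t val[DEMO_SLOTS];
};

/*
 * Callback in the style of build_ino_list: it never fails, it only
 * records whether the triple was stored or had to be dropped.
 */
static int demo_add_triple(uint64_t inum, uint64_t offset, uint64_t root,
			   void *ctx)
{
	struct demo_inodes *out = ctx;

	assert(out != NULL);
	if (out->elem_cnt + 3 <= DEMO_SLOTS) {
		out->val[out->elem_cnt] = inum;
		out->val[out->elem_cnt + 1] = offset;
		out->val[out->elem_cnt + 2] = root;
		out->elem_cnt += 3;
	} else {
		out->elem_missed += 3;
	}
	return 0;
}
```

A caller that sees a non-zero elem_missed knows the result is incomplete and
can retry with a larger buffer.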

[PATCH v3 6/8] scrub: use int for mirror_num, not u64

2011-07-07 Thread Jan Schmidt
The rest of the code uses int for mirror_num, and so should scrub.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 15fed35..12c08c0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -65,7 +65,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix);
 struct scrub_page {
u64 flags;  /* extent flags */
u64 generation;
-   u64 mirror_num;
+   int mirror_num;
int have_csum;
u8  csum[BTRFS_CSUM_SIZE];
 };
@@ -776,7 +776,7 @@ nomem:
 }
 
 static int scrub_page(struct scrub_dev *sdev, u64 logical, u64 len,
- u64 physical, u64 flags, u64 gen, u64 mirror_num,
+ u64 physical, u64 flags, u64 gen, int mirror_num,
  u8 *csum, int force)
 {
struct scrub_bio *sbio;
@@ -873,7 +873,7 @@ static int scrub_find_csum(struct scrub_dev *sdev, u64 logical, u64 len,
 
 /* scrub extent tries to collect up to 64 kB for each bio */
 static int scrub_extent(struct scrub_dev *sdev, u64 logical, u64 len,
-   u64 physical, u64 flags, u64 gen, u64 mirror_num)
+   u64 physical, u64 flags, u64 gen, int mirror_num)
 {
int ret;
u8 csum[BTRFS_CSUM_SIZE];
@@ -919,7 +919,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
u64 physical;
u64 logical;
u64 generation;
-   u64 mirror_num;
+   int mirror_num;
 
u64 increment = map->stripe_len;
u64 offset;
-- 
1.7.3.4



[PATCH v3 7/8] scrub: add fixup code for errors on nodatasum files

2011-07-07 Thread Jan Schmidt
This removes a FIXME comment and introduces the first part of nodatasum
fixup: It gets the corresponding inode for a logical address and triggers a
regular readpage for the corrupted sector.

Once we have on-the-fly error correction, our error will be corrected
automatically. The correction code is expected to clear the newly introduced
EXTENT_DAMAGED flag, so that scrub eventually reports that error as
"corrected" instead of "uncorrectable".

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/extent_io.h |1 +
 fs/btrfs/scrub.c |  188 --
 2 files changed, 183 insertions(+), 6 deletions(-)
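
The flag handshake described above can be sketched in userspace as follows.
This is only an illustration of the protocol (set the flag, trigger the read,
check whether the flag is gone); the flags word, the read callbacks and all
demo_* names are hypothetical, and only the bit position is taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_EXTENT_DAMAGED (1u << 13)	/* same bit position as in the patch */

/* Stand-in for the read path; a repairing read clears the flag itself. */
typedef void (demo_read_fn)(unsigned int *flags);

/*
 * Marks the range as damaged, triggers a read and reports whether the
 * read path cleared the mark. If it did not, the mark is removed again
 * so that it does not linger on the range.
 */
static bool demo_fixup(unsigned int *flags, demo_read_fn *read_page)
{
	bool corrected;

	assert(flags != NULL && read_page != NULL);
	*flags |= DEMO_EXTENT_DAMAGED;
	read_page(flags);
	corrected = !(*flags & DEMO_EXTENT_DAMAGED);
	if (!corrected)
		*flags &= ~DEMO_EXTENT_DAMAGED;
	return corrected;
}

/* A read path without error correction leaves the flag alone. */
static void demo_read_plain(unsigned int *flags)
{
	(void)flags;
}

/* A read path with error correction clears the flag after repairing. */
static void demo_read_repairing(unsigned int *flags)
{
	*flags &= ~DEMO_EXTENT_DAMAGED;
}
```

Until correction code exists, every fixup behaves like demo_read_plain and the
error stays "uncorrectable"; no change to the scrub side is needed once the
read path starts clearing the flag.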

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 22bf366..2734fd9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -17,6 +17,7 @@
 #define EXTENT_NODATASUM (1 << 10)
 #define EXTENT_DO_ACCOUNTING (1 << 11)
 #define EXTENT_FIRST_DELALLOC (1 << 12)
+#define EXTENT_DAMAGED (1 << 13)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 12c08c0..563686f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -22,6 +22,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "transaction.h"
 #include "backref.h"
 
 /*
@@ -89,6 +90,7 @@ struct scrub_dev {
int first_free;
int curr;
atomic_t    in_flight;
+   atomic_t    fixup_cnt;
spinlock_t  list_lock;
wait_queue_head_t   list_wait;
u16 csum_size;
@@ -102,6 +104,14 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_fixup_nodatasum {
+   struct scrub_dev    *sdev;
+   u64 logical;
+   struct btrfs_root   *root;
+   struct btrfs_work   work;
+   int mirror_num;
+};
+
 struct scrub_warning {
struct btrfs_path   *path;
u64 extent_item_size;
@@ -190,12 +200,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev)
 
if (i != SCRUB_BIOS_PER_DEV-1)
sdev->bios[i]->next_free = i + 1;
-else
+   else
sdev->bios[i]->next_free = -1;
}
sdev->first_free = 0;
sdev->curr = -1;
atomic_set(&sdev->in_flight, 0);
+   atomic_set(&sdev->fixup_cnt, 0);
atomic_set(&sdev->cancel_req, 0);
sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
INIT_LIST_HEAD(&sdev->csum_list);
@@ -347,6 +358,151 @@ out:
kfree(swarn.msg_buf);
 }
 
+static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   struct page *page;
+   unsigned long index;
+   struct scrub_fixup_nodatasum *fixup = ctx;
+   int ret;
+   int corrected;
+   struct btrfs_key key;
+   struct inode *inode;
+   u64 end = offset + PAGE_SIZE - 1;
+   struct btrfs_root *local_root;
+
+   key.objectid = root;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fixup->root->fs_info, &key);
+   if (IS_ERR(local_root))
+   return PTR_ERR(local_root);
+
+   key.type = BTRFS_INODE_ITEM_KEY;
+   key.objectid = inum;
+   key.offset = 0;
+   inode = btrfs_iget(fixup->root->fs_info->sb, &key, local_root, NULL);
+   if (IS_ERR(inode))
+   return PTR_ERR(inode);
+
+   ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
+
+   /* set_extent_bit should either succeed or give proper error */
+   WARN_ON(ret > 0);
+   if (ret)
+   return ret < 0 ? ret : -EFAULT;
+
+   index = offset >> PAGE_CACHE_SHIFT;
+
+   page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+   if (!page)
+   return -ENOMEM;
+
+   ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
+   btrfs_get_extent, fixup->mirror_num);
+   wait_on_page_locked(page);
+   corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, NULL);
+
+   if (corrected)
+   WARN_ON(!PageUptodate(page));
+   else
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
+
+   put_page(page);
+   iput(inode);
+
+   if (ret < 0)
+   return ret;
+
+   if (ret == 0 && corrected) {
+   /*
+* we only need to call readpage for one of the inodes belonging
+* to this extent. so make iterate_extent_inodes stop
+

[PATCH v3 4/8] scrub: bugfix: mirror_num off by one

2011-07-07 Thread Jan Schmidt
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
did not use mirror_num for anything important and that error went unnoticed.
The nodatasum fixup patch of this set depends on a correct mirror_num.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)
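
To make the off-by-one concrete, here is a small standalone sketch of the
numbering after this patch: the first copy is mirror 1, not mirror 0. The enum
and function name are hypothetical; the arithmetic follows the hunk below.

```c
#include <assert.h>

/* Hypothetical profile tags standing in for the BTRFS_BLOCK_GROUP_* bits. */
enum demo_profile {
	DEMO_SINGLE,
	DEMO_RAID0,
	DEMO_RAID1,
	DEMO_DUP,
	DEMO_RAID10
};

/* Returns the 1-based mirror number for stripe index num. */
static int demo_mirror_num(enum demo_profile profile, int num,
			   int num_stripes, int sub_stripes)
{
	assert(num >= 0 && num_stripes > 0 && sub_stripes > 0);

	switch (profile) {
	case DEMO_RAID10:
		return num % sub_stripes + 1;
	case DEMO_RAID1:
	case DEMO_DUP:
		return num % num_stripes + 1;
	case DEMO_RAID0:
	case DEMO_SINGLE:
	default:
		/* only one copy exists, so it is always the first mirror */
		return 1;
	}
}
```

For a two-device RAID1, stripe index 0 maps to mirror 1 and stripe index 1 to
mirror 2, where the old code produced 0 and 1.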

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 221fd5c..15fed35 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -930,21 +930,21 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
offset = map->stripe_len * num;
increment = map->stripe_len * map->num_stripes;
-   mirror_num = 0;
+   mirror_num = 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
int factor = map->num_stripes / map->sub_stripes;
offset = map->stripe_len * (num / map->sub_stripes);
increment = map->stripe_len * factor;
-   mirror_num = num % map->sub_stripes;
+   mirror_num = num % map->sub_stripes + 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
increment = map->stripe_len;
-   mirror_num = num % map->num_stripes;
+   mirror_num = num % map->num_stripes + 1;
} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
increment = map->stripe_len;
-   mirror_num = num % map->num_stripes;
+   mirror_num = num % map->num_stripes + 1;
} else {
increment = map->stripe_len;
-   mirror_num = 0;
+   mirror_num = 1;
}
 
path = btrfs_alloc_path();
-- 
1.7.3.4



[PATCH v3 5/8] add mirror_num to extent_read_full_page

2011-07-07 Thread Jan Schmidt
Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +++---
 fs/btrfs/extent_io.h |2 +-
 fs/btrfs/inode.c |2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ac8db5d..b898319 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btree_get_extent);
+   return extent_read_full_page(tree, page, btree_get_extent, 0);
 }
 
 static int btree_releasepage(struct page *page, gfp_t gfp_flags)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..b78f665 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 }
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-   get_extent_t *get_extent)
+   get_extent_t *get_extent, int mirror_num)
 {
struct bio *bio = NULL;
unsigned long bio_flags = 0;
int ret;
 
-   ret = __extent_read_full_page(tree, page, get_extent, &bio, 0,
+   ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num,
  &bio_flags);
if (bio)
-   ret = submit_one_bio(READ, bio, 0, bio_flags);
+   ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a11a92e..22bf366 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64 start, u64 end,
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
gfp_t mask);
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
- get_extent_t *get_extent);
+ get_extent_t *get_extent, int mirror_num);
 int __init extent_io_init(void);
 void extent_io_exit(void);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 447612d..18c3a3f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6248,7 +6248,7 @@ int btrfs_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btrfs_get_extent);
+   return extent_read_full_page(tree, page, btrfs_get_extent, 0);
 }
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
-- 
1.7.3.4



[PATCH v3 2/8] scrub: added unverified_errors

2011-07-07 Thread Jan Schmidt
In normal operation, scrub is reading data sequentially in large portions.
In case of an i/o error, we try to find the corrupted area(s) by issuing
page sized read requests. With this commit we increment the
unverified_errors counter if all of the small size requests succeed.

Userland patches carrying such conspicuous events to the administrator should
already be around.

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/scrub.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)
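
The accounting rule for the i/o error path can be summarised in a few lines of
plain C. This is a sketch of the counting logic only, under the assumption that
the per-page recheck results are already known; the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_stats {
	unsigned int read_errors;
	unsigned int unverified_errors;
};

/*
 * Called after the large read covering all pages has failed.
 * page_ok[i] tells whether the page sized re-read of page i succeeded.
 * Every page that still fails counts as a read error. If no page fails,
 * the original error could not be reproduced and is counted once as
 * unverified.
 */
static void demo_account_failed_bio(const bool *page_ok, size_t count,
				    struct demo_stats *st)
{
	bool any_bad = false;
	size_t i;

	assert(page_ok != NULL && st != NULL);
	for (i = 0; i < count; i++) {
		if (!page_ok[i]) {
			st->read_errors++;
			any_bad = true;
		}
	}
	if (!any_bad)
		st->unverified_errors++;
}
```

The point of the separate counter is that a failed bulk read followed by clean
page reads is still worth reporting, even though no bad page could be pinned
down.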

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a8d03d5..35099fa 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -201,18 +201,25 @@ nomem:
  * recheck_error gets called for every page in the bio, even though only
  * one may be bad
  */
-static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
+static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
 {
+   struct scrub_dev *sdev = sbio->sdev;
+   u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+
if (sbio->err) {
-   if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
-  (sbio->physical + ix * PAGE_SIZE) >> 9,
+   if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
if (scrub_fixup_check(sbio, ix) == 0)
-   return;
+   return 0;
}
}
 
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.read_errors;
+   spin_unlock(&sdev->stat_lock);
+
scrub_fixup(sbio, ix);
+   return 1;
 }
 
 static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
@@ -382,8 +389,14 @@ static void scrub_checksum(struct btrfs_work *work)
int ret;
 
if (sbio->err) {
+   ret = 0;
for (i = 0; i < sbio->count; ++i)
-   scrub_recheck_error(sbio, i);
+   ret |= scrub_recheck_error(sbio, i);
+   if (!ret) {
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.unverified_errors;
+   spin_unlock(&sdev->stat_lock);
+   }
 
sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
@@ -396,10 +409,6 @@ static void scrub_checksum(struct btrfs_work *work)
bi->bv_offset = 0;
bi->bv_len = PAGE_SIZE;
}
-
-   spin_lock(&sdev->stat_lock);
-   ++sdev->stat.read_errors;
-   spin_unlock(&sdev->stat_lock);
goto out;
}
for (i = 0; i < sbio->count; ++i) {
@@ -420,8 +429,14 @@ static void scrub_checksum(struct btrfs_work *work)
WARN_ON(1);
}
kunmap_atomic(buffer, KM_USER0);
-   if (ret)
-   scrub_recheck_error(sbio, i);
+   if (ret) {
+   ret = scrub_recheck_error(sbio, i);
+   if (!ret) {
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.unverified_errors;
+   spin_unlock(&sdev->stat_lock);
+   }
+   }
}
 
 out:
-- 
1.7.3.4



Re: [PATCH] Btrfs: fix return value check of btrfs_alloc_path()

2011-07-07 Thread Josef Bacik

On 07/07/2011 05:31 AM, Tsutomu Itoh wrote:
> The return value check of btrfs_alloc_path() in several places is
> changed from BUG_ON() to error return.
>
> Signed-off-by: Tsutomu Itoh

Reviewed-by: Josef Bacik 

Thanks,

Josef


Re: device failure hangs the system

2011-07-07 Thread Anand Jain



Josef,

> Well that's a neat trick, do you have a way to undo that action too?
> Seems a rescan doesn't make it show back up.

Hope the following helps:
-
 # fdisk -l /dev/sdg | egrep "Disk /"
 Disk /dev/sdg: 4294 MB, 4294967296 bytes

 # x=`ls -l /sys/class/block/sdg | cut -d "/" -f12 | sed 's/:/ /g'`
 # echo "scsi remove-single-device ${x}" > /proc/scsi/scsi

 # fdisk -l /dev/sdg | egrep "Disk /"

 # echo "scsi add-single-device ${x}" > /proc/scsi/scsi

 # fdisk -l /dev/sdg | egrep "Disk /"
 Disk /dev/sdg: 4294 MB, 4294967296 bytes
-
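
For readers who would rather build the /proc/scsi/scsi command line in a program than in the shell, the following is a minimal sketch. It assumes that the value stored in ${x} above is a SCSI address of the form host:channel:target:lun with the colons replaced by spaces; scsi_proc_command() is a hypothetical helper name, and the sketch only formats the string, it does not write to /proc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn a SCSI address such as "2:0:0:0" (host:channel:target:lun) into
 * the line written to /proc/scsi/scsi. `action` is expected to be
 * "remove-single-device" or "add-single-device". Returns 0 on success,
 * -1 if the input does not parse or the buffer is too small.
 */
static int scsi_proc_command(const char *action, const char *hctl,
			     char *out, size_t outlen)
{
	unsigned int host, channel, target, lun;
	char trailing;
	int n;

	if (strlen(action) == 0)
		return -1;

	/* Exactly four numeric fields; anything after them is rejected. */
	if (sscanf(hctl, "%u:%u:%u:%u%c",
		   &host, &channel, &target, &lun, &trailing) != 4)
		return -1;

	n = snprintf(out, outlen, "scsi %s %u %u %u %u",
		     action, host, channel, target, lun);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string is what the echo commands above send to /proc/scsi/scsi.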



> Please try the patch I just posted to the list to fix this problem.  Thanks,

I am having some trouble upgrading my machine to 3.0.0-rc6, hence
the delay.

Thanks for the patch.

Anand

> Josef



[PATCH] Btrfs: fix return value check of btrfs_alloc_path()

2011-07-07 Thread Tsutomu Itoh
The return value check of btrfs_alloc_path() in several places is
changed from BUG_ON() to error return.

Signed-off-by: Tsutomu Itoh 
---
 fs/btrfs/extent-tree.c |3 ++-
 fs/btrfs/extent_io.c   |9 ++---
 fs/btrfs/inode.c   |   15 +++
 fs/btrfs/ioctl.c   |1 +
 4 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 71cd456..624ca25 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5494,7 +5494,8 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
path->leave_spinning = 1;
ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..9703b65 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -,9 +,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
delalloc_start = delalloc_end + 1;
continue;
}
-   tree->ops->fill_delalloc(inode, page, delalloc_start,
-delalloc_end, &page_started,
-&nr_written);
+   ret = tree->ops->fill_delalloc(inode, page,
+  delalloc_start,
+  delalloc_end,
+  &page_started,
+  &nr_written);
+   BUG_ON(ret);
/*
 * delalloc_end is already one less than the total
 * length, so we don't subtract one from
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 447612d..d0dee5e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1070,7 +1070,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
u64 ino = btrfs_ino(inode);
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
nolock = is_free_space_inode(root, inode);
 
@@ -3711,7 +3712,8 @@ static int btrfs_inode_by_name(struct inode *dir, struct dentry *dentry,
int ret = 0;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
 
di = btrfs_lookup_dir_item(NULL, root, path, btrfs_ino(dir), name,
namelen, 0);
@@ -4436,7 +4438,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
int owner;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return ERR_PTR(-ENOMEM);
 
inode = new_inode(root->fs_info->sb);
if (!inode) {
@@ -7192,7 +7195,11 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
goto out_unlock;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path) {
+   err = -ENOMEM;
+   drop_inode = 1;
+   goto out_unlock;
+   }
key.objectid = btrfs_ino(inode);
key.offset = 0;
btrfs_set_key_type(&key, BTRFS_EXTENT_DATA_KEY);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..b12f7fe 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -415,6 +415,7 @@ static noinline int create_subvol(struct btrfs_root *root,
btrfs_record_root_in_trans(trans, new_root);
 
ret = btrfs_create_subvol_root(trans, new_root, new_dirid);
+   BUG_ON(ret);
/*
 * insert the directory item
 */
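
The pattern this patch applies, returning an error instead of halting when the path allocation fails, can be shown outside the kernel with a short sketch. The names below (struct demo_path, demo_alloc_path(), demo_lookup()) are hypothetical stand-ins, and calloc()/free() take the place of the kernel allocator.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_path. */
struct demo_path {
	int slots[8];
};

/* Stand-in allocator; `fail` forces the out-of-memory case for testing. */
static struct demo_path *demo_alloc_path(int fail)
{
	return fail ? NULL : calloc(1, sizeof(struct demo_path));
}

/*
 * Caller written in the style the patch moves to: hand -ENOMEM back to
 * the caller instead of stopping the machine on allocation failure.
 */
static int demo_lookup(int fail)
{
	struct demo_path *path = demo_alloc_path(fail);

	if (!path)
		return -ENOMEM;

	/* ... the path would be used for a tree search here ... */

	free(path);
	return 0;
}
```

Functions that return a pointer, such as btrfs_new_inode() in the hunk above, cannot return a plain integer and use ERR_PTR(-ENOMEM) for the same purpose.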




btrfs: open_ctree failed

2011-07-07 Thread Yulin, Denis
Hi all, and apologies for my English.
One nice, warm evening the electricity went off, and my router, which has
btrfs on root, broke.
I have taken an image of the partition (10G); I will probably reinstall the
system on ext4 now.

The discussion started here: http://www.linux.org.ru/forum/general/6465851
OS at the time of the failure: Debian 6 (2.6.39 kernel from sid), with the
filesystem mounted with the option compress=lzo.
btrfs-progs v0.19 from git.
btrfs-show reports everything OK.
The FS is on LVM.

root@sysresccd /root % mount -t btrfs -o compress=lzo /dev/mapper/nas-root /re
mount: wrong fs type, bad option, bad superblock on /dev/mapper/nas-root,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
root@sysresccd /root % dmesg | tail
[ 3821.972350] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 3821.972364] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 3821.979182] btrfs: open_ctree failed
[ 6298.660270] device label root devid 1 transid 12174 /dev/mapper/nas-root
[ 6298.660657] btrfs: use lzo compression
[ 6298.662878] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.663321] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.663584] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.663595] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.669180] btrfs: open_ctree failed
root@sysresccd /root % btrfsck /dev/nas/root
parent transid verify failed on 3807195136 wanted 5412 found 5414
parent transid verify failed on 3807195136 wanted 5412 found 5414
parent transid verify failed on 3807195136 wanted 5412 found 5414
btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)' failed.
zsh: abort      btrfsck /dev/nas/root


Re: Memory leak?

2011-07-07 Thread Stephane Chazelas
2011-07-07 16:20:20 +0800, Li Zefan:
[...]
> btrfs_inode_cache is a slab cache for in memory inodes, which is of
> struct btrfs_inode.
[...]

Thanks Li.

If that's a cache, the system should be able to reuse the space
there when it's low on memory, shouldn't it? Under what
conditions could that not be done? (As in my case, where
the OOM killer was invoked to free memory instead of that cache
memory being reclaimed.)

Best regards,
Stephane


Re: Memory leak?

2011-07-07 Thread Li Zefan
Stephane Chazelas wrote:
> 2011-07-06 09:11:11 +0100, Stephane Chazelas:
> [...]
>> extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
>> (in bytes)
> [...]
>> 01:00  267192640  668595744  2321646000  3418048
>> 01:10  267192640  668595744  2321646000  3418048
>> 01:20  267192640  668595744  2321646000  3418048
>> 01:30  267192640  668595744  2321646000  3418048
>> 01:40  267192640  668595744  2321646000  3418048
> [...]
> 
> I've just come across
> http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320
> 
> GIT> author    Chris Mason
> GIT>           Fri, 3 Jun 2011 13:36:29 + (09:36 -0400)
> GIT> committer Chris Mason
> GIT>           Sat, 4 Jun 2011 12:03:47 + (08:03 -0400)
> GIT> commit    4b9465cb9e3859186eefa1ca3b990a5849386320
> GIT> tree      8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
> GIT> parent    e7786c3ae517b2c433edc91714e86be770e9f1ce
> GIT> Btrfs: add mount -o inode_cache
> GIT> 
> GIT> This makes the inode map cache default to off until we
> GIT> fix the overflow problem when the free space crcs don't fit
> GIT> inside a single page.
> 
> I would have thought that would have disabled that
> btrfs_inode_cache. And I can see that patch is in 3.0.0-rc5 (I'm
> not mounting with -o inode_cache). So, why those 2.2GiB in
> btrfs_inode_cache above?
> 

This should be unrelated to your problem.

btrfs_inode_cache is a slab cache for in-memory inodes, i.e. objects of
type struct btrfs_inode.

The ino_cache, on the other hand, is a cache whose entries are ranges of
free inode numbers, and currently it is not enabled unless you mount with the
inode_cache option.
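
To make the distinction concrete, a cache of free inode number ranges could look roughly like the sketch below. This is an illustration only and not the btrfs implementation: struct free_ino_range and take_free_ino() are hypothetical names, and using 0 to mean "no free number" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cache entry: a run of free inode numbers [start, start + count). */
struct free_ino_range {
	uint64_t start;
	uint64_t count;
};

/*
 * Hand out the lowest free inode number from the first non-empty range
 * and shrink that range. Returns 0 if the cache holds no free numbers
 * (0 is used here as "none", an assumption of this sketch).
 */
static uint64_t take_free_ino(struct free_ino_range *ranges, size_t nr)
{
	size_t i;

	assert(ranges != NULL || nr == 0);

	for (i = 0; i < nr; ++i) {
		if (ranges[i].count > 0) {
			uint64_t ino = ranges[i].start;

			ranges[i].start++;
			ranges[i].count--;
			return ino;
		}
	}
	return 0;
}
```

Such a structure stores only numbers, whereas the btrfs_inode_cache slab holds the full in-memory inode objects, which is why its size is independent of the inode_cache mount option.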


Re: Memory leak?

2011-07-07 Thread Stephane Chazelas
2011-07-06 09:11:11 +0100, Stephane Chazelas:
[...]
> extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
> (in bytes)
[...]
> 01:00  267192640  668595744  2321646000  3418048
> 01:10  267192640  668595744  2321646000  3418048
> 01:20  267192640  668595744  2321646000  3418048
> 01:30  267192640  668595744  2321646000  3418048
> 01:40  267192640  668595744  2321646000  3418048
[...]

I've just come across
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320

GIT> author    Chris Mason
GIT>           Fri, 3 Jun 2011 13:36:29 + (09:36 -0400)
GIT> committer Chris Mason
GIT>           Sat, 4 Jun 2011 12:03:47 + (08:03 -0400)
GIT> commit    4b9465cb9e3859186eefa1ca3b990a5849386320
GIT> tree      8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
GIT> parent    e7786c3ae517b2c433edc91714e86be770e9f1ce
GIT> Btrfs: add mount -o inode_cache
GIT> 
GIT> This makes the inode map cache default to off until we
GIT> fix the overflow problem when the free space crcs don't fit
GIT> inside a single page.

I would have thought that would have disabled that
btrfs_inode_cache. And I can see that patch is in 3.0.0-rc5 (I'm
not mounting with -o inode_cache). So, why those 2.2GiB in
btrfs_inode_cache above?

-- 
Stephane