Re: please review snapshot corruption path with delayed metadata insertion
Hi, Chris,

(2011/07/08 5:26), Chris Mason wrote:
> Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
>> Hi, Miao,
>>
>> (2011/06/30 15:32), Miao Xie wrote:
>>> Hi, Itoh-san
>>>
>>> Could you test the following patch to check whether it can fix the bug or
>>> not?
>>> I have tested it on my x86_64 machine with your test script for two days,
>>> and it worked well.
>>
>> I ran my test script for about a day, and I was not able to reproduce this BUG.
>
> Can you please try this patch with the inode_cache option (in addition
> to Miao's code)?

Unfortunately, I encountered the following panic.

=
btrfs: relocating block group 17746100224 flags 20
btrfs: relocating block group 12377391104 flags 9
btrfs: found 4181 extents
[ cut here ]
kernel BUG at fs/btrfs/relocation.c:2502!
invalid opcode: [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: btrfs zlib_deflate crc32c libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]

Pid: 26214, comm: btrfs Not tainted 2.6.39btrfs-test5+ #2 FUJITSU-SV PRIMERGY/D2399
RIP: 0010:[] [] do_relocation+0x562/0x590 [btrfs]
RSP: 0018:8801622519a8 EFLAGS: 00010202
RAX: 0001 RBX: 8800d2754140 RCX: 0001
RDX: RSI: 8800 RDI:
RBP: 880162251a78 R08: R09: 02e9
R10: R11: 0026 R12: 880161f2fb40
R13: 8800cd81eac0 R14: 880080038000 R15:
FS: 7f4081d05740() GS:88019fc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0033cfea6a60 CR3: 00015d345000 CR4: 06f0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process btrfs (pid: 26214, threadinfo 88016225, task 880161c3eab0)
Stack:
 880191f006d0 8800cd81eac0 880191f005b0 8800cd81eb00
 88016225b000 880079e0 000162251a48 88016225
 880162251a78 880193a26930 000100251a78 880193a26930
Call Trace:
 [] ? block_rsv_add_bytes+0x2b/0x70 [btrfs]
 [] relocate_tree_blocks+0x62b/0x6e0 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] ? add_data_references+0x263/0x280 [btrfs]
 [] relocate_block_group+0x272/0x620 [btrfs]
 [] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] btrfs_balance+0x21a/0x2a0 [btrfs]
 [] ? path_openat+0x101/0x3d0
 [] btrfs_ioctl+0x798/0xd20 [btrfs]
 [] ? handle_mm_fault+0x148/0x270
 [] ? do_page_fault+0x1d8/0x4b0
 [] do_vfs_ioctl+0x9a/0x540
 [] sys_ioctl+0xa1/0xb0
 [] system_call_fastpath+0x16/0x1b
Code: 0f 0b 0f 1f 80 00 00 00 00 eb f7 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 <0f> 0b eb fe 48 83 7a 68 00 0f 1f 44 00 00 0f 84 d2 fa ff ff 0f
RIP [] do_relocation+0x562/0x590 [btrfs]
 RSP

(gdb) l *do_relocation+0x562
0x6f922 is in do_relocation (fs/btrfs/relocation.c:2502).
2497			ret = btrfs_search_slot(trans, root, key, path, 0, 1);
2498			if (ret < 0) {
2499				err = ret;
2500				break;
2501			}
2502			BUG_ON(ret > 0);
2503	
2504			if (!upper->eb) {
2505				upper->eb = path->nodes[upper->level];
2506				path->nodes[upper->level] = NULL;
(gdb)

>
> commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> Author: Chris Mason
> Date:   Thu Jul 7 15:53:12 2011 -0400
>
>     Btrfs: write out free inode cache before taking snapshots
>
>     The btrfs snapshotting code requires that once a root has been
>     snapshotted, we don't change it during a commit.
>
>     But the free inode cache was changing the roots when it wrote the cache,
>     which led to corruptions.
>
>     This fixes things by making sure we write the cache while we are taking
>     the snapshot, and that we don't write it again later.
>
>     Signed-off-by: Chris Mason
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index bf0d615..d594cf7 100644
> -
Re: please review snapshot corruption path with delayed metadata insertion
Excerpts from Tsutomu Itoh's message of 2011-07-07 19:51:09 -0400:
> Hi, Chris,
>
> (2011/07/08 5:26), Chris Mason wrote:
> > Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
> >> Hi, Miao,
> >>
> >> (2011/06/30 15:32), Miao Xie wrote:
> >>> Hi, Itoh-san
> >>>
> >>> Could you test the following patch to check whether it can fix the bug or
> >>> not?
> >>> I have tested it on my x86_64 machine with your test script for two days,
> >>> and it worked well.
> >>
> >> I ran my test script for about a day, and I was not able to reproduce this BUG.
> >
> > Can you please try this patch with the inode_cache option (in addition
> > to Miao's code)?
>
> Just to clarify: do I only have to apply this patch to
> 'btrfs-unstable + (current) for-linus', or are other patches also necessary?
>

Hi, sorry that I wasn't clear. You can apply it to the current for-linus
branch, which has Miao's fix to keep from doing delayed metadata updates
on the relocation inode.

-chris

> Thanks,
> Tsutomu
>
> >
> > commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> > Author: Chris Mason
> > Date:   Thu Jul 7 15:53:12 2011 -0400
> >
> >     Btrfs: write out free inode cache before taking snapshots
> >
> >     The btrfs snapshotting code requires that once a root has been
> >     snapshotted, we don't change it during a commit.
> >
> >     But the free inode cache was changing the roots when it wrote the cache,
> >     which led to corruptions.
> >
> >     This fixes things by making sure we write the cache while we are taking
> >     the snapshot, and that we don't write it again later.
> >
> >     Signed-off-by: Chris Mason
> >
> > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> > index bf0d615..d594cf7 100644
> > --- a/fs/btrfs/free-space-cache.c
> > +++ b/fs/btrfs/free-space-cache.c
> > @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
> >  	info->bytes = bytes;
> >
> >  	spin_lock(&ctl->tree_lock);
> > +	ctl->dirty = 1;
> >
> >  	if (try_merge_free_space(ctl, info, true))
> >  		goto link;
> > @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
> >  	int ret = 0;
> >
> >  	spin_lock(&ctl->tree_lock);
> > +	ctl->dirty = 1;
> >
> >  again:
> >  	info = tree_search_offset(ctl, offset, 0, 0);
> > @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
> >  		if (entry->bytes == 0)
> >  			free_bitmap(ctl, entry);
> >  	}
> > +	ctl->dirty = 1;
> >  out:
> >  	spin_unlock(&ctl->tree_lock);
> >
> > @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
> >  		printk(KERN_ERR "btrfs: failed to write free ino cache "
> >  		       "for root %llu\n", root->root_key.objectid);
> >
> > +	/* we write out at transaction commit time, there's no racing. */
> > +	if (ret == 0)
> > +		ctl->dirty = 0;
> > +
> >  	iput(inode);
> >  	return ret;
> >  }
> > diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> > index 8f2613f..1e92c93 100644
> > --- a/fs/btrfs/free-space-cache.h
> > +++ b/fs/btrfs/free-space-cache.h
> > @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
> >  	int free_extents;
> >  	int total_bitmaps;
> >  	int unit;
> > +	/*
> > +	 * record if we've changed since written.  This can turn
> > +	 * into a bit field if we need more flags
> > +	 */
> > +	unsigned long dirty;
> >  	u64 start;
> >  	struct btrfs_free_space_op *op;
> >  	void *private;
> > diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> > index b4087e0..e7c1493 100644
> > --- a/fs/btrfs/inode-map.c
> > +++ b/fs/btrfs/inode-map.c
> > @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
> >  	ctl->start = 0;
> >  	ctl->private = NULL;
> >  	ctl->op = &free_ino_op;
> > +	ctl->dirty = 1;
> >
> >  	/*
> >  	 * Initially we allow to use 16K of ram to cache chunks of
> > @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
> >  	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
> >  		return 0;
> >
> > +	if (!ctl->dirty)
> > +		return 0;
> > +
> >  	path = btrfs_alloc_path();
> >  	if (!path)
> >  		return -ENOMEM;
> > @@ -485,6 +489,24 @@ out:
> >  	return ret;
> >  }
> >
> > +/*
> > + * this tries to save the cache, but if it fails for any reason we clear
> > + * the dirty flag so that it won't be saved again during this commit.
> > + *
> > + * This is used by the snapshotting code to make sure we don't corrupt the
> > + * FS by saving the inode cache after the snapshot is taken.
> > + */
> > +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> > +			       struct btrfs_trans_handle *trans)
> > +{
> > +	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> > +	int ret;
> > +	ret = btrfs_save_ino_cache(root, trans)
Re: please review snapshot corruption path with delayed metadata insertion
Hi, Chris,

(2011/07/08 5:26), Chris Mason wrote:
> Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
>> Hi, Miao,
>>
>> (2011/06/30 15:32), Miao Xie wrote:
>>> Hi, Itoh-san
>>>
>>> Could you test the following patch to check whether it can fix the bug or
>>> not?
>>> I have tested it on my x86_64 machine with your test script for two days,
>>> and it worked well.
>>
>> I ran my test script for about a day, and I was not able to reproduce this BUG.
>
> Can you please try this patch with the inode_cache option (in addition
> to Miao's code)?

Just to clarify: do I only have to apply this patch to
'btrfs-unstable + (current) for-linus', or are other patches also necessary?

Thanks,
Tsutomu

>
> commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> Author: Chris Mason
> Date:   Thu Jul 7 15:53:12 2011 -0400
>
>     Btrfs: write out free inode cache before taking snapshots
>
>     The btrfs snapshotting code requires that once a root has been
>     snapshotted, we don't change it during a commit.
>
>     But the free inode cache was changing the roots when it wrote the cache,
>     which led to corruptions.
>
>     This fixes things by making sure we write the cache while we are taking
>     the snapshot, and that we don't write it again later.
>
>     Signed-off-by: Chris Mason
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index bf0d615..d594cf7 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
>  	info->bytes = bytes;
>
>  	spin_lock(&ctl->tree_lock);
> +	ctl->dirty = 1;
>
>  	if (try_merge_free_space(ctl, info, true))
>  		goto link;
> @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
>  	int ret = 0;
>
>  	spin_lock(&ctl->tree_lock);
> +	ctl->dirty = 1;
>
>  again:
>  	info = tree_search_offset(ctl, offset, 0, 0);
> @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
>  		if (entry->bytes == 0)
>  			free_bitmap(ctl, entry);
>  	}
> +	ctl->dirty = 1;
>  out:
>  	spin_unlock(&ctl->tree_lock);
>
> @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>  		printk(KERN_ERR "btrfs: failed to write free ino cache "
>  		       "for root %llu\n", root->root_key.objectid);
>
> +	/* we write out at transaction commit time, there's no racing. */
> +	if (ret == 0)
> +		ctl->dirty = 0;
> +
>  	iput(inode);
>  	return ret;
>  }
> diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> index 8f2613f..1e92c93 100644
> --- a/fs/btrfs/free-space-cache.h
> +++ b/fs/btrfs/free-space-cache.h
> @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
>  	int free_extents;
>  	int total_bitmaps;
>  	int unit;
> +	/*
> +	 * record if we've changed since written.  This can turn
> +	 * into a bit field if we need more flags
> +	 */
> +	unsigned long dirty;
>  	u64 start;
>  	struct btrfs_free_space_op *op;
>  	void *private;
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index b4087e0..e7c1493 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
>  	ctl->start = 0;
>  	ctl->private = NULL;
>  	ctl->op = &free_ino_op;
> +	ctl->dirty = 1;
>
>  	/*
>  	 * Initially we allow to use 16K of ram to cache chunks of
> @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
>  		return 0;
>
> +	if (!ctl->dirty)
> +		return 0;
> +
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		return -ENOMEM;
> @@ -485,6 +489,24 @@ out:
>  	return ret;
>  }
>
> +/*
> + * this tries to save the cache, but if it fails for any reason we clear
> + * the dirty flag so that it won't be saved again during this commit.
> + *
> + * This is used by the snapshotting code to make sure we don't corrupt the
> + * FS by saving the inode cache after the snapshot is taken.
> + */
> +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> +			       struct btrfs_trans_handle *trans)
> +{
> +	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> +	int ret;
> +	ret = btrfs_save_ino_cache(root, trans);
> +
> +	ctl->dirty = 0;
> +	return ret;
> +}
> +
>  static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
>  {
>  	struct btrfs_path *path;
> diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
> index ddb347b..2be060e 100644
> --- a/fs/btrfs/inode-map.h
> +++ b/fs/btrfs/inode-map.h
> @@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
>  int btrfs_find_free_ino(struc
Re: [PATCH v1 0/2] Btrfs-progs: commands "resolve inode" and "resolve logical"
On 07/07/2011 06:01 PM, Jan Schmidt wrote:
> The kernel patch series just sent (Subject: "Btrfs: scrub: print path to
> corrupted files and trigger nodatasum fixup") introduces two new ioctls to
> do in-kernel filesystem path construction. This series provides the
> corresponding userspace changes, adding two new commands to the btrfs utility:

What is the aim of these commands? They look more like "debug" utilities
than standard commands. If so, they could be put under a new group called
"debug" or "test" or whatever name we decide to use. But please highlight
the fact that these commands aren't for general use. I suggest using

	btrfs debug resolve ...

or, better,

	btrfs inspect resolve ...

>
> --
> btrfs resolve inode [-v]
> 	resolves an to all filesystem paths local to the fs mounted
> 	at .
> 	-v	print count of returned and missed paths
>
> btrfs resolve logical [-v] [-P]
> 	resolves a address to all filesystem paths in the file
> 	system mounted at and all its subvolumes.
> 	-v	print count of returned and missed inode/offset/root
> 		triples
> 	-P	do not resolve the path but stop after finding all
> 		inodes at this logical address and print them instead
> --
>
> These patches are based on Hugo's current integration branch.
>
> Please try them out and report bugs here. I'll send an update to the manpages
> later.

Please update the man pages at the same time as the code. Developing the
man page together with the code may help to design a "good interface" (from
a user's point of view) and to better explain the aim of the new command.

BR
G.Baroncelli

>
> -Jan
>
> Jan Schmidt (2):
>   btrfs-list: split list_subvols
>   added ioctls and commands to resolve inodes and logical addresses
>
>  btrfs-list.c |  139 ++
>  btrfs.c      |   10 +++
>  btrfs_cmds.c |  177 ++
>  btrfs_cmds.h |    3 +
>  ioctl.h      |   29 ++
>  5 files changed, 323 insertions(+), 35 deletions(-)
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 8/8] new ioctls to do logical->inode and inode->path resolving
Hi,

On 07/08/2011 12:29 AM, Hugo Mills wrote:
> Hi, Jan,
>
> On Thu, Jul 07, 2011 at 05:48:33PM +0200, Jan Schmidt wrote:
>> These ioctls make use of the new functions initially added for scrub. They
>> return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
>> all paths belonging to an inode (BTRFS_IOC_INO_PATHS).
>
> I've not read this patch in detail, so I may have missed something,
> but why do we need new ioctls for these functions, when we have
> BTRFS_IOC_TREE_SEARCH, which will allow us to perform the same two
> operations using existing kernel-side infrastructure?
>
> Hugo.

Note that those ioctls do a lot more than a single tree search. You are
right that we could implement all of this with (quite a few)
BTRFS_IOC_TREE_SEARCH ioctls; resolving all file system paths for an inode,
in particular, needs a lot of searches.

In general, I prefer to keep logic that requires deep knowledge of the
internals of btrfs trees in the kernel. Not to mention that this way it is
safe to run on a file system under load and still get consistent results.

Last but not least, if we want to use this code for general error reporting
to the kernel log (e.g. by scrub), we need all the resolving code in the
kernel anyway. So I'd like to provide that functionality to user space, too.

-Jan
Re: [PATCH v3 8/8] new ioctls to do logical->inode and inode->path resolving
Hi, Jan,

On Thu, Jul 07, 2011 at 05:48:33PM +0200, Jan Schmidt wrote:
> These ioctls make use of the new functions initially added for scrub. They
> return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
> all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

I've not read this patch in detail, so I may have missed something,
but why do we need new ioctls for these functions, when we have
BTRFS_IOC_TREE_SEARCH, which will allow us to perform the same two
operations using existing kernel-side infrastructure?

Hugo.

> Signed-off-by: Jan Schmidt
> ---
>  fs/btrfs/ioctl.c |  134 ++
>  fs/btrfs/ioctl.h |   19
>  2 files changed, 153 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a3c4751..5299b40 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -51,6 +51,7 @@
>  #include "volumes.h"
>  #include "locking.h"
>  #include "inode-map.h"
> +#include "backref.h"
>
>  /* Mask out flags that are inappropriate for the given type of inode. */
>  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> @@ -2836,6 +2837,135 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
>  	return ret;
>  }
>
> +static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
> +{
> +	int ret = 0;
> +	int i;
> +	unsigned long rel_ptr;
> +	int size;
> +	struct btrfs_ioctl_ino_path_args *ipa;
> +	struct inode_fs_paths *ipath = NULL;
> +	struct btrfs_path *path;
> +
> +	path = btrfs_alloc_path();
> +	if (!path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	ipa = memdup_user(arg, sizeof(*ipa));
> +	if (IS_ERR(ipa)) {
> +		ret = PTR_ERR(ipa);
> +		ipa = NULL;
> +		goto out;
> +	}
> +
> +	size = min(ipa->size, 4096);
> +	ipath = init_ipath(size, root, path);
> +	if (IS_ERR(ipath)) {
> +		ret = PTR_ERR(ipath);
> +		ipath = NULL;
> +		goto out;
> +	}
> +
> +	ret = paths_from_inode(ipa->inum, ipath);
> +	if (ret < 0)
> +		goto out;
> +
> +	for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
> +		rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
> +		ipath->fspath->str[i] = (void *)rel_ptr;
> +	}
> +
> +	ret = copy_to_user(ipa->fspath, ipath->fspath, size);
> +	if (ret) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +out:
> +	btrfs_free_path(path);
> +	free_ipath(ipath);
> +	kfree(ipa);
> +
> +	return ret;
> +}
> +
> +static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
> +{
> +	struct btrfs_data_container *inodes = ctx;
> +
> +	inodes->size -= 3 * sizeof(u64);
> +	if (inodes->size > 0) {
> +		inodes->val[inodes->elem_cnt] = inum;
> +		inodes->val[inodes->elem_cnt + 1] = offset;
> +		inodes->val[inodes->elem_cnt + 2] = root;
> +		inodes->elem_cnt += 3;
> +	} else {
> +		inodes->elem_missed += 3;
> +	}
> +
> +	return 0;
> +}
> +
> +static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
> +					void __user *arg)
> +{
> +	int ret = 0;
> +	int size;
> +	u64 extent_offset;
> +	struct btrfs_ioctl_logical_ino_args *loi;
> +	struct btrfs_data_container *inodes = NULL;
> +	struct btrfs_path *path = NULL;
> +	struct btrfs_key key;
> +
> +	loi = memdup_user(arg, sizeof(*loi));
> +	if (IS_ERR(loi)) {
> +		ret = PTR_ERR(loi);
> +		loi = NULL;
> +		goto out;
> +	}
> +
> +	path = btrfs_alloc_path();
> +	if (!path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	size = min(loi->size, 4096);
> +	inodes = init_data_container(size);
> +	if (IS_ERR(inodes)) {
> +		ret = PTR_ERR(inodes);
> +		inodes = NULL;
> +		goto out;
> +	}
> +
> +	ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
> +
> +	if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
> +		ret = -ENOENT;
> +	if (ret < 0)
> +		goto out;
> +
> +	extent_offset = loi->logical - key.objectid;
> +	ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
> +					extent_offset, build_ino_list, inodes);
> +
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = copy_to_user(loi->inodes, inodes, size);
> +	if (ret)
> +		ret = -EFAULT;
> +
> +out:
> +	btrfs_free_path(path);
> +	kfree(inodes);
> +	kfree(loi);
> +
> +	return ret;
> +}
> +
>  long btrfs_ioctl(struct file *file, unsigned int
>  		cmd, unsigned long arg)
>  {
> @@ -2893,6 +3023,10 @@ long btrfs_ioctl(struct file *file, unsigned int
>  		return btrfs_i
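`build_ino_list()` above packs each hit as an (inode, offset, root) triple of u64 values and counts whatever does not fit in `elem_missed`. A minimal user-space sketch of that accounting follows. `struct ino_list`, `ino_list_alloc()` and `collect_triple()` are hypothetical names, and the layout only loosely mirrors `btrfs_data_container`, whose definition is not shown in this thread. One deliberate difference: the sketch checks the remaining space before consuming it, so it does not depend on the size field being signed, and a triple that exactly fills the buffer is kept rather than counted as missed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container, loosely modelled on btrfs_data_container:
 * a byte budget, two counters and a flexible array of u64 values. */
struct ino_list {
	size_t size;          /* bytes still free in val[] */
	uint32_t elem_cnt;    /* u64 values stored, 3 per hit */
	uint32_t elem_missed; /* u64 values dropped for lack of space */
	uint64_t val[];
};

static struct ino_list *ino_list_alloc(size_t bytes)
{
	struct ino_list *l = calloc(1, sizeof(*l) + bytes);

	if (l)
		l->size = bytes;
	return l;
}

/* Same shape as the build_ino_list() callback: called once per
 * (inode, offset, root) reference; returning 0 keeps iterating. */
static int collect_triple(uint64_t inum, uint64_t offset, uint64_t root,
			  void *ctx)
{
	struct ino_list *l = ctx;
	const size_t need = 3 * sizeof(uint64_t);

	assert(l->elem_cnt % 3 == 0);	/* triples are never split */
	if (l->size >= need) {
		l->val[l->elem_cnt] = inum;
		l->val[l->elem_cnt + 1] = offset;
		l->val[l->elem_cnt + 2] = root;
		l->elem_cnt += 3;
		l->size -= need;
	} else {
		l->elem_missed += 3;
	}
	return 0;
}
```

With room for two triples, a third reference is only counted: `elem_cnt` ends at 6 and `elem_missed` at 3, which is what `-v` in the userspace command reports as returned versus missed.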
Re: please review snapshot corruption path with delayed metadata insertion
Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
> Hi, Miao,
>
> (2011/06/30 15:32), Miao Xie wrote:
> > Hi, Itoh-san
> >
> > Could you test the following patch to check whether it can fix the bug or
> > not?
> > I have tested it on my x86_64 machine with your test script for two days,
> > and it worked well.
>
> I ran my test script for about a day, and I was not able to reproduce this BUG.

Can you please try this patch with the inode_cache option (in addition to
Miao's code)?

commit d0243d46f7a1e4cd57c74fa14556be65b454687d
Author: Chris Mason
Date:   Thu Jul 7 15:53:12 2011 -0400

    Btrfs: write out free inode cache before taking snapshots

    The btrfs snapshotting code requires that once a root has been
    snapshotted, we don't change it during a commit.

    But the free inode cache was changing the roots when it wrote the cache,
    which led to corruptions.

    This fixes things by making sure we write the cache while we are taking
    the snapshot, and that we don't write it again later.

    Signed-off-by: Chris Mason

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index bf0d615..d594cf7 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
 	info->bytes = bytes;

 	spin_lock(&ctl->tree_lock);
+	ctl->dirty = 1;

 	if (try_merge_free_space(ctl, info, true))
 		goto link;
@@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
 	int ret = 0;

 	spin_lock(&ctl->tree_lock);
+	ctl->dirty = 1;

 again:
 	info = tree_search_offset(ctl, offset, 0, 0);
@@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
 		if (entry->bytes == 0)
 			free_bitmap(ctl, entry);
 	}
+	ctl->dirty = 1;
 out:
 	spin_unlock(&ctl->tree_lock);

@@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 		printk(KERN_ERR "btrfs: failed to write free ino cache "
 		       "for root %llu\n", root->root_key.objectid);

+	/* we write out at transaction commit time, there's no racing. */
+	if (ret == 0)
+		ctl->dirty = 0;
+
 	iput(inode);
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 8f2613f..1e92c93 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
 	int free_extents;
 	int total_bitmaps;
 	int unit;
+	/*
+	 * record if we've changed since written.  This can turn
+	 * into a bit field if we need more flags
+	 */
+	unsigned long dirty;
 	u64 start;
 	struct btrfs_free_space_op *op;
 	void *private;
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index b4087e0..e7c1493 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
 	ctl->start = 0;
 	ctl->private = NULL;
 	ctl->op = &free_ino_op;
+	ctl->dirty = 1;

 	/*
 	 * Initially we allow to use 16K of ram to cache chunks of
@@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
 		return 0;

+	if (!ctl->dirty)
+		return 0;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -485,6 +489,24 @@ out:
 	return ret;
 }

+/*
+ * this tries to save the cache, but if it fails for any reason we clear
+ * the dirty flag so that it won't be saved again during this commit.
+ *
+ * This is used by the snapshotting code to make sure we don't corrupt the
+ * FS by saving the inode cache after the snapshot is taken.
+ */
+int btrfs_force_save_ino_cache(struct btrfs_root *root,
+			       struct btrfs_trans_handle *trans)
+{
+	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
+	int ret;
+	ret = btrfs_save_ino_cache(root, trans);
+
+	ctl->dirty = 0;
+	return ret;
+}
+
 static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
 {
 	struct btrfs_path *path;
diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
index ddb347b..2be060e 100644
--- a/fs/btrfs/inode-map.h
+++ b/fs/btrfs/inode-map.h
@@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
 int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid);
 int btrfs_save_ino_cache(struct btrfs_root *root,
 			 struct btrfs_trans_handle *trans);
-
+int btrfs_force_save_ino_cache(struct btrfs_root *root,
+			       struct btrfs_trans_handle *trans);
 int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);

 #endif
diff --git a
[PATCH] Btrfs-progs: bugfix: bail out when check_mounted_where returns an error
Signed-off-by: Jan Schmidt
---
 scrub.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/scrub.c b/scrub.c
index 22052ed..8270431 100644
--- a/scrub.c
+++ b/scrub.c
@@ -942,6 +942,8 @@ static int scrub_fs_info(int fd, char *path,
 				  &fs_devices_mnt);
 	if (!ret)
 		return -EINVAL;
+	if (ret < 0)
+		return ret;
 	fi_args->num_devices = 1;
 	fi_args->max_id = fs_devices_mnt->latest_devid;
 	i = fs_devices_mnt->latest_devid;
-- 
1.7.3.4
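Judging from this hunk, `check_mounted_where()` reports three outcomes: zero for "not mounted", a positive value for "mounted" (with `fs_devices_mnt` filled in), and a negative errno on failure. Before the fix, a negative return fell through to the code that dereferences `fs_devices_mnt`. The sketch below shows the same control flow in isolation; `fake_check_mounted()` and `latest_devid()` are hypothetical, and the paths are made-up test inputs.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for check_mounted_where(): 1 and *devid set
 * when mounted, 0 when not mounted, negative errno on lookup failure
 * (in which case *devid is left untouched). */
static int fake_check_mounted(const char *path, int *devid)
{
	if (strcmp(path, "/mnt/btrfs") == 0) {
		*devid = 3;
		return 1;
	}
	if (strcmp(path, "/nonexistent") == 0)
		return -ENOENT;
	return 0;
}

/* Mirrors the fixed flow in scrub_fs_info(): only a positive return
 * means the output parameter is valid. */
static int latest_devid(const char *path, int *devid)
{
	int ret;

	assert(path != NULL && devid != NULL);
	ret = fake_check_mounted(path, devid);
	if (!ret)
		return -EINVAL;   /* not a mounted filesystem */
	if (ret < 0)
		return ret;       /* lookup failed: propagate the error */
	return 0;
}
```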
[PATCH v1 2/2] added ioctls and commands to resolve inodes and logical addresses
Two new commands that make use of the new path resolving functions
implemented for scrub, doing the resolving in-kernel. The result for both
commands is a list of files belonging to that inode / logical address.

Signed-off-by: Jan Schmidt
---
 btrfs-list.c |   35
 btrfs.c      |   10 +++
 btrfs_cmds.c |  177 ++
 btrfs_cmds.h |    3 +
 ioctl.h      |   29 ++
 5 files changed, 254 insertions(+), 0 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index dd685c2..cbf6a08 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -900,3 +900,38 @@ int find_updated_files(int fd, u64 root_id, u64 oldest_gen)
 	printf("transid marker was %llu\n", (unsigned long long)max_found);
 	return ret;
 }
+
+char *path_for_root(int fd, u64 root)
+{
+	struct root_lookup root_lookup;
+	struct rb_node *n;
+	char *ret_path = NULL;
+	int ret;
+
+	ret = __list_subvol_search(fd, &root_lookup);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = __list_subvol_fill_paths(fd, &root_lookup);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	n = rb_last(&root_lookup.root);
+	while (n) {
+		struct root_info *entry;
+		u64 root_id;
+		u64 parent_id;
+		u64 level;
+		char *path;
+		entry = rb_entry(n, struct root_info, rb_node);
+		resolve_root(&root_lookup, entry, &root_id, &parent_id, &level,
+				&path);
+		if (root_id == root)
+			ret_path = path;
+		else
+			free(path);
+		n = rb_prev(n);
+	}
+
+	return ret_path;
+}
diff --git a/btrfs.c b/btrfs.c
index 67d6f6f..86d356b 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -178,6 +178,16 @@ static struct Command commands[] = {
 	  "Remove a device from a filesystem.",
 	  NULL
 	},
+	{ do_ino_to_path, -2,
+	  "resolve inode", "[-v] \n"
+	  "get file system paths for the given inode.",
+	  NULL
+	},
+	{ do_logical_to_ino, -2,
+	  "resolve logical", "[-v] [-P] \n"
+	  "get file system paths for the given logical address.",
+	  NULL
+	},
 	{ 0, 0, 0, 0 }
 };

diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 0612f34..2db5d31 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -1545,3 +1545,180 @@ int do_df_filesystem(int nargs, char **argv)

 	return 0;
 }
+
+static int __ino_to_path_fd(u64 inum, int fd, int verbose, const char *prepend)
+{
+	int ret;
+	int i;
+	struct btrfs_ioctl_ino_path_args ipa;
+	struct btrfs_data_container *fspath;
+
+	fspath = malloc(4096);
+	if (!fspath)
+		return 1;
+
+	ipa.inum = inum;
+	ipa.size = 4096;
+	ipa.fspath = fspath;
+
+	ret = ioctl(fd, BTRFS_IOC_INO_PATHS, &ipa);
+	if (ret) {
+		printf("ioctl ret=%d, error: %s\n", ret, strerror(errno));
+		goto out;
+	}
+
+	if (verbose)
+		printf("ioctl ret=%d, size=%d, cnt=%d, missed=%d\n", ret,
+			fspath->size, fspath->elem_cnt, fspath->elem_missed);
+
+	for (i = 0; i < fspath->elem_cnt; ++i) {
+		fspath->str[i] += (unsigned long)fspath->str;
+		if (prepend)
+			printf("%s/%s\n", prepend, fspath->str[i]);
+		else
+			printf("%s\n", fspath->str[i]);
+	}
+
+out:
+	free(fspath);
+	return ret;
+}
+
+int do_ino_to_path(int nargs, char **argv)
+{
+	int fd;
+	int verbose = 0;
+
+	optind = 1;
+	while (1) {
+		int c = getopt(nargs, argv, "v");
+		if (c < 0)
+			break;
+		switch (c) {
+		case 'v':
+			verbose = 1;
+			break;
+		default:
+			fprintf(stderr, "invalid arguments for ipath\n");
+			return 1;
+		}
+	}
+	if (nargs - optind != 2) {
+		fprintf(stderr, "invalid arguments for ipath\n");
+		return 1;
+	}
+
+	fd = open_file_or_dir(argv[optind+1]);
+	if (fd < 0) {
+		fprintf(stderr, "ERROR: can't access '%s'\n", argv[optind+1]);
+		return 12;
+	}
+
+	return __ino_to_path_fd(atoll(argv[optind]), fd, verbose,
+				argv[optind+1]);
+}
+
+int do_logical_to_ino(int nargs, char **argv)
+{
+	int ret;
+	int fd;
+	int i;
+	int verbose = 0;
+	int getpath = 1;
+	int bytes_left;
+	struct btrfs_ioctl_logical_ino_args loi;
+	struct btrfs_data_container *inodes;
+	char full_path[4096];
+	char *path_ptr;
+
+	optind = 1;
+	while (1) {
+		in
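The two halves of the path-returning protocol meet here: the kernel side (`btrfs_ioctl_ino_to_path()` in the ioctl patch) rewrites each `str[i]` pointer as an offset from the start of the `str` array before `copy_to_user()`, and `__ino_to_path_fd()` adds the user-space address of `fspath->str` back. A minimal sketch of that round trip in plain C follows, with a `memcpy()` between two heap buffers standing in for the copy across address spaces. `struct path_container` and the `pc_*` helpers are hypothetical and only approximate `btrfs_data_container`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: a count, an array of string pointers, and
 * the string bytes stored after the array in the same buffer. */
struct path_container {
	uint32_t elem_cnt;
	char *str[];
};

/* Lay out n strings in buf; returns NULL if they do not fit. */
static struct path_container *pc_build(void *buf, size_t size,
				       const char *const *paths, uint32_t n)
{
	struct path_container *pc = buf;
	size_t hdr = sizeof(*pc) + n * sizeof(char *);
	char *dst, *end;

	if (hdr > size)
		return NULL;
	dst = (char *)buf + hdr;
	end = (char *)buf + size;
	pc->elem_cnt = n;
	for (uint32_t i = 0; i < n; i++) {
		size_t len = strlen(paths[i]) + 1;

		if (len > (size_t)(end - dst))
			return NULL;
		memcpy(dst, paths[i], len);
		pc->str[i] = dst;
		dst += len;
	}
	return pc;
}

/* "Kernel" side: turn each pointer into an offset from &str[0]. */
static void pc_to_relative(struct path_container *pc)
{
	char *base = (char *)pc->str;

	for (uint32_t i = 0; i < pc->elem_cnt; i++) {
		assert(pc->str[i] >= base);
		pc->str[i] = (char *)(uintptr_t)(pc->str[i] - base);
	}
}

/* "User" side: add the local address of &str[0] back. */
static void pc_from_relative(struct path_container *pc)
{
	char *base = (char *)pc->str;

	for (uint32_t i = 0; i < pc->elem_cnt; i++)
		pc->str[i] = base + (uintptr_t)pc->str[i];
}
```

Offsets survive the copy because they are relative to the buffer itself; raw kernel pointers would be meaningless in the receiving process.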
[PATCH v1 0/2] Btrfs-progs: commands "resolve inode" and "resolve logical"
The kernel patch series just sent (Subject: "Btrfs: scrub: print path to
corrupted files and trigger nodatasum fixup") introduces two new ioctls to
do in-kernel filesystem path construction. This series provides the
corresponding userspace changes, adding two new commands to the btrfs utility:

--
btrfs resolve inode [-v]
	resolves an to all filesystem paths local to the fs mounted
	at .
	-v	print count of returned and missed paths

btrfs resolve logical [-v] [-P]
	resolves a address to all filesystem paths in the file
	system mounted at and all its subvolumes.
	-v	print count of returned and missed inode/offset/root
		triples
	-P	do not resolve the path but stop after finding all
		inodes at this logical address and print them instead
--

These patches are based on Hugo's current integration branch.

Please try them out and report bugs here. I'll send an update to the manpages
later.

-Jan

Jan Schmidt (2):
  btrfs-list: split list_subvols
  added ioctls and commands to resolve inodes and logical addresses

 btrfs-list.c |  139 ++
 btrfs.c      |   10 +++
 btrfs_cmds.c |  177 ++
 btrfs_cmds.h |    3 +
 ioctl.h      |   29 ++
 5 files changed, 323 insertions(+), 35 deletions(-)

-- 
1.7.3.4
[PATCH v1 1/2] btrfs-list: split list_subvols
Split list_subvols into separate functions and do the printing only in the containing function. This lets us reuse those functions when resolving logical addresses.

Signed-off-by: Jan Schmidt
---
 btrfs-list.c | 104 ++---
 1 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 07b179a..dd685c2 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -199,10 +199,9 @@ static int add_root(struct root_lookup *root_lookup,
  * This can't be called until all the root_info->path fields are filled
  * in by lookup_ino_path
  */
-static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_parent)
+static int resolve_root(struct root_lookup *rl, struct root_info *ri,
+			u64 *root_id, u64 *parent_id, u64 *top_id, char **path)
 {
-	u64 top_id;
-	u64 parent_id = 0;
 	char *full_path = NULL;
 	int len = 0;
 	struct root_info *found;
@@ -211,6 +210,7 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_
 	 * we go backwards from the root_info object and add pathnames
 	 * from parent directories as we go.
 	 */
+	*parent_id = 0;
 	found = ri;
 	while (1) {
 		char *tmp;
@@ -234,13 +234,12 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_
 		next = found->ref_tree;
 
 		/* record the first parent */
-		if ( parent_id == 0 ) {
-			parent_id = next;
-		}
+		if (*parent_id == 0)
+			*parent_id = next;
 
 		/* if the ref_tree refers to ourselves, we're at the top */
 		if (next == found->root_id) {
-			top_id = next;
+			*top_id = next;
 			break;
 		}
 
@@ -250,20 +249,15 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, int print_
 		 */
 		found = tree_search(&rl->root, next);
 		if (!found) {
-			top_id = next;
+			*top_id = next;
 			break;
 		}
 	}
-	if (print_parent) {
-		printf("ID %llu parent %llu top level %llu path %s\n",
-			(unsigned long long)ri->root_id, (unsigned long long)parent_id, (unsigned long long)top_id,
-			full_path);
-	} else {
-		printf("ID %llu top level %llu path %s\n",
-			(unsigned long long)ri->root_id, (unsigned long long)top_id,
-			full_path);
-	}
-	free(full_path);
+
+	*root_id = ri->root_id;
+	*parent_id = ri->root_id;
+	*path = full_path;
+
 	return 0;
 }
 
@@ -560,10 +554,8 @@ build:
 	return full;
 }
 
-int list_subvols(int fd, int print_parent)
+static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
 {
-	struct root_lookup root_lookup;
-	struct rb_node *n;
 	int ret;
 	struct btrfs_ioctl_search_args args;
 	struct btrfs_ioctl_search_key *sk = &args.key;
@@ -574,9 +566,11 @@ int list_subvols(int fd, int print_parent)
 	char *name;
 	u64 dir_id;
 	int i;
-	int e;
 
-	root_lookup_init(&root_lookup);
+	root_lookup_init(root_lookup);
+	memset(&args, 0, sizeof(args));
+
+	root_lookup_init(root_lookup);
 
 	memset(&args, 0, sizeof(args));
 
@@ -603,12 +597,8 @@ int list_subvols(int fd, int print_parent)
 
 	while(1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
-		e = errno;
-		if (ret < 0) {
-			fprintf(stderr, "ERROR: can't perform the search - %s\n",
-				strerror(e));
+		if (ret < 0)
 			return ret;
-		}
 		/* the ioctl returns the number of item it found in nr_items */
 		if (sk->nr_items == 0)
 			break;
@@ -629,7 +619,7 @@ int list_subvols(int fd, int print_parent)
 			name = (char *)(ref + 1);
 			dir_id = btrfs_stack_root_ref_dirid(ref);
 
-			add_root(&root_lookup, sh->objectid, sh->offset,
+			add_root(root_lookup, sh->objectid, sh->offset,
 				 dir_id, name, name_len);
 		}
 
@@ -657,11 +647,15 @@ int list_subvols(int fd, int print_parent)
 		} else
 			break;
 	}
-	/*
-	 * now we have an rbtree full of root_info objects, but we need to fill
-	 * in their path names within the subvol that is referencing each one.
-	 */
-	n = rb_first(&root_lookup.root);
+
+	return 0;
+}
+
+static int __l
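The refactoring pattern of this patch, computing results into out parameters and leaving the printing to the caller, can be sketched with a flat array in place of the rbtree. demo_root, demo_lookup() and demo_resolve_root() are hypothetical, and path-name building is omitted. The sketch keeps the first parent recorded inside the loop, as the "record the first parent" comment describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical flat stand-in for the root_info rbtree entries. */
struct demo_root {
	u64 root_id;
	u64 ref_tree; /* tree that references this subvolume */
};

static const struct demo_root *demo_lookup(const struct demo_root *roots,
					   size_t n, u64 id)
{
	for (size_t i = 0; i < n; ++i)
		if (roots[i].root_id == id)
			return &roots[i];
	return NULL;
}

/* Walk towards the top and report results instead of printing them. */
static int demo_resolve_root(const struct demo_root *roots, size_t n,
			     const struct demo_root *ri,
			     u64 *root_id, u64 *parent_id, u64 *top_id)
{
	const struct demo_root *found = ri;

	assert(ri != NULL);
	*parent_id = 0;
	while (1) {
		u64 next = found->ref_tree;

		/* record the first parent */
		if (*parent_id == 0)
			*parent_id = next;

		/* a tree that refers to itself is the top */
		if (next == found->root_id) {
			*top_id = next;
			break;
		}

		/* parent not known: treat it as the top */
		found = demo_lookup(roots, n, next);
		if (!found) {
			*top_id = next;
			break;
		}
	}
	*root_id = ri->root_id;
	return 0;
}
```

With the values returned this way, list_subvols (or the new logical-address resolver) can decide for itself whether to print the parent column.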
[PATCH v3 3/8] scrub: print paths of corrupted files
While scrubbing, we may encounter various errors. Previously, only a logical address was printed to the log. Now, all paths belonging to that address are resolved and printed separately. That should work for hardlinks as well as reflinks.

Signed-off-by: Jan Schmidt
---
 fs/btrfs/scrub.c | 169 --
 1 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 35099fa..221fd5c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -17,10 +17,12 @@
  */
 
 #include
+#include
 #include "ctree.h"
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "backref.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -100,6 +102,19 @@ struct scrub_dev {
 	spinlock_t		stat_lock;
 };
 
+struct scrub_warning {
+	struct btrfs_path	*path;
+	u64			extent_item_size;
+	char			*scratch_buf;
+	char			*msg_buf;
+	const char		*errstr;
+	sector_t		sector;
+	u64			logical;
+	struct btrfs_device	*dev;
+	int			msg_bufsize;
+	int			scratch_bufsize;
+};
+
 static void scrub_free_csums(struct scrub_dev *sdev)
 {
 	while (!list_empty(&sdev->csum_list)) {
@@ -195,6 +210,143 @@ nomem:
 	return ERR_PTR(-ENOMEM);
 }
 
+static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
+{
+	u64 isize;
+	u32 nlink;
+	int ret;
+	int i;
+	struct extent_buffer *eb;
+	struct btrfs_inode_item *inode_item;
+	struct scrub_warning *swarn = ctx;
+	struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
+	struct inode_fs_paths *ipath = NULL;
+	struct btrfs_root *local_root;
+	struct btrfs_key root_key;
+
+	root_key.objectid = root;
+	root_key.type = BTRFS_ROOT_ITEM_KEY;
+	root_key.offset = (u64)-1;
+	local_root = btrfs_read_fs_root_no_name(fs_info, &root_key);
+	if (IS_ERR(local_root)) {
+		ret = PTR_ERR(local_root);
+		goto err;
+	}
+
+	ret = inode_item_info(inum, 0, local_root, swarn->path);
+	if (ret) {
+		btrfs_release_path(swarn->path);
+		goto err;
+	}
+
+	eb = swarn->path->nodes[0];
+	inode_item = btrfs_item_ptr(eb, swarn->path->slots[0],
+					struct btrfs_inode_item);
+	isize = btrfs_inode_size(eb, inode_item);
+	nlink = btrfs_inode_nlink(eb, inode_item);
+	btrfs_release_path(swarn->path);
+
+	ipath = init_ipath(4096, local_root, swarn->path);
+	ret = paths_from_inode(inum, ipath);
+
+	if (ret < 0)
+		goto err;
+
+	/*
+	 * we deliberately ignore the bit ipath might have been too small to
+	 * hold all of the paths here
+	 */
+	for (i = 0; i < ipath->fspath->elem_cnt; ++i)
+		printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+			"%s, sector %llu, root %llu, inode %llu, offset %llu, "
+			"length %llu, links %u (path: %s)\n", swarn->errstr,
+			swarn->logical, swarn->dev->name,
+			(unsigned long long)swarn->sector, root, inum, offset,
+			min(isize - offset, (u64)PAGE_SIZE), nlink,
+			ipath->fspath->str[i]);
+
+	free_ipath(ipath);
+	return 0;
+
+err:
+	printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+		"%s, sector %llu, root %llu, inode %llu, offset %llu: path "
+		"resolving failed with ret=%d\n", swarn->errstr,
+		swarn->logical, swarn->dev->name,
+		(unsigned long long)swarn->sector, root, inum, offset, ret);
+
+	free_ipath(ipath);
+	return 0;
+}
+
+static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio,
+				int ix)
+{
+	struct btrfs_device *dev = sbio->sdev->dev;
+	struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
+	struct btrfs_path *path;
+	struct btrfs_key found_key;
+	struct extent_buffer *eb;
+	struct btrfs_extent_item *ei;
+	struct scrub_warning swarn;
+	u32 item_size;
+	int ret;
+	u64 ref_root;
+	u8 ref_level;
+	unsigned long ptr = 0;
+	const int bufsize = 4096;
+	u64 extent_offset;
+
+	path = btrfs_alloc_path();
+
+	swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
+	swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
+	swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+	swarn.logical = sbio->logical + ix * PAGE_SIZE;
+	swarn.errstr = errstr;
+	swarn.dev = dev;
+	swarn.msg_bufsize = bufsize;
+	swarn.scratch_bufsize = bufsize;
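The location reported in each warning is derived from the bio and the page index ix. A minimal sketch of that arithmetic, assuming 4 KiB pages (DEMO_PAGE_SIZE) and 512-byte sectors (the >> 9 in the patch); demo_page_location() and demo_report_length() are hypothetical names. Unlike the patch's min(isize - offset, PAGE_SIZE), the sketch guards against offset >= isize, which would otherwise wrap around.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define DEMO_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

struct demo_location {
	u64 logical; /* logical address of the bad page */
	u64 sector;  /* 512-byte sector on the device */
};

static struct demo_location demo_page_location(u64 bio_logical,
					       u64 bio_physical, int ix)
{
	struct demo_location loc;

	assert(ix >= 0);
	loc.logical = bio_logical + (u64)ix * DEMO_PAGE_SIZE;
	loc.sector = (bio_physical + (u64)ix * DEMO_PAGE_SIZE) >> 9;
	return loc;
}

/* Length shown in the warning: rest of the file, capped at one page. */
static u64 demo_report_length(u64 isize, u64 offset)
{
	u64 rest;

	if (offset >= isize) /* guard added in this sketch */
		return 0;
	rest = isize - offset;
	return rest < DEMO_PAGE_SIZE ? rest : DEMO_PAGE_SIZE;
}
```

For example, page 3 of a bio at logical 1 MiB and physical 2 MiB is reported at logical 1060864, sector 4120.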
[PATCH v3 0/8] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup
This patch set introduces two new features for scrub. They share the backref iteration code, which is the reason they made it into the same patch set.

The first feature adds printk statements that list all affected files in case scrub finds an error. You will need patches 1, 2 and 3 for that.

The second feature adds the trigger which will eventually enable us to correct i/o errors in case the affected extent does not have a checksum (nodatasum). You will need patches 1, 4, 5 and 6 for that.

I tried to apply all patches to the current cmason/for-linus branch and to Arne's current for-chris branch. They apply with no errors (some offsets possible). Please review.

Next, I'm starting to work out how to implement on-the-fly error correction correctly. This will enable us to rewrite good data whenever we encounter a bad copy. I have some preliminary patches already; the stress in the first sentence is on "correctly". The second feature mentioned in this patch series will then automatically use that code, too.

Changelog v1->v2:
- Various cleanup, sensible error codes as suggested by David Sterba

Changelog v2->v3:
- evaluation and iteration of shared refs
- support for in-tree refs (v2 iterated inline refs only)
- never call an iterator function without releasing the path
- iterate_irefs now returns -ENOENT in case no refs are found
- some stupid bugs removed where release_path was called too early
- ioctls added to provide new functions to user mode
- bugfixes for cases where search_slot found the very end of a leaf
- bugfix: use right fs root for readpage instead of fs_root->fs_info
- based on current cmason/for-linus

A patch series to use the new ioctls from user mode will follow shortly. Please try it and report errors (or confirm there are none, of course).
-Jan

Jan Schmidt (8):
  added helper functions to iterate backrefs
  scrub: added unverified_errors
  scrub: print paths of corrupted files
  scrub: bugfix: mirror_num off by one
  add mirror_num to extent_read_full_page
  scrub: use int for mirror_num, not u64
  scrub: add fixup code for errors on nodatasum files
  new ioctls to do logical->inode and inode->path resolving

 fs/btrfs/Makefile    |    3 +-
 fs/btrfs/backref.c   |  748 ++
 fs/btrfs/backref.h   |   62 +
 fs/btrfs/disk-io.c   |    2 +-
 fs/btrfs/extent_io.c |    6 +-
 fs/btrfs/extent_io.h |    3 +-
 fs/btrfs/inode.c     |    2 +-
 fs/btrfs/ioctl.c     |  134 +
 fs/btrfs/ioctl.h     |   29 ++
 fs/btrfs/scrub.c     |  412 +---
 10 files changed, 1362 insertions(+), 39 deletions(-)
 create mode 100644 fs/btrfs/backref.c
 create mode 100644 fs/btrfs/backref.h

-- 
1.7.3.4
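Every consumer in this series (scrub_print_warning_inode, build_ino_list, scrub_fixup_readpage) receives backrefs through the same callback shape, int (u64 inum, u64 offset, u64 root, void *ctx). The sketch below models that pattern over a plain array. demo_iterate_refs(), demo_ref and demo_count_cb() are hypothetical; the assumption that a nonzero callback return stops the iteration is taken from the fixup patch's comment ("make iterate_extent_inodes stop"), and returning -ENOENT for an empty set mirrors the v3 changelog entry for iterate_irefs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Callback shape shared by the consumers in this series. */
typedef int (demo_iterate_fn)(u64 inum, u64 offset, u64 root, void *ctx);

/* Hypothetical record for one (inode, file offset, root) reference. */
struct demo_ref {
	u64 inum;
	u64 offset;
	u64 root;
};

/* Call fn for each reference; stop at the first nonzero return. */
static int demo_iterate_refs(const struct demo_ref *refs, size_t n,
			     demo_iterate_fn *fn, void *ctx)
{
	if (n == 0)
		return -ENOENT;
	for (size_t i = 0; i < n; ++i) {
		int ret = fn(refs[i].inum, refs[i].offset, refs[i].root, ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example consumer: counts calls, optionally asks to stop after one,
 * like a fixup that only needs a single readpage per extent. */
struct demo_count_ctx {
	int calls;
	int stop_after_first;
};

static int demo_count_cb(u64 inum, u64 offset, u64 root, void *ctx)
{
	struct demo_count_ctx *c = ctx;

	(void)inum;
	(void)offset;
	(void)root;
	c->calls++;
	return c->stop_after_first ? 1 : 0;
}
```

Passing a void *ctx keeps the iterator generic: the warning printer, the ioctl list builder and the fixup code each carry their own state without the iterator knowing about it.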
[PATCH v3 1/8] added helper functions to iterate backrefs
These helper functions iterate back references and call a function for each backref. There is also a function to resolve an inode to a path in the file system.

Signed-off-by: Jan Schmidt
---
 fs/btrfs/Makefile  |   3 +-
 fs/btrfs/backref.c | 748
 fs/btrfs/backref.h |  62 +
 fs/btrfs/ioctl.h   |  10 +
 4 files changed, 822 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..c63f649 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+	   compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
+	   scrub.o
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
new file mode 100644
index 000..477f154
--- /dev/null
+++ b/fs/btrfs/backref.c
@@ -0,0 +1,748 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "disk-io.h"
+#include "backref.h"
+
+struct __data_ref {
+	struct list_head list;
+	u64 inum;
+	u64 root;
+	u64 extent_data_item_offset;
+};
+
+struct __shared_ref {
+	struct list_head list;
+	u64 disk_byte;
+};
+
+static int __inode_info(u64 inum, u64 ioff, u8 key_type,
+			struct btrfs_root *fs_root, struct btrfs_path *path,
+			struct btrfs_key *found_key)
+{
+	int ret;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+
+	key.type = key_type;
+	key.objectid = inum;
+	key.offset = ioff;
+
+	ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	eb = path->nodes[0];
+	if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+		ret = btrfs_next_leaf(fs_root, path);
+		if (ret)
+			return ret;
+		eb = path->nodes[0];
+	}
+
+	btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
+	if (found_key->type != key.type || found_key->objectid != key.objectid)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * this makes the path point to (inum INODE_ITEM ioff)
+ */
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+			struct btrfs_path *path)
+{
+	struct btrfs_key key;
+	return __inode_info(inum, ioff, BTRFS_INODE_ITEM_KEY, fs_root, path,
+				&key);
+}
+
+static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+				struct btrfs_path *path, int strict,
+				u64 *out_parent_inum,
+				struct extent_buffer **out_iref_eb,
+				int *out_slot)
+{
+	int ret;
+	struct btrfs_key found_key;
+
+	ret = __inode_info(inum, ioff, BTRFS_INODE_REF_KEY, fs_root, path,
+				&found_key);
+
+	if (!ret) {
+		if (out_slot)
+			*out_slot = path->slots[0];
+		if (out_iref_eb)
+			*out_iref_eb = path->nodes[0];
+		if (out_parent_inum)
+			*out_parent_inum = found_key.offset;
+	}
+
+	btrfs_release_path(path);
+	return ret;
+}
+
+/*
+ * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
+ * of the path are separated by '/' and the path is guaranteed to be
+ * 0-terminated. the path is only given within the current file system.
+ * Therefore, it never starts with a '/'. the caller is responsible to provide
+ * "size" bytes in "dest". the dest buffer will be filled backwards. finally,
+ * the start point of the resulting string is returned. this pointer is within
+ * dest, normally.
+ * in case the path buffer would overflow, the pointer is decremented further
+ * as if output was written to the buffer, though no more output is actually
+ * generated. that way, the caller
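The comment above describes filling dest from the end towards the front and continuing to count past the start on overflow. A minimal sketch of that idea with a hypothetical demo_path_backwards(): components are passed leaf first, and the sketch tracks a signed index instead of a pointer, since decrementing a C pointer before the start of an array is undefined behaviour. A negative return means the buffer was too small; size minus the result is then the number of bytes that would have been needed, which is presumably what the truncated sentence lets the caller determine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build "a/b/c" into the tail of dest from leaf-first components
 * ("c", "b", "a"). Returns the start index of the string in dest,
 * or a negative value if it did not fit. */
static long demo_path_backwards(const char *const *names, size_t n,
				char *dest, size_t size)
{
	long pos = (long)size - 1; /* slot for the terminating NUL */

	assert(size > 0);
	dest[pos] = '\0';
	for (size_t i = 0; i < n; ++i) {
		size_t len = strlen(names[i]);

		if (i > 0) { /* separator in front of what is already there */
			--pos;
			if (pos >= 0)
				dest[pos] = '/';
		}
		pos -= (long)len;
		if (pos >= 0) /* past the start: keep counting, stop writing */
			memcpy(dest + pos, names[i], len);
	}
	return pos;
}
```

Filling backwards suits the lookup order: each inode ref yields the name of the current entry and its parent directory, so the path is discovered from the leaf upwards and no reversal pass is needed.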
[PATCH v3 8/8] new ioctls to do logical->inode and inode->path resolving
These ioctls make use of the new functions initially added for scrub. They return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

Signed-off-by: Jan Schmidt
---
 fs/btrfs/ioctl.c | 134 ++
 fs/btrfs/ioctl.h |  19
 2 files changed, 153 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..5299b40 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -51,6 +51,7 @@
 #include "volumes.h"
 #include "locking.h"
 #include "inode-map.h"
+#include "backref.h"
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2836,6 +2837,135 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
 	return ret;
 }
 
+static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
+{
+	int ret = 0;
+	int i;
+	unsigned long rel_ptr;
+	int size;
+	struct btrfs_ioctl_ino_path_args *ipa;
+	struct inode_fs_paths *ipath = NULL;
+	struct btrfs_path *path;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ipa = memdup_user(arg, sizeof(*ipa));
+	if (IS_ERR(ipa)) {
+		ret = PTR_ERR(ipa);
+		ipa = NULL;
+		goto out;
+	}
+
+	size = min(ipa->size, 4096);
+	ipath = init_ipath(size, root, path);
+	if (IS_ERR(ipath)) {
+		ret = PTR_ERR(ipath);
+		ipath = NULL;
+		goto out;
+	}
+
+	ret = paths_from_inode(ipa->inum, ipath);
+	if (ret < 0)
+		goto out;
+
+	for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
+		rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
+		ipath->fspath->str[i] = (void *)rel_ptr;
+	}
+
+	ret = copy_to_user(ipa->fspath, ipath->fspath, size);
+	if (ret) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+out:
+	btrfs_free_path(path);
+	free_ipath(ipath);
+	kfree(ipa);
+
+	return ret;
+}
+
+static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
+{
+	struct btrfs_data_container *inodes = ctx;
+
+	inodes->size -= 3 * sizeof(u64);
+	if (inodes->size > 0) {
+		inodes->val[inodes->elem_cnt] = inum;
+		inodes->val[inodes->elem_cnt + 1] = offset;
+		inodes->val[inodes->elem_cnt + 2] = root;
+		inodes->elem_cnt += 3;
+	} else {
+		inodes->elem_missed += 3;
+	}
+
+	return 0;
+}
+
+static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
+					void __user *arg)
+{
+	int ret = 0;
+	int size;
+	u64 extent_offset;
+	struct btrfs_ioctl_logical_ino_args *loi;
+	struct btrfs_data_container *inodes = NULL;
+	struct btrfs_path *path = NULL;
+	struct btrfs_key key;
+
+	loi = memdup_user(arg, sizeof(*loi));
+	if (IS_ERR(loi)) {
+		ret = PTR_ERR(loi);
+		loi = NULL;
+		goto out;
+	}
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	size = min(loi->size, 4096);
+	inodes = init_data_container(size);
+	if (IS_ERR(inodes)) {
+		ret = PTR_ERR(inodes);
+		inodes = NULL;
+		goto out;
+	}
+
+	ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
+
+	if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+		ret = -ENOENT;
+	if (ret < 0)
+		goto out;
+
+	extent_offset = loi->logical - key.objectid;
+	ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
+					extent_offset, build_ino_list, inodes);
+
+	if (ret < 0)
+		goto out;
+
+	ret = copy_to_user(loi->inodes, inodes, size);
+	if (ret)
+		ret = -EFAULT;
+
+out:
+	btrfs_free_path(path);
+	kfree(inodes);
+	kfree(loi);
+
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2893,6 +3023,10 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_tree_search(file, argp);
 	case BTRFS_IOC_INO_LOOKUP:
 		return btrfs_ioctl_ino_lookup(file, argp);
+	case BTRFS_IOC_INO_PATHS:
+		return btrfs_ioctl_ino_to_path(root, argp);
+	case BTRFS_IOC_LOGICAL_INO:
+		return btrfs_ioctl_logical_to_ino(root, argp);
 	case BTRFS_IOC_SPACE_INFO:
 		return btrfs_ioctl_space_info(root, argp);
 	case BTRFS_IOC_SYNC:
diff --git a/fs/btrfs/ioctl.h b
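build_ino_list() stores one (inum, offset, root) triple per reference and counts the values it had to drop in elem_missed. The sketch below models that accounting with a hypothetical demo_container and demo_build_ino_list(); it checks that a whole triple still fits before storing it, and it keeps returning 0 so the iteration visits every reference and the missed count stays complete.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define DEMO_MAX_VALS 64

/* Hypothetical stand-in for btrfs_data_container. */
struct demo_container {
	long bytes_left;      /* remaining space in the user buffer */
	uint32_t elem_cnt;    /* u64 values stored (3 per reference) */
	uint32_t elem_missed; /* u64 values that did not fit */
	u64 val[DEMO_MAX_VALS];
};

static int demo_build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
{
	struct demo_container *inodes = ctx;
	const long need = 3 * (long)sizeof(u64);

	if (inodes->bytes_left >= need &&
	    inodes->elem_cnt + 3 <= DEMO_MAX_VALS) {
		inodes->val[inodes->elem_cnt] = inum;
		inodes->val[inodes->elem_cnt + 1] = offset;
		inodes->val[inodes->elem_cnt + 2] = root;
		inodes->elem_cnt += 3;
		inodes->bytes_left -= need;
	} else {
		inodes->elem_missed += 3;
	}
	return 0; /* never stop early: every reference gets counted */
}
```

A caller that sees a nonzero elem_missed knows it can retry with a larger buffer.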
[PATCH v3 6/8] scrub: use int for mirror_num, not u64
The rest of the code uses int mirror_num, and so should scrub.

Signed-off-by: Jan Schmidt
---
 fs/btrfs/scrub.c |    8
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 15fed35..12c08c0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -65,7 +65,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix);
 struct scrub_page {
 	u64			flags;  /* extent flags */
 	u64			generation;
-	u64			mirror_num;
+	int			mirror_num;
 	int			have_csum;
 	u8			csum[BTRFS_CSUM_SIZE];
 };
@@ -776,7 +776,7 @@ nomem:
 }
 
 static int scrub_page(struct scrub_dev *sdev, u64 logical, u64 len,
-		      u64 physical, u64 flags, u64 gen, u64 mirror_num,
+		      u64 physical, u64 flags, u64 gen, int mirror_num,
 		      u8 *csum, int force)
 {
 	struct scrub_bio *sbio;
@@ -873,7 +873,7 @@ static int scrub_find_csum(struct scrub_dev *sdev, u64 logical, u64 len,
 
 /* scrub extent tries to collect up to 64 kB for each bio */
 static int scrub_extent(struct scrub_dev *sdev, u64 logical, u64 len,
-			u64 physical, u64 flags, u64 gen, u64 mirror_num)
+			u64 physical, u64 flags, u64 gen, int mirror_num)
 {
 	int ret;
 	u8 csum[BTRFS_CSUM_SIZE];
@@ -919,7 +919,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
 	u64 physical;
 	u64 logical;
 	u64 generation;
-	u64 mirror_num;
+	int mirror_num;
 	u64 increment = map->stripe_len;
 	u64 offset;
 
-- 
1.7.3.4
[PATCH v3 7/8] scrub: add fixup code for errors on nodatasum files
This removes a FIXME comment and introduces the first part of nodatasum fixup: it gets the corresponding inode for a logical address and triggers a regular readpage for the corrupted sector. Once we have on-the-fly error correction, our error will be corrected automatically. The correction code is expected to clear the newly introduced EXTENT_DAMAGED flag, which will eventually make scrub report that error as "corrected" instead of "uncorrectable".

Signed-off-by: Jan Schmidt
---
 fs/btrfs/extent_io.h |   1 +
 fs/btrfs/scrub.c     | 188 --
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 22bf366..2734fd9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -17,6 +17,7 @@
 #define EXTENT_NODATASUM (1 << 10)
 #define EXTENT_DO_ACCOUNTING (1 << 11)
 #define EXTENT_FIRST_DELALLOC (1 << 12)
+#define EXTENT_DAMAGED (1 << 13)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 12c08c0..563686f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -22,6 +22,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "transaction.h"
 #include "backref.h"
 
 /*
@@ -89,6 +90,7 @@ struct scrub_dev {
 	int			first_free;
 	int			curr;
 	atomic_t		in_flight;
+	atomic_t		fixup_cnt;
 	spinlock_t		list_lock;
 	wait_queue_head_t	list_wait;
 	u16			csum_size;
@@ -102,6 +104,14 @@ struct scrub_dev {
 	spinlock_t		stat_lock;
 };
 
+struct scrub_fixup_nodatasum {
+	struct scrub_dev	*sdev;
+	u64			logical;
+	struct btrfs_root	*root;
+	struct btrfs_work	work;
+	int			mirror_num;
+};
+
 struct scrub_warning {
 	struct btrfs_path	*path;
 	u64			extent_item_size;
@@ -190,12 +200,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev)
 
 		if (i != SCRUB_BIOS_PER_DEV-1)
 			sdev->bios[i]->next_free = i + 1;
-		 else
+		else
 			sdev->bios[i]->next_free = -1;
 	}
 	sdev->first_free = 0;
 	sdev->curr = -1;
 	atomic_set(&sdev->in_flight, 0);
+	atomic_set(&sdev->fixup_cnt, 0);
 	atomic_set(&sdev->cancel_req, 0);
 	sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
 	INIT_LIST_HEAD(&sdev->csum_list);
@@ -347,6 +358,151 @@ out:
 	kfree(swarn.msg_buf);
 }
 
+static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *ctx)
+{
+	struct page *page;
+	unsigned long index;
+	struct scrub_fixup_nodatasum *fixup = ctx;
+	int ret;
+	int corrected;
+	struct btrfs_key key;
+	struct inode *inode;
+	u64 end = offset + PAGE_SIZE - 1;
+	struct btrfs_root *local_root;
+
+	key.objectid = root;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = (u64)-1;
+	local_root = btrfs_read_fs_root_no_name(fixup->root->fs_info, &key);
+	if (IS_ERR(local_root))
+		return PTR_ERR(local_root);
+
+	key.type = BTRFS_INODE_ITEM_KEY;
+	key.objectid = inum;
+	key.offset = 0;
+	inode = btrfs_iget(fixup->root->fs_info->sb, &key, local_root, NULL);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+				EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
+
+	/* set_extent_bit should either succeed or give proper error */
+	WARN_ON(ret > 0);
+	if (ret)
+		return ret < 0 ? ret : -EFAULT;
+
+	index = offset >> PAGE_CACHE_SHIFT;
+
+	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	if (!page)
+		return -ENOMEM;
+
+	ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
+					btrfs_get_extent, fixup->mirror_num);
+	wait_on_page_locked(page);
+	corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
+					EXTENT_DAMAGED, 0, NULL);
+
+	if (corrected)
+		WARN_ON(!PageUptodate(page));
+	else
+		clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+					EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
+
+	put_page(page);
+	iput(inode);
+
+	if (ret < 0)
+		return ret;
+
+	if (ret == 0 && corrected) {
+		/*
+		 * we only need to call readpage for one of the inodes belonging
+		 * to this extent. so make iterate_extent_inodes stop
+
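The EXTENT_DAMAGED handshake in scrub_fixup_readpage() can be modelled with a toy io tree that keeps one flag word per page. Everything here is hypothetical except the bit value; demo_test_any() follows our reading of test_range_bit() with filled == 0 (true if any part of the range carries the bit), and demo_read_repairs() stands in for the future correction code that is expected to clear the bit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE (1ULL << DEMO_PAGE_SHIFT)
#define DEMO_EXTENT_DAMAGED (1u << 13) /* same bit as EXTENT_DAMAGED */
#define DEMO_NPAGES 16

/* Toy io tree: one flag word per page instead of a range tree. */
struct demo_io_tree {
	unsigned int flags[DEMO_NPAGES];
};

static void demo_set_bit(struct demo_io_tree *t, u64 start, u64 end,
			 unsigned int bit)
{
	assert(end >> DEMO_PAGE_SHIFT < DEMO_NPAGES);
	for (u64 i = start >> DEMO_PAGE_SHIFT; i <= end >> DEMO_PAGE_SHIFT; ++i)
		t->flags[i] |= bit;
}

static void demo_clear_bit(struct demo_io_tree *t, u64 start, u64 end,
			   unsigned int bit)
{
	assert(end >> DEMO_PAGE_SHIFT < DEMO_NPAGES);
	for (u64 i = start >> DEMO_PAGE_SHIFT; i <= end >> DEMO_PAGE_SHIFT; ++i)
		t->flags[i] &= ~bit;
}

/* True if any page in [start, end] carries bit. */
static int demo_test_any(const struct demo_io_tree *t, u64 start, u64 end,
			 unsigned int bit)
{
	assert(end >> DEMO_PAGE_SHIFT < DEMO_NPAGES);
	for (u64 i = start >> DEMO_PAGE_SHIFT; i <= end >> DEMO_PAGE_SHIFT; ++i)
		if (t->flags[i] & bit)
			return 1;
	return 0;
}

typedef void (demo_read_fn)(struct demo_io_tree *t, u64 start, u64 end);

/* Mark the page damaged, read it, then classify the outcome. */
static int demo_fixup_page(struct demo_io_tree *t, u64 offset,
			   demo_read_fn *read_page)
{
	u64 end = offset + DEMO_PAGE_SIZE - 1;
	int corrected;

	demo_set_bit(t, offset, end, DEMO_EXTENT_DAMAGED);
	read_page(t, offset, end);
	corrected = !demo_test_any(t, offset, end, DEMO_EXTENT_DAMAGED);
	if (!corrected) /* do not leave the marker behind */
		demo_clear_bit(t, offset, end, DEMO_EXTENT_DAMAGED);
	return corrected;
}

/* A read that found a good copy and repaired the page. */
static void demo_read_repairs(struct demo_io_tree *t, u64 start, u64 end)
{
	demo_clear_bit(t, start, end, DEMO_EXTENT_DAMAGED);
}

/* A read that could not repair anything: the bit stays set. */
static void demo_read_fails(struct demo_io_tree *t, u64 start, u64 end)
{
	(void)t;
	(void)start;
	(void)end;
}
```

Using a flag in the io tree lets the read path signal success back to scrub without a new return channel through readpage.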
[PATCH v3 4/8] scrub: bugfix: mirror_num off by one
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code did not use mirror_num for anything important, so that error went unnoticed. The nodatasum fixup patch of this set depends on a correct mirror_num.

Signed-off-by: Jan Schmidt
---
 fs/btrfs/scrub.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 221fd5c..15fed35 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -930,21 +930,21 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
 	if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
 		offset = map->stripe_len * num;
 		increment = map->stripe_len * map->num_stripes;
-		mirror_num = 0;
+		mirror_num = 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 		int factor = map->num_stripes / map->sub_stripes;
 		offset = map->stripe_len * (num / map->sub_stripes);
 		increment = map->stripe_len * factor;
-		mirror_num = num % map->sub_stripes;
+		mirror_num = num % map->sub_stripes + 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
 		increment = map->stripe_len;
-		mirror_num = num % map->num_stripes;
+		mirror_num = num % map->num_stripes + 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
 		increment = map->stripe_len;
-		mirror_num = num % map->num_stripes;
+		mirror_num = num % map->num_stripes + 1;
 	} else {
 		increment = map->stripe_len;
-		mirror_num = 0;
+		mirror_num = 1;
 	}
 
 	path = btrfs_alloc_path();
-- 
1.7.3.4
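The corrected numbering can be sketched as a small pure function. The demo_profile enum and demo_mirror_num() are hypothetical; the real scrub_stripe() tests BTRFS_BLOCK_GROUP_* bits in map->type and also computes offset and increment, which are left out here.

```c
#include <assert.h>

/* Hypothetical profile enum; the real code tests map->type flag bits. */
enum demo_profile { DEMO_SINGLE, DEMO_RAID0, DEMO_RAID1, DEMO_RAID10, DEMO_DUP };

/* 1-based mirror number of stripe num, as in the fixed scrub_stripe(). */
static int demo_mirror_num(enum demo_profile p, int num,
			   int num_stripes, int sub_stripes)
{
	switch (p) {
	case DEMO_RAID10:
		assert(sub_stripes > 0);
		return num % sub_stripes + 1;
	case DEMO_RAID1:
	case DEMO_DUP:
		assert(num_stripes > 0);
		return num % num_stripes + 1;
	case DEMO_RAID0:
	case DEMO_SINGLE:
	default:
		return 1; /* only one copy exists */
	}
}
```

For a two-device RAID1, stripe 0 maps to mirror 1 and stripe 1 to mirror 2; before the fix they were 0 and 1.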
[PATCH v3 5/8] add mirror_num to extent_read_full_page
Currently, extent_read_full_page always assumes we are trying to read mirror 0, which generally is the best we can do. To add flexibility, pass it as a parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt
---
 fs/btrfs/disk-io.c   |    2 +-
 fs/btrfs/extent_io.c |    6 +++---
 fs/btrfs/extent_io.h |    2 +-
 fs/btrfs/inode.c     |    2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ac8db5d..b898319 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page *page)
 {
 	struct extent_io_tree *tree;
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
-	return extent_read_full_page(tree, page, btree_get_extent);
+	return extent_read_full_page(tree, page, btree_get_extent, 0);
 }
 
 static int btree_releasepage(struct page *page, gfp_t gfp_flags)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..b78f665 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 }
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-			    get_extent_t *get_extent)
+			    get_extent_t *get_extent, int mirror_num)
 {
 	struct bio *bio = NULL;
 	unsigned long bio_flags = 0;
 	int ret;
 
-	ret = __extent_read_full_page(tree, page, get_extent, &bio, 0,
+	ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num,
 				      &bio_flags);
 	if (bio)
-		ret = submit_one_bio(READ, bio, 0, bio_flags);
+		ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
 	return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a11a92e..22bf366 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64 start, u64 end,
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 		    gfp_t mask);
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-			  get_extent_t *get_extent);
+			  get_extent_t *get_extent, int mirror_num);
 int __init extent_io_init(void);
 void extent_io_exit(void);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 447612d..18c3a3f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6248,7 +6248,7 @@ int btrfs_readpage(struct file *file, struct page *page)
 {
 	struct extent_io_tree *tree;
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
-	return extent_read_full_page(tree, page, btrfs_get_extent);
+	return extent_read_full_page(tree, page, btrfs_get_extent, 0);
 }
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
-- 
1.7.3.4
[PATCH v3 2/8] scrub: added unverified_errors
In normal operation, scrub reads data sequentially in large portions. In case of an i/o error, we try to find the corrupted area(s) by issuing page-sized read requests. With this commit, we increment the unverified_errors counter if all of the small requests succeed. Userland patches carrying such conspicuous events to the administrator should already be around.

Signed-off-by: Jan Schmidt
---
 fs/btrfs/scrub.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a8d03d5..35099fa 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -201,18 +201,25 @@ nomem:
  * recheck_error gets called for every page in the bio, even though only
  * one may be bad
  */
-static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
+static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
 {
+	struct scrub_dev *sdev = sbio->sdev;
+	u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+
 	if (sbio->err) {
-		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
-				(sbio->physical + ix * PAGE_SIZE) >> 9,
+		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
 				   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
 			if (scrub_fixup_check(sbio, ix) == 0)
-				return;
+				return 0;
 		}
 	}
 
+	spin_lock(&sdev->stat_lock);
+	++sdev->stat.read_errors;
+	spin_unlock(&sdev->stat_lock);
+
 	scrub_fixup(sbio, ix);
+	return 1;
 }
 
 static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
@@ -382,8 +389,14 @@ static void scrub_checksum(struct btrfs_work *work)
 	int ret;
 
 	if (sbio->err) {
+		ret = 0;
 		for (i = 0; i < sbio->count; ++i)
-			scrub_recheck_error(sbio, i);
+			ret |= scrub_recheck_error(sbio, i);
+		if (!ret) {
+			spin_lock(&sdev->stat_lock);
+			++sdev->stat.unverified_errors;
+			spin_unlock(&sdev->stat_lock);
+		}
 
 		sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
 		sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
@@ -396,10 +409,6 @@ static void scrub_checksum(struct btrfs_work *work)
 			bi->bv_offset = 0;
 			bi->bv_len = PAGE_SIZE;
 		}
-
-		spin_lock(&sdev->stat_lock);
-		++sdev->stat.read_errors;
-		spin_unlock(&sdev->stat_lock);
 		goto out;
 	}
 	for (i = 0; i < sbio->count; ++i) {
@@ -420,8 +429,14 @@ static void scrub_checksum(struct btrfs_work *work)
 				WARN_ON(1);
 			}
 			kunmap_atomic(buffer, KM_USER0);
-			if (ret)
-				scrub_recheck_error(sbio, i);
+			if (ret) {
+				ret = scrub_recheck_error(sbio, i);
+				if (!ret) {
+					spin_lock(&sdev->stat_lock);
+					++sdev->stat.unverified_errors;
+					spin_unlock(&sdev->stat_lock);
+				}
+			}
 		}
 
 out:
-- 
1.7.3.4
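The accounting after this patch can be sketched as follows: each page that still fails its page-sized re-read bumps read_errors, and only if no page failed is the bio error counted as unverified. demo_stats, demo_recheck() and demo_bio_error() are hypothetical, and the per-page outcome is passed in as an array instead of performing real I/O.

```c
#include <assert.h>

struct demo_stats {
	unsigned long read_errors;
	unsigned long unverified_errors;
};

/* Mirrors the new return value of scrub_recheck_error():
 * 0 if the page-sized re-read and check succeeded, 1 if the page is bad. */
static int demo_recheck(struct demo_stats *st, const int *page_ok, int ix)
{
	if (page_ok[ix])
		return 0;
	st->read_errors++; /* now counted per bad page */
	return 1;
}

/* Bio-level error path of scrub_checksum() after this patch. */
static void demo_bio_error(struct demo_stats *st, const int *page_ok,
			   int count)
{
	int ret = 0;

	assert(count > 0);
	for (int i = 0; i < count; ++i)
		ret |= demo_recheck(st, page_ok, i);
	if (!ret) /* the big read failed, yet every small read worked */
		st->unverified_errors++;
}
```

Moving the read_errors increment into the per-page recheck is what makes the two counters mutually exclusive for a given bio.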
Re: [PATCH] Btrfs: fix return value check of btrfs_alloc_path()
On 07/07/2011 05:31 AM, Tsutomu Itoh wrote:
> The return value check of btrfs_alloc_path() in several places is changed
> from BUG_ON() to error return.
>
> Signed-off-by: Tsutomu Itoh

Reviewed-by: Josef Bacik

Thanks,

Josef
Re: device failure hangs the system
Josef,

> Well that's a neat trick, do you have a way to undo that action too?
> Seems a rescan doesn't make it show back up.

Hope the following helps:
-
# fdisk -l /dev/sdg | egrep "Disk /"
Disk /dev/sdg: 4294 MB, 4294967296 bytes
# x=`ls -l /sys/class/block/sdg | cut -d "/" -f12 | sed 's/:/ /g'`
# echo "scsi remove-single-device ${x}" > /proc/scsi/scsi
# fdisk -l /dev/sdg | egrep "Disk /"
# echo "scsi add-single-device ${x}" > /proc/scsi/scsi
# fdisk -l /dev/sdg | egrep "Disk /"
Disk /dev/sdg: 4294 MB, 4294967296 bytes
-

> Please try the patch I just posted to the list to fix this problem.
> Thanks,
>
> Josef

I'm facing some challenges upgrading my machine to 3.0.0-rc6, hence the delay. Thanks for the patch.

Anand
[PATCH] Btrfs: fix return value check of btrfs_alloc_path()
The return value check of btrfs_alloc_path() in several places is changed from BUG_ON() to error return.

Signed-off-by: Tsutomu Itoh
---
 fs/btrfs/extent-tree.c |    3 ++-
 fs/btrfs/extent_io.c   |    9 ++---
 fs/btrfs/inode.c       |   15 +++
 fs/btrfs/ioctl.c       |    1 +
 4 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 71cd456..624ca25 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5494,7 +5494,8 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 	u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path)
+		return -ENOMEM;
 	path->leave_spinning = 1;
 	ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..9703b65 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -,9 +,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 				delalloc_start = delalloc_end + 1;
 				continue;
 			}
-			tree->ops->fill_delalloc(inode, page, delalloc_start,
-						 delalloc_end, &page_started,
-						 &nr_written);
+			ret = tree->ops->fill_delalloc(inode, page,
+						       delalloc_start,
+						       delalloc_end,
+						       &page_started,
+						       &nr_written);
+			BUG_ON(ret);
 			/*
 			 * delalloc_end is already one less than the total
 			 * length, so we don't subtract one from
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 447612d..d0dee5e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1070,7 +1070,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
 	u64 ino = btrfs_ino(inode);
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path)
+		return -ENOMEM;
 	nolock = is_free_space_inode(root, inode);
@@ -3711,7 +3712,8 @@ static int btrfs_inode_by_name(struct inode *dir, struct dentry *dentry,
 	int ret = 0;
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path)
+		return -ENOMEM;
 	di = btrfs_lookup_dir_item(NULL, root, path, btrfs_ino(dir), name,
 				    namelen, 0);
@@ -4436,7 +4438,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	int owner;
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path)
+		return ERR_PTR(-ENOMEM);
 	inode = new_inode(root->fs_info->sb);
 	if (!inode) {
@@ -7192,7 +7195,11 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
 		goto out_unlock;
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path) {
+		err = -ENOMEM;
+		drop_inode = 1;
+		goto out_unlock;
+	}
 	key.objectid = btrfs_ino(inode);
 	key.offset = 0;
 	btrfs_set_key_type(&key, BTRFS_EXTENT_DATA_KEY);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..b12f7fe 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -415,6 +415,7 @@ static noinline int create_subvol(struct btrfs_root *root,
 	btrfs_record_root_in_trans(trans, new_root);
 	ret = btrfs_create_subvol_root(trans, new_root, new_dirid);
+	BUG_ON(ret);
 	/*
 	 * insert the directory item
 	 */
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
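The patch uses two error-return conventions: int-returning functions such as run_delalloc_nocow() and btrfs_inode_by_name() now return -ENOMEM when btrfs_alloc_path() fails, while btrfs_new_inode(), which returns a pointer, encodes the error with ERR_PTR(-ENOMEM). Below is a minimal userspace sketch of both conventions. ERR_PTR(), IS_ERR() and PTR_ERR() are re-implemented after the usual definitions in the kernel's include/linux/err.h (an assumption for illustration, not the kernel headers themselves), and toy_alloc_path(), toy_lookup() and toy_new_inode() are hypothetical stand-ins, not btrfs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-implementation of the err.h helpers (illustrative only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* error pointers live in the top MAX_ERRNO bytes of the address space */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct btrfs_path / btrfs_alloc_path(). */
struct toy_path { int slots[8]; };
struct toy_inode { long ino; };

static bool fail_next_alloc;	/* test hook: simulate an allocation failure */

static struct toy_path *toy_alloc_path(void)
{
	if (fail_next_alloc)
		return NULL;
	return calloc(1, sizeof(struct toy_path));
}

/* int-returning caller: propagate -ENOMEM instead of BUG_ON(!path). */
static int toy_lookup(void)
{
	struct toy_path *path = toy_alloc_path();

	if (!path)
		return -ENOMEM;
	/* ... the real lookup work would go here ... */
	free(path);
	return 0;
}

/* pointer-returning caller: encode the errno in the pointer itself. */
static struct toy_inode *toy_new_inode(long ino)
{
	struct toy_path *path = toy_alloc_path();
	struct toy_inode *inode;

	if (!path)
		return ERR_PTR(-ENOMEM);

	inode = malloc(sizeof(*inode));
	free(path);	/* the path is only needed during setup in this toy */
	if (!inode)
		return ERR_PTR(-ENOMEM);
	inode->ino = ino;
	return inode;
}
```

A caller of a pointer-returning function checks IS_ERR() before dereferencing and recovers the errno with PTR_ERR(), so a single return value can carry either a valid object or a specific error code.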
btrfs: open_ctree failed
Hi all, apologies for my English.

On a nice, warm evening the power went out, and my router, which has btrfs as its root filesystem, broke. I have taken an image of the partition (10G); I will now probably reinstall the system on ext4.

The discussion started here: http://www.linux.org.ru/forum/general/6465851

OS at the time of the crash: Debian 6 (2.6.39 kernel from sid), filesystem mounted with compress=lzo; btrfs-progs v0.19 from git. btrfs-show reports everything OK; the FS is on LVM.

root@sysresccd /root % mount -t btrfs -o compress=lzo /dev/mapper/nas-root /re
mount: wrong fs type, bad option, bad superblock on /dev/mapper/nas-root,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

root@sysresccd /root % dmesg | tail
[ 3821.972350] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 3821.972364] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 3821.979182] btrfs: open_ctree failed
[ 6298.660270] device label root devid 1 transid 12174 /dev/mapper/nas-root
[ 6298.660657] btrfs: use lzo compression
[ 6298.662878] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.663321] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.663584] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.663595] parent transid verify failed on 3807195136 wanted 5412 found 5414
[ 6298.669180] btrfs: open_ctree failed

root@sysresccd /root % btrfsck /dev/nas/root
parent transid verify failed on 3807195136 wanted 5412 found 5414
parent transid verify failed on 3807195136 wanted 5412 found 5414
parent transid verify failed on 3807195136 wanted 5412 found 5414
btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)' failed.
zsh: abort      btrfsck /dev/nas/root
Re: Memory leak?
2011-07-07 16:20:20 +0800, Li Zefan:
[...]
> btrfs_inode_cache is a slab cache for in memory inodes, which is of
> struct btrfs_inode.
[...]

Thanks Li.

If that's a cache, the system should be able to reuse that space when it's low on memory, shouldn't it? Under what conditions could that not be done (as in my case, where the OOM killer was invoked to free memory instead of that cache memory being reclaimed)?

Best regards,
Stephane
Re: Memory leak?
Stephane Chazelas wrote:
> 2011-07-06 09:11:11 +0100, Stephane Chazelas:
> [...]
>> extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
>> (in bytes)
> [...]
>> 01:00  267192640  668595744  2321646000  3418048
>> 01:10  267192640  668595744  2321646000  3418048
>> 01:20  267192640  668595744  2321646000  3418048
>> 01:30  267192640  668595744  2321646000  3418048
>> 01:40  267192640  668595744  2321646000  3418048
> [...]
>
> I've just come across
> http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320
>
> GIT> author     Chris Mason
> GIT>            Fri, 3 Jun 2011 13:36:29 + (09:36 -0400)
> GIT> committer  Chris Mason
> GIT>            Sat, 4 Jun 2011 12:03:47 + (08:03 -0400)
> GIT> commit     4b9465cb9e3859186eefa1ca3b990a5849386320
> GIT> tree       8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
> GIT> parent     e7786c3ae517b2c433edc91714e86be770e9f1ce
> GIT> Btrfs: add mount -o inode_cache
> GIT>
> GIT> This makes the inode map cache default to off until we
> GIT> fix the overflow problem when the free space crcs don't fit
> GIT> inside a single page.
>
> I would have thought that would have disabled that
> btrfs_inode_cache. And I can see that patch is in 3.0.0-rc5 (I'm
> not mounting with -o inode_cache). So, why those 2.2GiB in
> btrfs_inode_cache above?
>

This should be unrelated to your problem.

btrfs_inode_cache is a slab cache for in-memory inodes, i.e. objects of type struct btrfs_inode, while the ino_cache is a cache whose entries are ranges of free inode numbers; currently the ino_cache is not enabled unless you mount with the inode_cache option.
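To make Li's distinction concrete: btrfs_inode_cache holds whole struct btrfs_inode objects, whereas an ino_cache entry only describes a run of unused inode numbers. The sketch below is a hypothetical toy model of such a range cache (a small sorted array with no merging or overlap checks); the type and function names are invented for illustration, and this is not how btrfs implements the ino_cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry: a run of consecutive free inode numbers (hypothetical layout). */
struct free_range {
	uint64_t start;	/* first free inode number in the run */
	uint64_t count;	/* number of consecutive free inode numbers */
};

#define MAX_RANGES 16

struct ino_range_cache {
	struct free_range ranges[MAX_RANGES];	/* kept sorted by start */
	size_t nr;
};

/* Record a run of free numbers; no merging or overlap checks, for brevity. */
static bool cache_add_range(struct ino_range_cache *c, uint64_t start,
			    uint64_t count)
{
	size_t i;

	if (c->nr == MAX_RANGES || count == 0)
		return false;
	/* insertion sort: shift larger entries up to keep the array ordered */
	i = c->nr;
	while (i > 0 && c->ranges[i - 1].start > start) {
		c->ranges[i] = c->ranges[i - 1];
		i--;
	}
	c->ranges[i].start = start;
	c->ranges[i].count = count;
	c->nr++;
	return true;
}

/* Hand out the lowest cached free inode number; false if the cache is empty. */
static bool cache_alloc_ino(struct ino_range_cache *c, uint64_t *ino)
{
	struct free_range *r;

	if (c->nr == 0)
		return false;
	r = &c->ranges[0];
	*ino = r->start;
	r->start++;
	r->count--;
	if (r->count == 0) {
		/* run exhausted: drop it and shift the remaining entries down */
		for (size_t i = 1; i < c->nr; i++)
			c->ranges[i - 1] = c->ranges[i];
		c->nr--;
	}
	return true;
}
```

Each entry costs a few bytes regardless of how many numbers it covers, which is why a cache of free-number ranges and a slab of full inode objects have very different memory profiles.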
Re: Memory leak?
2011-07-06 09:11:11 +0100, Stephane Chazelas:
[...]
> extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
> (in bytes)
[...]
> 01:00  267192640  668595744  2321646000  3418048
> 01:10  267192640  668595744  2321646000  3418048
> 01:20  267192640  668595744  2321646000  3418048
> 01:30  267192640  668595744  2321646000  3418048
> 01:40  267192640  668595744  2321646000  3418048
[...]

I've just come across
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320

GIT> author     Chris Mason
GIT>            Fri, 3 Jun 2011 13:36:29 + (09:36 -0400)
GIT> committer  Chris Mason
GIT>            Sat, 4 Jun 2011 12:03:47 + (08:03 -0400)
GIT> commit     4b9465cb9e3859186eefa1ca3b990a5849386320
GIT> tree       8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
GIT> parent     e7786c3ae517b2c433edc91714e86be770e9f1ce
GIT> Btrfs: add mount -o inode_cache
GIT>
GIT> This makes the inode map cache default to off until we
GIT> fix the overflow problem when the free space crcs don't fit
GIT> inside a single page.

I would have thought that would have disabled that btrfs_inode_cache. And I can see that patch is in 3.0.0-rc5 (I'm not mounting with -o inode_cache). So why the 2.2GiB in btrfs_inode_cache above?

--
Stephane
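The message does not say how the per-cache byte counts were obtained. One plausible way, assumed here rather than taken from the thread, is to read /proc/slabinfo (format version 2.1, usually readable only by root), whose data lines start with name, active_objs, num_objs and objsize, and multiply num_objs by objsize. The helper names slab_line_bytes() and slab_bytes_from_stream() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/slabinfo data line (version 2.1 layout):
 *   name active_objs num_objs objsize objperslab pagesperslab : tunables ... : slabdata ...
 * If the cache name matches `want`, store num_objs * objsize in *bytes.
 */
static bool slab_line_bytes(const char *line, const char *want,
			    unsigned long long *bytes)
{
	char name[64];
	unsigned long active, num, objsize;

	/* header lines ("slabinfo - version: 2.1", "# name ...") fail to parse */
	if (sscanf(line, "%63s %lu %lu %lu", name, &active, &num, &objsize) != 4)
		return false;
	if (strcmp(name, want) != 0)
		return false;
	*bytes = (unsigned long long)num * objsize;
	return true;
}

/* Scan an open slabinfo stream, e.g. fopen("/proc/slabinfo", "r"). */
static bool slab_bytes_from_stream(FILE *fp, const char *want,
				   unsigned long long *bytes)
{
	char line[512];

	while (fgets(line, sizeof(line), fp))
		if (slab_line_bytes(line, want, bytes))
			return true;
	return false;
}
```

Multiplying num_objs by objsize ignores per-slab overhead; num_slabs × pagesperslab × page size, from the slabdata columns, would be an alternative, slightly larger estimate.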