Re: warning and bug on 3.2-rc4 + for-linus from yesterday
On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote:
> Hello!
>
> We recently upgraded our backup server kernel (rsync with snapshots and
> compression) to Linus git master from yesterday (3.2-rc4+ 09d9673d53005)
> that contains the btrfs for-linus as of yesterday. We've been seeing a
> few warnings and bugs:

Then it kept pinging but didn't accept SSH anymore, with this captured via serial console:

[79214.481458] [ cut here ]
[79214.485335] kernel BUG at fs/btrfs/inode.c:2893!
[79214.485335] invalid opcode: [#2] SMP
[79214.485335] CPU 0
[79214.485335] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
[79214.485335]
[79214.485335] Pid: 24202, comm: btrfsctl Tainted: G D W 3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937
[79214.485335] RIP: 0010:[] [] btrfs_unlink_subvol+0x268/0x270
[79214.485335] RSP: 0018:880344babd28 EFLAGS: 00010286
[79214.485335] RAX: ffe4 RBX: 0c46 RCX: 880336fd1588
[79214.485335] RDX: ffe4 RSI: RDI: 880336fd15a8
[79214.485335] RBP: 880344babda8 R08: R09:
[79214.485335] R10: R11: 9001 R12: 880405cf5e88
[79214.485335] R13: 880428a9ba20 R14: 880405158c00 R15: 0100
[79214.485335] FS: 7f27ff13d740() GS:88043fc0() knlGS:
[79214.485335] CS: 0010 DS: ES: CR0: 8005003b
[79214.485335] CR2: 7fffdf79f950 CR3: 0003f79fe000 CR4: 06f0
[79214.485335] DR0: DR1: DR2:
[79214.485335] DR3: DR6: 0ff0 DR7: 0400
[79214.485335] Process btrfsctl (pid: 24202, threadinfo 880344baa000, task 8803dcec)
[79214.485335] Stack:
[79214.485335] 8804037d53f8 88030010 001044babd58 8804037d53f8
[79214.485335] 08a0 8803fd8b43f0 08a0 ff84
[79214.485335] 00ff 0268 880037e73008
[79214.485335] Call Trace:
[79214.485335] [] btrfs_ioctl_snap_destroy+0x3b5/0x480
[79214.485335] [] btrfs_ioctl+0x3a2/0x10d0
[79214.485335] [] ? do_page_fault+0x254/0x4b0
[79214.485335] [] do_vfs_ioctl+0xa0/0x520
[79214.485335] [] sys_ioctl+0x4a/0x80
[79214.485335] [] system_call_fastpath+0x16/0x1b
[79214.485335] Code: 48 8d 54 92 65 e8 89 f2 00 00 48 8b 5d b9 4c 89 ef e8 4d 2c fd ff 48 89 5d c8 e9 ca fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 1f 40 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c
[79214.485335] RIP [] btrfs_unlink_subvol+0x268/0x270
[79214.485335] RSP
[79214.700401] ---[ end trace 52453f1ad38744ba ]---

Simon-

-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
warning and bug on 3.2-rc4 + for-linus from yesterday
Hello!

We recently upgraded our backup server kernel (rsync with snapshots and compression) to Linus git master from yesterday (3.2-rc4+ 09d9673d53005) that contains the btrfs for-linus as of yesterday. We've been seeing a few warnings and bugs:

[ cut here ]
WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190()
Hardware name: PowerEdge 1950
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Pid: 14299, comm: btrfs-delalloc- Tainted: G W 3.2.0-rc4-hw+ #71
Call Trace:
[] ? __set_page_dirty_nobuffers+0x17b/0x190
[] warn_slowpath_common+0x80/0xc0
[] warn_slowpath_null+0x15/0x20
[] __set_page_dirty_nobuffers+0x17b/0x190
[] compress_file_range+0x535/0x5e0
[] ? kfree+0xee/0x120
[] async_cow_start+0x30/0x50
[] worker_loop+0x173/0x530
[] ? btrfs_queue_worker+0x310/0x310
[] ? btrfs_queue_worker+0x310/0x310
[] kthread+0x96/0xb0
[] kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x190/0x190
[] ? gs_change+0x13/0x13
---[ end trace 52453f1ad38744b8 ]---

(several hours later)

[ cut here ]
kernel BUG at fs/btrfs/inode.c:1587!
invalid opcode: [#1] SMP
CPU 2
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Pid: 4477, comm: btrfs-fixup-0 Tainted: G W 3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937
RIP: 0010:[] [] btrfs_writepage_fixup_worker+0x160/0x170
RSP: 0018:88040ff1dde0 EFLAGS: 00010246
RAX: RBX: 013d6000 RCX:
RDX: 0065 RSI: 013d6000 RDI: 8800996fe8e0
RBP: 88040ff1de30 R08: 88040ff1dd34 R09: 88040ff1dda0
R10: dead00200200 R11: R12: ea000ea54840
R13: 013d6fff R14: 8800996fe9b0 R15: 8800996fe850
FS: () GS:88043fc8() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 02051c80 CR3: 000427492000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process btrfs-fixup-0 (pid: 4477, threadinfo 88040ff1c000, task 8804135b9630)
Stack:
8106b2c0 880261a9bae0 0286 8801b10103f0
880261a9bae8 880261a9bb10 880412f606c0 880412f60710
880412f606d8 88040ff1dee0 813220a3
Call Trace:
[] ? del_timer+0xd0/0xd0
[] worker_loop+0x173/0x530
[] ? btrfs_queue_worker+0x310/0x310
[] ? btrfs_queue_worker+0x310/0x310
[] kthread+0x96/0xb0
[] kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x190/0x190
[] ? gs_change+0x13/0x13
Code: 5d 41 5e 41 5f c9 c3 0f 1f 40 00 48 8d 4d d0 41 b8 50 00 00 00 4c 89 ea 48 89 de 4c 89 ff e8 f8 c1 01 00 eb ba 66 0f 1f 44 00 00 <0f> 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
RIP [] btrfs_writepage_fixup_worker+0x160/0x170
RSP
---[ end trace 52453f1ad38744b9 ]---

Simon-
[PATCH] Btrfs: fix leaked space in truncate
We were occasionally leaking space when running xfstest 269. This is because if we failed to start the transaction in the truncate loop we'd just goto out, but we need to break so that the inode is removed from the orphan list and the space is properly freed. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/inode.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d3e3ca2..ae5b354a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6581,8 +6581,9 @@ static int btrfs_truncate(struct inode *inode)
 		/* Just need the 1 for updating the inode */
 		trans = btrfs_start_transaction(root, 1);
 		if (IS_ERR(trans)) {
-			err = PTR_ERR(trans);
-			goto out;
+			ret = err = PTR_ERR(trans);
+			trans = NULL;
+			break;
 		}
 	}

-- 
1.7.5.2
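The difference between `goto out` and `break` in that loop can be modelled in plain userspace C. This is a hypothetical sketch, not the btrfs code: `struct truncate_state`, `start_txn()` and the `orphan_removed` flag are invented stand-ins for the transaction handle and for the orphan-list cleanup that runs after the loop in `btrfs_truncate()`, and a failed transaction start is modelled as `start_txn()` returning NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not btrfs types or functions. */
struct fake_txn {
    int id;
};

struct truncate_state {
    int steps_left;      /* truncate iterations still to do */
    int fail_at;         /* iteration at which starting a transaction fails, -1 = never */
    bool orphan_removed; /* set by the cleanup that follows the loop */
};

static struct fake_txn txn_storage;

/* Models btrfs_start_transaction(); NULL plays the role of an ERR_PTR. */
static struct fake_txn *start_txn(const struct truncate_state *st, int iter)
{
    if (iter == st->fail_at)
        return NULL;
    txn_storage.id = iter;
    return &txn_storage;
}

/* Shape of the fixed loop: on failure, record the error and break, so the
 * post-loop cleanup still runs. A "goto out" would jump over it. */
static int truncate_loop(struct truncate_state *st)
{
    struct fake_txn *trans = NULL;
    int err = 0;

    for (int iter = 0; st->steps_left > 0; iter++) {
        st->steps_left--;
        trans = start_txn(st, iter);
        if (!trans) {
            err = -ENOMEM; /* stand-in for PTR_ERR(trans) */
            break;
        }
    }

    /* The cleanup the old "goto out" skipped: take the inode off the
     * orphan list so its space can be released. */
    st->orphan_removed = true;
    return err;
}
```

In the real patch `trans` holds an ERR_PTR after the failure, which is why it is also reset to NULL there; in this model the failure value is already NULL.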
Re: BUG during btrfs device delete missing
On Thu, Dec 08, 2011 at 12:27:52PM -0800, David Marcin wrote:
> Hi Chris,
> This was on 3.2-rc2 but I tried with rc4 and it segfaulted again. I
> think the traces were the same but I've rebooted and can't say for
> sure.
> David
> On Thu, Dec 8, 2011 at 11:45 AM, Chris Mason wrote:
> > Which kernel is this? This looks like one I recently fixed.
> >
> > -chris
> >
> > On Thu, Dec 08, 2011 at 11:06:47AM -0800, David Marcin wrote:
> >> raid10 metadata and data filesystem. dmesg log follows. The system
> >> is unable to unmount the filesystem after this occurs.
> >>
> >> Filesystem mounted at /mnt/btrfs with -o compress,degraded
> >> Command: btrfs device delete missing /mnt/btrfs
> >>
> >> [ 283.398222] [ cut here ]
> >> [ 283.398289] kernel BUG at
> >> /home/apw/COD/linux/fs/btrfs/transaction.c:1329!

So this crash means we failed to write all the blocks required to commit the transaction. The reason is that we're getting failed bios to the missing device, and that failure isn't properly eaten by the raid aware endio code.

If you pull the top commit from my for-linus branch, it should all work. I know you've got a big FS here, I haven't tested this on raid10 yet, only raid1. If you want to wait a bit for safety I'll do a raid10 run too.

-chris
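The idea behind "eating" a failed bio in RAID-aware completion code can be illustrated with a small counting model: each copy of a mirrored write completes separately, and only the last completion decides whether the write as a whole failed. This is a hypothetical sketch assuming one bio per copy and a budget of tolerated failures; `struct mirrored_write`, `mirror_end_io()` and `max_errors` are illustrative names, not the actual btrfs endio code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a mirrored write: one bio per copy. */
struct mirrored_write {
    int pending;    /* bios still in flight */
    int errors;     /* bios that completed with an error */
    int max_errors; /* failures we can tolerate (copies - 1 for RAID1/RAID10) */
    int result;     /* status reported upward, valid once pending == 0 */
};

/* Called once per completed copy. A failed copy (e.g. to a missing device)
 * is only counted; the write fails only if too many copies failed. */
static void mirror_end_io(struct mirrored_write *mw, int err)
{
    if (err)
        mw->errors++;
    if (--mw->pending == 0)
        mw->result = (mw->errors > mw->max_errors) ? -EIO : 0;
}
```

With two copies and `max_errors = 1`, a write whose bio to the missing device fails still completes successfully, which is the behaviour the commit path needs in a degraded mount.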
[PATCH] Btrfs: fix how we do delalloc reservations and how we free reservations on error
While running xfstests 269 with some tracing, my scripts kept spitting out errors about releasing bytes that we didn't actually have reserved. This took me down a huge rabbit hole, and it turns out the way we deal with reserved_extents is wrong: we need to set it only if the reservation succeeds; otherwise the free() method will come in and unreserve space that isn't actually reserved yet, which can lead to other warnings and such. The math was all working out right in the end, but it caused all sorts of other issues, in addition to making my scripts yell and scream and generally making it impossible for me to track down the original issue I was looking for.

The other problem is with our error handling in the reservation code. There are two cases that we need to deal with:

1) We raced with free. In this case free won't free anything because csum_bytes is modified before we drop the lock in our reservation path, so free rightly doesn't release any space because the reservation code may be depending on that reservation. However, if we fail, we need the reservation side to do the free at that point, since that space is no longer in use. As it stands the code was doing this fine and it worked out, except in case #2.

2) We don't race with free. Nobody comes in and changes anything, and our reservation fails. In this case we didn't reserve anything anyway, and we just need to clean up csum_bytes but not free anything. So we keep track of csum_bytes before we drop the lock, and if it hasn't changed we know we can just decrement csum_bytes and carry on.

Because we have to drop our spin_lock to do the reservation, and can therefore race with free(), I'm going to serialize all reservations with the i_mutex. We already get this for free in the heavy-use paths (truncate and file write both hold the i_mutex); I just needed to add it to page_mkwrite and various ioctl/balance things. With this patch my space leak scripts no longer scream bloody murder.
Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/extent-tree.c |   43 ++-
 fs/btrfs/inode-map.c   |    2 ++
 fs/btrfs/inode.c       |   10 ++
 fs/btrfs/ioctl.c       |    2 ++
 fs/btrfs/relocation.c  |    2 ++
 5 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 24cfd10..6dd0406 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4189,10 +4189,15 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_block_rsv *block_rsv = &root->fs_info->delalloc_block_rsv;
 	u64 to_reserve = 0;
+	u64 csum_bytes;
 	unsigned nr_extents = 0;
+	int extra_reserve = 0;
 	int flush = 1;
 	int ret;
 
+	/* Need to be holding the i_mutex here */
+	WARN_ON(!mutex_is_locked(&inode->i_mutex));
+
 	if (btrfs_is_free_space_inode(root, inode))
 		flush = 0;
 
@@ -4205,11 +4210,9 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	BTRFS_I(inode)->outstanding_extents++;
 
 	if (BTRFS_I(inode)->outstanding_extents >
-	    BTRFS_I(inode)->reserved_extents) {
+	    BTRFS_I(inode)->reserved_extents)
 		nr_extents = BTRFS_I(inode)->outstanding_extents -
 			BTRFS_I(inode)->reserved_extents;
-		BTRFS_I(inode)->reserved_extents += nr_extents;
-	}
 
 	/*
 	 * Add an item to reserve for updating the inode when we complete the
@@ -4217,11 +4220,12 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	 */
 	if (!BTRFS_I(inode)->delalloc_meta_reserved) {
 		nr_extents++;
-		BTRFS_I(inode)->delalloc_meta_reserved = 1;
+		extra_reserve = 1;
 	}
 
 	to_reserve = btrfs_calc_trans_metadata_size(root, nr_extents);
 	to_reserve += calc_csum_metadata_size(inode, num_bytes, 1);
+	csum_bytes = BTRFS_I(inode)->csum_bytes;
 	spin_unlock(&BTRFS_I(inode)->lock);
 
 	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
@@ -4231,22 +4235,35 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 
 		spin_lock(&BTRFS_I(inode)->lock);
 		dropped = drop_outstanding_extent(inode);
-		to_free = calc_csum_metadata_size(inode, num_bytes, 0);
-		spin_unlock(&BTRFS_I(inode)->lock);
-		to_free += btrfs_calc_trans_metadata_size(root, dropped);
-
 		/*
-		 * Somebody could have come in and twiddled with the
-		 * reservation, so if we have to free more than we would have
-		 * reserved from this reservation go ahead and release those
-		 * bytes.
+		 * If the inodes csum_bytes is the same as the original
+		 * csum_bytes then
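The csum_bytes check described in the commit message can be sketched as a compare-against-snapshot rule. This is a simplified, hypothetical model: `struct inode_acct` and `undo_failed_reserve()` are invented names, the value returned is a byte count rather than the metadata size the kernel derives with calc_csum_metadata_size(), and locking is left out (the caller is assumed to hold the inode lock). It follows the two cases from the message: if csum_bytes still equals the snapshot taken before the lock was dropped, nothing raced and nothing is released; if it changed, free() ran in between and the failing reservation has to release the space itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-inode accounting; only the field the message talks about. */
struct inode_acct {
    uint64_t csum_bytes; /* bytes currently accounted for checksum metadata */
};

/*
 * Error path after a failed reservation. "snapshot" is csum_bytes as it was
 * right before the lock was dropped to reserve space. Returns how many csum
 * bytes' worth of reservation this path must release itself.
 */
static uint64_t undo_failed_reserve(struct inode_acct *ia, uint64_t snapshot,
                                    uint64_t num_bytes)
{
    int raced_with_free = (ia->csum_bytes != snapshot);

    /* In both cases our own contribution to csum_bytes is rolled back. */
    ia->csum_bytes -= num_bytes;

    /* Case 1: free() changed csum_bytes but deliberately released nothing,
     * so the space is released here. Case 2: nothing was reserved, so
     * there is nothing to release. */
    return raced_with_free ? num_bytes : 0;
}
```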
[PATCH 3/3] Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters.

Signed-off-by: Stefan Behrens
---
 fs/btrfs/ctree.h       |   51
 fs/btrfs/disk-io.c     |    7 ++
 fs/btrfs/print-tree.c  |    3 +
 fs/btrfs/transaction.c |    4 +
 fs/btrfs/volumes.c     |  205
 fs/btrfs/volumes.h     |    9 ++
 6 files changed, 279 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 89fab53..f5e2429 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -750,6 +750,26 @@ struct btrfs_csum_item {
 	u8 csum;
 } __attribute__ ((__packed__));
 
+struct btrfs_device_stats_item {
+	/*
+	 * grow this item struct at the end for future enhancements and keep
+	 * the existing values unchanged
+	 */
+	__le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
+	__le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
+	__le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
+
+	/* stats for indirect indications for I/O failures */
+	__le64 cnt_corruption_errs; /* checksum error, bytenr error or
+				     * contents is illegal: this is an
+				     * indication that the block was damaged
+				     * during read or write, or written to
+				     * wrong location or read from wrong
+				     * location */
+	__le64 cnt_generation_errs; /* an indication that blocks have not
+				     * been written */
+} __attribute__ ((__packed__));
+
 /* different types of block groups (and chunks) */
 #define BTRFS_BLOCK_GROUP_DATA		(1 << 0)
 #define BTRFS_BLOCK_GROUP_SYSTEM	(1 << 1)
@@ -1388,6 +1408,12 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_CHUNK_ITEM_KEY	228
 
 /*
+ * Persistantly stores the io stats in the device tree.
+ * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid).
+ */
+#define BTRFS_DEVICE_STATS_KEY	248
+
+/*
  * string items are for debugging.  They just store a short string of
  * data in the FS
  */
@@ -2202,6 +2228,31 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
 	return btrfs_item_size(eb, e) - offset;
 }
 
+/* btrfs_device_stats_item */
+BTRFS_SETGET_FUNCS(device_stats_cnt_write_io_errs,
+		   struct btrfs_device_stats_item, cnt_write_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_read_io_errs,
+		   struct btrfs_device_stats_item, cnt_read_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_flush_io_errs,
+		   struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_corruption_errs,
+		   struct btrfs_device_stats_item, cnt_corruption_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_generation_errs,
+		   struct btrfs_device_stats_item, cnt_generation_errs, 64);
+
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_write_io_errs,
+			 struct btrfs_device_stats_item, cnt_write_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_read_io_errs,
+			 struct btrfs_device_stats_item, cnt_read_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_flush_io_errs,
+			 struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_corruption_errs,
+			 struct btrfs_device_stats_item, cnt_corruption_errs,
+			 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_generation_errs,
+			 struct btrfs_device_stats_item, cnt_generation_errs,
+			 64);
+
 static inline struct btrfs_root *btrfs_sb(struct super_block *sb)
 {
 	return sb->s_fs_info;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b0f2a37..cac8f51 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2321,6 +2321,13 @@ retry_root_backup:
 	fs_info->metadata_alloc_profile = (u64)-1;
 	fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
 
+	ret = btrfs_init_device_stats(fs_info);
+	if (ret) {
+		printk(KERN_ERR "btrfs: failed to init device_stats: %d\n",
+		       ret);
+		goto fail_block_groups;
+	}
+
 	ret = btrfs_init_space_info(fs_info);
 	if (ret) {
 		printk(KERN_ERR "Failed to initial space info: %d\n", ret);
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index f38e452..a9e45e4 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -294,6 +294,9 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffe
[PATCH 1/3] Btrfs: add device counters for detected IO and checksum errors
The goal is to detect when drives start to show an increased error rate, which indicates that they should be replaced soon. Therefore, statistics counters are added that count IO errors (read, write and flush). Additionally, software-detected errors such as checksum errors and corrupted blocks are counted.

Signed-off-by: Stefan Behrens
---
 fs/btrfs/disk-io.c   |   18 +++---
 fs/btrfs/extent_io.c |   27 -
 fs/btrfs/scrub.c     |   52 +++---
 fs/btrfs/volumes.c   |   61 +++--
 fs/btrfs/volumes.h   |   21 +
 5 files changed, 161 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89094ee..b0f2a37 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2511,18 +2511,24 @@ recovery_tree_root:
 
 static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 {
-	char b[BDEVNAME_SIZE];
-
 	if (uptodate) {
 		set_buffer_uptodate(bh);
 	} else {
+		struct btrfs_device *device = (struct btrfs_device *)
+			(((uintptr_t) bh->b_private) & ~((uintptr_t) 1));
+		unsigned int with_flush = ((uintptr_t) bh->b_private) & 1;
+
 		printk_ratelimited(KERN_WARNING "lost page write due to "
-				   "I/O error on %s\n",
-				   bdevname(bh->b_bdev, b));
+				   "I/O error on %s\n", device->name);
 		/* note, we dont' set_buffer_write_io_error because we have
 		 * our own ways of dealing with the IO errors
 		 */
 		clear_buffer_uptodate(bh);
+		btrfs_device_stat_inc(&device->cnt_write_io_errs);
+		if (with_flush)
+			btrfs_device_stat_inc(&device->cnt_flush_io_errs);
+		device->device_stats_dirty = 1;
+		btrfs_device_stat_print_on_error(device);
 	}
 	unlock_buffer(bh);
 	put_bh(bh);
@@ -2637,6 +2643,7 @@ static int write_dev_supers(struct btrfs_device *device,
 			set_buffer_uptodate(bh);
 			lock_buffer(bh);
 			bh->b_end_io = btrfs_end_buffer_write_sync;
+			bh->b_private = device;
 		}
 
 		/*
@@ -2695,6 +2702,9 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
 	}
 	if (!bio_flagged(bio, BIO_UPTODATE)) {
 		ret = -EIO;
+		btrfs_device_stat_inc(&device->cnt_flush_io_errs);
+		device->device_stats_dirty = 1;
+		btrfs_device_stat_print_on_error(device);
 	}
 
 	/* drop the reference from the wait == 0 run */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7609d28..566d262 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1894,6 +1894,9 @@ int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
 	if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
 		/* try to remap that extent elsewhere? */
 		bio_put(bio);
+		btrfs_device_stat_inc(&dev->cnt_write_io_errs);
+		dev->device_stats_dirty = 1;
+		btrfs_device_stat_print_on_error(dev);
 		return -EIO;
 	}
 
@@ -2280,10 +2283,30 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
 			ret = tree->ops->readpage_end_io_hook(page, start, end,
 							      state);
-			if (ret)
+			if (ret) {
+				/* no IO indicated but software detected errors
+				 * in the block, either checksum errros or
+				 * issues with the contents */
+				int failed_mirror = (int)(uintptr_t)
+							bio->bi_bdev;
+				struct btrfs_root *root =
+					BTRFS_I(page->mapping->host)->root;
+				struct btrfs_device *device;
+
 				uptodate = 0;
-			else
+				device = btrfs_find_device_for_logical(
+						root, start,
+						(int)failed_mirror);
+				if (device) {
+					btrfs_device_stat_inc(
+						&device->cnt_corruption_errs);
+					device->device_stats_dirty = 1;
+					btrfs_device_stat_pri
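The new `btrfs_end_buffer_write_sync()` code recovers two values from `bh->b_private`: the device pointer (bit 0 masked off) and a `with_flush` flag (bit 0 itself). That only works because the pointed-to object is at least 2-byte aligned, so bit 0 of its address is always free. The sketch below shows the tagging and untagging in isolation; `struct fake_device`, `tag_ptr()`, `untag_ptr()`, `ptr_flag()` and `on_write_error()` are hypothetical helpers, and since the visible hunk of `write_dev_supers()` stores only the untagged device pointer, where the patch sets the flag bit is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device record standing in for struct btrfs_device. */
struct fake_device {
    const char *name;
    unsigned long write_errs;
    unsigned long flush_errs;
};

/* Pack a pointer and a 1-bit flag into one void *. Relies on the object
 * being at least 2-byte aligned, so bit 0 of its address is always 0. */
static void *tag_ptr(struct fake_device *dev, unsigned int flag)
{
    assert(((uintptr_t)dev & 1) == 0);
    return (void *)((uintptr_t)dev | (flag & 1u));
}

static struct fake_device *untag_ptr(void *priv)
{
    return (struct fake_device *)((uintptr_t)priv & ~(uintptr_t)1);
}

static unsigned int ptr_flag(void *priv)
{
    return (unsigned int)((uintptr_t)priv & 1);
}

/* Same decision as the error branch in the patch: every failed write counts
 * as a write error, and also as a flush error if the flag bit was set. */
static void on_write_error(void *priv)
{
    struct fake_device *dev = untag_ptr(priv);

    dev->write_errs++;
    if (ptr_flag(priv))
        dev->flush_errs++;
}
```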
[PATCH 2/3] Btrfs: add ioctl to get and reset the device stats
An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters.

Signed-off-by: Stefan Behrens
---
 fs/btrfs/ioctl.c   |   26 +++
 fs/btrfs/ioctl.h   |   27
 fs/btrfs/volumes.c |   69
 fs/btrfs/volumes.h |   13 ++
 4 files changed, 135 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 72d4616..bce3f92 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2891,6 +2891,28 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
 	return ret;
 }
 
+static long btrfs_ioctl_get_device_stats(struct btrfs_root *root,
+					 void __user *arg, int reset_after_read)
+{
+	struct btrfs_ioctl_get_device_stats *sa;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	sa = memdup_user(arg, sizeof(*sa));
+	if (IS_ERR(sa))
+		return PTR_ERR(sa);
+
+	ret = btrfs_get_device_stats(root, sa, reset_after_read);
+
+	if (copy_to_user(arg, sa, sizeof(*sa)))
+		ret = -EFAULT;
+
+	kfree(sa);
+	return ret;
+}
+
 static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
 {
 	int ret = 0;
@@ -3108,6 +3130,10 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_scrub_cancel(root, argp);
 	case BTRFS_IOC_SCRUB_PROGRESS:
 		return btrfs_ioctl_scrub_progress(root, argp);
+	case BTRFS_IOC_GET_DEVICE_STATS:
+		return btrfs_ioctl_get_device_stats(root, argp, 0);
+	case BTRFS_IOC_GET_AND_RESET_DEVICE_STATS:
+		return btrfs_ioctl_get_device_stats(root, argp, 1);
 	}
 
 	return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 252ae99..b9ffd0b 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -217,6 +217,29 @@ struct btrfs_ioctl_logical_ino_args {
 	__u64				inodes;
 };
 
+#define BTRFS_IOCTL_GET_DEVICE_STATS_MAX_NR_ITEMS	5
+struct btrfs_ioctl_get_device_stats {
+	__u64 devid;				/* in */
+	__u64 nr_items;				/* in/out */
+
+	/* out values: */
+
+	/* disk I/O failure stats */
+	__u64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
+	__u64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
+	__u64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
+
+	/* stats for indirect indications for I/O failures */
+	__u64 cnt_corruption_errs; /* checksum error, bytenr error or
+				    * contents is illegal: this is an
+				    * indication that the block was damaged
+				    * during read or write, or written to
+				    * wrong location or read from wrong
+				    * location */
+	__u64 cnt_generation_errs; /* an indication that blocks have not
+				    * been written */
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
 				   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -276,5 +299,9 @@ struct btrfs_ioctl_logical_ino_args {
 					struct btrfs_ioctl_ino_path_args)
 #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \
 					struct btrfs_ioctl_ino_path_args)
+#define BTRFS_IOC_GET_DEVICE_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
+				struct btrfs_ioctl_get_device_stats)
+#define BTRFS_IOC_GET_AND_RESET_DEVICE_STATS _IOWR(BTRFS_IOCTL_MAGIC, 53, \
+				struct btrfs_ioctl_get_device_stats)
 
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cc21e14..99dfd00 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3852,3 +3852,72 @@ void btrfs_device_stat_print_on_error(struct btrfs_device *device)
 			       btrfs_device_stat_read(
 					&device->cnt_generation_errs));
 }
+
+int btrfs_get_device_stats(struct btrfs_root *root,
+			   struct btrfs_ioctl_get_device_stats *stats,
+			   int reset_after_read)
+{
+	struct btrfs_device *dev;
+	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	dev = btrfs_find_device(root, stats->devid, NULL, NULL);
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	if (!dev) {
+		printk(KERN_WARNING
+		       "btrfs: get device_stats failed, device not found\n");
+		return -ENODEV;
+	} else if (reset_after_read) {
+		if (
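The get-and-reset ioctl is meant to return each counter and zero it atomically, so an error counted between the read and the reset is not lost. A hypothetical userspace model of that idea using C11 atomics, where `atomic_exchange()` returns the old value and stores 0 in one step: `struct dev_counters`, `struct get_dev_stats_req` and `fill_dev_stats()` are invented names. The request keeps the patch's `devid`/`nr_items` header but, as a simplification, holds the five counters in an array instead of named fields, and it assumes `nr_items` means "slots the caller provided" on input and "slots filled" on output.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_DEV_STATS 5 /* matches BTRFS_IOCTL_GET_DEVICE_STATS_MAX_NR_ITEMS */

/* Simplified request: same header as the patch, counters as an array. */
struct get_dev_stats_req {
    uint64_t devid;                /* in */
    uint64_t nr_items;             /* in: slots provided, out: slots filled */
    uint64_t values[NR_DEV_STATS]; /* out */
};

/* Hypothetical in-memory counters for one device. */
struct dev_counters {
    _Atomic uint64_t cnt[NR_DEV_STATS];
};

/* Copy up to nr_items counters; with reset_after_read, each returned counter
 * is read and zeroed in a single atomic step. Counters beyond nr_items are
 * neither returned nor reset in this sketch. */
static void fill_dev_stats(struct dev_counters *dev,
                           struct get_dev_stats_req *req, int reset_after_read)
{
    uint64_t n = req->nr_items < NR_DEV_STATS ? req->nr_items : NR_DEV_STATS;

    for (uint64_t i = 0; i < n; i++)
        req->values[i] = reset_after_read ? atomic_exchange(&dev->cnt[i], 0)
                                          : atomic_load(&dev->cnt[i]);
    req->nr_items = n;
}
```

A separate load followed by a store of 0 would leave a window in which a concurrent increment is overwritten; the exchange closes that window.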
[PATCH 0/3] Btrfs: add IO error device stats
The goal is to detect when drives start to show an increased error rate, which indicates that they should be replaced soon. Therefore, statistics counters are added that count IO errors (read, write and flush). Additionally, software-detected errors such as checksum errors and corrupted blocks are counted.

An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters.

The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters.

A patch for the btrfs-progs world will also be sent.

The patches are based on v3.1-161-gf4a8e65 (btrfs pull request from 12/1/2011).

Stefan Behrens (3):
  Btrfs: add device counters for detected IO and checksum errors
  Btrfs: add ioctl to get and reset the device stats
  Btrfs: read device stats on mount, write modified ones during commit

 fs/btrfs/ctree.h       |   51
 fs/btrfs/disk-io.c     |   25 +++-
 fs/btrfs/extent_io.c   |   27 -
 fs/btrfs/ioctl.c       |   26
 fs/btrfs/ioctl.h       |   27
 fs/btrfs/print-tree.c  |    3 +
 fs/btrfs/scrub.c       |   52 ++--
 fs/btrfs/transaction.c |    4 +
 fs/btrfs/volumes.c     |  335 +++-
 fs/btrfs/volumes.h     |   43 ++
 10 files changed, 575 insertions(+), 18 deletions(-)

-- 
1.7.3.4
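The persisted form of these counters is the packed `btrfs_device_stats_item` from patch 3/3: a run of little-endian 64-bit values whose comment asks that the struct only ever grow at the end. The sketch below is a hypothetical userspace encoder/decoder for that layout (the enum, `put_le64()`, `get_le64()`, `encode_stats()` and `decode_stats()` are invented names). It assumes that a reader faced with a shorter, older item treats the missing trailing counters as zero, which is one way the grow-at-the-end rule keeps old items readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order of the proposed btrfs_device_stats_item. */
enum {
    STAT_WRITE_IO,
    STAT_READ_IO,
    STAT_FLUSH_IO,
    STAT_CORRUPTION,
    STAT_GENERATION,
    NR_STATS
};

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i)); /* least significant byte first */
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Serialize all counters; returns the item size in bytes. */
static size_t encode_stats(uint8_t *item, const uint64_t cnt[NR_STATS])
{
    for (int i = 0; i < NR_STATS; i++)
        put_le64(item + 8 * i, cnt[i]);
    return NR_STATS * 8;
}

/* Read as many counters as the item holds; fields missing from an older,
 * shorter item are treated as zero. */
static void decode_stats(const uint8_t *item, size_t item_size,
                         uint64_t cnt[NR_STATS])
{
    for (int i = 0; i < NR_STATS; i++)
        cnt[i] = (item_size >= 8 * (size_t)(i + 1)) ? get_le64(item + 8 * i) : 0;
}
```

Encoding byte by byte keeps the on-disk format independent of host endianness, which is what `__le64` expresses in the kernel struct.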
[PATCH] Btrfs-progs: add command to get/reset device stats via ioctl
"btrfs device stats" is used to retrieve and print the device stats. "btrfs device stats -z" is used to atomically retrieve, reset and print the stats.

Signed-off-by: Stefan Behrens
---
 Makefile     |    4 +-
 btrfs.c      |    5 ++
 btrfs_cmds.c |   67 +
 btrfs_cmds.h |    5 ++
 ctree.h      |    6 +++
 devstats.c   |  131 ++
 ioctl.h      |   28
 print-tree.c |    7 +++
 scrub.c      |   74 +---
 9 files changed, 254 insertions(+), 73 deletions(-)

diff --git a/Makefile b/Makefile
index eeb92ad..c7ad82b 100644
--- a/Makefile
+++ b/Makefile
@@ -36,8 +36,8 @@ all: version $(progs) manpages
 version:
 	bash version.sh
 
-btrfs: $(objects) btrfs.o btrfs_cmds.o scrub.o
-	$(CC) $(CFLAGS) -o btrfs btrfs.o btrfs_cmds.o scrub.o \
+btrfs: $(objects) btrfs.o btrfs_cmds.o scrub.o devstats.o
+	$(CC) $(CFLAGS) -o btrfs btrfs.o btrfs_cmds.o scrub.o devstats.o \
 		$(objects) $(LDFLAGS) $(LIBS) -lpthread
 
 calc-size: $(objects) calc-size.o
diff --git a/btrfs.c b/btrfs.c
index 1def354..078729a 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -159,6 +159,11 @@ static struct Command commands[] = {
 		"filesystem.",
 		NULL
 	},
+	{ do_device_stats, -1,
+	  "device stats", "[-z] |\n"
+		"Show current device IO stats. -z to reset stats afterwards.",
+	  NULL
+	},
 	{ do_add_volume, -2,
 	  "device add", " [...] \n"
 		"Add a device to a filesystem.",
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index b59e9cb..065e103 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -117,6 +117,73 @@ int open_file_or_dir(const char *fname)
 	return fd;
 }
 
+int get_device_info(int fd, u64 devid,
+		    struct btrfs_ioctl_dev_info_args *di_args)
+{
+	int ret;
+
+	di_args->devid = devid;
+	memset(&di_args->uuid, '\0', sizeof(di_args->uuid));
+
+	ret = ioctl(fd, BTRFS_IOC_DEV_INFO, di_args);
+	return ret ? -errno : 0;
+}
+
+int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args,
+		struct btrfs_ioctl_dev_info_args **di_ret)
+{
+	int ret = 0;
+	int ndevs = 0;
+	int i = 1;
+	struct btrfs_fs_devices *fs_devices_mnt = NULL;
+	struct btrfs_ioctl_dev_info_args *di_args;
+	char mp[BTRFS_PATH_NAME_MAX + 1];
+
+	memset(fi_args, 0, sizeof(*fi_args));
+
+	ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args);
+	if (ret && (errno == EINVAL || errno == ENOTTY)) {
+		/* path is not a mounted btrfs. Try if it's a device */
+		ret = check_mounted_where(fd, path, mp, sizeof(mp),
+					  &fs_devices_mnt);
+		if (!ret)
+			return -EINVAL;
+		if (ret < 0)
+			return ret;
+		fi_args->num_devices = 1;
+		fi_args->max_id = fs_devices_mnt->latest_devid;
+		i = fs_devices_mnt->latest_devid;
+		memcpy(fi_args->fsid, fs_devices_mnt->fsid, BTRFS_FSID_SIZE);
+		close(fd);
+		fd = open_file_or_dir(mp);
+		if (fd < 0)
+			return -errno;
+	} else if (ret) {
+		return -errno;
+	}
+
+	if (!fi_args->num_devices)
+		return 0;
+
+	di_args = *di_ret = malloc(fi_args->num_devices * sizeof(*di_args));
+	if (!di_args)
+		return -errno;
+
+	for (; i <= fi_args->max_id; ++i) {
+		BUG_ON(ndevs >= fi_args->num_devices);
+		ret = get_device_info(fd, i, &di_args[ndevs]);
+		if (ret == -ENODEV)
+			continue;
+		if (ret)
+			return ret;
+		ndevs++;
+	}
+
+	BUG_ON(ndevs == 0);
+
+	return 0;
+}
+
 static u64 parse_size(char *s)
 {
 	int len = strlen(s);
diff --git a/btrfs_cmds.h b/btrfs_cmds.h
index 81182b1..6be9cc5 100644
--- a/btrfs_cmds.h
+++ b/btrfs_cmds.h
@@ -41,4 +41,9 @@ int do_change_label(int argc, char **argv);
 int open_file_or_dir(const char *fname);
 int do_ino_to_path(int nargs, char **argv);
 int do_logical_to_ino(int nargs, char **argv);
+int do_device_stats(int nargs, char **argv);
+int get_device_info(int fd, u64 devid,
+		    struct btrfs_ioctl_dev_info_args *di_args);
+int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args,
+		struct btrfs_ioctl_dev_info_args **di_ret);
 char *path_for_root(int fd, u64 root);
diff --git a/ctree.h b/ctree.h
index 54748c8..12a0603 100644
--- a/ctree.h
+++ b/ctree.h
@@ -912,6 +912,12 @@ struct btrfs_root {
 #define BTRFS_CHUNK_ITEM_KEY	228
 
 /*
+ * Persistantly stores the io stats in the device tree.
+ * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid).
+ */
+#define BTRFS_DEVICE_STATS_KEY	248
+
+/*
  *
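`get_fs_info()` walks device IDs from `i` up to `max_id` and skips IDs that report -ENODEV, because device IDs can have holes (for example after a device has been deleted). The sketch below isolates that loop in self-contained C. `lookup_devid()` is a hypothetical stand-in for `get_device_info()` backed by a plain array, `collect_devids()` is an invented name, and where the patch uses BUG_ON() for "more devices than expected" this sketch returns -EOVERFLOW instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for get_device_info(): 0 if devid exists,
 * -ENODEV for a hole in the devid space. */
static int lookup_devid(const uint64_t *present, size_t n_present, uint64_t devid)
{
    for (size_t i = 0; i < n_present; i++)
        if (present[i] == devid)
            return 0;
    return -ENODEV;
}

/* Collect up to num_devices existing devids in [first, max_id]. On success
 * the caller owns *out and must free() it. */
static int collect_devids(const uint64_t *present, size_t n_present,
                          uint64_t first, uint64_t max_id, uint64_t num_devices,
                          uint64_t **out, uint64_t *n_out)
{
    uint64_t *ids;
    uint64_t n = 0;

    if (num_devices == 0) {
        *out = NULL;
        *n_out = 0;
        return 0;
    }
    ids = malloc(num_devices * sizeof(*ids));
    if (!ids)
        return -ENOMEM;

    for (uint64_t id = first; id <= max_id; id++) {
        int ret = lookup_devid(present, n_present, id);

        if (ret == -ENODEV)
            continue; /* hole left by a removed device */
        if (ret || n >= num_devices) {
            free(ids);
            return ret ? ret : -EOVERFLOW; /* the patch BUG()s on overflow */
        }
        ids[n++] = id;
    }
    *out = ids;
    *n_out = n;
    return 0;
}
```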
Re: [PATCH 02/20] Btrfs: initialize new bitmaps' list
2011/12/7 Christian Brunner:
> 2011/12/1 Christian Brunner:
>> 2011/12/1 Alexandre Oliva:
>>> On Nov 29, 2011, Christian Brunner wrote:
>>> When I'm doing havy reading in our ceph cluster. The load and wait-io
>>> on the patched servers is higher than on the unpatched ones.
>>>
>>> That's unexpected.
>
> In the mean time I know, that it's not related to the reads.
>
>>> I suppose I could wave my hands while explaining that you're getting
>>> higher data throughput, so it's natural that it would take up more
>>> resources, but that explanation doesn't satisfy me. I suppose
>>> allocation might have got slightly more CPU intensive in some cases, as
>>> we now use bitmaps where before we'd only use the cheaper-to-allocate
>>> extents. But that's unsafisfying as well.
>>
>> I must admit, that I do not completely understand the difference
>> between bitmaps and extents.
>>
>> From what I see on my servers, I can tell, that the degradation over
>> time is gone. (Rebooting the servers every day is no longer needed.
>> This is a real plus.) But the performance compared to a freshly
>> booted, unpatched server is much slower with my ceph workload.
>>
>> I wonder if it would make sense to initialize the list field only,
>> when the cluster setup fails? This would avoid the fallback to the
>> much unclustered allocation and would give us the cheaper-to-allocate
>> extents.
>
> I've now tried various combinations of you patches and I can really
> nail it down to this one line.
>
> With this patch applied I get much higher write-io values than without
> it. Some of the other patches help to reduce the effect, but it's
> still significant.
>
> iostat on an unpatched node is giving me:
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda      105.90    0.37  15.42  14.48  2657.33  560.13    107.61      1.89   62.75   6.26  18.71
>
> while on a node with this patch it's
> sda      128.20    0.97  11.10  57.15  3376.80  552.80     57.58     20.58  296.33   4.16  28.36
>
> Also interesting, is the fact that the average request size on the
> patched node is much smaller.
>
> Josef was telling me, that this could be related to the number of
> bitmaps we write out, but I've no idea how to trace this.
>
> I would be very happy if someone could give me a hint on what to do
> next, as this is one of the last remaining issues with our ceph
> cluster.

This is still bugging me, and I just remembered something that might be helpful. (I also hope that this is not misleading...)

Back in 2.6.38 we were running ceph without btrfs performance degradation. I found a thread on the list where similar problems were reported:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10346.html

In that thread someone bisected the issue to

From 4e69b598f6cfb0940b75abf7e179d6020e94ad1e Mon Sep 17 00:00:00 2001
From: Josef Bacik
Date: Mon, 21 Mar 2011 10:11:24 -0400
Subject: [PATCH] Btrfs: cleanup how we setup free space clusters

In this commit the bitmap handling was changed, so I thought that this may be related.

I'm still hoping that someone with a deeper understanding of btrfs could take a look at this.

Thanks,
Christian
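The `avgrq-sz` column in these iostat rows can be recomputed from the other columns: it is sectors transferred per second divided by requests per second, with 512-byte sectors. The helpers below are a hypothetical sketch (the struct and function names are invented) that reproduces both rows and also isolates the write side, which is where the two nodes differ.

```c
#include <assert.h>

/* The iostat -x columns needed here (sector-based, 512-byte sectors). */
struct iostat_row {
    double r_s, w_s;       /* read/write requests per second */
    double rsec_s, wsec_s; /* sectors read/written per second */
};

/* avgrq-sz: average request size in sectors across reads and writes. */
static double avgrq_sz(const struct iostat_row *row)
{
    double reqs = row->r_s + row->w_s;

    return reqs > 0 ? (row->rsec_s + row->wsec_s) / reqs : 0.0;
}

/* Average size of write requests only, in sectors. */
static double avg_write_sectors(const struct iostat_row *row)
{
    return row->w_s > 0 ? row->wsec_s / row->w_s : 0.0;
}
```

With the posted numbers, w/s roughly quadruples (14.48 to 57.15) while wsec/s stays almost flat (560.13 vs 552.80), so the average write falls from about 38.7 sectors (roughly 19 KiB) to about 9.7 sectors (roughly 4.8 KiB): the patched node writes about the same amount of data in about four times as many, much smaller requests.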
Product Order
Hello, I am Manager of SIMKINS LTD. USA, My company is interested in the purchase of your products. Kindly send me an email with details of: *Minimum Order Quantity *Your delivery time *Payment terms *And your products warranty I await to hear from you urgently Mr Stefan Al Simkins. Purchasing Manager. SIMKINS LTD
Re: Btrfs switches to using mostly one thread
On 09/12/11 14:18, Chris Mason wrote:
> According to this you've only got one delalloc worker. That would
> explain it. Could you please confirm with ps?

Yes - only one delalloc worker is now present, but there were at least
three initially.

Jeremy
Re: Btrfs switches to using mostly one thread
On Fri, Dec 09, 2011 at 12:05:40PM +, Jeremy Sanders wrote:
> On 08/12/11 20:11, Chris Mason wrote:
> >On Thu, Dec 08, 2011 at 05:39:16PM +, Jeremy Sanders wrote:
> >>On 08/12/11 17:23, Chris Mason wrote:
> >>>On Thu, Dec 08, 2011 at 04:57:12PM +, Jeremy Sanders wrote:
> >>>>On 08/12/11 15:32, Chris Mason wrote:
> >>>>>On Thu, Dec 08, 2011 at 03:19:38PM +, Jeremy Sanders wrote:
> >>>>>>Hi - I'm trying out btrfs again, and I see the same old bug in kernel
> >>>>>>3.1.4 (Fedora 16, x86_64, dual-core), where after a few hours of
> >>>>>>writing, it switches from writing with several threads to writing
> >>>>>>with one:
> >>>>>
> >>>>>Ok, I'll try to reproduce this here. Could you please do a sysrq-t, I'd
> >>>>>like to see what the other delalloc-writers are doing.
> >>>>
> >>>>I've attached sysrq-t. It looks like it might be truncated at the
> >>>>beginning, however.
> >>>
> >>>/var/log/messages may have the whole thing, please do check.
> >>
> >>That was from /var/log/messages. I think it needs a longer
> >>log_buf_len. Unfortunately the system hasn't come back from its
> >>reboot, so it will have to wait until tomorrow when I can get to it
> >>physically.
> >
> >Ok, this trace shows that we have tar sitting in balance_dirty_pages and
> >we have the single delalloc worker doing requests. The other delalloc
> >workers don't show up at all.
> >
> >So either they are earlier in the trace or they disappeared somehow.
> >I'll definitely need the full trace if you can send it.
>
> I've got the full trace now. It's pretty big (430KB), so I've put it
> on the web.
>
> Here's the state before switching to one thread
> http://www-xray.ast.cam.ac.uk/~jss/data/btrfs-before.txt
>
> Here it is after it has switched to one thread:
> http://www-xray.ast.cam.ac.uk/~jss/data/btrfs-after.txt

According to this you've only got one delalloc worker. That would
explain it. Could you please confirm with ps?

You might be hitting a problem Josef sent patches for; I'll dig in.
-chris
Re: WARNING: at fs/btrfs/extent-tree.c:4754 followed by BUG: unable to handle kernel NULL pointer dereference at (null)
Hello!

2011/12/8 Jan Schmidt :
> On 07.12.2011 21:40, Kai Krakow wrote:
> [...]
>> The problematic file seems to be in /usr/portage but scrubbing doesn't tell
>> me the filename (I was under the impression 3.2.x adds a patch which should
>> report filenames).
>
> It should. Did you take a look at dmesg output after scrubbing? If it
> doesn't contain a hint on the file or block, please paste what you get.

I watched dmesg while scrubbing. Nothing there. To paste what I got I need
to find a way to make my 3.2-rc4 system boot again (without freezing due to
services and background jobs touching certain parts of the broken
filesystem) or create a 3.2 rescue system...

>> Every time I run "emerge" (it is a gentoo system) my
>> screen goes black after a few seconds and I can only revert to using ssh.
>>
>> Problem is: As soon as this happens, some filesystem accesses block the
>> process in disk state, it cannot be killed. This initiates some feedback
>> loop: From now on any other process trying to access the FS freezes. I can
>> only reisub now. It seems to be fine if data comes from cache instead from
>> disk.
>
> Please try to grab sysrq+w output in this state.

I tried; nothing there. I wondered why... This changed between 3.1 and
3.2. There is probably no blocking process because it got killed by the
kernel. The next process accessing the filesystem blocks (and does not get
killed). I will try to get a sysrq+w from this situation via ssh and copy
and paste dmesg somewhere, but it will be difficult because usually ssh
communication freezes, too.

Maybe related: when the system was still running I sometimes saw it use
100% CPU on one or two cores. Looking at "top" I could not see a process or
kernel thread using the CPU, but I saw the CPU usage distributed across
SYS%, WA% and USER%... This effect could only be resolved by rebooting. It
occurs with both kernel 3.1 and 3.2, but much less often with 3.2.
However, even nice'd processes were still able to acquire 100% CPU usage
per core, so it didn't have any effect on system performance.

I think I even made my situation worse... In an attempt to get the error
fixed, I deleted and recreated the subvolume with /usr/portage (its content
is easily restorable from the internet). On the next reboot the btrfs
cleaner kernel thread spat out a lot of errors and traces into dmesg, and
the system froze some minutes later, so I couldn't save the output. Now I
cannot reliably boot, and btrfs has problems accessing files all over the
filesystem, even in subvolumes that worked fine before. I thought
subvolumes were clearly separated from each other? Now I have at least 3
different classes of error messages instead of only 1 single error.
Josef's repair program fails an assertion and cannot continue on the
volume.

I think in order to stabilize btrfs it is important to make it handle
structure errors gracefully, and then invest in some repair utility. I'd
like to contribute, but at some point I will need to get my system back
into a stable state and will recreate my filesystem from scratch.

Mounting the fs read-only allows me to access all parts of the filesystem
without problems. I still see errors in dmesg but no kernel bugs or
warnings with traces.

Regards,
Kai
Re: Btrfs switches to using mostly one thread
On 09/12/11 12:15, Arne Jansen wrote:
> On 08.12.2011 16:19, Jeremy Sanders wrote:
>> Hi - I'm trying out btrfs again, and I see the same old bug in kernel 3.1.4
>> (Fedora 16, x86_64, dual-core), where after a few hours of writing, it
>> switches from writing with several threads to writing with one:
>
> How many disks does the fs have?

One - it's writing onto a "linear" MD array (for testing purposes). I
disabled duplication of metadata as well, and zlib compression is forced.

Jeremy
Re: Btrfs switches to using mostly one thread
On 08.12.2011 16:19, Jeremy Sanders wrote:
> Hi - I'm trying out btrfs again, and I see the same old bug in kernel 3.1.4
> (Fedora 16, x86_64, dual-core), where after a few hours of writing, it
> switches from writing with several threads to writing with one:

How many disks does the fs have?

-Arne
Re: Btrfs switches to using mostly one thread
On 08/12/11 20:11, Chris Mason wrote:
>On Thu, Dec 08, 2011 at 05:39:16PM +, Jeremy Sanders wrote:
>>On 08/12/11 17:23, Chris Mason wrote:
>>>On Thu, Dec 08, 2011 at 04:57:12PM +, Jeremy Sanders wrote:
>>>>On 08/12/11 15:32, Chris Mason wrote:
>>>>>On Thu, Dec 08, 2011 at 03:19:38PM +, Jeremy Sanders wrote:
>>>>>>Hi - I'm trying out btrfs again, and I see the same old bug in kernel
>>>>>>3.1.4 (Fedora 16, x86_64, dual-core), where after a few hours of
>>>>>>writing, it switches from writing with several threads to writing
>>>>>>with one:
>>>>>
>>>>>Ok, I'll try to reproduce this here. Could you please do a sysrq-t, I'd
>>>>>like to see what the other delalloc-writers are doing.
>>>>
>>>>I've attached sysrq-t. It looks like it might be truncated at the
>>>>beginning, however.
>>>
>>>/var/log/messages may have the whole thing, please do check.
>>
>>That was from /var/log/messages. I think it needs a longer
>>log_buf_len. Unfortunately the system hasn't come back from its
>>reboot, so it will have to wait until tomorrow when I can get to it
>>physically.
>
>Ok, this trace shows that we have tar sitting in balance_dirty_pages and
>we have the single delalloc worker doing requests. The other delalloc
>workers don't show up at all.
>
>So either they are earlier in the trace or they disappeared somehow.
>I'll definitely need the full trace if you can send it.

I've got the full trace now. It's pretty big (430KB), so I've put it on
the web.

Here's the state before switching to one thread
http://www-xray.ast.cam.ac.uk/~jss/data/btrfs-before.txt

Here it is after it has switched to one thread:
http://www-xray.ast.cam.ac.uk/~jss/data/btrfs-after.txt

Jeremy
[PATCH] btrfs: keep orphans for subvolume deletion
Since we have the free space caches, btrfs_orphan_cleanup also runs for
the tree_root. Unfortunately this also cleans up the orphans used to mark
subvol deletions in progress.
Currently if a subvol deletion gets interrupted twice by umount/mount, the
deletion will not be continued and the space permanently lost, though it
would be possible to write a tool to recover those lost subvol deletions.
This patch checks if the orphan belongs to a subvol (dead root) and skips
the deletion.

Signed-off-by: Arne Jansen
---
 fs/btrfs/inode.c |   32
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c5ccec2..e30d38f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2158,6 +2158,38 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 		if (ret && ret != -ESTALE)
 			goto out;
 
+		if (ret == -ESTALE && root == root->fs_info->tree_root) {
+			struct btrfs_root *dead_root;
+			struct btrfs_fs_info *fs_info = root->fs_info;
+			int is_dead_root = 0;
+
+			/*
+			 * this is an orphan in the tree root. Currently these
+			 * could come from 2 sources:
+			 *  a) a snapshot deletion in progress
+			 *  b) a free space cache inode
+			 * We need to distinguish those two, as the snapshot
+			 * orphan must not get deleted.
+			 * find_dead_roots already ran before us, so if this
+			 * is a snapshot deletion, we should find the root
+			 * in the dead_roots list
+			 */
+			spin_lock(&fs_info->trans_lock);
+			list_for_each_entry(dead_root, &fs_info->dead_roots,
+					    root_list) {
+				if (dead_root->root_key.objectid ==
+				    found_key.objectid) {
+					is_dead_root = 1;
+					break;
+				}
+			}
+			spin_unlock(&fs_info->trans_lock);
+			if (is_dead_root) {
+				/* prevent this orphan from being found again */
+				key.offset = found_key.objectid - 1;
+				continue;
+			}
+		}
 		/*
 		 * Inode is already gone but the orphan item is still there,
 		 * kill the orphan item.
--
1.7.3.4