Re: Deleted subvolume reappears and other cleaner issues
Can anyone please comment on below? On Thu, Jan 31, 2013 at 3:03 PM, Alex Lyakas alex.bt...@zadarastorage.com wrote: Hi, I want to check if any of the below issues are worth/should be fixed: # btrfs_ioctl_snap_destroy() does not commit a transaction. As a result, user can ask to delete a subvol, he receives ok back. Even if user does btrfs sub list, he will not see the deleted subvol (even though the transaction was not committed yet). But if a crash happens, ORPHAN_ITEM will not re-appear after crash. So after crash, the subvolume still exists perfectly fine (happened couple of times here). # btrfs_drop_snapshot() does not commit a transaction after btrfs_del_orphan_item(). So if the subvol deletion completed in one go (did not have to detach and re-attach to transaction, thus committing the ORPHAN_ITEM and drop_progress/level), then after crash ORPHAN_ITEM will not be in the tree, and subvolume still exists. # btrfs_drop_snapshot() checks btrfs_should_end_transaction(), and then does btrfs_end_transaction_throttle() and btrfs_start_transaction(). However, it looks like it can rejoin the same transaction if transaction was not not blocked yet. Minor issue, perhaps? # umount may get delayed because of pending-for-deletion subvolumes: btrfs_commit_super() locks the cleaner_mutex, so it will wait for the cleaner to complete. On the other hand, cleaner will not give up until it completes processing all its splice. If currently cleaner is not running, then btrfs_commit_super() calls btrfs_clean_old_snapshots() directly. So does it make sense: - btrfs_commit_super() will not call btrfs_clean_old_snapshots() - close_ctree() calls kthread_stop(cleaner_kthread) early, and cleaner thread periodically checks if it needs to exit Thanks, Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Leaking btrfs_qgroup_inherit on snapshot creation?
Hi Jan, Arne, I see this code in create_snapshot: if (inherit) { pending_snapshot-inherit = *inherit; *inherit = NULL;/* take responsibility to free it */ } So, first thing I think it should be: if (*inherit) because in btrfs_ioctl_snap_create_v2() we have: struct btrfs_qgroup_inherit *inherit = NULL; ... btrfs_ioctl_snap_create_transid(..., inherit) so the current check is very unlikely to be NULL. Second, I don't see anybody freeing pending_snapshot-inherit. I guess it should be freed after callin btrfs_qgroup_inherit() and also in btrfs_destroy_pending_snapshots(). Thanks, Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Leaking btrfs_qgroup_inherit on snapshot creation?
Hi Alex, On 02/06/13 12:18, Alex Lyakas wrote: Hi Jan, Arne, I see this code in create_snapshot: if (inherit) { pending_snapshot-inherit = *inherit; *inherit = NULL;/* take responsibility to free it */ } So, first thing I think it should be: if (*inherit) because in btrfs_ioctl_snap_create_v2() we have: struct btrfs_qgroup_inherit *inherit = NULL; ... btrfs_ioctl_snap_create_transid(..., inherit) so the current check is very unlikely to be NULL. But in btrfs_ioctl_snap_create it is called with NULL, so *inherit would dereference a NULL pointer. Second, I don't see anybody freeing pending_snapshot-inherit. I guess it should be freed after callin btrfs_qgroup_inherit() and also in btrfs_destroy_pending_snapshots(). You're right. In our original version (6f72c7e20dbaea5) it was still there, in transaction.c. It has been removed in 6fa9700e734: commit 6fa9700e734275de2acbcb0e99414bd7ddfc60f1 Author: Miao Xie mi...@cn.fujitsu.com Date: Thu Sep 6 04:00:32 2012 -0600 Btrfs: fix error path in create_pending_snapshot() This patch fixes the following problem: - If we failed to deal with the delayed dir items, we should abort transaction, just as its comment said. Fix it. - If root reference or root back reference insertion failed, we should abort transaction. Fix it. - Fix the double free problem of pending-inherit. - Do not restore the trans-rsv if we doesn't change it. - make the error path more clearly. Signed-off-by: Miao Xie mi...@cn.fujitsu.com Miao, can you please explain where you see a double free? -Arne Thanks, Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs: Error removing orphan entry, stopping orphan cleanup. could not do orphan cleanup -22
On Wed, Feb 6, 2013 at 4:45 AM, Josef Bacik jba...@fusionio.com wrote: Ok so I think we've fixed this already, can you build btrfs-next and try mounting with that and see if it fixes the problem? Thanks, Hi, Josef, Is there any btrfs git compatible with kernel 3.7 and absorbs this fix? So I can dkms it, I'm using openSUSE 12.3 Beta1 with kernel 3.7, I can update to 3.8, but next time I do system update to RC1 it will fallback to 3.7 again. #btrfs-next seems to move forward to 3.8. /uapi/linux/btrfs.h doesn't exist in kernel 3.7, it still use the old ioctl.h, I tried to revert the ioctl.h btrfs.h patch, but there's still errors about: /usr/src/linux-3.7.1-1/include/trace/events/btrfs.h: In function ‘ftrace_raw_event_btrfs_transaction_commit’: /usr/src/linux-3.7.1-1/include/trace/events/btrfs.h:61:1: error: incompatible types when assigning to type ‘u64’ from type ‘atomic64_t’ Thanks Marguerite -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
On Feb 5, 2013, at 8:11 PM, Liu Bo bo.li@oracle.com wrote: On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote: +struct btrfs_inode *b_inode = BTRFS_I(inode); +struct btrfs_fs_info *fs_info = b_inode-root-fs_info; if (atomic_add_unless(inode-i_count, -1, 1)) return; -delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL); -delayed-inode = inode; - spin_lock(fs_info-delayed_iput_lock); -list_add_tail(delayed-list, fs_info-delayed_iputs); +list_add_tail(b_inode-delayed_iput, fs_info-delayed_iputs); spin_unlock(fs_info-delayed_iput_lock); } Hmm. I'm not great with inode life cycles, but isn't this only safe if someone else can't get an i_count reference while this is in flight? It looks like the final iput does the unhashing, and so on, so couldn't an iget/iput race with this and try to add the inode's list_head twice? Yeah, same concern here. Basically this will result in inodes still being in use on unmount. Actually I did a similar one, here is some disscussion: https://patchwork.kernel.org/patch/1824711/ Ok, thanks all. We should remove Jeff's comment then, it sure sounded like a good idea... Eric thanks, liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-progs pull request for coverity fixes
On 02/05/2013 07:49 PM, Zach Brown wrote: Hi gang, Eric and I went through the warnings that a Coverity scan raised against the reasonably recent btrfs-progs that's in Fedora. We tried to tackle the lowest hanging fruit: these are the most obvious and least risky fixes. If you ran your tests against the btrfs-progs in Fedora 18 then your tests may not have found problems fixed by the valgrind patch on F18. This patch is not applied to the current David Sterba's integration branches. If you have the time, it might be useful (my opinion) to run your tests against Sterba's integration-02130201 branch ... some of the problems may be fixed and other not but you also may find some additional problems. IMO, this branch (or something similar) will be the basis for a future v0.20 btrfs-progs ... at least that is what I am using/testing with. There are over 80 commits in this branch over what is the baseis for the package in Fedora 18. Gene -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
GIT PULL (was Re: Integration branch of btrfs-progs 2013-02-01)
On Fri, Feb 01, 2013 at 06:41:11PM +0100, David Sterba wrote: I'm going to do some more testing of the previous integration branch and send a pull request as of 20130126. I didn't have that much time for testing as I wanted, at least did another review round of the pending patchset. Please pull git://repo.or.cz/btrfs-progs-unstable/devel.git for-chris with top commit 78b35a43988163dbf71d9a Thanks, david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-progs pull request for coverity fixes
On 2/6/13 8:27 AM, Gene Czarcinski wrote: On 02/05/2013 07:49 PM, Zach Brown wrote: Hi gang, Eric and I went through the warnings that a Coverity scan raised against the reasonably recent btrfs-progs that's in Fedora. We tried to tackle the lowest hanging fruit: these are the most obvious and least risky fixes. If you ran your tests against the btrfs-progs in Fedora 18 then your tests may not have found problems fixed by the valgrind patch on F18. This patch is not applied to the current David Sterba's integration branches. The original one was done against F18 with the valgrind patch in place, yes. So it may have missed several things. I can easily do the same analysis against any codebase if/when it's appropriate. If you have the time, it might be useful (my opinion) to run your tests against Sterba's integration-02130201 branch ... some of the problems may be fixed and other not but you also may find some additional problems. IMO, this branch (or something similar) will be the basis for a future v0.20 btrfs-progs ... at least that is what I am using/testing with. There are over 80 commits in this branch over what is the baseis for the package in Fedora 18. I agree that we need to re-run against all those patches, but I think I will wait until they make it all the way upstream. Until it gets to the point where development and patch integration happens upstream (or at least in an orderly, predictable manner), and we have timely releases of userspace for distro consumption, it's going to be a little hit and miss with tools like this, since there are so many different codebases out there right now, with distros all maintaining their own large patchsets. There are certainly more static analysis issues to fix, but Zach and I thought we'd just see if this group of patches managed to make it upstream before we went a lot further with it. Thanks, -Eric Gene -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Deleted subvolume reappears and other cleaner issues
On Thu, Jan 31, 2013 at 06:03:06AM -0700, Alex Lyakas wrote: Hi, I want to check if any of the below issues are worth/should be fixed: # btrfs_ioctl_snap_destroy() does not commit a transaction. As a result, user can ask to delete a subvol, he receives ok back. Even if user does btrfs sub list, he will not see the deleted subvol (even though the transaction was not committed yet). But if a crash happens, ORPHAN_ITEM will not re-appear after crash. So after crash, the subvolume still exists perfectly fine (happened couple of times here). Same thing happens to normal unlinks, I don't see a reason to have different rules for subvols. # btrfs_drop_snapshot() does not commit a transaction after btrfs_del_orphan_item(). So if the subvol deletion completed in one go (did not have to detach and re-attach to transaction, thus committing the ORPHAN_ITEM and drop_progress/level), then after crash ORPHAN_ITEM will not be in the tree, and subvolume still exists. Again same thing happens with normal files. # btrfs_drop_snapshot() checks btrfs_should_end_transaction(), and then does btrfs_end_transaction_throttle() and btrfs_start_transaction(). However, it looks like it can rejoin the same transaction if transaction was not not blocked yet. Minor issue, perhaps? No if we didn't block then its ok and we wait longer, we only throttle to give the transaction stuff a chance to commit, so if the join logic decides its ok to go on then we're good. # umount may get delayed because of pending-for-deletion subvolumes: btrfs_commit_super() locks the cleaner_mutex, so it will wait for the cleaner to complete. On the other hand, cleaner will not give up until it completes processing all its splice. If currently cleaner is not running, then btrfs_commit_super() calls btrfs_clean_old_snapshots() directly. So does it make sense: - btrfs_commit_super() will not call btrfs_clean_old_snapshots() - close_ctree() calls kthread_stop(cleaner_kthread) early, and cleaner thread periodically checks if it needs to exit I don't quite follow this, but sure? Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs: Error removing orphan entry, stopping orphan cleanup. could not do orphan cleanup -22
On Wed, Feb 06, 2013 at 06:03:14AM -0700, Marguerite Su wrote: On Wed, Feb 6, 2013 at 4:45 AM, Josef Bacik jba...@fusionio.com wrote: Ok so I think we've fixed this already, can you build btrfs-next and try mounting with that and see if it fixes the problem? Thanks, Hi, Josef, Is there any btrfs git compatible with kernel 3.7 and absorbs this fix? So I can dkms it, I'm using openSUSE 12.3 Beta1 with kernel 3.7, I can update to 3.8, but next time I do system update to RC1 it will fallback to 3.7 again. #btrfs-next seems to move forward to 3.8. /uapi/linux/btrfs.h doesn't exist in kernel 3.7, it still use the old ioctl.h, I tried to revert the ioctl.h btrfs.h patch, but there's still errors about: /usr/src/linux-3.7.1-1/include/trace/events/btrfs.h: In function ‘ftrace_raw_event_btrfs_transaction_commit’: /usr/src/linux-3.7.1-1/include/trace/events/btrfs.h:61:1: error: incompatible types when assigning to type ‘u64’ from type ‘atomic64_t’ Btrfs-next is based on 3.7, you should be able to just git diff v3.7.. whatever.patch and get all the changes you need. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
On 2/5/13 8:08 PM, Liu Bo wrote: On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote: + struct btrfs_inode *b_inode = BTRFS_I(inode); + struct btrfs_fs_info *fs_info = b_inode-root-fs_info; if (atomic_add_unless(inode-i_count, -1, 1)) return; - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL); - delayed-inode = inode; - spin_lock(fs_info-delayed_iput_lock); - list_add_tail(delayed-list, fs_info-delayed_iputs); + list_add_tail(b_inode-delayed_iput, fs_info-delayed_iputs); spin_unlock(fs_info-delayed_iput_lock); } Hmm. I'm not great with inode life cycles, but isn't this only safe if someone else can't get an i_count reference while this is in flight? It looks like the final iput does the unhashing, and so on, so couldn't an iget/iput race with this and try to add the inode's list_head twice? Yeah, same concern here. Basically this will result in inodes still being in use on unmount. Actually I did a similar one, here is some disscussion: https://patchwork.kernel.org/patch/1824711/ I read it, thanks. Did you try the counter approach? -Eric thanks, liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
On Wed, Feb 06, 2013 at 09:53:05AM -0600, Eric Sandeen wrote: On 2/5/13 8:08 PM, Liu Bo wrote: On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote: + struct btrfs_inode *b_inode = BTRFS_I(inode); + struct btrfs_fs_info *fs_info = b_inode-root-fs_info; if (atomic_add_unless(inode-i_count, -1, 1)) return; - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL); - delayed-inode = inode; - spin_lock(fs_info-delayed_iput_lock); - list_add_tail(delayed-list, fs_info-delayed_iputs); + list_add_tail(b_inode-delayed_iput, fs_info-delayed_iputs); spin_unlock(fs_info-delayed_iput_lock); } Hmm. I'm not great with inode life cycles, but isn't this only safe if someone else can't get an i_count reference while this is in flight? It looks like the final iput does the unhashing, and so on, so couldn't an iget/iput race with this and try to add the inode's list_head twice? Yeah, same concern here. Basically this will result in inodes still being in use on unmount. Actually I did a similar one, here is some disscussion: https://patchwork.kernel.org/patch/1824711/ I read it, thanks. Did you try the counter approach? Yes, it'll bring a tradeoff situation. With counter, we need to lock the list all the time instead of doing a splice on the list and unlocking it. I think splice would be faster so I didn't go further(I MIGHT be wrong on this).. thanks, liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] Btrfs: Allow the compressed extent size limit to be modified.
Provide for modification of the limit of compressed extent size utilizing mount-time configuration settings. The size of compressed extents was limited to 128K, which leads to fragmentation of the extents (although the extents themselves may still be located contiguously). This limit is put in place to ease the RAM required when spreading compression across several CPUs, and to make sure the amount of IO required to do a random read is reasonably small. This patch is still preliminary. In this version of the patch, the allowed compressed extent size is restricted to 128 (the default) and 512. I wanted to extensively test a single value for a change in compressed extent size before expanding and testing a wider range of parameters. I submitted a similar patch about a year and a half ago where the change was hard-coded and not tuneable. http://comments.gmane.org/gmane.comp.file-systems.btrfs/10516 --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c| 1 + fs/btrfs/inode.c | 8 fs/btrfs/relocation.c | 7 --- fs/btrfs/super.c | 19 ++- 5 files changed, 30 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 547b7b0..f37ec32 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1477,6 +1477,8 @@ struct btrfs_fs_info { unsigned data_chunk_allocations; unsigned metadata_ratio; + unsigned compressed_extent_size; + void *bdev_holder; /* private scrub information */ @@ -1829,6 +1831,7 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_MOUNT_CHECK_INTEGRITY(1 20) #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 21) #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1 22) +#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1 23) #define btrfs_clear_opt(o, opt)((o) = ~BTRFS_MOUNT_##opt) #define btrfs_set_opt(o, opt) ((o) |= BTRFS_MOUNT_##opt) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 830bc17..2d2be03 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2056,6 +2056,7 @@ int open_ctree(struct super_block *sb, fs_info-trans_no_join = 0; fs_info-free_chunk_space = 0; fs_info-tree_mod_log = RB_ROOT; + fs_info-compressed_extent_size = 128; /* readahead state */ INIT_RADIX_TREE(fs_info-reada_tree, GFP_NOFS ~__GFP_WAIT); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 148abeb..5b81b56 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -346,8 +346,8 @@ static noinline int compress_file_range(struct inode *inode, unsigned long nr_pages_ret = 0; unsigned long total_compressed = 0; unsigned long total_in = 0; - unsigned long max_compressed = 128 * 1024; - unsigned long max_uncompressed = 128 * 1024; + unsigned long max_compressed = root-fs_info-compressed_extent_size * 1024; + unsigned long max_uncompressed = root-fs_info-compressed_extent_size * 1024; int i; int will_compress; int compress_type = root-fs_info-compress_type; @@ -361,7 +361,7 @@ static noinline int compress_file_range(struct inode *inode, again: will_compress = 0; nr_pages = (end PAGE_CACHE_SHIFT) - (start PAGE_CACHE_SHIFT) + 1; - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE); + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE); /* * we don't want to send crud past the end of i_size through @@ -386,7 +386,7 @@ again: * * We also want to make sure the amount of IO required to do * a random read is reasonably small, so we limit the size of -* a compressed extent to 128k. +* a compressed extent (default of 128k). */ total_compressed = min(total_compressed, max_uncompressed); num_bytes = (end - start + blocksize) ~(blocksize - 1); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 300e09a..8d6f6bf 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -144,7 +144,7 @@ struct tree_block { unsigned int key_ready:1; }; -#define MAX_EXTENTS 128 +#define MAX_EXTENTS 512 struct file_extent_cluster { u64 start; @@ -3055,6 +3055,7 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key, struct file_extent_cluster *cluster) { int ret; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)-root-fs_info; if (cluster-nr 0 extent_key-objectid != cluster-end + 1) { ret = relocate_file_extent_cluster(inode, cluster); @@ -3066,12 +3067,12 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key, if (!cluster-nr) cluster-start = extent_key-objectid; else - BUG_ON(cluster-nr = MAX_EXTENTS); + BUG_ON(cluster-nr = fs_info-compressed_extent_size); cluster-end = extent_key-objectid +
Re: btrfs-progs pull request for coverity fixes
On Tue, Feb 05, 2013 at 05:49:07PM -0700, Zach Brown wrote: Hi gang, Eric and I went through the warnings that a Coverity scan raised against the reasonably recent btrfs-progs that's in Fedora. We tried to tackle the lowest hanging fruit: these are the most obvious and least risky fixes. Awesome, thanks Zach and Eric. I've got this rebased on top of Dave's integration pull from today, and I'm doing a bunch of tests. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-progs pull request for coverity fixes
Awesome, thanks Zach and Eric. I've got this rebased on top of Dave's integration pull from today, and I'm doing a bunch of tests. Great, thanks for the pull. We're happy to help! - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified.
+ unsigned compressed_extent_size; It kind of jumps out that this mentions neither that it's the max nor that it's in KB. How about max_compressed_extent_kb? + fs_info-compressed_extent_size = 128; I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using a raw 128 here. + unsigned long max_compressed = root-fs_info-compressed_extent_size * 1024; + unsigned long max_uncompressed = root-fs_info-compressed_extent_size * 1024; (so max_compressed is in bytes) nr_pages = (end PAGE_CACHE_SHIFT) - (start PAGE_CACHE_SHIFT) + 1; - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE); + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE); (and now that expression adds another * 1024, allowing {128,512}MB extents :)) * We also want to make sure the amount of IO required to do * a random read is reasonably small, so we limit the size of - * a compressed extent to 128k. + * a compressed extent (default of 128k). Just drop the value so that this comment doesn't need to be updated again. -* a compressed extent to 128k. +* a compressed extent. + {Opt_compr_extent_size, compressed_extent_size=%d}, It's even more important to make the exposed option self-documenting than it was to get the fs_info member right. + if ((intarg == 128) || (intarg == 512)) { + info-compressed_extent_size = intarg; + printk(KERN_INFO btrfs: compressed extent size %d\n, +info-compressed_extent_size); + } else { + printk(KERN_INFO btrfs: + Invalid compressed extent size, + using default.\n); I'd print the default value when it's used and would include a unit in both. + if (btrfs_test_opt(root, COMPR_EXTENT_SIZE)) + seq_printf(seq, ,compressed_extent_size=%d, +(unsigned long long)info-compressed_extent_size); The (ull) cast doesn't match the %d format and wouldn't be needed if it was printed with %u. - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: rework the overcommit logic to be based on the total size
People have been complaining about random ENOSPC errors that will clear up after a umount or just a given amount of time. Chris was able to reproduce this with stress.sh and lots of processes and so was I. Basically the overcommit stuff would really let us get out of hand, in my tests I saw up to 30 gigs of outstanding reservations with only 2 gigs total of metadata space. This usually worked out fine but with so much outstanding reservation the flushing stuff short circuits to make sure we don't hang forever flushing when we really need ENOSPC. Plus we allocate chunks in order to alleviate the pressure, but this doesn't actually help us since we only use the non-allocated area in our over commit logic. So instead of basing overcommit on the amount of non-allocated space, instead just do it based on how much total space we have, and then limit it to the non-allocated space in case we are short on space to spill over into. This allows us to have the same performance as well as no longer giving random ENOSPC. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/extent-tree.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 11ffa16..981130b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3677,6 +3677,7 @@ static int can_overcommit(struct btrfs_root *root, u64 rsv_size = 0; u64 avail; u64 used; + u64 to_add; used = space_info-bytes_used + space_info-bytes_reserved + space_info-bytes_pinned + space_info-bytes_readonly; @@ -3710,17 +3711,25 @@ static int can_overcommit(struct btrfs_root *root, BTRFS_BLOCK_GROUP_RAID10)) avail = 1; + to_add = space_info-total_bytes; + /* * If we aren't flushing all things, let us overcommit up to * 1/2th of the space. If we can flush, don't let us overcommit * too much, let it overcommit up to 1/8 of the space. */ if (flush == BTRFS_RESERVE_FLUSH_ALL) - avail = 3; + to_add = 3; else - avail = 1; + to_add = 1; + + /* +* Limit the overcommit to the amount of free space we could possibly +* allocate for chunks. +*/ + to_add = min(avail, to_add); - if (used + bytes space_info-total_bytes + avail) + if (used + bytes space_info-total_bytes + to_add) return 1; return 0; } -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: handle errors in compression submission path
I noticed we would deadlock if we aborted a transaction while doing compressed io. This is because we don't unlock our pages if something goes horribly wrong. To fix this we need to make sure that we call extent_clear_unlock_delalloc in order to unlock all the pages. If we have to cow in the async submission thread we need to make sure to unlock our locked_page as the cow error path will not unlock the locked page as it depends on the caller to unlock that page. With this patch we no longer deadlock on the page lock when we have an aborted transaction. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/inode.c | 38 -- 1 files changed, 28 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5398883..44d95d1 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -608,7 +608,7 @@ static noinline int submit_compressed_extents(struct inode *inode, if (list_empty(async_cow-extents)) return 0; - +again: while (!list_empty(async_cow-extents)) { async_extent = list_entry(async_cow-extents.next, struct async_extent, list); @@ -648,6 +648,8 @@ retry: async_extent-ram_size - 1, btrfs_get_extent, WB_SYNC_ALL); + else if (ret) + unlock_page(async_cow-locked_page); kfree(async_extent); cond_resched(); continue; @@ -672,6 +674,7 @@ retry: if (ret) { int i; + for (i = 0; i async_extent-nr_pages; i++) { WARN_ON(async_extent-pages[i]-mapping); page_cache_release(async_extent-pages[i]); @@ -679,12 +682,10 @@ retry: kfree(async_extent-pages); async_extent-nr_pages = 0; async_extent-pages = NULL; - unlock_extent(io_tree, async_extent-start, - async_extent-start + - async_extent-ram_size - 1); + if (ret == -ENOSPC) goto retry; - goto out_free; /* JDM: Requeue? */ + goto out_free; } /* @@ -696,7 +697,8 @@ retry: async_extent-ram_size - 1, 0); em = alloc_extent_map(); - BUG_ON(!em); /* -ENOMEM */ + if (!em) + goto out_free_reserve; em-start = async_extent-start; em-len = async_extent-ram_size; em-orig_start = em-start; @@ -728,6 +730,9 @@ retry: async_extent-ram_size - 1, 0); } + if (ret) + goto out_free_reserve; + ret = btrfs_add_ordered_extent_compress(inode, async_extent-start, ins.objectid, @@ -735,7 +740,8 @@ retry: ins.offset, BTRFS_ORDERED_COMPRESSED, async_extent-compress_type); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto out_free_reserve; /* * clear dirty, set writeback and unlock the pages. @@ -756,18 +762,30 @@ retry: ins.objectid, ins.offset, async_extent-pages, async_extent-nr_pages); - - BUG_ON(ret); /* -ENOMEM */ alloc_hint = ins.objectid + ins.offset; kfree(async_extent); + if (ret) + goto out; cond_resched(); } ret = 0; out: return ret; +out_free_reserve: + btrfs_free_reserved_extent(root, ins.objectid, ins.offset); out_free: + extent_clear_unlock_delalloc(inode, BTRFS_I(inode)-io_tree, +async_extent-start, +async_extent-start + +async_extent-ram_size - 1, +NULL, EXTENT_CLEAR_UNLOCK_PAGE | +EXTENT_CLEAR_UNLOCK | +EXTENT_CLEAR_DELALLOC | +EXTENT_CLEAR_DIRTY | +EXTENT_SET_WRITEBACK | +
Re: [PATCH] Btrfs-progs print more informative error when we fail to open a device
On Mon, Feb 04, 2013 at 10:57:57AM -0500, Gene Czarcinski wrote: From: Eric Sandeen sand...@redhat.com print more informative error when we fail to open a device If open() fails, we should let the user know why it failed. Thanks, added to next integration queue. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs balance - hang/crash
Hi, my btrfs hangs when doing a balance operation. I'm using a 3.7.1 kernel from opensuse: linux-opzz 3.7.1-2.10-m4 #11 SMP PREEMPT Fri Jan 11 18:04:04 CET 2013 x86_64 x86_64 x86_64 GNU/Linux and Btrfs v0.19+ I did a scrub which completed without errors. Then I tried btrfs filesystem balance / which work fine for the first 23 of 46 chunks, then ist stopped processing further chunks, but contiued to consume large amounts of the CPU. The system logged filled quickly with messages like: [ 347.237658] btrfs: block rsv returned -28 [ 347.237661] [ cut here ] [ 347.237667] WARNING: at fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0x399/0x3a0() [ 347.237668] Hardware name: GA-MA74GM-S2H [ 347.237669] Modules linked in: fuse xt_pkttype af_packet ipt_REJECT xt_conntrack iptable_raw xt_CT iptable_filter nf _conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables cpufreq_c onservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf sp5100_tco usb_storage kvm_amd kvm snd_hda_codec_rea ltek k10temp edac_core edac_mce_amd sr_mod cdrom i2c_piix4 snd_hda_intel snd_hda_codec sg snd_hwdep snd_pcm snd_timer s nd_page_alloc floppy edd microcode autofs4 radeon ttm drm_kms_helper processor thermal_sys pata_atiixp [ 347.237694] Pid: 320, comm: btrfs-balance Not tainted 3.7.1-2.10-m4 #11 [ 347.237695] Call Trace: [ 347.237707] [810046c7] dump_trace+0x87/0x2f0 [ 347.237711] [81591682] dump_stack+0x69/0x6f [ 347.237716] [8103cb79] warn_slowpath_common+0x79/0xc0 [ 347.237720] [8122c889] btrfs_alloc_free_block+0x399/0x3a0 [ 347.237725] [81218617] __btrfs_cow_block+0x137/0x550 [ 347.237728] [81218baf] btrfs_cow_block+0xff/0x250 [ 347.237731] [8121d2f1] btrfs_search_slot+0x421/0x980 [ 347.237735] [8127fa0e] do_relocation+0x3be/0x510 [ 347.237740] [812839f3] relocate_tree_blocks+0x5e3/0x610 [ 347.237743] [812849a4] relocate_block_group+0x444/0x6c0 [ 347.237747] [81284dc9] btrfs_relocate_block_group+0x1a9/0x2d0 [ 347.237751] [8125de36] btrfs_relocate_chunk.isra.53+0x56/0x730 [ 347.237754] [812621fe] btrfs_balance+0x82e/0xd50 [ 347.237758] [812627a0] balance_kthread+0x80/0x90 [ 347.237762] [8105ec33] kthread+0xb3/0xc0 [ 347.237766] [815a43fc] ret_from_fork+0x7c/0xb0 [ 347.237770] ---[ end trace cf4d2bf19a87ec83 ]--- [ 347.239650] btrfs: block rsv returned -28 [ 347.239653] [ cut here ] [ 347.239658] WARNING: at fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0x399/0x3a0() [ 347.239660] Hardware name: GA-MA74GM-S2H [ 347.239661] Modules linked in: fuse xt_pkttype af_packet ipt_REJECT xt_conntrack iptable_raw xt_CT iptable_filter nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf sp5100_tco usb_storage kvm_amd kvm snd_hda_codec_realtek k10temp edac_core edac_mce_amd sr_mod cdrom i2c_piix4 snd_hda_intel snd_hda_codec sg snd_hwdep snd_pcm snd_timer snd_page_alloc floppy edd microcode autofs4 radeon ttm drm_kms_helper processor thermal_sys pata_atiixp [ 347.239683] Pid: 320, comm: btrfs-balance Tainted: GW 3.7.1-2.10-m4 #11 [ 347.239685] Call Trace: [ 347.239695] [810046c7] dump_trace+0x87/0x2f0 [ 347.239699] [81591682] dump_stack+0x69/0x6f [ 347.239704] [8103cb79] warn_slowpath_common+0x79/0xc0 [ 347.239708] [8122c889] btrfs_alloc_free_block+0x399/0x3a0 [ 347.239713] [81218617] __btrfs_cow_block+0x137/0x550 [ 347.239716] [81218baf] btrfs_cow_block+0xff/0x250 [ 347.239719] [8121d2f1] btrfs_search_slot+0x421/0x980 [ 347.239723] [8127fa0e] do_relocation+0x3be/0x510 [ 347.239728] [812839f3] relocate_tree_blocks+0x5e3/0x610 [ 347.239731] [812849a4] relocate_block_group+0x444/0x6c0 [ 347.239735] [81284dc9] btrfs_relocate_block_group+0x1a9/0x2d0 [ 347.239739] [8125de36] btrfs_relocate_chunk.isra.53+0x56/0x730 [ 347.239743] [812621fe] btrfs_balance+0x82e/0xd50 [ 347.239746] [812627a0] balance_kthread+0x80/0x90 [ 347.239750] [8105ec33] kthread+0xb3/0xc0 [ 347.239755] [815a43fc] ret_from_fork+0x7c/0xb0 [ 347.239758] ---[ end trace cf4d2bf19a87ec84 ]--- [ 352.352742] use_block_rsv: 146 callbacks suppressed [ 352.352750] btrfs: block rsv returned -28 [ 352.352753] [ cut here ] [ 352.352765] WARNING: at fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0x399/0x3a0() [ 352.352769] Hardware name: GA-MA74GM-S2H [ 352.352772] Modules linked in: fuse xt_pkttype af_packet ipt_REJECT xt_conntrack iptable_raw xt_CT iptable_filter nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables
Re: [PATCH] btrfs-progs: remove unused bit-radix.[ch] files
On Mon, Feb 04, 2013 at 11:26:23AM -0600, Eric Sandeen wrote: fd53de4d Drop bit-radix.[ch] files removed the files from the Makefile, but not the files themselves. Added to next integration. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs balance - hang/crash
On Wed, Feb 06, 2013 at 11:12:14PM +0100, Michael Schneider wrote: Hi, my btrfs hangs when doing a balance operation. I'm using a 3.7.1 kernel from opensuse: linux-opzz 3.7.1-2.10-m4 #11 SMP PREEMPT Fri Jan 11 18:04:04 CET 2013 x86_64 x86_64 x86_64 GNU/Linux and Btrfs v0.19+ I did a scrub which completed without errors. Then I tried btrfs filesystem balance / which work fine for the first 23 of 46 chunks, then ist stopped processing further chunks, but contiued to consume large amounts of the CPU. The system logged filled quickly with messages like: [ 347.237658] btrfs: block rsv returned -28 [snip] Any idea what's going on? You're running out of space. Can you post the output of: # btrfs fi df /mountpoint # btrfs fi show this will give us a bit more info about likely scenarios. Also, josef published a patch(*) in the last few hours, on this list, which may help deal with the issue. Hugo. (*) Btrfs: rework the overcommit logic to be based on the total size -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- What do you give the man who has everything? -- Penicillin is --- a good start... signature.asc Description: Digital signature
Re: btrfs balance - hang/crash
# btrfs fi show failed to read /dev/sr0 Label: 'bootssd' uuid: ac89584c-c757-488e-8a2a-5ee5a5484dde Total devices 1 FS bytes used 34.98GB devid1 size 56.00GB used 41.07GB path /dev/dm-1 Btrfs v0.19+ # btrfs fi df / Data: total=35.00GB, used=33.34GB System, DUP: total=32.00MB, used=16.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=3.00GB, used=1.63GB On Wed, Feb 6, 2013 at 11:21 PM, Hugo Mills h...@carfax.org.uk wrote: On Wed, Feb 06, 2013 at 11:12:14PM +0100, Michael Schneider wrote: Hi, my btrfs hangs when doing a balance operation. I'm using a 3.7.1 kernel from opensuse: linux-opzz 3.7.1-2.10-m4 #11 SMP PREEMPT Fri Jan 11 18:04:04 CET 2013 x86_64 x86_64 x86_64 GNU/Linux and Btrfs v0.19+ I did a scrub which completed without errors. Then I tried btrfs filesystem balance / which work fine for the first 23 of 46 chunks, then ist stopped processing further chunks, but contiued to consume large amounts of the CPU. The system logged filled quickly with messages like: [ 347.237658] btrfs: block rsv returned -28 [snip] Any idea what's going on? You're running out of space. Can you post the output of: # btrfs fi df /mountpoint # btrfs fi show this will give us a bit more info about likely scenarios. Also, josef published a patch(*) in the last few hours, on this list, which may help deal with the issue. Hugo. (*) Btrfs: rework the overcommit logic to be based on the total size -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- What do you give the man who has everything? -- Penicillin is --- a good start... -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified.
On Wed, Feb 6, 2013 at 12:46 PM, Zach Brown z...@redhat.com wrote: + unsigned compressed_extent_size; It kind of jumps out that this mentions neither that it's the max nor that it's in KB. How about max_compressed_extent_kb? + fs_info-compressed_extent_size = 128; I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using a raw 128 here. + unsigned long max_compressed = root-fs_info-compressed_extent_size * 1024; + unsigned long max_uncompressed = root-fs_info-compressed_extent_size * 1024; (so max_compressed is in bytes) nr_pages = (end PAGE_CACHE_SHIFT) - (start PAGE_CACHE_SHIFT) + 1; - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE); + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE); (and now that expression adds another * 1024, allowing {128,512}MB extents :)) Yuk! I'm surprised this never manifested as a problem during testing. * We also want to make sure the amount of IO required to do * a random read is reasonably small, so we limit the size of - * a compressed extent to 128k. + * a compressed extent (default of 128k). Just drop the value so that this comment doesn't need to be updated again. -* a compressed extent to 128k. +* a compressed extent. + {Opt_compr_extent_size, compressed_extent_size=%d}, It's even more important to make the exposed option self-documenting than it was to get the fs_info member right. + if ((intarg == 128) || (intarg == 512)) { + info-compressed_extent_size = intarg; + printk(KERN_INFO btrfs: compressed extent size %d\n, +info-compressed_extent_size); + } else { + printk(KERN_INFO btrfs: + Invalid compressed extent size, + using default.\n); I'd print the default value when it's used and would include a unit in both. + if (btrfs_test_opt(root, COMPR_EXTENT_SIZE)) + seq_printf(seq, ,compressed_extent_size=%d, +(unsigned long long)info-compressed_extent_size); The (ull) cast doesn't match the %d format and wouldn't be needed if it was printed with %u. - z Thanks for the review. All these comments make sense, and I should be able to work them in. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2, RFC] btrfs-progs: overhaul mkfs.btrfs -r option
On Tue, Jan 29, 2013 at 02:41:08PM -0600, Eric Sandeen wrote: The manpage for the -r option simply says that it will copy the path specified to -r into the newly made filesystem. There's not a lot of reason to treat that option as differently as it is now - today it ignores discard, fs size, and mixed options, for example. It also failed to check whether the target device was mounted before proceeding. Etc... Rework things so that we really follow the same paths whether or not -r is specified, but with one special case for -r: * If the device does not exist, it will be created as a regular file of the minimum size to hold the -r path, or of size specified by the -b option. This also changes a little behavior; it does not pre-fill the new file with zeros, but allows it to be sparse, and does not truncate an existing device file. If you want to start with an empty file, just don't point it at an existing file... Fixes numerous bugs and I find the changes in behaviour sane and reasonable. A quick test failed for me when the image file does not exist: $ rm image $ ./mkfs.btrfs -r . image ... not enough free space add_file_items failed unable to traverse_directory Making image is aborted. ret = -1 mkfs.btrfs: mkfs.c:1533: main: Assertion `!(ret)' failed. seems that the estimated size is not sufficient. 'du' on the directory (without the image itself) gives me 59M, the image after failed mkfs has 102M, and it's empty when I try to mount it. Using the progs .git directory (18M) as sourcedir also failed, but it worked with man/ subdirectory (94K). david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] Btrfs fixes
Hi Linus, Please pull my for-linus branch: git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus We've got corner cases for updating i_size that ceph was hitting, error handling for quotas when we run out of space, a very subtle snapshot deletion race, a crash while removing devices, and one deadlock between subvolume creation and the sb_internal code (thanks lockdep). Josef Bacik (3) commits (+12/-4): Btrfs: do not merge logged extents if we've removed them from the tree (+2/-1) Btrfs: fix possible stale data exposure (+1/-1) Btrfs: fix missing i_size update (+9/-2) Miao Xie (2) commits (+21/-9): Btrfs: fix missing release of the space/qgroup reservation in start_transaction() (+19/-8) Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() (+2/-1) Jan Schmidt (1) commits (+10/-12): Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata Liu Bo (1) commits (+38/-9): Btrfs: fix race between snapshot deletion and getting inode Chris Mason (1) commits (+4/-1): Btrfs: move d_instantiate outside the transaction during mksubvol Eric Sandeen (1) commits (+2/-1): btrfs: don't try to notify udev about missing devices Total: (9) commits fs/btrfs/extent-tree.c | 22 ++ fs/btrfs/extent_map.c | 3 ++- fs/btrfs/file.c | 25 - fs/btrfs/ioctl.c| 5 - fs/btrfs/ordered-data.c | 13 ++--- fs/btrfs/scrub.c| 25 - fs/btrfs/transaction.c | 27 +++ fs/btrfs/volumes.c | 3 ++- 8 files changed, 87 insertions(+), 36 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] Btrfs fixes
[ sorry, my lbdb seems to really like linux-ker...@vger.kerrnel.org, fixed for real this time ] Hi Linus, Please pull my for-linus branch: git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus We've got corner cases for updating i_size that ceph was hitting, error handling for quotas when we run out of space, a very subtle snapshot deletion race, a crash while removing devices, and one deadlock between subvolume creation and the sb_internal code (thanks lockdep). Josef Bacik (3) commits (+12/-4): Btrfs: do not merge logged extents if we've removed them from the tree (+2/-1) Btrfs: fix possible stale data exposure (+1/-1) Btrfs: fix missing i_size update (+9/-2) Miao Xie (2) commits (+21/-9): Btrfs: fix missing release of the space/qgroup reservation in start_transaction() (+19/-8) Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() (+2/-1) Jan Schmidt (1) commits (+10/-12): Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata Liu Bo (1) commits (+38/-9): Btrfs: fix race between snapshot deletion and getting inode Chris Mason (1) commits (+4/-1): Btrfs: move d_instantiate outside the transaction during mksubvol Eric Sandeen (1) commits (+2/-1): btrfs: don't try to notify udev about missing devices Total: (9) commits fs/btrfs/extent-tree.c | 22 ++ fs/btrfs/extent_map.c | 3 ++- fs/btrfs/file.c | 25 - fs/btrfs/ioctl.c| 5 - fs/btrfs/ordered-data.c | 13 ++--- fs/btrfs/scrub.c| 25 - fs/btrfs/transaction.c | 27 +++ fs/btrfs/volumes.c | 3 ++- 8 files changed, 87 insertions(+), 36 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Leaking btrfs_qgroup_inherit on snapshot creation?
On wed, 06 Feb 2013 13:14:23 +0100, Arne Jansen wrote: Hi Alex, On 02/06/13 12:18, Alex Lyakas wrote: Hi Jan, Arne, I see this code in create_snapshot: if (inherit) { pending_snapshot-inherit = *inherit; *inherit = NULL;/* take responsibility to free it */ } So, first thing I think it should be: if (*inherit) because in btrfs_ioctl_snap_create_v2() we have: struct btrfs_qgroup_inherit *inherit = NULL; ... btrfs_ioctl_snap_create_transid(..., inherit) so the current check is very unlikely to be NULL. But in btrfs_ioctl_snap_create it is called with NULL, so *inherit would dereference a NULL pointer. Second, I don't see anybody freeing pending_snapshot-inherit. I guess it should be freed after callin btrfs_qgroup_inherit() and also in btrfs_destroy_pending_snapshots(). You're right. In our original version (6f72c7e20dbaea5) it was still there, in transaction.c. It has been removed in 6fa9700e734: commit 6fa9700e734275de2acbcb0e99414bd7ddfc60f1 Author: Miao Xie mi...@cn.fujitsu.com Date: Thu Sep 6 04:00:32 2012 -0600 Btrfs: fix error path in create_pending_snapshot() This patch fixes the following problem: - If we failed to deal with the delayed dir items, we should abort transaction, just as its comment said. Fix it. - If root reference or root back reference insertion failed, we should abort transaction. Fix it. - Fix the double free problem of pending-inherit. - Do not restore the trans-rsv if we doesn't change it. - make the error path more clearly. Signed-off-by: Miao Xie mi...@cn.fujitsu.com Miao, can you please explain where you see a double free? Sorry, I misread the code,I didn't notice that the pointer had been assigned to NULL. But I think we can make the code more readable and be easy to maintain, we can free the memory in the caller(btrfs_ioctl_snap_create_v2()) since we are sure the snapshot creation is done after btrfs_ioctl_snap_create_transid() completes. Thanks Miao -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Btrfs: fix the race between bio and btrfs_stop_workers
open_ctree() need read the metadata to initialize the global information of btrfs. But it may fail after it submit some bio, and then it will jump to the error path. Unfortunately, it doesn't check if there are some bios in flight, and just stop all the worker threads. As a result, when the submitted bios end, they can not find any worker thread which can deal with subsequent work, then oops happen. kernel BUG at fs/btrfs/async-thread.c:605! Fix this problem by invoking invalidate_inode_pages2() before we stop the worker threads. This function will wait until the bio end because it need lock the pages which are going to be invalidated, and if a page is under disk read IO, it must be locked. invalidate_inode_pages2() need wait until end bio handler to unlocked it. Reported-and-Tested-by: Tsutomu Itoh t-i...@jp.fujitsu.com Signed-off-by: Eric Sandeen sand...@redhat.com Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/disk-io.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 0c31d07..d8fd711 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2728,13 +2728,13 @@ fail_cleaner: * kthreads */ filemap_write_and_wait(fs_info-btree_inode-i_mapping); - invalidate_inode_pages2(fs_info-btree_inode-i_mapping); fail_block_groups: btrfs_free_block_groups(fs_info); fail_tree_roots: free_root_pointers(fs_info, 1); + invalidate_inode_pages2(fs_info-btree_inode-i_mapping); fail_sb_buffer: btrfs_stop_workers(fs_info-generic_worker); @@ -2755,7 +2755,6 @@ fail_alloc: fail_iput: btrfs_mapping_tree_free(fs_info-mapping_tree); - invalidate_inode_pages2(fs_info-btree_inode-i_mapping); iput(fs_info-btree_inode); fail_bdi: bdi_destroy(fs_info-bdi); -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Btrfs: fix memory leak of pending_snapshot-inherit
The argument inherit of btrfs_ioctl_snap_create_transid() was assigned to NULL during we created the snapshots, so we didn't free it though we called kfree() in the caller. But since we are sure the snapshot creation is done after the function - btrfs_ioctl_snap_create_transid() - completes, it is safe that we don't assign the pointer inherit to NULL, and just free it in the caller of btrfs_ioctl_snap_create_transid(). In this way, the code can become more readable. Reported-by: Alex Lyakas alex.bt...@zadarastorage.com Cc: Arne Jansen sensi...@gmx.net Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 02d3035..40f2fbf 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -367,7 +367,7 @@ static noinline int create_subvol(struct btrfs_root *root, struct dentry *dentry, char *name, int namelen, u64 *async_transid, - struct btrfs_qgroup_inherit **inherit) + struct btrfs_qgroup_inherit *inherit) { struct btrfs_trans_handle *trans; struct btrfs_key key; @@ -401,8 +401,7 @@ static noinline int create_subvol(struct btrfs_root *root, if (IS_ERR(trans)) return PTR_ERR(trans); - ret = btrfs_qgroup_inherit(trans, root-fs_info, 0, objectid, - inherit ? *inherit : NULL); + ret = btrfs_qgroup_inherit(trans, root-fs_info, 0, objectid, inherit); if (ret) goto fail; @@ -530,7 +529,7 @@ fail: static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, char *name, int namelen, u64 *async_transid, - bool readonly, struct btrfs_qgroup_inherit **inherit) + bool readonly, struct btrfs_qgroup_inherit *inherit) { struct inode *inode; struct btrfs_pending_snapshot *pending_snapshot; @@ -549,10 +548,7 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, pending_snapshot-dentry = dentry; pending_snapshot-root = root; pending_snapshot-readonly = readonly; - if (inherit) { - pending_snapshot-inherit = *inherit; - *inherit = NULL;/* take responsibility to free it */ - } + pending_snapshot-inherit = inherit; trans = btrfs_start_transaction(root-fs_info-extent_root, 6); if (IS_ERR(trans)) { @@ -692,7 +688,7 @@ static noinline int btrfs_mksubvol(struct path *parent, char *name, int namelen, struct btrfs_root *snap_src, u64 *async_transid, bool readonly, - struct btrfs_qgroup_inherit **inherit) + struct btrfs_qgroup_inherit *inherit) { struct inode *dir = parent-dentry-d_inode; struct dentry *dentry; @@ -1454,7 +1450,7 @@ out: static noinline int btrfs_ioctl_snap_create_transid(struct file *file, char *name, unsigned long fd, int subvol, u64 *transid, bool readonly, - struct btrfs_qgroup_inherit **inherit) + struct btrfs_qgroup_inherit *inherit) { int namelen; int ret = 0; @@ -1563,7 +1559,7 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file, ret = btrfs_ioctl_snap_create_transid(file, vol_args-name, vol_args-fd, subvol, ptr, - readonly, inherit); + readonly, inherit); if (ret == 0 ptr copy_to_user(arg + -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html