Re: [RFC PATCH 0/5 v3] Btrfs: Add readonly support to replace BUG_ON phrase
Hi, chris, Is there any comment on this Forced readonly mounts on errors patchset? thanks, Liu Bo On 12/03/2010 04:15 PM, liubo wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic. Meanwhile, they are very ugly and should be handled more propriately. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, We can force the filesystem readonly when it hits errors, which is what this patchset has done. Replaced BUG_ON() with the interface provided in this patchset, we will get error infomation via dmesg. Since btrfs is now readonly, we can save our data safely and umount it, then a btrfsck is recommended. By these ways, we can protect our filesystem from panic caused by those BUG_ONs. We still need a incompat flag to make old kernels happy. This patchset needs more test. v2-v3: - since btrfs may do log replay after crash, even it is mounted as readonly, and we have add a readonly check at start transaction time, it needs to set and to restore readonly flags around log replay. v1-v2: - in order to avoid deadlock thing, move write super stuff from error handle path to unmount time. - remove BTRFS_SUPER_FLAG_VALID, just use BTRFS_SUPER_FLAG_ERROR to make it simple. - add MS_RDONLY check at start of a transaction instead of commit transaction. --- fs/btrfs/ctree.h | 19 ++ fs/btrfs/disk-io.c | 56 +- fs/btrfs/super.c | 88 fs/btrfs/transaction.c |3 ++ 4 files changed, 164 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote: When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at start transaction time. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/transaction.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 1fffbc0..14a597d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; int ret; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return ERR_PTR(-EROFS); again: h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS); if (!h) There are cases that we need to start transaction when MS_RDONLY flag is set. For example, remount FS into read-only mode and log replay. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
On 12/15/2010 04:45 PM, Yan, Zheng wrote: On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote: When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at start transaction time. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/transaction.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 1fffbc0..14a597d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; int ret; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return ERR_PTR(-EROFS); again: h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS); if (!h) There are cases that we need to start transaction when MS_RDONLY flag is set. For example, remount FS into read-only mode and log replay. However, is it weird to make changes to disk as fs is in readonly state? IMO, btrfs needs to limit the use of these disk-change while readonly cases, as it is not what readonly means. Since it has been here, we can bypass readonly in those cases(as I did in the 5th patch): ... flags = sb-s_flags; if (sb-s_flags MS_RDONLY) sb-s_flags = ~MS_RDONLY remount() sb-s_flags = flags; ... thanks, Liu Bo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500: Build the latest tools, then: btrfsck -s 1 /dev/xxx btrfsck -s 2 /dev/xxx If either of these work we have an easy way to get it mounted. Just let me know. -chris $ btrfsck -s 1 /dev/sda using SB copy 1, bytenr 67108864 parent transid verify failed on 2721514774528 wanted 39651 found 39649 btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root-node)' failed. Aborted $ btrfsck -s 2 /dev/sda using SB copy 2, bytenr 274877906944 parent transid verify failed on 2721514774528 wanted 39651 found 39649 btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root-node)' failed. Aborted Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda : parent transid verify failed on 2721514774528 wanted 39651 found 39649 btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root-node)' failed. dmesg said: [268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd [268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc [268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda Sorry to be a bother, but do you have any other suggestions ? Not a bother at all, I'm polishing off a version of fsck that I hope will be able to construct a good tree for you. It's my main priority right now and I hope to have something ready early Monday. -chris Hi Chris. Thanks for all your help. Any progress on the fsck ? I pulled the latest btrfs-progs-unstable and recompiled, same output from all the commands (btrfsck -s / btrfs-debug-tree). -tommy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
How to invoke the discard operation over btrfs
Hi all, Is there anybody know how to invoke the discard operation over btrfs filesystem? Does delete a file by rm will trigger the discard operation? If I want to trigger discard operation,how should i do? if there any pre-requires? such as the device must be ssd, btrfs should be mounted with -o discard. Any suggestion is appreciated. Thanks a lot! -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
Excerpts from liubo's message of 2010-12-15 04:12:14 -0500: On 12/15/2010 04:45 PM, Yan, Zheng wrote: On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote: When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at start transaction time. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/transaction.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 1fffbc0..14a597d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; int ret; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return ERR_PTR(-EROFS); again: h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS); if (!h) There are cases that we need to start transaction when MS_RDONLY flag is set. For example, remount FS into read-only mode and log replay. However, is it weird to make changes to disk as fs is in readonly state? IMO, btrfs needs to limit the use of these disk-change while readonly cases, as it is not what readonly means. reiserfs and ext3 at least have always done this. Log replay is required even when the FS is readonly. Since it has been here, we can bypass readonly in those cases(as I did in the 5th patch): ... flags = sb-s_flags; if (sb-s_flags MS_RDONLY) sb-s_flags = ~MS_RDONLY I think we should have a dedicated flag to reflect a filesystem that is forced readonly, and check that flag instead. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
What do the arguments of btrfs filesystem defragment do?
Hi, I would like to know that the arguments of btrfs filesystem defragment do. According to the built-in help page, the invocation is as follows: btrfs filesystem defragment [-vcf] [-s start] [-l len] [-t size] file|dir [file|dir...] Defragment a file or a directory. Unfortunately I can't find any documentation on the meaning of these arguments. The man page doesn't list these arguments: filesystem defragment file|dir [file|dir...] Defragment files and/or directories. Also the online documentation [1] is identical to the man page. [1] https://btrfs.wiki.kernel.org/index.php/Btrfs(command) On a somewhat related note I did find another page on the wiki [2] that explains a bit on the defragmenting subject, more specifically it mentions this rather important caveat: Caveat: Defragmenting a file which has a COW copy (either a snapshot copy or one made with bcp or cp --reflinks) will produce two unrelated files. If you defragment a subvolume that has a snapshot, you will roughly double the disk usage, as the snapshot files are no longer COW images of the originals. [2] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ From what I've heard on IRC this is still the case in current versions, but the Btrfs(command) documentation contains no mention of this. I hope someone can shed some light on these subjects. Kind regards, Erik. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
On Wed, Dec 15, 2010 at 11:03:46AM -0500, Chris Mason wrote: Excerpts from liubo's message of 2010-12-15 04:12:14 -0500: On 12/15/2010 04:45 PM, Yan, Zheng wrote: On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote: When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at start transaction time. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/transaction.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 1fffbc0..14a597d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; int ret; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return ERR_PTR(-EROFS); again: h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS); if (!h) There are cases that we need to start transaction when MS_RDONLY flag is set. For example, remount FS into read-only mode and log replay. However, is it weird to make changes to disk as fs is in readonly state? IMO, btrfs needs to limit the use of these disk-change while readonly cases, as it is not what readonly means. reiserfs and ext3 at least have always done this. Log replay is required even when the FS is readonly. Just make sure the underlying disk isn't read only, we had problems with this on ext3 a bit ago. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Excerpts from Li Zefan's message of 2010-12-14 22:33:33 -0500: Suppose to have the following sequence keys [objectid, type, offset]: [...] 1)[300, BTRFS_EXTENT_DATA_KEY, xx] 2)[300, BTRFS_INODE_ITEM_KEY, xx] 3)[300, BTRFS_XATTR_ITEM_KEY, xx] 4)[301, BTRFS_EXTENT_DATA_KEY, xx] 5)[301, BTRFS_INODE_ITEM_KEY, xx] 7)[30200, BTRFS_EXTENT_DATA_KEY, xx] 8)[30200, BTRFS_INODE_ITEM_KEY, xx] 9)[30200, BTRFS_XATTR_ITEM_KEY, xx] [...] Suppose that the buffer is filled between the item 2 and 3. We should restart the search, but how set the min_* key ? Try the following hypothesis h1) objectid++, type = 0 - In the next search the key 3 would be skipped h2) objectid asis, type ++, - in the next search the key 4 would be skipped h3) objectid asis, type = 0 - in the next search the key 1,2,3 would be h4) objectid asis, type asis, offset++ - we should get the correct result. This is the right answer ;). The problem is that even though our key has 3 distinct parts, and the API makes it look like you have very fine grained control over those three parts, you have to remember to reset them as you iterate between objectids. It isn't a obvious as it should be. The current API is a very raw export of how we do the searches in the kernel too. You can do pretty much anything with it, but we pay with complexity. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs hang (deadlock?) when trying to create a ext4 filesystem in KVM guest
On Thu, Oct 28, 2010 at 1:29 AM, Tomasz Chmielewski man...@wpkg.org wrote: On 28.10.2010 00:55, Chris Mason wrote: On Wed, Oct 27, 2010 at 03:29:38PM +0200, Tomasz Chmielewski wrote: There are a couple of problems when running KVM guests with images stored on btrfs filesystem. One of them is inability to create a filesystem (i.e. ext4) in the guest: - btrfs filesystem on the host was mounted with noatime,compress-force - guest was using a 50 GB sparse file, - attempt to create a ext4 filesystem within the guest does not succeed (hangs); host prints below messages in dmesg - some deadlock in btrfs? kernel: 2.6.36 qemu-kvm: 0.13.0 Is this the full dmesg output? I think there are other messages hiding in there. There were indeed bad ordered accounting left (see below), I think they are coming from btrfs? Is this a single disk btrfs? Yes. [ 8072.773053] device fsid 1142843480ad2d13-4bdc742fd9b1f7b0 devid 1 transid 1508 /dev/sdb4 [ 8072.773674] btrfs: forcing compression [ 8122.052221] device tap0 entered promiscuous mode [ 8122.052245] br0: port 2(tap0) entering learning state [ 8122.052248] br0: port 2(tap0) entering learning state [ 8122.451587] br0: port 2(tap0) entering learning state [ 8122.543477] br0: port 2(tap0) entering disabled state [ 8122.609645] device tap0 left promiscuous mode [ 8122.609650] br0: port 2(tap0) entering disabled state [ 8131.325647] EXT4-fs (md4): recovery complete [ 8131.325809] EXT4-fs (md4): mounted filesystem with ordered data mode. Opts: (null) [ 8133.392100] device tap0 entered promiscuous mode [ 8133.392127] br0: port 2(tap0) entering learning state [ 8133.392131] br0: port 2(tap0) entering learning state [ 8134.106594] kvm: 5004: cpu0 unhandled wrmsr: 0x198 data 0 [ 8134.106618] kvm: 5004: cpu1 unhandled wrmsr: 0x198 data 0 [ 8143.460927] tap0: no IPv6 routers present [ 8148.359485] br0: port 2(tap0) entering forwarding state [ 8309.103502] bad ordered accounting left 65536 size 385024 [ 8309.106206] bad ordered accounting left 65536 size 385024 [ 8309.108915] bad ordered accounting left 65536 size 385024 [ 8309.111630] bad ordered accounting left 36864 size 385024 [ 8501.965625] INFO: task qemu-system-x86:5148 blocked for more than 120 seconds. [ 8501.965629] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 8501.965632] qemu-system-x D 880001e14cc0 0 5148 4924 0x [ 8501.965638] 880223bc3b38 0086 880223bc3fd8 00014cc0 [ 8501.965642] 00014cc0 880223bc3fd8 00014cc0 880223bc3fd8 [ 8501.965647] 00014cc0 880221965f18 880221965f20 880221965b80 [ 8501.965651] Call Trace: [ 8501.965678] [a024c52c] btrfs_start_ordered_extent+0x6c/0xb0 [btrfs] [ 8501.965685] [81083200] ? autoremove_wake_function+0x0/0x40 [ 8501.965701] [a024d1c2] btrfs_wait_ordered_range+0xd2/0x160 [btrfs] [ 8501.965716] [a0240059] btrfs_file_aio_write+0x269/0x990 [btrfs] [ 8501.965721] [8105ca94] ? try_to_wake_up+0xf4/0x3f0 [ 8501.965726] [81168119] ? __pollwake+0x49/0x50 [ 8501.965730] [8105cd90] ? default_wake_function+0x0/0x20 [ 8501.965733] [8105ca94] ? try_to_wake_up+0xf4/0x3f0 [ 8501.965737] [8116813b] ? pollwake+0x1b/0x20 [ 8501.965752] [a023fdf0] ? btrfs_file_aio_write+0x0/0x990 [btrfs] [ 8501.965761] [8115664b] do_sync_readv_writev+0xcb/0x110 [ 8501.965769] [81294d98] ? apparmor_file_permission+0x18/0x20 [ 8501.965776] [8126356e] ? security_file_permission+0x1e/0x80 [ 8501.965781] [811576e0] do_readv_writev+0xd0/0x1d0 [ 8501.965787] [81076d72] ? kill_something_info+0x42/0x130 [ 8501.965793] [81076ee0] ? sys_kill+0x80/0x90 [ 8501.965798] [8115781e] vfs_writev+0x3e/0x60 [ 8501.965802] [811578e7] sys_pwritev+0xa7/0xc0 [ 8501.965806] [8100b0f2] system_call_fastpath+0x16/0x1b [ 8501.965810] INFO: task qemu-system-x86:5150 blocked for more than 120 seconds. [ 8501.965812] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 8501.965814] qemu-system-x D 880001e94cc0 0 5150 4924 0x [ 8501.965819] 8801aba4bb38 0086 8801aba4bfd8 00014cc0 [ 8501.965823] 00014cc0 8801aba4bfd8 00014cc0 8801aba4bfd8 [ 8501.965827] 00014cc0 880226d14838 880226d14840 880226d144a0 [ 8501.965832] Call Trace: [ 8501.965847] [a024c52c] btrfs_start_ordered_extent+0x6c/0xb0 [btrfs] [ 8501.965852] [81083200] ? autoremove_wake_function+0x0/0x40 [ 8501.965867] [a024d1c2] btrfs_wait_ordered_range+0xd2/0x160 [btrfs] [ 8501.965883] [a0240059] btrfs_file_aio_write+0x269/0x990 [btrfs] [ 8501.965887] [8105ca94] ? try_to_wake_up+0xf4/0x3f0 [ 8501.965891] [81168119] ?
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Chris Mason wrote: Excerpts from Li Zefan's message of 2010-12-14 22:33:33 -0500: Suppose to have the following sequence keys [objectid, type, offset]: [...] 1)[300, BTRFS_EXTENT_DATA_KEY, xx] 2)[300, BTRFS_INODE_ITEM_KEY, xx] 3)[300, BTRFS_XATTR_ITEM_KEY, xx] 4)[301, BTRFS_EXTENT_DATA_KEY, xx] 5)[301, BTRFS_INODE_ITEM_KEY, xx] 7)[30200, BTRFS_EXTENT_DATA_KEY, xx] 8)[30200, BTRFS_INODE_ITEM_KEY, xx] 9)[30200, BTRFS_XATTR_ITEM_KEY, xx] [...] Suppose that the buffer is filled between the item 2 and 3. We should restart the search, but how set the min_* key ? Try the following hypothesis h1) objectid++, type = 0 - In the next search the key 3 would be skipped h2) objectid asis, type ++, - in the next search the key 4 would be skipped h3) objectid asis, type = 0 - in the next search the key 1,2,3 would be h4) objectid asis, type asis, offset++ - we should get the correct result. This is the right answer ;). The problem is that even though our key has 3 distinct parts, and the API makes it look like you have very fine grained control over those three parts, you have to remember to reset them as you iterate between objectids. It isn't a obvious as it should be. The current API is a very raw export of how we do the searches in the kernel too. You can do pretty much anything with it, but we pay with complexity. Hi Chris, I am a bit confused about your answer. The actual API is a bit confused (or almost not obvious). An application in order to work properly has to make some adjustment to the min_* fields AND filter the results (because if we tweak with the min_* field, not useful data is returned). Moreover this means that we move between user-space-kernel- space a lot of unused data (un-necessary context switch). On the basis of your answer, it seems that this is ok (please don't consider only the case of listing the subvolumes which is a very simple cases). And nothing have to do. Instead I suggest to (see my email [PATCH] BTRFS_IOC_TREE_SEARCH: store and use the last key found): - leave the min_* and max_* fields to act only as filter - add three more fields start_* as start point for the search - make some small modification to the kernel code to track in the start_* fields the last key found pro: - we pass to userspace only useful data - we simplify a lot the userspace application, because they don't have to update the min_* fields (they will work in a obvious way :-) ) - we can use the ordered property of a btree structure to perform efficient data lookup (if we reset to zero the min_* fields we lookup un-necessary data) - we have the same functionality of the old API cons: - we need a modification ( :-) ) May be that I missed some point, but I don't see any advantage to continue to support the actual API. Of course, that doesn't means that we can remove the old API ignoring the backward compatibility. But I think that there are sufficient pros to develop a new API Please be patient: my english is very bad; I am not trying to blame anybody; I want only a perfect fs (TM) :-) -chris -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Li Zefan wrote: Goffredo Baroncelli wrote: On Wednesday, 15 December, 2010, Li Zefan wrote: h4) objectid asis, type asis, offset++ - we should get the correct result. This fix the problem of the missing subvolume. But for the other case (searching for more than one type) the problem still here. I don't think so. And the above h4 has showed how we search for more than one type. The generic userland code for next search is: /* this is in essence the same as how we advance key in kernel code */ if (sk-min_offset (u64)-1 sk-min_offset sk-max_offset) sk-min_offset++; else if (sk-min_type (u8)-1 sk-min_type sk-max_type) { sk-min_offset = 0; sk-min_type++; } else if (sk-min_objectid (u64)-1 sk-min_objectid sk- max_objectid){ sk-min_offset = 0; sk-min_type = 0; Sorry but if you reset the sk-min_type to 0, this means that the min_type lost its purpose (act as lover bound of the acceptance criteria). sk-min_objectid++; } else break; ioctl(...); for (i = 0; i nr_items; i++) { if (!filter(items[i])) continue; So you are suggesting: Move all tree items from kernel to user space and filter it in userspace ?. This mean a lot of un-needed kernel-space - userspace transition... Sorry I don't understand if we are talking about a workaround or a solution. /* process this item */ ... } because the current ioctl uses min_{x,y,z} and max_{x,y,z} as start_key and end_key, and it returns all keys that falls in [start_key, end_key]. So this btrfs-progs patch should fix missing subvolumes in the output of subvolume list: diff --git a/btrfs-list.c b/btrfs-list.c index 93766a8..1b9ea45 100644 --- a/btrfs-list.c +++ b/btrfs-list.c @@ -620,7 +620,10 @@ int list_subvols(int fd) /* this iteration is done, step forward one root for the next * ioctl */ - if (sk-min_objectid (u64)-1) { + if (sk-min_type BTRFS_ROOT_BACKREF_KEY) { + sk-min_type = BTRFS_ROOT_BACKREF_KEY; + sk-min_offset = 0; + } else if (sk-min_objectid (u64)-1) { sk-min_objectid++; sk-min_type = BTRFS_ROOT_BACKREF_KEY; sk-min_offset = 0; -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Excerpts from Goffredo Baroncelli's message of 2010-12-15 13:42:23 -0500: On Wednesday, 15 December, 2010, Chris Mason wrote: Excerpts from Li Zefan's message of 2010-12-14 22:33:33 -0500: Suppose to have the following sequence keys [objectid, type, offset]: [...] 1)[300, BTRFS_EXTENT_DATA_KEY, xx] 2)[300, BTRFS_INODE_ITEM_KEY, xx] 3)[300, BTRFS_XATTR_ITEM_KEY, xx] 4)[301, BTRFS_EXTENT_DATA_KEY, xx] 5)[301, BTRFS_INODE_ITEM_KEY, xx] 7)[30200, BTRFS_EXTENT_DATA_KEY, xx] 8)[30200, BTRFS_INODE_ITEM_KEY, xx] 9)[30200, BTRFS_XATTR_ITEM_KEY, xx] [...] Suppose that the buffer is filled between the item 2 and 3. We should restart the search, but how set the min_* key ? Try the following hypothesis h1) objectid++, type = 0 - In the next search the key 3 would be skipped h2) objectid asis, type ++, - in the next search the key 4 would be skipped h3) objectid asis, type = 0 - in the next search the key 1,2,3 would be h4) objectid asis, type asis, offset++ - we should get the correct result. This is the right answer ;). The problem is that even though our key has 3 distinct parts, and the API makes it look like you have very fine grained control over those three parts, you have to remember to reset them as you iterate between objectids. It isn't a obvious as it should be. The current API is a very raw export of how we do the searches in the kernel too. You can do pretty much anything with it, but we pay with complexity. Hi Chris, I am a bit confused about your answer. The actual API is a bit confused (or almost not obvious). An application in order to work properly has to make some adjustment to the min_* fields AND filter the results (because if we tweak with the min_* field, not useful data is returned). Moreover this means that we move between user-space-kernel- space a lot of unused data (un-necessary context switch). On the basis of your answer, it seems that this is ok (please don't consider only the case of listing the subvolumes which is a very simple cases). And nothing have to do. Well, it's ok in that I wanted the API to be very close to the way searches are done in the kernel. I'll definitely agree it isn't perfect, especially as we hop between objectids or types. But I don't want to extend it just yet, mostly because we don't have new applications making use of it. I'd rather couple any new apis with new applications that we haven't yet thought of. Thanks a lot for the time you're spending on review and looking at this. If you have killer apps that can really make use of new APIs, I'm happy to start reworking things. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs hang (deadlock?) when trying to create a ext4 filesystem in KVM guest
Excerpts from C Anthony Risinger's message of 2010-12-15 11:46:11 -0500: On Thu, Oct 28, 2010 at 1:29 AM, Tomasz Chmielewski man...@wpkg.org wrote: On 28.10.2010 00:55, Chris Mason wrote: On Wed, Oct 27, 2010 at 03:29:38PM +0200, Tomasz Chmielewski wrote: There are a couple of problems when running KVM guests with images stored on btrfs filesystem. One of them is inability to create a filesystem (i.e. ext4) in the guest: - btrfs filesystem on the host was mounted with noatime,compress-force - guest was using a 50 GB sparse file, - attempt to create a ext4 filesystem within the guest does not succeed (hangs); host prints below messages in dmesg - some deadlock in btrfs? kernel: 2.6.36 qemu-kvm: 0.13.0 Is this the full dmesg output? I think there are other messages hiding in there. There were indeed bad ordered accounting left (see below), I think they are coming from btrfs? Is this a single disk btrfs? Yes. [ 8072.773053] device fsid 1142843480ad2d13-4bdc742fd9b1f7b0 devid 1 transid 1508 /dev/sdb4 [ 8072.773674] btrfs: forcing compression [ 8122.052221] device tap0 entered promiscuous mode [ 8122.052245] br0: port 2(tap0) entering learning state [ 8122.052248] br0: port 2(tap0) entering learning state [ 8122.451587] br0: port 2(tap0) entering learning state [ 8122.543477] br0: port 2(tap0) entering disabled state [ 8122.609645] device tap0 left promiscuous mode [ 8122.609650] br0: port 2(tap0) entering disabled state [ 8131.325647] EXT4-fs (md4): recovery complete [ 8131.325809] EXT4-fs (md4): mounted filesystem with ordered data mode. Opts: (null) [ 8133.392100] device tap0 entered promiscuous mode [ 8133.392127] br0: port 2(tap0) entering learning state [ 8133.392131] br0: port 2(tap0) entering learning state [ 8134.106594] kvm: 5004: cpu0 unhandled wrmsr: 0x198 data 0 [ 8134.106618] kvm: 5004: cpu1 unhandled wrmsr: 0x198 data 0 [ 8143.460927] tap0: no IPv6 routers present [ 8148.359485] br0: port 2(tap0) entering forwarding state [ 8309.103502] bad ordered accounting left 65536 size 385024 [ 8309.106206] bad ordered accounting left 65536 size 385024 [ 8309.108915] bad ordered accounting left 65536 size 385024 [ 8309.111630] bad ordered accounting left 36864 size 385024 [ 8501.965625] INFO: task qemu-system-x86:5148 blocked for more than 120 seconds. [ 8501.965629] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 8501.965632] qemu-system-x D 880001e14cc0 0 5148 4924 0x [ 8501.965638] 880223bc3b38 0086 880223bc3fd8 00014cc0 [ 8501.965642] 00014cc0 880223bc3fd8 00014cc0 880223bc3fd8 [ 8501.965647] 00014cc0 880221965f18 880221965f20 880221965b80 [ 8501.965651] Call Trace: [ 8501.965678] [a024c52c] btrfs_start_ordered_extent+0x6c/0xb0 [btrfs] [ 8501.965685] [81083200] ? autoremove_wake_function+0x0/0x40 [ 8501.965701] [a024d1c2] btrfs_wait_ordered_range+0xd2/0x160 [btrfs] [ 8501.965716] [a0240059] btrfs_file_aio_write+0x269/0x990 [btrfs] [ 8501.965721] [8105ca94] ? try_to_wake_up+0xf4/0x3f0 [ 8501.965726] [81168119] ? __pollwake+0x49/0x50 [ 8501.965730] [8105cd90] ? default_wake_function+0x0/0x20 [ 8501.965733] [8105ca94] ? try_to_wake_up+0xf4/0x3f0 [ 8501.965737] [8116813b] ? pollwake+0x1b/0x20 [ 8501.965752] [a023fdf0] ? btrfs_file_aio_write+0x0/0x990 [btrfs] [ 8501.965761] [8115664b] do_sync_readv_writev+0xcb/0x110 [ 8501.965769] [81294d98] ? apparmor_file_permission+0x18/0x20 [ 8501.965776] [8126356e] ? security_file_permission+0x1e/0x80 [ 8501.965781] [811576e0] do_readv_writev+0xd0/0x1d0 [ 8501.965787] [81076d72] ? kill_something_info+0x42/0x130 [ 8501.965793] [81076ee0] ? sys_kill+0x80/0x90 [ 8501.965798] [8115781e] vfs_writev+0x3e/0x60 [ 8501.965802] [811578e7] sys_pwritev+0xa7/0xc0 [ 8501.965806] [8100b0f2] system_call_fastpath+0x16/0x1b [ 8501.965810] INFO: task qemu-system-x86:5150 blocked for more than 120 seconds. [ 8501.965812] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 8501.965814] qemu-system-x D 880001e94cc0 0 5150 4924 0x [ 8501.965819] 8801aba4bb38 0086 8801aba4bfd8 00014cc0 [ 8501.965823] 00014cc0 8801aba4bfd8 00014cc0 8801aba4bfd8 [ 8501.965827] 00014cc0 880226d14838 880226d14840 880226d144a0 [ 8501.965832] Call Trace: [ 8501.965847] [a024c52c] btrfs_start_ordered_extent+0x6c/0xb0 [btrfs] [ 8501.965852] [81083200] ? autoremove_wake_function+0x0/0x40 [ 8501.965867] [a024d1c2] btrfs_wait_ordered_range+0xd2/0x160 [btrfs] [
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Chris Mason wrote: [...] Hi Chris, I am a bit confused about your answer. The actual API is a bit confused (or almost not obvious). An application in order to work properly has to make some adjustment to the min_* fields AND filter the results (because if we tweak with the min_* field, not useful data is returned). Moreover this means that we move between user-space- kernel- space a lot of unused data (un-necessary context switch). On the basis of your answer, it seems that this is ok (please don't consider only the case of listing the subvolumes which is a very simple cases). And nothing have to do. Well, it's ok in that I wanted the API to be very close to the way searches are done in the kernel. I'll definitely agree it isn't perfect, especially as we hop between objectids or types. But I don't want to extend it just yet, mostly because we don't have new applications making use of it. I'd rather couple any new apis with new applications that we haven't yet thought of. Thanks a lot for the time you're spending on review and looking at this. If you have killer apps that can really make use of new APIs, I'm happy to start reworking things. Look at a my previous email about an enhancement of the find-new command ([RFC] Improve btrfs subvolume find-new command - 11/12/2010). It would be sufficient ? (of course now on the basis of the last news of this API I know that this command is bugged :-( ) I can live with the current API ( tweaking the increasing of the min_* fields).. but think about another side of the question: now the only client of this API is the btrfs command (btrfs subvol list, which actually is broken) , and we can update this api with the minimum effort. Instead if we leave the current behavior in the future may appears an application which depends on it. So we may be obligated to maintain it .. Goffredo -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What do the arguments of btrfs filesystem defragment do?
Excerpts from Erik Logtenberg's message of 2010-12-15 11:06:03 -0500: Hi, I would like to know that the arguments of btrfs filesystem defragment do. According to the built-in help page, the invocation is as follows: btrfs filesystem defragment [-vcf] [-s start] [-l len] [-t size] file|dir [file|dir...] Defragment a file or a directory. Unfortunately I can't find any documentation on the meaning of these arguments. The man page doesn't list these arguments: -v is verbose. It just prints the file name as it defrags -c forces compression, even if you are mounted without compression on -s is the starting byte you want to defrag in the file. This is useful for very large files that are only compressed in a particular spot -l is the number of bytes you want to compress, would be paired with -s -t is a threshold. If the extent is bigger than this threshold, we consider it defragged already. -f starts the IO right away. filesystem defragment file|dir [file|dir...] Defragment files and/or directories. Also the online documentation [1] is identical to the man page. [1] https://btrfs.wiki.kernel.org/index.php/Btrfs(command) On a somewhat related note I did find another page on the wiki [2] that explains a bit on the defragmenting subject, more specifically it mentions this rather important caveat: Caveat: Defragmenting a file which has a COW copy (either a snapshot copy or one made with bcp or cp --reflinks) will produce two unrelated files. If you defragment a subvolume that has a snapshot, you will roughly double the disk usage, as the snapshot files are no longer COW images of the originals. [2] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ From what I've heard on IRC this is still the case in current versions, but the Btrfs(command) documentation contains no mention of this. This is still true. I hope someone can shed some light on these subjects. Hope this makes sense ;) -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What do the arguments of btrfs filesystem defragment do?
On Wednesday, 15 December, 2010, Erik Logtenberg wrote: Hi, I would like to know that the arguments of btrfs filesystem defragment do. According to the built-in help page, the invocation is as follows: btrfs filesystem defragment [-vcf] [-s start] [-l len] [-t size] file|dir [file|dir...] Defragment a file or a directory. Unfortunately I can't find any documentation on the meaning of these arguments. The man page doesn't list these arguments: I loked at the code and I try to explain. But what I write below may be affected by error. If the parameter is a directory, the subvolume where the dir is placed is defragged. Instead if the parameter is a file, the switches have the following meaning: -f - flush after defrag -s - start point of defrag -l - length of defrag range (if len is = 0, the rest of file is considered) -c - force compress -t - threshold for compression (if the size of file is bigger, compress the file ?) -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What do the arguments of btrfs filesystem defragment do?
Chris, thank you very much for your explanation. Indeed this clears things up a bit. Caveat: Defragmenting a file which has a COW copy (either a snapshot copy or one made with bcp or cp --reflinks) will produce two unrelated files. If you defragment a subvolume that has a snapshot, you will roughly double the disk usage, as the snapshot files are no longer COW images of the originals. [2] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ From what I've heard on IRC this is still the case in current versions, but the Btrfs(command) documentation contains no mention of this. This is still true. Is there a decent way to have btrfs compress already existing files (that were written before compression was enabled) without hurting any of the internal structures such as snapshots? The goal is to increase free disk space and possibly performance, not to expload disk usage by breaking COW relations. So given your reply, I assume that defragmenting all files is not the right way (?) Kind regards, Erik. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Mon, Dec 13, 2010 at 7:56 PM, Jon Nelson jnel...@jamponi.net wrote: On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o ty...@mit.edu wrote: On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote: I'm glad you've been able to reproduce the problem! If you should need any further assistance, please do not hesitate to ask. This patch seems to fix the problem for me. (Unless the partition is mounted with mblk_io_submit.) Could you confirm that it fixes it for you as well? I believe I have applied the (relevant) inode.c changes to bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc, rebuilt and begun testing. Now at 28 passes without error, I think I can say that the patch appears to resolve the issue. -- Jon Confirmed ! I'm running my box for 5+ hours right now with your patch applied in addition to Andi's/Milan's patch (http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-scale-to-multiple-CPUs.patch) , Ted, and can't see any indications of corruptions so far (while doing an emerge -e system) and doing everyday stuff. My /home partition (with ext4) is also still intact [which of course has a backup] so it seems to fix it for me, too so the corruption I was seeing was similar in a way to that of Jon You can add a Tested-by: Matthias Bayer jackdac...@gmail.com Thanks a lot to everyone for your support ! :) I have a question though: the deactivation of multiple page-io submission support most likely only would affect bigger systems or also desktop systems (like mine) ? Regards Matt -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
I have a question though: the deactivation of multiple page-io submission support most likely only would affect bigger systems or also desktop systems (like mine) ? I think this is not a final fix, just a workaround. The problem with the other path still really needs to be tracked down. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What do the arguments of btrfs filesystem defragment do?
Excerpts from Erik Logtenberg's message of 2010-12-15 14:14:15 -0500: Chris, thank you very much for your explanation. Indeed this clears things up a bit. Caveat: Defragmenting a file which has a COW copy (either a snapshot copy or one made with bcp or cp --reflinks) will produce two unrelated files. If you defragment a subvolume that has a snapshot, you will roughly double the disk usage, as the snapshot files are no longer COW images of the originals. [2] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ From what I've heard on IRC this is still the case in current versions, but the Btrfs(command) documentation contains no mention of this. This is still true. Is there a decent way to have btrfs compress already existing files (that were written before compression was enabled) without hurting any of the internal structures such as snapshots? I'm afraid not yet. There is code for this in the btrfs balance routines, but we haven't yet adapted it to the defragment command. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen a...@firstfloor.org wrote: I have a question though: the deactivation of multiple page-io submission support most likely only would affect bigger systems or also desktop systems (like mine) ? I think this is not a final fix, just a workaround. The problem with the other path still really needs to be tracked down. -Andi -- a...@linux.intel.com -- Speaking for myself only. ok, thanks for the clarification Regards Matt -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What do the arguments of btrfs filesystem defragment do?
Is there a decent way to have btrfs compress already existing files (that were written before compression was enabled) without hurting any of the internal structures such as snapshots? I'm afraid not yet. There is code for this in the btrfs balance routines, but we haven't yet adapted it to the defragment command. Okay, just to be 100% sure, if I add -o compress to the mount options, this will cause btrfs to (try to) compress newly written files, right? Is this fully compatible with existing snapshots, or will compressing files always (have a chance to) hurt COW relations? The use case is a filesystem used for backups, which are rsynced nightly, after which a new snapshot is made. After something like 45 days, the old snapshots are removed. I am assuming that this way, after 45 days all files will be compressed naturally, but this is only beneficial if snapshots still fully work. If instead it results in storing the compressed form of every file 45 times on disk, then it won't help much. Kind regards, Erik. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 15, 2010 at 8:25 PM, Matt jackdac...@gmail.com wrote: On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen a...@firstfloor.org wrote: I have a question though: the deactivation of multiple page-io submission support most likely only would affect bigger systems or also desktop systems (like mine) ? I think this is not a final fix, just a workaround. The problem with the other path still really needs to be tracked down. -Andi -- a...@linux.intel.com -- Speaking for myself only. ok, thanks for the clarification Regards Matt Sorry to spam the mailing lists again make that a Reported-and-Tested-by: Matthias Bayer jackdac...@gmail.com (hope that's the correct way to write it) Regards Matt -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What do the arguments of btrfs filesystem defragment do?
Excerpts from Erik Logtenberg's message of 2010-12-15 14:26:49 -0500: Is there a decent way to have btrfs compress already existing files (that were written before compression was enabled) without hurting any of the internal structures such as snapshots? I'm afraid not yet. There is code for this in the btrfs balance routines, but we haven't yet adapted it to the defragment command. Okay, just to be 100% sure, if I add -o compress to the mount options, this will cause btrfs to (try to) compress newly written files, right? Yes Is this fully compatible with existing snapshots, or will compressing files always (have a chance to) hurt COW relations? COW references (both via snapshotting and cp --reflink) are broken when you change the file in either copy. While the files are the same, they are shared and the parts that change are private to each copy. If you have a reflink copy of a 1TB file and change only 4K in the middle, the other 1TB-4KB are still shared. This is true for both the default and with compression on. The use case is a filesystem used for backups, which are rsynced nightly, after which a new snapshot is made. After something like 45 days, the old snapshots are removed. I am assuming that this way, after 45 days all files will be compressed naturally, but this is only beneficial if snapshots still fully work. If instead it results in storing the compressed form of every file 45 times on disk, then it won't help much. Yes, you'll end up with a fully compressed and fully shared setup after 45 days. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix memory leak on error path in btrfs_get_acl()
If posix_acl_from_xattr() fails we leak memory stored in 'value'. Signed-off-by: Mariusz Kozlowski m...@lab.zgora.pl --- fs/btrfs/acl.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c index d16..11c9561 100644 --- a/fs/btrfs/acl.c +++ b/fs/btrfs/acl.c @@ -61,6 +61,7 @@ static struct posix_acl *btrfs_get_acl(struct inode *inode, int type) if (size 0) { acl = posix_acl_from_xattr(value, size); if (IS_ERR(acl)) + kfree(value); return acl; set_cached_acl(inode, type, acl); } -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
revised ioctl#21 btrfsprogs patch
the below is against 70c6c10134b502fa69955746554031939b85fb0c which is the head of the next branch. 1: absolute-sized fields in the interface object 2: now getting error from errno instead of the return value of ioctl(2) -- Forwarded message -- From: project user Date: Wed, Dec 15, 2010 at 4:15 PM Subject: btrfsprogs revised ioctl#21 patch diff --git a/Makefile b/Makefile index d65f6a2..b8ef1f2 100644 --- a/Makefile +++ b/Makefile @@ -37,12 +37,13 @@ all: version $(progs) manpages version: bash version.sh -btrfs: $(objects) btrfs.o btrfs_cmds.o - gcc $(CFLAGS) -o btrfs btrfs.o btrfs_cmds.o \ +btrfs: $(objects) btrfs.o btrfs_cmds.o iso8601toms.o + gcc $(CFLAGS) -o btrfs btrfs.o btrfs_cmds.o iso8601toms.o \ $(objects) $(LDFLAGS) $(LIBS) -btrfsctl: $(objects) btrfsctl.o - gcc $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS) +btrfsctl: $(objects) btrfsctl.o iso8601toms.o + gcc $(CFLAGS) -o btrfsctl btrfsctl.o iso8601toms.o \ + $(objects) $(LDFLAGS) $(LIBS) btrfs-vol: $(objects) btrfs-vol.o gcc $(CFLAGS) -o btrfs-vol btrfs-vol.o $(objects) $(LDFLAGS) $(LIBS) diff --git a/btrfs.c b/btrfs.c index 46314cf..6b3ebf6 100644 --- a/btrfs.c +++ b/btrfs.c @@ -29,6 +29,7 @@ struct Command { CommandFunction func; /* function which implements the command */ int nargs; /* if == 999, any number of arguments if = 0, number of arguments, + if 1000, 1000 more than the _maximum_ number of arguments, if 0, _minimum_ number of arguments */ char *verb; /* verb */ char *help; /* help lines; form the 2nd onward they are @@ -73,14 +74,19 @@ static struct Command commands[] = { Set the subvolume of the filesystem path which will be mounted\n as default. }, - { do_fssync, 1, - filesystem sync, path\n - Force a sync on the filesystem path. + { do_fssync, 1002, + filesystem sync, [-dq] path\n + Force a sync on the filesystem path, defaulting to '.'. + }, + { do_wait4clean, 1002, /* require at most two args */ + filesystem reclaim, [path [timeout]] \n + Wait for cleanup of deleted subvolumes. Path defaults to .\n + Timeout in seconds, or add m for minutes.\n }, { do_resize, 2, filesystem resize, [+/-]newsize[gkm]|max filesystem\n Resize the file system. If 'max' is passed, the filesystem\n - will occupe all available space on the device. + will occupy all available space on the device. }, { do_show_filesystem, 999, filesystem show, [uuid|label]\n @@ -349,6 +355,15 @@ static int parse_args(int argc, char **argv, return -2; /* check the number of argument */ + + if(matchcmd-nargs 1000 ){ + if ( matchcmd-nargs (1000 + *nargs_)){ + fprintf(stderr, ERROR: '%s' requires only %d or fewer arg(s)\n, + matchcmd-verb, matchcmd-nargs - 1000 ); + return -2; + } + } else { + if (matchcmd-nargs 0 matchcmd-nargs -*nargs_ ){ fprintf(stderr, ERROR: '%s' requires minimum %d arg(s)\n, matchcmd-verb, -matchcmd-nargs); @@ -359,7 +374,7 @@ static int parse_args(int argc, char **argv, matchcmd-verb, matchcmd-nargs); return -2; } - + } if (prepare_args( nargs_, args_, prgname, matchcmd )){ fprintf(stderr, ERROR: not enough memory\\n); return -20; diff --git a/btrfs_cmds.c b/btrfs_cmds.c index 8031c58..89f1f22 100644 --- a/btrfs_cmds.c +++ b/btrfs_cmds.c @@ -507,15 +507,37 @@ int do_create_subvol(int argc, char **argv) int do_fssync(int argc, char **argv) { int fd, res; - char *path = argv[1]; - + int cleanup=1, quiet=0; + char *path = ., *p; /* [-dq] [path] */ + struct btrfs_ioctl_cleaner_wait_args w4c_arg; + + if (argc 1){ + if (argv[1][0] == '-'){ + for (p=argv[1][1]; *p; p++) switch (*p){ + case 'q': quiet++; break; + case 'd': cleanup++; break; + default: + fprintf(stderr, + ERROR: unknown flag '%c' in '%s'\n, *p,argv[1]); + return 12; + }; + if (argc 2) + path = argv[2]; + + }else{ + path = argv[1]; + + }; + + }; + while (cleanup--){ /* cleanup starts at 1 */ fd = open_file_or_dir(path); if (fd 0) { fprintf(stderr, ERROR: can't access to '%s'\n, path); return 12; } - printf(FSSync '%s'\n, path); + if (!quiet)
Re: [PATCH] btrfs: fix memory leak on error path in btrfs_get_acl()
On Wed, Dec 15, 2010 at 10:57 PM, Mariusz Kozlowski m...@lab.zgora.pl wrote: If posix_acl_from_xattr() fails we leak memory stored in 'value'. Signed-off-by: Mariusz Kozlowski m...@lab.zgora.pl --- fs/btrfs/acl.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c index d16..11c9561 100644 --- a/fs/btrfs/acl.c +++ b/fs/btrfs/acl.c @@ -61,6 +61,7 @@ static struct posix_acl *btrfs_get_acl(struct inode *inode, int type) if (size 0) { acl = posix_acl_from_xattr(value, size); if (IS_ERR(acl)) + kfree(value); Be careful with the evil { } return acl; set_cached_acl(inode, type, acl); } -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
02:48, Goffredo Baroncelli wrote: On Wednesday, 15 December, 2010, Li Zefan wrote: Goffredo Baroncelli wrote: On Wednesday, 15 December, 2010, Li Zefan wrote: h4) objectid asis, type asis, offset++ - we should get the correct result. This fix the problem of the missing subvolume. But for the other case (searching for more than one type) the problem still here. I don't think so. And the above h4 has showed how we search for more than one type. The generic userland code for next search is: /* this is in essence the same as how we advance key in kernel code */ if (sk-min_offset (u64)-1 sk-min_offset sk-max_offset) sk-min_offset++; else if (sk-min_type (u8)-1 sk-min_type sk-max_type) { sk-min_offset = 0; sk-min_type++; } else if (sk-min_objectid (u64)-1 sk-min_objectid sk- max_objectid){ sk-min_offset = 0; sk-min_type = 0; Sorry but if you reset the sk-min_type to 0, this means that the min_type lost its purpose (act as lover bound of the acceptance criteria). Yep, the changelog of Chris' commit has said that userland has to do this. sk-min_objectid++; } else break; ioctl(...); for (i = 0; i nr_items; i++) { if (!filter(items[i])) continue; So you are suggesting: Move all tree items from kernel to user space and filter it in userspace ?. This mean a lot of un-needed kernel-space - userspace transition... Right, but it's fine so far. I'm not suggesting anything, but explaining how the ioctl is working. Sorry I don't understand if we are talking about a workaround or a solution. As Chris said, it's not perfect but we can just live along with it, until we find some killer app that requests us to improve/expand this ioctl. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
On 12/16/2010 12:03 AM, Chris Mason wrote: Excerpts from liubo's message of 2010-12-15 04:12:14 -0500: On 12/15/2010 04:45 PM, Yan, Zheng wrote: On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote: When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at start transaction time. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/transaction.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 1fffbc0..14a597d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; int ret; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return ERR_PTR(-EROFS); again: h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS); if (!h) There are cases that we need to start transaction when MS_RDONLY flag is set. For example, remount FS into read-only mode and log replay. However, is it weird to make changes to disk as fs is in readonly state? IMO, btrfs needs to limit the use of these disk-change while readonly cases, as it is not what readonly means. reiserfs and ext3 at least have always done this. Log replay is required even when the FS is readonly. My concern is: now we have a forced readonly FS, which is already broken, if we still write something to disk, would it become more broken? Since it has been here, we can bypass readonly in those cases(as I did in the 5th patch): ... flags = sb-s_flags; if (sb-s_flags MS_RDONLY) sb-s_flags = ~MS_RDONLY I think we should have a dedicated flag to reflect a filesystem that is forced readonly, and check that flag instead. OK, we did have fs_state, a dedicated flag. thanks, Liu Bo -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] fs: fix deadlocks in writeback_if_idle
On Tue, Nov 30, 2010 at 11:01:18AM +1100, Nick Piggin wrote: On Mon, Nov 29, 2010 at 02:26:03PM -0800, Andrew Morton wrote: On Thu, 25 Nov 2010 14:53:56 +1100 Nick Piggin npig...@kernel.dk wrote: On Wed, Nov 24, 2010 at 02:10:28PM +0100, Jan Kara wrote: Well, for now, the easiest and simplest fix is my patch, I think. The objection is that we may not write out anything for the specified sb, but the current implementation provides no such guarantees at all anyway, so I don't think it's a big issue. Well yes. We take something which will fail occasionally and with your patch replace it with something which will fail a bit more often. Why don't we go all the way and do something which will fail *even more often*. Namely, just delete the damn function in the hope that the resulting failures will provoke the ext4 crew into doing something sane this time? I just need it fixed because the deadlocks are constantly hanging my tests and/or switching off lockdep. Guys, this delalloc thing *sucks*. And here we are just sticking new bandaids on top of the old bandaids. And the btrfs approach isn't exactly a thing of glory, either. So... nope. I won't be applying Nick's patch. Please fix this thing properly - you have a whole month! Testers have less. It would be better to fix it now and rip it out at the start of the next merge window if you're that way inclined :) Is this going to be fixed in time, or shall we merge my patch for 2.6.37? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: fix memory leak on error path in btrfs_get_acl()
On Wed, Dec 15, 2010 at 11:49:39PM +0100, Miguel Ojeda wrote: On Wed, Dec 15, 2010 at 10:57 PM, Mariusz Kozlowski m...@lab.zgora.pl wrote: If posix_acl_from_xattr() fails we leak memory stored in 'value'. Signed-off-by: Mariusz Kozlowski m...@lab.zgora.pl --- fs/btrfs/acl.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c index d16..11c9561 100644 --- a/fs/btrfs/acl.c +++ b/fs/btrfs/acl.c @@ -61,6 +61,7 @@ static struct posix_acl *btrfs_get_acl(struct inode *inode, int type) if (size 0) { acl = posix_acl_from_xattr(value, size); if (IS_ERR(acl)) + kfree(value); Be careful with the evil { } Dang. Too much python recently i guess. Will send v2 shortly. Thanks. return acl; set_cached_acl(inode, type, acl); } -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Mariusz Kozlowski -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs: fix memory leak on error path in btrfs_get_acl()
If posix_acl_from_xattr() fails we leak memory stored in 'value'. Signed-off-by: Mariusz Kozlowski m...@lab.zgora.pl --- fs/btrfs/acl.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c index d16..6d1410e 100644 --- a/fs/btrfs/acl.c +++ b/fs/btrfs/acl.c @@ -60,8 +60,10 @@ static struct posix_acl *btrfs_get_acl(struct inode *inode, int type) size = __btrfs_getxattr(inode, name, value, size); if (size 0) { acl = posix_acl_from_xattr(value, size); - if (IS_ERR(acl)) + if (IS_ERR(acl)) { + kfree(value); return acl; + } set_cached_acl(inode, type, acl); } kfree(value); -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
BTRFS is broken with discard request?
Hi Kyungmin, Do you get some updates about your BTRFS is broken with discard request issue? Do you know how to explicitly trigger discard operation over filesystem? Thanks. From: Kyungmin Park kmpark at infradead.org Subject: BTRFS is broken with discard request? Newsgroups: gmane.comp.file-systems.btrfs, gmane.linux.kernel.mmc Date: 2010-06-15 23:17:49 GMT (26 weeks, 1 day, 3 hours and 51 minutes ago) Deal all, Now I got test the btrfs with discard option on MMC (implemented discard request). but I can't test the iozone benchmark test program with I/O error. Of course I disabled the discard option. it works well. and discard handling is tested with VFAT well. So I think BTRFS discard request seems to broken. Test kernel is commit 7e27d6e778cd87b6f2415515d7127eba53fe5d02 Author: Linus Torvalds torvalds at linux-foundation.org Date: Fri Jun 11 19:14:04 2010 -0700 Linux 2.6.35-rc3 scenarios are below. Any clues? Thank you, Kyungmin Park # mkfs.btrfs /dev/mmcblk0p1 -L mmc WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using fs created label mmc on /dev/mmcblk0p1 nodesize 4096 leafsize 4096 sectorsize 4096 size 7.40GB Btrfs Btrfs v0.19 # mount -t btrfs -o ssd,discard /dev/mmcblk0p1 /mnt/mmc [39310.465000] device label mmc devid 1 transid 7 /dev/mmcblk0p1 [39310.47] btrfs: use ssd allocation scheme # iozone -A -s 10m -U /mnt/mmc -f /mnt/mmc/test -e Iozone: Performance Test of File I/O Version $Revision: 3.326 $ Compiled for 32 bit mode. Build: linux-arm Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins Al Slater, Scott Rhine, Mike Wisner, Ken Goss Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root. Run began: Thu Jan 1 10:55:35 1970 Auto Mode 2. This option is obsolete. Use -az -i0 -i1 File size set to 10240 KB Include fsync in write timing Command line used: iozone -A -s 10m -U /mnt/mmc -f /mnt/mmc/test -e Output is in Kbytes/sec Time Resolution = 0.01 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. KB reclen write rewritereadreread 10240 4 [39337.095000] device label mmc devid 1 transid 12 /dev/mmcblk0p1 [39337.10] btree_readpage_end_io_hook: 4 callbacks suppressed [39337.105000] btrfs bad tree block start 0 29421568 [39337.11] btrfs bad tree block start 0 29421568 [39337.115000] btrfs bad tree block start 0 29425664 [39337.12] btrfs bad tree block start 0 29425664 [39337.125000] btrfs bad tree block start 0 29372416 [39337.13] btrfs bad tree block start 0 29372416 [39337.135000] btrfs bad tree block start 0 29376512 [39337.14] btrfs bad tree block start 0 29376512 [39337.145000] Btrfs detected SSD devices, enabling SSD mode [39337.15] btrfs bad tree block start 0 29380608 [39337.155000] btrfs bad tree block start 0 29380608 [39338.88] device label mmc devid 1 transid 17 /dev/mmcblk0p1 [39338.895000] Btrfs detected SSD devices, enabling SSD mode 74948721 [39340.83] device label mmc devid 1 transid 21 /dev/mmcblk0p1 [39340.845000] Btrfs detected SSD devices, enabling SSD mode Error reading block 0 4020 read: Input/output error # -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html