Re: [PATCH] Btrfs-progs: make fsck deal with bogus items
On Fri, Oct 3, 2014 at 9:54 AM, Josef Bacik jba...@fb.com wrote:

We can deal with corrupt items by deleting them in a few cases. Fsck can easily recover from a missing extent item or a dir index item. So if we notice an item is completely bogus and it is of a key that we know we can repair then just delete it and carry on. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
cmds-check.c | 45 +++
tests/fsck-tests/005-bad-item-offset.img | Bin 0 -> 398336 bytes
2 files changed, 45 insertions(+)
create mode 100644 tests/fsck-tests/005-bad-item-offset.img

It looks like tests/fsck-tests/005-bad-item-offset.img was added unintentionally to this patch.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Questions on using BtrFS for fileserver
On Tue, Aug 19, 2014 at 11:21 AM, M G Berberich bt...@oss.m-berberich.de wrote:

Hello, we are thinking about using BtrFS on standard hardware for a fileserver with about 50T (100T raw) of storage (25×4TByte).

I would recommend carefully reading this thread titled: "1 week to rebuid 4x 3TB raid10 is a long time!"
http://comments.gmane.org/gmane.comp.file-systems.btrfs/36969

There are multiple methods for replacing a device in a Btrfs RAID array. If I understand the conclusions of this thread, you might still expect 12-14 hours to rebuild after replacing a 4 TByte device, assuming you use the optimal replace commands. With 25 devices, that leaves an uncomfortable period of time where another device might fail.
Re: 40TB volume taking over 16 hours to mount, any ideas?
On Sat, Aug 9, 2014 at 11:21 PM, Duncan 1i5t5.dun...@cox.net wrote:

But from rc5 on thru rc7 or 8 and release, unless you're one of the ones still waiting on a bug found earlier to be fixed, it's generally quite stable and boring. So by the time of actual .0 release, it really is quite stable, and no longer development kernel.

Sure, Greg KH's stable series kernel releases stabilize it further, but that's exactly what they are, stable series, not development series, and there's really no development going into it generally from rc1 on, tho occasionally something that needs to come after everything else is slipped in in the first couple days after rc1, but still well before rc2, and the .0 release signifies the end of the post development stabilization period such that .0 really is no longer a development kernel at all, even if there are a few more weekly stable-series updates (about 10, 3.15.10 was announced to be the last one for 3.15, with the Friday-released 3.15.9) before support ceases if it's not a long-term-stable candidate.

I can't say I've observed that to be the case with Btrfs. I know there is a core group of developers working very hard on testing the Btrfs updates in the _rc kernels, but once that .0 kernel hits the streets, the extra exposure to all the various combinations of hardware and options has been known to uncover new issues. I think this is nearly unavoidable given the pace of Btrfs development.
Re: ENOSPC with mkdir and rename
On Mon, Aug 4, 2014 at 9:47 AM, Russell Coker russ...@coker.com.au wrote:

If you regularly run a scrub with options such as -dusage=50 -musage=10 then the amount of free space in metadata chunks will tend to be a lot greater than that in data chunks.

Just to clarify for posterity, I'm pretty sure you meant 'balance' with -dusage=50 -musage=10 instead of 'scrub'.
Re: ENOSPC with mkdir and rename
On Sat, Aug 2, 2014 at 6:35 PM, Peter Waller pe...@scraperwiki.com wrote:

Hi All,

My TL;DR questions are at the bottom, before the stack trace.

I'm running Ubuntu 14.04. I wonder if this problem is related to the thread titled "Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04" which I started on the 29th of July:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224

Kernel: 3.15.7-031507-generic

I'm on a single block device system, i.e, no RAID.

I was observing ENOSPC from `mkdir` and `rename` on this system, with a good amount of free disk space (df -h reports 62 GB remain). I added enospc_debug (full umount/mount, not just mount -o remount), but this had no apparent effect when receiving ENOSPC from userland.

$ sudo btrfs fi df /path/to/volume
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB

After a thorough search of the internet for "ENOSPC BTRFS" I found various resources and came to understand a little bit more. One thing which broke my intuition severely is that I expected if there is a large number of free GiB, I should expect things to continue to work. In this case, for example, metadata has 0.5GiB free (sounds like plenty for metadata for one mkdir to me). Data has 62GiB free. Why would I get ENOSPC for a file rename?

I expected that if metadata needed more space, it would just eat it from the 'data'. Now I believe this not to be the case and that it wanted to allocate 0.5GiB, and this is why I was getting ENOSPC.

I tried a rebalance with btrfs balance start -dusage=10 and tried increasing the value until I saw reallocations in dmesg.

This spat out a large number of messages in dmesg, of this form:

[376096.546353] BTRFS info (device dm-0): relocating block group 530457821184 flags 1
[376010.736879] BTRFS info (device dm-0): 40 enospc errors during balance

(and a full stack trace at the end of this message).

The rebalance printed:

ERROR: error during balancing '/path/to/volume' - No space left on device
There may be more info in syslog - try dmesg | tail

Eventually, not knowing what else to do I had to take my escape hatch and enlarge the volume. When I did this, metadata grew by 1GiB:

Data, single: total=490.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.50GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=0.00

A few questions:

* Why didn't the metadata grow before enlarging the disk?
* Why didn't the rebalance enable the metadata to grow?
* Why is it necessary to rebalance? Can't it automatically take some free space from 'data'?
* Are my machine lockups related to the fact I was low on space?
* Can we improve the documentation/FAQ for this? I was scratching my head in particular because my notion of free space definitely does not match up with BTRFS', and I didn't find the FAQ very helpful for getting out of this mess.
* It isn't documented on the wiki what enospc_debug is supposed to do, so I couldn't tell whether I should have expected it to tell me anything in my circumstances.
* What is the best course of action to take (other than enlarging the disk or deleting files) if I encounter this situation again?

Looking at this line:

Data, single: total=489.97GiB, used=427.75GiB

I see that btrfs has allocated almost the entire disk to Data, and it appears you are starved for Metadata room.

Once btrfs allocates space for either Data or Metadata, there are currently no built-in kernel mechanisms to re-allocate that space. We have to use the userland balance tools.

I agree that this behavior can become a gotcha.

Btrfs has the capability to run in a mode where Data and Metadata are combined, but there is a speed penalty running in Mixed Data/Metadata mode.

The btrfs balance tools have the ability to use filters to run a quicker pass on just the mostly-empty blocks, skipping a full balance.

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

I would suggest this as the next step.
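
The numbers in this thread can be made concrete with a small C sketch. It is not part of btrfs-progs; the struct and function names are hypothetical, and the values are the `btrfs fi df` figures quoted in this message, expressed in hundredths of a GiB so the arithmetic stays in integers. It only illustrates why the 62 GiB reported as free did not help: that room sits inside chunks already allocated to Data, while Metadata chunks had only 0.5 GiB left.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one line of `btrfs fi df` output,
 * with both values in hundredths of a GiB (489.97GiB -> 48997). */
struct chunk_usage {
	uint64_t total;	/* space allocated to chunks of this type */
	uint64_t used;	/* space actually occupied inside those chunks */
};

/* Room left inside the chunks already allocated to one type.
 * Space counted here cannot be used by the other type until a
 * balance frees whole chunks again. */
uint64_t chunk_headroom(struct chunk_usage c)
{
	assert(c.used <= c.total);
	return c.total - c.used;
}
```

With Data at {48997, 42775} and Metadata at {500, 450}, the helper returns 6222 (62.22 GiB) and 50 (0.50 GiB) respectively, which matches the figures discussed in the message.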
Re: What to do about snapshot-aware defrag
On Sat, May 31, 2014 at 6:51 PM, Brendan Hide bren...@swiftspirit.co.za wrote:

On 2014/05/31 12:00 AM, Martin wrote:

OK... I'll jump in...

On 30/05/14 21:43, Josef Bacik wrote:

[snip]

Option 1: Only relink inodes that haven't changed since the snapshot was taken.

Pros:
-Faster
-Simpler
-Less duplicated code, uses existing functions for tricky operations so less likely to introduce weird bugs.

Cons:
-Could possibly lose some of the snapshot-awareness of the defrag. If you just touch a file we would not do the relinking and you'd end up with twice the space usage.

[...]

Obvious way to go for fast KISS.

I second this - KISS is better. Would in-band dedupe resolve the issue with losing the snapshot-awareness of the defrag? I figure that if someone absolutely wants everything deduped efficiently they'd put in the necessary resources (memory/dedicated SSD/etc) to have in-band dedupe work well.

One question: Will option one mean that we always need to mount with noatime or read-only to allow snapshot defragging to do anything?

When snapshot-aware defrag first came out, I was convinced it was a must-have capability for nearly everybody using btrfs. But, the more I look at my work load and common practices with btrfs, the more I am wondering just how often snapshot-aware defrag was actually doing something for me. I use a lot of snapshots. But for the most part, once I touch a file in my current subvolume, the whole file needs to be COW-ed from its previous version.

Now that we have a working sysfs, I wonder if we could implement some counters to track how often snapshot-aware defrag would have run. I might be surprised at how much it was doing.

---
Regards,
Mitch Harder
Re: raid0 vs single, and should we allow -mdup by default on SSDs?
On Wed, May 7, 2014 at 3:52 AM, Marc MERLIN m...@merlins.org wrote:

On Wed, May 07, 2014 at 09:29:41AM +0100, Hugo Mills wrote:

On Wed, May 07, 2014 at 01:18:40AM -0700, Marc MERLIN wrote:

On Tue, May 06, 2014 at 07:39:12PM +, Duncan wrote:

That appears to be a very good use of either -d raid0 or -d single, yes. And since you're apparently not streaming such high resolution video that you NEED the raid0, single does indeed give you a somewhat better chance at recovery.

zoneminder saves 'video' as a stream of independent small jpegs, so I'm good. Actually come to think of it they're so small that they probably all ended up in the raid1 metadata. That also means that I'm not getting twice the storage space like I planned to. Oh well...

There's a mount option to change the threshold at which files are inlined in metadata: maxinline=bytes. You could play with that for this particular use-case.

Oh cool, thank you.

Since each non-inlined file will occupy a minimum of 4k, you may find that inlining will still save space even if it is duplicated.

Even if they are duplicated in the metadata under RAID1, inlining a bunch of 256 byte files will still be more space efficient than storing them as regular files.

But if most of the files are in the 2k-3k range, it may be more efficient to store them as regular files.
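
The trade-off described in this message can be sketched as a little arithmetic in C. This is a deliberately simplified model, not btrfs code: it assumes a 4 KiB minimum for a non-inlined file (as stated in the message), assumes inlined data is stored once per metadata copy, and ignores item headers and the extent bookkeeping that a regular file also needs. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum allocation for a file stored as a regular extent. */
#define DATA_BLOCK_SIZE 4096u

/* Bytes consumed when the file data lives inline in metadata and the
 * metadata profile keeps `metadata_copies` copies (2 for RAID1 or DUP). */
uint64_t inline_cost(uint64_t file_size, unsigned metadata_copies)
{
	assert(metadata_copies >= 1);
	return file_size * metadata_copies;
}

/* Bytes consumed when the file is stored as a regular extent with a
 * single data copy: the size rounded up to whole 4 KiB blocks. */
uint64_t extent_cost(uint64_t file_size)
{
	if (file_size == 0)
		return 0;
	return (file_size + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE * DATA_BLOCK_SIZE;
}

/* Non-zero when inlining uses strictly less space in this model. */
int inline_is_cheaper(uint64_t file_size, unsigned metadata_copies)
{
	return inline_cost(file_size, metadata_copies) < extent_cost(file_size);
}
```

Under these assumptions a 256-byte file costs 512 bytes inline with two metadata copies versus 4096 bytes as a regular file, while a 3000-byte file costs 6000 bytes inline versus 4096 bytes as a regular file. The break-even point in this model is 2048 bytes, which is consistent with the 2k-3k remark above.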
Re: How to view transaction log chronologically, human-readable?
On Sat, Apr 19, 2014 at 2:45 PM, Marcel Partap mpar...@gmx.net wrote:

This is the BTRFS development list, right? Someone here should know how to achieve this I hope?
#Regards

On 01/03/14 02:21, Marcel Partap wrote:

Dear BTFRS devs, I have a 1TB btrfs volume mounted read-only since two years because I deleted a bunch of files and didn't want to give up on them. Now with latest btrfs-find-root and btrfs restore --dry-run -t in a loop, I generated the full list of files contained in the last several hundred root trees. However, diffing these, I find the current one being the same until 94 root trees back, and the ones before contain earlier changes. Maybe by my own fault that is..whatever. Is there a way to just view the transaction history in a human-readable way?
#Regards

I am not a dev, but since BTRFS utilizes a COW (Copy On Write) architecture, it doesn't keep a journal or history of transactions that can be unwound.

With respect to un-deleting files on BTRFS, the btrfs-find-root/'btrfs restore' combination are the most effective user-space tools I know of.

It sounds like you've effectively tried this manually, but here's a link to a btrfs undelete script that also makes use of btrfs-find-root and 'btrfs restore':
http://comments.gmane.org/gmane.comp.file-systems.btrfs/22560
Re: [PATCH 3.15-rc2] btrfs: replace error code from btrfs_drop_extents
On Tue, Apr 15, 2014 at 11:50 AM, David Sterba dste...@suse.cz wrote:

There's a case which clone does not handle and used to BUG_ON instead, (testcase xfstests/btrfs/035), now returns EINVAL. This error code is confusing to the ioctl caller, as it normally signifies errorneous arguments. Change it to ENOPNOTSUPP which allows a fall back to copy instead of clone. This does not affect the common reflink operation.

Minor spelling error in the commit message, you clearly mean EOPNOTSUPP, not ENOPNOTSUPP.
[PATCH] [RFC] btrfs-progs: Expand BUG_ON/WARN_ON Macros
I'm providing this patch as an example of how to expand the BUG_ON/WARN_ON macros to provide more information or extra capabilities.

Josef Bacik has been working with a user on IRC to recover data from a btrfs volume, and the 'work-in-progress' solution involved expanding the BUG_ON/WARN_ON macros in a different way that would lose the information on where the BUG_ON/WARN_ON occurred.

When the macro is structured like this patch, it will still provide the location of the BUG_ON/WARN_ON in the code.

This patch also highlights that BUG_ON and WARN_ON are the same thing in btrfs-progs. All WARN_ONs are treated the same as BUG_ONs, and the program is halted.

Should we convert all our btrfs-progs WARN_ONs to BUG_ONs to allow us to implement a true WARN_ON functionality?

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
kerncompat.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kerncompat.h b/kerncompat.h
index f370cd8..79661f5 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -233,9 +233,19 @@ static inline long IS_ERR(const void *ptr)
 #define kstrdup(x, y) strdup(x)
 #define kfree(x) free(x)
-#define BUG_ON(c) assert(!(c))
-#define WARN_ON(c) assert(!(c))
+#define BUG_ON(c) do { \
+	if (c) { \
+		fprintf(stderr, "BUG_ON!\n"); \
+		assert(!(c)); \
+	} \
+} while (0)
+#define WARN_ON(c) do { \
+	if (c) { \
+		fprintf(stderr, "WARN_ON!\n"); \
+		assert(!(c)); \
+	} \
+} while (0)
 #define container_of(ptr, type, member) ({ \
 	const typeof( ((type *)0)->member ) *__mptr = (ptr);\
--
1.8.3.2
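
As an illustration of the question raised in this message, here is one way a non-fatal WARN_ON could sit next to a fatal BUG_ON in userspace C. This is only a sketch under stated assumptions, not the btrfs-progs implementation: `warn_on_report` and `warn_count` are hypothetical names introduced for the example, and the message format is invented.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so callers can see how many warnings fired. */
int warn_count;

/* Hypothetical helper: report a failed check with its source location
 * and return the condition so the caller can still branch on it. */
int warn_on_report(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARN_ON(%s) at %s:%d\n", expr, file, line);
	}
	return cond;
}

/* Non-fatal: log the location and keep running. */
#define WARN_ON(c) warn_on_report(!!(c), #c, __FILE__, __LINE__)

/* Fatal: log the location, then halt through assert(0). Because the
 * macro expands at the call site, assert() reports the caller's file
 * and line. Note that building with NDEBUG would disable the halt. */
#define BUG_ON(c) do { \
	if (c) { \
		fprintf(stderr, "BUG_ON(%s) at %s:%d\n", #c, __FILE__, __LINE__); \
		assert(0); \
	} \
} while (0)
```

A caller could then write `if (WARN_ON(ret < 0)) goto out;` to log the problem and take an error path, while `BUG_ON(ret < 0);` keeps the existing halt-on-failure behaviour.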
Re: [PATCH] Btrfs-progs: fsck: fix wrong return value in check_block()
On Mon, Feb 24, 2014 at 7:38 PM, Wang Shilong wangsl.f...@cn.fujitsu.com wrote:

Hi Mitch,

On 02/25/2014 07:03 AM, Mitch Harder wrote:

On Mon, Feb 24, 2014 at 5:55 AM, Wang Shilong wangsl.f...@cn.fujitsu.com wrote:

We found btrfsck will output backrefs mismatch while the filesystem is defenitely ok. The problem is that check_block() don't return right value, which makes btrfsck won't walk all tree blocks thus we don't get a consistent filesystem, we will fail to check extent refs etc.

Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
cmds-check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index a2afae6..253569f 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2477,7 +2477,7 @@ static int check_block(struct btrfs_trans_handle *trans,
 	struct cache_extent *cache;
 	struct btrfs_key key;
 	enum btrfs_tree_block_status status;
-	int ret = 1;
+	int ret = 0;
 	int level;

 	cache = lookup_cache_extent(extent_cache, buf->start, buf->len);
--

I tried this fix on a broken btrfs volume I've been trying to repair, and it seemed to put me in an infinite loop. I agree that something seems wrong with the way the caller of check_block uses the return value, and I also noticed that it seemed to exit before walking all the tree blocks. But I think the problem is more subtle than flipping the default ret value from 1 to 0.

No, not really even though i know there are other problems with fsck repair mode. But this problem should be fixed and pushed into btrfs-progsv3.13.(Notice, the below problem did not exist in btrfs-progsv3.12)

An easy way to trigger this problem:

# mkfs.btrfs -f /dev/sda9
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=4k count=10240 oflag=direct
# btrfs sub snapshot /mnt /mnt/snap1
# btrfs sub snapshot /mnt /mnt/snap2
# umount /mnt
# btrfs check /dev/sda9

After applying this patch, the above problems did not exist.

Feel free to correct me if i miss something here.^_^

I took a closer look at the check_block function today, and it looks to me like the problem is that the return value is not modified when BTRFS_BLOCK_FLAG_FULL_BACKREF is set.

@@ -2521,14 +2521,17 @@ static int check_block(struct btrfs_trans_handle *trans,
 		}
 	} else {
 		rec->content_checked = 1;
-		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)
+		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
 			rec->owner_ref_checked = 1;
+			ret = 0;
+		} else {
 			ret = check_owner_ref(root, rec, buf);
 			if (!ret)
 				rec->owner_ref_checked = 1;
 		}

For me, in this function I would lean towards an initial return value that must be updated by having check_block() make an affirmative PASS/FAIL decision on the block.

What do you think about something like this?

diff --git a/cmds-check.c b/cmds-check.c
index ffc5d3e..55070da 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2477,7 +2477,7 @@ static int check_block(struct btrfs_trans_handle *trans,
 	struct cache_extent *cache;
 	struct btrfs_key key;
 	enum btrfs_tree_block_status status;
-	int ret = 1;
+	int ret = -EINVAL;
 	int level;

 	cache = lookup_cache_extent(extent_cache, buf->start, buf->len);
@@ -2521,14 +2521,17 @@ static int check_block(struct btrfs_trans_handle *trans,
 		}
 	} else {
 		rec->content_checked = 1;
-		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)
+		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
 			rec->owner_ref_checked = 1;
+			ret = 0;
+		} else {
 			ret = check_owner_ref(root, rec, buf);
 			if (!ret)
 				rec->owner_ref_checked = 1;
 		}
 	}
+	BUG_ON(ret == -EINVAL);
 	if (!ret)
 		maybe_free_extent_rec(extent_cache, rec);
 	return ret;
--
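
The "affirmative PASS/FAIL decision" idea proposed in this message can be shown in isolation with a short C sketch. This is a simplified stand-in and not the real check_block(): the function name and its two parameters are hypothetical, and it assumes the owner-ref check itself never returns -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for the tail of check_block().
 * full_backref: non-zero when the full-backref flag is set.
 * owner_ref_ret: result the owner-ref check would have produced
 *                (0 = pass, non-zero = fail; assumed never -EINVAL). */
int check_block_decision(int full_backref, int owner_ref_ret)
{
	int ret = -EINVAL;	/* sentinel: no decision has been made yet */

	if (full_backref)
		ret = 0;		/* explicit PASS for this branch */
	else
		ret = owner_ref_ret;	/* PASS/FAIL from the owner-ref check */

	/* A code path that forgets to set ret is caught here rather than
	 * being returned to the caller as an ambiguous value. */
	assert(ret != -EINVAL);
	return ret;
}
```

The point of the sentinel is that neither "pass" nor "fail" is the silent default: if a new branch is added later and does not assign ret, the check trips immediately instead of changing how the caller walks the tree.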
Re: [PATCH] Btrfs-progs: fsck: fix wrong return value in check_block()
On Mon, Feb 24, 2014 at 5:55 AM, Wang Shilong wangsl.f...@cn.fujitsu.com wrote:

We found btrfsck will output backrefs mismatch while the filesystem is defenitely ok. The problem is that check_block() don't return right value, which makes btrfsck won't walk all tree blocks thus we don't get a consistent filesystem, we will fail to check extent refs etc.

Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
cmds-check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index a2afae6..253569f 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2477,7 +2477,7 @@ static int check_block(struct btrfs_trans_handle *trans,
 	struct cache_extent *cache;
 	struct btrfs_key key;
 	enum btrfs_tree_block_status status;
-	int ret = 1;
+	int ret = 0;
 	int level;

 	cache = lookup_cache_extent(extent_cache, buf->start, buf->len);
--

I tried this fix on a broken btrfs volume I've been trying to repair, and it seemed to put me in an infinite loop.

I agree that something seems wrong with the way the caller of check_block uses the return value, and I also noticed that it seemed to exit before walking all the tree blocks.

But I think the problem is more subtle than flipping the default ret value from 1 to 0.
[PATCH] Btrfs: fix max_inline mount option
Currently, the only mount option for max_inline that has any effect is max_inline=0. Any other value that is supplied to max_inline will be adjusted to a minimum of 4k. Since max_inline has an effective maximum of ~3900 bytes due to page size limitations, the current behaviour only has meaning for max_inline=0.

This patch will allow the max_inline mount option to accept non-zero values as indicated in the documentation.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
fs/btrfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 97cc241..e73c80e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -566,7 +566,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				kfree(num);

 				if (info->max_inline) {
-					info->max_inline = max_t(u64,
+					info->max_inline = min_t(u64,
 						info->max_inline,
 						root->sectorsize);
 				}
--
1.8.3.2
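
The effect of swapping max_t for min_t can be checked with a standalone C sketch. The two functions below are hypothetical userspace stand-ins for the clamping step in btrfs_parse_options(), written with plain comparisons instead of the kernel macros, and assume a 4096-byte sector size in the usage shown afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Behaviour before the patch: a non-zero request is raised to at
 * least sectorsize, so small values such as 256 are lost. */
uint64_t max_inline_before(uint64_t requested, uint64_t sectorsize)
{
	assert(sectorsize > 0);
	if (requested)
		requested = requested > sectorsize ? requested : sectorsize;
	return requested;
}

/* Behaviour after the patch: a non-zero request is capped at
 * sectorsize, so small values are kept as given. */
uint64_t max_inline_after(uint64_t requested, uint64_t sectorsize)
{
	assert(sectorsize > 0);
	if (requested)
		requested = requested < sectorsize ? requested : sectorsize;
	return requested;
}
```

With a 4096-byte sector, a request of 256 becomes 4096 before the patch and stays 256 after it, a request of 8192 is capped to 4096 only after the patch, and 0 remains 0 in both versions.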
[PATCH] btrfs-progs: Remove superfluous BUG_ON check.
The function call that set the ret parameter evaluated in this BUG_ON was removed in a previous commit:

11be10f71e1af5256f221feb9e91300b3e28bbef
Btrfs-progs: make fsck fix certain file extent inconsistencies

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
cmds-check.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index eef7c6c..ffc5d3e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4037,7 +4037,6 @@ static int run_next_block(struct btrfs_trans_handle *trans,
 					parent, owner, key.objectid, key.offset -
 					btrfs_file_extent_offset(buf, fi), 1, 1,
 					btrfs_file_extent_disk_num_bytes(buf, fi));
-				BUG_ON(ret);
 			}
 		} else {
 			int level;
--
1.8.3.2
[PATCH] btrfs-progs: Change BUG() to use assert.
Change the definition of BUG() to use assert instead of abort to provide information about the location of the issue.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
kerncompat.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kerncompat.h b/kerncompat.h
index 1fc2b34..f370cd8 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -50,7 +50,7 @@
 #define ULONG_MAX (~0UL)
 #endif

-#define BUG() abort()
+#define BUG() assert(0)
 #ifdef __CHECKER__
 #define __force __attribute__((force))
 #define __bitwise__ __attribute__((bitwise))
--
1.8.3.2
[PATCH] btrfs-progs: Preserve process_one_leaf return value.
The return value in process_one_leaf could be over-written while looping over the items in the leaf. This patch will preserve a non-zero return value to the calling function if a non-zero return value is encountered in the loop.

The return value of one (1) is consistent with non-zero values that could be returned while processing the leaf.

The only caller of this function (walk_down_tree) would ignore the return value anyway. But this patch will correct the behaviour in case future changes intend to utilize the return value.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
cmds-check.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 2911af0..eef7c6c 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1219,6 +1219,7 @@ static int process_one_leaf(struct btrfs_root *root, struct extent_buffer *eb,
 	u32 nritems;
 	int i;
 	int ret = 0;
+	int error = 0;
 	struct cache_tree *inode_cache;
 	struct shared_node *active_node;

@@ -1268,8 +1269,10 @@ static int process_one_leaf(struct btrfs_root *root, struct extent_buffer *eb,
 		default:
 			break;
 		};
+		if (ret != 0)
+			error = 1;
 	}
-	return ret;
+	return error;
 }

 static void reada_walk_down(struct btrfs_root *root,
--
1.8.3.2
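
The over-write problem described in this patch can be reproduced in a few lines of standalone C. The sketch below is not btrfs-progs code: `process_item` is a hypothetical stand-in for the per-item handlers, treating negative inputs as failures, and the two loop functions contrast the behaviour before and after the change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item handler: non-zero return means failure. */
int process_item(int item)
{
	return item < 0 ? -1 : 0;
}

/* Before: ret is reassigned on every iteration, so an early failure
 * is forgotten if a later item succeeds. */
int process_all_last_ret(const int *items, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++)
		ret = process_item(items[i]);
	return ret;
}

/* After: any non-zero ret latches error to 1, which is what the
 * caller finally sees. */
int process_all_sticky(const int *items, size_t n)
{
	int ret;
	int error = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = process_item(items[i]);
		if (ret != 0)
			error = 1;
	}
	return error;
}
```

For the input {1, -5, 2} the first function returns 0 because the last item succeeds, while the second returns 1 because the failure on the middle item is preserved.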
[PATCH] btrfs-progs: Convert BUG() to BUG_ON(1)
Convert the instances of BUG() to BUG_ON(1) to provide information about the location of the abort.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
btrfs-debug-tree.c | 4 ++--
ctree.c | 20 ++++++++++----------
ctree.h | 2 +-
disk-io.c | 4 ++--
extent-tree.c | 6 +++---
extent_io.c | 2 +-
file-item.c | 4 ++--
print-tree.c | 8 ++++----
volumes.c | 4 ++--
9 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index f37de9d..0180265 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -68,10 +68,10 @@ static void print_extents(struct btrfs_root *root, struct extent_buffer *eb)
 				btrfs_node_ptr_generation(eb, i));
 		if (btrfs_is_leaf(next) &&
 		    btrfs_header_level(eb) != 1)
-			BUG();
+			BUG_ON(1);
 		if (btrfs_header_level(next) !=
 		    btrfs_header_level(eb) - 1)
-			BUG();
+			BUG_ON(1);
 		print_extents(root, next);
 		free_extent_buffer(next);
 	}
diff --git a/ctree.c b/ctree.c
index 9e5b30f..7aab3b1 100644
--- a/ctree.c
+++ b/ctree.c
@@ -822,7 +822,7 @@ static int balance_level(struct btrfs_trans_handle *trans,
 	check_block(root, path, level);
 	if (orig_ptr !=
 	    btrfs_node_blockptr(path->nodes[level], path->slots[level]))
-		BUG();
+		BUG_ON(1);
 enospc:
 	if (right)
 		free_extent_buffer(right);
@@ -1425,9 +1425,9 @@ static int insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root
 	lower = path->nodes[level];
 	nritems = btrfs_header_nritems(lower);
 	if (slot > nritems)
-		BUG();
+		BUG_ON(1);
 	if (nritems == BTRFS_NODEPTRS_PER_BLOCK(root))
-		BUG();
+		BUG_ON(1);
 	if (slot != nritems) {
 		memmove_extent_buffer(lower,
 			      btrfs_node_key_ptr_offset(slot + 1),
@@ -2213,7 +2213,7 @@ split:
 	ret = 0;
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	kfree(buf);
 	return ret;
@@ -2311,7 +2311,7 @@ int btrfs_truncate_item(struct btrfs_trans_handle *trans,
 	ret = 0;
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	return ret;
 }
@@ -2337,7 +2337,7 @@ int btrfs_extend_item(struct btrfs_trans_handle *trans,
 	if (btrfs_leaf_free_space(root, leaf) < data_size) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	slot = path->slots[0];
 	old_data = btrfs_item_end_nr(leaf, slot);
@@ -2374,7 +2374,7 @@ int btrfs_extend_item(struct btrfs_trans_handle *trans,
 	ret = 0;
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	return ret;
 }
@@ -2406,7 +2406,7 @@ int btrfs_insert_empty_items(struct btrfs_trans_handle *trans,
 	/* create a root if there isn't one */
 	if (!root->node)
-		BUG();
+		BUG_ON(1);
 	total_size = total_data + nr * sizeof(struct btrfs_item);
 	ret = btrfs_search_slot(trans, root, cpu_key, path, total_size, 1);
@@ -2425,7 +2425,7 @@ int btrfs_insert_empty_items(struct btrfs_trans_handle *trans,
 		btrfs_print_leaf(root, leaf);
 		printk("not enough freespace need %u have %d\n",
 		       total_size, btrfs_leaf_free_space(root, leaf));
-		BUG();
+		BUG_ON(1);
 	}
 	slot = path->slots[0];
@@ -2484,7 +2484,7 @@ int btrfs_insert_empty_items(struct btrfs_trans_handle *trans,
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 out:
diff --git a/ctree.h b/ctree.h
index a9c67b2..101389b 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1519,7 +1519,7 @@ static inline u32 btrfs_extent_inline_ref_size(int type)
 	if (type == BTRFS_EXTENT_DATA_REF_KEY)
 		return sizeof(struct btrfs_extent_data_ref) +
 		       offsetof(struct btrfs_extent_inline_ref, offset);
-	BUG();
+	BUG_ON(1);
 	return 0;
 }
diff --git a/disk-io.c b/disk-io.c
index e840177..2a6c68f 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -349,10 +349,10 @@ static int write_tree_block(struct btrfs_trans_handle *trans,
 		     struct extent_buffer *eb)
 {
 	if (check_tree_block(root, eb))
-		BUG();
+		BUG_ON(1
Re: [PATCH] btrfs-progs: Convert BUG() to BUG_ON(1)
On Thu, Feb 6, 2014 at 3:22 PM, David Sterba dste...@suse.cz wrote:

On Thu, Feb 06, 2014 at 12:34:08PM -0600, Mitch Harder wrote:

Convert the instances of BUG() to BUG_ON(1) to provide information about the location of the abort.

kerncompat.h:
#define BUG() abort()
#define BUG_ON(c) assert(!(c))

I'd rather fix the definition to do the same thing, that way no developer would need to know the difference (that actually exists only in the userspace tools, in kernel the two produce the same output).

david

Thanks for the feedback.

Changing the definition of BUG() in kerncompat.h will be much more concise.

I'll restructure the patch and resubmit it after I test it.
btrfs-endio-wri: page allocation failure
I received a btrfs page allocation failure on my 3.12.7 kernel which is merged with Chris' for-linus branch for the 3.13_rc kernel. I have several btrfs partitions mounted, but I believe this error is on my btrfs root partition. Several things were going on at the same time on this partition. I have a snapshot script creating and deleting snapshots of the root partition. I was also compiling an application, and running Firefox. I know the snapshots may be a problem area. The snapshot script is currently running with about 550 snapshots of the root partition. It adds snapshots every 180 seconds, and removes the oldest snapshots based on available disk space. So far, I haven't encountered a crash. Since this is my root partition, I'll have to reboot to check for corruption. [111575.089533] btrfs-endio-wri: page allocation failure: order:4, mode:0x104050 [111575.089543] CPU: 1 PID: 14414 Comm: btrfs-endio-wri Tainted: G C 3.12.7-git-local #1 [111575.089546] Hardware name: Dell Inc. OptiPlex 745 /0WF810, BIOS 2.6.4 03/01/2010 [111575.089550] 00104050 88007484f6f8 81642878 88007f30eaf8 [111575.089556] 0001 88007484f788 810d27cd 81ca5d28 [111575.089561] 0010 8800fff0 810d4c86 8840 [111575.089566] Call Trace: [111575.089578] [81642878] dump_stack+0x46/0x58 [111575.089584] [810d27cd] warn_alloc_failed+0x115/0x129 [111575.089589] [810d4c86] ? drain_local_pages+0x16/0x18 [111575.089594] [810d5145] __alloc_pages_nodemask+0x47a/0x84d [111575.089620] [a018dd01] ? balance_level+0x666/0x6e8 [btrfs] [111575.089626] [810d552f] __get_free_pages+0x17/0x44 [111575.089631] [810e7e81] kmalloc_order_trace+0x2e/0x90 [111575.089637] [8110b1fc] __kmalloc_track_caller+0x3f/0x12c [111575.089653] [a01f8e5c] ? 
ulist_add_merge+0xe6/0x153 [btrfs] [111575.089659] [810e401e] krealloc+0x57/0x91 [111575.089674] [a01f8e5c] ulist_add_merge+0xe6/0x153 [btrfs] [111575.089689] [a01f7b8b] find_parent_nodes+0x494/0x57e [btrfs] [111575.089705] [a01f7d12] btrfs_find_all_roots+0x81/0xdc [btrfs] [111575.089721] [a01f8589] iterate_extent_inodes+0x12f/0x2c4 [btrfs] [111575.089737] [a01aec83] ? record_extent_backrefs+0xa7/0xa7 [btrfs] [111575.089754] [a01aec83] ? record_extent_backrefs+0xa7/0xa7 [btrfs] [111575.089770] [a01f87a2] iterate_inodes_from_logical+0x84/0x9a [btrfs] [111575.089787] [a01aec3c] record_extent_backrefs+0x60/0xa7 [btrfs] [111575.089804] [a01b7515] btrfs_finish_ordered_io+0x780/0x87d [btrfs] [111575.089809] [810d09cf] ? mempool_free_slab+0x17/0x19 [111575.089826] [a01b7627] finish_ordered_fn+0x15/0x17 [btrfs] [111575.089843] [a01d3153] worker_loop+0x13d/0x4a2 [btrfs] [111575.089860] [a01d3016] ? btrfs_queue_worker+0x267/0x267 [btrfs] [111575.089865] [81053779] kthread+0xba/0xc2 [111575.089870] [810536bf] ? kthread_freezable_should_stop+0x4d/0x4d [111575.089875] [81649dac] ret_from_fork+0x7c/0xb0 [111575.089879] [810536bf] ? 
kthread_freezable_should_stop+0x4d/0x4d [111575.089882] Mem-Info: [111575.089884] DMA per-cpu: [111575.089887] CPU0: hi:0, btch: 1 usd: 0 [111575.089890] CPU1: hi:0, btch: 1 usd: 0 [111575.089892] DMA32 per-cpu: [111575.089895] CPU0: hi: 186, btch: 31 usd: 27 [111575.089897] CPU1: hi: 186, btch: 31 usd: 0 [111575.089904] active_anon:169762 inactive_anon:52853 isolated_anon:0 active_file:115654 inactive_file:114252 isolated_file:0 unevictable:0 dirty:1795 writeback:0 unstable:0 free:22811 slab_reclaimable:19321 slab_unreclaimable:4379 mapped:15644 shmem:10186 pagetables:1982 bounce:0 free_cma:0 [111575.089918] DMA free:8264kB min:352kB low:440kB high:528kB active_anon:1496kB inactive_anon:1556kB active_file:1680kB inactive_file:1692kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15968kB mlocked:0kB dirty:0kB writeback:0kB mapped:640kB shmem:300kB slab_reclaimable:420kB slab_unreclaimable:128kB kernel_stack:24kB pagetables:88kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [111575.089920] lowmem_reserve[]: 0 1971 1971 1971 [111575.089932] DMA32 free:82980kB min:44700kB low:55872kB high:67048kB active_anon:677552kB inactive_anon:209856kB active_file:460936kB inactive_file:455316kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2070524kB managed:2022852kB mlocked:0kB dirty:7180kB writeback:0kB mapped:61936kB shmem:40444kB slab_reclaimable:76864kB slab_unreclaimable:17388kB kernel_stack:2032kB pagetables:7840kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [111575.089935] lowmem_reserve[]: 0 0 0 0 [111575.089940] DMA: 32*4kB (UEM) 32*8kB (UE) 11*16kB (U)
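A note on reading the first line of the trace above: "order:4" means the allocator was asked for 2^4 = 16 physically contiguous pages. The sketch below only does that arithmetic; the 4 KiB page size is an assumption (it is the usual base page size on x86_64, but the log itself does not state it), and alloc_order_bytes() is a hypothetical helper name.

```c
#include <stddef.h>

/*
 * Bytes requested by a page allocation of the given order:
 * 2^order physically contiguous pages.
 */
static size_t alloc_order_bytes(unsigned int order, size_t page_size)
{
	return page_size << order;
}
```

With 4 KiB pages, alloc_order_bytes(4, 4096) is 64 KiB. A contiguous request of that size can fail under fragmentation even while the zone still reports free memory, which would be consistent with the Mem-Info dump showing free pages in DMA32.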
Re: btrfs-endio-wri: page allocation failure
On Thu, Jan 16, 2014 at 8:03 PM, Mitch Harder mitch.har...@sabayonlinux.org wrote: I received a btrfs page allocation failure on my 3.12.7 kernel which is merged with Chris' for-linus branch for the 3.13_rc kernel. I have several btrfs partitions mounted, but I believe this error is on my btrfs root partition. Several things were going on at the same time on this partition. I have a snapshot script creating and deleting snapshots of the root partition. I was also compiling an application, and running Firefox. I know the snapshots may be a problem area. The snapshot script is currently running with about 550 snapshots of the root partition. It adds snapshots every 180 seconds, and removes the oldest snapshots based on available disk space. So far, I haven't encountered a crash. Since this is my root partition, I'll have to reboot to check for corruption. The partition still mounts, and so far I can access everything I spot-check, but btrfsck is reporting the following errors: Checking filesystem on /dev/sda3 UUID: 1050ccb5-58ae-4479-9e12-2230a7b0097a checking extents checking free space cache checking fs roots checking csums There are no extents for csum range 2267451392-2267521024 Csum exists for 2267451392-2267521024 but there is no extent record There are no extents for csum range 10636697600-10636836864 Csum exists for 10636697600-10636836864 but there is no extent record found 5015120900 bytes used err is 2 total csum bytes: 10233048 total tree bytes: 2166587392 total fs tree bytes: 2043346944 total extent tree bytes: 108380160 btree space waste bytes: 483349153 file data blocks allocated: 93115641856 referenced 99033673728 Btrfs v3.12 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfsck failes
On Mon, Jan 13, 2014 at 6:37 PM, Chris Murphy li...@colorremedies.com wrote: On Jan 13, 2014, at 3:58 PM, Holger Brandsmeier brandsme...@gmail.com wrote: Currently btrfsck fails to repair my partition, I get the output: [root@ho-think bholger]# btrfsck --repair /dev/sda5 This is almost the last resort and you probably should be posting to the list before using repair. This is like saying: Yes, btrfs does now have a working btrfsck, but only for the select few who manage to get through on the mailing list for support. I'd like to think that's not the case.
Re: Kernel BUG on Snapshot Deletion (3.11.0-rc5)
On Fri, Aug 23, 2013 at 3:48 AM, Stefan Behrens sbehr...@giantdisaster.de wrote: On Wed, 21 Aug 2013 08:44:55 -0500, Mitch Harder wrote: I've had a hard time assembling a portable reproducer for this issue. I discovered that my reproducer was highly dependent on a local archive of out-of-date git kernel sources. My efforts to reproduce the error with a portable set of scripts with publicly available kernel git sources weren't successful. It seems like this issue is related to a corner-case workload that is difficult to reproduce. So I've bisected the error I was seeing with my local script, and identified the following commit as triggering my issue: commit:3c64a1aba7cfcb04f79e76f859b3d0275d59 Btrfs: cleanup: don't check the same thing twice https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=for-linus&id=3c64a1aba7cfcb04 I tested a kernel which reverted this change, and also added WARN_ON lines to provide a back trace. [...] diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cd46e2c..a1091f7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2302,6 +2302,12 @@ static noinline int relink_extent_backref(struct btrfs_path *path, return 0; return PTR_ERR(root); } +if (btrfs_root_refs(&root->root_item) == 0) { +srcu_read_unlock(&fs_info->subvol_srcu, index); +/* parse ENOENT to 0 */ +WARN_ON(1); +return 0; +} [...] [ 1616.886868] ------------[ cut here ]------------ [ 1616.886912] WARNING: at fs/btrfs/inode.c:2308 relink_extent_backref+0x103/0x721 [btrfs]() [ 1616.887050] Call Trace: [ 1616.887064] [8161a34a] dump_stack+0x19/0x1b [ 1616.887071] [8103035a] warn_slowpath_common+0x67/0x80 [ 1616.887077] [8103038d] warn_slowpath_null+0x1a/0x1c [ 1616.887100] [a019ea82] relink_extent_backref+0x103/0x721 [ 1616.887205] [a019f7e2] btrfs_finish_ordered_io+0x742/0x829 Mitch, Thank you for this excellent work to find the cause of the issue. 
I've sent a patch "Btrfs: fix for patch cleanup: don't check the same thing twice" and would appreciate it if you could repeat your test, just to make sure, because I was never able to reproduce this issue myself. Thanks. I've tested my special workload with your patch on the latest 3.11_rc6 kernel, and the patch corrects the errors I was encountering.
Re: [PATCH 1/2] Btrfs: fix for patch cleanup: don't check the same thing twice
On Fri, Aug 23, 2013 at 4:03 AM, Miao Xie mi...@cn.fujitsu.com wrote: On Fri, 23 Aug 2013 10:34:42 +0200, Stefan Behrens wrote: Mitch Harder noticed that the patch 3c64a1a mentioned in the subject line was causing a kernel BUG() on snapshot deletion. The patch was wrong. It did not handle cached roots correctly. The check for root_refs == 0 was removed everywhere where btrfs_read_fs_root_no_name() had been used to retrieve the root, because this check was already dealt with in btrfs_read_fs_root_no_name(). But in the case when the root was found in the cache, there was no such check. This patch adds the missing check in the case where the root is found in the cache. Reported-by: Mitch Harder mitch.har...@sabayonlinux.org Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de --- fs/btrfs/disk-io.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 43ec3c6..7078554 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1583,8 +1583,11 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info, ERR_PTR(-ENOENT); again: root = btrfs_lookup_fs_root(fs_info, location->objectid); - if (root) + if (root) { + if (btrfs_root_refs(&root->root_item) == 0) + return ERR_PTR(-ENOENT); return root; + } It seems good to me. Reviewed-by: Miao Xie mi...@cn.fujitsu.com root = btrfs_read_fs_root(fs_info->tree_root, location); if (IS_ERR(root)) Tested-by: Mitch Harder mitch.har...@sabayonlinux.org
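The shape of the bug described in this patch can be sketched outside the kernel: a lookup has a cached path and a slow path, and a validity check that callers rely on must be applied on both. The code below is a toy model under stated assumptions; struct toy_root, toy_cache and toy_get_root() are hypothetical names and do not correspond to the real btrfs structures, and the slow path (reading the root from disk) is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a subvolume root; only the fields the sketch needs. */
struct toy_root {
	unsigned long objectid;
	unsigned int refs;	/* 0 means the root is being deleted */
};

#define TOY_CACHE_SLOTS 4
static struct toy_root *toy_cache[TOY_CACHE_SLOTS];

/* Linear scan of the toy cache; returns NULL when nothing matches. */
static struct toy_root *toy_cache_lookup(unsigned long objectid)
{
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
		if (toy_cache[i] && toy_cache[i]->objectid == objectid)
			return toy_cache[i];
	return NULL;
}

/* Returns 0 and stores the root in *out, or a negative errno. */
static int toy_get_root(unsigned long objectid, struct toy_root **out)
{
	struct toy_root *root = toy_cache_lookup(objectid);

	if (root) {
		/* The kind of check the fix adds on the cached path. */
		if (root->refs == 0)
			return -ENOENT;
		*out = root;
		return 0;
	}
	/* Slow path (reading the root from disk) is omitted in this toy. */
	return -ENOENT;
}
```

Without the refs test inside the cached branch, a caller that had dropped its own check would receive a root that is already being deleted, which matches the failure mode reported here on snapshot deletion.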
Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)
On Thu, Aug 22, 2013 at 1:47 AM, Nicholas Lee em...@nickle.es wrote: [ 45.914275] ------------[ cut here ]------------ [ 45.914406] kernel BUG at fs/btrfs/volumes.c:4417! [ 45.914489] invalid opcode: [#1] PREEMPT SMP I can't say if this will fix your problem or not, but the 3.10.x kernel has a patch to pass this error back instead of halting with a BUG() at this point.
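To illustrate what "pass this error back instead of halting" means in general terms, here is a small sketch of a logical-to-physical lookup that returns a negative errno when no mapping covers the address. Everything in it is hypothetical (struct toy_chunk, toy_map_block(), and the choice of -EIO); it is not the 3.10 patch, and the thread does not say which error code that patch returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy mapping entry: a logical range and where it starts physically. */
struct toy_chunk {
	uint64_t start;
	uint64_t len;
	uint64_t physical;
};

/*
 * Returns 0 and fills *physical, or -EIO when no chunk covers the
 * logical address. The caller decides how to recover; nothing aborts.
 */
static int toy_map_block(const struct toy_chunk *chunks, size_t n,
			 uint64_t logical, uint64_t *physical)
{
	for (size_t i = 0; i < n; i++) {
		if (logical >= chunks[i].start &&
		    logical - chunks[i].start < chunks[i].len) {
			*physical = chunks[i].physical +
				    (logical - chunks[i].start);
			return 0;
		}
	}
	return -EIO;
}
```

With this pattern a bogus logical address surfaces as a failed mount or a failed read rather than a halted kernel, although the underlying filesystem damage still has to be dealt with separately.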
Re: Kernel BUG on Snapshot Deletion (3.11.0-rc5)
I'm running into a curious problem. In the process of making my script portable, I am breaking the ability to replicate the error. I'm trying to isolate the aspect of my local script that is triggering the error. No firm insights yet. On Tue, Aug 13, 2013 at 11:03 AM, Mitch Harder mitch.har...@sabayonlinux.org wrote: Let me work on making that script more portable, and hopefully quicker to reproduce. On Tue, Aug 13, 2013 at 9:15 AM, Josef Bacik jba...@fusionio.com wrote: On Mon, Aug 12, 2013 at 11:06:27PM -0500, Mitch Harder wrote: I'm hitting a btrfs Kernel BUG running a snapshot stress script with linux-3.11.0-rc5. I can haz script? Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel BUG on Snapshot Deletion (3.11.0-rc5)
Let me work on making that script more portable, and hopefully quicker to reproduce. On Tue, Aug 13, 2013 at 9:15 AM, Josef Bacik jba...@fusionio.com wrote: On Mon, Aug 12, 2013 at 11:06:27PM -0500, Mitch Harder wrote: I'm hitting a btrfs Kernel BUG running a snapshot stress script with linux-3.11.0-rc5. I can haz script? Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Kernel BUG on Snapshot Deletion (3.11.0-rc5)
I'm hitting a btrfs Kernel BUG running a snapshot stress script with linux-3.11.0-rc5. I'm running with lzo compression, autodefrag, and the partition is formated with 16k leafsize/inodesize. [ 72.170431] device fsid 8a6be667-d041-4367-80f7-e4cb42356e85 devid 1 transid 4 /dev/sda7 [ 72.297512] device fsid 8a6be667-d041-4367-80f7-e4cb42356e85 devid 1 transid 4 /dev/sda7 [ 72.298928] device fsid 8a6be667-d041-4367-80f7-e4cb42356e85 devid 1 transid 4 /dev/sda7 [ 72.299390] btrfs: setting 8 feature flag [ 72.299395] btrfs: force lzo compression [ 72.299401] btrfs: enabling auto defrag [ 72.299404] btrfs: disk space caching is enabled [ 72.299407] btrfs flagging fs with big metadata feature [ 2234.790218] [ cut here ] [ 2234.790257] WARNING: CPU: 0 PID: 4246 at fs/btrfs/extent-tree.c:840 btrfs_lookup_extent_info+0x328/0x36e [btrfs]() [ 2234.790262] Modules linked in: ipv6 tg3 serio_raw ppdev snd_hda_codec_analog iTCO_wdt iTCO_vendor_support snd_hda_intel floppy snd_hda_codec sr_mod snd_hwdep pcspkr snd_pcm lpc_ich i2c_i801 parport_pc parport ptp snd_page_alloc pps_core snd_timer snd xts ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_pci ehci_hcd [ 2234.790333] CPU: 0 PID: 4246 Comm: btrfs-cleaner Not tainted 3.11.0-rc5 #1 [ 2234.790337] Hardware name: Dell Inc. OptiPlex 745 /0WF810, BIOS 2.6.4 03/01/2010 [ 2234.790341] 0348 880077739b68 81625def 0006 [ 2234.790349] 880077739ba8 810374f0 88000556e800 [ 2234.790356] a0185d5c 88007721de10 88000556e800 [ 2234.790363] Call Trace: [ 2234.790375] [81625def] dump_stack+0x46/0x58 [ 2234.790384] [810374f0] warn_slowpath_common+0x81/0x9b [ 2234.790403] [a0185d5c] ? 
btrfs_lookup_extent_info+0x328/0x36e [btrfs] [ 2234.790411] [81037524] warn_slowpath_null+0x1a/0x1c [ 2234.790429] [a0185d5c] btrfs_lookup_extent_info+0x328/0x36e [btrfs] [ 2234.790449] [a018837e] do_walk_down+0x142/0x438 [btrfs] [ 2234.790467] [a01860d4] ? btrfs_delayed_refs_qgroup_accounting+0xbd/0xcc [btrfs] [ 2234.790487] [a018871a] walk_down_tree+0xa6/0xd4 [btrfs] [ 2234.790507] [a018aec3] btrfs_drop_snapshot+0x32d/0x65d [btrfs] [ 2234.790531] [a019b1df] btrfs_clean_one_deleted_snapshot+0xda/0x103 [btrfs] [ 2234.790552] [a0193c0c] cleaner_kthread+0x130/0x157 [btrfs] [ 2234.790573] [a0193adc] ? transaction_kthread+0x1a0/0x1a0 [btrfs] [ 2234.790580] [810522bc] kthread+0xba/0xc2 [ 2234.790586] [81052202] ? kthread_freezable_should_stop+0x52/0x52 [ 2234.790593] [8162d89c] ret_from_fork+0x7c/0xb0 [ 2234.790599] [81052202] ? kthread_freezable_should_stop+0x52/0x52 [ 2234.790604] ---[ end trace 21a428587abe0e9d ]--- [ 2234.790610] BTRFS error (device sda7): Missing references. [ 2234.790637] [ cut here ] [ 2234.790688] kernel BUG at fs/btrfs/extent-tree.c:7191! [ 2234.790736] invalid opcode: [#1] SMP [ 2234.790779] Modules linked in: ipv6 tg3 serio_raw ppdev snd_hda_codec_analog iTCO_wdt iTCO_vendor_support snd_hda_intel floppy snd_hda_codec sr_mod snd_hwdep pcspkr snd_pcm lpc_ich i2c_i801 parport_pc parport ptp snd_page_alloc pps_core snd_timer snd xts ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_pci ehci_hcd [ 2234.791005] CPU: 0 PID: 4246 Comm: btrfs-cleaner Tainted: G W3.11.0-rc5 #1 [ 2234.791005] Hardware name: Dell Inc. 
OptiPlex 745 /0WF810, BIOS 2.6.4 03/01/2010 [ 2234.791005] task: 88007c97c380 ti: 880077738000 task.ti: 880077738000 [ 2234.791005] RIP: 0010:[a01883be] [a01883be] do_walk_down+0x182/0x438 [btrfs] [ 2234.791005] RSP: :880077739c58 EFLAGS: 00010296 [ 2234.791005] RAX: 002e RBX: 88000c6706c0 RCX: 0046 [ 2234.791005] RDX: 0006 RSI: 0046 RDI: 88007f20d210 [ 2234.791005] RBP: 880077739d18 R08: 0002 R09: fffe [ 2234.791005] R10: 0001 R11: 81e2ee38 R12: 88002a930500 [ 2234.791005] R13: 88007721 R14: 88000556e800 R15: 0002 [ 2234.791005] FS: () GS:88007f20() knlGS: [ 2234.791005] CS: 0010 DS: ES: CR0: 8005003b [ 2234.791005] CR2: 7f312ced67bd CR3: 255e CR4: 07f0 [ 2234.791005] Stack: [ 2234.791005] 88000c670708 a01860d4 8800771df3c0 [ 2234.791005] 880077739c98 0001 0001
Re: lz4 status?
There's been a parallel effort to incorporate a general set of lz4 patches in the kernel. I see these patches are currently queued up in the linux-next tree, so we may see them in the 3.11 kernel. It looks like lz4 and lz4hc will be provided. So, instead of btrfs having its own implementation of lz4, the patches will be re-worked around the kernel's new lz4 library. On Wed, Jun 26, 2013 at 10:57 AM, Roger Pack rogerpack2...@gmail.com wrote: Any update on the unmerged lz4 patches? Have they been merged? Just wondering (and +1'ing my support, obviously). Thank you. -roger-
btrfs-cleaner Blocked on xfstests 068
I'm running into a problem with the btrfs-cleaner thread becoming blocked on xfstests 068. The test locks up indefinitely without completing (normally it finishes in about 45 seconds on my test box). I've replicated the issue on 3.10.0_rc5 and the for-linus branch of 3.9.0. I ran a git bisect on the 3.9.0 for-linus branch, and tracked my issue to the following commit: commit 9d1a2a3ad59f7ae810bf04a5a05995bf2d79300c btrfs: clean snapshots one by one The 068 test uses the scratch drive, so I believe xfstests is using the defaults for formatting the device, which is a physical partition on my SATA drive. My mount settings for xfstests are: export MOUNT_OPTIONS="-o compress-force=lzo,autodefrag" There are no errors shown in dmesg. Here is the result of Alt-SysRq-W to show the blocked states: [ 413.408168] SysRq : Show Blocked State [ 413.409157] task PC stack pid father [ 413.409157] btrfs-cleaner D 88007827c308 0 4516 2 0x [ 413.409157] 8800785d5d18 0046 8800785d5c58 8800785d5fd8 [ 413.409157] 4000 00012c80 88007ca7a210 88007cbd4af0 [ 413.409157] 88007cbd4b60 88007cbd4b38 88007f312cf0 0001 [ 413.409157] Call Trace: [ 413.409157] [8105e30f] ? dequeue_entity+0x34e/0x370 [ 413.409157] [81118593] ? find_inode+0x93/0xbe [ 413.409157] [8161e3e4] schedule+0x64/0x66 [ 413.409157] [8110525c] __sb_start_write+0x9a/0xf0 [ 413.409157] [8104e2d7] ? remove_wait_queue+0x3a/0x3a [ 413.409157] [a019018e] btrfs_run_defrag_inodes+0x20a/0x327 [btrfs] [ 413.409157] [a017a6c1] cleaner_kthread+0x95/0x122 [btrfs] [ 413.409157] [a017a62c] ? transaction_kthread+0x1a0/0x1a0 [btrfs] [ 413.409157] [8104da7c] kthread+0xba/0xc2 [ 413.409157] [8104d9c2] ? kthread_freezable_should_stop+0x52/0x52 [ 413.409157] [8161fd1c] ret_from_fork+0x7c/0xb0 [ 413.409157] [8104d9c2] ? 
kthread_freezable_should_stop+0x52/0x52 [ 413.409157] fsstressD 88007827c308 0 4730 4729 0x [ 413.409157] 88007717fd58 0082 88007717fe18 88007717ffd8 [ 413.409157] 4000 00012c80 81c11410 88007caf2fb0 [ 413.409157] 88007717fca8 8111bb58 88007717fe58 [ 413.409157] Call Trace: [ 413.409157] [8111bb58] ? mntput_no_expire+0x40/0x11b [ 413.409157] [8110bca6] ? complete_walk+0x92/0xda [ 413.409157] [8161e3e4] schedule+0x64/0x66 [ 413.409157] [8110525c] __sb_start_write+0x9a/0xf0 [ 413.409157] [8104e2d7] ? remove_wait_queue+0x3a/0x3a [ 413.409157] [8111bf5d] mnt_want_write+0x24/0x4b [ 413.409157] [8110e5dd] kern_path_create+0x6d/0x13f [ 413.409157] [810f7fdc] ? kmem_cache_alloc+0x31/0xf8 [ 413.409157] [8110cb01] ? getname_flags+0x74/0x158 [ 413.409157] [8110e6ee] user_path_create+0x3f/0x57 [ 413.409157] [81110a38] SyS_symlinkat+0x4a/0xc0 [ 413.409157] [81001f1b] ? do_notify_resume+0x5a/0x61 [ 413.409157] [81110ac4] SyS_symlink+0x16/0x18 [ 413.409157] [8161fdc6] system_call_fastpath+0x1a/0x1f [ 413.409157] fsstressD 88007827c308 0 4731 4729 0x [ 413.409157] 880077181e68 0082 8800785c2000 880077181fd8 [ 413.409157] 4000 00012c80 88007caf1470 88007caf0da0 [ 413.409157] 0001 0001 880077181da8 8110cc21 [ 413.409157] Call Trace: [ 413.409157] [8110cc21] ? putname+0x28/0x31 [ 413.409157] [81110553] ? user_path_at_empty+0x61/0x92 [ 413.409157] [811075b4] ? inode_get_bytes+0x1a/0x3a [ 413.409157] [811075b4] ? inode_get_bytes+0x1a/0x3a [ 413.409157] [8161e3e4] schedule+0x64/0x66 [ 413.409157] [8110525c] __sb_start_write+0x9a/0xf0 [ 413.409157] [8104e2d7] ? 
remove_wait_queue+0x3a/0x3a [ 413.409157] [8110386f] vfs_write+0xc2/0x18f [ 413.409157] [81103c5b] SyS_write+0x50/0x78 [ 413.409157] [8161fdc6] system_call_fastpath+0x1a/0x1f [ 413.409157] xfs_io D 0001 0 4750 4746 0x [ 413.409157] 88007726fd68 0086 88007c5a00b0 88007726ffd8 [ 413.409157] 4000 00012c80 81c11410 88007caf6d00 [ 413.409157] 88007726fca8 810c072e 8800767396c0 88007722ca50 [ 413.409157] Call Trace: [ 413.409157] [810c072e] ? unlock_page+0x24/0x28 [ 413.409157] [810dbd38] ? __do_fault+0x398/0x3cd [ 413.409157] [8161e3e4] schedule+0x64/0x66 [ 413.409157] [8161ee9c] rwsem_down_write_failed+0xf7/0x14a [ 413.409157] [8120d7f3] call_rwsem_down_write_failed+0x13/0x20 [ 413.409157] [8161d555] ? down_write+0x2e/0x32 [
Re: btrfs prof compile error on debian squeeze.
We had a discussion on this topic in another thread. I'd be happy to be corrected, but I think the conclusion was that you probably need to be on a really modern version of Linux to work with the latest version of btrfs-progs that is in the kernel git repository. The mkfs.btrfs version in the kernel git tree won't even work correctly on a kernel <= 3.7, and only partially works on the 3.8 kernel. On 4/10/13, Wang Shilong wangshilong1...@gmail.com wrote: Hello, Maybe this url will help you. https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories Thanks, Wang Hello, I'm trying to build btrfs-prog on debian squeeze but when I'm trying to use make, I have an error : pc@debian:~/b/btrfs-progs$ make [LD] mkfs.btrfs mkfs.o: In function `is_ssd': /home/pc/b/btrfs-progs/mkfs.c:1234: undefined reference to `blkid_probe_get_wholedisk_devno' collect2: ld returned 1 exit status make: *** [mkfs.btrfs] Erreur 1 After a few searches over the internet, it seems that my blkid library is out of date. How can I compile btrfs prog on debian squeeze ? Thanks ! Olivier.
Re: minimum kernel version for btrfsprogs.0.20?
On 4/3/13, Chris Murphy li...@colorremedies.com wrote: On Mar 29, 2013, at 9:42 AM, Mitch Harder mitch.har...@sabayonlinux.org wrote: On Fri, Mar 29, 2013 at 1:21 AM, Chris Murphy li...@colorremedies.com wrote: mkfs.btrfs -l 8192 with kernel 3.9.0 creates a file system mountable by 3.9.0 and only 3.9.0 (so far). And while there's no error making such a file system with other kernels, they won't mount the resulting file system. I'm seeing something similar. Using the current master branch of btrfs-progs (https://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git, top commit 7854c8b667654502f69e05584729146a06827bc6 Btrfs-progs: give restore a list roots option), if I run 'mkfs.btrfs -f /dev/device' on a 3.7.x vintage kernel, the mkfs operation is successful, but I can't mount the partition. I am successful on a 3.8.x vintage kernel or testing the _rc code for 3.9. If you try a leaf size other than default, it creates the file system but won't mount it, for any 3.8.x kernel I've tried including 3.8.5. Only 3.9.0 kernels are apparently mounting leaf sizes above 4KB, if the fs was created with btrfs-progs-0.20.rc1.20130308git704a08c. I disagree with cwillu regarding the default setting for extended inode refs. While the extended inode refs are a great addition and solve a long standing problem, it appears only the 3.9.0_rc kernel consistently works with extended inode refs. There should be at least a few working kernel versions out there before this becomes the default. Options like this that will make btrfs unmountable on older kernel versions need buy-in by the users. There is still the capability to enable extended inode refs with btrfstune. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
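Since this thread keeps coming back to "which kernel can mount what mkfs produced", here is a small sketch of the kind of check a wrapper script or installer could do before enabling a newer on-disk feature by default. kernel_at_least() is a hypothetical helper, not part of btrfs-progs, and any threshold passed to it (for example 3.9 for non-default leaf sizes) is only what was observed in this thread, not an authoritative minimum.

```c
#include <stdio.h>

/*
 * Compare a kernel release string such as "3.8.5" or "3.9.0-rc5"
 * against a required major.minor.
 * Returns 1 if the release is at least major.minor, 0 if it is older,
 * and -1 if the string cannot be parsed.
 */
static int kernel_at_least(const char *release, int major, int minor)
{
	int maj = 0, min = 0;

	if (sscanf(release, "%d.%d", &maj, &min) != 2)
		return -1;
	if (maj != major)
		return maj > major;
	return min >= minor;
}
```

The release string would typically come from the release field filled in by uname(2). Version comparison is a blunt tool, because distribution kernels backport features, so it is best treated as a warning heuristic rather than a hard gate.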
Re: minimum kernel version for btrfsprogs.0.20?
On Fri, Mar 29, 2013 at 1:21 AM, Chris Murphy li...@colorremedies.com wrote: Chris Murphy wrote: On Mar 29, 2013, at 12:04 AM, cwillu cwi...@cwillu.com wrote: commit 1a72afaa btrfs-progs: mkfs support for extended inode refs unconditionally enables extended irefs (which permits more than 4k links to the same inode). It's the right default imo, but there probably should have been a mkfs option to disable it. mkfs.btrfs -l 8192 That is not mountable by 3.8.5. I get: [ 252.870733] btrfs: disk space caching is enabled [ 252.870740] btrfs flagging fs with big metadata feature [ 252.874944] btrfs: failed to recover relocation [ 252.885186] btrfs: open_ctree failed That's definitely not expected. mkfs.btrfs -l 8192 with kernel 3.9.0 creates a file system mountable by 3.9.0 and only 3.9.0 (so far). And while there's no error making such a file system with other kernels, they won't mount the resulting file system. I'm seeing something similar. Using the current master branch of btrfs-progs (https://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git, top commit 7854c8b667654502f69e05584729146a06827bc6 Btrfs-progs: give restore a list roots option), if I run 'mkfs.btrfs -f /dev/device' on a 3.7.x vintage kernel, the mkfs operation is successful, but I can't mount the partition. I am successful on a 3.8.x vintage kernel or testing the _rc code for 3.9. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: zlib vs lzo uncompress speed, ssd vs nossd
On Wed, Mar 27, 2013 at 11:53 AM, Marc MERLIN m...@merlins.org wrote: Is my feeling of slower boot wrong, or is zlib also noticeably slower than lzo to read and decompress? Lzo compression should be faster in every aspect than zlib, especially for reading. But having said that, btrfs won't recompress any existing files just because you switch your mount option from lzo to zlib. Only newly written files will be zlib, and btrfs will leave the lzo-compressed files alone unless they are re-written, or you expressly recompress them using the defrag tool. If you were to take a snapshot of your root partition, and reboot to the snapshot as the new root with zlib compression, you could make some side-by-side comparisons of boot time to clarify your impressions. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: zlib vs lzo uncompress speed, ssd vs nossd
On Wed, Mar 27, 2013 at 4:22 PM, Marc MERLIN m...@merlins.org wrote: On Wed, Mar 27, 2013 at 04:12:27PM -0500, Mitch Harder wrote: On Wed, Mar 27, 2013 at 11:53 AM, Marc MERLIN m...@merlins.org wrote: Is my feeling of slower boot wrong, or is zlib also noticeably slower than lzo to read and decompress? Lzo compression should be faster in every aspect than zlib, especially for reading. But having said that, btrfs won't recompress any existing files just because you switch your mount option from lzo to zlib. Only newly written files will be zlib, and btrfs will leave the lzo-compressed files alone unless they are re-written, or you expressly recompress them using the defrag tool. That was my intent at the time, I thought that zlib decompression was about as fast as lzo, so it would have been good that most of my files stayed compressed as zlib. Turns out I was wrong :) If you were to take a snapshot of your root partition, and reboot to the snapshot as the new root with zlib compression, you could make some side-by-side comparisons of boot time to clarify your impressions. Fair point. By that, you mean defrag all my files somehow (recompressing as lzo, and doubling the size of my rootfs)? Also, I was re-reading ssd vs nossd: https://btrfs.wiki.kernel.org/index.php/Mount_options isn't clear whether these are read/write ordering optimizations, or filesystem layout optimization (i.e. you'd have to recreate the entire FS, and rewrite everything). http://www.phoronix.com/scan.php?page=article&item=btrfs_ssd_mode&num=1 says 'However, unless disabling the write cache for the drive, the SSD mode does not necessarily mean better performance. In fact, as our results are about to show, the quantitative disk performance can drop greatly in the SSD mode when the write cache remains enabled' But that's from 2009, so not very relevant to today. Do you happen to know more than me on this? I'm sorry, I have no experience with the ssd mount option. 
Re: btrfs stuck on
On Thu, Mar 21, 2013 at 1:56 PM, Ask Bjørn Hansen a...@develooper.com wrote:

Hello,

A few weeks ago I replaced a ZFS backup system with one backed by btrfs. A script loops over a bunch of hosts rsyncing them to each their own subvolume. After each rsync I snapshot the host-specific subvolume. The disk is an iscsi disk that in my benchmarks performs roughly like a local raid with 2-3 SATA disks.

It worked fine for about a week (~150 snapshots from ~20 sub volumes) before it suddenly exploded in disk io wait. Doing anything (in particular changes) on the file system is just insanely slow, rsync basically can't complete (an rsync that should take 10-20 minutes takes 24 hours; I have a directory of 60k files I tried deleting and it's deleting one file every few minutes, that sort of thing).

I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting, it doesn't make a difference. As soon as I boot [btrfs-cleaner] and [btrfs-transacti] gets really busy. I wonder if it's because I deleted a few snapshots at some point?

The file system is mounted with -o compress=zlib,noatime

# mount | grep tank
/dev/sdc on /tank type btrfs (rw,noatime,seclabel,compress=zlib,space_cache,_netdev)

I don't recall mounting it with space_cache; though I don't think that's the default so I wonder if I did do that at some point. Could that be what's messing me up?

btrfs-cleaner stack:

# cat /proc/1117/stack
[a022598a] btrfs_commit_transaction+0x36a/0xa70 [btrfs]
[a022677f] start_transaction+0x23f/0x460 [btrfs]
[a0226cb8] btrfs_start_transaction+0x18/0x20 [btrfs]
[a021487f] btrfs_drop_snapshot+0x3ef/0x5d0 [btrfs]
[a0226e1f] btrfs_clean_old_snapshots+0x9f/0x120 [btrfs]
[a021eda9] cleaner_kthread+0xa9/0x120 [btrfs]
[81081f90] kthread+0xc0/0xd0
[816584ac] ret_from_fork+0x7c/0xb0
[] 0x

btrfs-transaction stack:

# cat /proc/1118/stack
[a0256b35] btrfs_tree_read_lock+0x95/0x110 [btrfs]
[a020033b] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
[a0205649] btrfs_search_slot+0x3f9/0x7a0 [btrfs]
[a020be5e] lookup_inline_extent_backref+0x8e/0x4d0 [btrfs]
[a020dd38] __btrfs_free_extent+0xc8/0x870 [btrfs]
[a0211f29] run_clustered_refs+0x459/0xb50 [btrfs]
[a0215e48] btrfs_run_delayed_refs+0xc8/0x2f0 [btrfs]
[a02256a6] btrfs_commit_transaction+0x86/0xa70 [btrfs]
[a021e7c5] transaction_kthread+0x1a5/0x220 [btrfs]
[81081f90] kthread+0xc0/0xd0
[816584ac] ret_from_fork+0x7c/0xb0
[] 0x

Thank you for reading this far. Any suggestions would be most appreciated!

The space_cache option is probably not the issue. As you've guessed, this gets activated by default.

The cleaner runs to remove deleted snapshots. Responsiveness while the cleaner is running has been an issue that has come up, but it is usually just an inconvenience. I can't recall hearing about a slowdown of this degree while the cleaner is running.

I haven't noticed many discussions on the Btrfs mailing list where Btrfs is used in the context of iSCSI, so you may be seeing new issues in your use case.

If you can, it would be interesting to know how well the cleaner runs across iSCSI if nothing else is running. If you could delete a single snapshot, and make note of the space used before and after the cleaner finishes and the time required, this might help isolate the issue.

As a work-around, I would suggest using a script to delete the files in the subvolume before removing the snapshot. This way, you will have more control over the priority given to the deletion process. Once the subvolume is empty, the cleaner usually runs much better. :)
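The work-around suggested above (empty the subvolume yourself, then delete it) could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not anything shipped with btrfs-progs: the function name empty_tree_throttled and its pause_ms parameter are made up for this example, and it assumes a plain POSIX environment. It removes everything below a directory, leaves the directory itself in place, and sleeps briefly after each removal so that other I/O gets a chance to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove everything below `path` (but not `path`
 * itself), sleeping `pause_ms` milliseconds after each removal.
 * Returns 0 on success or -errno for the first failure it notices.
 * Errors reported by readdir() itself are not distinguished from
 * end-of-directory in this sketch.
 */
int empty_tree_throttled(const char *path, long pause_ms)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    int ret = 0;

    if (dir == NULL)
        return -errno;

    while (ret == 0 && (de = readdir(dir)) != NULL) {
        char child[4096];
        struct stat st;
        int n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        n = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(child)) {
            ret = -ENAMETOOLONG;
        } else if (lstat(child, &st) < 0) {
            ret = -errno;
        } else if (S_ISDIR(st.st_mode)) {
            /* Empty the subdirectory first, then remove it. */
            ret = empty_tree_throttled(child, pause_ms);
            if (ret == 0 && rmdir(child) < 0)
                ret = -errno;
        } else if (unlink(child) < 0) {
            ret = -errno;
        }

        /* Throttle: give other I/O a chance between removals. */
        if (ret == 0 && pause_ms > 0) {
            struct timespec ts;

            ts.tv_sec = pause_ms / 1000;
            ts.tv_nsec = (pause_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
        }
    }

    closedir(dir);
    return ret;
}
```

A caller would run this on the snapshot's directory and only afterwards issue the subvolume delete. Running such a program under nice or ionice is another way to lower its priority; the sleep here is just the simplest knob to show.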
Re: Problems with compiling btrfs
On Thu, Mar 21, 2013 at 4:46 PM, Avi Miller avi.mil...@oracle.com wrote:

Hi,

On 22/03/2013, at 8:11 AM, Joseph Moore jap...@gmail.com wrote:

[root@ol6 btrfs-progs]# uname -a
Linux ol6.localdomain 2.6.39-400.17.2.el6uek.x86_64 #1 SMP Wed Mar 13 12:31:05 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux

This is the currently shipping Oracle Linux 6 UEK and as such, doesn't support a newer btrfs-progs. If you want to run a newer btrfs, you should install the 3.8 kernel from our playground channel on public-yum.oracle.com and then you can compile a newer btrfs-progs to match.

I've also asked the playground build team to build a newer btrfs-progs RPM for the playground channel, but I'm not sure what the timeframes on that would be.

--
Oracle http://www.oracle.com
Avi Miller | Principal Program Manager | +61 (412) 229 687
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia

I have also run into the same problem on Enterprise Linux 6.3 (Scientific Linux in my case).

It is relatively trivial to get a current kernel from sources like ELREPO, so I was hoping to use my Scientific Linux partition at least for rescue and evaluation.

Is the position of the Btrfs Developer community that Enterprise Linux 6.x is not to be supported?
Re: mkfs.btrfs broken
On Thu, Mar 7, 2013 at 12:10 PM, Swâmi Petaramesh sw...@petaramesh.org wrote:

On 07/03/2013 19:06, Jérôme Poulin wrote:

mkfs.btrfs tries to lookup loop devices by their filenames and fails if any loop device file is missing.

Hmm Why would mkfs.btrfs want to lookup anything else but the device we're trying to format, to check if it's mounted or not?

At Sabayon, we pretty much hacked our way around this with a make-it-go kind of patch. Otherwise, our installation would break with btrfs on our Live-[CD/DVD/USB] media.

I know we should have taken the time to put together a proper solution, but I could never figure out the reasoning for needing to scan every device either.

--- btrfs-progs-0.19.orig/utils.c
+++ btrfs-progs-0.19/utils.c
@@ -708,6 +708,21 @@ int is_same_blk_file(const char* a, cons
 	return 0;
 }
 
+/* Checks if a file exists and is a block or regular file*/
+int is_existing_blk_or_reg_file(const char* filename)
+{
+	struct stat st_buf;
+
+	if(stat(filename, &st_buf) < 0) {
+		if(errno == ENOENT)
+			return 0;
+		else
+			return -errno;
+	}
+
+	return (S_ISBLK(st_buf.st_mode) || S_ISREG(st_buf.st_mode));
+}
+
 /* checks if a and b are identical or device
  * files associated with the same block device or
  * if one file is a loop device that uses the other
@@ -727,7 +742,10 @@ int is_same_loop_file(const char* a, con
 	} else if(ret) {
 		if((ret = resolve_loop_device(a, res_a, sizeof(res_a))) < 0)
 			return ret;
-
+		/* if the resolved path is not available, there is nothing
+		   we can do */
+		if((ret = is_existing_blk_or_reg_file(res_a)) == 0)
+			return ret;
 		final_a = res_a;
 	} else {
 		final_a = a;
@@ -739,6 +757,10 @@ int is_same_loop_file(const char* a, con
 	} else if(ret) {
 		if((ret = resolve_loop_device(b, res_b, sizeof(res_b))) < 0)
 			return ret;
+		/* if the resolved path is not available, there is nothing
+		   we can do */
+		if((ret = is_existing_blk_or_reg_file(res_b)) == 0)
+			return ret;
 
 		final_b = res_b;
 	} else {
@@ -748,21 +770,6 @@ int is_same_loop_file(const char* a, con
 	return is_same_blk_file(final_a, final_b);
 }
 
-/* Checks if a file exists and is a block or regular file*/
-int is_existing_blk_or_reg_file(const char* filename)
-{
-	struct stat st_buf;
-
-	if(stat(filename, &st_buf) < 0) {
-		if(errno == ENOENT)
-			return 0;
-		else
-			return -errno;
-	}
-
-	return (S_ISBLK(st_buf.st_mode) || S_ISREG(st_buf.st_mode));
-}
-
 /* Checks if a file is used (directly or indirectly via a loop device)
  * by a device in fs_devices
  */
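For anyone who wants to try the helper outside of btrfs-progs, here is a minimal standalone restatement of is_existing_blk_or_reg_file() from the patch above. The logic is taken from the patch; the includes and the return-value comment are additions for this sketch, and it assumes an ordinary POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Returns 1 if `filename` exists and is a block device or a regular file,
 * 0 if it does not exist or is some other kind of file,
 * and -errno if stat() fails for any reason other than ENOENT.
 */
int is_existing_blk_or_reg_file(const char *filename)
{
    struct stat st_buf;

    if (stat(filename, &st_buf) < 0) {
        if (errno == ENOENT)
            return 0;
        return -errno;
    }

    return S_ISBLK(st_buf.st_mode) || S_ISREG(st_buf.st_mode);
}
```

Note that the call sites in the patch only test for == 0, so a negative error return is not handled there and simply falls through to is_same_blk_file().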
Re: [PATCH] Btrfs: fix wrong outstanding_extents when doing DIO write
On Thu, Feb 21, 2013 at 7:26 AM, Chris Mason chris.ma...@fusionio.com wrote:

On Thu, Feb 21, 2013 at 02:48:22AM -0700, Miao Xie wrote:

When running the 083th case of xfstests on the filesystem with compress-force=lzo, the following WARNINGs were triggered.

WARNING: at fs/btrfs/inode.c:7908
WARNING: at fs/btrfs/inode.c:7909
WARNING: at fs/btrfs/inode.c:7911
WARNING: at fs/btrfs/extent-tree.c:4510
WARNING: at fs/btrfs/extent-tree.c:4511

This problem was introduced by the patch Btrfs: fix deadlock due to unsubmitted. In this patch, there are two bugs which caused the above problem.

I saw this as well on test 132 last night. My plan was to track it down this morning, so discovering it already fixed while I slept was wonderful. Thanks Miao.

Josef I've got this one and Miao's defrag unmount patch queued up.

Thanks, I've also tested this patch, and it cleared the error I was receiving.
Re: [PATCH] Btrfs: fix cleaner thread not working with inode cache option
On Wed, Feb 20, 2013 at 8:10 AM, Liu Bo bo.li@oracle.com wrote:

Right now inode cache inode is treated as the same as space cache inode, ie. keep inode in memory till putting super. But this leads to an awkward situation.

If we're going to delete a snapshot/subvolume, btrfs will not actually delete it and return free space, but will add it to dead roots list until the last inode on this snap/subvol is destroyed. Then we'll fetch deleted roots and clean them up via cleaner thread.

So here is the problem, if we enable inode cache option, each snap/subvol has a cached inode which is used to store inode allocation information. And this cache inode will be kept in memory, as the above said.

So with inode cache, snap/subvol can only be added into dead roots list during freeing roots stage in umount, so that we can ONLY get space back after another remount (we cleanup dead roots on mount).

But the real thing is we'll no more use the snap/subvol if we mark it deleted, so we can safely iput its cache inode when we delete snap/subvol.

Another thing is that we need to change the rules of dropping inode, we don't keep snap/subvol's cache inode in memory till end so that we can add snap/subvol into dead roots list in time.

Reported-by: Mitch Harder mitch.har...@sabayonlinux.org
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/inode.c |    3 ++-
 fs/btrfs/ioctl.c |    6 ++++++
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ca7ace7..d9984fa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7230,8 +7230,9 @@ int btrfs_drop_inode(struct inode *inode)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 
+	/* the snap/subvol tree is on deleting */
 	if (btrfs_root_refs(&root->root_item) == 0 &&
-	    !btrfs_is_free_space_inode(inode))
+	    root != root->fs_info->tree_root)
 		return 1;
 	else
 		return generic_drop_inode(inode);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a31cd93..375f31f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2171,6 +2171,12 @@ out_unlock:
 		shrink_dcache_sb(root->fs_info->sb);
 		btrfs_invalidate_inodes(dest);
 		d_delete(dentry);
+
+		/* the last ref */
+		if (dest->cache_inode) {
+			iput(dest->cache_inode);
+			dest->cache_inode = NULL;
+		}
 	}
 out_dput:
 	dput(dentry);
--
1.7.7.6

Thanks, I tested this patch, and it fixes the issues I was seeing.
Kernel WARNINGs on btrfs-next
I'm getting a series of kernel WARNING messages when testing Josef's btrfs-next and Chris' next branch running xfstests 083 when mounted with compress-force=lzo. I'm not seeing any other indications of problems other than the WARNINGs on xfstests 083, so this may be some sort of false positive. Here are the messages against Chris' -next branch (the same warnings are being generated against josef's branch, except against a 3.7.x kernel): [ 553.194991] [ cut here ] [ 553.195002] WARNING: at fs/btrfs/inode.c:7908 btrfs_destroy_inode+0x67/0x25b [btrfs]() [ 553.195043] Hardware name: OptiPlex 745 [ 553.195046] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep ppdev parport_pc snd_pcm snd_page_alloc snd_timer snd floppy sr_mod i2c_i801 tg3 ptp iTCO_wdt pps_core iTCO_vendor_support ehci_pci parport lpc_ich microcode serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [ 553.195099] Pid: 4674, comm: rm Not tainted 3.8.0-mason-next+ #1 [ 553.195102] Call Trace: [ 553.195112] [81030522] warn_slowpath_common+0x83/0x9b [ 553.195118] [81030554] warn_slowpath_null+0x1a/0x1c [ 553.195135] [a018d69e] btrfs_destroy_inode+0x67/0x25b [btrfs] [ 553.195141] [8111759a] destroy_inode+0x3b/0x54 [ 553.195145] [811176fc] evict+0x149/0x151 [ 553.195149] [81117f82] iput+0x12c/0x135 [ 553.195166] [a0187f42] ? btrfs_unlink_inode+0x38/0x40 [btrfs] [ 553.195171] [8110de10] do_unlinkat+0x145/0x1df [ 553.195177] [81106e9f] ? 
sys_newfstatat+0x2a/0x33 [ 553.195191] [8110fce5] sys_unlinkat+0x29/0x2b [ 553.195212] [81607746] system_call_fastpath+0x1a/0x1f [ 553.195224] ---[ end trace 0adc4db1ad1a6634 ]--- [ 553.195231] [ cut here ] [ 553.195247] WARNING: at fs/btrfs/inode.c:7909 btrfs_destroy_inode+0x7e/0x25b [btrfs]() [ 553.195249] Hardware name: OptiPlex 745 [ 553.195251] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep ppdev parport_pc snd_pcm snd_page_alloc snd_timer snd floppy sr_mod i2c_i801 tg3 ptp iTCO_wdt pps_core iTCO_vendor_support ehci_pci parport lpc_ich microcode serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [ 553.195296] Pid: 4674, comm: rm Tainted: GW3.8.0-mason-next+ #1 [ 553.195298] Call Trace: [ 553.195304] [81030522] warn_slowpath_common+0x83/0x9b [ 553.195308] [81030554] warn_slowpath_null+0x1a/0x1c [ 553.195324] [a018d6b5] btrfs_destroy_inode+0x7e/0x25b [btrfs] [ 553.195329] [8111759a] destroy_inode+0x3b/0x54 [ 553.195333] [811176fc] evict+0x149/0x151 [ 553.195336] [81117f82] iput+0x12c/0x135 [ 553.195352] [a0187f42] ? btrfs_unlink_inode+0x38/0x40 [btrfs] [ 553.195356] [8110de10] do_unlinkat+0x145/0x1df [ 553.195360] [81106e9f] ? 
sys_newfstatat+0x2a/0x33 [ 553.195364] [8110fce5] sys_unlinkat+0x29/0x2b [ 553.195368] [81607746] system_call_fastpath+0x1a/0x1f [ 553.195371] ---[ end trace 0adc4db1ad1a6635 ]--- [ 553.195373] [ cut here ] [ 553.195389] WARNING: at fs/btrfs/inode.c:7911 btrfs_destroy_inode+0xae/0x25b [btrfs]() [ 553.195391] Hardware name: OptiPlex 745 [ 553.195393] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep ppdev parport_pc snd_pcm snd_page_alloc snd_timer snd floppy sr_mod i2c_i801 tg3 ptp iTCO_wdt pps_core iTCO_vendor_support ehci_pci parport lpc_ich microcode serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [ 553.195437] Pid: 4674, comm: rm Tainted: GW3.8.0-mason-next+ #1 [ 553.195439] Call Trace: [ 553.195444] [81030522] warn_slowpath_common+0x83/0x9b [ 553.195449] [81030554] warn_slowpath_null+0x1a/0x1c [ 553.195463] [a018d6e5] btrfs_destroy_inode+0xae/0x25b [btrfs] [ 553.195470] [8111759a] destroy_inode+0x3b/0x54 [ 553.195474] [811176fc] evict+0x149/0x151 [ 553.195480] [81117f82] iput+0x12c/0x135 [ 553.195495] [a0187f42] ? btrfs_unlink_inode+0x38/0x40 [btrfs] [ 553.195499] [8110de10] do_unlinkat+0x145/0x1df [ 553.195504] [81106e9f] ? sys_newfstatat+0x2a/0x33 [ 553.195508] [8110fce5] sys_unlinkat+0x29/0x2b [ 553.195512] [81607746] system_call_fastpath+0x1a/0x1f [ 553.195515] ---[ end trace 0adc4db1ad1a6636 ]--- [ 553.404031] [ cut here
Snapshot Cleaner not Working with inode_cache
I've encountered an issue where the space from previously deleted snapshots is not being freed up by the cleaner thread.

I'm only encountering this issue when I mount with the inode_cache option.

I've reproduced this on a 3.7.9 kernel merged with the latest for-linus branch. No additional patches are involved. My testing partition is 16 GB.

There is nothing in dmesg indicating any issues.

A simple manual test can reproduce the issue on my box:

(1) Format a fresh, scratch btrfs partition (it would probably work with an existing test partition, but I always like to test things that seem broken on a scratch partition).

(2) Mount partition (my options are -o compress-force=lzo,inode_cache). My mount command was:
    mount -o compress-force=lzo,inode_cache /dev/sda7 /mnt/benchmark/

(3) Make a subvolume:
    cd /mnt/device; btrfs su create test1

(4) Untar kernel sources to the subvolume:
    cd test1; tar -xpf path/to/kernel/source/tarball
    I believe anything you use to populate the subvolume is sufficient.

(5) Make a note of the disk usage:
    df -T /mnt/device

(6) Remove the subvolume:
    cd ..; btrfs su delete test1

(7) Wait 2 minutes, and notice that the space has not been freed up. I've waited much longer, but I forget the exact timeout on the cleaner thread.
    df -T /mnt/device

If I unmount and remount the partition with the same mount options, the cleaner will begin to correctly free the space.

I've never used the inode_cache option before, so I'll try a few other kernels to see if this is a regression.
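Steps (5) and (7) compare df output by eye. If one wanted to script that comparison, a small C helper built on statvfs() might look like the sketch below. The name fs_used_bytes is made up for this example, and the figure it reports is only the generic "total minus free" number that statvfs() exposes, which on btrfs is an approximation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: store the number of bytes in use on the
 * filesystem containing `path` in *used (total blocks minus free
 * blocks, times the fragment size).
 * Returns 0 on success or -errno if statvfs() fails.
 */
int fs_used_bytes(const char *path, uint64_t *used)
{
    struct statvfs sv;

    if (statvfs(path, &sv) < 0)
        return -errno;

    *used = (uint64_t)(sv.f_blocks - sv.f_bfree) * sv.f_frsize;
    return 0;
}
```

Calling it once before the subvolume delete and then polling afterwards would show whether, and how quickly, the cleaner returns the space.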
Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified v2
On Fri, Feb 8, 2013 at 8:53 AM, David Sterba dste...@suse.cz wrote:

On Thu, Feb 07, 2013 at 11:17:46PM -0600, Mitch Harder wrote:

On Thu, Feb 7, 2013 at 6:28 PM, David Sterba d...@jikos.cz wrote:

On Thu, Feb 07, 2013 at 03:38:34PM -0600, Mitch Harder wrote:

--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -144,7 +144,7 @@ struct tree_block {
 	unsigned int key_ready:1;
 };
 
-#define MAX_EXTENTS 128
+#define MAX_EXTENTS 512

Is this really related to compression? IIRC I've seen it only in context of batch work in reloc, but not anywhere near compression. (I may be wrong of course, just checking).

When you defragment compressed extents, it will run through relocation. If autodefrag is enabled, I found most everything I touched was running through relocation.

AFAIK defragmentation runs through the writeback loop, blocks are marked dirty, delalloc tries to make them contiguous and then synced back to disk. Autodefrag uses the same loop, just affects newly written data.

It has been a while since I looked at the issue, but I think balancing your data will also run through relocation.

Balance does go through reloc for sure. From the commit that introduces MAX_EXTENTS it's imo quite clear that it's only a balance speedup:

(0257bb82d21bedff26541bcf12f1461c23f9ed61)
Btrfs: relocate file extents in clusters

The extent relocation code copy file extents one by one when relocating data block group. This is inefficient if file extents are small. This patch makes the relocation code copy file extents in clusters. So we can make better use of read-ahead.

In an earlier version of the patch, I had the changes to relocation.c in a separate patch. But, I couldn't consistently attain the changed maximum extent size unless I also addressed the issue with relocation.
[RFC] Btrfs: Allow the compressed extent size limit to be modified v2
Provide for modification of the limit of compressed extent size utilizing mount-time configuration settings. The size of compressed extents was limited to 128K, which leads to fragmentation of the extents (although the extents themselves may still be located contiguously). This limit is put in place to ease the RAM required when spreading compression across several CPUs, and to make sure the amount of IO required to do a random read is reasonably small. Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org --- Changelog v1 - v2: - Use more self-documenting variable name: compressed_extent_size - max_compressed_extent_kb - Use #define BTRFS_DEFAULT_MAX_COMPR_EXTENTS instead of raw 128. - Fix min calculation for nr_pages. - Comment cleanup. - Use more self-documenting mount option parameter: compressed_extent_size - max_compressed_extent_kb - Fix formatting in btrfs_show_options. --- fs/btrfs/ctree.h | 6 ++ fs/btrfs/disk-io.c| 1 + fs/btrfs/inode.c | 8 fs/btrfs/relocation.c | 7 --- fs/btrfs/super.c | 20 +++- 5 files changed, 34 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 547b7b0..a62f20c 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -191,6 +191,9 @@ static int btrfs_csum_sizes[] = { 4, 0 }; /* ioprio of readahead is set to idle */ #define BTRFS_IOPRIO_READA (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) +/* Default value for maximum compressed extent size (kb) */ +#define BTRFS_DEFAULT_MAX_COMPR_EXTENTS128 + /* * The key defines the order in the tree, and so it also defines (optimal) * block layout. 
@@ -1477,6 +1480,8 @@ struct btrfs_fs_info { unsigned data_chunk_allocations; unsigned metadata_ratio; + unsigned max_compressed_extent_kb; + void *bdev_holder; /* private scrub information */ @@ -1829,6 +1834,7 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_MOUNT_CHECK_INTEGRITY(1 20) #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 21) #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1 22) +#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1 23) #define btrfs_clear_opt(o, opt)((o) = ~BTRFS_MOUNT_##opt) #define btrfs_set_opt(o, opt) ((o) |= BTRFS_MOUNT_##opt) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 830bc17..775e7ba 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2056,6 +2056,7 @@ int open_ctree(struct super_block *sb, fs_info-trans_no_join = 0; fs_info-free_chunk_space = 0; fs_info-tree_mod_log = RB_ROOT; + fs_info-max_compressed_extent_kb = BTRFS_DEFAULT_MAX_COMPR_EXTENTS; /* readahead state */ INIT_RADIX_TREE(fs_info-reada_tree, GFP_NOFS ~__GFP_WAIT); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 148abeb..78fc6eb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -346,8 +346,8 @@ static noinline int compress_file_range(struct inode *inode, unsigned long nr_pages_ret = 0; unsigned long total_compressed = 0; unsigned long total_in = 0; - unsigned long max_compressed = 128 * 1024; - unsigned long max_uncompressed = 128 * 1024; + unsigned long max_compressed = root-fs_info-max_compressed_extent_kb * 1024; + unsigned long max_uncompressed = root-fs_info-max_compressed_extent_kb * 1024; int i; int will_compress; int compress_type = root-fs_info-compress_type; @@ -361,7 +361,7 @@ static noinline int compress_file_range(struct inode *inode, again: will_compress = 0; nr_pages = (end PAGE_CACHE_SHIFT) - (start PAGE_CACHE_SHIFT) + 1; - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE); + nr_pages = min(nr_pages, max_compressed / PAGE_CACHE_SIZE); /* * we don't want to send crud past the end of i_size through @@ 
-386,7 +386,7 @@ again: * * We also want to make sure the amount of IO required to do * a random read is reasonably small, so we limit the size of -* a compressed extent to 128k. +* a compressed extent. */ total_compressed = min(total_compressed, max_uncompressed); num_bytes = (end - start + blocksize) ~(blocksize - 1); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 300e09a..64bbc9e 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -144,7 +144,7 @@ struct tree_block { unsigned int key_ready:1; }; -#define MAX_EXTENTS 128 +#define MAX_EXTENTS 512 struct file_extent_cluster { u64 start; @@ -3055,6 +3055,7 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key, struct file_extent_cluster *cluster) { int ret; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)-root-fs_info; if (cluster-nr 0 extent_key-objectid != cluster-end + 1) { ret
Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified v2
On Thu, Feb 7, 2013 at 6:28 PM, David Sterba d...@jikos.cz wrote:

On Thu, Feb 07, 2013 at 03:38:34PM -0600, Mitch Harder wrote:

--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -144,7 +144,7 @@ struct tree_block {
 	unsigned int key_ready:1;
 };
 
-#define MAX_EXTENTS 128
+#define MAX_EXTENTS 512

Is this really related to compression? IIRC I've seen it only in context of batch work in reloc, but not anywhere near compression. (I may be wrong of course, just checking).

When you defragment compressed extents, it will run through relocation. If autodefrag is enabled, I found most everything I touched was running through relocation.

It has been a while since I looked at the issue, but I think balancing your data will also run through relocation.
[RFC] Btrfs: Allow the compressed extent size limit to be modified.
Provide for modification of the limit of compressed extent size utilizing mount-time configuration settings. The size of compressed extents was limited to 128K, which leads to fragmentation of the extents (although the extents themselves may still be located contiguously). This limit is put in place to ease the RAM required when spreading compression across several CPUs, and to make sure the amount of IO required to do a random read is reasonably small. This patch is still preliminary. In this version of the patch, the allowed compressed extent size is restricted to 128 (the default) and 512. I wanted to extensively test a single value for a change in compressed extent size before expanding and testing a wider range of parameters. I submitted a similar patch about a year and a half ago where the change was hard-coded and not tuneable. http://comments.gmane.org/gmane.comp.file-systems.btrfs/10516 --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c| 1 + fs/btrfs/inode.c | 8 fs/btrfs/relocation.c | 7 --- fs/btrfs/super.c | 19 ++- 5 files changed, 30 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 547b7b0..f37ec32 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1477,6 +1477,8 @@ struct btrfs_fs_info { unsigned data_chunk_allocations; unsigned metadata_ratio; + unsigned compressed_extent_size; + void *bdev_holder; /* private scrub information */ @@ -1829,6 +1831,7 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_MOUNT_CHECK_INTEGRITY(1 20) #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 21) #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1 22) +#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1 23) #define btrfs_clear_opt(o, opt)((o) = ~BTRFS_MOUNT_##opt) #define btrfs_set_opt(o, opt) ((o) |= BTRFS_MOUNT_##opt) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 830bc17..2d2be03 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2056,6 +2056,7 @@ int open_ctree(struct super_block *sb, fs_info-trans_no_join = 
0; fs_info-free_chunk_space = 0; fs_info-tree_mod_log = RB_ROOT; + fs_info-compressed_extent_size = 128; /* readahead state */ INIT_RADIX_TREE(fs_info-reada_tree, GFP_NOFS ~__GFP_WAIT); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 148abeb..5b81b56 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -346,8 +346,8 @@ static noinline int compress_file_range(struct inode *inode, unsigned long nr_pages_ret = 0; unsigned long total_compressed = 0; unsigned long total_in = 0; - unsigned long max_compressed = 128 * 1024; - unsigned long max_uncompressed = 128 * 1024; + unsigned long max_compressed = root-fs_info-compressed_extent_size * 1024; + unsigned long max_uncompressed = root-fs_info-compressed_extent_size * 1024; int i; int will_compress; int compress_type = root-fs_info-compress_type; @@ -361,7 +361,7 @@ static noinline int compress_file_range(struct inode *inode, again: will_compress = 0; nr_pages = (end PAGE_CACHE_SHIFT) - (start PAGE_CACHE_SHIFT) + 1; - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE); + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE); /* * we don't want to send crud past the end of i_size through @@ -386,7 +386,7 @@ again: * * We also want to make sure the amount of IO required to do * a random read is reasonably small, so we limit the size of -* a compressed extent to 128k. +* a compressed extent (default of 128k). 
*/ total_compressed = min(total_compressed, max_uncompressed); num_bytes = (end - start + blocksize) ~(blocksize - 1); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 300e09a..8d6f6bf 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -144,7 +144,7 @@ struct tree_block { unsigned int key_ready:1; }; -#define MAX_EXTENTS 128 +#define MAX_EXTENTS 512 struct file_extent_cluster { u64 start; @@ -3055,6 +3055,7 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key, struct file_extent_cluster *cluster) { int ret; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)-root-fs_info; if (cluster-nr 0 extent_key-objectid != cluster-end + 1) { ret = relocate_file_extent_cluster(inode, cluster); @@ -3066,12 +3067,12 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key, if (!cluster-nr) cluster-start = extent_key-objectid; else - BUG_ON(cluster-nr = MAX_EXTENTS); + BUG_ON(cluster-nr = fs_info-compressed_extent_size); cluster-end = extent_key-objectid +
Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified.
On Wed, Feb 6, 2013 at 12:46 PM, Zach Brown z...@redhat.com wrote:

+	unsigned compressed_extent_size;

It kind of jumps out that this mentions neither that it's the max nor that it's in KB. How about max_compressed_extent_kb?

+	fs_info->compressed_extent_size = 128;

I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using a raw 128 here.

+	unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
+	unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;

(so max_compressed is in bytes)

 	nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
-	nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
+	nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);

(and now that expression adds another * 1024, allowing {128,512}MB extents :))

Yuk! I'm surprised this never manifested as a problem during testing.

 	 * We also want to make sure the amount of IO required to do
 	 * a random read is reasonably small, so we limit the size of
-	 * a compressed extent to 128k.
+	 * a compressed extent (default of 128k).

Just drop the value so that this comment doesn't need to be updated again.

-	 * a compressed extent to 128k.
+	 * a compressed extent.

+	{Opt_compr_extent_size, "compressed_extent_size=%d"},

It's even more important to make the exposed option self-documenting than it was to get the fs_info member right.

+	if ((intarg == 128) || (intarg == 512)) {
+		info->compressed_extent_size = intarg;
+		printk(KERN_INFO "btrfs: compressed extent size %d\n",
+		       info->compressed_extent_size);
+	} else {
+		printk(KERN_INFO "btrfs: "
+		       "Invalid compressed extent size, "
+		       "using default.\n");

I'd print the default value when it's used and would include a unit in both.

+	if (btrfs_test_opt(root, COMPR_EXTENT_SIZE))
+		seq_printf(seq, ",compressed_extent_size=%d",
+			   (unsigned long long)info->compressed_extent_size);

The (ull) cast doesn't match the %d format and wouldn't be needed if it was printed with %u.

- z

Thanks for the review.

All these comments make sense, and I should be able to work them in.
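To make the unit mix-up from the review concrete, here is a small userspace sketch of the arithmetic. None of these names come from the patch: PAGE_SIZE_BYTES (assumed to be 4096 here), DEFAULT_MAX_COMPR_EXTENT_KB and the function names are all stand-ins chosen for the example. It also shows the "accept 128 or 512, otherwise fall back to the default" check in isolation.

```c
#define PAGE_SIZE_BYTES 4096UL            /* assumed page size for the example */
#define DEFAULT_MAX_COMPR_EXTENT_KB 128U  /* stand-in for the patch's default */

/* Accept only the two values the v1 patch allows; otherwise use the default. */
unsigned validate_max_compressed_extent_kb(int requested_kb)
{
    if (requested_kb == 128 || requested_kb == 512)
        return (unsigned)requested_kb;
    return DEFAULT_MAX_COMPR_EXTENT_KB;
}

/* Convert the KB limit to bytes exactly once. */
unsigned long limit_bytes(unsigned limit_kb)
{
    return (unsigned long)limit_kb * 1024UL;
}

/* Intended calculation: the limit is already in bytes, so divide by page size. */
unsigned long limit_pages(unsigned long limit_in_bytes)
{
    return limit_in_bytes / PAGE_SIZE_BYTES;
}

/* The v1 expression: multiplies a value that is already in bytes by 1024 again. */
unsigned long limit_pages_v1_bug(unsigned long limit_in_bytes)
{
    return (limit_in_bytes * 1024UL) / PAGE_SIZE_BYTES;
}
```

With a 128 KB limit and 4096-byte pages, limit_pages() gives 32 pages, while the v1 expression gives 32768 pages, which is 128 MB worth of pages. That is the {128,512}MB effect Zach points out.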
Re: [PATCH v2] Btrfs: fix race between snapshot deletion and getting inode
On Mon, Jan 28, 2013 at 9:52 PM, Chris Mason chris.ma...@fusionio.com wrote:

On Mon, Jan 28, 2013 at 08:22:10PM -0700, Liu Bo wrote:

While running snapshot testscript created by Mitch and David, the race between autodefrag and snapshot deletion can lead to corruption of dead_root list so that we can get crash on btrfs_clean_old_snapshots().

Really nice. Thanks to everyone that hashed this out.

-chris

I've been testing [PATCH v2] Btrfs: fix race between snapshot deletion and getting inode along with [PATCH v6] Btrfs: snapshot-aware defrag using the same work flow that was reproducing the dead_root list corruptions.

I've been unable to reproduce the error in ~24 hours of testing. Normally, I'd hit the error within an hour of testing on a single run. I've made three separate runs, and let the last run proceed overnight.

I'll keep using these patches, and let you know if anything turns up.

Thanks for all your work on this patch set.
Re: [PATCH] Btrfs: fix race between snapshot deletion and getting inode
On Mon, Jan 28, 2013 at 5:04 AM, Liu Bo bo.li@oracle.com wrote: While running snapshot testscript created by Mitch and David, the race between autodefrag and snapshot deletion can lead to corruption of dead_root list so that we can get crash on btrfs_clean_old_snapshots(). And besides autodefrag, scrub also do the same thing, ie. read root first and get inode. Here is the story(take autodefrag as an example): (1) when we delete a snapshot or subvolume, it will set its root's refs to zero and do a iput() on its own inode, and if this inode happens to be the only active in-meory one in root's inode rbtree, it will add itself to the global dead_roots list for later cleanup. (2) after (1), the autodefrag thread may read another inode for defrag and the inode is just in the deleted snapshot/subvolume, but all of these are without checking if the root is still valid(refs 0). So the end up result is adding the deleted snapshot/subvolume's root to the global dead_roots list AGAIN. Fortunately, we already have a srcu lock to avoid the race, ie. subvol_srcu. So all we need to do is to take the lock to protect 'read root and get inode', since we synchronize to wait for the rcu grace period before adding something to the global dead_roots list. Reported-by: Mitch Harder mitch.har...@sabayonlinux.org Signed-off-by: Liu Bo bo.li@oracle.com I'm still seeing seeing issues with duplications in the dead_roots list. I'm using a 3.7.4 kernel merged with the for-linus branch with the following four patches: [PATCH V5] Btrfs: snapshot-aware defrag [PATCH] Btrfs: List Debugging for cleaning deleted Non-functional patch to issue some trace_printk debugging. [PATCH] [RFC] Btrfs: Check for duplicate dead root list This is the patch discussed in the snapshot-aware defrag thread. It checks for duplicate list entries, and dumps a backtrace if it finds one. 
Btrfs: fix race between snapshot deletion and getting inode I've run into several backtraces similar to the following: [ 3129.368196] btrfs: Duplicate dead root entry. [ 3129.368199] [ cut here ] [ 3129.368220] WARNING: at fs/btrfs/transaction.c:893 btrfs_add_dead_root+0x73/0xbc [btrfs]() [ 3129.368223] Hardware name: OptiPlex 745 [ 3129.368224] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc iTCO_wdt ppdev iTCO_vendor_support i2c_i801 parport_pc floppy tg3 sr_mod microcode snd_timer snd lpc_ich serio_raw pcspkr parport ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [ 3129.368268] Pid: 4309, comm: btrfs-endio-wri Tainted: GW 3.7.4-sad-v2+ #1 [ 3129.368271] Call Trace: [ 3129.368278] [81030586] warn_slowpath_common+0x83/0x9b [ 3129.368282] [810305b8] warn_slowpath_null+0x1a/0x1c [ 3129.368297] [a0179e0b] btrfs_add_dead_root+0x73/0xbc [btrfs] [ 3129.368313] [a0187bef] btrfs_destroy_inode+0x227/0x25b [btrfs] [ 3129.368319] [8111393a] destroy_inode+0x3b/0x54 [ 3129.368322] [81113a9c] evict+0x149/0x151 [ 3129.368327] [81114322] iput+0x12c/0x135 [ 3129.368342] [a01845e7] relink_extent_backref+0x669/0x6af [btrfs] [ 3129.368346] [815e9849] ? __slab_free+0x17c/0x21b [ 3129.368362] [a017c33d] ? record_extent_backrefs+0xa3/0xa3 [btrfs] [ 3129.368377] [a0184d9d] ? btrfs_finish_ordered_io+0x770/0x827 [btrfs] [ 3129.368393] [a0184d6d] btrfs_finish_ordered_io+0x740/0x827 [btrfs] [ 3129.368409] [a0184e69] finish_ordered_fn+0x15/0x17 [btrfs] [ 3129.368424] [a019e7fd] worker_loop+0x14c/0x493 [btrfs] [ 3129.368439] [a019e6b1] ? btrfs_queue_worker+0x258/0x258 [btrfs] [ 3129.368443] [8104c750] kthread+0xba/0xc2 [ 3129.368447] [8104c696] ? kthread_freezable_should_stop+0x52/0x52 [ 3129.368451] [815f301c] ret_from_fork+0x7c/0xb0 [ 3129.368455] [8104c696] ? 
kthread_freezable_should_stop+0x52/0x52 [ 3129.368458] ---[ end trace 46705ba72c45db88 ]---
Re: [PATCH V5] Btrfs: snapshot-aware defrag
On Sun, Jan 27, 2013 at 6:41 AM, Liu Bo bo.li@oracle.com wrote: Hi Mitch, Many thanks for testing it! Well, after some debugging, I finally figure out the whys:

(1) btrfs_ioctl_snap_destroy() will free the inode of snapshot and set root's refs to zero(btrfs_set_root_refs()), if this inode happens to be the only one in the rbtree of the snapshot's root at this moment, we add this root to the dead_root list.

(2) Unfortunately, after (1), our snapshot-aware defrag work may read another inode in this snapshot into memory during 'relink' stage, and later after we finish relink work and iput() will force us to add the snapshot's root to the dead_root list again.

So that's why we get double list_add and list_del corruption. And IMO, it can also take place without snapshot-aware defrag, but it's a rare case. I'm seeing a smattering of reports that resemble list corruption on the M/L, so that is possible. So could you please try this? thanks, liubo

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f154946..d4ee66b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -885,7 +885,15 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
 	spin_lock(&root->fs_info->trans_lock);
+	if (!list_empty(&root->root_list)) {
+		struct btrfs_root *tmp;
+		list_for_each_entry(tmp, &root->fs_info->dead_roots, root_list)
+			if (tmp == root)
+				goto unlock;
+	}
+
 	list_add(&root->root_list, &root->fs_info->dead_roots);
+unlock:
 	spin_unlock(&root->fs_info->trans_lock);
 	return 0;
 }

It feels like we're correcting the problem after-the-fact with this method, instead of addressing the root problem. But I was able to successfully run with this patch. I slightly modified your patch as follows by introducing a WARN_ON in order to get a back trace, and also to give me a positive confirmation that I was triggering the problem. 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d6b17fa..0c1066e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -885,7 +885,18 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
 	spin_lock(&root->fs_info->trans_lock);
+	if (!list_empty(&root->root_list)) {
+		struct btrfs_root *tmp;
+		list_for_each_entry(tmp, &root->fs_info->dead_roots, root_list)
+			if (tmp == root) {
+				printk(KERN_ERR "btrfs: Duplicate dead root entry.\n");
+				WARN_ON(1);
+				goto unlock;
+			}
+	}
+
 	list_add(&root->root_list, &root->fs_info->dead_roots);
+unlock:
 	spin_unlock(&root->fs_info->trans_lock);
 	return 0;
 }
--

I was able to trigger the problem several times (16 separate times according to dmesg) without killing the cleaner process, and everything appears to have continued successfully after encountering a duplicate list entry. My test partition passes btrfsck afterwards. 13 out of the 16 backtraces seem to support your hypothesis as passing through the iput in your patch: [ 4367.314806] btrfs: Duplicate dead root entry. 
[ 4367.314809] [ cut here ] [ 4367.314834] WARNING: at fs/btrfs/transaction.c:893 btrfs_add_dead_root+0x73/0xbc [btrfs]() [ 4367.314836] Hardware name: OptiPlex 745 [ 4367.314841] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm tg3 snd_page_alloc snd_timer snd iTCO_wdt iTCO_vendor_support ppdev parport_pc microcode i2c_i801 floppy parport sr_mod lpc_ich serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [ 4367.314887] Pid: 4463, comm: btrfs-endio-wri Tainted: GW 3.7.4-sad-v2+ #1 [ 4367.314889] Call Trace: [ 4367.314895] [81030586] warn_slowpath_common+0x83/0x9b [ 4367.314899] [810305b8] warn_slowpath_null+0x1a/0x1c [ 4367.314915] [a0179e0b] btrfs_add_dead_root+0x73/0xbc [btrfs] [ 4367.314931] [a0187bef] btrfs_destroy_inode+0x227/0x25b [btrfs] [ 4367.314936] [8111393a] destroy_inode+0x3b/0x54 [ 4367.314940] [81113a9c] evict+0x149/0x151 [ 4367.314944] [81114322] iput+0x12c/0x135 [ 4367.314959] [a01845e7] relink_extent_backref+0x669/0x6af [btrfs] [ 4367.314964] [815e9849] ? __slab_free+0x17c/0x21b [ 4367.314980] [a0184d9d] ? btrfs_finish_ordered_io+0x770/0x827 [btrfs] [ 4367.314995] [a0184d6d] btrfs_finish_ordered_io+0x740/0x827 [btrfs] [ 4367.315011] [a0184e69] finish_ordered_fn+0x15/0x17 [btrfs] [ 4367.315034]
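Editor's sketch: the duplicate check discussed in this message can be illustrated in userspace. This is a minimal model under stated assumptions, not kernel code: `fake_root`, `add_dead_root_once`, `list_add_front`, and `list_length` are hypothetical names, locking is omitted, and the list is a hand-rolled imitation of the kernel's `list_head`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unchecked insert right after head, like a plain list_add(). */
static void list_add_front(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical stand-in for struct btrfs_root; only the linkage matters. */
struct fake_root {
	int id;
	struct list_head root_list;
};

/* Walk dead_roots; queue root only if its node is not already linked in. */
static bool add_dead_root_once(struct fake_root *root,
			       struct list_head *dead_roots)
{
	struct list_head *pos;

	for (pos = dead_roots->next; pos != dead_roots; pos = pos->next)
		if (pos == &root->root_list)
			return false;	/* duplicate: leave the list alone */
	list_add_front(&root->root_list, dead_roots);
	return true;
}

/* Count the entries on a list. */
static size_t list_length(const struct list_head *head)
{
	const struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

As noted in the thread, a check like this only masks the second add after the fact; it does not remove the race that causes it.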
Re: [PATCH V5] Btrfs: snapshot-aware defrag
On Wed, Jan 23, 2013 at 6:52 PM, Liu Bo bo.li@oracle.com wrote: On Wed, Jan 23, 2013 at 10:05:04AM -0600, Mitch Harder wrote: On Wed, Jan 23, 2013 at 1:51 AM, Liu Bo bo.li@oracle.com wrote: On Tue, Jan 22, 2013 at 11:41:19AM -0600, Mitch Harder wrote: On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder mitch.har...@sabayonlinux.org wrote: On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. [...] I've been testing this patch on a 3.7.2 kernel merged with the for-linus branch for the 3.8_rc kernels, and I'm seeing the following error: I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows some problem with an entry in the list. [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 [59312.260461] list_del corruption. next-prev should be 88006511c438, but was dead00200200 LIST_POISON2 - (00200200) So we can know that the next one is deleted from the list even _earlier_ than the current one is. Any other messages before this warning complains? Just some normal feedback from a metadata balance I had run. Well, these do fit my expectation, since balance also involves with playing with root_list, which may lead to the bad situation. [14057.193343] device fsid 28c688c5-7dbd-4071-b271-1bf6726d8835 devid 1 transid 4 /dev/sda7 [14057.194438] btrfs: force lzo compression [14057.194446] btrfs: enabling auto defrag [14057.194449] btrfs: disk space caching is enabled [14057.194452] btrfs flagging fs with big metadata feature [14057.194455] btrfs: lzo incompat flag set. 
[57508.799193] btrfs: relocating block group 14516486144 flags 4 [57632.178797] btrfs: found 6775 extents [57633.214701] btrfs: relocating block group 11832131584 flags 4 [57776.400102] btrfs: found 6480 extents [5.021175] btrfs: relocating block group 10489954304 flags 4 [57949.182725] btrfs: found 6681 extents [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 ... I'm going to try to wrap some debugging around the section of code in btrfs_clean_old_snapshots() where the dead_roots list is spliced onto the root list being processed. The double entry may be slipping in here. 1764 spin_lock(fs_info-trans_lock); 1765 list_splice_init(fs_info-dead_roots, list); 1766 spin_unlock(fs_info-trans_lock); hmm, I don't think there is anything wrong in this code. But you can give it a shot anyway :) I've changed up my reproducer to try some things that may hit the issue quicker and more reliably. It gave me a slightly different set of warnings in dmesg, which seem to suggest issues in the dead_root list. [43925.656065] device fsid a8f6fadb-3022-4c01-b369-f1f3f638c052 devid 1 transid 310 /dev/sda7 [43925.658062] btrfs: force lzo compression [43925.658072] btrfs: enabling auto defrag [43925.658075] btrfs: disk space caching is enabled [43925.658078] btrfs: lzo incompat flag set. [44503.421293] btrfs: unlinked 1 orphans [44898.287365] btrfs: unlinked 1 orphans [45080.641383] btrfs: unlinked 1 orphans [45250.063773] btrfs: unlinked 1 orphans [46223.387355] btrfs: unlinked 1 orphans [46476.473944] btrfs: unlinked 1 orphans [46499.665615] btrfs: unlinked 1 orphans [46769.785454] [ cut here ] [46769.785471] WARNING: at lib/list_debug.c:36 __list_add+0x9d/0xba() [46769.785474] Hardware name: OptiPlex 745 [46769.785478] list_add double add: new=880050c27c38, prev=880078f3e720, next=880050c27c38. 
[46769.785480] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer tg3 sr_mod snd i2c_i801 ppdev parport_pc iTCO_wdt iTCO_vendor_support lpc_ich pcspkr parport floppy serio_raw microcode ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [46769.785537] Pid: 18291, comm: btrfs-endio-wri Not tainted 3.7.4-sad-v1+ #3 [46769.785539] Call Trace: [46769.785549] [81030586] warn_slowpath_common+0x83/0x9b [46769.785553] [81030641] warn_slowpath_fmt+0x46/0x48 [46769.785558] [8120987b] __list_add+0x9d/0xba [46769.785586] [a0179dd6] btrfs_add_dead_root+0x42/0x56 [btrfs] [46769.785603] [a0187b67] btrfs_destroy_inode+0x227/0x25b [btrfs] [46769.785611] [8111393a] destroy_inode+0x3b/0x54 [46769.785615] [81113a9c] evict+0x149/0x151 [46769.785619] [81114322] iput
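Editor's sketch: the "list_add double add" warning above reports new == next, i.e. the entry being added is already the first element of the list. The userspace model below shows one way such an insert can be rejected before it rewrites any pointers. It loosely follows the idea behind CONFIG_DEBUG_LIST but is not the kernel's implementation; `list_add_checked` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Insert new_entry right after head, but refuse the insert when it would
 * corrupt the list. Returns true when the entry was linked in.
 * Only adjacent duplicates (new == prev or new == next) are caught here,
 * which is the case shown in the log above.
 */
static bool list_add_checked(struct list_head *new_entry,
			     struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (next->prev != prev || prev->next != next)
		return false;	/* neighbours are already inconsistent */
	if (new_entry == prev || new_entry == next)
		return false;	/* double add */

	next->prev = new_entry;
	new_entry->next = next;
	new_entry->prev = prev;
	prev->next = new_entry;
	return true;
}
```

Without the check, re-adding the first element makes its `next` and `prev` point at itself, so any later walk of the list never gets back to the head.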
Re: [PATCH V5] Btrfs: snapshot-aware defrag
On Fri, Jan 25, 2013 at 9:42 AM, Liu Bo bo.li@oracle.com wrote: On Fri, Jan 25, 2013 at 08:55:58AM -0600, Mitch Harder wrote: On Wed, Jan 23, 2013 at 6:52 PM, Liu Bo bo.li@oracle.com wrote: On Wed, Jan 23, 2013 at 10:05:04AM -0600, Mitch Harder wrote: On Wed, Jan 23, 2013 at 1:51 AM, Liu Bo bo.li@oracle.com wrote: On Tue, Jan 22, 2013 at 11:41:19AM -0600, Mitch Harder wrote: On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder mitch.har...@sabayonlinux.org wrote: On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. [...] I've been testing this patch on a 3.7.2 kernel merged with the for-linus branch for the 3.8_rc kernels, and I'm seeing the following error: I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows some problem with an entry in the list. [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 [59312.260461] list_del corruption. next-prev should be 88006511c438, but was dead00200200 LIST_POISON2 - (00200200) So we can know that the next one is deleted from the list even _earlier_ than the current one is. Any other messages before this warning complains? Just some normal feedback from a metadata balance I had run. Well, these do fit my expectation, since balance also involves with playing with root_list, which may lead to the bad situation. [14057.193343] device fsid 28c688c5-7dbd-4071-b271-1bf6726d8835 devid 1 transid 4 /dev/sda7 [14057.194438] btrfs: force lzo compression [14057.194446] btrfs: enabling auto defrag [14057.194449] btrfs: disk space caching is enabled [14057.194452] btrfs flagging fs with big metadata feature [14057.194455] btrfs: lzo incompat flag set. 
[57508.799193] btrfs: relocating block group 14516486144 flags 4 [57632.178797] btrfs: found 6775 extents [57633.214701] btrfs: relocating block group 11832131584 flags 4 [57776.400102] btrfs: found 6480 extents [5.021175] btrfs: relocating block group 10489954304 flags 4 [57949.182725] btrfs: found 6681 extents [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 ... I'm going to try to wrap some debugging around the section of code in btrfs_clean_old_snapshots() where the dead_roots list is spliced onto the root list being processed. The double entry may be slipping in here. 1764 spin_lock(fs_info-trans_lock); 1765 list_splice_init(fs_info-dead_roots, list); 1766 spin_unlock(fs_info-trans_lock); hmm, I don't think there is anything wrong in this code. But you can give it a shot anyway :) I've changed up my reproducer to try some things that may hit the issue quicker and more reliably. It gave me a slightly different set of warnings in dmesg, which seem to suggest issues in the dead_root list. Great! Many thanks for nail it down, we really shouldn't iput() after btrfs_iget(). Could you please try this(remove iput()) and see if it gets us rid of the trouble? thanks, liubo diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1683f48..c7a0fb7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2337,7 +2337,6 @@ out_free_path: out_unlock: unlock_extent_cached(BTRFS_I(inode)-io_tree, lock_start, lock_end, cached, GFP_NOFS); - iput(inode); return ret; } With this patch, the cleaner never runs to delete the old roots. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5] Btrfs: snapshot-aware defrag
On Wed, Jan 23, 2013 at 1:51 AM, Liu Bo bo.li@oracle.com wrote: On Tue, Jan 22, 2013 at 11:41:19AM -0600, Mitch Harder wrote: On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder mitch.har...@sabayonlinux.org wrote: On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. [...] I've been testing this patch on a 3.7.2 kernel merged with the for-linus branch for the 3.8_rc kernels, and I'm seeing the following error: I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows some problem with an entry in the list. [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 [59312.260461] list_del corruption. next-prev should be 88006511c438, but was dead00200200 LIST_POISON2 - (00200200) So we can know that the next one is deleted from the list even _earlier_ than the current one is. Any other messages before this warning complains? Just some normal feedback from a metadata balance I had run. [14057.193343] device fsid 28c688c5-7dbd-4071-b271-1bf6726d8835 devid 1 transid 4 /dev/sda7 [14057.194438] btrfs: force lzo compression [14057.194446] btrfs: enabling auto defrag [14057.194449] btrfs: disk space caching is enabled [14057.194452] btrfs flagging fs with big metadata feature [14057.194455] btrfs: lzo incompat flag set. [57508.799193] btrfs: relocating block group 14516486144 flags 4 [57632.178797] btrfs: found 6775 extents [57633.214701] btrfs: relocating block group 11832131584 flags 4 [57776.400102] btrfs: found 6480 extents [5.021175] btrfs: relocating block group 10489954304 flags 4 [57949.182725] btrfs: found 6681 extents [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 ... 
I'm going to try to wrap some debugging around the section of code in btrfs_clean_old_snapshots() where the dead_roots list is spliced onto the root list being processed. The double entry may be slipping in here.

	spin_lock(&fs_info->trans_lock);
	list_splice_init(&fs_info->dead_roots, &list);
	spin_unlock(&fs_info->trans_lock);
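Editor's sketch: for readers unfamiliar with the splice step quoted above, the userspace model below shows what a splice-and-reinitialise operation does: every entry moves from the shared list to a private one, and the shared list is left empty so new entries can be queued while the private list is processed. The helper names are hypothetical and the locking from the kernel snippet is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Append entry at the tail of the list. */
static void list_add_tail_entry(struct list_head *entry,
				struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Move all of src to the front of dst, then reset src to empty. */
static void list_splice_to_front(struct list_head *src,
				 struct list_head *dst)
{
	struct list_head *first, *last, *at;

	if (list_is_empty(src))
		return;
	first = src->next;
	last = src->prev;
	at = dst->next;

	first->prev = dst;
	dst->next = first;
	last->next = at;
	at->prev = last;
	init_list_head(src);
}
```

Note that an entry moved to the private list still reports itself as linked, so a root queued a second time after the splice would end up with one node threaded through two lists. Whether that is what happens in the reported failure is not established by this sketch.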
Re: [PATCH V5] Btrfs: snapshot-aware defrag
On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder mitch.har...@sabayonlinux.org wrote: On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff. Here is the basic idea, o set the writeback ranges started by defragment with flag EXTENT_DEFRAG o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs. Signed-off-by: Li Zefan l...@cn.fujitsu.com Signed-off-by: Liu Bo bo.li@oracle.com --- v4-v5: - Clarify the comments for duplicated refs. - Clear defrag flag after we're ready to defrag. - Fix a bug on HOLE extent. v3-v4: - Fix duplicated refs bugs detected by mounting with autodefrag, thanks for the bug report from Mitch and Chris. v2-v3: - Rebase v1-v2: - Address comments from David. I've been testing this patch on a 3.7.2 kernel merged with the for-linus branch for the 3.8_rc kernels, and I'm seeing the following error: I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows some problem with an entry in the list. [59312.260441] [ cut here ] [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98() [59312.260458] Hardware name: OptiPlex 745 [59312.260461] list_del corruption. 
next-prev should be 88006511c438, but was dead00200200 [59312.260464] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel i2c_i801 tg3 snd_hda_codec iTCO_wdt snd_hwdep snd_pcm ppdev parport_pc sr_mod microcode floppy parport snd_page_alloc snd_timer snd iTCO_vendor_support lpc_ich serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [59312.260519] Pid: 20523, comm: btrfs-cleaner Not tainted 3.7.2-sad+ #1 [59312.260521] Call Trace: [59312.260529] [81030586] warn_slowpath_common+0x83/0x9b [59312.260549] [a015aa01] ? reada_for_balance+0x187/0x218 [btrfs] [59312.260554] [81030641] warn_slowpath_fmt+0x46/0x48 [59312.260566] [a015aa01] ? reada_for_balance+0x187/0x218 [btrfs] [59312.260570] [812099e5] __list_del_entry+0x8d/0x98 [59312.260574] [812099fe] list_del+0xe/0x2e [59312.260590] [a017b325] btrfs_clean_old_snapshots+0x101/0x168 [btrfs] [59312.260605] [a0173d99] cleaner_kthread+0x5a/0xe6 [btrfs] [59312.260619] [a0173d3f] ? transaction_kthread+0x1a0/0x1a0 [btrfs] [59312.260624] [8104c750] kthread+0xba/0xc2 [59312.260629] [8104c696] ? kthread_freezable_should_stop+0x52/0x52 [59312.260634] [815f2f1c] ret_from_fork+0x7c/0xb0 [59312.260639] [8104c696] ? 
kthread_freezable_should_stop+0x52/0x52 [59312.260642] ---[ end trace 61b4cbd93690300f ]--- [59318.623735] [ cut here ] [59318.623751] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98() [59318.623755] Hardware name: OptiPlex 745 [59318.623760] list_del corruption, 88006511c438-next is LIST_POISON1 (dead00100100) [59318.623766] Modules linked in: ipv6 snd_hda_codec_analog snd_hda_intel i2c_i801 tg3 snd_hda_codec iTCO_wdt snd_hwdep snd_pcm ppdev parport_pc sr_mod microcode floppy parport snd_page_alloc snd_timer snd iTCO_vendor_support lpc_ich serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd [59318.623840] Pid: 20523, comm: btrfs-cleaner Tainted: GW 3.7.2-sad+ #1 [59318.623844] Call Trace: [59318.623855] [81030586] warn_slowpath_common+0x83/0x9b [59318.623878] [a015aab9] ? btrfs_free_path+0x27/0x2c [btrfs] [59318.623885] [81030641] warn_slowpath_fmt+0x46/0x48 [59318.623901] [a015aab9] ? btrfs_free_path+0x27/0x2c [btrfs] [59318.623907] [812099e5] __list_del_entry+0x8d/0x98 [59318.623912] [812099fe] list_del+0xe/0x2e [59318.623935] [a017b325] btrfs_clean_old_snapshots+0x101/0x168 [btrfs] [59318.623955] [a0173d99] cleaner_kthread+0x5a/0xe6 [btrfs] [59318.623975] [a0173d3f] ? transaction_kthread+0x1a0/0x1a0 [btrfs] [59318.623981] [8104c750] kthread+0xba/0xc2 [59318.623988] [8104c696] ? kthread_freezable_should_stop+0x52/0x52 [59318.623994] [815f2f1c] ret_from_fork+0x7c/0xb0 [59318.624000] [8104c696] ? kthread_freezable_should_stop+0x52/0x52 [59318.624022] ---[ end trace 61b4cbd936903010
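Editor's sketch: the two list_del warnings above correspond to two different checks: a neighbour whose back pointer has been poisoned, and an entry whose own pointers have been poisoned by an earlier delete. The userspace model below shows both checks. The poison values and function names are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Illustrative poison markers; the kernel's real constants differ. */
#define POISON_NEXT ((struct list_head *)(uintptr_t)0x100)
#define POISON_PREV ((struct list_head *)(uintptr_t)0x200)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unchecked insert right after head. */
static void list_add_front(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry and poison its pointers. Returns false, leaving everything
 * untouched, when the entry was already deleted or when its neighbours do
 * not point back at it.
 */
static bool list_del_checked(struct list_head *entry)
{
	if (entry->next == POISON_NEXT || entry->prev == POISON_PREV)
		return false;	/* entry itself was deleted before */
	if (entry->prev->next != entry || entry->next->prev != entry)
		return false;	/* a neighbour no longer links to entry */

	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = POISON_NEXT;
	entry->prev = POISON_PREV;
	return true;
}
```

A second delete of the same node is therefore reported rather than silently rewriting whatever its stale pointers refer to.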
Re: [PATCH V5] Btrfs: snapshot-aware defrag
On Thu, Jan 17, 2013 at 6:53 PM, Liu Bo bo.li@oracle.com wrote: On Thu, Jan 17, 2013 at 08:42:46AM -0600, Mitch Harder wrote: On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff. Here is the basic idea, o set the writeback ranges started by defragment with flag EXTENT_DEFRAG o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs. Signed-off-by: Li Zefan l...@cn.fujitsu.com Signed-off-by: Liu Bo bo.li@oracle.com --- v4-v5: - Clarify the comments for duplicated refs. - Clear defrag flag after we're ready to defrag. - Fix a bug on HOLE extent. v3-v4: - Fix duplicated refs bugs detected by mounting with autodefrag, thanks for the bug report from Mitch and Chris. v2-v3: - Rebase v1-v2: - Address comments from David. I've been testing this patch on a 3.7.2 kernel merged with the for-linus branch for the 3.8_rc kernels, and I'm seeing the following error: Hi Mitch, Insteresting! I don't even change the snapshot code ever. Yes, this patch series has been excellent at tickling unrelated issues. Is it reproducable stably from your side? Still with the snapshot-test-pub scripts? I'm still using the same snapshot-test scripts, but they don't reproduce reliably. I have to run for a while after my script reaches the point where it starts deleting snapshots to make space. But, I've been able to hit this error four times with this script. I'll try to keep playing with this to make a better reproducer, and to isolate the problem with the parameter supplied to list_del. 
Re: Can moving data to a subvolume not take as long as a fully copy?
On Tue, Jan 15, 2013 at 8:49 AM, Marc MERLIN m...@merlins.org wrote: On Mon, Jan 14, 2013 at 10:48:50PM -0800, David Brown wrote: Why not make a snapshot of the root volume, and then delete the files you want to move from the original root, and delete the rest of root from the snapshot? Are a snapshot of the root volume and a subvolume effectively the same thing as far as btrfs sees them? Once I have that snapshot which I'll treat as a subvolume, can I then snapshot that snapshot/subvolume further? Yes, the product of the btrfs snapshot command is a subvolume.
Re: Errors not found by btrfsck or scrub
On Fri, Jan 11, 2013 at 12:13 PM, Chris Carlin chrisrcar...@gmail.com wrote: I have a week-old filesystem that is reported clean by btrfsck and scrub, but that fails under operations ranging from du to sync and umount (but no failures if mounted readonly). My problem sounds similar to a few other reports (e.g. TM's in http://thread.gmane.org/gmane.comp.file-systems.btrfs/22014 ) that seem to hint at problems with full metadata. My df shows: I know this advice will run counter to what everyone else is saying, but I've had some luck booting with an older kernel (such as 3.4 or 3.5) just long enough to get some more Metadata allocated. I would also caution you to back up your data. I've had a similar issue, and that file system soon showed additional corruptions.
Re: segmentation-fault in btrfsck (git-version)
On Sat, Dec 29, 2012 at 5:28 AM, Hendrik Friedel hend...@friedels.name wrote: Hello, I re-send this message, hoping that someone can give me a hint? Regards, Hendrik Two possibilities come to mind (although there may be others). (1) The file still exists, but it is somewhere you did not expect. (2) Your filesystem tree has some sort of corruption. For item (1), have you thoroughly searched the entire volume for this file with something like: find path/to/volume/top/level/mount -iname 'Sting_Live_in_Berlin' It is possible that the file exists in a snapshot or a different directory than you were expecting. If the filesystem tree is corrupted, the task becomes tricky. Perhaps you can look at the Wiki entry for how the filesystem tree is constructed: https://btrfs.wiki.kernel.org/index.php/Trees Then examine the btrfs-debug-tree output around these entries, and try to determine why the tree still has entries for these files, while the files are not shown and btrfsck does not report the problem.
Re: [PATCH] Btrfs: reset path lock state to zero
On Fri, Dec 28, 2012 at 3:33 AM, Liu Bo bo.li@oracle.com wrote: We forgot to reset the path lock state to zero after we unlock the path block, and this can lead to the ASSERT checker in tree unlock API. Reported-by: Slava Barinov raysl...@gmail.com Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/extent-tree.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 521e9d4..a71d457 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6788,11 +6788,13 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
 					       &wc->flags[level]);
 			if (ret < 0) {
 				btrfs_tree_unlock_rw(eb, path->locks[level]);
+				path->locks[level] = 0;
 				return ret;
 			}
 			BUG_ON(wc->refs[level] == 0);
 			if (wc->refs[level] == 1) {
 				btrfs_tree_unlock_rw(eb, path->locks[level]);
+				path->locks[level] = 0;
 				return 1;
 			}
 		}
--
1.7.7.6

This patch seems to clear a lock WARNING I've been seeing recently. http://permalink.gmane.org/gmane.comp.file-systems.btrfs/21692 I'm unable to generate the WARNING after applying this patch. Thanks.
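Editor's sketch: the patch above clears the recorded lock state after an early unlock. The userspace model below shows why that matters, on the assumption that some later cleanup step unlocks every level the path still records as locked. All types and functions here (`fake_buffer`, `fake_path`, `early_unlock`, `release_path`) are hypothetical stand-ins, not the btrfs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEVEL 8

enum lock_state { UNLOCKED = 0, WRITE_LOCKED = 1, READ_LOCKED = 2 };

/* Hypothetical stand-in for an extent buffer: just holder counts. */
struct fake_buffer {
	int write_holders;
	int read_holders;
};

/* Hypothetical stand-in for a path: one node and lock state per level. */
struct fake_path {
	struct fake_buffer *nodes[MAX_LEVEL];
	int locks[MAX_LEVEL];
};

/* Drop one lock; returns false if that kind of lock is not held. */
static bool tree_unlock_rw(struct fake_buffer *eb, int state)
{
	int *holders = (state == WRITE_LOCKED) ? &eb->write_holders
					       : &eb->read_holders;

	if (*holders == 0)
		return false;	/* unbalanced unlock */
	(*holders)--;
	return true;
}

/* Unlock one level early; reset selects the patched behaviour. */
static bool early_unlock(struct fake_path *p, int level, bool reset)
{
	bool ok = tree_unlock_rw(p->nodes[level], p->locks[level]);

	if (reset)
		p->locks[level] = UNLOCKED;
	return ok;
}

/* Unlock every level still recorded as locked; false on any bad unlock. */
static bool release_path(struct fake_path *p)
{
	bool ok = true;
	int i;

	for (i = 0; i < MAX_LEVEL; i++) {
		if (p->nodes[i] == NULL || p->locks[i] == UNLOCKED)
			continue;
		if (!tree_unlock_rw(p->nodes[i], p->locks[i]))
			ok = false;
		p->locks[i] = UNLOCKED;
	}
	return ok;
}
```

With the reset, `release_path` skips the level that was already unlocked. Without it, the stale state triggers a second unlock of a lock that is no longer held.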
Kernel lockdep WARNING on btrfs-next
I've been testing Josef's btrfs-next master branch using a test that loops through creation, manipulation and destruction of snapshots of kernel git sources. The version of btrfs-next I'm using was built as of Friday, December 14th, and the top commit is: Btrfs: don't take inode delalloc mutex if we're a free space inode committer Josef Bacik jba...@fusionio.com Fri, 14 Dec 2012 21:57:39 + (16:57 -0500) commit bd2dd0060cf0ae2a81a7b22e9cc23063796fe09c I've hit a WARN_ON at kernel/lockdep.c:702, which is in the look_up_lock_class(...) function /* * We can walk the hash lockfree, because the hash only * grows, and we are careful when adding entries to the end: */ list_for_each_entry(class, hash_head, hash_entry) { if (class-key == key) { /* * Huh! same key, different name? Did someone trample * on some memory? We're most confused. */ Line 702 WARN_ON_ONCE(class-name != lock-name); return class; } } It looks like this occurred during the delayed deletion of one of the subvolumes. As far as I can tell, no corruption occurred, the file system passes btrfsck checks, and seems to be otherwise behaving normally. I was not on the system at the time this occurred, so I can't say if it noticeably delayed the system. [ 5260.068074] [ cut here ] [ 5260.068092] WARNING: at kernel/lockdep.c:702 __lock_acquire.isra.29+0xa44/0xab9() [ 5260.068096] Hardware name: OptiPlex 745 [ 5260.068099] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic btrfs libcrc32c [ 5260.068124] Pid: 3801, comm: btrfs-cleaner Not tainted 3.7.0-btrfs-next+ #2 [ 5260.068128] Call Trace: [ 5260.068139] [8103663a] warn_slowpath_common+0x74/0xa2 [ 5260.068172] [a0062805] ? btrfs_tree_read_unlock+0x7d/0xa9 [btrfs] [ 5260.068179] [81036682] warn_slowpath_null+0x1a/0x1c [ 5260.068185] [81084813] __lock_acquire.isra.29+0xa44/0xab9 [ 5260.068210] [a0062483] ? 
btrfs_tree_lock+0xf7/0x24c [btrfs] [ 5260.068217] [81084d8e] lock_acquire+0x81/0xff [ 5260.068241] [a0062483] ? btrfs_tree_lock+0xf7/0x24c [btrfs] [ 5260.068248] [8183d7b5] _raw_write_lock+0x31/0x40 [ 5260.068271] [a0062483] ? btrfs_tree_lock+0xf7/0x24c [btrfs] [ 5260.068295] [a0062483] btrfs_tree_lock+0xf7/0x24c [btrfs] [ 5260.068319] [a004eb1d] ? find_extent_buffer+0x8f/0xd6 [btrfs] [ 5260.068343] [a004eaa4] ? find_extent_buffer+0x16/0xd6 [btrfs] [ 5260.068360] [a001ef79] do_walk_down+0xd2/0x4b6 [btrfs] [ 5260.068378] [a001e1ae] ? btrfs_block_rsv_check+0x29/0x7d [btrfs] [ 5260.068394] [a001e1ae] ? btrfs_block_rsv_check+0x29/0x7d [btrfs] [ 5260.068411] [a001f420] walk_down_tree+0xc3/0xef [btrfs] [ 5260.068430] [a0021f4f] btrfs_drop_snapshot+0x372/0x5c7 [btrfs] [ 5260.068451] [a0033ccc] btrfs_clean_old_snapshots+0xa6/0x13a [btrfs] [ 5260.068471] [a002b1c0] ? cleaner_kthread+0x8d/0x102 [btrfs] [ 5260.068490] [a002b1d4] cleaner_kthread+0xa1/0x102 [btrfs] [ 5260.068509] [a002b133] ? btree_invalidatepage+0x73/0x73 [btrfs] [ 5260.068515] [81058333] kthread+0xea/0xef [ 5260.068522] [81058249] ? flush_kthread_work+0x19c/0x19c [ 5260.068528] [8184549c] ret_from_fork+0x7c/0xb0 [ 5260.068534] [81058249] ? flush_kthread_work+0x19c/0x19c [ 5260.068538] ---[ end trace 0caa5c9123c1e741 ]--- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segmentation-fault in btrfsck (git-version)
On Sat, Dec 15, 2012 at 1:40 PM, Hendrik Friedel hend...@friedels.name wrote:
> Hello Mitch, hello all,
>
>> Since btrfs has significant improvements and fixes in each kernel release, and since very few of these changes are backported, it is recommended to use the latest kernels available.
>
> Ok, it's 3.7 now.
>
>> The "root ### inode # errors 400" lines are an indication that there is an inconsistency in the inode size. There was a patch included in the 3.1 or 3.2 kernel to address this issue (http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=f70a9a6b94af86fca069a7552ab672c31b457786). But I don't believe this patch fixed existing occurrences of this error.
>
> Apparently not. It's still there.
>
>> At this point, the quickest solution for you may be to rebuild and reformat this RAID assembly, and restore this data from backups.
>
> Yepp, I did that. But in fact, some data is missing. It is not essential, but nice to have.
>
>> If you don't have a backup of this data, and since your array seems to be working pretty well in a degraded state, this would be a really good time to look at a strategy of getting a backup of this data before doing many more attempts at rescue.
>
> Done. It's all safe on another ext4 drive. So, let's play ;-)
>
> Could you please help me trying to restore the missing Data? What I tried so far was:
> ./btrfs-restore /dev/sdc1 /mnt/restore/
>
> It worked, in a way that it restored what I already had.
>
> What's odd as well is, that btrfs scrub did run through without errors. So, the missing data could have been (accidentally) deleted by me. But I don't think... nevertheless I cannot exclude. What I know is the (original) Path of the Data.

You could try btrfs-debug-tree, and search for any traces of your file. However, be ready to sift through a massive amount of output.
Re: Encryption
On Wed, Dec 12, 2012 at 11:12 AM, merc1...@f-m.fm wrote:
> So there is no way to have filesystem encryption, while keeping snapshots?

I run btrfs on top of LUKS encryption on my laptop. You should be able to do the same.

You could then run rsync through ssh. However, rsync will have no knowledge of any blocks shared under subvolume snapshots.

Btrfs does not yet have internal encryption.
Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag
On Thu, Nov 1, 2012 at 6:21 AM, Liu Bo bo.li@oracle.com wrote: On Thu, Nov 01, 2012 at 08:08:52PM +0900, Itaru Kitayama wrote: Hi Liubo, I couldn't apply your V4 patch against the btrfs-next HEAD. Do you have a github branch which I can checkout? The current btrfs-next HEAD actually have included this v4 patch, so just pull btrfs-next and give it a shot :) I'm still seeing similar issues using Josef's current btrfs-next branch (which still includes the v4 version of the snapshot-aware defrag patches). [44507.850693] [ cut here ] [44507.850728] WARNING: at fs/btrfs/inode.c:7755 btrfs_destroy_inode+0x231/0x2c4 [btrfs]() [44507.850732] Hardware name: OptiPlex 745 [44507.850735] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic btrfs libcrc32c [44507.850753] Pid: 15719, comm: umount Tainted: GW 3.7.0-btrfs-next+ #1 [44507.850756] Call Trace: [44507.850766] [810364da] warn_slowpath_common+0x74/0xa2 [44507.850770] [81036522] warn_slowpath_null+0x1a/0x1c [44507.850787] [a0041e0e] btrfs_destroy_inode+0x231/0x2c4 [btrfs] [44507.850793] [81141670] destroy_inode+0x3c/0x5f [44507.850797] [811417b5] evict+0x122/0x1ac [44507.850800] [81142016] iput+0xed/0x169 [44507.850816] [a0038c18] btrfs_run_delayed_iputs+0xd6/0xf6 [btrfs] [44507.850831] [a002db75] btrfs_commit_super+0x2c/0xfd [btrfs] [44507.850845] [a002f289] close_ctree+0x2c1/0x300 [btrfs] [44507.850850] [811424c9] ? 
evict_inodes+0x106/0x115 [44507.850861] [a00070b1] btrfs_put_super+0x19/0x1b [btrfs] [44507.850866] [8112b321] generic_shutdown_super+0x5b/0xdc [44507.850869] [8112b424] kill_anon_super+0x16/0x24 [44507.850880] [a000ad98] btrfs_kill_super+0x1a/0x8f [btrfs] [44507.850884] [8112b647] deactivate_locked_super+0x33/0x6c [44507.850887] [8112c25f] deactivate_super+0x4e/0x66 [44507.850892] [81145e64] mntput_no_expire+0xf7/0x14d [44507.850896] [81146ced] sys_umount+0x63/0x37a [44507.850901] [8183e642] system_call_fastpath+0x16/0x1b [44507.850905] ---[ end trace ba14fbf3de68a237 ]--- [44507.850907] [ cut here ] [44507.850924] WARNING: at fs/btrfs/inode.c:7756 btrfs_destroy_inode+0x2b9/0x2c4 [btrfs]() [44507.850927] Hardware name: OptiPlex 745 [44507.850930] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic btrfs libcrc32c [44507.850947] Pid: 15719, comm: umount Tainted: GW 3.7.0-btrfs-next+ #1 [44507.850949] Call Trace: [44507.850956] [810364da] warn_slowpath_common+0x74/0xa2 [44507.850961] [81036522] warn_slowpath_null+0x1a/0x1c [44507.850978] [a0041e96] btrfs_destroy_inode+0x2b9/0x2c4 [btrfs] [44507.850982] [81141670] destroy_inode+0x3c/0x5f [44507.850986] [811417b5] evict+0x122/0x1ac [44507.850990] [81142016] iput+0xed/0x169 [44507.851003] [a0038c18] btrfs_run_delayed_iputs+0xd6/0xf6 [btrfs] [44507.851033] [a002db75] btrfs_commit_super+0x2c/0xfd [btrfs] [44507.851048] [a002f289] close_ctree+0x2c1/0x300 [btrfs] [44507.851052] [811424c9] ? 
evict_inodes+0x106/0x115 [44507.851063] [a00070b1] btrfs_put_super+0x19/0x1b [btrfs] [44507.851066] [8112b321] generic_shutdown_super+0x5b/0xdc [44507.851070] [8112b424] kill_anon_super+0x16/0x24 [44507.851080] [a000ad98] btrfs_kill_super+0x1a/0x8f [btrfs] [44507.851084] [8112b647] deactivate_locked_super+0x33/0x6c [44507.851087] [8112c25f] deactivate_super+0x4e/0x66 [44507.851091] [81145e64] mntput_no_expire+0xf7/0x14d [44507.851095] [81146ced] sys_umount+0x63/0x37a [44507.851099] [8183e642] system_call_fastpath+0x16/0x1b [44507.851101] ---[ end trace ba14fbf3de68a238 ]--- [44507.851104] [ cut here ] [44507.851121] WARNING: at fs/btrfs/inode.c:7758 btrfs_destroy_inode+0x28d/0x2c4 [btrfs]() [44507.851123] Hardware name: OptiPlex 745 [44507.851124] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic btrfs libcrc32c [44507.851140] Pid: 15719, comm: umount Tainted: GW 3.7.0-btrfs-next+ #1 [44507.851142] Call Trace: [44507.851148] [810364da] warn_slowpath_common+0x74/0xa2 [44507.851152] [81036522] warn_slowpath_null+0x1a/0x1c [44507.851168] [a0041e6a] btrfs_destroy_inode+0x28d/0x2c4 [btrfs] [44507.851172] [81141670] destroy_inode+0x3c/0x5f [44507.851176] [811417b5] evict+0x122/0x1ac [44507.851180] [81142016] iput+0xed/0x169 [44507.851195] [a0038c18] btrfs_run_delayed_iputs+0xd6/0xf6 [btrfs] [44507.851209] [a002db75] btrfs_commit_super+0x2c/0xfd [btrfs] [44507.851223] [a002f289]
Re: segmentation-fault in btrfsck (git-version)
On Sun, Dec 9, 2012 at 1:06 PM, Hendrik Friedel hend...@friedels.name wrote:
> Dear Mich,
>
> thanks for your help and suggestion:
>> It might be interesting for you to try a newer kernel, and use scrub on this volume if you have the two disks RAIDed.
>
> I have now scrubbed the Disk:
> ./btrfs scrub status /mnt/other/
> scrub status for a15eede9-1a92-47d8-940a-adc7cf97352d
> scrub started at Sun Dec 9 13:48:57 2012 and finished after 3372 seconds
> total bytes scrubbed: 1.10TB with 0 errors
>
> That's odd, as in one folder, data is missing (I could have deleted it, but I'd be very surprised...)
>
> Also, when I run btrfsck, I get errors:
> On sdc1:
> root 261 inode 64370 errors 400
> root 261 inode 64373 errors 400
> root 261 inode 64375 errors 400
> root 261 inode 64376 errors 400
> found 1203899371520 bytes used err is 1
> total csum bytes: 1173983136
> total tree bytes: 1740640256
> total fs tree bytes: 280260608
> btree space waste bytes: 212383383
> file data blocks allocated: 28032005304320
> referenced 1190305632256
> Btrfs v0.20-rc1-37-g91d9eec
>
> On sdb1:
> root 261 inode 64373 errors 400
> root 261 inode 64375 errors 400
> root 261 inode 64376 errors 400
> found 1203899371520 bytes used err is 1
> total csum bytes: 1173983136
> total tree bytes: 1740640256
> total fs tree bytes: 280260608
> btree space waste bytes: 212383383
> file data blocks allocated: 28032005304320
> referenced 1190305632256
> Btrfs v0.20-rc1-37-g91d9eec
>
> And when I try to mount one of the two raided disks, I get:
> [ 1173.773861] device fsid a15eede9-1a92-47d8-940a-adc7cf97352d devid 1 transid 140194 /dev/sdb1
> [ 1173.774695] btrfs: failed to read the system array on sdb1
> [ 1173.774854] btrfs: open_ctree failed
>
> while the other works:
> [ 1177.927096] device fsid a15eede9-1a92-47d8-940a-adc7cf97352d devid 2 transid 140194 /dev/sdc1
>
> Do you have hints for me?
> The Kernel now is 3.3.7-030307-generic (anything more recent, I would have to compile myself, which I will do, if you suggest to)

Since btrfs has significant improvements and fixes in each kernel release, and since very few of these changes are backported, it is recommended to use the latest kernels available.

The "root ### inode # errors 400" lines are an indication that there is an inconsistency in the inode size. There was a patch included in the 3.1 or 3.2 kernel to address this issue (http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=f70a9a6b94af86fca069a7552ab672c31b457786). But I don't believe this patch fixed existing occurrences of this error.

At this point, the quickest solution for you may be to rebuild and reformat this RAID assembly, and restore this data from backups.

If you don't have a backup of this data, and since your array seems to be working pretty well in a degraded state, this would be a really good time to look at a strategy of getting a backup of this data before doing many more attempts at rescue.
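Since the two btrfsck reports quoted above are nearly identical, it may help to diff the per-inode error lines mechanically rather than by eye. What follows is only a minimal sketch in Python: the helper name inode_errors and the variables SDC1 and SDB1 are made up for illustration, and it assumes the pasted reports are complete (they may be only the tail of longer output).

```python
import re

# Matches btrfsck lines such as "root 261 inode 64370 errors 400".
INODE_ERR_RE = re.compile(r"root (\d+) inode (\d+) errors ([0-9a-f]+)")


def inode_errors(report):
    """Return the set of (root, inode, error code string) found in a report."""
    return {(int(root), int(inode), code)
            for root, inode, code in INODE_ERR_RE.findall(report)}


# Error lines copied from the message above (hypothetical variable names).
SDC1 = """\
root 261 inode 64370 errors 400
root 261 inode 64373 errors 400
root 261 inode 64375 errors 400
root 261 inode 64376 errors 400
"""

SDB1 = """\
root 261 inode 64373 errors 400
root 261 inode 64375 errors 400
root 261 inode 64376 errors 400
"""

# Entries reported for one device but not the other.
only_sdc1 = sorted(inode_errors(SDC1) - inode_errors(SDB1))
only_sdb1 = sorted(inode_errors(SDB1) - inode_errors(SDC1))

print("only on sdc1:", only_sdc1)
print("only on sdb1:", only_sdb1)
```

With the lines as pasted, the only difference is inode 64370, which shows up for sdc1 alone. Whether that reflects a real difference between the mirrors or just a truncated paste can't be told from the output alone.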
Re: segmentation-fault in btrfsck (git-version)
On Wed, Dec 5, 2012 at 2:50 PM, Hendrik Friedel hend...@friedels.name wrote:
> Dear all,
>
> thanks for developing btrfsck!
> Now, I'd like to contribute -as far as I can. I'm not a developer, but I do have some linux-experience.
> I've been using btrfsck on two 3TB HDDs (mirrored) for a while now under Kernel 3.0. Now it's corrupt. I had some hard resets of the machine -which might have contributed. I do have a backup of the data -at least of the important stuff. Some TV-Recordings are missing. The reason I am writing is, to support the development.
>
> Unfortunately, btrfsck (latest git-version) crashes with a segmentation fault, when trying to repair this.
>
> Here's the backtrace:
> root 261 inode 64375 errors 400
> root 261 inode 64376 errors 400
> btrfsck: disk-io.c:382: __commit_transaction: Assertion `!(!eb || eb->start != start)' failed.
>
> Program received signal SIGABRT, Aborted.
> 0x7784c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb)
> (gdb) backtrace
> #0 0x7784c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x7784fb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x778450ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3 0x77845192 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4 0x0040d3ae in __commit_transaction (trans=0x62e010, root=0xb66ae0) at disk-io.c:382
> #5 0x0040d4d8 in btrfs_commit_transaction (trans=0x62e010, root=0xb66ae0) at disk-io.c:415
> #6 0x0040743d in main (ac=<optimized out>, av=<optimized out>) at btrfsck.c:3587
>
> Now, here's where my debugging knowledge ends. Are you interested in debugging this further, or is it a known bug?

Line 382 in disk-io.c is:

	BUG_ON(!eb || eb->start != start);

So, basically, btrfsck is intentionally crashing because it doesn't know how to handle this condition.

Future refinements of btrfsck will probably include proper error messages for issues that can't be handled, or perhaps even fix the error.

It might be interesting for you to try a newer kernel, and use scrub on this volume if you have the two disks RAIDed.
Btrfs Slow Down (Metadata Starvation?)
One of my Btrfs partitions ran into a severe slowdown recently. Operations that would normally complete in 20-30 seconds were now requiring hours.

There were no errors or warnings in dmesg (Alt-SysRq-W is below, but shows nothing out of the ordinary). And if I took the partition offline, it would pass btrfsck without error. So far, I've found no indications of corruption.

The kernel is a version 3.6.6 merged with the for-linus branch for 3.7. I usually mount with compress-force=lzo, but no autodefrag or other options.

The symptoms were consistent with some kind of corner case metadata starvation. While under pressure, my 'btrfs fi df' would show something like the following:

# btrfs fi df /mnt/sabayon9/
Data: total=7.00GB, used=6.00GB
System: total=4.00MB, used=4.00KB
Metadata: total=768.00MB, used=737.65MB

For some reason, btrfs was not allocating any additional metadata space. The partition is 25 GB, and not very full:

/dev/sda2 btrfs 25165824 7047816 18082844 29% /mnt/sabayon9

When I rebooted into a 3.4 kernel (which is merged with the Btrfs code for 3.5), the slowdown cleared after I mounted the partition, and triggered an allocation of metadata up to 1 GB.

I would note that I tried my 3.5 vintage kernel (which is merged with the Btrfs code for 3.6), and was unable to clear the issue. This tends to strengthen my suspicion that this is some kind of corner case, since this code has been out there for a while now.

Currently, I'm showing something like this with 'btrfs fi df':

# btrfs fi df /mnt/sabayon9/
Data: total=8.00GB, used=5.97GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.00GB, used=722.65MB

Now, everything is operating normally in my 3.6.6 kernel.

I've saved an image of the partition in its 'slow-down' condition in case it becomes desirable to test something in that condition.

I'm including my dmesg output of an Alt-SysRq-W operation, but I don't see anything useful there.
[18697.498504] SysRq : Show Blocked State [18697.498510] taskPC stack pid father [18697.498551] btrfs-submit-1 D 0210 0 4236 2 0x [18697.498556] 880123d53b70 0046 8801231908d0 880123d53fd8 [18697.498560] 4000 00012700 88012aaf4380 880124269680 [18697.498563] 0006 880125e18000 880125e18000 [18697.498567] Call Trace: [18697.498576] [812d8750] ? __blk_run_queue+0x1e/0x20 [18697.498580] [812db573] ? queue_unplugged+0x83/0x99 [18697.498585] [8161dcf4] schedule+0x64/0x66 [18697.498588] [8161dd85] io_schedule+0x8f/0xce [18697.498591] [812dd520] get_request+0x559/0x5b0 [18697.498596] [8104c79b] ? abort_exclusive_wait+0x8e/0x8e [18697.498599] [812de7de] blk_queue_bio+0x1b7/0x315 [18697.498602] [812dcbc3] generic_make_request+0x9f/0xe1 [18697.498605] [812dcce9] submit_bio+0xe4/0x103 [18697.498640] [a006a145] run_scheduled_bios+0x28c/0x428 [btrfs] [18697.498660] [a006a2f6] pending_bios_fn+0x15/0x17 [btrfs] [18697.498679] [a0071840] worker_loop+0x15f/0x497 [btrfs] [18697.498698] [a00716e1] ? btrfs_queue_worker+0x272/0x272 [btrfs] [18697.498702] [8104c072] kthread+0x8b/0x93 [18697.498707] [816205b4] kernel_thread_helper+0x4/0x10 [18697.498710] [8104bfe7] ? kthread_freezable_should_stop+0x57/0x57 [18697.498714] [816205b0] ? gs_change+0xb/0xb [18697.498718] btrfs-transacti D 0002 0 4248 2 0x [18697.498721] 880123d83b70 0046 880123d83fd8 [18697.498725] 4000 00012700 88012aaf4380 880123d79680 [18697.498728] a0049342 880123d83bf0 880123d83b10 810c9ee4 [18697.498732] Call Trace: [18697.498748] [a0049342] ? check_leaf+0x2d4/0x2d4 [btrfs] [18697.498753] [810c9ee4] ? release_pages+0x1b2/0x1c1 [18697.498772] [a0063068] ? submit_one_bio+0x8a/0x94 [btrfs] [18697.498776] [8106c8db] ? ktime_get_ts+0x56/0xbc [18697.498780] [8109c171] ? delayacct_end+0x79/0x84 [18697.498784] [810bf744] ? 
__lock_page+0x68/0x68 [18697.498787] [8161dcf4] schedule+0x64/0x66 [18697.498790] [8161dd85] io_schedule+0x8f/0xce [18697.498793] [810bf752] sleep_on_page+0xe/0x12 [18697.498796] [8161c436] __wait_on_bit+0x48/0x7b [18697.498799] [810bf4b9] ? find_get_pages_tag+0xf4/0x130 [18697.498803] [810bf97d] wait_on_page_bit+0x72/0x74 [18697.498806] [8104c7d3] ? autoremove_wake_function+0x38/0x38 [18697.498810] [810bfa4c] filemap_fdatawait_range+0x87/0x13e [18697.498829] [a0063c09] ? free_extent_state+0x7d/0x85 [btrfs] [18697.498849] [a0064631] ? clear_extent_bit+0x272/0x2aa [btrfs] [18697.498866] [a004f696] btrfs_wait_marked_extents+0x7d/0xce [btrfs] [18697.498884]
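The 'btrfs fi df' figures quoted in the slowdown report above can be turned into fill ratios, which makes the pressure on the metadata chunks easier to see. This is just a rough sketch in Python, not anything from btrfs-progs: the function names and the 0.9 threshold are invented for illustration, and it assumes the KB/MB/GB suffixes printed by this vintage of the tool are powers of 1024.

```python
import re

# Assumed binary scaling for the unit suffixes printed by `btrfs fi df`.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Matches lines such as "Metadata: total=768.00MB, used=737.65MB".
LINE_RE = re.compile(
    r"^(?P<kind>[^:]+): total=(?P<total>[\d.]+)(?P<tunit>[KMGT]B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]B)?$"
)


def parse_fi_df(text):
    """Return {chunk type: (total_bytes, used_bytes)} for every line that parses."""
    chunks = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        total = float(match["total"]) * UNITS[match["tunit"] or ""]
        used = float(match["used"]) * UNITS[match["uunit"] or ""]
        chunks[match["kind"]] = (total, used)
    return chunks


def fill_ratio(chunks, kind):
    """Fraction of the allocated space of one chunk type that is in use."""
    total, used = chunks[kind]
    return used / total


def nearly_full(chunks, threshold=0.9):
    """Chunk types whose fill ratio is at or above the (hypothetical) threshold."""
    return [kind for kind in chunks if fill_ratio(chunks, kind) >= threshold]


# Figures copied from the report: during the slowdown, and after recovery.
SLOW = """\
Data: total=7.00GB, used=6.00GB
System: total=4.00MB, used=4.00KB
Metadata: total=768.00MB, used=737.65MB
"""

AFTER = """\
Data: total=8.00GB, used=5.97GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.00GB, used=722.65MB
"""

for label, sample in (("slow", SLOW), ("after", AFTER)):
    chunks = parse_fi_df(sample)
    print(label, round(fill_ratio(chunks, "Metadata"), 3), nearly_full(chunks))
```

Under those assumptions the metadata chunks were roughly 96% full during the slowdown and roughly 71% full once the allocation grew to 1 GB, which fits the starvation theory but does not prove it.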
Re: Why btrfs inline small file by default?
On Tue, Oct 30, 2012 at 6:04 AM, ching lschin...@gmail.com wrote:
> Hi all,
>
> I am testing my btrfs root partition with max_inline=0, and 64k leaf size for weeks and it seems that it is fine.
>
> AFAIK btrfs inline small files into metadata by default, I am curious why?
>
> If there is only a few small files, then there will be neither effect nor benefit at all
> If there is a lot of small files, then the size of metadata will be undesirable due to deduplication
>
> there are also some email threads related to problem of metadata inline (i don't know whether they are fixed in recent kernel):
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16295.html
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg05265.html
>
> How about turning off inline so that btrfs works better out of the box?
>
> ching

I did some rough benchmarking around this a few weeks ago. I'll try to clean up my method and post the results.

I was working with multiple copies and rsyncs of kernel sources, which have many candidate files for inlining.

To my surprise, my btrfs benchmarks were always the same or faster when I let btrfs inline the files, even though metadata was much larger.
Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag
On Mon, Oct 29, 2012 at 8:20 PM, Liu Bo bo.li@oracle.com wrote:
> On 10/30/2012 04:06 AM, Mitch Harder wrote:
>> On Sat, Oct 27, 2012 at 5:28 AM, Liu Bo bo.li@oracle.com wrote:
>>> This comes from one of btrfs's project ideas,
>>> As we defragment files, we break any sharing from other snapshots.
>>> The balancing code will preserve the sharing, and defrag needs to grow this as well.
>>>
>>> Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff.
>>>
>>> Here is the basic idea,
>>> o set the writeback ranges started by defragment with flag EXTENT_DEFRAG
>>> o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs.
>>>
>>> Originally patch by Li Zefan l...@cn.fujitsu.com
>>> Signed-off-by: Liu Bo bo.li@oracle.com
>>> ---
>>> v3-v4:
>>> - fix duplicated refs bugs detected by mounting with autodefrag, thanks for the bug report from Mitch and Chris.
>>
>> I'm picking up many WARN_ON messages while testing this patch.
>>
>> I'm testing a snapshot script that uses kernel git sources along with some git manipulations.
>>
>> The kernel is a 3.6.4 kernel merged with the latest for-linus branch.
>>
>> I mounted with -o compress-force=lzo,autodefrag.
>>
>> I also have the second patch in this set (Btrfs: make snapshot-aware defrag as a mount option). However, I did not mount with 'snap_aware_defrag'.
>>
>> I did not find any corrupted data, and the partition passes a btrfsck without error after these warnings were observed.
>
> Hi Mitch,
>
> Well, good report, but I don't think it has anything to do with this patch(since you didn't mount with 'snap_aware_defrag' :)

I've re-run my testing script with a combination of no compression and lzo compression, combined with no further options, only -o autodefrag, and -o autodefrag,snap_aware_defrag.

I only get the WARN_ONs when I run with autodefrag only (no snap_aware_defrag).

My logs are clean when I avoid all defrag options, or use both autodefrag and snap_aware_defrag.

> After going through the below messages, the bug comes from the space side where we must have mis-used our reservation somehow.
>
> So can you show me your script so that I can give it a shot to reproduce locally?
Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag
On Sat, Oct 27, 2012 at 5:28 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff. Here is the basic idea, o set the writeback ranges started by defragment with flag EXTENT_DEFRAG o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs. Originally patch by Li Zefan l...@cn.fujitsu.com Signed-off-by: Liu Bo bo.li@oracle.com --- v3-v4: - fix duplicated refs bugs detected by mounting with autodefrag, thanks for the bug report from Mitch and Chris. I'm picking up many WARN_ON messages while testing this patch. I'm testing a snapshot script that uses kernel git sources along with some git manipulations. The kernel is a 3.6.4 kernel merged with the latest for-linus branch. I mounted with -o compress-force=lzo,autodefrag. I also have the second patch in this set (Btrfs: make snapshot-aware defrag as a mount option). However, I did not mount with 'snap_aware_defrag'. I did not find any corrupted data, and the partition passes a btrfsck without error after these warnings were observed. 
Here's a summary of the WARN_ON messages: $ cat local/dmesg-3.6.4-x+ | grep WARNING: [ 610.407561] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 610.407757] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 610.407929] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 661.211849] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 661.212004] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 661.212236] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 719.882942] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 719.883112] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 719.883232] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 786.978869] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 786.979003] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 786.979140] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 845.605176] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 845.605323] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 845.605445] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 912.300307] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 912.300454] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 912.300577] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 968.835873] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 968.836032] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 968.836156] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 1023.778160] WARNING: 
at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 1023.778316] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 1023.778435] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 1064.342768] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 1064.342914] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 1064.343112] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 1177.892047] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 1177.892189] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 1177.892312] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 1281.951715] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 1281.951857] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 1281.951978] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 1282.804376] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 1282.804524] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]() [ 1282.804645] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]() [ 1351.187114] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]() [ 1351.187263] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6
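The grep summary above repeats the same three warning sites over and over, so a tally per source location is easier to read than the raw list. A minimal sketch in Python follows; the helper name count_warn_sites and the SAMPLE string are illustrative only, and the sample holds just the first six lines of the summary, not the whole log.

```python
import re
from collections import Counter

# Matches e.g. "WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6".
WARN_RE = re.compile(
    r"WARNING: at (?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+0x"
)


def count_warn_sites(dmesg_text):
    """Count warnings per (source file, line number, function) triple."""
    counts = Counter()
    for match in WARN_RE.finditer(dmesg_text):
        counts[(match["file"], int(match["line"]), match["func"])] += 1
    return counts


# First six lines of the summary quoted above.
SAMPLE = """\
[  610.407561] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  610.407757] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  610.407929] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  661.211849] WARNING: at fs/btrfs/inode.c:7779 btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  661.212004] WARNING: at fs/btrfs/inode.c:7780 btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  661.212236] WARNING: at fs/btrfs/inode.c:7782 btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
"""

for (path, line, func), hits in sorted(count_warn_sites(SAMPLE).items()):
    print(f"{path}:{line} {func}: {hits}")
```

In the sample, each of the three sites in btrfs_destroy_inode (lines 7779, 7780 and 7782 of fs/btrfs/inode.c) fires once per round, which matches the triplet pattern visible in the full summary.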
Re: block rsv returned -28 during balance
On Mon, Oct 1, 2012 at 1:28 AM, Roman Mamedov r...@romanrm.ru wrote:
> Hello,
>
> On a 3.6.0-rc7 kernel, I launched:
>
> # btrfs fi balance start -f -mconvert=single /mnt/tmp/
>
> Current situation:
>
> # df -h /mnt/tmp/
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/alpha-lv1 3.6T 2.7T 801G 78% /mnt/tmp
>
> # btrfs fi df /mnt/tmp/
> Data: total=3.00TB, used=2.66TB
> System: total=4.00MB, used=364.00KB
> Metadata, DUP: total=11.00GB, used=5.72GB
> Metadata: total=63.00GB, used=0.00
>
> There seems to be plenty of free space, but the balance seems to have stalled and the dmesg is being filled with messages like this:
>
> [ 2926.465406] btrfs: block rsv returned -28
> [ 2926.465411] [ cut here ]
> [ 2926.465446] WARNING: at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6323 use_block_rsv+0x19f/0x1b0 [btrfs]()
> [ 2926.465450] Hardware name: VirtualBox
> [ 2926.465452] Modules linked in: joydev microcode parport_pc hid_generic parport psmouse serio_raw pcspkr i2c_piix4 mac_hid xfs btrfs libcrc32c zlib_deflate raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq usbhid hid e1000
> [ 2926.465517] Pid: 4682, comm: btrfs Tainted: GW 3.6.0-030600rc7-generic #201209232235

I've just run into the same issue running a balance.

My kernel is a 3.6.1 kernel merged with the latest for-linus branch.

The dmesg log is full of warnings, and the balance appears stuck.

Looking at the results of 'btrfs fi df', it almost seems like btrfs is unable to allocate any more metadata space, even though there is some space available.
# btrfs fi df /mnt/sabayon8/ Data: total=7.14GB, used=6.08GB System: total=4.00MB, used=4.00KB Metadata: total=1.00GB, used=973.32MB # df -T /mnt/sabayon8/ Filesystem Type 1K-blocksUsed Available Use% Mounted on /dev/sdb5 btrfs 10008460 7373064 2537536 75% /mnt/sabayon8 For reference here's an example of the dmesg warning: [ 4070.726429] btrfs: block rsv returned -28 [ 4070.726431] [ cut here ] [ 4070.726455] WARNING: at fs/btrfs/extent-tree.c:6359 btrfs_alloc_free_block+0x4ee/0x500 [btrfs]() [ 4070.726531] Hardware name: [ 4070.726533] Modules linked in: nvidia(PO) nvidia_agp xts gf128mul ablk_helper cryptd sha256_generic btrfs libcrc32c xhci_hcd [ 4070.726543] Pid: 8717, comm: btrfs Tainted: PW O 3.6.1-git-local+ #1 [ 4070.726545] Call Trace: [ 4070.726552] [c1029952] warn_slowpath_common+0x72/0xa0 [ 4070.726569] [f85d38de] ? btrfs_alloc_free_block+0x4ee/0x500 [btrfs] [ 4070.726585] [f85d38de] ? btrfs_alloc_free_block+0x4ee/0x500 [btrfs] [ 4070.726590] [c10299a2] warn_slowpath_null+0x22/0x30 [ 4070.726606] [f85d38de] btrfs_alloc_free_block+0x4ee/0x500 [btrfs] [ 4070.726628] [f8603c0c] ? read_extent_buffer+0x9c/0x100 [btrfs] [ 4070.726643] [f85c0394] __btrfs_cow_block+0x144/0x590 [btrfs] [ 4070.726731] [f85ddb65] ? verify_parent_transid+0x55/0x1c0 [btrfs] [ 4070.726746] [f85c08b9] ? btrfs_cow_block+0xd9/0x230 [btrfs] [ 4070.726765] [f85fc799] ? mark_extent_buffer_accessed+0x59/0x70 [btrfs] [ 4070.726780] [f85c08b9] btrfs_cow_block+0xd9/0x230 [btrfs] [ 4070.726801] [f862815a] do_relocation+0x42a/0x4d0 [btrfs] [ 4070.726818] [f85d00fb] ? btrfs_block_rsv_add+0x6b/0x80 [btrfs] [ 4070.726838] [f862bb9a] relocate_tree_blocks+0x3fa/0x5a0 [btrfs] [ 4070.726929] [f862ca42] relocate_block_group+0x212/0x670 [btrfs] [ 4070.726950] [f862d030] btrfs_relocate_block_group+0x190/0x2e0 [btrfs] [ 4070.726969] [f8605a57] btrfs_relocate_chunk.isra.54+0x57/0x690 [btrfs] [ 4070.726989] [f85fa351] ? btrfs_get_token_64+0x61/0x100 [btrfs] [ 4070.727008] [f8602bf6] ? 
free_extent_buffer+0x26/0x70 [btrfs] [ 4070.727045] [f860ad41] btrfs_balance+0x9b1/0xf40 [btrfs] [ 4070.727051] [c12f14ae] ? cred_has_capability+0x7e/0xf0 [ 4070.727071] [f861254b] btrfs_ioctl_balance+0xcb/0x330 [btrfs] [ 4070.727163] [f8614777] btrfs_ioctl+0x907/0x1840 [btrfs] [ 4070.727168] [c10bcd22] ? lru_cache_add_lru+0x22/0x40 [ 4070.727172] [c10cf00f] ? handle_pte_fault+0x42f/0x5c0 [ 4070.727192] [f8613e70] ? update_ioctl_balance_args+0x240/0x240 [btrfs] [ 4070.727197] [c10f5cf2] do_vfs_ioctl+0x82/0x570 [ 4070.727202] [c12f1dba] ? inode_has_perm.isra.42.constprop.68+0x3a/0x50 [ 4070.727206] [c12f43c6] ? selinux_file_ioctl+0x46/0xe0 [ 4070.727209] [c10f624f] sys_ioctl+0x6f/0x80 [ 4070.727214] [c176e693] sysenter_do_call+0x12/0x22 [ 4070.727217] ---[ end trace b39bb21a5ae11cb1 ]--- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
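For anyone reading the "block rsv returned -28" lines above: the kernel reports failures as negated errno values, and errno 28 is ENOSPC. A small sketch in Python of decoding such a line is below; the helper decode_block_rsv is made up for illustration, and the text returned by os.strerror() depends on the platform's C library.

```python
import errno
import os
import re

# Matches the btrfs message seen in the logs above.
RSV_RE = re.compile(r"block rsv returned (-?\d+)")


def decode_block_rsv(line):
    """Return (errno name, description) for a 'block rsv returned' line, else None."""
    match = RSV_RE.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # kernel return values are negated errnos
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, text = decode_block_rsv("[ 4070.726429] btrfs: block rsv returned -28")
print(name, "-", text)
```

So the warning is the block reservation code reporting that it ran out of space, which lines up with metadata showing 973.32MB used out of 1.00GB allocated.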
Two Issues with Btrfs Delayed Cleaner Process (linux-next)
I've run across two issues with the delayed cleaner process while running a kernel based on the 3.6.0 btrfs-next branch in Josef's git repository.

(1) I'm getting an error when trying to list my subvolumes whenever the cleaner thread is running:

# btrfs su li /mnt/benchmark/
ERROR: Failed to lookup path for root 0 - No such file or directory

As long as the cleaner thread is idle, I can run this command without error.

(2) I ran into an issue on a slower x86 machine (AMD Athlon XP 2600+) where the cleaner thread literally required an hour to finish deleting a subvolume that contained the sources for a kernel I had previously built. The machine was responsive the whole time, and the cleaner thread never required much more than 5-10% of the CPU, leaving ample idle time. Interestingly, every attempt to replicate this behaviour resulted in the cleaner thread finishing in a few seconds.

My first issue replicates every time the cleaner thread is running. I'll need to work on the second issue for a while to see if I can get it to replicate.
Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag
On Thu, Oct 4, 2012 at 9:22 AM, Liu Bo bo.li@oracle.com wrote:
> On 10/03/2012 10:02 PM, Chris Mason wrote:
>> On Tue, Sep 25, 2012 at 07:07:53PM -0600, Liu Bo wrote:
>>> On 09/26/2012 01:39 AM, Mitch Harder wrote:
>>>> On Mon, Sep 17, 2012 at 4:58 AM, Liu Bo bo.li@oracle.com wrote:
>>>>> This comes from one of btrfs's project ideas,
>>>>> As we defragment files, we break any sharing from other snapshots.
>>>>> The balancing code will preserve the sharing, and defrag needs to grow this as well.
>>>>>
>>>>> Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff.
>>>>>
>>>>> Here is the basic idea,
>>>>> o set the writeback ranges started by defragment with flag EXTENT_DEFRAG
>>>>> o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs.
>>>>>
>>>>> Originally patch by Li Zefan l...@cn.fujitsu.com
>>>>> Signed-off-by: Liu Bo bo.li@oracle.com
>>>>
>>>> I'm hitting the WARN_ON in record_extent_backrefs() indicating a problem with the return value from iterate_inodes_from_logical().
>>
>> Me too. It triggers reliably with mount -o autodefrag, and then crashes in the next function ;)
>>
>> -chris
>
> Good news, I'm starting to hit the crash (a NULL pointer crash) ;)
>
> thanks,
> liubo

I'm also starting to hit this crash while balancing a test partition.

I guess this isn't surprising since both autodefrag and balancing make use of relocation.
Re: tree root
On Wed, Oct 3, 2012 at 11:35 AM, Øystein Sættem Middelthun oyst...@middelthun.no wrote: Hi! I have a broken btrfs unable to mount because it is unable to find the tree root. Using find-root I find the following: Well block 14102764707840 seems great, but generation doesn't match, have=109268, want=109269 Because the filesystem was last in use with a pre 3.2-kernel I am unable to use mount -o recovery, but restore seems to work when I specify the previous tree-root. My problem is however that the btrfs is so large I have nowhere to temporarily put all the files. I am currently running kernel 3.5. Does mount have an option to manually tell it to use the tree root at block 14102764707840? If you do not have a suitable backup for these files, please make an effort to do what you can with restore. Some of the repair methods out there have a possibility to make the situation worse. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tree root
On Wed, Oct 3, 2012 at 5:11 PM, Øystein Sættem Middelthun oyst...@middelthun.no wrote:

On 10/03/2012 07:29 PM, Mitch Harder wrote: If you do not have a suitable backup for these files, please make an effort to do what you can with restore. Some of the repair methods out there have a possibility to make the situation worse.

We are talking about something like 50TB, so there is just no way I have the available space on other disks for temporary storage. So in effect you are saying that there are no other available options than a restore? If I understand correctly a feature along the lines of mount -o tree_root=14102764707840 /dev/ /path/ would solve my problem. The fs is unmountable because of a temporary loss of connection with an underlying disk controller, and I don't think the device has a lot of errors besides not being able to find the latest tree root.

You should probably try to supply some more information about your situation. Was this btrfs volume built with RAID-1? If so, we should be able to mount in degraded mode.

Even so, when I see the words "unable to find the tree root" and "temporary loss of connection with an underlying disk controller", along with the implication that you have no reliable backup of this data, I worry that your situation is potentially precarious. The possibility exists that recovering your data is your best option (as opposed to restoring to previous working condition). Using backup tree-roots and super-blocks has the potential to do irreversible damage.
[PATCH] Btrfs: Remove orphaned comment.
Remove a comment that was orphaned by a previous commit which removed the function associated with the comment. See commit efd049fb26a162c3830fd3cb1001fdc09b147f3b

This left the comment in a confusing context that seemed to be associated with another function.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/inode.c |    6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2c785c0..93e1351 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2441,12 +2441,6 @@ out_kfree:
 	return NULL;
 }
 
-/*
- * helper function for btrfs_finish_ordered_io, this
- * just reads in some of the csum leaves to prime them into ram
- * before we start the transaction. It limits the amount of btree
- * reads required while inside the transaction.
- */
 /* as ordered data IO finishes, this gets called so we can finish
  * an ordered extent if the range of bytes in the file it covers are
  * fully written.
--
1.7.8.6
Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag
On Mon, Sep 17, 2012 at 4:58 AM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff. Here is the basic idea,
o set the writeback ranges started by defragment with flag EXTENT_DEFRAG
o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs.
Originally patch by Li Zefan l...@cn.fujitsu.com Signed-off-by: Liu Bo bo.li@oracle.com

I'm hitting the WARN_ON in record_extent_backrefs() indicating a problem with the return value from iterate_inodes_from_logical().

[ 6865.184782] [ cut here ]
[ 6865.184819] WARNING: at fs/btrfs/inode.c:2062 record_extent_backrefs+0xe5/0xe7 [btrfs]()
[ 6865.184823] Hardware name: OptiPlex 745
[ 6865.184825] Modules linked in: lpc_ich mfd_core xts gf128mul cryptd aes_x86_64 sha256_generic btrfs libcrc32c
[ 6865.184841] Pid: 4239, comm: btrfs-endio-wri Not tainted 3.5.4-git-local+ #1
[ 6865.184844] Call Trace:
[ 6865.184856] [81031d6a] warn_slowpath_common+0x74/0xa2
[ 6865.184862] [81031db2] warn_slowpath_null+0x1a/0x1c
[ 6865.184884] [a003356b] record_extent_backrefs+0xe5/0xe7 [btrfs]
[ 6865.184908] [a003cf3a] btrfs_finish_ordered_io+0x131/0xa4b [btrfs]
[ 6865.184930] [a003d869] finish_ordered_fn+0x15/0x17 [btrfs]
[ 6865.184951] [a005882f] worker_loop+0x145/0x516 [btrfs]
[ 6865.184959] [81059727] ? __wake_up_common+0x54/0x84
[ 6865.184983] [a00586ea] ? btrfs_queue_worker+0x2d3/0x2d3 [btrfs]
[ 6865.184989] [810516bb] kthread+0x93/0x98
[ 6865.184996] [817d7934] kernel_thread_helper+0x4/0x10
[ 6865.185001] [81051628] ? kthread_freezable_should_stop+0x6a/0x6a
[ 6865.185021] [817d7930] ? gs_change+0xb/0xb
[ 6865.185025] ---[ end trace 26cc0e186efc79d8 ]---

I'm testing a 3.5.4 kernel merged with the 3.6_rc patchset as well as the send_recv patches and most of the btrfs-next patches. I'm running into this issue when mounting with autodefrag, and running some snapshot tests. This may be related to a problem elsewhere, because I've been encountering other backref issues even before testing this patch.
Re: ENOSPC design issues
On Thu, Sep 20, 2012 at 2:03 PM, Josef Bacik jba...@fusionio.com wrote: Hello, I'm going to look at fixing some of the performance issues that crop up because of our reservation system. Before I go and do a whole lot of work I want some feedback.

When I was trying to figure out the problem with gzip ENOSPC issues, I spent some time debugging and following the flow through the reserve_metadata_bytes() function in extent-tree.c. My observation was that the accounting around space_info->bytes_may_use did not appear to be tightly closed. The space_info->bytes_may_use value would grow large (often 3 or 4 times greater than space_info->total), and the flow through reserve_metadata_bytes() would stay in overcommit.

I was unsuccessful in figuring out how to rework or close the loop on the accounting for space_info->bytes_may_use. I noticed that btrfs seemed to work OK even though the value in space_info->bytes_may_use appeared inexplicably large, and btrfs was always in overcommit.

So, since you're asking for possibly 'crazy ideas', I suggest considering finding a way to ignore space_info->bytes_may_use in reserve_metadata_bytes(). Either make the overcommit the default (which I found to approximate my real-life case anyhow), or have a simple mechanism for quick fail-over to overcommit. I doubt this will be any kind of comprehensive fix for ENOSPC issues, but simplifying reserve_metadata_bytes() may make it easier to find the other issues.
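The accounting observation in this message can be illustrated with a minimal userspace sketch. This is not the kernel's reserve_metadata_bytes(); the struct and function names (space_info_sketch, would_overcommit) are hypothetical, and the check itself is a simplifying assumption: a request counts as overcommit when used bytes plus may-use bytes plus the request exceed the total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counters discussed above. */
struct space_info_sketch {
	uint64_t total;         /* bytes in allocated chunks */
	uint64_t bytes_used;    /* bytes actually written */
	uint64_t bytes_may_use; /* bytes reserved but not yet written */
};

/*
 * Return 1 if reserving `bytes` more would push the accounted space
 * past `total`, i.e. the request could only succeed by overcommitting.
 * Assumed rule for illustration only, not the kernel's logic.
 */
int would_overcommit(const struct space_info_sketch *si, uint64_t bytes)
{
	assert(si != NULL);
	return si->bytes_used + si->bytes_may_use + bytes > si->total;
}
```

Under this assumed rule, once bytes_may_use has drifted to three or four times total, every request is classified as overcommit regardless of its size, which is consistent with the flow never leaving the overcommit path.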
Re: [PATCH V3 2/2] Btrfs: fix the snapshot that should not exist
On Thu, Aug 2, 2012 at 6:46 AM, David Sterba d...@jikos.cz wrote: ... Fsck spits lots of errors:

ref mismatch on [1133031424 4096] extent item 1, found 0
Backref 1133031424 root 5 not referenced back 0x7d1f40
Incorrect global backref count on 1133031424 found 1 wanted 0
backpointer mismatch on [1133031424 4096]
owner ref check failed [1133031424 4096]
ref mismatch on [11213131776 16384] extent item 1, found 0
Incorrect local backref count on 11213131776 root 5 owner 34509 offset 0 found 0 wanted 1 back 0x1424d8e0
backpointer mismatch on [11213131776 16384]
owner ref check failed [11213131776 16384]
fs tree 260 refs 6 not found
unresolved ref root 263 dir 256 index 4 namelen 14 name snap2748615355 error 600
unresolved ref root 267 dir 256 index 4 namelen 14 name snap2748615355 error 600
unresolved ref root 269 dir 256 index 4 namelen 14 name snap2748615355 error 600
unresolved ref root 273 dir 256 index 4 namelen 14 name snap2748615355 error 600
unresolved ref root 274 dir 256 index 4 namelen 14 name snap2748615355 error 600
unresolved ref root 276 dir 256 index 4 namelen 14 name snap2748615355 error 600

I've asked Josef to pull those patches out of btrfs-next, feel free to send me any testing version if you can't reproduce it on your side.

I've run into similar errors after an unclean shutdown on a partition where I make use of several subvolumes. Some of the data in the subvolume is inaccessible, although the original root volume seems OK. So far, the partition is resisting my efforts to fix the errors.

This unclean shutdown occurred while using a 3.5.3 kernel merged with the for-linus branch, so it did not contain any of Miao Xie's recent patches to address this issue.

I've made an image of the corrupted volume if anybody has something they'd like me to test. But I'm primarily reporting this to let you know I'm seeing errors similar to the ones thrown off by your test case. I'm going to look into merging the patches from Josef's btrfs-next to see if the problem recurs.
Re: [PATCH] Btrfs: wait on async pages when shrinking delalloc
On Thu, Sep 6, 2012 at 3:51 PM, Josef Bacik jba...@fusionio.com wrote: Mitch reported a problem where you could get an ENOSPC error when untarring a kernel git tree onto a 16gb file system with compress-force=zlib. This is because compression is a huge pain, it will return from ->writepages() without having actually created any ordered extents. To get around this we check to see if the async submit counter is up, and if it is wait until it drops to 0 before doing our normal ordered wait dance. With this patch I can now untar a kernel git tree onto a 16gb file system without getting ENOSPC errors. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com

Thanks, this patch fixes the issues I was seeing with ENOSPC on zlib compression.

I also did some rough testing for any performance regressions on lzo and with no compression, and my benchmarks were all in the same range. I don't have any comparisons available for zlib since my benchmark tests would always trigger ENOSPC errors.

I also checked Zach Brown's suggestion of dropping:

+ if (atomic_read(&root->fs_info->async_delalloc_pages))

and just leaving:

+ wait_event(root->fs_info->async_submit_wait,
+            !atomic_read(&root->fs_info->async_delalloc_pages));

This is because the wait_event macro should perform the same test (although it will start a 'do' loop before making the same test). This change also worked in my tests.

For reference, I pulled up the wait_event macro according to the Linux Cross Reference site: http://lxr.free-electrons.com/source/include/linux/wait.h#L205

/**
 * wait_event - sleep until a condition gets true
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
 * @condition evaluates to true. The @condition is checked each time
 * the waitqueue @wq is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 */
#define wait_event(wq, condition) \
do { \
	if (condition) \
		break; \
	__wait_event(wq, condition); \
} while (0)

Tested-by: Mitch Harder mitch.har...@sabayonlinux.org
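The point about the redundant outer check can be sketched in userspace. The macro below is only a stand-in modelled on the wait_event definition quoted in this message: instead of calling __wait_event() it sets a flag to record that the caller would have slept. The names wait_event_sketch, wait_with_guard and wait_without_guard are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/*
 * Userspace stand-in for the quoted wait_event(): if the condition is
 * already true we break out at once; otherwise we record that the
 * caller would have gone to sleep.
 */
#define wait_event_sketch(slept, condition) \
do { \
	if (condition) \
		break; \
	(slept) = 1; \
} while (0)

/* Explicit guard in front of the wait. Returns 1 if it would sleep. */
int wait_with_guard(int async_pages)
{
	int slept = 0;

	assert(async_pages >= 0);
	if (async_pages)
		wait_event_sketch(slept, !async_pages);
	return slept;
}

/* Relies only on the check inside the macro. Returns 1 if it would sleep. */
int wait_without_guard(int async_pages)
{
	int slept = 0;

	assert(async_pages >= 0);
	wait_event_sketch(slept, !async_pages);
	return slept;
}
```

Both variants agree for every input in this model: with no outstanding pages neither sleeps, and with outstanding pages both do. The guard only saves entering the do/while block.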
Varying Leafsize and Nodesize in Btrfs
I've been trying out different leafsize/nodesize settings by benchmarking some typical operations. These changes had more impact than I expected. Using a leafsize/nodesize of either 8192 or 16384 provided a noticeable improvement in my limited testing. These results are similar to some that Chris Mason has already reported: https://oss.oracle.com/~mason/blocksizes/

I noticed that metadata allocation was more efficient with bigger block sizes. My data was git kernel sources, which will utilize btrfs' inlining. This may have tilted the scales. Read operations seemed to benefit the most. Write operations seemed to get punished when the leafsize/nodesize was increased to 64K.

Are there any known downsides to using a leafsize/nodesize bigger than the default 4096?

Time (seconds) to finish 7 simultaneous copy operations on a set of Linux kernel git sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096        124.7 (1.25%)
8192        115.2 (0.69%)
16384       114.8 (0.53%)
65536       130.5 (0.3%)

Time (seconds) to finish 'git status' on a set of Linux kernel git sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096        13.2 (0.86%)
8192        11.2 (1.36%)
16384       9.0 (0.92%)
65536       8.5 (1.3%)

Time (seconds) to perform a git checkout of a different branch on a set of Linux kernel sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096        19.4 (1.1%)
8192        16.94 (3.1%)
16384       14.4 (0.6%)
65536       16.3 (0.8%)

Time (seconds) to perform 7 simultaneous rsync threads on the Linux kernel git sources directories.

Leafsize/
Nodesize    Time (Std Dev%)
4096        410.3 (4.5%)
8192        289.8 (0.96%)
16384       250.7 (3.8%)
65536       227.0 (1.2%)

Used Metadata (MB) as reported by 'btrfs fi df'

Leafsize/
Nodesize    Size (Std Dev%)
4096        484 MB (0.13%)
8192        443 MB (0.2%)
16384       424 MB (0.2%)
65536       411 MB (0.2%)
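To compare the rows of these tables against the 4096 baseline, a small helper along the following lines can be used. It is only a sketch; the function names percent_change and changes_vs_first are made up for this example and the figures are the ones reported in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage change of `value` relative to `baseline` (negative means less time). */
double percent_change(double baseline, double value)
{
	assert(baseline != 0.0);
	return (value - baseline) / baseline * 100.0;
}

/* Fill out[i] with the change of times[i] relative to times[0]. */
void changes_vs_first(const double *times, size_t n, double *out)
{
	assert(times != NULL && out != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		out[i] = percent_change(times[0], times[i]);
}
```

Fed the rsync column (410.3, 289.8, 250.7, 227.0), the 65536 run comes out at roughly 45% less time than the 4096 run, while for the copy test the 65536 run (130.5 against 124.7) is roughly 5% slower.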
Re: Btrfs Intermittent ENOSPC Issues
On Tue, Jul 31, 2012 at 2:37 PM, Mitch Harder mitch.har...@sabayonlinux.org wrote: I've been working on running down intermittent ENOSPC issues. I can only seem to replicate ENOSPC errors when running zlib compression. However, I have been seeing similar ENOSPC errors to a lesser extent when playing with the LZ4HC patches. I've been spending most of my efforts on the specific areas of code that are generating the ENOSPC error. But I've been developing the perception that the real problem is elsewhere. I probably should have looked at this a while ago, but if I generate an Alt-SysRq-W delayed tasks traceback during the intermittent periods when ENOSPC errors are occurring, I'm seeing delays in other areas. It may be that the ENOSPC errors are occurring due to a page lock that is not clearing in another thread. [12339.617366] SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount show-blocked-tasks(W) dump-ftrace-buffer(Z) [12339.650620] SysRq : Show Blocked State [12339.650624] taskPC stack pid father [12339.650678] flush-btrfs-6 D 810c03bb 0 7162 2 0x [12339.650681] 880126a83990 0046 880126a82000 8801266fad40 [12339.650684] 00012280 880126a83fd8 00012280 4000 [12339.650687] 880126a83fd8 00012280 880129af16a0 8801266fad40 [12339.650690] Call Trace: [12339.650698] [8106c6d0] ? ktime_get_ts+0xae/0xbb [12339.650701] [8106c6d0] ? ktime_get_ts+0xae/0xbb [12339.650705] [810c03bb] ? __lock_page+0x6d/0x6d [12339.650708] [8162da84] schedule+0x64/0x66 [12339.650710] [8162db12] io_schedule+0x8c/0xcf [12339.650713] [810c03c9] sleep_on_page+0xe/0x12 [12339.650715] [8162c159] __wait_on_bit_lock+0x46/0x8f [12339.650717] [810c0117] ? find_get_pages_tag+0xf8/0x134 [12339.650720] [810c03b4] __lock_page+0x66/0x6d [12339.650723] [8104b7ff] ? 
autoremove_wake_function+0x39/0x39 [12339.650753] [a0065f28] extent_write_cache_pages.clone.16.clone.29+0x143/0x30c [btrfs] [12339.650770] [a0066303] extent_writepages+0x48/0x5d [btrfs] [12339.650784] [a0053019] ? uncompress_inline.clone.33+0x15f/0x15f [btrfs] [12339.650788] [8105c8f4] ? update_curr+0x81/0x123 [12339.650802] [a00528ac] btrfs_writepages+0x27/0x29 [btrfs] [12339.650805] [810c9975] do_writepages+0x20/0x29 [12339.650808] [8112ec67] __writeback_single_inode.clone.22+0x48/0x11c [12339.650811] [8112f1cf] writeback_sb_inodes+0x1f0/0x332 [12339.650813] [810c870e] ? global_dirtyable_memory+0x1a/0x3b [12339.650816] [8112f389] __writeback_inodes_wb+0x78/0xb9 [12339.650818] [8112f510] wb_writeback+0x146/0x23e [12339.650820] [810c891b] ? global_dirty_limits+0x2f/0x10f [12339.650822] [8112fdef] wb_do_writeback+0x195/0x1b0 [12339.650825] [8112fe98] bdi_writeback_thread+0x8e/0x1f1 [12339.650827] [8112fe0a] ? wb_do_writeback+0x1b0/0x1b0 [12339.650829] [8112fe0a] ? wb_do_writeback+0x1b0/0x1b0 [12339.650832] [8104b2ef] kthread+0x89/0x91 [12339.650835] [816303f4] kernel_thread_helper+0x4/0x10 [12339.650837] [8104b266] ? kthread_freezable_should_stop+0x57/0x57 [12339.650839] [816303f0] ? gs_change+0xb/0xb [12339.650842] tar D 88012683f8b8 0 7173 7152 0x [12339.650845] 880126c0f9e8 0086 880126c0e000 8801267496a0 [12339.650848] 00012280 880126c0ffd8 00012280 4000 [12339.650851] 880126c0ffd8 00012280 880129af16a0 8801267496a0 [12339.650854] Call Trace: [12339.650866] [a0037b35] ? block_rsv_release_bytes+0xc7/0x127 [btrfs] [12339.650869] [8103c073] ? lock_timer_base.clone.26+0x2b/0x50 [12339.650871] [8162da84] schedule+0x64/0x66 [12339.650873] [8162c075] schedule_timeout+0x22c/0x26a [12339.650876] [8103c038] ? run_timer_softirq+0x2d4/0x2d4 [12339.650878] [8162c0f1] schedule_timeout_killable+0x1e/0x20 [12339.650890] [a003dd0c] reserve_metadata_bytes.clone.57+0x4ba/0x5e7 [btrfs] [12339.650906] [a0066b52] ? 
free_extent_buffer+0x68/0x6c [btrfs] [12339.650918] [a003e1a9] btrfs_block_rsv_add+0x2b/0x4d [btrfs] [12339.650932] [a004ff40] start_transaction+0x131/0x310 [btrfs] [12339.650946] [a0050386] btrfs_start_transaction+0x13/0x15 [btrfs] [12339.650961] [a005b10a] btrfs_create+0x3a/0x1e0 [btrfs] [12339.650964] [81120861] ? d_splice_alias+0xcc/0xd8 [12339.650966] [811173aa] vfs_create+0x9c/0xf5 [12339.650968] [81119786
Re: cross-subvolume cp --reflink
On Fri, Aug 17, 2012 at 12:20 AM, Marc MERLIN m...@merlins.org wrote: On Thu, Aug 16, 2012 at 09:20:00PM -0700, james northrup wrote: dunno if this thread is dead, but im inclined to patch in cp --reflink to fdupes prog. It currently does provide a poor-man's dedupe via md5sum and hardlink, or delete. all the better if the distro-kernels can backport cross-snapshot reflinks sooner than later. So, I'd love for cp --reflink to bring back a deleted VM (huge file) from a snapshot back to trunk without duplicating it. But how would fdupes help? I can't hardlink between two snapshots, can I? gandalfthegreat:/mnt/btrfs_pool1# ln usr_weekly_20120812_00\:02\:01/svn-commit.tmp usr/test ln: failed to create hard link `usr/test' = `usr_weekly_20120812_00:02:01/svn-commit.tmp': Invalid cross-device link So, is there anything user space can do without kernel support? A cross-subvolume copy patch has made it into 3.6_rc http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=362a20c5e27614739c4 This patch will allow cp --reflink across subvolumes, as long as the copy does not cross mount points. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Update LZO compression
On Thu, Aug 16, 2012 at 5:17 PM, Andi Kleen a...@firstfloor.org wrote: On Thu, Aug 16, 2012 at 11:55:06AM -0700, james northrup wrote: looks like ARM results are inconclusive from a lot of folks without bandwidth to do a write-up, what about just plain STAGING status for ARM so the android tweakers can beat on it for a while? Staging only really works for new drivers, not for updating existing library functions like this. I suppose you could keep both and have the architecture select with a CONFIG.

I've been doing some rough benchmarking with the updated LZO in btrfs. My tests primarily consist of timing some typical copying, git manipulating, and running rsync using a set of kernel git sources. Git sources are typically about 50% pack files which won't compress very well, with the remainder being mostly highly compressible source files.

Of course, any underlying speed improvement attributable only to LZO is not shown by tests like this. But I thought it would be interesting to see the impact in some typical real-world btrfs operations.

I was seeing between 3-9% improvement in speed with the new LZO. Copying several directories of git sources showed the most improvement, ~9%. Typical git operations, such as a git checkout or git status, were only showing 3-5% improvement, which is close to the noise level of my tests. Running multiple rsync processes showed a 5% improvement.

With only 10 trials (5 with each LZO), I can't say I would statistically hang my hat on these numbers. Given all the other stuff that is going on in my rough benchmarks, a 3-9% improvement from a single change is probably pretty good.
Re: [PATCH] Btrfs: do not allocate chunks as agressively
On Tue, Aug 14, 2012 at 3:22 PM, Josef Bacik jba...@fusionio.com wrote: Swinging this pendulum back the other way. We've been allocating chunks up to 2% of the disk no matter how much we actually have allocated. So instead fix this calculation to only allocate chunks if we have more than 80% of the space available allocated. Please test this as it will likely cause all sorts of ENOSPC problems to pop up suddenly. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com

I've been testing this patch with my multiple rsync test (on a 3.5.1 kernel merged with for-linus). I tested without compression, and with lzo compression, and I haven't run into any ENOSPC issues. I still have ENOSPC issues with zlib, with or without this patch.

I made a series of runs with and without this patch (on an uncompressed, newly formatted partition), and some of the results were not what I anticipated.

1) I found that *MORE* metadata space was being allocated with this patch than when using an unpatched baseline kernel. The total allocated space was exactly the same in each run (I saw a slight variation in the amount of used Metadata).

On the unpatched baseline kernel, at the end of the run, the 'btrfs fi df' command would show:

# btrfs fi df /mnt/benchmark/
Data: total=10.01GB, used=6.99GB
System: total=4.00MB, used=4.00KB
Metadata: total=776.00MB, used=481.38MB

With this patch applied, the 'btrfs fi df' command would show:

# btrfs fi df /mnt/benchmark/
Data: total=10.01GB, used=6.99GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.01GB, used=480.94MB

2) The multiple rsyncs would run significantly faster with the patched kernel.

Unpatched baseline kernel: Time to run 7 rsyncs: 348.3 sec (+/- 9.7 sec)
Patched kernel: Time to run 7 rsyncs: 316.6 sec (+/- 6.5 sec)

Perhaps the extra allocated metadata space made things run better, or perhaps something else was going on.
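The two Metadata lines reported by 'btrfs fi df' can be turned into a utilisation figure with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, and the conversion assumes the tool's GB means 1024 MB. Whether this ratio is the one the patch's 80% rule is applied to is not established by the message.

```c
#include <assert.h>

/* Share of the allocated space that is actually in use, as a percentage. */
double utilisation_percent(double used_mb, double total_mb)
{
	assert(total_mb > 0.0);
	return used_mb / total_mb * 100.0;
}

/* Convert a GB figure to MB, assuming 1 GB = 1024 MB. */
double gb_to_mb(double gb)
{
	return gb * 1024.0;
}
```

With the numbers above, the unpatched run uses about 62% of its allocated metadata space (481.38 of 776.00 MB) and the patched run about 46.5% (480.94 MB of 1.01 GB).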
Re: [PATCH] Btrfs: fix race in run_clustered refs
On Wed, Aug 8, 2012 at 3:37 PM, Josef Bacik jba...@fusionio.com wrote: On Wed, Aug 08, 2012 at 01:49:06PM -0600, Arne Jansen wrote: run_clustered_refs runs all delayed refs for one head one by one. During the runs, the delayed_refs->lock is released. In this window, the ref_mod from the head does not match the sum of all refs below the head. When btrfs_lookup_extent_info is run in this window, it gives inconsistent results. The qgroups patch added code to put delayed refs back, thus opening this window very wide. This patch assures that head->ref_mod always matches the queued refs, but a window still remains where on-disk refs + delayed_refs miss the ref currently being run.

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/extent-tree.c | 17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e66dc9a..60d175a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2318,6 +2318,23 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
 		ref->in_tree = 0;
 		rb_erase(&ref->rb_node, &delayed_refs->root);
 		delayed_refs->num_entries--;
+		if (locked_ref) {
+			/*
+			 * when we play the delayed ref, also correct the
+			 * ref_mod on head
+			 */
+			switch (ref->action) {
+			case BTRFS_ADD_DELAYED_REF:
+			case BTRFS_ADD_DELAYED_EXTENT:
+				locked_ref->node.ref_mod -= ref->ref_mod;
+				break;
+			case BTRFS_DROP_DELAYED_REF:
+				locked_ref->node.ref_mod += ref->ref_mod;
+				break;
+			default:
+				WARN_ON(1);
+			}
+		}
 		spin_unlock(&delayed_refs->lock);
 
 		ret = run_one_delayed_ref(trans, root, ref, extent_op,

btrfs_lookup_extent_info takes the mutex on the head before it looks at its ref_mod, so it should always be consistent. Maybe somebody else is messing with refs and not doing the same thing? If that's the case we should fix them by doing the same thing, this isn't a fix. Thanks, Josef

I understand from discussion on IRC that there may be updates to this patch. But, FWIW, this patch addresses the multiple rsync problem I was seeing.
Re: [PATCH] Btrfs: fix deadlock in wait_for_more_refs
On Mon, Aug 6, 2012 at 3:18 PM, Arne Jansen sensi...@gmx.net wrote: Commit a168650c introduced a waiting mechanism to prevent busy waiting in btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations, where a tree_mod_seq is held while waiting for the io to complete, while the end_io calls btrfs_run_delayed_refs. This whole mechanism is unnecessary. If not enough runnable refs are available to satisfy count, just return as count is more like a guideline than a strict requirement. In case we have to run all refs, commit transaction makes sure that no other threads are working in the transaction anymore, so we just assert here that no refs are blocked. I've been testing this patch after manually merging on top of Josef's Btrfs: barrier before waitqueue_active V2 patch. With that arrangement, I've been unable to reproduce the deadlock on my system. I'll continue banging away on it tomorrow, and let you know if I attain a deadlock. Also, let me know if you need me to test without including Josef's added barriers. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: barrier before waitqueue_active
On Wed, Aug 1, 2012 at 3:25 PM, Josef Bacik jba...@fusionio.com wrote:

We need an smp_mb() before waitqueue_active to avoid missing wakeups. Before, Mitch was hitting a deadlock between the ordered flushers and the transaction commit because the ordered flushers were waiting for more refs and were never woken up, so those smp_mb()'s are the most important. Everything else I added for correctness' sake and to avoid getting bitten by this again somewhere else. Thanks,

This patch seems to make it tougher to hit a deadlock, but I'm still encountering intermittent deadlocks using this patch when running multiple rsync threads.

I've also tested Patch 2, and that has me hitting a deadlock even quicker (when starting several copying threads).

I also found a slight performance hit using this patch. On a 3.4.6 kernel (merged with the 3.5_rc for-linus branch), I would typically complete my rsync test in ~265 seconds. Also, I can't recall hitting a deadlock on the 3.4.6 kernel (with 3.5_rc for-linus). When using this patch, the test would take ~310 seconds (when it didn't hit a deadlock).

Here are the Delayed Tasks (Ctrl-SysRq-W) when using JUST this patch:

[ 1568.794030] SysRq : Show Blocked State
[ 1568.794101]   task                        PC stack   pid father
[ 1568.794123] btrfs-endio-wri D 88012579c000 0 3845 2 0x
[ 1568.794128] 8801254f3c20 0046 8801254f2000 8801241b5a80
[ 1568.794132] 00012280 8801254f3fd8 00012280 4000
[ 1568.794136] 8801254f3fd8 00012280 880129af16a0 8801241b5a80
[ 1568.794140] Call Trace:
[ 1568.794179] [a0068785] ? memcpy_extent_buffer+0x159/0x17a [btrfs]
[ 1568.794200] [a0082ab7] ? find_ref_head+0xa3/0xc6 [btrfs]
[ 1568.794220] [a008343c] ? btrfs_find_ref_cluster+0xdd/0x117 [btrfs]
[ 1568.794225] [8162d58c] schedule+0x64/0x66
[ 1568.794241] [a003fc86] btrfs_run_delayed_refs+0x269/0x3f0 [btrfs]
[ 1568.794246] [8104b10e] ? wake_up_bit+0x2a/0x2a
[ 1568.794265] [a004fdc4] __btrfs_end_transaction+0xca/0x283 [btrfs]
[ 1568.794283] [a004ffda] btrfs_end_transaction+0x15/0x17 [btrfs]
[ 1568.794302] [a00555da] btrfs_finish_ordered_io+0x2e4/0x334 [btrfs]
[ 1568.794306] [8103b980] ? run_timer_softirq+0x2d4/0x2d4
[ 1568.794325] [a005563f] finish_ordered_fn+0x15/0x17 [btrfs]
[ 1568.794344] [a0070ef8] worker_loop+0x188/0x4e0 [btrfs]
[ 1568.794365] [a0070d70] ? btrfs_queue_worker+0x275/0x275 [btrfs]
[ 1568.794384] [a0070d70] ? btrfs_queue_worker+0x275/0x275 [btrfs]
[ 1568.794387] [8104ac37] kthread+0x89/0x91
[ 1568.794391] [8162fd74] kernel_thread_helper+0x4/0x10
[ 1568.794395] [8104abae] ? kthread_freezable_should_stop+0x57/0x57
[ 1568.794398] [8162fd70] ? gs_change+0xb/0xb
[ 1568.794400] btrfs-transacti D 88009912ba50 0 3851 2 0x
[ 1568.794403] 8801241cfc70 0046 8801241ce000 8801248cda80
[ 1568.794407] 00012280 8801241cffd8 00012280 4000
[ 1568.794411] 8801241cffd8 00012280 8801254b8000 8801248cda80
[ 1568.794415] Call Trace:
[ 1568.794436] [a0066646] ? extent_writepages+0x53/0x5d [btrfs]
[ 1568.794455] [a005357b] ? uncompress_inline.clone.33+0x15f/0x15f [btrfs]
[ 1568.794459] [810c9ada] ? pagevec_lookup_tag+0x24/0x2e
[ 1568.794478] [a0052e0e] ? btrfs_writepages+0x27/0x29 [btrfs]
[ 1568.794481] [810c90b1] ? do_writepages+0x20/0x29
[ 1568.794485] [8162d58c] schedule+0x64/0x66
[ 1568.794505] [a0061547] btrfs_start_ordered_extent+0xde/0xfa [btrfs]
[ 1568.794508] [8104b10e] ? wake_up_bit+0x2a/0x2a
[ 1568.794529] [a0061984] ? btrfs_lookup_first_ordered_extent+0x65/0x99 [btrfs]
[ 1568.794549] [a0061a6a] btrfs_wait_ordered_range+0xb2/0xda [btrfs]
[ 1568.794569] [a0061bcc] btrfs_run_ordered_operations+0x13a/0x1c1 [btrfs]
[ 1568.794587] [a004f5f5] btrfs_commit_transaction+0x287/0x960 [btrfs]
[ 1568.794606] [a00502b1] ? start_transaction+0x2d5/0x310 [btrfs]
[ 1568.794609] [8104b10e] ? wake_up_bit+0x2a/0x2a
[ 1568.794627] [a004913b] transaction_kthread+0x187/0x258 [btrfs]
[ 1568.794644] [a0048fb4] ? btrfs_alloc_root+0x42/0x42 [btrfs]
[ 1568.794661] [a0048fb4] ? btrfs_alloc_root+0x42/0x42 [btrfs]
[ 1568.794664] [8104ac37] kthread+0x89/0x91
[ 1568.794668] [8162fd74] kernel_thread_helper+0x4/0x10
[ 1568.794671] [8104abae] ? kthread_freezable_should_stop+0x57/0x57
[ 1568.794674] [8162fd70] ? gs_change+0xb/0xb
[ 1568.794676] flush-btrfs-1 D 88012579c000 0 3857 2 0x
[ 1568.794680] 880037125670 0046 880037124000 8801254b8000
[ 1568.794684] 00012280 880037125fd8 00012280 4000
[
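The ordering requirement in the quoted patch description (an smp_mb() before waitqueue_active) can be sketched in userspace. This is not kernel code: it swaps in C11 atomics, and every name (demo_waitq, demo_prepare_to_wait, demo_signal) is hypothetical. The waker publishes the condition, issues a full fence, and only then checks for waiters. The waiter registers itself first and then re-checks the condition. With full ordering on both sides, at least one side observes the other, so a wakeup should not be lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a wait queue; not a kernel API. */
struct demo_waitq {
	atomic_int  waiters;    /* nonzero plays the role of waitqueue_active() */
	atomic_bool condition;  /* the state a sleeper is waiting for */
	atomic_int  wakeups;    /* counts wake-ups issued, for inspection only */
};

/*
 * Waiter side: register on the queue first, then re-check the condition.
 * Returns true if the caller would have to sleep.
 */
static bool demo_prepare_to_wait(struct demo_waitq *wq)
{
	atomic_fetch_add(&wq->waiters, 1);      /* seq_cst read-modify-write */
	if (atomic_load(&wq->condition)) {      /* seq_cst load */
		atomic_fetch_sub(&wq->waiters, 1);
		return false;                   /* condition already true */
	}
	return true;
}

/* Waker side: publish the condition, full fence, then look for waiters. */
static void demo_signal(struct demo_waitq *wq)
{
	atomic_store_explicit(&wq->condition, true, memory_order_relaxed);
	/* This fence plays the role of smp_mb(). Without it, the waiter check
	 * below could be ordered before the store above becomes visible. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&wq->waiters, memory_order_relaxed) > 0)
		atomic_fetch_add(&wq->wakeups, 1);  /* stands in for wake_up() */
}
```

The sketch only models the ordering. It does not block or schedule, and it says nothing about the deadlocks that still occur with the patch applied.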
Btrfs Intermittent ENOSPC Issues
I've been working on running down intermittent ENOSPC issues.

I can only seem to replicate ENOSPC errors when running zlib compression. However, I have been seeing similar ENOSPC errors to a lesser extent when playing with the LZ4HC patches.

I apologize for not following up on this sooner, but I had drifted away from using zlib, and didn't notice there was still an issue.

My test case involves un-archiving linux git sources to a freshly formatted btrfs partition, mounted with compress-force=zlib. I am using a 16 GB partition on a 250 GB Western Digital SATA Hard Disk. My current kernel is x86_64 linux-3.5.0 merged with Chris' for-linus branch (for 3.6_rc). This includes Josef's "Btrfs: flush delayed inodes if we're short on space" patch.

I haven't isolated a root cause, but here's the feedback I have so far.

(1) My test case won't generate ENOSPC issues with lzo compression or no compression.

(2) I've inserted some trace_printk debugging statements to trace back the call stack, and the ENOSPC errors only seem to occur on a new transaction: vfs_create -> btrfs_create -> btrfs_start_transaction -> start_transaction -> btrfs_block_rsv_add -> reserve_metadata_bytes.

(3) The ENOSPC condition will usually clear in a few seconds, allowing writes to proceed.

(4) I've added a loop to the reserve_metadata_bytes() function to loop back with 'flush_state = FLUSH_DELALLOC (1)' for 1024 retries. This reduces and/or eliminates the ENOSPC errors, as if we're waiting on something else that is trying to complete.

(5) I've been heavily debugging the reserve_metadata_bytes() function, and I'm seeing problems with the way space_info->bytes_may_use is handled. The space_info->bytes_may_use value is important in determining if we're in an over-commit state. But the space_info->bytes_may_use value is often increased arbitrarily without any mechanism for correcting the value. Subsequently, space_info->bytes_may_use quickly increases in size to the point where we are always in fallback allocation as if we're overcommitted. In my trials, it was hard to capture a point where space_info->bytes_may_use wasn't larger than the available size.

(6) Even though reserve_metadata_bytes() is almost always in fallback overcommitted mode, it is still working pretty well, and I've developed the perception that the problem is something that needs to finish elsewhere.

Sorry for not having a patch to fix the issue. I'll try to keep banging on it as time allows.
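The retry experiment in item (4) and the accounting concern in item (5) can be pictured with a small userspace model. This is only a sketch under assumptions: the structure, the flush callback, and all demo_ names are hypothetical and do not reproduce the kernel's reserve_metadata_bytes(). It shows why bounded retries help when a flush gives back stale reservations, and why a bytes_may_use value that only grows keeps the check failing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the space accounting; not kernel code. */
struct demo_space_info {
	uint64_t total_bytes;    /* size of the metadata space */
	uint64_t bytes_used;     /* space already committed */
	uint64_t bytes_may_use;  /* outstanding reservations, may never be consumed */
};

/* Hypothetical flush callback: returns how many reserved bytes it released. */
typedef uint64_t (*demo_flush_fn)(struct demo_space_info *info);

static bool demo_fits(const struct demo_space_info *info, uint64_t bytes)
{
	return info->bytes_used + info->bytes_may_use + bytes <= info->total_bytes;
}

/*
 * Try to reserve 'bytes'. On failure, flush and retry up to max_retries
 * times before giving up with -ENOSPC.
 */
static int demo_reserve(struct demo_space_info *info, uint64_t bytes,
			demo_flush_fn flush, int max_retries)
{
	for (int attempt = 0; ; attempt++) {
		if (demo_fits(info, bytes)) {
			info->bytes_may_use += bytes;
			return 0;
		}
		if (attempt >= max_retries)
			return -ENOSPC;

		/* Give released reservations back so the next check can pass. */
		uint64_t released = flush(info);
		if (released > info->bytes_may_use)
			released = info->bytes_may_use;
		info->bytes_may_use -= released;
	}
}

/* Example flush that always releases one 4 KiB reservation. */
static uint64_t demo_flush_4k(struct demo_space_info *info)
{
	(void)info;
	return 4096;
}
```

If the flush never reduces bytes_may_use, the loop exhausts its retries and still returns -ENOSPC, which matches the observation that the counter needs some correcting mechanism rather than more retries.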
[PATCH 0/1] Btrfs: Explicitly include vmalloc.h in send.c
When compiling a generic x86_64 kernel without SMP, I encountered the following errors due to vmalloc.h not being implicitly included:

  CC      fs/btrfs/send.o
fs/btrfs/send.c: In function ‘fs_path_free’:
fs/btrfs/send.c:185:4: error: implicit declaration of function ‘vfree’
fs/btrfs/send.c: In function ‘fs_path_ensure_buf’:
fs/btrfs/send.c:215:4: error: implicit declaration of function ‘vmalloc’
fs/btrfs/send.c:215:12: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:225:12: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:233:13: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c: In function ‘iterate_dir_item’:
fs/btrfs/send.c:900:10: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:909:11: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c: In function ‘btrfs_ioctl_send’:
fs/btrfs/send.c:4462:17: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:4468:17: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:4474:2: error: implicit declaration of function ‘vzalloc’
fs/btrfs/send.c:4474:20: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:4482:21: warning: assignment makes pointer from integer without a cast
make[2]: *** [fs/btrfs/send.o] Error 1
make[1]: *** [fs/btrfs] Error 2

If it makes sense, please feel free to include this minor change in with other send/receive fixes.

Mitch Harder (1):
  Btrfs: Explicitly include vmalloc.h in send.c

 fs/btrfs/send.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

--
1.7.8.6
[PATCH 1/1] Btrfs: Explicitly include vmalloc.h in send.c
Certain architectures or platforms or combinations of CONFIG options require an explicit #include <linux/vmalloc.h>.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/send.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index bf232c8..118e76d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -25,6 +25,7 @@
 #include <linux/posix_acl_xattr.h>
 #include <linux/radix-tree.h>
 #include <linux/crc32c.h>
+#include <linux/vmalloc.h>

 #include "send.h"
 #include "backref.h"
--
1.7.8.6
[PATCH v4] Btrfs: Check INCOMPAT flags on remount and add helper function
In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary.

Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
v1->v2:
- Remove extraneous formatting change.
v2->v3:
- Consolidate into a single patch
- Convert helper function to a static inline function.
v3->v4:
- Per feedback from Li Zefan, change function name from _chk_ to _set_
- Per feedback from David Sterba, make the helper function more generic.
- The more generic function can also be implemented in the INCOMPAT check made for setting the default subvolume.

 fs/btrfs/ctree.h | 17 +
 fs/btrfs/ioctl.c | 16 ++--
 fs/btrfs/super.c |1 +
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..5422e54 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3103,6 +3103,23 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root, const char *function,
 				unsigned int line, int errno);

+#define btrfs_set_fs_incompat(__fs_info, opt) \
+	__btrfs_set_fs_incompat((__fs_info), BTRFS_FEATURE_INCOMPAT_##opt)
+
+static inline void __btrfs_set_fs_incompat(struct btrfs_fs_info *fs_info,
+					   u64 flag)
+{
+	struct btrfs_super_block *disk_super;
+	u64 features;
+
+	disk_super = fs_info->super_copy;
+	features = btrfs_super_incompat_flags(disk_super);
+	if (!(features & flag)) {
+		features |= flag;
+		btrfs_set_super_incompat_flags(disk_super, features);
+	}
+}
+
 #define btrfs_abort_transaction(trans, root, errno)		\
 do {								\
 	__btrfs_abort_transaction(trans, root, __func__,	\
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..0d5d079 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_super_block *disk_super;
 	struct file_ra_state *ra = NULL;
 	unsigned long last_index;
 	u64 isize = i_size_read(inode);
-	u64 features;
 	u64 last_len = 0;
 	u64 skip = 0;
 	u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		mutex_unlock(&inode->i_mutex);
 	}

-	disk_super = root->fs_info->super_copy;
-	features = btrfs_super_incompat_flags(disk_super);
 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-		btrfs_set_super_incompat_flags(disk_super, features);
+		btrfs_set_fs_incompat(root->fs_info, COMPRESS_LZO);
 	}

 	ret = defrag_count;
@@ -2761,8 +2756,6 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
 	struct btrfs_path *path;
 	struct btrfs_key location;
 	struct btrfs_disk_key disk_key;
-	struct btrfs_super_block *disk_super;
-	u64 features;
 	u64 objectid = 0;
 	u64 dir_id;

@@ -2813,12 +2806,7 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
 	btrfs_mark_buffer_dirty(path->nodes[0]);
 	btrfs_free_path(path);

-	disk_super = root->fs_info->super_copy;
-	features = btrfs_super_incompat_flags(disk_super);
-	if (!(features & BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL)) {
-		features |= BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL;
-		btrfs_set_super_incompat_flags(disk_super, features);
-	}
+	btrfs_set_fs_incompat(root->fs_info, DEFAULT_SUBVOL);
 	btrfs_end_transaction(trans, root);

 	return 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..75ee2c7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				compress_type = "lzo";
 				info->compress_type = BTRFS_COMPRESS_LZO;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_set_fs_incompat(info, COMPRESS_LZO);
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				info->compress_type = BTRFS_COMPRESS_NONE;
--
1.7.8.6
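For readers who want to try the generic helper from the v4 patch outside the kernel, the following is a minimal userspace sketch. Everything here is an assumption made for illustration: the demo_ types, the DEMO_ flag names and values, and the write counter are hypothetical, and the inner function is renamed because a leading double underscore is reserved in userspace C. It shows the two ideas in the patch: the ## token pasting that turns a short option name into the full flag constant, and the test that makes setting a flag idempotent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag constants; names and values are for illustration only. */
#define DEMO_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
#define DEMO_FEATURE_INCOMPAT_COMPRESS_LZO   (1ULL << 3)

/* Hypothetical stand-ins for the super block copy and fs_info. */
struct demo_super_block {
	uint64_t incompat_flags;
	unsigned int writes;  /* counts flag updates, for inspection only */
};

struct demo_fs_info {
	struct demo_super_block super_copy;
};

/* Paste the short option name onto the common prefix, as the patch does. */
#define demo_set_fs_incompat(fs_info, opt) \
	demo_set_fs_incompat_flag((fs_info), DEMO_FEATURE_INCOMPAT_##opt)

/* Set 'flag' only if it is not already present. */
static inline void demo_set_fs_incompat_flag(struct demo_fs_info *fs_info,
					     uint64_t flag)
{
	struct demo_super_block *sb = &fs_info->super_copy;

	if (!(sb->incompat_flags & flag)) {
		sb->incompat_flags |= flag;
		sb->writes++;
	}
}
```

A caller writes demo_set_fs_incompat(&info, COMPRESS_LZO), which expands to a call with DEMO_FEATURE_INCOMPAT_COMPRESS_LZO. Calling it a second time with the same option leaves the flags and the counter unchanged.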
[PATCH 0/2] LZO INCOMPAT Checking
The following patches are against Josef's btrfs-next repository, and depend on Arnd Hannemann's "Btrfs: allow mount -o remount,compress=no" patch.

The method was based on a previous example of checking for lzo INCOMPAT used by Li Zefan when defragmenting with explicit compression ("btrfs: Allow to specify compress method when defrag") in ioctl.c.

The second patch uses the new function in the above referenced existing check for lzo INCOMPAT performed when defragmenting with explicit lzo compression. This patch provides no functional changes.

Mitch Harder (2):
  Btrfs: Check INCOMPAT flags on remount with lzo compression
  Btrfs: Use common function to check lzo INCOMPAT on defrag.

 fs/btrfs/ctree.h |1 +
 fs/btrfs/ioctl.c |7 +--
 fs/btrfs/super.c | 21 -
 3 files changed, 22 insertions(+), 7 deletions(-)

--
1.7.8.6
[PATCH 1/2] Btrfs: Check INCOMPAT flags on remount with lzo compression
In support of the recently added capability to remount with lzo compression, check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/ctree.h |1 +
 fs/btrfs/super.c | 21 -
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..8bee032 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3094,6 +3094,7 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);

 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
+void btrfs_chk_lzo_incompat(struct btrfs_root *root);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 void btrfs_printk(struct btrfs_fs_info *fs_info, const char *fmt, ...);
 void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..4398fd2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,11 +401,13 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				compress_type = "lzo";
 				info->compress_type = BTRFS_COMPRESS_LZO;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_chk_lzo_incompat(root);
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				info->compress_type = BTRFS_COMPRESS_NONE;
 				btrfs_clear_opt(info->mount_opt, COMPRESS);
-				btrfs_clear_opt(info->mount_opt, FORCE_COMPRESS);
+				btrfs_clear_opt(info->mount_opt,
+						FORCE_COMPRESS);
 				compress_force = false;
 			} else {
 				ret = -EINVAL;
@@ -587,6 +589,23 @@ out:
 }

 /*
+ * Check the INCOMPAT features in the super block, and set the
+ * LZO INCOMPAT flag if it has not been set.
+ */
+void btrfs_chk_lzo_incompat(struct btrfs_root *root)
+{
+	struct btrfs_super_block *disk_super;
+	u64 features;
+
+	disk_super = root->fs_info->super_copy;
+	features = btrfs_super_incompat_flags(disk_super);
+	if (!(features & BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+		btrfs_set_super_incompat_flags(disk_super, features);
+	}
+}
+
+/*
  * Parse mount options that are required early in the mount process.
  *
  * All other options will be parsed on much later in the mount process and
--
1.7.8.6
[PATCH 2/2] Btrfs: Use common function to check lzo INCOMPAT on defrag.
When defragmenting with explicit lzo compression, simplify the check for lzo INCOMPAT by using the new common function introduced to support remounting with lzo compression.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/ioctl.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..d5fd69e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_super_block *disk_super;
 	struct file_ra_state *ra = NULL;
 	unsigned long last_index;
 	u64 isize = i_size_read(inode);
-	u64 features;
 	u64 last_len = 0;
 	u64 skip = 0;
 	u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		mutex_unlock(&inode->i_mutex);
 	}

-	disk_super = root->fs_info->super_copy;
-	features = btrfs_super_incompat_flags(disk_super);
 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-		btrfs_set_super_incompat_flags(disk_super, features);
+		btrfs_chk_lzo_incompat(root);
 	}

 	ret = defrag_count;
--
1.7.8.6
[PATCH v2 1/2] Btrfs: Check INCOMPAT flags on remount with lzo compression
In support of the recently added capability to remount with lzo compression, check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
v1->v2:
- Remove extraneous formatting change.

 fs/btrfs/ctree.h |1 +
 fs/btrfs/super.c | 18 ++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..8bee032 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3094,6 +3094,7 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);

 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
+void btrfs_chk_lzo_incompat(struct btrfs_root *root);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 void btrfs_printk(struct btrfs_fs_info *fs_info, const char *fmt, ...);
 void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..f3a5967 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				compress_type = "lzo";
 				info->compress_type = BTRFS_COMPRESS_LZO;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_chk_lzo_incompat(root);
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				info->compress_type = BTRFS_COMPRESS_NONE;
@@ -587,6 +588,23 @@ out:
 }

 /*
+ * Check the INCOMPAT features in the super block, and set the
+ * LZO INCOMPAT flag if it has not been set.
+ */
+void btrfs_chk_lzo_incompat(struct btrfs_root *root)
+{
+	struct btrfs_super_block *disk_super;
+	u64 features;
+
+	disk_super = root->fs_info->super_copy;
+	features = btrfs_super_incompat_flags(disk_super);
+	if (!(features & BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+		btrfs_set_super_incompat_flags(disk_super, features);
+	}
+}
+
+/*
  * Parse mount options that are required early in the mount process.
  *
  * All other options will be parsed on much later in the mount process and
--
1.7.8.6
[PATCH v2 2/2] Btrfs: Use common function to check lzo INCOMPAT on defrag.
When defragmenting with explicit lzo compression, simplify the check for lzo INCOMPAT by using the new common function introduced to support remounting with lzo compression.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/ioctl.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..d5fd69e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_super_block *disk_super;
 	struct file_ra_state *ra = NULL;
 	unsigned long last_index;
 	u64 isize = i_size_read(inode);
-	u64 features;
 	u64 last_len = 0;
 	u64 skip = 0;
 	u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		mutex_unlock(&inode->i_mutex);
 	}

-	disk_super = root->fs_info->super_copy;
-	features = btrfs_super_incompat_flags(disk_super);
 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-		btrfs_set_super_incompat_flags(disk_super, features);
+		btrfs_chk_lzo_incompat(root);
 	}

 	ret = defrag_count;
--
1.7.8.6
[PATCH v3 0/1] LZO INCOMPAT Checking
The following patch is against Josef's btrfs-next repository, and depends on Arnd Hannemann's patch: "Btrfs: allow mount -o remount,compress=no"

The method was based on a previous example of checking for lzo INCOMPAT used by Li Zefan when defragmenting with explicit compression ("btrfs: Allow to specify compress method when defrag") in ioctl.c.

Based on feedback on IRC, the two-patch series presented in the previous version has been consolidated into a single patch, and the helper function was converted to a static inline function.

Mitch Harder (1):
  Btrfs: Check INCOMPAT flags on remount and add helper function

 fs/btrfs/ctree.h | 13 +
 fs/btrfs/ioctl.c |7 +--
 fs/btrfs/super.c |1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

--
1.7.8.6
[PATCH v3 1/1] Btrfs: Check INCOMPAT flags on remount and add helper function
In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary.

Also, implement the new helper function when defragmenting with explicit lzo compression.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
v1->v2:
- Remove extraneous formatting change.
v2->v3:
- Consolidate into a single patch
- Convert helper function to a static inline function.

 fs/btrfs/ctree.h | 13 +
 fs/btrfs/ioctl.c |7 +--
 fs/btrfs/super.c |1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..3a1a700 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3103,6 +3103,19 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root, const char *function,
 				unsigned int line, int errno);

+static inline void btrfs_chk_lzo_incompat(struct btrfs_root *root)
+{
+	struct btrfs_super_block *disk_super;
+	u64 features;
+
+	disk_super = root->fs_info->super_copy;
+	features = btrfs_super_incompat_flags(disk_super);
+	if (!(features & BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+		btrfs_set_super_incompat_flags(disk_super, features);
+	}
+}
+
 #define btrfs_abort_transaction(trans, root, errno)		\
 do {								\
 	__btrfs_abort_transaction(trans, root, __func__,	\
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..d5fd69e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_super_block *disk_super;
 	struct file_ra_state *ra = NULL;
 	unsigned long last_index;
 	u64 isize = i_size_read(inode);
-	u64 features;
 	u64 last_len = 0;
 	u64 skip = 0;
 	u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		mutex_unlock(&inode->i_mutex);
 	}

-	disk_super = root->fs_info->super_copy;
-	features = btrfs_super_incompat_flags(disk_super);
 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-		btrfs_set_super_incompat_flags(disk_super, features);
+		btrfs_chk_lzo_incompat(root);
 	}

 	ret = defrag_count;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..32c2bd9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				compress_type = "lzo";
 				info->compress_type = BTRFS_COMPRESS_LZO;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_chk_lzo_incompat(root);
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				info->compress_type = BTRFS_COMPRESS_NONE;
--
1.7.8.6
Re: [PATCH v2] Btrfs: allow mount -o remount,compress=no
On Wed, Jul 18, 2012 at 8:28 PM, David Sterba d...@jikos.cz wrote:
On Fri, Jul 13, 2012 at 10:19:14AM -0500, Mitch Harder wrote:

I was testing the lz4(hc) patches, and I found that the compression INCOMPAT flags are not being updated using the method in this patch. The compression INCOMPAT flags are generally checked and updated in the open_ctree() function. But, on remount, open_ctree() is not called.

This currently happens with lzo as well, right?

Yes, this will happen with lzo as implemented in the patch at the head of this thread.

My preference is to let remount succeed and set the incompat bit, possibly with a KERN_INFO message to syslog in case the bit is yet unseen by the volume.

Great. I've put together a patch that does just that, and I've been testing it to make sure it works as intended. I'll finish it up and send it to the list tomorrow.

This patch will only address the lzo INCOMPAT from the remount capabilities provided by the patch at the head of the thread. A similar modification will be needed for lz4 patches that allow for remount.
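The two remount policies weighed in this thread (reject the remount with -EINVAL when the volume has not yet seen the feature, or let it succeed, set the bit, and log a message) can be compared side by side in a small userspace sketch. All names are hypothetical, fprintf to stderr stands in for a KERN_INFO message, and nothing here is taken from the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical policy selector for the two options discussed above. */
enum demo_remount_policy {
	DEMO_REJECT_UNSEEN,  /* fail with -EINVAL if the bit is not yet set */
	DEMO_SET_AND_LOG     /* set the bit and emit an informational message */
};

/*
 * Apply a remount that needs the feature bit(s) in 'needed'.
 * Returns 0 on success or -EINVAL if the policy rejects the remount.
 */
static int demo_remount_compress(uint64_t *incompat_flags, uint64_t needed,
				 enum demo_remount_policy policy)
{
	if ((*incompat_flags & needed) == needed)
		return 0;               /* the volume already carries the bit */

	if (policy == DEMO_REJECT_UNSEEN)
		return -EINVAL;

	*incompat_flags |= needed;
	/* Stands in for a KERN_INFO message to syslog. */
	fprintf(stderr, "demo: setting incompat feature flag 0x%llx\n",
		(unsigned long long)needed);
	return 0;
}
```

Under the second policy the volume becomes unmountable by kernels that lack the feature as soon as the remount succeeds, which is presumably why a log message was suggested.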
Re: [PATCH v2] Btrfs: allow mount -o remount,compress=no
On Thu, Jun 28, 2012 at 10:40 AM, David Sterba d...@jikos.cz wrote:
On Tue, Jun 26, 2012 at 08:48:37AM +0200, Arnd Hannemann wrote:

How should we proceed to get the above mentioned patch (or the similar patch from Andrei Popa) merged?

Josef picked the patch into btrfs-next, I see no problem with including it in the next merge window patchset.

I was testing the lz4(hc) patches, and I found that the compression INCOMPAT flags are not being updated using the method in this patch.

The compression INCOMPAT flags are generally checked and updated in the open_ctree() function. But, on remount, open_ctree() is not called.

I was going to test a patch to update the INCOMPAT flags similar to the way lzo INCOMPAT is updated when specifying the compress method in defragmentation.

http://kerneltrap.org/mailarchive/linux-btrfs/2010/11/18/6886194

But, let me know if it is preferred to just return -EINVAL when trying to remount with a compression method that has an INCOMPAT not yet seen by that volume.
Re: cannot remove files: rm gives no space left on device, 3.2.0-24, ubuntu
On Sun, Jun 17, 2012 at 3:04 AM, rupert THURNER rupert.thur...@gmail.com wrote:
On Sun, Jun 17, 2012 at 7:19 AM, Andrei Popa ierd...@gmail.com wrote:
On Sun, 2012-06-17 at 06:14 +0200, rupert THURNER wrote:

Will result in anything reported in 'dmesg' output?

[ 6431.514454] device label 388gb-data devid 1 transid 1086 /dev/sda6
[ 6431.514969] btrfs: disabling disk space caching
[ 6431.514977] btrfs: force clearing of disk cache

tried the same with kernel versions from http://kernel.ubuntu.com/~kernel-ppa/mainline/:
* 3.2.20
* 3.4.0

with version 3.4.0, i could delete one tiny file, but only one. peter mentioned before to run the rm as root. yes, i did that, with all kernel versions, the error was the same all the time.

Have you tried to delete the files with echo > file ? This will empty the file without requiring a new metadata allocation.

thanks for the hint! i did with the original kernel, but now i tried it as root and with the 3.4.0 kernel as well. no space left on device. is there a special kernel version or a special btrfs tool which allows to remove a file without writing more data?

Have you tried mounting with '-o nodatacow' yet?
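The suggestion earlier in this thread to empty a file rather than unlink it relies on the shell opening the target with truncation. A minimal C sketch of that truncation step follows. It uses only the POSIX open() and close() calls, and the function name demo_empty_file is hypothetical. Whether this gets around ENOSPC on a full btrfs volume is the claim made in the thread, not something the sketch demonstrates, and the original poster reported that it did not help in his case.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Empty an existing file in place by opening it with O_TRUNC.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int demo_empty_file(const char *path)
{
	int fd = open(path, O_WRONLY | O_TRUNC);

	if (fd < 0)
		return -1;
	return close(fd);
}
```

Unlike the shell's echo redirection, this leaves the file at zero length, since echo also writes a newline after truncating.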