BTRFS - Write Barriers
Hi everyone :-) Just one question for the gurus here.

I was wondering: if I disable write barriers in btrfs with the mount option nobarrier, do I just disable the periodic flushes of the hardware disk cache, or am I also disabling the ordering of the writes directed to the hard disk?

What I mean is: is it safe to disable write barriers with a UPS, with which the hardware will likely stay powered even in the event of a kernel crash, freeze, etc.?

I'm asking because, if the ordering of the writes is no longer guaranteed either, I guess it would not be safe to disable write barriers even if the possibility of an unexpected power-down of the HD was remote, because in the case of a crash the order of the writes would be messed up anyway and we could boot up with a completely corrupted fs.

Thank you very much for your kind answers.

Warm Regards,
Mario
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS - Write Barriers
fugazzi® posted on Thu, 31 Dec 2015 09:01:51 + as excerpted:

> Just one question for the gurus here.
>
> I was wondering: if I disable write barriers in btrfs with the mount
> option nobarrier I just disable the periodic flushes of the hardware
> disk cache or I'm disabling also the order of the writes directed to
> the hard disk?
>
> What I mean is: is it safe to disable write barrier with a UPS with
> which I will likely have the hardware always powered even in the event
> of a kernel crash, freeze, etc?
>
> I'm asking because if also the ordering of the write is no more
> guaranteed I guess it would not be safe to disable write barrier even
> if the possibility of an unexpected power down of the HD was remote
> because in the case of a crash the order of the write would be messed
> up anyway and we could boot up with a completely corrupted fs.

From the wiki mount options page:
https://btrfs.wiki.kernel.org/index.php/Mount_options

> nobarrier
> Do not use device barriers. NOTE: Using this option greatly increases
> the chances of you experiencing data corruption during a system crash
> or a power failure. This means full file-system corruption, and not
> just losing or corrupting data that was being written during the event.

IOW, use at your own risk.

In theory you should be fine if you *NEVER* have a system crash (it's not just loss of power!), but btrfs itself is still "stabilizING, not yet fully stable and mature", and it can itself crash the system occasionally. Normally that's OK, as the atomic-transaction nature of btrfs, plus the fact that it crashes or at minimum forces itself read-only if it thinks something's wrong, will usually save the filesystem from too much damage. But if it happens to get into that state with nobarrier set, then all bets are off, because the normally atomic transactions that would either all be there or not be there at all are no longer atomic, and who knows /what/ sort of state you'll be in when you reboot.

So while it's there for the foolish, and perhaps the slightly less foolish once the filesystem fully stabilizes and matures, right now you really /are/ playing Russian roulette with your data if you turn it on.

I'd personally recommend staying about as far away from it as you do the uninsulated live service mains coming into your building... on the SUPPLY side of the voltage stepdown transformer! And for those that don't do that, well, there's Darwin awards.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
BTRFS File trace from superblock using memory dumps
Respected sir,

I have been researching how btrfs works and manages files for a long time. What I want to achieve here is to trace the path of a file starting from the superblock, then to the root, and so on. The problem is that I don't know how to do it. I have been using dd and hexdump, but I don't know what to look for or where to look for it. I have been able to see some fragments of the superblock at 64K, but I don't know how to use it to trace the tree of tree roots node or its object_id. Any help is appreciated.

Thanks
Re: BTRFS - Write Barriers
Thanks Duncan for your answer.

Yes, I read that part of the wiki, and since I saw "system crash" there I just wanted to be sure, because other filesystems such as XFS only ever mention the power-loss aspect of barriers.

The thing is, since the barrier machinery was removed from the elevator and moved into the filesystem layer in the form of flush, FUA, etc., I suspected it is no longer safe to disable write barriers even if you have a battery-backed HD hardware cache. It could be that, since the fs no longer issues flush commands to the elevator (in the barrier-disabled case), and the elevator's own barrier code has now been removed, the elevator could reorder even those precious writes, destroying the atomic ordering of the fs and, as a consequence, its integrity, even in the case where the HD hardware cache never experiences a power loss.

That said, my guess was that it is now no longer safe to disable write barriers on any filesystem, regardless of its age and supposed stability. The interaction between the kernel disk scheduler and a filesystem without barriers enabled could be unpredictable, because the disk scheduler would now be authorized to change the order of any write to the underlying hardware, not only of those connected with the write barriers.

Did I understand it correctly?

Regards.

On Thursday, December 31, 2015 9:34:24 AM WET Duncan wrote:
> fugazzi® posted on Thu, 31 Dec 2015 09:01:51 + as excerpted:
> > Just one question for the gurus here.
> >
> > I was wondering: if I disable write barriers in btrfs with the mount
> > option nobarrier I just disable the periodic flushes of the hardware
> > disk cache or I'm disabling also the order of the writes directed to
> > the hard disk?
> >
> > What I mean is: is it safe to disable write barrier with a UPS with
> > which I will likely have the hardware always powered even in the
> > event of a kernel crash, freeze, etc?
> >
> > I'm asking because if also the ordering of the write is no more
> > guaranteed I guess it would not be safe to disable write barrier even
> > if the possibility of an unexpected power down of the HD was remote
> > because in the case of a crash the order of the write would be messed
> > up anyway and we could boot up with a completely corrupted fs.
>
> From the wiki mount options page:
> https://btrfs.wiki.kernel.org/index.php/Mount_options
>
> > nobarrier
> > Do not use device barriers. NOTE: Using this option greatly increases
> > the chances of you experiencing data corruption during a system crash
> > or a power failure. This means full file-system corruption, and not
> > just losing or corrupting data that was being written during the
> > event.
>
> IOW, use at your own risk.
>
> In theory you should be fine if you *NEVER* have a system crash (it's
> not just loss of power!), but as btrfs itself is still "stabilizING,
> not yet fully stable and mature", it can itself crash the system
> occasionally, and while that's normally OK as the atomic transaction
> nature of btrfs and the fact that it crashes or at minimum forces
> itself to read-only if it thinks something's wrong will normally save
> the filesystem from too much damage, if it happens to get into that
> state with nobarrier set, then all bets are off, because the normally
> atomic transactions that would either all be there or not be there at
> all are no longer atomic, and who knows /what/ sort of state you'll be
> in when you reboot.
>
> So while it's there for the foolish, and perhaps the slightly less
> foolish once the filesystem fully stabilizes and matures, right now,
> you really /are/ playing Russian roulette with your data if you turn
> it on.
>
> I'd personally recommend staying about as far away from it as you do
> the uninsulated live service mains coming into your building... on the
> SUPPLY side of the voltage stepdown transformer! And for those that
> don't do that, well, there's Darwin awards.
Re: BTRFS File trace from superblock using memory dumps
On Thu, Dec 31, 2015 at 10:20:17AM +, Ahtisham wani wrote:
> Respected sir,
> I have been researching on how btrfs works and manages files for a
> long time. What I want to achieve here is to trace a path of a file
> starting from superblock, then to root and so on. The problem is I
> dont know how to do it. I have been using dd and hexdump but I dont
> know what to look for and where to look for. I have been able to see
> some fragments of superblock at 64K but I dont know how to use it to
> trace the tree of tree roots node or its object_id. Any help is
> appreciated. Thanks

Start here:
https://btrfs.wiki.kernel.org/index.php/Data_Structures

This will give you the basic high-level data structures. You can explore those data structures fairly easily using btrfs-debug-tree.

After that, you'll mostly have to start reading the code a little. fs/btrfs/ctree.h in the kernel sources is the place to get all of the data structures you'll need. Those will tell you the layout of the data items.

To get from the superblock to the diagram on the data structures page, the first thing you'll need to do is read the list of system chunks at the end of the superblock. Those chunks contain the chunk tree, which contains the mapping from physical device addresses to internal (virtual) addresses. Everything else is done in terms of those virtual addresses. Once you have the chunk tree, you can start using the other addresses in the superblock to find the tree of tree roots, and then follow that into the other trees (at which point, you can start using the data structures page).

Hugo.

-- 
Hugo Mills | Comic Sans goes into a bar, and the barman says, "We
hugo@... carfax.org.uk | don't serve your type here."
http://carfax.org.uk/ | PGP: E2AB1DE4
subvolume capacity
Hello,

I create a subvolume, assign a quota to it, and mount the subvolume; the capacity reported for that mount point is not the assigned quota but the total btrfs filesystem capacity.

I would rather expect it to report the assigned space (unless there is no quota assigned, obviously). That would fit perfectly in a multi-tenant environment, and also when using btrfs for virtual-machine datastores, etc. Otherwise it will be really confusing, and sometimes problematic, when exporting subvolumes through NFS/CIFS/etc.

Best regards,
Xavier Romero
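For reference, the setup being described looks roughly like the following btrfs-progs commands (the mount point and subvolume names are hypothetical, and quota support must be enabled on the filesystem first):

```shell
# Hypothetical paths; run as root on an existing btrfs mount.
btrfs quota enable /mnt/btrfs
btrfs subvolume create /mnt/btrfs/tenant1
btrfs qgroup limit 10G /mnt/btrfs/tenant1

# df reports the full filesystem capacity here, not the 10G limit,
# which is the behavior being questioned above.
df -h /mnt/btrfs/tenant1
```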
Re: quota rescan hangs
Xavier Romero posted on Thu, 31 Dec 2015 10:09:22 + as excerpted:

> Using BTRFS on CentOS 7, I get the filesystem hung by running btrfs
> quota rescan /mnt/btrfs/
>
> After that I could not access to the filesystem anymore until system
> restart.
>
> Additional Info:
>
> [root@nex-dstrg-ctrl-1 ~]# modinfo btrfs
> filename:
> /lib/modules/3.10.0-327.3.1.el7.x86_64/kernel/fs/btrfs/btrfs.ko
> btrfs-progs v3.19.1
>
> I'm just starting with BTRFS so I could be doing something wrong! Any
> ideas?

[OK, the below lays it on kinda thick. I'm not saying your choice of centos /or/ of btrfs is bad, only that if your interest is in something so old and stal^hble as centos with its 3.10 kernel, then you really should reconsider whether btrfs is an appropriate choice for you, because it seems to be a seriously bad mismatch. The answer to your actual question is below that discussion.]

First thing wrong; here's a quote from the Stability Status section right on the front page of the btrfs wiki:
https://btrfs.wiki.kernel.org/index.php/Main_Page

> The Btrfs code base is under heavy development. Every effort is being
> made to keep it stable and fast. Due to the fast development speed, the
> state of development of the filesystem improves noticeably with every
> new Linux version, so it's recommended to run the most modern kernel
> possible.

And here's what the getting started page says:
https://btrfs.wiki.kernel.org/index.php/Getting_started

> btrfs is a fast-moving target. There are typically a great many bug
> fixes and enhancements between one kernel release and the next.
> Therefore: If you have btrfs filesystems, run the latest kernel. If you
> are running a kernel two or more versions behind the latest one
> available from kernel.org, the first thing you will be asked to do when
> you report a problem is to upgrade to the latest kernel. Some
> distributions keep backports of recent kernels to earlier releases --
> see the page below for details.
>
> Having the latest user-space tools is also useful, as they contain
> additional features and tools which may be of use in debugging or
> recovering your filesystem if something goes wrong.

Centos, running kernel 3.10, is anything *BUT* "the latest kernel". With five release cycles a year, 4.0 being 10 release cycles beyond 3.10, and 4.4 very near release, 3.10 is now nearing three years old! Further, btrfs didn't even have the experimental sticker peeled off until IIRC 3.12 or so, so btrfs in 3.10 isn't just nearly three years outdated, it's also still experimental!

OK, so we know that the enterprise distros support btrfs and backport stuff, but only they know what they backported, while we're focused on the mainline kernel here on this list. So while the upstream btrfs and list recommendation is to keep current, you're running what for all we know is a three-year-old experimental btrfs, with who knows what backports. If you want support for that, you really should be asking the distro that says they support it, not the upstream that says it's now ancient history from when the filesystem was still experimental.

Meanwhile, from here, running the still-under-heavy-development, "stabilizing but not yet entirely stable or mature" btrfs on an enterprise distro that runs years-old versions... there seems to be some sort of bad-match incompatibility there. If your emphasis is that old and stable, you really should reconsider whether the still-under-heavy-development btrfs is an appropriate choice for you, or whether a filesystem more suitably stable is more in keeping with your stability needs. One or the other would seem to be the wrong choice, as they're at rather opposite ends of the spectrum and don't work well together.

OK, on to the specific question.

Tho the devs have been and are working very hard on quotas, to date (4.3 release kernel) they've never worked entirely correctly or reliably in btrfs. My recommendation has always been that, unless you're working with the devs on the latest version to help test, find, and fix problems (and if you are, thanks!), you either need quota functionality or you don't. Since quotas have never worked reliably in btrfs, if you need that functionality, you really need to be on a filesystem where it's much more stable and reliable than that function has been on btrfs. OTOH, if you don't need quota functionality, then I strongly recommend turning it off and leaving it off until at least two kernel cycles have gone by with it working with no stability-level issues.

Tho I'm not a dev, only a btrfs user and list regular, and my own use-case doesn't need quotas, so given their problems I've kept them off, and I'm not actually sure what the 4.4 status is. However, even if there are no known problems with btrfs quotas in 4.4, given the history, as I said above, I strongly recommend not enabling them until at least two complete kernel cycles have completed with no quota issues.
Re: RAID10 question
Hugo Mills posted on Thu, 31 Dec 2015 11:51:53 + as excerpted:

> On Thu, Dec 31, 2015 at 09:52:16AM +, Xavier Romero wrote:
>> Hello,
>>
>> I have 2 completely independent set of 12 disks each, let's name them
>> A1, A2, A3... A12 for first set, and B1, B2, B3...B12 for second set.
>> For availability purposes I want disks to be paired that way:
>> A1 <--> B1: RAID1
>> A2 <--> B2: RAID1
>> ...
>> A12 <--> B12: RAID1
>>
>> And then I want a RAID0 out of all these RAID1.
>>
>> I know I can achieve that by doing all the RAID1 with MD and then
>> build the RAID0 with BTRFS. But my question is: can I achieve that
>> directly with BTRFS RAID10?
>
> No, not at the moment.

Additionally, if you're going to put btrfs on mdraid, then you may wish to consider reversing the above, doing raid01, which, while ordinarily discouraged in favor of raid10, has some things going for it when the top layer is btrfs that raid10 doesn't.

The btrfs feature in question here is data and metadata checksumming and file-integrity verification. Btrfs normally checksums all data and metadata and verifies checksums at read time, but when there's only one copy, as is the case with btrfs single and raid0 modes, if there's a checksum-verify failure, all it can do is report it and fail the read. If however there's a second copy, as there is with btrfs raid1, then a checksum failure on the first copy will automatically fail over to trying the second. Assuming the second copy is good, it will use that instead of failing the read, and btrfs scrub can be used to systematically scrub the entire filesystem, detecting bad copies (in single/raid0 mode) or repairing them (in raid1/10 mode, when the other copy is good).

Mdraid doesn't have that sort of integrity verification. All it does with raid1 scrub is check that the copies agree, and pick an arbitrary copy to replace the other one with if they don't. But for all it or you know, it could be replacing the good copy with the bad one, since it has no checksum verification to tell which is actually the good copy.

If that sort of data-integrity verification and repair is of interest to you, you obviously want btrfs raid1, not mdraid1. But btrfs, as the filesystem, must be the top layer. So while raid10 is normally preferred over raid01, in this case you may want to do raid01, putting the btrfs raid1 on top of the mdraid0.

Unfortunately that won't let you do a1 <-> b1, a2 <-> b2, etc. But it will let you do a[1-6] <-> b[1-6], if that's good enough for your use-case. IOW, you have to choose between btrfs raid1 with data-integrity repair on top, with only two mdraid0s underneath, or btrfs raid0, with only data-integrity detection, not repair, on top, and a bunch of mdraid1s that don't have data integrity at all, underneath.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
[PATCH 00/10] btrfs: reada: Avoid many times of empty loop
This is some cleanup, bugfixes and enhancements for reada, tested by a script running scrub together with the related trace logs.

Zhao Lei (10):
  btrfs: reada: Avoid many times of empty loop
  btrfs: reada: Move is_need_to_readahead condition earlier
  btrfs: reada: add all reachable mirrors into reada device list
  btrfs: reada: bypass adding extent when all zone failed
  btrfs: reada: Remove level argument in several functions
  btrfs: reada: move reada_extent_put() to place after __readahead_hook()
  btrfs: reada: Pass reada_extent into __readahead_hook() directly
  btrfs: reada: Use fs_info instead of root in __readahead_hook's argument
  btrfs: reada: Jump into cleanup in direct way for __readahead_hook()
  btrfs: reada: Fix a debug code typo

 fs/btrfs/ctree.h   |   4 +-
 fs/btrfs/disk-io.c |  22 +++---
 fs/btrfs/reada.c   | 173 ++++++++++++++++-----------------------------
 3 files changed, 98 insertions(+), 101 deletions(-)

-- 
1.8.5.1
[PATCH 03/10] btrfs: reada: add all reachable mirrors into reada device list
If some device is not reachable, we should bypass it and continue
adding the next one, instead of breaking out on the bad device.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 20 +++++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index dcc5b69..7733a09 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -328,7 +328,6 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 	u64 length;
 	int real_stripes;
 	int nzones = 0;
-	int i;
 	unsigned long index = logical >> PAGE_CACHE_SHIFT;
 	int dev_replace_is_ongoing;
@@ -380,9 +379,9 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 		dev = bbio->stripes[nzones].dev;
 		zone = reada_find_zone(fs_info, dev, logical, bbio);
 		if (!zone)
-			break;
+			continue;
 
-		re->zones[nzones] = zone;
+		re->zones[re->nzones++] = zone;
 		spin_lock(&zone->lock);
 		if (!zone->elems)
 			kref_get(&zone->refcnt);
@@ -392,8 +391,7 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 		kref_put(&zone->refcnt, reada_zone_release);
 		spin_unlock(&fs_info->reada_lock);
 	}
-	re->nzones = nzones;
-	if (nzones == 0) {
+	if (re->nzones == 0) {
 		/* not a single zone found, error and out */
 		goto error;
 	}
@@ -418,8 +416,9 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 	prev_dev = NULL;
 	dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(
 			&fs_info->dev_replace);
-	for (i = 0; i < nzones; ++i) {
-		dev = bbio->stripes[i].dev;
+	for (nzones = 0; nzones < re->nzones; ++nzones) {
+		dev = re->zones[nzones]->device;
+
 		if (dev == prev_dev) {
 			/*
 			 * in case of DUP, just add the first zone. As both
@@ -450,8 +449,8 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 		prev_dev = dev;
 		ret = radix_tree_insert(&dev->reada_extents, index, re);
 		if (ret) {
-			while (--i >= 0) {
-				dev = bbio->stripes[i].dev;
+			while (--nzones >= 0) {
+				dev = re->zones[nzones]->device;
 				BUG_ON(dev == NULL);
 				/* ignore whether the entry was inserted */
 				radix_tree_delete(&dev->reada_extents, index);
@@ -470,10 +469,9 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 	return re;
 
 error:
-	while (nzones) {
+	for (nzones = 0; nzones < re->nzones; ++nzones) {
 		struct reada_zone *zone;
 
-		--nzones;
 		zone = re->zones[nzones];
 		kref_get(&zone->refcnt);
 		spin_lock(&zone->lock);
-- 
1.8.5.1
[PATCH 09/10] btrfs: reada: Jump into cleanup in direct way for __readahead_hook()
Current code sets nritems to 0 to make the for-loop do nothing as a way
of bypassing it, and sets generation to a value that is never needed.
Jumping directly to the cleanup label is a better choice.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 869bb1c..902f899 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -130,26 +130,26 @@ static void __readahead_hook(struct btrfs_fs_info *fs_info,
 	re->scheduled_for = NULL;
 	spin_unlock(&re->lock);
 
-	if (err == 0) {
-		nritems = level ? btrfs_header_nritems(eb) : 0;
-		generation = btrfs_header_generation(eb);
-		/*
-		 * FIXME: currently we just set nritems to 0 if this is a leaf,
-		 * effectively ignoring the content. In a next step we could
-		 * trigger more readahead depending from the content, e.g.
-		 * fetch the checksums for the extents in the leaf.
-		 */
-	} else {
-		/*
-		 * this is the error case, the extent buffer has not been
-		 * read correctly. We won't access anything from it and
-		 * just cleanup our data structures. Effectively this will
-		 * cut the branch below this node from read ahead.
-		 */
-		nritems = 0;
-		generation = 0;
-	}
+	/*
+	 * this is the error case, the extent buffer has not been
+	 * read correctly. We won't access anything from it and
+	 * just cleanup our data structures. Effectively this will
+	 * cut the branch below this node from read ahead.
+	 */
+	if (err)
+		goto cleanup;
+	/*
+	 * FIXME: currently we just set nritems to 0 if this is a leaf,
+	 * effectively ignoring the content. In a next step we could
+	 * trigger more readahead depending from the content, e.g.
+	 * fetch the checksums for the extents in the leaf.
+	 */
+	if (!level)
+		goto cleanup;
+
+	nritems = btrfs_header_nritems(eb);
+	generation = btrfs_header_generation(eb);
 	for (i = 0; i < nritems; i++) {
 		struct reada_extctl *rec;
 		u64 n_gen;
@@ -188,6 +188,8 @@ static void __readahead_hook(struct btrfs_fs_info *fs_info,
 			reada_add_block(rc, bytenr, &next_key, n_gen);
 		}
 	}
+
+cleanup:
 	/*
 	 * free extctl records
 	 */
-- 
1.8.5.1
[PATCH 05/10] btrfs: reada: Remove level argument in several functions
level is not used in several functions, so remove it from their
arguments, along with the code that computed its value.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ef9457e..66409f3 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -101,7 +101,7 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info);
 static void __reada_start_machine(struct btrfs_fs_info *fs_info);
 
 static int reada_add_block(struct reada_control *rc, u64 logical,
-			   struct btrfs_key *top, int level, u64 generation);
+			   struct btrfs_key *top, u64 generation);
 
 /* recurses */
 /* in case of err, eb might be NULL */
@@ -197,8 +197,7 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 			if (rec->generation == generation &&
 			    btrfs_comp_cpu_keys(&key, &rc->key_end) < 0 &&
 			    btrfs_comp_cpu_keys(&next_key, &rc->key_start) > 0)
-				reada_add_block(rc, bytenr, &next_key,
-						level - 1, n_gen);
+				reada_add_block(rc, bytenr, &next_key, n_gen);
 		}
 	}
 	/*
@@ -315,7 +314,7 @@ static struct reada_zone *reada_find_zone(struct btrfs_fs_info *fs_info,
 
 static struct reada_extent *reada_find_extent(struct btrfs_root *root,
 					      u64 logical,
-					      struct btrfs_key *top, int level)
+					      struct btrfs_key *top)
 {
 	int ret;
 	struct reada_extent *re = NULL;
@@ -562,13 +561,13 @@ static void reada_control_release(struct kref *kref)
 }
 
 static int reada_add_block(struct reada_control *rc, u64 logical,
-			   struct btrfs_key *top, int level, u64 generation)
+			   struct btrfs_key *top, u64 generation)
 {
 	struct btrfs_root *root = rc->root;
 	struct reada_extent *re;
 	struct reada_extctl *rec;
 
-	re = reada_find_extent(root, logical, top, level); /* takes one ref */
+	re = reada_find_extent(root, logical, top); /* takes one ref */
 	if (!re)
 		return -1;
 
@@ -921,7 +920,6 @@ struct reada_control *btrfs_reada_add(struct btrfs_root *root,
 	struct reada_control *rc;
 	u64 start;
 	u64 generation;
-	int level;
 	int ret;
 	struct extent_buffer *node;
 	static struct btrfs_key max_key = {
@@ -944,11 +942,10 @@ struct reada_control *btrfs_reada_add(struct btrfs_root *root,
 
 	node = btrfs_root_node(root);
 	start = node->start;
-	level = btrfs_header_level(node);
 	generation = btrfs_header_generation(node);
 	free_extent_buffer(node);
 
-	ret = reada_add_block(rc, start, &max_key, level, generation);
+	ret = reada_add_block(rc, start, &max_key, generation);
 	if (ret) {
 		kfree(rc);
 		return ERR_PTR(ret);
-- 
1.8.5.1
[PATCH 02/10] btrfs: reada: Move is_need_to_readahead condition earlier
Move the is_need_to_readahead condition earlier, to avoid a useless
loop gathering the data needed for readahead.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 20 +++++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index fb21bf0..dcc5b69 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -665,7 +665,6 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
 	u64 logical;
 	int ret;
 	int i;
-	int need_kick = 0;
 
 	spin_lock(&fs_info->reada_lock);
 	if (dev->reada_curr_zone == NULL) {
@@ -701,6 +700,15 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
 
 	spin_unlock(&fs_info->reada_lock);
 
+	spin_lock(&re->lock);
+	if (re->scheduled_for || list_empty(&re->extctl)) {
+		spin_unlock(&re->lock);
+		reada_extent_put(fs_info, re);
+		return 0;
+	}
+	re->scheduled_for = dev;
+	spin_unlock(&re->lock);
+
 	/*
 	 * find mirror num
 	 */
@@ -712,18 +720,8 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
 	}
 	logical = re->logical;
 
-	spin_lock(&re->lock);
-	if (!re->scheduled_for && !list_empty(&re->extctl)) {
-		re->scheduled_for = dev;
-		need_kick = 1;
-	}
-	spin_unlock(&re->lock);
-	reada_extent_put(fs_info, re);
-	if (!need_kick)
-		return 0;
-
 	atomic_inc(&dev->reada_in_flight);
 	ret = reada_tree_block_flagged(fs_info->extent_root, logical,
 			mirror_num, &eb);
-- 
1.8.5.1
RAID10 question
Hello,

I have 2 completely independent sets of 12 disks each; let's name them A1, A2, A3... A12 for the first set, and B1, B2, B3... B12 for the second set. For availability purposes I want disks to be paired that way:

A1 <--> B1: RAID1
A2 <--> B2: RAID1
...
A12 <--> B12: RAID1

And then I want a RAID0 out of all these RAID1.

I know I can achieve that by doing all the RAID1 with MD and then building the RAID0 with BTRFS. But my question is: can I achieve that directly with BTRFS RAID10?

Best regards,
Xavier Romero
RE: quota rescan hangs
After restarting and removing all data, "qgroup show" tells me that a rescan is in progress, but "quota rescan -s" tells me the opposite.

[root@nex-dstrg-ctrl-1 btrfs]# btrfs quota rescan -w /mnt/btrfs/
quota rescan started
[root@nex-dstrg-ctrl-1 btrfs]# btrfs quota rescan -s /mnt/btrfs/
no rescan operation in progress
[root@nex-dstrg-ctrl-1 btrfs]# btrfs qgroup show -pcre /mnt/btrfs/
WARNING: Rescan is running, qgroup data may be incorrect
qgroupid  rfer      excl      max_rfer  max_excl  parent  child
--------  ----      ----      --------  --------  ------  -----
0/5       16.00KiB  16.00KiB  0.00B     0.00B     ---     ---
0/389     0.00B     0.00B     0.00B     0.00B     ---     ---
0/391     0.00B     0.00B     10.00GiB  0.00B     ---     ---
0/525     9.99GiB   0.00B     0.00B     0.00B     ---     ---
0/599     16.00KiB  16.00KiB  2.00GiB   0.00B     ---     ---
[root@nex-dstrg-ctrl-1 btrfs]# btrfs sub list /mnt/btrfs/
ID 599 gen 10754 top level 5 path CLOUD_SSD_02

Not sure how to proceed!

-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On behalf of Xavier Romero
Sent: Thursday, 31 December 2015 11:09
To: linux-btrfs@vger.kernel.org
Subject: quota rescan hangs

Hello,

Using BTRFS on CentOS 7, I get the filesystem hung by running
btrfs quota rescan /mnt/btrfs/

After that I could not access the filesystem anymore until system restart. I did the scan because BTRFS suggested it:

[root@nex-dstrg-ctrl-1 btrfs]# btrfs qgroup show -F /mnt/btrfs/
WARNING: Qgroup data inconsistent, rescan recommended
qgroupid  rfer      excl
0/5       27.91GiB  27.91GiB

[root@nex-dstrg-ctrl-1 ~]# btrfs qgroup show -pcre /mnt/btrfs/
WARNING: Qgroup data inconsistent, rescan recommended
qgroupid  rfer      excl      max_rfer  max_excl  parent  child
--------  ----      ----      --------  --------  ------  -----
0/5       27.91GiB  27.91GiB  0.00B     0.00B     ---     ---
0/389     9.77GiB   9.77GiB   0.00B     0.00B     ---     ---
0/391     16.00KiB  16.00KiB  10.00GiB  0.00B     ---     ---
0/525     9.99GiB   0.00B     0.00B     0.00B     ---     ---
0/599     16.00KiB  16.00KiB  2.00GiB   0.00B     ---     ---

dmesg output:
[154385.628385] INFO: task btrfs:22324 blocked for more than 120 seconds.
[154385.629044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[154385.629648] btrfs D 881fc3bdef20 0 22324 3677 0x0080 [154385.629652] 88114d01bc60 0082 880e2fecae00 88114d01bfd8 [154385.629656] 88114d01bfd8 88114d01bfd8 880e2fecae00 881fc3be1100 [154385.629658] 883fc826c9f0 883fc826c9f0 0001 881fc3bdef20 [154385.629661] Call Trace: [154385.629673] [] schedule+0x29/0x70 [154385.629690] [] wait_current_trans.isra.20+0xe7/0x130 [btrfs] [154385.629696] [] ? wake_up_atomic_t+0x30/0x30 [154385.629704] [] start_transaction+0x2b8/0x5a0 [btrfs] [154385.629711] [] btrfs_join_transaction+0x17/0x20 [btrfs] [154385.629722] [] btrfs_qgroup_rescan+0x39/0x90 [btrfs] [154385.629731] [] btrfs_ioctl+0x20b2/0x2b70 [btrfs] [154385.629736] [] ? __mem_cgroup_commit_charge+0x152/0x390 [154385.629740] [] ? lru_cache_add+0xe/0x10 [154385.629745] [] ? page_add_new_anon_rmap+0x91/0x130 [154385.629750] [] ? handle_mm_fault+0x7c0/0xf50 [154385.629752] [] ? __vma_link_rb+0xb8/0xe0 [154385.629759] [] do_vfs_ioctl+0x2e5/0x4c0 [154385.629765] [] ? file_has_perm+0xae/0xc0 [154385.629769] [] ? 
__do_page_fault+0xb1/0x450 [154385.629771] [] SyS_ioctl+0xa1/0xc0 [154385.629776] [] system_call_fastpath+0x16/0x1b Additional Info: [root@nex-dstrg-ctrl-1 ~]# modinfo btrfs filename: /lib/modules/3.10.0-327.3.1.el7.x86_64/kernel/fs/btrfs/btrfs.ko license:GPL alias: devname:btrfs-control alias: char-major-10-234 alias: fs-btrfs rhelversion:7.2 srcversion: B92059408E7CB90AE2D9A2F depends:raid6_pq,xor,zlib_deflate intree: Y vermagic: 3.10.0-327.3.1.el7.x86_64 SMP mod_unload modversions signer: CentOS Linux kernel signing key sig_key:3D:4E:71:B0:42:9A:39:8B:8B:78:3B:6F:8B:ED:3B:AF:09:9E:E9:A7 sig_hashalgo: sha256 [root@nex-dstrg-ctrl-1 ~]# btrfs filesystem show /mnt/btrfs/ Label: none uuid: 6289b7b3-ba25-4c9e-af95-4a5fb18eeea1 Total devices 12 FS bytes used 29.86GiB devid1 size 894.13GiB used 4.52GiB path /dev/md101 devid2 size 894.13GiB used 4.52GiB path /dev/md102 devid3 size 894.13GiB used 4.52GiB path /dev/md103 devid4 size 894.13GiB used 4.52GiB path /dev/md104 devid5 size 894.13GiB used 4.52GiB path /dev/md105 devid6 size 894.13GiB used 4.52GiB path
quota rescan hangs
Hello,

Using BTRFS on CentOS 7, I get the filesystem hung by running

    btrfs quota rescan /mnt/btrfs/

After that I could not access the filesystem anymore until a system restart.

I did the scan because BTRFS suggested it:

[root@nex-dstrg-ctrl-1 btrfs]# btrfs qgroup show -F /mnt/btrfs/
WARNING: Qgroup data inconsistent, rescan recommended
qgroupid  rfer      excl
0/5       27.91GiB  27.91GiB
[root@nex-dstrg-ctrl-1 ~]# btrfs qgroup show -pcre /mnt/btrfs/
WARNING: Qgroup data inconsistent, rescan recommended
qgroupid  rfer      excl      max_rfer  max_excl  parent  child
--------  --------  --------  --------  --------  ------  -----
0/5       27.91GiB  27.91GiB  0.00B     0.00B     ---     ---
0/389     9.77GiB   9.77GiB   0.00B     0.00B     ---     ---
0/391     16.00KiB  16.00KiB  10.00GiB  0.00B     ---     ---
0/525     9.99GiB   0.00B     0.00B     0.00B     ---     ---
0/599     16.00KiB  16.00KiB  2.00GiB   0.00B     ---     ---

dmesg output:

[154385.628385] INFO: task btrfs:22324 blocked for more than 120 seconds.
[154385.629044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[154385.629648] btrfs  D 881fc3bdef20  0  22324  3677  0x0080
[154385.629652]  88114d01bc60 0082 880e2fecae00 88114d01bfd8
[154385.629656]  88114d01bfd8 88114d01bfd8 880e2fecae00 881fc3be1100
[154385.629658]  883fc826c9f0 883fc826c9f0 0001 881fc3bdef20
[154385.629661] Call Trace:
[154385.629673]  [] schedule+0x29/0x70
[154385.629690]  [] wait_current_trans.isra.20+0xe7/0x130 [btrfs]
[154385.629696]  [] ? wake_up_atomic_t+0x30/0x30
[154385.629704]  [] start_transaction+0x2b8/0x5a0 [btrfs]
[154385.629711]  [] btrfs_join_transaction+0x17/0x20 [btrfs]
[154385.629722]  [] btrfs_qgroup_rescan+0x39/0x90 [btrfs]
[154385.629731]  [] btrfs_ioctl+0x20b2/0x2b70 [btrfs]
[154385.629736]  [] ? __mem_cgroup_commit_charge+0x152/0x390
[154385.629740]  [] ? lru_cache_add+0xe/0x10
[154385.629745]  [] ? page_add_new_anon_rmap+0x91/0x130
[154385.629750]  [] ? handle_mm_fault+0x7c0/0xf50
[154385.629752]  [] ? __vma_link_rb+0xb8/0xe0
[154385.629759]  [] do_vfs_ioctl+0x2e5/0x4c0
[154385.629765]  [] ? file_has_perm+0xae/0xc0
[154385.629769]  [] ? __do_page_fault+0xb1/0x450
[154385.629771]  [] SyS_ioctl+0xa1/0xc0
[154385.629776]  [] system_call_fastpath+0x16/0x1b

Additional Info:

[root@nex-dstrg-ctrl-1 ~]# modinfo btrfs
filename:       /lib/modules/3.10.0-327.3.1.el7.x86_64/kernel/fs/btrfs/btrfs.ko
license:        GPL
alias:          devname:btrfs-control
alias:          char-major-10-234
alias:          fs-btrfs
rhelversion:    7.2
srcversion:     B92059408E7CB90AE2D9A2F
depends:        raid6_pq,xor,zlib_deflate
intree:         Y
vermagic:       3.10.0-327.3.1.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        3D:4E:71:B0:42:9A:39:8B:8B:78:3B:6F:8B:ED:3B:AF:09:9E:E9:A7
sig_hashalgo:   sha256
[root@nex-dstrg-ctrl-1 ~]# btrfs filesystem show /mnt/btrfs/
Label: none  uuid: 6289b7b3-ba25-4c9e-af95-4a5fb18eeea1
        Total devices 12 FS bytes used 29.86GiB
        devid    1 size 894.13GiB used 4.52GiB path /dev/md101
        devid    2 size 894.13GiB used 4.52GiB path /dev/md102
        devid    3 size 894.13GiB used 4.52GiB path /dev/md103
        devid    4 size 894.13GiB used 4.52GiB path /dev/md104
        devid    5 size 894.13GiB used 4.52GiB path /dev/md105
        devid    6 size 894.13GiB used 4.52GiB path /dev/md106
        devid    7 size 894.13GiB used 4.52GiB path /dev/md107
        devid    8 size 894.13GiB used 4.52GiB path /dev/md108
        devid    9 size 894.13GiB used 4.52GiB path /dev/md109
        devid   10 size 894.13GiB used 4.52GiB path /dev/md110
        devid   11 size 894.13GiB used 4.52GiB path /dev/md111
        devid   12 size 894.13GiB used 4.52GiB path /dev/md112
btrfs-progs v3.19.1
[root@nex-dstrg-ctrl-1 ~]# btrfs quota rescan -s /mnt/btrfs/
rescan operation running (current key 0)
[root@nex-dstrg-ctrl-1 ~]# btrfs subvolume list /mnt/btrfs/
ID 389 gen 10663 top level 5 path vol0
ID 391 gen 10514 top level 5 path vol1
ID 599 gen 10664 top level 5 path CLOUD_SSD_02

I'm just starting with BTRFS, so I could be doing something wrong! Any ideas?

Best regards,
Xavier Romero
Re: RAID10 question
On Thu, Dec 31, 2015 at 09:52:16AM +, Xavier Romero wrote:
> Hello,
>
> I have 2 completely independent sets of 12 disks each; let's name them A1, A2,
> A3... A12 for the first set, and B1, B2, B3... B12 for the second set. For
> availability purposes I want the disks to be paired this way:
> A1 <--> B1: RAID1
> A2 <--> B2: RAID1
> ...
> A12 <--> B12: RAID1
>
> And then I want a RAID0 out of all these RAID1.
>
> I know I can achieve that by doing all the RAID1 with MD and then building
> the RAID0 with BTRFS. But my question is: can I achieve that directly with
> BTRFS RAID10?

No, not at the moment.

   Hugo.

-- 
Hugo Mills             | Comic Sans goes into a bar, and the barman says, "We
hugo@... carfax.org.uk | don't serve your type here."
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

signature.asc
Description: Digital signature
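For anyone scripting the MD-plus-btrfs layering that Hugo confirms is currently the only way to get this exact pairing, a rough sketch follows. All device names (/dev/sdA$i, /dev/sdB$i, /dev/mdN) are hypothetical placeholders, and the commands are printed rather than executed, so the pairing scheme is visible without touching any disks:

```shell
# Dry-run sketch of the A_i <--> B_i pairing: 12 MD RAID1 pairs, then one
# btrfs filesystem striping (raid0) across all of them. Device names are
# placeholders; each command is echoed, not run.
plan_layout() {
    for i in $(seq 1 12); do
        echo "mdadm --create /dev/md$i --level=1 --raid-devices=2 /dev/sdA$i /dev/sdB$i"
    done
    # raid0 for data; metadata could equally be raid1 across the MD pairs.
    echo "mkfs.btrfs -d raid0 -m raid0 $(seq -f '/dev/md%g' 1 12 | tr '\n' ' ')"
}
plan_layout
```

Note the trade-off this layering implies: MD, not btrfs, handles mirroring, so btrfs checksum errors cannot be self-healed from the other mirror the way native btrfs raid1 would.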
[PATCH 07/10] btrfs: reada: Pass reada_extent into __readahead_hook directly
reada_start_machine_dev() already has the reada_extent pointer; passing
it into __readahead_hook() directly, instead of searching the radix
tree again, makes the code run faster.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 45 -
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 7015906..7668066 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -105,33 +105,21 @@ static int reada_add_block(struct reada_control *rc, u64 logical,
 
 /* recurses */
 /* in case of err, eb might be NULL */
-static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
-                            u64 start, int err)
+static void __readahead_hook(struct btrfs_root *root, struct reada_extent *re,
+                             struct extent_buffer *eb, u64 start, int err)
 {
         int level = 0;
         int nritems;
         int i;
         u64 bytenr;
         u64 generation;
-        struct reada_extent *re;
         struct btrfs_fs_info *fs_info = root->fs_info;
         struct list_head list;
-        unsigned long index = start >> PAGE_CACHE_SHIFT;
         struct btrfs_device *for_dev;
 
         if (eb)
                 level = btrfs_header_level(eb);
 
-        /* find extent */
-        spin_lock(&fs_info->reada_lock);
-        re = radix_tree_lookup(&fs_info->reada_tree, index);
-        if (re)
-                re->refcnt++;
-        spin_unlock(&fs_info->reada_lock);
-
-        if (!re)
-                return -1;
-
         spin_lock(&re->lock);
         /*
          * just take the full list from the extent. afterwards we
@@ -221,11 +209,11 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
                 reada_extent_put(fs_info, re);  /* one ref for each entry */
         }
-        reada_extent_put(fs_info, re);  /* our ref */
+
         if (for_dev)
                 atomic_dec(&for_dev->reada_in_flight);
-
-        return 0;
+        return;
 }
 
 /*
@@ -235,12 +223,27 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 int btree_readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
                          u64 start, int err)
 {
-        int ret;
+        int ret = 0;
+        struct reada_extent *re;
+        struct btrfs_fs_info *fs_info = root->fs_info;
 
-        ret = __readahead_hook(root, eb, start, err);
+        /* find extent */
+        spin_lock(&fs_info->reada_lock);
+        re = radix_tree_lookup(&fs_info->reada_tree,
+                               start >> PAGE_CACHE_SHIFT);
+        if (re)
+                re->refcnt++;
+        spin_unlock(&fs_info->reada_lock);
+        if (!re) {
+                ret = -1;
+                goto start_machine;
+        }
 
-        reada_start_machine(root->fs_info);
+        __readahead_hook(fs_info, re, eb, start, err);
+        reada_extent_put(fs_info, re);  /* our ref */
+
+start_machine:
+        reada_start_machine(fs_info);
         return ret;
 }
 
@@ -726,9 +729,9 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
         ret = reada_tree_block_flagged(fs_info->extent_root, logical,
                                        mirror_num, &eb);
         if (ret)
-                __readahead_hook(fs_info->extent_root, NULL, logical, ret);
+                __readahead_hook(fs_info->extent_root, re, NULL, logical, ret);
         else if (eb)
-                __readahead_hook(fs_info->extent_root, eb, eb->start, ret);
+                __readahead_hook(fs_info->extent_root, re, eb, eb->start, ret);
 
         if (eb)
                 free_extent_buffer(eb);
-- 
1.8.5.1
[PATCH 04/10] btrfs: reada: bypass adding extent when all zone failed
If adding all dev_zones for a reada_extent fails, the extent has no
chance of ever being selected to run, and it stays in memory forever.

We should bypass such an extent to avoid the above case.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 7733a09..ef9457e 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -330,6 +330,7 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
         int nzones = 0;
         unsigned long index = logical >> PAGE_CACHE_SHIFT;
         int dev_replace_is_ongoing;
+        int have_zone = 0;
 
         spin_lock(&fs_info->reada_lock);
         re = radix_tree_lookup(&fs_info->reada_tree, index);
@@ -461,10 +462,14 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
                         btrfs_dev_replace_unlock(&fs_info->dev_replace);
                         goto error;
                 }
+                have_zone = 1;
         }
         spin_unlock(&fs_info->reada_lock);
         btrfs_dev_replace_unlock(&fs_info->dev_replace);
 
+        if (!have_zone)
+                goto error;
+
         btrfs_put_bbio(bbio);
         return re;
-- 
1.8.5.1
[PATCH 01/10] btrfs: reada: Avoid many times of empty loop
We can see the following loop (1 times) in the trace log:

[   75.416137] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt 88003741e0c0 1 -> 2
[   75.417413] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = 88003741e0c0, refcnt = 2 -> 1
[   75.418611] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt 88003741e0c0 1 -> 2
[   75.419793] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = 88003741e0c0, refcnt = 2 -> 1
[   75.421016] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt 88003741e0c0 1 -> 2
[   75.422324] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = 88003741e0c0, refcnt = 2 -> 1
[   75.423661] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt 88003741e0c0 1 -> 2
[   75.424882] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = 88003741e0c0, refcnt = 2 -> 1
...(1 times)
[  124.101672] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt 88003741e0c0 1 -> 2
[  124.102850] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = 88003741e0c0, refcnt = 2 -> 1
[  124.104008] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt 88003741e0c0 1 -> 2
[  124.105121] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = 88003741e0c0, refcnt = 2 -> 1

Reason:
If more than one user triggers reada on the same extent, the first task
finishes setting up the reada data structures and calls
reada_start_machine() to start, while the second task has only added a
ref_count but has not yet completely added its reada_extctl struct. The
reada_extent therefore cannot finish all its jobs and keeps being
selected in __reada_start_machine(), 1 times in total.

Fix:
For a reada_extent without a job we don't need to run it; just return 0
to let the caller break.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index c65b42f..fb21bf0 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -713,7 +713,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
         logical = re->logical;
 
         spin_lock(&re->lock);
-        if (re->scheduled_for == NULL) {
+        if (!re->scheduled_for && !list_empty(&re->extctl)) {
                 re->scheduled_for = dev;
                 need_kick = 1;
         }
-- 
1.8.5.1
[PATCH 06/10] btrfs: reada: move reada_extent_put to place after __readahead_hook()
We can't release the reada_extent earlier than __readahead_hook(),
because __readahead_hook() still needs to use it; it is necessary to
hold a refcnt so it isn't freed.

Strictly, this is not a problem after my patch named:
  btrfs: reada: Avoid many times of empty loop
which makes the reada_extent at this point include at least one
reada_extctl, and that keeps one additional refcnt on the reada_extent.

But we still want this patch, to keep the logic clean.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 66409f3..7015906 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -722,8 +722,6 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
         }
         logical = re->logical;
 
-        reada_extent_put(fs_info, re);
-
         atomic_inc(&dev->reada_in_flight);
         ret = reada_tree_block_flagged(fs_info->extent_root, logical,
                                        mirror_num, &eb);
@@ -735,6 +733,8 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
         if (eb)
                 free_extent_buffer(eb);
 
+        reada_extent_put(fs_info, re);
+
         return 1;
 }
-- 
1.8.5.1
[PATCH 08/10] btrfs: reada: Use fs_info instead of root in __readahead_hook's argument
What __readahead_hook() needs is exactly fs_info; there is no need to
convert fs_info to root in the caller and convert it back inside
__readahead_hook().

Signed-off-by: Zhao Lei
---
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/disk-io.c | 22 +++---
 fs/btrfs/reada.c   | 23 +++
 3 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 54e7b0d..0912f89 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -4358,8 +4358,8 @@ struct reada_control *btrfs_reada_add(struct btrfs_root *root,
                               struct btrfs_key *start, struct btrfs_key *end);
 int btrfs_reada_wait(void *handle);
 void btrfs_reada_detach(void *handle);
-int btree_readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
-                         u64 start, int err);
+int btree_readahead_hook(struct btrfs_fs_info *fs_info,
+                         struct extent_buffer *eb, u64 start, int err);
 
 static inline int is_fstree(u64 rootid)
 {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 974be09..9d120e4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -604,6 +604,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
         int found_level;
         struct extent_buffer *eb;
         struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
+        struct btrfs_fs_info *fs_info = root->fs_info;
         int ret = 0;
         int reads_done;
 
@@ -629,21 +630,21 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
         found_start = btrfs_header_bytenr(eb);
         if (found_start != eb->start) {
-                btrfs_err_rl(eb->fs_info, "bad tree block start %llu %llu",
-                             found_start, eb->start);
+                btrfs_err_rl(fs_info, "bad tree block start %llu %llu",
+                             found_start, eb->start);
                 ret = -EIO;
                 goto err;
         }
-        if (check_tree_block_fsid(root->fs_info, eb)) {
-                btrfs_err_rl(eb->fs_info, "bad fsid on block %llu",
-                             eb->start);
+        if (check_tree_block_fsid(fs_info, eb)) {
+                btrfs_err_rl(fs_info, "bad fsid on block %llu",
+                             eb->start);
                 ret = -EIO;
                 goto err;
         }
         found_level = btrfs_header_level(eb);
         if (found_level >= BTRFS_MAX_LEVEL) {
-                btrfs_err(root->fs_info, "bad tree block level %d",
-                          (int)btrfs_header_level(eb));
+                btrfs_err(fs_info, "bad tree block level %d",
+                          (int)btrfs_header_level(eb));
                 ret = -EIO;
                 goto err;
         }
@@ -651,7 +652,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
         btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
                                        eb, found_level);
 
-        ret = csum_tree_block(root->fs_info, eb, 1);
+        ret = csum_tree_block(fs_info, eb, 1);
         if (ret) {
                 ret = -EIO;
                 goto err;
         }
@@ -672,7 +673,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 err:
         if (reads_done &&
             test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
-                btree_readahead_hook(root, eb, eb->start, ret);
+                btree_readahead_hook(fs_info, eb, eb->start, ret);
 
         if (ret) {
                 /*
@@ -691,14 +692,13 @@ out:
 static int btree_io_failed_hook(struct page *page, int failed_mirror)
 {
         struct extent_buffer *eb;
-        struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
 
         eb = (struct extent_buffer *)page->private;
         set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
         eb->read_mirror = failed_mirror;
         atomic_dec(&eb->io_pages);
         if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
-                btree_readahead_hook(root, eb, eb->start, -EIO);
+                btree_readahead_hook(eb->fs_info, eb, eb->start, -EIO);
         return -EIO;    /* we fixed nothing */
 }
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 7668066..869bb1c 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -105,15 +105,15 @@ static int reada_add_block(struct reada_control *rc, u64 logical,
 
 /* recurses */
 /* in case of err, eb might be NULL */
-static void __readahead_hook(struct btrfs_root *root, struct reada_extent *re,
-                             struct extent_buffer *eb, u64 start, int err)
+static void __readahead_hook(struct btrfs_fs_info *fs_info,
+                             struct reada_extent *re, struct extent_buffer *eb,
+                             u64 start, int err)
 {
         int level = 0;
         int nritems;
         int i;
         u64 bytenr;
         u64 generation;
-        struct btrfs_fs_info *fs_info = root->fs_info;
         struct list_head list;
         struct btrfs_device *for_dev;
 @@
[PATCH 10/10] btrfs: reada: Fix a debug code typo
Remove one duplicated copy of the loop, fixing the typo in the
zone-iteration debug code.

Signed-off-by: Zhao Lei
---
 fs/btrfs/reada.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 902f899..53ee7b1 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -898,14 +898,9 @@ static void dump_devs(struct btrfs_fs_info *fs_info, int all)
                         printk(KERN_CONT " zone %llu-%llu devs",
                                re->zones[i]->start,
                                re->zones[i]->end);
-                for (i = 0; i < re->nzones; ++i) {
-                        printk(KERN_CONT " zone %llu-%llu devs",
-                               re->zones[i]->start,
-                               re->zones[i]->end);
-                        for (j = 0; j < re->zones[i]->ndevs; ++j) {
-                                printk(KERN_CONT " %lld",
-                                       re->zones[i]->devs[j]->devid);
-                        }
+                        for (j = 0; j < re->zones[i]->ndevs; ++j) {
+                                printk(KERN_CONT " %lld",
+                                       re->zones[i]->devs[j]->devid);
                         }
                 }
                 printk(KERN_CONT "\n");
-- 
1.8.5.1
Re: kernel BUG at fs/btrfs/send.c:1482
On Thu, Dec 31, 2015 at 4:27 PM, Stephen R. van den Berg wrote:
> Stephen R. van den Berg wrote:
>> I'm running 4.4.0-rc7.
>> This exact problem was present on 4.0.5 and 4.3.3 too though.
>
>> I do a "btrfs send /var/lib/lxc/template64/rootfs", that generates
>> the following error consistently at the same file, over and over again:
>
>> Dec 29 14:49:04 argo kernel: kernel BUG at fs/btrfs/send.c:1482!
>
> Ok, found part of the solution.
> The kernel bug was being triggered by symbolic links in that
> subvolume that have an empty target. It is unknown how
> these ever ended up on that partition.

Well, they can happen due to a crash, or due to snapshotting at very
specific points in time when an error happens while creating a symlink,
at least.

I've sent a change for send that makes it not BUG_ON() but instead fail
with an EIO error and print a message to dmesg/syslog telling that an
empty symlink exists:

https://patchwork.kernel.org/patch/7936741/

As for fixing the (very) rare cases where we end up creating empty
symlinks, it's not trivial to fix.

Thanks.

> The partitions have been created using regular btrfs.
> The only strange thing that might have happened is that I ran duperemove
> over those partitions afterward.
> --
> Stephen.

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
btrfs scrub failing
Hi,

I run a weekly scrub, using Marc Merlin's btrfs-scrub script. Usually it
completes without a problem, but this week it failed. I ran the scrub
manually and it stops shortly:

john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md124p2
ERROR: scrubbing /dev/md124p2 failed for device id 1: ret=-1, errno=5 (Input/output error)
scrub device /dev/md124p2 (id 1) canceled
        scrub started at Thu Dec 31 00:26:34 2015 and was aborted after 00:01:29
        data_extents_scrubbed: 110967
        tree_extents_scrubbed: 99638
        data_bytes_scrubbed: 2548817920
        tree_bytes_scrubbed: 1632468992
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 1573
        csum_discards: 74371
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 4729667584
john@mariposa:~$ sudo /sbin/btrfs scrub status /dev/md124p2
scrub status for 9b5a6959-7df1-4455-a643-d369487d24aa
        scrub started at Thu Dec 31 00:29:06 2015, running for 00:01:15
        total bytes scrubbed: 3.46GiB with 0 errors

My Ubuntu 14.04 workstation is using the 4.2 kernel (Wily). I'm using
btrfs-tools v4.3.1. Btrfs is on top of mdadm raid1 (imsm). Autodefrag is
enabled. Both drives have checked out OK on SMART tests. Some
directories are set up with nodatacow for VMs, etc.

john@mariposa:~$ sudo btrfs fi show
Label: none  uuid: 9b5a6959-7df1-4455-a643-d369487d24aa
        Total devices 1 FS bytes used 961.46GiB
        devid    1 size 1.76TiB used 978.04GiB path /dev/md124p2

Funny thing is, if the scrub hadn't failed, I wouldn't know there were
any problems! I've rebooted twice since the original scrub that failed,
without a problem. I've backed up all my files to an ext4 partition,
again without a problem.

I've been searching for a clue on the wiki, mailing list, etc. on how to
fix this, but I'm at a loss. From what I read, I shouldn't even be able
to boot my workstation. How should I go about repairing this? Any help
would be greatly appreciated. Thanks.

-John

BTW, I did run btrfs-find-root at one point and got the following:

john@mariposa:~$ sudo btrfs-find-root /dev/md124p2
Superblock thinks the generation is 1031315
Superblock thinks the level is 1
Found tree root at 1039015591936 gen 1031315 level 1
Well block 1039013101568(gen: 1031314 level: 1) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
Well block 1039003533312(gen: 1031313 level: 1) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
Well block 1039006171136(gen: 1031311 level: 0) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
... 500+ lines skipped ...
Well block 519183810560(gen: 163422 level: 0) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
Well block 143915876352(gen: 38834 level: 0) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
Well block 4243456(gen: 3 level: 0) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
Well block 4194304(gen: 2 level: 0) seems good, but generation/level doesn't match, want gen: 1031315 level: 1
btrfs fail behavior when a device vanishes
This is a torture test; no data is at risk.

Two devices, btrfs raid1 with some stuff on them. Copy from that array,
elsewhere. During the copy, yank the active device. dmesg shows many of
these:

[ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 652123, rd 697237, flush 0, corrupt 0, gen 0

Why are the write errors nearly as high as the read errors, when there
is only a copy from this device happening? Is Btrfs trying to write the
read error count (for dev stats) of sdc1 onto sdc1, and that causes a
write error?

Also, is there a command to make a block device go away? At least in
GNOME Shell, when I eject a USB stick it isn't just umounted; it no
longer appears with lsblk or blkid. So I'm wondering if there's a way to
vanish a misbehaving device so that Btrfs isn't bogged down with a flood
of retries.

In case anyone is curious, the entire dmesg from device insertion,
formatting, mounting, copying to then from, and device yanking is here
(should be permanent): http://pastebin.com/raw/Wfe1pY4N

And the copy did successfully complete anyway, and the resulting files
have the same hashes as their originals. So, yay, despite the noisy
messages.

-- 
Chris Murphy
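On the side question of making a block device disappear: for SCSI/SATA devices the kernel exposes a sysfs delete node that detaches the device without a physical yank, and a host scan node that brings it back. A hedged sketch follows; the device name and host number are hypothetical, and since the writes are destructive to in-flight I/O the commands are only printed here, not executed:

```shell
DEV=sdc   # hypothetical misbehaving device

# Printing rather than running, since this really detaches the device:
printf 'echo 1 > /sys/block/%s/device/delete\n' "$DEV"

# After re-plugging, a host rescan (host number varies) re-detects it:
printf 'echo "- - -" > /sys/class/scsi_host/host0/scan\n'
```

Note this only removes the device from the kernel's view; the mounted btrfs still needs its own handling (e.g. mounting degraded, or `btrfs device remove`/`replace` later).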
kernel BUG at fs/btrfs/send.c:1482
I'm running 4.4.0-rc7.
This exact problem was present on 4.0.5 and 4.3.3 too though.

I do a "btrfs send /var/lib/lxc/template64/rootfs", that generates
the following error consistently at the same file, over and over again:

Dec 29 14:49:04 argo kernel: kernel BUG at fs/btrfs/send.c:1482!
Dec 29 14:49:04 argo kernel: Modules linked in: nfsd
Dec 29 14:49:04 argo kernel: task: 880041295c40 ti: 88010423c000 task.ti: 88010423c000
Dec 29 14:49:04 argo kernel: RSP: 0018:88010423fb20 EFLAGS: 00010202
Dec 29 14:49:04 argo kernel: RDX: 0001 RSI:  RDI: 
Dec 29 14:49:04 argo kernel: R10: 88019d53b5e0 R11:  R12: 8801b35ac510
Dec 29 14:49:04 argo kernel: FS: 7fac9113f8c0() GS:88022fd8() knlGS:
Dec 29 14:49:04 argo kernel: CR2: 7f99ba308520 CR3: 000154a4 CR4: 001006e0
Dec 29 14:49:04 argo kernel: 81ed 0009b15a a1ff
Dec 29 14:49:04 argo kernel: 0009b15a 1a03 8801baf2e800
Dec 29 14:49:04 argo kernel: [] send_create_inode_if_needed+0x30/0x49
Dec 29 14:49:04 argo kernel: [] ? btrfs_item_key+0x19/0x1b
Dec 29 14:49:04 argo kernel: [] btrfs_compare_trees+0x2f2/0x4fe
Dec 29 14:49:04 argo kernel: [] btrfs_ioctl_send+0x846/0xce5
Dec 29 14:49:04 argo kernel: [] ? try_to_freeze_unsafe+0x9/0x32
Dec 29 14:49:04 argo kernel: [] ? _raw_spin_lock_irq+0xf/0x11
Dec 29 14:49:04 argo kernel: [] ? ptrace_do_notify+0x84/0x95
Dec 29 14:49:04 argo kernel: [] SyS_ioctl+0x43/0x61
Dec 29 14:49:04 argo kernel: RIP [] send_create_inode+0x1ce/0x30d

On the receiving end, I have a "btrfs receive" which takes the above
stream as input, and *always* reports this:

receiving snapshot 20151230-141324.1451484804.965085668@argo uuid=53df0616-5715-ad40-ae81-78a023860fe0, ctransid=649684 parent_uuid=d3f807da-1e9d-aa4d-ab01-77ce5e2fbcd7, parent_ctransid=649735
utimes
rename bin -> o257-379784-0
mkdir o257-34888-0
rename o257-34888-0 -> bin
utimes
chown bin - uid=0, gid=0
chmod bin - mode=0755
utimes bin
rmdir boot
ERROR: rmdir boot failed. No such file or directory
mkdir o258-34888-0
rename o258-34888-0 -> boot
utimes
chown boot - uid=0, gid=0
chmod boot - mode=0755
utimes boot
rename dev -> o259-379784-0
mkdir o259-34888-0
rename o259-34888-0 -> dev
... rest of the logging follows as normal...
... then we get ...
rmdir media
mkdir o264-34888-0
rename o264-34888-0 -> media
utimes
chown media - uid=0, gid=0
chmod media - mode=0755
utimes media
rmdir mnt
ERROR: rmdir mnt failed. No such file or directory
rmdir opt
mkdir o266-34888-0
rename o266-34888-0 -> opt
utimes
... continues as normal ...

It then still creates lots of files, until it encounters the sudden EOF
due to the sending side experiencing the kernel bug and abruptly
halting the send.

Since the problem is consistently and easily reproducible, I can
immediately try any proposed patches or fixes (or provide more insight
into the subvolume this problem occurs with). Numerous other subvolumes
in the same BTRFS partition work flawlessly using btrfs send/receive.

The sending partition is RAID0 with two 512GB SSD drives.
The receiving partition is RAID1 with 6 6TB HDD drives.

-- 
Stephen.
Re: kernel BUG at fs/btrfs/send.c:1482
Stephen R. van den Berg wrote:
>I'm running 4.4.0-rc7.
>This exact problem was present on 4.0.5 and 4.3.3 too though.

>I do a "btrfs send /var/lib/lxc/template64/rootfs", that generates
>the following error consistently at the same file, over and over again:

>Dec 29 14:49:04 argo kernel: kernel BUG at fs/btrfs/send.c:1482!

Ok, found part of the solution.
The kernel bug was being triggered by symbolic links in that
subvolume that have an empty target. It is unknown how
these ever ended up on that partition.

The partitions have been created using regular btrfs.
The only strange thing that might have happened is that I ran duperemove
over those partitions afterward.

-- 
Stephen.
[PATCH] Btrfs: fix number of transaction units required to create symlink
From: Filipe Manana

We weren't accounting for the insertion of an inline extent item for the
symlink inode nor that we need to update the parent inode item (through
the call to btrfs_add_nondir()). So fix this by including two more
transaction units.

Signed-off-by: Filipe Manana
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2ea2e0e..5dbc07a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9660,9 +9660,11 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
         /*
          * 2 items for inode item and ref
          * 2 items for dir items
+         * 1 item for updating parent inode item
+         * 1 item for the inline extent item
          * 1 item for xattr if selinux is on
          */
-        trans = btrfs_start_transaction(root, 5);
+        trans = btrfs_start_transaction(root, 7);
         if (IS_ERR(trans))
                 return PTR_ERR(trans);
-- 
2.1.3
[PATCH] Btrfs: don't leave dangling dentry if symlink creation failed
From: Filipe Manana

When we are creating a symlink we might fail with an error after we created its inode and added the corresponding directory indexes to its parent inode. In this case we end up never removing the directory indexes because the inode eviction handler, called for our symlink inode on the final iput(), only removes items associated with the symlink inode and not with the parent inode.

Example:

  $ mkfs.btrfs -f /dev/sdi
  $ mount /dev/sdi /mnt
  $ touch /mnt/foo
  $ ln -s /mnt/foo /mnt/bar
  ln: failed to create symbolic link ‘bar’: Cannot allocate memory
  $ umount /mnt
  $ btrfsck /dev/sdi
  Checking filesystem on /dev/sdi
  UUID: d5acb5ba-31bd-42da-b456-89dca2e716e1
  checking extents
  checking free space cache
  checking fs roots
  root 5 inode 258 errors 2001, no inode item, link count wrong
          unresolved ref dir 256 index 3 namelen 3 name bar filetype 7 errors 4, no inode ref
  found 131073 bytes used err is 1
  total csum bytes: 0
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124305
  file data blocks allocated: 262144
   referenced 262144
  btrfs-progs v4.2.3

So fix this by adding the directory index entries as the very last step of symlink creation.

Signed-off-by: Filipe Manana
---
 fs/btrfs/inode.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bdb0008..2ea2e0e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9693,10 +9693,6 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
 	if (err)
 		goto out_unlock_inode;
 
-	err = btrfs_add_nondir(trans, dir, dentry, inode, 0, index);
-	if (err)
-		goto out_unlock_inode;
-
 	path = btrfs_alloc_path();
 	if (!path) {
 		err = -ENOMEM;
@@ -9733,6 +9729,13 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
 	inode_set_bytes(inode, name_len);
 	btrfs_i_size_write(inode, name_len);
 	err = btrfs_update_inode(trans, root, inode);
+	/*
+	 * Last step, add directory indexes for our symlink inode. This is the
+	 * last step to avoid extra cleanup of these indexes if an error happens
+	 * elsewhere above.
+	 */
+	if (!err)
+		err = btrfs_add_nondir(trans, dir, dentry, inode, 0, index);
 	if (err) {
 		drop_inode = 1;
 		goto out_unlock_inode;
-- 
2.1.3
[PATCH] Btrfs: send, don't BUG_ON() when an empty symlink is found
From: Filipe Manana

When a symlink is successfully created it always has an inline extent containing the source path. However if an error happens when creating the symlink, we can leave in the subvolume's tree a symlink inode without any such inline extent item - this happens if after btrfs_symlink() calls btrfs_end_transaction() and before it calls the inode eviction handler (through the final iput() call), the transaction gets committed and a crash happens before the eviction handler gets called, or if a snapshot of the subvolume is made before the eviction handler gets called. Sadly we can't just avoid this by making btrfs_symlink() call btrfs_end_transaction() after it calls the eviction handler, because the latter can commit the current transaction before it removes any items from the subvolume tree (if it encounters ENOSPC errors while reserving space for removing all the items).

So make send fail more gracefully, with an -EIO error, and print a message to dmesg/syslog informing that there's an empty symlink inode, so that the user can delete the empty symlink or do something else about it.

Reported-by: Stephen R. van den Berg
Signed-off-by: Filipe Manana
---
 fs/btrfs/send.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 355a458..63a6152 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1469,7 +1469,21 @@ static int read_symlink(struct btrfs_root *root,
 	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 	if (ret < 0)
 		goto out;
-	BUG_ON(ret);
+	if (ret) {
+		/*
+		 * An empty symlink inode. Can happen in rare error paths when
+		 * creating a symlink (transaction committed before the inode
+		 * eviction handler removed the symlink inode items and a crash
+		 * happened in between or the subvol was snapshoted in between).
+		 * Print an informative message to dmesg/syslog so that the user
+		 * can delete the symlink.
+		 */
+		btrfs_err(root->fs_info,
+			  "Found empty symlink inode %llu at root %llu",
+			  ino, root->root_key.objectid);
+		ret = -EIO;
+		goto out;
+	}
 	ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
 			    struct btrfs_file_extent_item);
-- 
2.1.3
Re: kernel BUG at fs/btrfs/send.c:1482
On Thu, 2015-12-31 at 18:29 +, Filipe Manana wrote:
> As for fixing the (very) rare cases where we end up creating empty
> symlinks, it's not trivial to fix.

Would it be reasonable to have btrfs-check list such broken symlinks?

Cheers,
Chris.
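Until btrfs-check grows such an option, a rough userspace approximation is to walk a mounted copy of the subvolume and flag symlinks whose target string is empty (the condition that trips the BUG_ON in send.c). This is only a sketch, not part of btrfs-progs, and the helper name find_empty_symlinks is made up here:

```python
#!/usr/bin/env python3
# Sketch: report symlinks with an empty target under a mounted subvolume.
# Hypothetical helper, not btrfs tooling; it sees only what the VFS exposes,
# not the on-disk trees the way btrfs-check would.
import os
import sys

def find_empty_symlinks(root):
    """Return paths of symlinks whose target string is empty."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # With followlinks=False (the default), symlinks to directories are
        # listed in dirnames but not descended into; file symlinks are in
        # filenames. Check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                try:
                    if os.readlink(path) == "":
                        hits.append(path)
                except OSError:
                    pass  # link vanished or is unreadable; skip it
    return hits

if __name__ == "__main__":
    for p in find_empty_symlinks(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(p)
```

Note that symlink(2) refuses an empty target with ENOENT, so such links cannot be created through the normal syscall path; they would only appear via the rare in-kernel error path the patches above describe.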
Re: btrfs fail behavior when a device vanishes
Here is a kludge I hacked up. Someone that cares could clean this up and start building a proper test suite or something.

This test script creates a 3 disk raid1 filesystem and very slowly writes a large file onto the filesystem while, one by one, each disk is disconnected and then reconnected in a loop. It is fairly trivial to trigger dataloss when devices are bounced like this.

You have to run the script as root due to the calls to [u]mount and iscsiadm.

On Thu, Dec 31, 2015 at 1:23 PM, ronnie sahlberg wrote:
> On Thu, Dec 31, 2015 at 12:11 PM, Chris Murphy wrote:
>> This is a torture test, no data is at risk.
>>
>> Two devices, btrfs raid1 with some stuff on them.
>> Copy from that array, elsewhere.
>> During copy, yank the active device.
>>
>> dmesg shows many of these:
>>
>> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
>> 652123, rd 697237, flush 0, corrupt 0, gen 0
>
> For automated tests a good way could be to build a multi device btrfs
> filesystem on top of it.
> For example STGT exporting n volumes and then mounting via the loopback
> interface.
> Then you could just use tgtadm to add / remove the device in a
> controlled fashion, and to any filesystem it will look exactly like if
> you pulled the device physically.
>
> This allows you to run fully automated and scripted "how long before
> the filesystem goes into total dataloss mode" tests.
>
> If you want more fine control than just plug/unplug on a live
> filesystem, you can use
> https://github.com/rsahlberg/flaky-stgt
> Again, this uses iSCSI, but it allows you to script events such as
> "this range of blocks are now Uncorrectable read error" etc.
> To automatically stress test that the filesystem can deal with it.
>
> I created this STGT fork so that filesystem testers would have a way
> to automate testing of their failure paths.
> In particular for BTRFS, which seems to still be incredibly fragile
> when devices fail or disconnect.
>
> Unfortunately I don't think anyone cared very much. :-(
> Please BTRFS devs, please use something like this for testing of
> failure modes and robustness. Please!
>
>> Why are the write errors nearly as high as the read errors, when there
>> is only a copy from this device happening?
>>
>> Is Btrfs trying to write the read error count (for dev stats) of sdc1
>> onto sdc1, and that causes a write error?
>>
>> Also, is there a command to make a block device go away? At least in
>> gnome shell when I eject a USB stick, it isn't just umounted, it no
>> longer appears with lsblk or blkid, so I'm wondering if there's a way
>> to vanish a misbehaving device so that Btrfs isn't bogged down with a
>> flood of retries.
>>
>> In case anyone is curious, the entire dmesg from device insertion,
>> formatting, mounting, copying to then from, and device yanking is here
>> (should be permanent):
>> http://pastebin.com/raw/Wfe1pY4N
>>
>> And the copy did successfully complete anyway, and the resulting files
>> have the same hashes as their originals. So, yay, despite the noisy
>> messages.
>>
>> --
>> Chris Murphy

test_0100_write_raid1_unplug.sh  Description: Bourne shell script
functions.sh  Description: Bourne shell script
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg wrote:
> Here is a kludge I hacked up.
> Someone that cares could clean this up and start building a proper
> test suite or something.
>
> This test script creates a 3 disk raid1 filesystem and very slowly
> writes a large file onto the filesystem while, one by one, each disk is
> disconnected and then reconnected in a loop.
> It is fairly trivial to trigger dataloss when devices are bounced like this.

Yes, it's quite a torture test. I'd expect this would be a problem for Btrfs until this feature is done at least:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22

And maybe this one too:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation

Already we know that Btrfs tries to write indefinitely to missing devices. If a device reappears, what gets written? Will that device be consistent? And then another one goes missing and comes back, and now there are possibly two devices with totally different states for identical generations. It's a mess. We know this trivially causes major corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded and modifies it; then mounts devid2 (only) rw,degraded and modifies it; and then mounts both devids together. Kablewy. Big mess. And that's with umounting each one in between those steps; not even the abrupt disconnect/reconnect.

--
Chris Murphy
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 1:24 PM, Hugo Mills wrote:
> On Thu, Dec 31, 2015 at 01:11:25PM -0700, Chris Murphy wrote:
>> This is a torture test, no data is at risk.
>>
>> Two devices, btrfs raid1 with some stuff on them.
>> Copy from that array, elsewhere.
>> During copy, yank the active device.
>>
>> dmesg shows many of these:
>>
>> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
>> 652123, rd 697237, flush 0, corrupt 0, gen 0
>>
>> Why are the write errors nearly as high as the read errors, when there
>> is only a copy from this device happening?
>
> I'm guessing completely here, but maybe it's trying to write
> corrected data to sdc1, because the original read failed?

Egads. OK, that makes sense.

--
Chris Murphy
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 01:11:25PM -0700, Chris Murphy wrote:
> This is a torture test, no data is at risk.
>
> Two devices, btrfs raid1 with some stuff on them.
> Copy from that array, elsewhere.
> During copy, yank the active device.
>
> dmesg shows many of these:
>
> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
> 652123, rd 697237, flush 0, corrupt 0, gen 0
>
> Why are the write errors nearly as high as the read errors, when there
> is only a copy from this device happening?

I'm guessing completely here, but maybe it's trying to write corrected data to sdc1, because the original read failed?

Hugo.

> Is Btrfs trying to write the read error count (for dev stats) of sdc1
> onto sdc1, and that causes a write error?
>
> Also, is there a command to make a block device go away? At least in
> gnome shell when I eject a USB stick, it isn't just umounted, it no
> longer appears with lsblk or blkid, so I'm wondering if there's a way
> to vanish a misbehaving device so that Btrfs isn't bogged down with a
> flood of retries.
>
> In case anyone is curious, the entire dmesg from device insertion,
> formatting, mounting, copying to then from, and device yanking is here
> (should be permanent):
> http://pastebin.com/raw/Wfe1pY4N
>
> And the copy did successfully complete anyway, and the resulting files
> have the same hashes as their originals. So, yay, despite the noisy
> messages.

-- 
Hugo Mills             | Well, sir, the floor is yours. But remember, the
hugo@... carfax.org.uk | roof is ours!
http://carfax.org.uk/  | PGP: E2AB1DE4   |                     The Goons
btrfs send clone use case
I haven't previously heard of this use case for the -c option. It seems to work (no errors or fs weirdness afterward).

The gist: send a snapshot from drive 1 to drive 2; make a rw snapshot of the drive 2 copy, make changes to it, then make an ro snapshot; now send it back to drive 1 *as an incremental* send. [dated subvolumes are ro, undated ones are rw]

# btrfs send /brick1/chrishome-20151128 | btrfs receive /brick2
# btrfs sub snap /brick2/chrishome-20151128 /brick2/chrishome
## make some modifications to chrishome contents
# btrfs sub snap -r /brick2/chrishome /brick2/chrishome-20151230
# btrfs send -p /brick2/chrishome-20151128 chrishome-20151230 | btrfs receive /brick1
ERROR: check if we support uuid tree fails - Operation not permitted
At subvol chrishome:20151230/

However,

# btrfs send -p /brick2/chrishome-20151128 -c /brick2/chrishome-20151128 chrishome-20151230 | btrfs receive /brick1

works. And it's fast (it's ~100G so I'd know if it weren't sending an increment).

chrishome-20151128 is obviously identical on both sides in this case; but I guess -c just acts to explicitly confirm this is true? The brick2/chrishome-20151128 has a Received UUID that matches the UUID of brick1/chrishome-20151128, so it seems their identical states should be known?

Slightly confusing though: brick1/chrishome:20151230 (the one resulting from the successful -p -c command) has the same Parent UUID and Received UUID, which is the UUID of brick1/chrishome:20151128. That's not really its parent; since it's a received subvolume I'd expect this to be -, like it is for any other received subvolume (which doesn't really have a parent).

Anyway it seems to be working.

--
Chris Murphy
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 12:11 PM, Chris Murphy wrote:
> This is a torture test, no data is at risk.
>
> Two devices, btrfs raid1 with some stuff on them.
> Copy from that array, elsewhere.
> During copy, yank the active device.
>
> dmesg shows many of these:
>
> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
> 652123, rd 697237, flush 0, corrupt 0, gen 0

For automated tests a good way could be to build a multi device btrfs filesystem on top of it. For example STGT exporting n volumes and then mounting via the loopback interface. Then you could just use tgtadm to add / remove the device in a controlled fashion, and to any filesystem it will look exactly like if you pulled the device physically.

This allows you to run fully automated and scripted "how long before the filesystem goes into total dataloss mode" tests.

If you want more fine control than just plug/unplug on a live filesystem, you can use https://github.com/rsahlberg/flaky-stgt
Again, this uses iSCSI, but it allows you to script events such as "this range of blocks are now Uncorrectable read error" etc., to automatically stress test that the filesystem can deal with it.

I created this STGT fork so that filesystem testers would have a way to automate testing of their failure paths. In particular for BTRFS, which seems to still be incredibly fragile when devices fail or disconnect.

Unfortunately I don't think anyone cared very much. :-(
Please BTRFS devs, please use something like this for testing of failure modes and robustness. Please!

> Why are the write errors nearly as high as the read errors, when there
> is only a copy from this device happening?
>
> Is Btrfs trying to write the read error count (for dev stats) of sdc1
> onto sdc1, and that causes a write error?
>
> Also, is there a command to make a block device go away? At least in
> gnome shell when I eject a USB stick, it isn't just umounted, it no
> longer appears with lsblk or blkid, so I'm wondering if there's a way
> to vanish a misbehaving device so that Btrfs isn't bogged down with a
> flood of retries.
>
> In case anyone is curious, the entire dmesg from device insertion,
> formatting, mounting, copying to then from, and device yanking is here
> (should be permanent):
> http://pastebin.com/raw/Wfe1pY4N
>
> And the copy did successfully complete anyway, and the resulting files
> have the same hashes as their originals. So, yay, despite the noisy
> messages.
>
> --
> Chris Murphy
Re: RAID10 question
On Thu, Dec 31, 2015 at 6:31 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Additionally, if you're going to put btrfs on mdraid, then you may wish
> to consider reversing the above, doing raid01, which while ordinarily
> discouraged in favor of raid10, has some things going for it when the top
> level is btrfs, that raid10 doesn't.

Yes, although it's a balancing act deciding how to build such a large volume out of so many drives. If you use many drives per raid0, when there is a failure it takes a long time to rebuild. If you use few drives per raid0, you get a fast rebuild, but the exposure/risk with a 2nd failure is higher. E.g. two extremes:

12x raid0 "bank A" and 12x raid0 "bank B"
If one drive dies, an entire bank is gone, and it's a long rebuild, but if a 2nd drive dies, there's nearly a 50/50 chance it dies in the same already-dead bank.

2x raid0 "bank A", 2x raid0 "bank B", and so on through "bank L"
If one drive dies in bank A, then A is gone, and it's a short rebuild, but if a 2nd drive dies, almost certainly it will not be the 2nd bank A drive, meaning it's in another bank, and that means the whole array is mortally wounded. Depending on what's missing and what needs to be accessed, it might work OK for seconds, minutes, or hours, and then totally implode. There's no way to predict it in advance.

Anyway, I'd sooner go with 3x raid5, or 6x raid6, and then pool them with glusterfs. Even on a single node, using replication only for the separate raid5 bricks is more reliable than a 24x raid10, no matter whether md+xfs or btrfs. That makes it effectively a raid 51. And if half the storage is put on another node, now you have power supply and some network redundancy too.

--
Chris Murphy
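The odds in the two extremes above can be checked with a couple of lines. A sketch, assuming 24 drives split evenly into raid0 banks and a second failure landing uniformly at random among the 23 survivors; the helper name p_second_failure_same_bank is made up here:

```python
# Sketch: after one drive fails, its whole raid0 bank is dead. A second
# failure in that same bank costs nothing extra; a second failure in any
# other bank kills another bank. Compute the "harmless" probability.
from fractions import Fraction

def p_second_failure_same_bank(total_drives, bank_size):
    # bank_size - 1 of the remaining total_drives - 1 drives sit in the
    # already-dead bank.
    return Fraction(bank_size - 1, total_drives - 1)

print(p_second_failure_same_bank(24, 12))  # -> 11/23, ~48%: "nearly 50/50"
print(p_second_failure_same_bank(24, 2))   # -> 1/23, ~4%: almost certainly
                                           #    another bank dies
```

This matches both claims in the mail: with two 12-drive banks the second failure is close to a coin flip, while with twelve 2-drive banks it almost certainly wounds a second bank.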
Unrecoverable fs corruption?
Hello,

I had a power fail today at my home server and after the reboot the btrfs RAID1 won't come back up.

When trying to mount one of the 2 disks of the array I get the following error:

[ 4126.316396] BTRFS info (device sdb2): disk space caching is enabled
[ 4126.316402] BTRFS: has skinny extents
[ 4126.337324] BTRFS: failed to read chunk tree on sdb2
[ 4126.353027] BTRFS: open_ctree failed

A btrfs check segfaults after a few seconds with the following message:

(0:29)[root@hera]~ # ❯❯❯ btrfs check /dev/sdb2
warning devid 1 not found already
bad key ordering 68 69
Checking filesystem on /dev/sdb2
UUID: d55fa866-3baa-4e73-bf3e-5fda29672df3
checking extents
bad key ordering 68 69
bad block 6513625202688
Errors found in extent allocation tree or chunk allocation
[1]    11164 segmentation fault  btrfs check /dev/sdb2

I have 2 btrfs-images (one with -w, one without) but they are 6.1G and 1.1G respectively. I don't know if I can upload them at all, and also not where to store such large files.

I did try a btrfs check --repair on one of the disks, which gave the following result:

enabling repair mode
warning devid 1 not found already
bad key ordering 68 69
repair mode will force to clear out log tree, Are you sure? [y/N]: y
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x44161e]
btrfs(btrfs_reserve_extent+0xa7b)[0x4463db]
btrfs(btrfs_alloc_free_block+0x5f)[0x44649f]
btrfs(__btrfs_cow_block+0xc4)[0x437d64]
btrfs(btrfs_cow_block+0x35)[0x438365]
btrfs[0x43d3d6]
btrfs(btrfs_commit_transaction+0x95)[0x43f125]
btrfs(cmd_check+0x5ec)[0x429cdc]
btrfs(main+0x82)[0x40ef32]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f881f983610]
btrfs(_start+0x29)[0x40f039]

That's all I tried so far.

btrfs restore -viD seems to find most of the files accessible, but since I don't have a spare hdd of sufficient size I would have to break the array, reformat, and use one of the disks as restore target. I'm not prepared to do this before I know there is no other way to fix the drives, since I'm essentially destroying one more chance at saving the data.

Is there anything I can do to get the fs out of this mess?

--
Alexander Duscheleit
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 5:27 PM, Chris Murphy wrote:
> On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg wrote:
>> Here is a kludge I hacked up.
>> Someone that cares could clean this up and start building a proper
>> test suite or something.
>>
>> This test script creates a 3 disk raid1 filesystem and very slowly
>> writes a large file onto the filesystem while, one by one, each disk is
>> disconnected and then reconnected in a loop.
>> It is fairly trivial to trigger dataloss when devices are bounced like this.
>
> Yes, it's quite a torture test. I'd expect this would be a problem for
> Btrfs until this feature is done at least:
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
>
> And maybe this one too
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
>
> Already we know that Btrfs tries to write indefinitely to missing
> devices. If it reappears, what gets written? Will that device be
> consistent? And then another one goes missing, comes back, now
> possibly two devices with totally different states for identical
> generations. It's a mess. We know that trivially causes major
> corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded
> modifies that; then mounts devid2 (only) rw,degraded and modifies it;
> and then mounts both devids together. Kablewy. Big mess. And that's
> umounting each one in between those steps; not even the abrupt
> disconnect/reconnect.

Based on my test_0100..., one could create a test script for that scenario too. Even if btrfs cannot handle it yet, it does not hurt to have these tests for scenarios that MUST work before the filesystem goes officially "stable+production". Having these tests will possibly even make the work to close the robustness gap easier, since the devs will have reproducible test scripts they can validate new features against.
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 1:11 PM, Chris Murphy wrote:
> Also, is there a command to make a block device go away?

Maybe?

echo 1 > /sys/block/device-name/device/delete

--
Chris Murphy
Re: Unrecoverable fs corruption?
On Thu, Dec 31, 2015 at 4:36 PM, Alexander Duscheleit wrote:
> Hello,
>
> I had a power fail today at my home server and after the reboot the btrfs
> RAID1 won't come back up.
>
> When trying to mount one of the 2 disks of the array I get the following
> error:
> [ 4126.316396] BTRFS info (device sdb2): disk space caching is enabled
> [ 4126.316402] BTRFS: has skinny extents
> [ 4126.337324] BTRFS: failed to read chunk tree on sdb2
> [ 4126.353027] BTRFS: open_ctree failed

Why are you trying to mount only one? What mount options did you use when you did this?

> a btrfs check segfaults after a few seconds with the following message:
> (0:29)[root@hera]~ # ❯❯❯ btrfs check /dev/sdb2
> warning devid 1 not found already
> bad key ordering 68 69
> Checking filesystem on /dev/sdb2
> UUID: d55fa866-3baa-4e73-bf3e-5fda29672df3
> checking extents
> bad key ordering 68 69
> bad block 6513625202688
> Errors found in extent allocation tree or chunk allocation
> [1]    11164 segmentation fault  btrfs check /dev/sdb2
>
> I have 2 btrfs-images (one with -w, one without) but they are 6.1G and 1.1G
> respectively, I don't know if I can upload them at all and also not where to
> store such large files.
>
> I did try a btrfs check --repair on one of the disks which gave the
> following result:
> enabling repair mode
> warning devid 1 not found already
> bad key ordering 68 69
> repair mode will force to clear out log tree, Are you sure? [y/N]: y
> Unable to find block group for 0
> extent-tree.c:289: find_search_start: Assertion `1` failed.
> btrfs[0x44161e]
> btrfs(btrfs_reserve_extent+0xa7b)[0x4463db]
> btrfs(btrfs_alloc_free_block+0x5f)[0x44649f]
> btrfs(__btrfs_cow_block+0xc4)[0x437d64]
> btrfs(btrfs_cow_block+0x35)[0x438365]
> btrfs[0x43d3d6]
> btrfs(btrfs_commit_transaction+0x95)[0x43f125]
> btrfs(cmd_check+0x5ec)[0x429cdc]
> btrfs(main+0x82)[0x40ef32]
> /usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f881f983610]
> btrfs(_start+0x29)[0x40f039]
>
> That's all I tried so far.
> btrfs restore -viD seems to find most of the files accessible but since I
> don't have a spare hdd of sufficient size I would have to break the array
> and reformat and use one of the disks as restore target. I'm not prepared
> to do this before I know there is no other way to fix the drives since I'm
> essentially destroying one more chance at saving the data.
>
> Is there anything I can do to get the fs out of this mess?

I'm skeptical about the logic of using --repair, which modifies the filesystem, on just one device of a two-device raid1, while saying you're reluctant to "break the array." It doesn't make sense to me to expect that such modification of one of the drives keeps it at all consistent with the other. I hope a dev can say whether --repair with a missing device is a bad idea, because if so, maybe degraded repairs need a --force flag to keep users from making things worse.

Anyway, in the meantime, my advice is do not mount either device rw (together or separately). The fewer changes you make right now the better.

What kernel and btrfs-progs version are you using?

Did you try to mount with -o recovery, or -o ro,recovery, before trying 'btrfs check --repair'? If so, post all relevant kernel messages. Don't try -o recovery now if you haven't previously tried it; it's probably safe to try -o ro,recovery if you haven't tried that yet. I would try -o ro,recovery three ways: both devs, and each dev separately (for which you'll use -o ro,recovery,degraded).

If that doesn't work, it sounds like it might be a task for 'btrfs rescue chunk-recover', which will take a long time. But I suggest waiting as long as possible for a reply, and in the meantime I suggest looking at getting another drive to use as a spare so you can keep both of these drives.

--
Chris Murphy
Re: btrfs fail behavior when a device vanishes
On Thu, Dec 31, 2015 at 5:27 PM, Chris Murphy wrote:
> On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg wrote:
>> Here is a kludge I hacked up.
>> Someone that cares could clean this up and start building a proper
>> test suite or something.
>>
>> This test script creates a 3 disk raid1 filesystem and very slowly
>> writes a large file onto the filesystem while, one by one, each disk is
>> disconnected and then reconnected in a loop.
>> It is fairly trivial to trigger dataloss when devices are bounced like this.
>
> Yes, it's quite a torture test. I'd expect this would be a problem for
> Btrfs until this feature is done at least:
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
>
> And maybe this one too
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
>
> Already we know that Btrfs tries to write indefinitely to missing
> devices.

Another question is how it handles writes when the mirrorset becomes degraded that way. I would expect it would:

* immediately emergency destage any dirty data in the write cache to the surviving member disks.
* switch any future I/O to that mirrorset to use ordered and synchronous writes to the surviving members.

> If it reappears, what gets written? Will that device be
> consistent? And then another one goes missing, comes back, now
> possibly two devices with totally different states for identical
> generations. It's a mess. We know that trivially causes major
> corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded
> modifies that; then mounts devid2 (only) rw,degraded and modifies it;
> and then mounts both devids together. Kablewy. Big mess. And that's
> umounting each one in between those steps; not even the abrupt
> disconnect/reconnect.
Re: btrfs scrub failing
John Center posted on Thu, 31 Dec 2015 11:20:28 -0500 as excerpted:

> I run a weekly scrub, using Marc Merlin's btrfs-scrub script.
> Usually, it completes without a problem, but this week it failed. I ran
> the scrub manually & it stops shortly:
>
> john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md124p2
> ERROR: scrubbing /dev/md124p2 failed for device id 1:
> ret=-1, errno=5 (Input/output error)
> scrub device /dev/md124p2 (id 1) canceled
> scrub started at Thu Dec 31 00:26:34 2015
> and was aborted after 00:01:29

[...]

> My Ubuntu 14.04 workstation is using the 4.2 kernel (Wily).
> I'm using btrfs-tools v4.3.1.

[...]

A couple months ago, which would have been around the 4.2 kernel you're running (with 4.3 being current and 4.4 nearly out), there were a number of similar scrub-aborted reports on the list. I don't recall seeing any directly related patches, but the reports died down; whether because everybody having them had reported already, or because a newer kernel fixed the problem, I'm not sure, as I never had the problem myself.[1]

So I'd suggest upgrading to either the current 4.3 kernel or the latest 4.4-rc, and hopefully the problem will be gone. If I'd had the problem myself I could tell you for sure whether it went away for me with 4.3, but as I didn't...

In any event, the 4.2 kernel wasn't a long-term-support kernel anyway. 4.1 was and 4.4 is scheduled to be. So upstream 4.2 updates should be ended or coming to an end in any case, and an upgrade (or possibly a downgrade to the LTS 4.1) is recommended anyway, tho of course Ubuntu could be doing its own support, independent of upstream, but then you'd need to get support from them, as upstream won't be tracking patches they've backported and thus can't provide proper support.

---
[1] All my btrfs are relatively small, under 50 GiB per device, and on SSD, so scrubs normally complete in under a minute, while your report of scrub aborting after a minute and a half was typical, so it's likely my scrubs were simply done before whatever the problem was could trigger.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman