Re: Scrub priority, am I using it wrong?
Gareth Pye posted on Tue, 05 Apr 2016 13:45:11 +1000 as excerpted:

> On Tue, Apr 5, 2016 at 12:37 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> CPU bound, 0% IOWait even at idle IO priority, in addition to the
>> hundreds of M/s values per thread/device, here. You OTOH are showing
>> under 20 M/s per thread/device on spinning rust, with an IOWait near
>> 90%, thus making it IO bound.
>
> And yes I'd love to switch to SSD, but 12 2TB drives is a bit pricey
> still

No kidding. That's why my media partition remains spinning rust. (Tho FWIW, not btrfs, I use btrfs only on my ssds, and still use the old and stable reiserfs on my spinning rust.)

But my media partition is small enough, and ssd prices now low enough up to the 1 TB level, that when I upgrade I'll probably switch to ssd for the media partition as well, and leave spinning rust only as second or third level backups. But that's because it all, including first level backups, fits in under a TB (and if pressed I could do it under a half TB). Multi-TB, as you have, definitely still spinning rust, for me too.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scrub priority, am I using it wrong?
Gareth Pye posted on Tue, 05 Apr 2016 13:44:05 +1000 as excerpted:

> On Tue, Apr 5, 2016 at 12:37 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> 1) It appears btrfs scrub start's -c option only takes the numeric
>> class, so try -c3 instead of -c idle.
>
> Does it count as a bug if it silently accepts the way I was doing it?
>
> I've switched to -c3 and at least now the idle class listed in iotop is
> idle, so I hope that means it will be more friendly to other processes.

I'd say yes, particularly given that the requirement for a numeric class isn't documented in the manpage at all. Whether the bug is then one of documentation (say it must be numeric) or of implementation (take the class name as well) is up for debate. I'd call fixing either one a fix.

If it must be numeric, document that (and optionally change the implementation to error out in some way if a non-numeric parameter is supplied to -c); otherwise change the implementation so the class name can be taken as well (and optionally update the documentation to explicitly mention that either form can be used). Doesn't matter to me which.

-- 
Duncan - List replies preferred. No HTML msgs.
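Since `btrfs scrub start -c` silently accepts (and ignores) spelled-out class names, one workaround until either the code or the docs are fixed is a small wrapper that maps names to the numbers scrub actually honors. This is a hypothetical convenience helper, not part of btrfs-progs; the numbers are the standard Linux IO scheduling classes (1 = realtime, 2 = best-effort, 3 = idle):

```shell
# Hypothetical helper: map an IO scheduling class name to the numeric
# value that "btrfs scrub start -c" actually requires.
ioclass_num() {
    case "$1" in
        realtime|1)    echo 1 ;;
        best-effort|2) echo 2 ;;
        idle|3)        echo 3 ;;
        *) echo "unknown IO class: $1" >&2; return 1 ;;
    esac
}
```

A crontab entry could then read `btrfs scrub start -Bd -c "$(ioclass_num idle)" /data`, which passes `-c 3` while keeping the human-readable name visible in the schedule.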
Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework
Alex Lyakas wrote on 2016/04/03 10:22 +0200:
> Hello Qu, Wang,
>
> On Wed, Mar 30, 2016 at 2:34 AM, Qu Wenruo wrote:
>> Alex Lyakas wrote on 2016/03/29 19:22 +0200:
>>> Greetings Qu Wenruo,
>>>
>>> I have reviewed the dedup patchset found in the github account you
>>> mentioned. I have several questions. Please note that by all means I
>>> am not criticizing your design or code. I just want to make sure that
>>> my understanding of the code is proper.
>>
>> It's OK to criticize the design or code, and that's how review works.
>>
>>> 1) You mentioned in several emails that at some point byte-to-byte
>>> comparison is to be performed. However, I do not see this in the
>>> code. It seems that generic_search() only looks for the hash value
>>> match. If there is a match, it goes ahead and adds a delayed ref.
>>
>> I mentioned byte-to-byte comparison as "not to be implemented any time
>> soon". Considering the lack of a facility to read out extent contents
>> without any inode structure, it's not going to be done any time soon.
>>
>>> 2) If btrfs_dedupe_search() does not find a match, we unlock the
>>> dedup mutex and proceed with the normal COW. What happens if there
>>> are several IO streams to different files writing an identical block,
>>> but we don't have such a block in our dedup DB? Then all
>>> btrfs_dedupe_search() calls will not find a match, so all streams
>>> will allocate space for their block (which are all identical). At
>>> some point, they will call insert_reserved_file_extent() and will
>>> call btrfs_dedupe_add(). Since there is a global mutex, the first
>>> stream will insert the dedup hash entries into the DB, and all other
>>> streams will find that such a hash entry already exists. So the end
>>> result is that we have the hash entry in the DB, but still we have
>>> multiple copies of the same block allocated, due to timing issues. Is
>>> this correct?
>>
>> That's right, and that's also unavoidable for the hash initializing
>> stage.
>
>>> 3) generic_search() competes with __btrfs_free_extent().
>>> Meaning that generic_search() wants to add a delayed ref to an
>>> existing extent, whereas __btrfs_free_extent() wants to delete an
>>> entry from the dedup DB. The race is resolved as follows:
>>> - generic_search attempts to lock the delayed ref head
>>> - if it succeeds to lock, then __btrfs_free_extent() is not running
>>>   right now, so we can add a delayed ref. Later, when the delayed
>>>   ref head is run, it will figure out what needs to be done (free
>>>   the extent or not)
>>> - if we fail to lock, then there is delayed ref processing for this
>>>   bytenr. We drop all locks and redo the search from the top. If
>>>   __btrfs_free_extent() has deleted the dedup hash meanwhile, we
>>>   will not find it, and proceed with normal COW.
>>> Is my understanding correct?
>>
>> Yes, that's correct.
>
> Reviewing the code again, it seems that I still lack understanding.
> What is special about the dedup code adding a delayed data ref versus
> other places doing that? In other places, we do not insist on locking
> the delayed ref head, but in dedup we do. For example,
> __btrfs_drop_extents calls btrfs_inc_extent_ref without locking the
> ref head. I know that one of your purposes was to draw attention to
> delayed ref processing, so you have succeeded.

In the patchset, the delayed_ref related part is not only there to draw attention, it's there to resolve problems.

For example, there is a case where an extent has a ref in the extent tree while it's going to be freed, which means there is a DROP ref in delayed_refs:

For extent A:
  Extent tree | Delayed refs
  1           | -1 (DROP ref)

While we call dedupe_del() only at __btrfs_free_extent() time, which means that unless the delayed_refs are run, we still have the hash for extent A.
If we don't lock the delayed_ref_head, the following case may happen:

        Dedupe routine            |        run_delayed_refs()
  dedupe_search()                 |
  |- Found hash                   |
                                  | btrfs_delayed_ref_lock()
                                  | |- run_one_delayed_ref()
                                  | |  |- __btrfs_free_extent()
                                  | |- btrfs_delayed_ref_unlock()
  |- btrfs_inc_extent_ref()       |

In that case, we will increase the extent ref on a non-existent extent. That will cause the next run_delayed_refs() to return -ENOENT and abort the transaction. We have hit such a problem several times in our tests.

If we lock the delayed ref head, we ensure the delayed refs of that extent won't be run concurrently, so we either increase the extent ref before run_one_delayed_ref(), or after it. If before, we call btrfs_inc_extent_ref() and __btrfs_free_extent() won't be reached, so the extent will still be there. If after, we will not find the hash, which causes a hash miss, and we continue to write the data to disk.

In case we can't find a delayed_ref_head, which means there are no delayed refs for that data extent yet, we directly insert the delayed_data_ref while holding delayed_refs->lock, to avoid any possible concurrency.
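The locking rule above can be sketched as a toy, single-process model (this is deliberately not btrfs code; struct and function names are illustrative, and a pthread mutex stands in for the delayed ref head lock): a dedupe ref may only be added while the head's lock is held, and a failed trylock means "delayed refs are running, drop everything and redo the search".

```c
/* Toy model of the generic_search() locking rule described above.
 * Not kernel code: a pthread mutex stands in for the delayed ref
 * head lock, and names are illustrative only. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct ref_head {
    pthread_mutex_t lock;
    bool extent_freed;          /* set by the delayed-ref runner */
};

/* Returns true if a dedupe ref was added, false if the caller must
 * fall back to normal CoW. */
static bool dedupe_add_ref(struct ref_head *head)
{
    int retries = 10;

    while (retries--) {
        if (pthread_mutex_trylock(&head->lock) == 0) {
            /* Head locked: the delayed-ref runner cannot free the
             * extent under us, so it is safe to check and add. */
            bool ok = !head->extent_freed;
            pthread_mutex_unlock(&head->lock);
            return ok;
        }
        /* Lock held elsewhere: in the real code we would drop all
         * locks and redo the hash search from the top. */
    }
    return false;               /* persistent contention: plain CoW */
}
```

The point the model captures is that the add-ref decision and the "is the extent still alive" check happen atomically with respect to the delayed-ref runner, which is exactly what prevents the -ENOENT abort described above.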
Re: Scrub priority, am I using it wrong?
On Tue, Apr 5, 2016 at 12:37 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> CPU bound, 0% IOWait even at idle IO priority, in addition to the
> hundreds of M/s values per thread/device, here. You OTOH are showing
> under 20 M/s per thread/device on spinning rust, with an IOWait near
> 90%, thus making it IO bound.

And yes I'd love to switch to SSD, but 12 2TB drives is a bit pricey still

-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Scrub priority, am I using it wrong?
On Tue, Apr 5, 2016 at 12:37 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> 1) It appears btrfs scrub start's -c option only takes numeric class,
> so try -c3 instead of -c idle.

Does it count as a bug if it silently accepts the way I was doing it?

I've switched to -c3 and at least now the idle class listed in iotop is idle, so I hope that means it will be more friendly to other processes.

-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework
David Sterba wrote on 2016/04/04 18:55 +0200:
> On Fri, Mar 25, 2016 at 09:38:50AM +0800, Qu Wenruo wrote:
>
> Please use the newly added BTRFS_PERSISTENT_ITEM_KEY instead of a new
> key type. As this is the second user of that item, there's no precedent
> for how to select the subtype. Right now 0 is for the dev stats item,
> but I'd like to leave some space between them, so it should be 256 at
> best. The space is 64bit so there's enough room, but this also means
> defining the on-disk format.

After checking BTRFS_PERSISTENT_ITEM_KEY, it seems that its value is larger than the current DEDUPE_BYTENR/HASH_ITEM_KEY, and given the objectid of DEDUPE_HASH_ITEM_KEY, it won't be the first item of the tree. Although that's not a big problem, for a user running debug-tree it would be quite annoying to find it located among tons of other hashes.

> You can alternatively store it in the tree_root, but I don't know how
> frequently it's supposed to be changed.

Storing it in the tree root sounds pretty good. Such status doesn't change until we enable/disable (including configure), so the tree root seems good. But we still need to consider the key order for the later dedupe rate statistics. In that case, I'd hope to store them both in the dedupe tree. So personally, if using PERSISTENT_ITEM_KEY, I'd at least prefer to keep the objectid at 0, and move DEDUPE_BYTENR/HASH_ITEM_KEY to a higher value, to ensure the dedupe status is the first item of the dedupe tree.

> 0 is unfortunately taken by BTRFS_DEV_STATS_OBJECTID, but I don't see a
> problem with the ordering. DEDUPE_BYTENR/HASH_ITEM_KEY store a large
> number in the objectid: either part of a hash, which is unlikely to be
> almost-all zeros, or a bytenr, which will be larger than 1MB.

OK, as long as we can search for the status item with an exact key match, it shouldn't cause a big problem.

> 4) Ioctl interface with persist dedup status
>
> I'd like to see the ioctl specified in more detail. So far there's
> enable, disable and status.
> I'd expect some way to control the in-memory limits, let it "forget"
> the current hash cache, specify the dedupe chunk size, maybe sync the
> in-memory hash cache to disk.

So the current and planned ioctls are the following, with some details related to your in-memory limit control concerns.

1) Enable
   Enable dedupe if it's not enabled already. (disabled -> enabled)

> Ok, so it should also take a parameter for which backend is about to be
> enabled.

It already has one. It also has limit_nr and limit_mem parameters for the in-memory backend.

   Or change the current dedupe setting to another. (re-configure)

> Doing that in 'enable' sounds confusing, any changes belong to a
> separate command.

This depends on the point of view. The "enable/config/disable" case introduces a state machine for the end user.

> Yes, that's exactly my point.

Personally, I don't like a state machine for the end user. Yes, I also hate merging the play and pause buttons together on a music player.

> I don't see this reference as relevant, we're not designing a music
> player.

If using a state machine, the user must ensure dedupe is enabled before doing any configuration.

> For user convenience we can copy the configuration options to the dedup
> enable subcommand, but it will still do separate enable and configure
> ioctl calls.

So, that's to say, one user can assume there is a state machine and use the enable-configure method, and another user can use the stateless enable-enable method. If so, I'm OK with adding a configure ioctl interface. (It's still the enable-enable stateless one beneath the stateful ioctl.)

But in that case, if a user forgets to enable dedupe and calls configure directly, btrfs won't give any warning and will just enable dedupe. Will that design be OK for you? Or do we need to share most of the enable and configure ioctls, with the configure ioctl doing an extra check?

For me, the user only needs to care about the result of the operation. The user can then configure dedupe to their needs without needing to know the previous setting.
From this point of view, "enable/disable" is much easier than "enable/config/disable".

> Getting the usability right is hard and that's why we're having this
> discussion. What suits you does not suit others, we have different
> habits, expectations, and there are existing usage patterns. We better
> stick to something that's not too surprising yet still flexible enough
> to cover broad needs. I'm leaving this open, but I strongly disagree
> with the current interface proposal.

I'm still open to a new ioctl interface design, as long as we can re-use most of the current code. Anyway, just as you pointed out, the stateless one is just my personal taste.

For a dedupe_bs/backend/hash algorithm (only SHA256 yet) change, it will disable dedupe (dropping all hashes) and then enable with the new setting. For the in-memory backend, if only the limit differs from the previous setting, the limit can be changed on the fly without dropping any hash.

> This is obviously misplaced in 'enable'.

Then, changing the 'enable' to 'configure' or other pr
Re: [PATCH 00/13 v3] Introduce device state 'failed', Hot spare and Auto replace
Kai Krakow posted on Mon, 04 Apr 2016 22:15:13 +0200 as excerpted:

> Your argument would be less important if it did copy-back, tho... ;-)

FWIW, I completely misunderstood your description of copy-back in my original reply, and didn't realize what you meant (and thus my mistaken understanding) until I read some of the other replies today.

What I /thought/ you meant was some totally nonsense/WTF idea of keeping the newly substituted hot-spare in place, and taking the newly vacated "defective" device and putting it back in the hot-spare list. That rightly seemed stupid to me (it's a device just replaced as defective, and now you're putting it back as a hot-spare? WTF?), but that's how I read what you were asking for and saying that other solutions did, so...

Of course today when I read the other replies and realized what you were /actually/ describing (returning the hot-spare to hot-spare status after physically replacing the actually failed drive with a new one and logically replacing the hot-spare with it in the filesystem, thereby making the hot-spare a spare once again), my reaction was "DUH!! NOW it makes sense!"

But I was just going to let it go and hide my original misunderstanding in a hole somewhere. But now that you've replied to my reply, I figured I'd reply back, explaining what on earth I was thinking when I wrote it, and why it must have seemed rather out of left field and didn't make much sense: what I thought you were suggesting /didn't/ make sense, because I totally misunderstood what you were suggesting.

So now my very-much-former misunderstanding is out of the hole and posted for everyone to see and have a good laugh at, and I'm much the wiser on what copy-back actually entails. =:^)

Tho it seems I was correct in the one aspect, currently ENotImplemented, even if my idea of what you were asking to be implemented was totally and completely off-the-wall wrong.

-- 
Duncan - List replies preferred. No HTML msgs.
Re: Scrub priority, am I using it wrong?
Gareth Pye posted on Tue, 05 Apr 2016 09:36:48 +1000 as excerpted:

> I've got a btrfs file system set up on 6 drbd disks running on 2Tb
> spinning disks. The server is moderately loaded with various regular
> tasks that use a fair bit of disk IO, but I've scheduled my weekly
> btrfs scrub for the best quiet time in the week.
>
> The command that is run is:
> /usr/local/bin/btrfs scrub start -Bd -c idle /data
>
> Which is my best attempt to try and get it to have a low impact on user
> operations
>
> But iotop shows me:
>
>  1765 be/4 root  14.84 M/s  0.00 B/s    0.00 % 96.65 % btrfs scrub start -Bd -c idle /data
>  1767 be/4 root  14.70 M/s  0.00 B/s    0.00 % 95.35 % btrfs scrub start -Bd -c idle /data
>  1768 be/4 root  13.47 M/s  0.00 B/s    0.00 % 92.59 % btrfs scrub start -Bd -c idle /data
>  1764 be/4 root  12.61 M/s  0.00 B/s    0.00 % 88.77 % btrfs scrub start -Bd -c idle /data
>  1766 be/4 root  11.24 M/s  0.00 B/s    0.00 % 85.18 % btrfs scrub start -Bd -c idle /data
>  1763 be/4 root   7.79 M/s  0.00 B/s    0.00 % 63.30 % btrfs scrub start -Bd -c idle /data
> 28858 be/4 root   0.00 B/s  810.50 B/s  0.00 % 61.32 % [kworker/u16:25]
>
> Which doesn't look like an idle priority to me. And the system sure
> feels like a system with a lot of heavy io going on. Is there something
> I'm doing wrong?

Two points:

1) It appears btrfs scrub start's -c option only takes the numeric class, so try -c3 instead of -c idle. Works for me with the numeric class (same results as you with the spelled-out class), tho I'm on ssd with multiple independent btrfs on partitions, the biggest of which is 24 GiB, 18.something GiB used, which scrubs in all of 20 seconds, so I don't need and hadn't tried the -c option at all until now.

2) What a difference an ssd makes!

$$ sudo btrfs scrub start -c3 /p
scrub started on /p, [...]
$$ sudo iotop -obn1
Total DISK READ :     626.53 M/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:     596.93 M/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER  DISK READ   DISK WRITE  SWAPIN     IO     COMMAND
  872  idle  root  268.40 M/s    0.00 B/s  0.00 %   0.00 %  btrfs scrub start -c3 /p
  873  idle  root  358.13 M/s    0.00 B/s  0.00 %   0.00 %  btrfs scrub start -c3 /p

CPU bound, 0% IOWait even at idle IO priority, in addition to the hundreds of M/s values per thread/device, here. You OTOH are showing under 20 M/s per thread/device on spinning rust, with an IOWait near 90%, thus making it IO bound.

-- 
Duncan - List replies preferred. No HTML msgs.
Re: btrfsck: backpointer mismatch (and multiple other errors)
Kai Krakow posted on Mon, 04 Apr 2016 21:26:28 +0200 as excerpted:

> I'll go test the soon-to-die SSD as soon as it's replaced. I think it's
> still far from failing with bitrot. It was overprovisioned by 30% most
> of the time, with the spare space trimmed.

Same here, FWIW. In fact, I had expected to get ~128 GB SSDs and ended up getting 256 GB, such that I was only using about 130 GiB, so depending on what the overprovisioning percentage is calculated against, I was and am near 50% or 100% overprovisioned.

So in my case I think the SSD was simply defective, such that the overprovisioning and trim simply didn't help. The other two devices of identical brand and model, bought from the same store at the same time and thus very likely from the same manufacturing lot, were and are just fine. (One is showing a trivial non-zero raw value for attribute 5, reallocated sector count, and for 182, erase fail count total, but both remain at 100% "cooked" value, and there are absolutely no issues on the other one, the one of the original pair that wasn't replaced.)

But based on that experience, while overprovisioning may help in terms of normal wearout, it doesn't necessarily help at all if the device is actually going bad.

> It certainly should have a lot of sectors for wear levelling. In
> addition, smartctl shows no sector errors at all - except for one:
> raw_read_error_rate. I'm not sure what all those sensors tell me, but
> that one I'm also seeing on hard disks which show absolutely no data
> damage.
>
> In fact, I see those counters for my hard disks. But dd to /dev/null of
> the complete raw hard disk shows no sector errors. It seems good. But
> well, counting 1+1 together: I currently see data damage. But I guess
> that's unrelated.
>
> Is there some documentation somewhere about what each of those sensors
> technically means and how to read the raw values and thresh values?

Nothing user/admin level that I'm aware of.
I'm sure there are some SMART docs somewhere that describe them as part of the standard, but they could easily be effectively unavailable to those unwilling to pay a big-corporate-sized consortium membership fee (as was the case with one of the CompactDisc specs, Orange Book IIRC, at one point).

I know there's some discussion by allusion in the smartctl manpage and docs, but many attributes appear to be manufacturer specific and/or to have been reverse-engineered by the smartctl devs, meaning even /they/ don't really have access to proper documentation for at least some attributes. Which is sad, but that's the reality in a majority-proprietary, or at best don't-care, market...

> I'm also seeing multi_zone_error_rate on my spinning rust. According to
> the smartctl health check and smartctl extended selftest, there are no
> problems at all - and the smart error log is empty. There has never
> been an ATA error in dmesg... No relocated sectors... From my naive
> view the drives still look good.

-- 
Duncan - List replies preferred. No HTML msgs.
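For pulling specific attribute values like the ones discussed above out of `smartctl -A` output, a little awk is enough: column 2 is the attribute name, column 4 the "cooked" VALUE, column 6 the THRESH, and the last column the raw value. The sample table below is illustrative only, not data from a real device:

```shell
# Save a sample of "smartctl -A"-style attribute lines (illustrative
# values, not from a real device).
cat > /tmp/smart_sample.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       1
EOF

# attr_raw FILE ATTRIBUTE_NAME -> raw value (last column)
attr_raw() {
    awk -v name="$2" '$2 == name { print $NF }' "$1"
}

attr_raw /tmp/smart_sample.txt Reallocated_Sector_Ct   # prints 3
```

In practice you would feed it live output, e.g. `smartctl -A /dev/sda > /tmp/smart_sample.txt`, and compare column 4 against column 6 to see how close an attribute is to its failure threshold.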
Re: [PATCH v3 01/22] btrfs-progs: convert: Introduce functions to read used space
David Sterba wrote on 2016/04/04 15:35 +0200:
> On Fri, Jan 29, 2016 at 01:03:11PM +0800, Qu Wenruo wrote:
>> Before we do the real convert, we need to read and build up a used
>> space cache tree for the later data/meta separate chunk layout.
>>
>> This patch will iterate all used blocks in the ext2 filesystem and
>> record them into the cctx->used cache tree, for later use.
>>
>> This provides the very basics of the later btrfs-convert rework.
>>
>> Signed-off-by: Qu Wenruo
>> Signed-off-by: David Sterba
>> ---
>>  btrfs-convert.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 80 insertions(+)
>>
>> diff --git a/btrfs-convert.c b/btrfs-convert.c
>> index 4baa68e..65841bd 100644
>> --- a/btrfs-convert.c
>> +++ b/btrfs-convert.c
>> @@ -81,6 +81,7 @@ struct btrfs_convert_context;
>>  struct btrfs_convert_operations {
>>  	const char *name;
>>  	int (*open_fs)(struct btrfs_convert_context *cctx, const char *devname);
>> +	int (*read_used_space)(struct btrfs_convert_context *cctx);
>>  	int (*alloc_block)(struct btrfs_convert_context *cctx, u64 goal,
>>  			   u64 *block_ret);
>>  	int (*alloc_block_range)(struct btrfs_convert_context *cctx, u64 goal,
>> @@ -230,6 +231,73 @@ fail:
>>  	return -1;
>>  }
>>
>> +static int __ext2_add_one_block(ext2_filsys fs, char *bitmap,
>> +				unsigned long group_nr, struct cache_tree *used)
>> +{
>> +	unsigned long offset;
>> +	unsigned i;
>> +	int ret = 0;
>> +
>> +	offset = fs->super->s_first_data_block;
>> +	offset /= EXT2FS_CLUSTER_RATIO(fs);
>
> This macro does not exist on my reference host for old distros. The
> e2fsprogs version is 1.41.14 and I'd like to keep the compatibility at
> least at that level. The clustering has been added in 1.42 but can we
> add some compatibility layer that will work on both versions?

No problem. It's a simple macro. For older versions which don't provide it, we can just define it in btrfs-convert.c.

Thanks,
Qu
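A compatibility shim along the lines discussed could look like the sketch below. It assumes (as the thread states) that e2fsprogs before 1.42 has no bigalloc clusters, so the cluster-to-block ratio is always 1 there; the `toy_ext2_fs` struct is a stand-in for the real `ext2_filsys` purely so the sketch is self-contained and runnable:

```c
/* Sketch of the compatibility shim discussed above. Assumption:
 * e2fsprogs < 1.42 has no bigalloc, so one cluster == one block.
 * The struct below is a toy stand-in for the real ext2_filsys. */
#include <assert.h>

/* Fallback btrfs-convert.c could carry for pre-1.42 headers, where
 * the macro is absent and the ratio is by definition 1: */
#ifndef EXT2FS_CLUSTER_RATIO
#define EXT2FS_CLUSTER_RATIO(fs) (1)
#endif

struct toy_ext2_fs {
    unsigned long s_first_data_block;   /* from the ext2 superblock */
};

/* Mirrors the patch hunk: offset = s_first_data_block, then
 * offset /= EXT2FS_CLUSTER_RATIO(fs). */
static unsigned long first_cluster(struct toy_ext2_fs *fs)
{
    return fs->s_first_data_block / EXT2FS_CLUSTER_RATIO(fs);
}
```

With 1.42+ headers the real macro is picked up instead of the fallback, so the same source builds against both versions.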
Re: [PATCH] btrfs-progs: fsck: Fix a false metadata extent warning
David Sterba wrote on 2016/04/04 13:18 +0200:
> On Fri, Apr 01, 2016 at 04:50:06PM +0800, Qu Wenruo wrote:
>
> After another look, why don't we use nodesize directly? Or stripesize
> where it applies. With max_size == 0 the test does not make sense, we
> ought to know the alignment.

Yes, my first thought was also to use nodesize directly, which should always be correct. But the problem is, the related function call stack doesn't have any member through which to reach btrfs_root or btrfs_fs_info.

> JFYI, there's global_info available, so it's not necessary to pass
> fs_info down the call stacks.

Oh, that's good news.

Do I need to re-submit the patch to use fs_info->tree_root->nodesize to avoid the false alert? Or wait for your refactoring?

Thanks,
Qu
Re: Qgroups wrong after snapshot create
Hi,

Thanks for the report.

Mark Fasheh wrote on 2016/04/04 16:06 -0700:
> Hi,
>
> Making a snapshot gets us the wrong qgroup numbers. This is very easy
> to reproduce. From a fresh btrfs filesystem, simply enable qgroups and
> create a snapshot. In this example we have a newly created filesystem
> mounted at /btrfs:
>
> # btrfs quota enable /btrfs
> # btrfs sub sna /btrfs/ /btrfs/snap1
> # btrfs qg show /btrfs
> qgroupid         rfer         excl
> 0/5          32.00KiB     32.00KiB
> 0/257        16.00KiB     16.00KiB

Also reproduced it.

My first idea is that the old snapshot qgroup hack is involved. Unlike btrfs_inc/dec_extent_ref(), snapshotting just uses a dirty hack to handle qgroups: copy rfer from the source subvolume, and directly set excl to nodesize.

If that work happens before adding the snapshot inode into the src subvolume, it may be the cause of the bug.

> In the example above, the default subvolume (0/5) should read 16KiB
> referenced and 16KiB exclusive.
>
> A rescan fixes things, so we know the rescan process is doing the math
> right:
>
> # btrfs quota rescan /btrfs
> # btrfs qgroup show /btrfs
> qgroupid         rfer         excl
> 0/5          16.00KiB     16.00KiB
> 0/257        16.00KiB     16.00KiB

So the base of the qgroup code is not affected, or we may need another painful rework.

> The last kernel to get this right was v4.1:
>
> # uname -r
> 4.1.20
> # btrfs quota enable /btrfs
> # btrfs sub sna /btrfs/ /btrfs/snap1
> Create a snapshot of '/btrfs/' in '/btrfs/snap1'
> # btrfs qg show /btrfs
> qgroupid         rfer         excl
> 0/5          16.00KiB     16.00KiB
> 0/257        16.00KiB     16.00KiB
>
> Which leads me to believe that this was a regression introduced by Qu's
> rewrite, as that is the biggest change to qgroups during that
> development period.
>
> Going back to upstream, I applied my tracing patch from this list
> ( http://thread.gmane.org/gmane.comp.file-systems.btrfs/54685 ), with a
> couple of changes - I'm printing the rfer/excl bytecounts in
> qgroup_update_counters AND I print them twice - once before we make any
> changes and once after the changes.
> If I enable tracing in btrfs_qgroup_account_extent and
> qgroup_update_counters just before the snapshot creation, we get the
> following trace:
>
> # btrfs quota enable /btrfs
> # echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_qgroup_account_extent/enable
> # echo 1 > /sys/kernel/debug/tracing/events/btrfs/qgroup_update_counters/enable
> # btrfs sub sna /btrfs/ /btrfs/snap2
> Create a snapshot of '/btrfs/' in '/btrfs/snap2'
> # btrfs qg show /btrfs
> qgroupid         rfer         excl
> 0/5          32.00KiB     32.00KiB
> 0/257        16.00KiB     16.00KiB
>
> fstest1:~ # cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 13/13   #P:2
> #
> #    TASK-PID  CPU#     TIMESTAMP  FUNCTION
> #       |  |     |           |        |
>  btrfs-10233 [001] 260298.823339: btrfs_qgroup_account_extent: bytenr = 29360128, num_bytes = 16384, nr_old_roots = 1, nr_new_roots = 0
>  btrfs-10233 [001] 260298.823342: qgroup_update_counters: qgid = 5, cur_old_count = 1, cur_new_count = 0, rfer = 16384, excl = 16384
>  btrfs-10233 [001] 260298.823342: qgroup_update_counters: qgid = 5, cur_old_count = 1, cur_new_count = 0, rfer = 0, excl = 0
>  btrfs-10233 [001] 260298.823343: btrfs_qgroup_account_extent: bytenr = 29720576, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
>  btrfs-10233 [001] 260298.823345: btrfs_qgroup_account_extent: bytenr = 29736960, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
>  btrfs-10233 [001] 260298.823347: btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1

Now, for extent 29786112, its nr_new_roots is 1.
>  btrfs-10233 [001] 260298.823347: qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0
>  btrfs-10233 [001] 260298.823348: qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384
>  btrfs-10233 [001] 260298.823421: btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0

Now the problem is here: nr_old_roots should be 1, not 0. Just as the previous trace shows, we increased the extent ref on that extent, but now it has dropped back to 0.

Since its old_root == new_root == 0, the qgroup code doesn't do anything on it. If its nr_old_roots were 1, qgroup would drop its excl/rfer to 0, and then the accounting may go back to normal.
[PATCH] Btrfs: fix missing s_id setting
When the device behind fs_devices->latest_bdev is deleted or replaced, sb->s_id is not updated. As a result, the deleted device name is displayed by btrfs_printk.

[before fix]
 # btrfs dev del /dev/sdc4 /mnt2
 # btrfs dev add /dev/sdb6 /mnt2

[  217.458249] BTRFS info (device sdc4): found 1 extents
[  217.695798] BTRFS info (device sdc4): disk deleted /dev/sdc4
[  217.941284] BTRFS info (device sdc4): disk added /dev/sdb6

[after fix]
 # btrfs dev del /dev/sdc4 /mnt2
 # btrfs dev add /dev/sdb6 /mnt2

[   83.835072] BTRFS info (device sdc4): found 1 extents
[   84.080617] BTRFS info (device sdc3): disk deleted /dev/sdc4
[   84.401951] BTRFS info (device sdc3): disk added /dev/sdb6

Signed-off-by: Tsutomu Itoh
---
 fs/btrfs/dev-replace.c |  5 ++++-
 fs/btrfs/volumes.c     | 11 ++++++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a1d6652..11c4198 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -560,8 +560,11 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	tgt_device->commit_bytes_used = src_device->bytes_used;
 	if (fs_info->sb->s_bdev == src_device->bdev)
 		fs_info->sb->s_bdev = tgt_device->bdev;
-	if (fs_info->fs_devices->latest_bdev == src_device->bdev)
+	if (fs_info->fs_devices->latest_bdev == src_device->bdev) {
 		fs_info->fs_devices->latest_bdev = tgt_device->bdev;
+		snprintf(fs_info->sb->s_id, sizeof(fs_info->sb->s_id), "%pg",
+			 tgt_device->bdev);
+	}
 	list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
 	fs_info->fs_devices->rw_devices++;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e2b54d5..a471385 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1846,8 +1846,12 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 				 struct btrfs_device, dev_list);
 	if (device->bdev == root->fs_info->sb->s_bdev)
 		root->fs_info->sb->s_bdev = next_device->bdev;
-	if (device->bdev == root->fs_info->fs_devices->latest_bdev)
+	if (device->bdev == root->fs_info->fs_devices->latest_bdev) {
 		root->fs_info->fs_devices->latest_bdev = next_device->bdev;
+		snprintf(root->fs_info->sb->s_id,
+			 sizeof(root->fs_info->sb->s_id), "%pg",
+			 next_device->bdev);
+	}
 	if (device->bdev) {
 		device->fs_devices->open_devices--;
@@ -2034,8 +2038,11 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 				 struct btrfs_device, dev_list);
 	if (tgtdev->bdev == fs_info->sb->s_bdev)
 		fs_info->sb->s_bdev = next_device->bdev;
-	if (tgtdev->bdev == fs_info->fs_devices->latest_bdev)
+	if (tgtdev->bdev == fs_info->fs_devices->latest_bdev) {
 		fs_info->fs_devices->latest_bdev = next_device->bdev;
+		snprintf(fs_info->sb->s_id, sizeof(fs_info->sb->s_id), "%pg",
+			 next_device->bdev);
+	}
 	list_del_rcu(&tgtdev->dev_list);
 	call_rcu(&tgtdev->rcu, free_device);
--
2.6.4
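The pattern the patch applies can be illustrated in a small userspace sketch: whenever the "latest device" pointer is reassigned, the cached id string is refreshed in the same place, so log messages never show a stale name. The types and names below are simplified stand-ins for the kernel structs, and plain `%s` stands in for the kernel-only `%pg` format specifier:

```c
/* Userspace sketch of the pattern in the patch above: update the
 * cached name string at the same point the device pointer changes.
 * Toy types; "%s" stands in for the kernel's "%pg". */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct toy_device { const char *name; };

struct toy_sb {
    char s_id[32];             /* cached device name used in messages */
};

static void set_latest_dev(struct toy_sb *sb, const struct toy_device *dev)
{
    /* mirrors: snprintf(sb->s_id, sizeof(sb->s_id), "%pg", bdev); */
    snprintf(sb->s_id, sizeof(sb->s_id), "%s", dev->name);
}
```

Keeping the string update adjacent to the pointer update (inside the same `if` block, as the patch does) is what makes it impossible to forget one of the two on any code path.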
Scrub priority, am I using it wrong?
I've got a btrfs file system set up on 6 drbd devices running on 2TB spinning disks. The server is moderately loaded with various regular tasks that use a fair bit of disk IO, but I've scheduled my weekly btrfs scrub for the best quiet time in the week. The command that is run is:

/usr/local/bin/btrfs scrub start -Bd -c idle /data

which is my best attempt to get it to have a low impact on user operations. But iotop shows me:

 1765 be/4 root   14.84 M/s   0.00 B/s  0.00 % 96.65 % btrfs scrub start -Bd -c idle /data
 1767 be/4 root   14.70 M/s   0.00 B/s  0.00 % 95.35 % btrfs scrub start -Bd -c idle /data
 1768 be/4 root   13.47 M/s   0.00 B/s  0.00 % 92.59 % btrfs scrub start -Bd -c idle /data
 1764 be/4 root   12.61 M/s   0.00 B/s  0.00 % 88.77 % btrfs scrub start -Bd -c idle /data
 1766 be/4 root   11.24 M/s   0.00 B/s  0.00 % 85.18 % btrfs scrub start -Bd -c idle /data
 1763 be/4 root    7.79 M/s   0.00 B/s  0.00 % 63.30 % btrfs scrub start -Bd -c idle /data
28858 be/4 root    0.00 B/s 810.50 B/s  0.00 % 61.32 % [kworker/u16:25]

which doesn't look like an idle priority to me. And the system sure feels like a system with a lot of heavy IO going on. Is there something I'm doing wrong?

System details:

# uname -a
Linux emile 4.4.3-040403-generic #201602251634 SMP Thu Feb 25 21:36:25 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
# /usr/local/bin/btrfs --version
btrfs-progs v4.4.1

I'm waiting on the ppa version of 4.5.1 before upgrading; that is my usual kernel update strategy.

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"

Any other details that people would like to see that are relevant to this question?
--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
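The iotop listing above already contains the clue: `be/4` in the second column means the scrub threads are running in the best-effort class at priority 4, not in the idle class. One way to confirm this per thread is to query each PID with util-linux's ionice (a sketch; the PIDs are the ones from the iotop output and will differ on any other system):

```shell
# Query the I/O scheduling class of the scrub worker threads.
# "idle" here would mean -c idle actually took effect; "best-effort: prio 4"
# matches the be/4 shown by iotop.
for pid in 1763 1764 1765 1766 1767 1768; do
    printf '%5d: ' "$pid"
    ionice -p "$pid"
done
```

As noted later in the thread, `btrfs scrub start -c` wants the numeric class (`-c3` for idle), and silently ignores the name form `-c idle`.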
Re: btrfsck: backpointer mismatch (and multiple other errors)
On Mon, Apr 4, 2016 at 2:50 PM, Kai Krakow wrote:
>> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
>> can't remove snapshots from the seed. If you remove them from the
>> mounted rw sprout volume, they're removed from the sprout, not the
>> seed. If you want them on the sprout, but not on the seed, you need to
>> delete snapshots only after the seed is a.) removed from the sprout
>> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.
>
> If I understand right, the seed device won't change? So whatever action
> I apply to the sprout pool, I can later remove the seed from the pool
> and it will still be kind of untouched. Except, I'll have to return it
> to non-seed mode (step b).

Correct. In a sense, making a volume a seed is like making it a volume-wide read-only snapshot. Any changes are applied via COW only to added device(s).

> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.

You can remove snapshots before or after detaching the seed device, it doesn't matter, but such snapshot removal only affects the sprout. You wrote: "remove all left-over snapshots from the seed". The seed is read only; you can't modify the contents of the seed device.

What you should do is just delete the snapshots you don't want migrated over to the sprout right away, before you even do the balance -dconvert -mconvert. That way you aren't wasting time moving things over that you don't want. To be clear:

btrfstune -S 1 /dev/seed
mount /dev/seed /mnt/
btrfs dev add /dev/new1 /mnt/
btrfs dev add /dev/new2 /mnt/
mount -o remount,rw /mnt/
btrfs sub del blah/ blah2/ blah3/ blah4/
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/
btrfs dev del /dev/seed /mnt/

If you're doing any backups once remounting rw, note those backups will only be on the sprout. Backups will not be on the seed because it's read-only.
> > What happens to the UUIDs when I separate seed and sprout? Nothing. They remain intact and unique, per volume. > > I'd now reboot into the system to see if it's working. Note you'll need to change grub.cfg, possibly fstab, and possibly the initramfs, all three of which may be referencing the old volume. > By then, it's > time for some cleanup (remove the previously deferred "trashes" and > retention snapshots), then separate the seed from the sprout. During > that time, I could already use my system again while it's migrating for > me in the background. > > I'd then return the seed back to non-seed, so it can take the role of > my backup storage again. I'd do a rebalance now. OK? I don't know why you need to balance the seed at all, let alone afterward, but it seems like it might be a more efficient replication if you balanced before making it a seed? > > During the whole process, the backup storage will still stay safe for > me. If something goes wrong, I could easily start over. > > Did I miss something? Is it too much of an experimental kind of stuff? I'm not sure where all the bugs are. It's good to find bugs though and get them squashed. I have an idea of making live media use Btrfs instead of using a loop mounted file to back a rw lvm snapshot device (persistent overlay), which I think is really fragile and a lot more complicated in the initramfs. It's also good to take advantage of checksumming after having written an ISO to flash media, where users often don't verify or something can mount the USB stick rw and immediately modify the stick in such a way that media verification will fail anyway. So, a number of plusses, I'd like to see the seed device be robust. > > BTW: The way it is arranged now, the backup storage is bootable by > setting the scratch area subvolume as the rootfs on kernel cmdline, > USB drivers are included in the kernel, it's tested and works. I guess, > this isn't possible while the backup storage acts as a seed device? 
But > I have an initrd with latest btrfs-progs on my boot device (which is an > UEFI ESP, so not related to btrfs at all), I should be able to use that > to revert changes preventing me from booting. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Qgroups wrong after snapshot create
Hi,

Making a snapshot gets us the wrong qgroup numbers. This is very easy to reproduce. Starting from a freshly created filesystem mounted at /btrfs, simply enable qgroups and create a snapshot:

# btrfs quota enable /btrfs
# btrfs sub sna /btrfs/ /btrfs/snap1
# btrfs qg show /btrfs
qgroupid  rfer      excl
0/5       32.00KiB  32.00KiB
0/257     16.00KiB  16.00KiB

In the example above, the default subvolume (0/5) should read 16KiB referenced and 16KiB exclusive. A rescan fixes things, so we know the rescan process is doing the math right:

# btrfs quota rescan /btrfs
# btrfs qgroup show /btrfs
qgroupid  rfer      excl
0/5       16.00KiB  16.00KiB
0/257     16.00KiB  16.00KiB

The last kernel to get this right was v4.1:

# uname -r
4.1.20
# btrfs quota enable /btrfs
# btrfs sub sna /btrfs/ /btrfs/snap1
Create a snapshot of '/btrfs/' in '/btrfs/snap1'
# btrfs qg show /btrfs
qgroupid  rfer      excl
0/5       16.00KiB  16.00KiB
0/257     16.00KiB  16.00KiB

This leads me to believe that this is a regression introduced by Qu's rewrite, as that is the biggest change to qgroups during that development period.

Going back to upstream, I applied my tracing patch from this list ( http://thread.gmane.org/gmane.comp.file-systems.btrfs/54685 ), with a couple changes - I'm printing the rfer/excl bytecounts in qgroup_update_counters AND I print them twice - once before we make any changes and once after the changes.
If I enable tracing in btrfs_qgroup_account_extent and qgroup_update_counters just before the snapshot creation, we get the following trace:

# btrfs quota enable /btrfs
# echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_qgroup_account_extent/enable
# echo 1 > /sys/kernel/debug/tracing/events/btrfs/qgroup_update_counters/enable
# btrfs sub sna /btrfs/ /btrfs/snap2
Create a snapshot of '/btrfs/' in '/btrfs/snap2'
# btrfs qg show /btrfs
qgroupid  rfer      excl
0/5       32.00KiB  32.00KiB
0/257     16.00KiB  16.00KiB

fstest1:~ # cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 13/13   #P:2
#
#                          _-----=> irqs-off
#                         / _----=> need-resched
#                        | / _---=> hardirq/softirq
#                        || / _--=> preempt-depth
#                        ||| /     delay
#     TASK-PID     CPU#  ||||  TIMESTAMP  FUNCTION
#        | |         |   ||||     |         |
btrfs-10233 [001] 260298.823339: btrfs_qgroup_account_extent: bytenr = 29360128, num_bytes = 16384, nr_old_roots = 1, nr_new_roots = 0
btrfs-10233 [001] 260298.823342: qgroup_update_counters: qgid = 5, cur_old_count = 1, cur_new_count = 0, rfer = 16384, excl = 16384
btrfs-10233 [001] 260298.823342: qgroup_update_counters: qgid = 5, cur_old_count = 1, cur_new_count = 0, rfer = 0, excl = 0
btrfs-10233 [001] 260298.823343: btrfs_qgroup_account_extent: bytenr = 29720576, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
btrfs-10233 [001] 260298.823345: btrfs_qgroup_account_extent: bytenr = 29736960, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
btrfs-10233 [001] 260298.823347: btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1
btrfs-10233 [001] 260298.823347: qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0
btrfs-10233 [001] 260298.823348: qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384
btrfs-10233 [001] 260298.823421: btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
btrfs-10233 [001] 260298.823422: btrfs_qgroup_account_extent: bytenr = 29835264, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
btrfs-10233 [001] 260298.823425: btrfs_qgroup_account_extent: bytenr = 29851648, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1
btrfs-10233 [001] 260298.823426: qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384
btrfs-10233 [001] 260298.823426: qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 32768, excl = 32768

If you read through the whole log we do some interesting things - at the start, we *subtract* from qgroup 5, making its count go to zero. I want to say that this is kind of unexpected for a snapshot create, but perhaps there's something I'm missing. Remember that I'm printing each qgroup twice
Re: btrfsck: backpointer mismatch (and multiple other errors)
Am Mon, 4 Apr 2016 22:50:18 +0200 schrieb Kai Krakow : > Am Mon, 4 Apr 2016 13:57:50 -0600 > schrieb Chris Murphy : > > > On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow > > wrote: > > > > > > > [...] > [...] > > > > > > In the following sense: I should disable the automounter and > > > backup job for the seed device while I let my data migrate back > > > to main storage in the background... > > > > The sprout can be written to just fine by the backup, just > > understand that the seed and sprout volume UUID are different. Your > > automounter is probably looking for the seed's UUID, and that seed > > can only be mounted ro. The sprout UUID however can be mounted rw. > > > > I would probably skip the automounter. Do the seed setup, mount it, > > add all devices you're planning to add, then -o > > remount,rw,compress... , and then activate the backup. But maybe > > your backup also is looking for UUID? If so, that needs to be > > updated first. Once the balance -dconvert=raid1 and -mconvert=raid1 > > is finished, then you can remove the seed device. And now might be > > a good time to give the raid1 a new label, I think it inherits the > > label of the seed but I'm not certain of this. > > > > > > > My intention is to use fully my system while btrfs migrates the > > > data from seed to main storage. Then, afterwards I'd like to > > > continue using the seed device for backups. > > > > > > I'd probably do the following: > > > > > > 1. create btrfs pool, attach seed > > > > I don't understand that step in terms of commands. Sprouts are made > > with btrfs dev add, not with mkfs. There is no pool creation. You > > make a seed. You mount it. Add devices to it. Then remount it. > > Hmm, yes. I didn't think this through into detail yet. It actually > works that way. I more commonly referenced to the general approach. > > But I think this answers my question... ;-) > > > > 2. 
recreate my original subvolume structure by snapshotting the > > > backup scratch area multiple times into each subvolume > > > 3. rearrange the files in each subvolume to match their intended > > > use by using rm and mv > > > 4. reboot into full system > > > 4. remove all left-over snapshots from the seed > > > 5. remove (detach) the seed device > > > > You have two 4's. > > Oh... Sorry... I think one week of 80 work hours, and another of 60 > was a bit too much... ;-) > > > Anyway the 2nd 4 is not possible. The seed is ro by definition so > > you can't remove snapshots from the seed. If you remove them from > > the mounted rw sprout volume, they're removed from the sprout, not > > the seed. If you want them on the sprout, but not on the seed, you > > need to delete snapshots only after the seed is a.) removed from > > the sprout and b.) made no longer a seed with btrfstune -S 0 and > > c.) mounted rw. > > If I understand right, the seed device won't change? So whatever > action I apply to the sprout pool, I can later remove the seed from > the pool and it will still be kind of untouched. Except, I'll have to > return it no non-seed mode (step b). > > Why couldn't/shouldn't I remove snapshots before detaching the seed > device? I want to keep them on the seed but they are useless to me on > the sprout. > > What happens to the UUIDs when I separate seed and sprout? > > This is my layout: > > /dev/sde1 contains my backup storage: btrfs with multiple weeks worth > of retention in form of ro snapshots, and one scratch area in which > the backup is performed. Snapshots are created from the scratch area. > The scratch area is one single subvolume updated by rsync. > > I want to turn this into a seed for my newly created btrfs pool. This > one has subvolumes for /home, /home/my_user, /distribution_name/rootfs > and a few more (like var/log etc). 
> > Since the backup is not split by those subvolumes but contains just > the single runtime view of my system rootfs, I'm planning to clone > this single subvolume back into each of my previously used subvolumes > which in turn of course now contain all the same complete filesystem > tree. Thus, in the next step, I'm planning to mv/rm the contents to > get back to the original subvolume structure - mv should be a fast > operation here, rm probably not so but I don't bother. I could defer > that until later by moving those rm-candidates into some trash folder > per subvolume. > > Now, I still have the ro-snapshots worth of multiple weeks of > retention. I only need those in my backup storage, not in the storage > proposed to become my bootable system. So I'd simply remove them. I > could also defer that until later easily. > > This should get my system back into working state pretty fast and > easily if I didn't miss a point. > > I'd now reboot into the system to see if it's working. By then, it's > time for some cleanup (remove the previously deferred "trashes" and > retention snapshots), then separate the seed from the sprout. During > that time, I co
Re: btrfsck: backpointer mismatch (and multiple other errors)
Am Mon, 4 Apr 2016 13:57:50 -0600 schrieb Chris Murphy : > On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow > wrote: > > > > [...] > >> > >> ? > > > > In the following sense: I should disable the automounter and backup > > job for the seed device while I let my data migrate back to main > > storage in the background... > > The sprout can be written to just fine by the backup, just understand > that the seed and sprout volume UUID are different. Your automounter > is probably looking for the seed's UUID, and that seed can only be > mounted ro. The sprout UUID however can be mounted rw. > > I would probably skip the automounter. Do the seed setup, mount it, > add all devices you're planning to add, then -o remount,rw,compress... > , and then activate the backup. But maybe your backup also is looking > for UUID? If so, that needs to be updated first. Once the balance > -dconvert=raid1 and -mconvert=raid1 is finished, then you can remove > the seed device. And now might be a good time to give the raid1 a new > label, I think it inherits the label of the seed but I'm not certain > of this. > > > > My intention is to use fully my system while btrfs migrates the data > > from seed to main storage. Then, afterwards I'd like to continue > > using the seed device for backups. > > > > I'd probably do the following: > > > > 1. create btrfs pool, attach seed > > I don't understand that step in terms of commands. Sprouts are made > with btrfs dev add, not with mkfs. There is no pool creation. You make > a seed. You mount it. Add devices to it. Then remount it. Hmm, yes. I didn't think this through into detail yet. It actually works that way. I more commonly referenced to the general approach. But I think this answers my question... ;-) > > 2. recreate my original subvolume structure by snapshotting the > > backup scratch area multiple times into each subvolume > > 3. rearrange the files in each subvolume to match their intended > > use by using rm and mv > > 4. 
reboot into full system > > 4. remove all left-over snapshots from the seed > > 5. remove (detach) the seed device > > You have two 4's. Oh... Sorry... I think one week of 80 work hours, and another of 60 was a bit too much... ;-) > Anyway the 2nd 4 is not possible. The seed is ro by definition so you > can't remove snapshots from the seed. If you remove them from the > mounted rw sprout volume, they're removed from the sprout, not the > seed. If you want them on the sprout, but not on the seed, you need to > delete snapshots only after the seed is a.) removed from the sprout > and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw. If I understand right, the seed device won't change? So whatever action I apply to the sprout pool, I can later remove the seed from the pool and it will still be kind of untouched. Except, I'll have to return it no non-seed mode (step b). Why couldn't/shouldn't I remove snapshots before detaching the seed device? I want to keep them on the seed but they are useless to me on the sprout. What happens to the UUIDs when I separate seed and sprout? This is my layout: /dev/sde1 contains my backup storage: btrfs with multiple weeks worth of retention in form of ro snapshots, and one scratch area in which the backup is performed. Snapshots are created from the scratch area. The scratch area is one single subvolume updated by rsync. I want to turn this into a seed for my newly created btrfs pool. This one has subvolumes for /home, /home/my_user, /distribution_name/rootfs and a few more (like var/log etc). Since the backup is not split by those subvolumes but contains just the single runtime view of my system rootfs, I'm planning to clone this single subvolume back into each of my previously used subvolumes which in turn of course now contain all the same complete filesystem tree. 
Thus, in the next step, I'm planning to mv/rm the contents to get back to the original subvolume structure - mv should be a fast operation here, rm probably not so but I don't bother. I could defer that until later by moving those rm-candidates into some trash folder per subvolume. Now, I still have the ro-snapshots worth of multiple weeks of retention. I only need those in my backup storage, not in the storage proposed to become my bootable system. So I'd simply remove them. I could also defer that until later easily. This should get my system back into working state pretty fast and easily if I didn't miss a point. I'd now reboot into the system to see if it's working. By then, it's time for some cleanup (remove the previously deferred "trashes" and retention snapshots), then separate the seed from the sprout. During that time, I could already use my system again while it's migrating for me in the background. I'd then return the seed back to non-seed, so it can take the role of my backup storage again. I'd do a rebalance now. During the whole process, the backup storage will still stay safe for me. If something goes wro
Re: csum failed on innexistent inode
Am Mon, 4 Apr 2016 03:50:54 -0400 schrieb Jérôme Poulin :

> How is it possible to get rid of the referenced csum errors if they do
> not exist? Also, the expected checksum looks suspiciously the same for
> multiple errors. Could it be bad RAM in that case? Can I convince
> BTRFS to update the csum?
>
> # btrfs inspect-internal logical-resolve -v 1809149952 /mnt/btrfs/
> ioctl ret=-1, error: No such file or directory
> # btrfs inspect-internal inode-resolve -v 296 /mnt/btrfs/
> ioctl ret=-1, error: No such file or directory

I fell into that pitfall, too. If you have multiple subvolumes, you need to pass the correct subvolume path for the inode to resolve properly. Maybe that's the case for you? First, take a look at what "btrfs subvol list /mnt/btrfs" shows you.

--
Regards,
Kai

Replies to list-only preferred.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
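A sketch of the workaround Kai describes (the subvolume name is hypothetical): inode numbers are only unique per subvolume, so the resolve has to be run against the path of the subvolume that actually contains the inode, not just the top-level mount point.

```shell
# List subvolumes to find candidates for where inode 296 lives:
btrfs subvol list /mnt/btrfs
# Then resolve against that subvolume's own path instead of the mount root:
btrfs inspect-internal inode-resolve -v 296 /mnt/btrfs/my_subvol
```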
Re: [PATCH 00/13 v3] Introduce device state 'failed', Hot spare and Auto replace
Am Mon, 4 Apr 2016 04:45:16 +0000 (UTC) schrieb Duncan <1i5t5.dun...@cox.net>:

> Kai Krakow posted on Mon, 04 Apr 2016 02:00:43 +0200 as excerpted:
>
> > Does this also implement "copy-back" - thus, it returns the
> > hot-spare device to global hot-spares when the failed device has
> > been replaced?
>
> I don't believe it does that in this initial implementation, anyway.
>
> There's a number of issues with the initial implementation, including
> the fact that the hot-spare is global only and can't be specifically
> assigned to a filesystem or set of filesystems, which means, if you
> have multiple filesystems using different sized devices, the
> hot-spares must be sized to match the largest device they could
> replace, and thus would be mostly wasted if they ended up replacing a
> far smaller device. If the spares could be associated with specific
> filesystems, then specifically sized spares could be associated
> appropriately, avoiding that waste. Additionally, it would then be
> possible to queue up say 20 spares on an important filesystem, with
> no spares on another that you'd rather just go down if a device fails.
>
> So obviously the initial implementation isn't seriously
> enterprise-ready and is sub-optimal in many ways, but it's better
> than what is currently available (no automated spare handling at
> all), and an implementation must start somewhere, so as long as it's
> designed to be improved and extended with the missing features over
> time, as has been indicated, it's a reasonable first-implementation.

Your argument would be less important if it did copy-back, tho... ;-)

It's a very welcome and good start, and I didn't mean to talk it down in any way. But to handle it right, that point should be clear. Currently, if the global spare jumps in, you can always simulate "hot spare" behaviour by manually putting back a correctly sized drive, then removing the spare again to simulate copy-back, then making it a global spare again.
Since such an incident needs manual investigation anyways, it's totally reasonable to start with this implementation. This sort of handling could be made into a guide within the docs. -- Regards, Kai Replies to list-only preferred. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
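The manual "copy-back" simulation described above could look roughly like this (a sketch with hypothetical device names, using only the existing replace command; the patch set's own spare-management commands may well differ in their final form):

```shell
# The failed disk was auto-replaced by the global spare, so the spare is
# now an ordinary member of the filesystem. After physically installing
# a correctly sized replacement, move the data off the spare onto it:
btrfs replace start -B /dev/spare_in_use /dev/new_disk /mnt
# On completion the spare is out of the array again and can be
# re-registered as a global spare, restoring the original arrangement.
```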
Re: [PATCH] delete obsolete function btrfs_print_tree()
On Mon, Apr 04, 2016 at 05:02:38PM +0100, Filipe Manana wrote:
> It's not serious if it doesn't have all the proper error handling
> and etc, it's just something for debugging purposes.

I'm slowly trying to remove static checker warnings so that we can detect real bugs. People sometimes leave little messages for me in their code because they know I will review the new warning:

	foo = kmalloc(); /* error handling deliberately left out */

It makes me quite annoyed because it's like "Oh no, if we added error handling that would take 40 extra bytes of memory! Such a waste!" But we could instead use __GFP_NOFAIL, or BUG_ON(!foo).

I have gotten distracted. What was the question again?

regards,
dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13 v3] Introduce device state 'failed', Hot spare and Auto replace
Am Mon, 4 Apr 2016 14:19:23 +0800 schrieb Anand Jain : > > Otherwise, I find "hot spare" misleading and it should be renamed. > > I never thought hot spare would be narrowed to such a specifics. [...] > About the naming.. the progs called it 'global spare' (device), > kernel calls is 'spare'. Sorry this email thread called it > hot spare. I should have paid little more attention here to maintain > consistency. > > Thanks for the note. I think that's okay. Maybe man pages / doc should put a note that there's no copy-back and that the spare takes a permanent replacement role. Side note: When I started managing hardware RAIDs a few years back, "hot spare" wasn't very clear to me, and I didn't understand why there is a copy-back operation (given that "useless" +1x IO). But in the long term it keeps drive arrangement at expectations - which is good. RAID board manufacturers seem to differentiate between those two replacement strategies - and "hot spare" always involved copy-back for me: The spare drive automatically returns to its hot spare role. I learned to like this strategy. It has some advantages. You could instead assign a replacement drive - then drives will become rearranged in the array. This is usually done by just onlining one spare disk, start a replace action, then offline the old drive and pull it from the array. It's not "hot" in that sense. It's been unconfigured good. Not sure if this could be automated - I did it this way only when the array hasn't been equipped with a spare inside the enclosure but the drive being still in its original box. Other than that, I always used the hot spare method. That's why I stumbled across... -- Regards, Kai Replies to list-only preferred. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfsck: backpointer mismatch (and multiple other errors)
On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow wrote: > >> > I guess the >> > seed source cannot be mounted or modified... >> >> ? > > In the following sense: I should disable the automounter and backup job > for the seed device while I let my data migrate back to main storage in > the background... The sprout can be written to just fine by the backup, just understand that the seed and sprout volume UUID are different. Your automounter is probably looking for the seed's UUID, and that seed can only be mounted ro. The sprout UUID however can be mounted rw. I would probably skip the automounter. Do the seed setup, mount it, add all devices you're planning to add, then -o remount,rw,compress... , and then activate the backup. But maybe your backup also is looking for UUID? If so, that needs to be updated first. Once the balance -dconvert=raid1 and -mconvert=raid1 is finished, then you can remove the seed device. And now might be a good time to give the raid1 a new label, I think it inherits the label of the seed but I'm not certain of this. > My intention is to use fully my system while btrfs migrates the data > from seed to main storage. Then, afterwards I'd like to continue using > the seed device for backups. > > I'd probably do the following: > > 1. create btrfs pool, attach seed I don't understand that step in terms of commands. Sprouts are made with btrfs dev add, not with mkfs. There is no pool creation. You make a seed. You mount it. Add devices to it. Then remount it. > 2. recreate my original subvolume structure by snapshotting the backup >scratch area multiple times into each subvolume > 3. rearrange the files in each subvolume to match their intended use by >using rm and mv > 4. reboot into full system > 4. remove all left-over snapshots from the seed > 5. remove (detach) the seed device You have two 4's. Anyway the 2nd 4 is not possible. The seed is ro by definition so you can't remove snapshots from the seed. 
If you remove them from the mounted rw sprout volume, they're removed from the sprout, not the seed. If you want them on the sprout, but not on the seed, you need to delete snapshots only after the seed is a.) removed from the sprout and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfsck: backpointer mismatch (and multiple other errors)
Am Sun, 3 Apr 2016 18:51:07 -0600 schrieb Chris Murphy : > > BTW: Is it possible to use my backup drive (it's btrfs single-data > > dup-metadata, single device) as a seed device for my newly created > > btrfs pool (raid0-data, raid1-metadata, three devices)? > > Yes. > > I just tried doing the conversion to raid1 before and after seed > removal, but with the small amount of data (4GiB) I can't tell a > difference. It seems like -dconvert=raid with seed still connected > makes two rw copies (i.e. there's a ro copy which is the original, and > then two rw copies on 2 of the 3 devices I added all at the same time > to the seed), and the 'btrfs dev remove' command to remove the seed > happened immediately, suggested the prior balances had already > migrated copies off the seed. This may or may not be optimal for your > case. > > Two gotchas. > > I ran into this bug: > btrfs fi usage crash when volume contains seed device > https://bugzilla.kernel.org/show_bug.cgi?id=115851 > > And there is a phantom single chunk on one of the new rw devices that > was added. Data,single: Size:1.00GiB, Used:0.00B >/dev/dm-8 1.00GiB > > It's still there after the -dconvert=raid1 and separate -mconvert=raid > and after seed device removal. A balance start without filters removes > it, chances are had I used -dconvert=raid1,soft it would have vanished > also but I didn't retest for that. Good to know, thanks. > > I guess the > > seed source cannot be mounted or modified... > > ? In the following sense: I should disable the automounter and backup job for the seed device while I let my data migrate back to main storage in the background... My intention is to use fully my system while btrfs migrates the data from seed to main storage. Then, afterwards I'd like to continue using the seed device for backups. I'd probably do the following: 1. create btrfs pool, attach seed 2. recreate my original subvolume structure by snapshotting the backup scratch area multiple times into each subvolume 3. 
rearrange the files in each subvolume to match their intended use by using rm and mv 4. reboot into full system 4. remove all left-over snapshots from the seed 5. remove (detach) the seed device 6. rebalance 7. switch bcache to write-back mode (or attach bcache only now) -- Regards, Kai Replies to list-only preferred. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Global hotspare functionality
2016-04-01 18:15 GMT-07:00 Anand Jain :

>>>> Issue 2. At start of autoreplacing a drive by hotspare, the kernel
>>>> crashes in transaction handling code (inside of
>>>> btrfs_commit_transaction() called by the autoreplace initiating
>>>> routines). I 'fixed' this by removing the closing of the bdev in
>>>> btrfs_close_one_device_dont_free(), see
>>>> https://bitbucket.org/jekhor/linux-btrfs/commits/dfa441c9ec7b3833f6a5e4d0b6f8c678faea29bb?at=master
>>>> (oops text is attached also). The bdev is closed after replacing by
>>>> btrfs_dev_replace_finishing(), so this is safe but doesn't seem to
>>>> be the right way.
>>>
>>> I have sent out V2. I don't see that issue with this,
>>> could you pls try ?
>>
>> Yes, it reproduced on the v4.4.5 kernel. I will try with the current
>> 'for-linus-4.6' Chris' tree soon.
>>
>> To emulate a drive failure, I disconnect the drive in VirtualBox, so
>> the bdev can be freed by the kernel after releasing all references
>> to it.
>
> So far the raid group profile would adapt to a lower suitable
> group profile when a device is missing/failed. This appears to
> be not happening with RAID56 OR there is stale IO which wasn't
> flushed out. Anyway, to have this fixed I am moving the patch
>    btrfs: introduce device dynamic state transition to offline or failed
> to the top in v3 for any potential changes.
> But firstly we need a reliable test case, or a very carefully
> crafted test case which can create this situation.
>
> Here below is the dm-error that I am using for testing, which
> apparently doesn't report this issue. Could you please try on V3 ?
> (pls note the device names are hard coded in the test script,
> sorry about that). This would eventually be an fstests script.

Hi,

I have reproduced this oops with the attached script. I don't use any
dm layer, but just detach the drive at the SCSI layer as xfstests does
(device management functions were copy-pasted from it).

[Attachment: test-autoreplace2-mainline.sh - Bourne shell script]
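For reference, the xfstests-style SCSI-layer detach mentioned above
boils down to a few sysfs writes. A dry-run sketch follows: the `run`
helper only echoes what would be done, and the device name (sdb) and
SCSI host number (host1) are assumed example values, not taken from the
attached script.

```shell
#!/bin/sh
# Dry-run sketch: fail a drive at the SCSI layer the way xfstests'
# device-management helpers do, then bring it back with a host rescan.
# sdb and host1 are assumed example names; the sysfs writes need root.
run() { echo "+ $*"; }

DEV=sdb
run "echo offline > /sys/block/$DEV/device/state"     # stop I/O to the device
run "echo 1 > /sys/block/$DEV/device/delete"          # detach it from the SCSI layer
# ...the filesystem now sees the device as gone; observe autoreplace...
run "echo '- - -' > /sys/class/scsi_host/host1/scan"  # rescan to re-attach
```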
Re: btrfsck: backpointer mismatch (and multiple other errors)
On Mon, 4 Apr 2016 04:34:54 +0000 (UTC), Duncan <1i5t5.dun...@cox.net>
wrote:

> Meanwhile, putting bcache into write-around mode, so it makes no
> further changes to the ssd and only uses it for reads, is probably
> wise, and should help limit further damage. Tho if in that mode
> bcache still does writeback of existing dirty and cached data to the
> backing store, some further damage could occur from that. But I
> don't know enough about bcache to know what its behavior and level of
> available configuration in that regard actually are. As long as it's
> not trying to write anything from the ssd to the backing store, I
> think further damage should be very limited.

bcache has 0 for dirty data most of the time for me - even in
write-back mode. It does write back during idle time and at a reduced
rate; usually that finishes within a few minutes. Switching the cache
to write-around initiates instant write-back of all dirty data, so
within seconds it goes down to zero and the cache becomes detachable.

I'll go test the soon-to-die SSD as soon as it is replaced. I think
it's still far from failing with bitrot. It was overprovisioned by 30%
most of the time, with the spare space trimmed. It certainly should
have a lot of sectors for wear levelling. In addition, smartctl shows
no sector errors at all - except for one: raw_read_error_rate. I'm not
sure what all those sensors tell me, but that one I'm also seeing on
hard disks which show absolutely no data damage. In fact, I see those
counters for my hard disks, too. But dd to /dev/null of the complete
raw hard disk shows no sector errors. It seems good. But well, counting
1+1 together: I currently see data damage. But I guess that's
unrelated.

Is there some documentation somewhere on what each of those sensors
technically means and how to read the raw values and thresh values?
I'm also seeing multi_zone_error_rate on my spinning rust.

According to the smartctl health check and smartctl extended selftest,
there are no problems at all - and the smart error log is empty. There
has never been an ATA error in dmesg... No relocated sectors... From my
naive view the drives still look good.

--
Regards,
Kai

Replies to list-only preferred.
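The bcache behaviour described above is driven through sysfs. A minimal
dry-run sketch of the sequence (check dirty data, switch to
write-around, detach once clean); `bcache0` is an assumed device name
and the `run` helper only echoes the commands:

```shell
#!/bin/sh
# Dry-run sketch of the bcache handling described above. bcache0 is an
# assumed example device; the sysfs writes need root.
run() { echo "+ $*"; }

B=/sys/block/bcache0/bcache
run cat "$B/dirty_data"                 # dirty data still to be written back
run "echo writearound > $B/cache_mode"  # stop caching writes; flushes dirty data
run cat "$B/state"                      # wait until this reports 'clean'
run "echo 1 > $B/detach"                # now safe to detach the cache
```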
Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework
On Fri, Mar 25, 2016 at 09:38:50AM +0800, Qu Wenruo wrote:

> > Please use the newly added BTRFS_PERSISTENT_ITEM_KEY instead of a new
> > key type. As this is the second user of that item, there's no
> > precedent how to select the subtype. Right now 0 is for the dev stats
> > item, but I'd like to leave some space between them, so it should be
> > 256 at best. The space is 64bit so there's enough room, but this also
> > means defining the on-disk format.
>
> After checking BTRFS_PERSISTENT_ITEM_KEY, it seems that its value is
> larger than the current DEDUPE_BYTENR/HASH_ITEM_KEY, and since the
> objectid of DEDUPE_HASH_ITEM_KEY, it won't be the first item of the
> tree.
>
> Although that's not a big problem, for a user using debug-tree it
> would be quite annoying to find it located among tons of other hashes.

You can alternatively store it in the tree_root, but I don't know how
frequently it's supposed to be changed.

> So personally, if using PERSISTENT_ITEM_KEY, at least I prefer to keep
> objectid to 0, and modify DEDUPE_BYTENR/HASH_ITEM_KEY to a higher
> value, to ensure the dedupe status is the first item of the dedupe
> tree.

0 is unfortunately taken by BTRFS_DEV_STATS_OBJECTID, but I don't see a
problem with the ordering. DEDUPE_BYTENR/HASH_ITEM_KEY store a large
number in the objectid: either part of a hash, which is unlikely to be
almost-all zeros, or a bytenr, which will be larger than 1MB.

> >>>> 4) Ioctl interface with persist dedup status
> >>>
> >>> I'd like to see the ioctl specified in more detail. So far there's
> >>> enable, disable and status. I'd expect some way to control the
> >>> in-memory limits, let it "forget" current hash cache, specify the
> >>> dedupe chunk size, maybe sync of the in-memory hash cache to disk.
> >>
> >> So current and planned ioctl should be the following, with some
> >> details related to your in-memory limit control concerns.
> >>
> >> 1) Enable
> >>    Enable dedupe if it's not enabled already. (disabled -> enabled)
> >
> > Ok, so it should also take a parameter which backend is about to be
> > enabled.
>
> It already has.
> It also has limit_nr and limit_mem parameters for the in-memory
> backend.
>
> >> Or change current dedupe setting to another. (re-configure)
> >
> > Doing that in 'enable' sounds confusing, any changes belong to a
> > separate command.
>
> This depends on the point of view.
>
> For the "Enable/config/disable" case, it will introduce a state
> machine for the end-user.

Yes, that's exactly my point.

> Personally, I don't like a state machine for the end user. Yes, I also
> hate merging the play and pause buttons together on a music player.

I don't see how this reference is relevant, we're not designing a music
player.

> If using a state machine, the user must ensure dedupe is enabled
> before doing any configuration.

For user convenience we can copy the configuration options to the dedup
enable subcommand, but it will still do separate enable and configure
ioctl calls.

> For me, the user only needs to care about the result of the operation.
> The user can now configure dedupe to their need without needing to
> know the previous setting. From this point of view, "Enable/Disable"
> is much easier than "Enable/Config/Disable".

Getting the usability right is hard and that's why we're having this
discussion. What suits you does not suit others, we have different
habits, expectations and there are existing usage patterns. We better
stick to something that's not too surprising yet still flexible enough
to cover broad needs.

I'm leaving this open, but I strongly disagree with the current
interface proposal.

> >> For dedupe_bs/backend/hash algorithm (only SHA256 yet) change, it
> >> will disable dedupe (dropping all hashes) and then enable with the
> >> new setting.
> >>
> >> For the in-memory backend, if only the limit is different from the
> >> previous setting, the limit can be changed on the fly without
> >> dropping any hash.
> >
> > This is obviously misplaced in 'enable'.
>
> Then, changing the 'enable' to 'configure' or other proper naming
> would be better.
>
> The point is, the user only needs to care what they want to do, not
> the previous setup.
>
> >> 2) Disable
> >>    Disable will drop all hashes and delete the dedupe tree if it
> >>    exists. Implies a full sync_fs().
> >
> > That is again combining too many things into one. Say I want to
> > disable deduplication and want to enable it later. And not lose the
> > whole state between that. Not to say deleting the dedup tree.
> >
> > IOW, deleting the tree belongs to a separate command, though in the
> > userspace tools it could be done in one command, but we're talking
> > about the kernel ioctls now.
> >
> > I'm not sure if the sync is required, but it's acceptable for a first
> > implementation.
>
> The design is just to reduce complexity.
> If we want to keep the hashes but disable dedupe, it will make dedupe
> only handle extent removal, but ignore any new incoming write.
>
> It will introduce a new state for dedupe, other than current s
Re: [PATCH] delete obsolete function btrfs_print_tree()
On 04/04/16 18:02, Filipe Manana wrote:

> I use this function frequently during development, and there's a good
> reason to use it instead of the user space tool btrfs-debug-tree.

Good to know, that's why I asked. Printing unwritten extents makes
sense.

-h
Re: [PATCH] delete obsolete function btrfs_print_tree()
On Mon, Apr 4, 2016 at 4:54 PM, Holger Hoffstätte wrote:

> On 04/04/16 15:56, David Sterba wrote:
>> On Fri, Mar 25, 2016 at 03:53:17PM +0100, Holger Hoffstätte wrote:
>>> Dan Carpenter's static checker recently found missing IS_ERR
>>> handling in print-tree.c:btrfs_print_tree(). While looking into this
>>> I found that this function is no longer called anywhere and was
>>> moved to btrfs-progs long ago. It can simply be removed.
>>
>> I'm not sure, the function could be used for debugging, and it's hard
>> to
>
> ..but is it? So far nobody has complained.

I will complain. I use this function frequently during development, and
there's a good reason to use it instead of the user space tool
btrfs-debug-tree.

>> say if we'll ever need it. Printing the whole tree to the system log
>> would produce a lot of text so some manual filtering would be
>> required, the function could serve as a template.
>
> The original problem of missing error handling from
> btrfs_read_tree_block() remains as well. I don't remember if that also
> was true for the btrfs-progs counterpart, but in any case I didn't
> really know what to do there. Print an error? Silently ignore the
> stripe? Abort? When I realized that the function was not called
> anywhere, deleting it seemed more effective.
>
> Under what circumstances would the in-kernel function be more
> practical or useful than the userland tool?

The user land tool requires the btree nodes to be on disk. With the in
kernel function we can print nodes that are not yet on disk, very
useful during development. So no, we should not delete it in my
opinion. It's not serious if it doesn't have all the proper error
handling and etc, it's just something for debugging purposes.

> It does the same, won't disturb
> or wedge the kernel further, is up-to-date and can be scripted.
> I agree that in-place filtering (while iterating) would be nice to
> have, but that's also a whole different problem and would IMHO also be
> better suited for userland.
>
> When in doubt cut it out.

When in doubt leave it alone.

> Holger

--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
Re: [PATCH] delete obsolete function btrfs_print_tree()
On 04/04/16 15:56, David Sterba wrote:

> On Fri, Mar 25, 2016 at 03:53:17PM +0100, Holger Hoffstätte wrote:
>> Dan Carpenter's static checker recently found missing IS_ERR handling
>> in print-tree.c:btrfs_print_tree(). While looking into this I found
>> that this function is no longer called anywhere and was moved to
>> btrfs-progs long ago. It can simply be removed.
>
> I'm not sure, the function could be used for debugging, and it's hard
> to

..but is it? So far nobody has complained.

> say if we'll ever need it. Printing the whole tree to the system log
> would produce a lot of text so some manual filtering would be
> required, the function could serve as a template.

The original problem of missing error handling from
btrfs_read_tree_block() remains as well. I don't remember if that also
was true for the btrfs-progs counterpart, but in any case I didn't
really know what to do there. Print an error? Silently ignore the
stripe? Abort? When I realized that the function was not called
anywhere, deleting it seemed more effective.

Under what circumstances would the in-kernel function be more practical
or useful than the userland tool? It does the same, won't disturb or
wedge the kernel further, is up-to-date and can be scripted. I agree
that in-place filtering (while iterating) would be nice to have, but
that's also a whole different problem and would IMHO also be better
suited for userland.

When in doubt cut it out.

Holger
[PULL] Misc fixes for 4.6, part 2
Hi,

please pull the following patches to 4.6. They fix some user visible
problems, improve error handling and there are two debugging
enhancements. Thanks.

The following changes since commit 232cad8413a0bfbd25f11cc19fd13dfd85e1d8ad:

  Merge branch 'misc-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.6 (2016-03-24 17:36:13 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git misc-4.6

for you to fetch changes up to 7ccefb98ce3e5c4493cd213cd03714b7149cf0cb:

  btrfs: Reset IO error counters before start of device replacing (2016-04-04 16:29:22 +0200)

David Sterba (1):
      btrfs: fallback to vmalloc in btrfs_compare_tree

Davide Italiano (1):
      Btrfs: Improve FL_KEEP_SIZE handling in fallocate

Josef Bacik (1):
      Btrfs: don't use src fd for printk

Liu Bo (1):
      Btrfs: fix invalid reference in replace_path

Mark Fasheh (2):
      btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
      btrfs: Add qgroup tracing

Qu Wenruo (1):
      btrfs: Output more info for enospc_debug mount option

Yauhen Kharuzhy (1):
      btrfs: Reset IO error counters before start of device replacing

 fs/btrfs/ctree.c             | 12 --
 fs/btrfs/dev-replace.c       |  2 +
 fs/btrfs/extent-tree.c       | 21 ++-
 fs/btrfs/file.c              |  9 +++--
 fs/btrfs/ioctl.c             |  2 +-
 fs/btrfs/qgroup.c            | 63 ---
 fs/btrfs/relocation.c        |  1 +
 include/trace/events/btrfs.h | 89 +++-
 8 files changed, 166 insertions(+), 33 deletions(-)
Re: [PATCH v4] btrfs: fix typo in btrfs_statfs()
On 04/04/16 15:45, David Sterba wrote:

> On Mon, Apr 04, 2016 at 03:31:22PM +0100, Luis de Bethencourt wrote:
>> Correct a typo in the chunk_mutex name to make it grepable.
>>
>> Since it is better to fix several typos at once, fixing the 2 more in
>> the same file.
>>
>> Signed-off-by: Luis de Bethencourt
>
> Now the subject does not match the patch contents, but I can fix that
> so you don't have to resend it again.

Sorry David. That was a poor decision on my part: I kept the subject
because I considered the typo in btrfs_statfs() the core fix and the
other two appended corrections. Thank you for fixing it. I understand
what you mean.

On an unrelated note, do you think this bug would be a good one for me
to tackle?
https://bugzilla.kernel.org/show_bug.cgi?id=115851

Thanks,
Luis
Re: [PATCH v4] btrfs: fix typo in btrfs_statfs()
On Mon, Apr 04, 2016 at 03:31:22PM +0100, Luis de Bethencourt wrote:

> Correct a typo in the chunk_mutex name to make it grepable.
>
> Since it is better to fix several typos at once, fixing the 2 more in
> the same file.
>
> Signed-off-by: Luis de Bethencourt

Now the subject does not match the patch contents, but I can fix that
so you don't have to resend it again.
[PATCH v4] btrfs: fix typo in btrfs_statfs()
Correct a typo in the chunk_mutex name to make it grepable.

Since it is better to fix several typos at once, fixing the 2 more in
the same file.

Signed-off-by: Luis de Bethencourt
---

Hi,

Sorry for sending again. Previous version had a line over 80
characters.

Explanation from previous patch:

David recommended I look around the rest of the file for other typos to
fix. These two more are all I see in the rest of the file without
nitpicking.

Thanks and apologies for sending v3 without thoroughly checking,
Luis

 fs/btrfs/super.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7e766ffc..bc060cf 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1484,10 +1484,10 @@ static int setup_security_options(struct btrfs_fs_info *fs_info,
 		memcpy(&fs_info->security_opts, sec_opts, sizeof(*sec_opts));
 	} else {
 		/*
-		 * Since SELinux(the only one supports security_mnt_opts) does
-		 * NOT support changing context during remount/mount same sb,
-		 * This must be the same or part of the same security options,
-		 * just free it.
+		 * Since SELinux (the only one supporting security_mnt_opts)
+		 * does NOT support changing context during remount/mount of
+		 * the same sb, this must be the same or part of the same
+		 * security options, just free it.
 		 */
 		security_free_mnt_opts(sec_opts);
 	}
@@ -1665,8 +1665,8 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 					 unsigned long old_opts)
 {
 	/*
-	 * We need cleanup all defragable inodes if the autodefragment is
-	 * close or the fs is R/O.
+	 * We need to cleanup all defragable inodes if the autodefragment is
+	 * close or the filesystem is read only.
 	 */
 	if (btrfs_raw_test_opt(old_opts, AUTO_DEFRAG) &&
 	    (!btrfs_raw_test_opt(fs_info->mount_opt, AUTO_DEFRAG) ||
@@ -2050,7 +2050,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	int mixed = 0;
 
 	/*
-	 * holding chunk_muext to avoid allocating new chunks, holding
+	 * holding chunk_mutex to avoid allocating new chunks, holding
 	 * device_list_mutex to avoid the device being removed
 	 */
 	rcu_read_lock();
-- 
2.6.4
[PATCH v3] btrfs: fix typo in btrfs_statfs()
Correct a typo in the chunk_mutex name to make it grepable.

Since it is better to fix several typos at once, fixing the 2 more in
the same file.

Signed-off-by: Luis de Bethencourt
---

Hi,

David recommended I look around the rest of the file for other typos to
fix. These two more are all I see in the rest of the file without
nitpicking.

Thanks,
Luis

 fs/btrfs/super.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7e766ffc..73bdfd4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1484,9 +1484,9 @@ static int setup_security_options(struct btrfs_fs_info *fs_info,
 		memcpy(&fs_info->security_opts, sec_opts, sizeof(*sec_opts));
 	} else {
 		/*
-		 * Since SELinux(the only one supports security_mnt_opts) does
-		 * NOT support changing context during remount/mount same sb,
-		 * This must be the same or part of the same security options,
+		 * Since SELinux (the only one supporting security_mnt_opts) does
+		 * NOT support changing context during remount/mount of the same sb,
+		 * this must be the same or part of the same security options,
 		 * just free it.
 		 */
 		security_free_mnt_opts(sec_opts);
@@ -1665,8 +1665,8 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 					 unsigned long old_opts)
 {
 	/*
-	 * We need cleanup all defragable inodes if the autodefragment is
-	 * close or the fs is R/O.
+	 * We need to cleanup all defragable inodes if the autodefragment is
+	 * close or the filesystem is read only.
 	 */
 	if (btrfs_raw_test_opt(old_opts, AUTO_DEFRAG) &&
 	    (!btrfs_raw_test_opt(fs_info->mount_opt, AUTO_DEFRAG) ||
@@ -2050,7 +2050,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	int mixed = 0;
 
 	/*
-	 * holding chunk_muext to avoid allocating new chunks, holding
+	 * holding chunk_mutex to avoid allocating new chunks, holding
 	 * device_list_mutex to avoid the device being removed
 	 */
 	rcu_read_lock();
-- 
2.6.4
Re: [PATCH] delete obsolete function btrfs_print_tree()
On Fri, Mar 25, 2016 at 03:53:17PM +0100, Holger Hoffstätte wrote:

> Dan Carpenter's static checker recently found missing IS_ERR handling
> in print-tree.c:btrfs_print_tree(). While looking into this I found
> that this function is no longer called anywhere and was moved to
> btrfs-progs long ago. It can simply be removed.

I'm not sure, the function could be used for debugging, and it's hard
to say if we'll ever need it. Printing the whole tree to the system log
would produce a lot of text so some manual filtering would be required,
the function could serve as a template.

The function is not that big that it would save bytes, but putting it
under the debug config would help a bit.
Re: [PATCH v3 01/22] btrfs-progs: convert: Introduce functions to read used space
On Fri, Jan 29, 2016 at 01:03:11PM +0800, Qu Wenruo wrote:

> Before we do real convert, we need to read and build up used space
> cache tree for later data/meta separate chunk layout.
>
> This patch will iterate all used blocks in ext2 filesystem and record
> it into cctx->used cache tree, for later use.
>
> This provides the very basic of later btrfs-convert rework.
>
> Signed-off-by: Qu Wenruo
> Signed-off-by: David Sterba
> ---
>  btrfs-convert.c | 80 +
>  1 file changed, 80 insertions(+)
>
> diff --git a/btrfs-convert.c b/btrfs-convert.c
> index 4baa68e..65841bd 100644
> --- a/btrfs-convert.c
> +++ b/btrfs-convert.c
> @@ -81,6 +81,7 @@ struct btrfs_convert_context;
>  struct btrfs_convert_operations {
>  	const char *name;
>  	int (*open_fs)(struct btrfs_convert_context *cctx, const char *devname);
> +	int (*read_used_space)(struct btrfs_convert_context *cctx);
>  	int (*alloc_block)(struct btrfs_convert_context *cctx, u64 goal,
>  			   u64 *block_ret);
>  	int (*alloc_block_range)(struct btrfs_convert_context *cctx, u64 goal,
> @@ -230,6 +231,73 @@ fail:
>  	return -1;
>  }
>
> +static int __ext2_add_one_block(ext2_filsys fs, char *bitmap,
> +				unsigned long group_nr, struct cache_tree *used)
> +{
> +	unsigned long offset;
> +	unsigned i;
> +	int ret = 0;
> +
> +	offset = fs->super->s_first_data_block;
> +	offset /= EXT2FS_CLUSTER_RATIO(fs);

This macro does not exist on my reference host for old distros. The
e2fsprogs version is 1.41.14 and I'd like to keep the compatibility at
least at that level. The clustering has been added in 1.42 but can we
add some compatibility layer that will work on both versions?
Re: [PATCH] btrfs: fix typo in btrfs_statfs()
On Mon, Apr 04, 2016 at 11:13:57AM +0100, Luis de Bethencourt wrote:

> Correct a typo in the chunk_mutex name.
>
> Signed-off-by: Luis de Bethencourt
> ---
>
> Hi,
>
> I noticed this typo while fixing bug 114281 [0]. If this type of fixes
> are not welcomed I could squash it into the patch for that bug.

> -	 * holding chunk_muext to avoid allocating new chunks, holding
> +	 * holding chunk_mutex to avoid allocating new chunks, holding

In this case it's a name of a mutex, so this makes sense eg. when one
is grepping for it. I'm not against fixing typos in comments in
general, it's usually better to fix several at once, eg. per file, see
bb7ab3b92e46da06b580c6f83abe7894dc449cca .

If you find more, then please send an updated patch, I'll queue this
one into cleanups and can replace it later.
Re: [PATCH] btrfs-progs: fsck: Fix a false metadata extent warning
On Fri, Apr 01, 2016 at 04:50:06PM +0800, Qu Wenruo wrote:

> > After another look, why don't we use nodesize directly? Or stripesize
> > where it applies. With max_size == 0 the test does not make sense, we
> > ought to know the alignment.
>
> Yes, my first thought is also to use nodesize directly, which should
> be always correct.
>
> But the problem is, the related function call stack doesn't have any
> member to reach btrfs_root or btrfs_fs_info.

JFYI, there's global_info available, so it's not necessary to pass
fs_info down the callstacks.
[PATCH v2] btrfs: fix typo in btrfs_statfs()
Correct a typo in the chunk_mutex name.

Signed-off-by: Luis de Bethencourt
---

Hi,

I noticed this typo while fixing bug 114281 [0]. Sending a second
version because the first one didn't amend cleanly after the latest
changes in the 'for-next' branch.

Thanks,
Luis

[0] https://bugzilla.kernel.org/show_bug.cgi?id=114281

 fs/btrfs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7e766ffc..9c79337 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2050,7 +2050,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	int mixed = 0;
 
 	/*
-	 * holding chunk_muext to avoid allocating new chunks, holding
+	 * holding chunk_mutex to avoid allocating new chunks, holding
 	 * device_list_mutex to avoid the device being removed
 	 */
 	rcu_read_lock();
-- 
2.6.4
[PATCH] btrfs: fix typo in btrfs_statfs()
Correct a typo in the chunk_mutex name.

Signed-off-by: Luis de Bethencourt
---

Hi,

I noticed this typo while fixing bug 114281 [0]. If this type of fixes
are not welcomed I could squash it into the patch for that bug.

Thanks,
Luis

[0] https://bugzilla.kernel.org/show_bug.cgi?id=114281

 fs/btrfs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index a8e049a..86364b7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2028,7 +2028,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	u64 thresh = 0;
 
 	/*
-	 * holding chunk_muext to avoid allocating new chunks, holding
+	 * holding chunk_mutex to avoid allocating new chunks, holding
 	 * device_list_mutex to avoid the device being removed
 	 */
 	rcu_read_lock();
-- 
2.6.4
Re: csum failed on innexistent inode
On Mon, Apr 4, 2016 at 9:50 AM, Jérôme Poulin wrote: > Hi all, > > I have a BTRFS on disks running in RAID10 meta+data, one of the disk > has been going bad and scrub was showing 18 uncorrectable errors > (which is weird in RAID10). I tried using --repair-sector with hdparm > even if it shouldn't be necessary since BTRFS would overwrite the > sector. Repair sector fixed the sector in SMART but BTRFS was still > showing 18 uncorr. errors. > > I finally decided to give up this opportunity to test the error > correction property of BTRFS (this is a home system, backed up) and > installed a brand new disk in the machine. After running btrfs > replace, everything was fine, I decided to run btrfs scrub again and I > still have the same 18 uncorrectable errors. You might want this patch: http://www.spinics.net/lists/linux-btrfs/msg53552.html As workaround, you can reset the counters on new/healty device with: btrfs device stats [-z] | > Later on, since I had a new disk with more space, I decided to run a > balance to free up the new space but the balance has stopped with csum > errors too. Here are the output of multiple programs. > > How is it possible to get rid of the referenced csum errors if they do > not exist? Also, the expected checksum looks suspiciously the same for > multiple errors. Could it be bad RAM in that case? Can I convince > BTRFS to update the csum? 
> > # btrfs inspect-internal logical-resolve -v 1809149952 /mnt/btrfs/ > ioctl ret=-1, error: No such file or directory > # btrfs inspect-internal inode-resolve -v 296 /mnt/btrfs/ > ioctl ret=-1, error: No such file or directory > > > dmesg after first bad sector: > avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read > error corrected: ino 1 off 655368716288 (dev /dev/dm-42 sector > 2939136) > avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read > error corrected: ino 1 off 655368720384 (dev /dev/dm-42 sector > 2939144) > avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read > error corrected: ino 1 off 655368724480 (dev /dev/dm-42 sector > 2939152) > avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read > error corrected: ino 1 off 655368728576 (dev /dev/dm-42 sector > 2939160) > > dmesg after balance: > [1738474.444648] BTRFS warning (device dm-40): csum failed ino 296 off > 1809195008 csum 1515428513 expected csum 2566472073 > [1738474.444649] BTRFS warning (device dm-40): csum failed ino 296 off > 1809084416 csum 4147641019 expected csum 1755301217 > [1738474.444702] BTRFS warning (device dm-40): csum failed ino 296 off > 1809199104 csum 1927504681 expected csum 2566472073 > [1738474.444717] BTRFS warning (device dm-40): csum failed ino 296 off > 1809211392 csum 3086571080 expected csum 2566472073 > [1738474.444917] BTRFS warning (device dm-40): csum failed ino 296 off > 1809084416 csum 4147641019 expected csum 1755301217 > [1738474.444962] BTRFS warning (device dm-40): csum failed ino 296 off > 1809195008 csum 1515428513 expected csum 2566472073 > [1738474.444998] BTRFS warning (device dm-40): csum failed ino 296 off > 1809199104 csum 1927504681 expected csum 2566472073 > [1738474.445034] BTRFS warning (device dm-40): csum failed ino 296 off > 1809211392 csum 3086571080 expected csum 2566472073 > [1738474.473286] BTRFS warning (device dm-40): csum failed ino 296 off > 1809149952 csum 
3254083717 expected csum 2566472073 > [1738474.473357] BTRFS warning (device dm-40): csum failed ino 296 off > 1809162240 csum 3157020538 expected csum 2566472073 > > btrfs check: > ./btrfs check /dev/mapper/luksbtrfsdata2 > Checking filesystem on /dev/mapper/luksbtrfsdata2 > UUID: 805f6ad7-1188-448d-aee4-8ddeeb70c8a7 > checking extents > bad metadata [1453741768704, 1453741785088) crossing stripe boundary > bad metadata [1454487764992, 1454487781376) crossing stripe boundary > bad metadata [1454828552192, 1454828568576) crossing stripe boundary > bad metadata [1454879735808, 1454879752192) crossing stripe boundary > bad metadata [1455087222784, 1455087239168) crossing stripe boundary > bad metadata [1456269426688, 1456269443072) crossing stripe boundary > bad metadata [1456273227776, 1456273244160) crossing stripe boundary > bad metadata [1456404234240, 1456404250624) crossing stripe boundary > bad metadata [1456418914304, 1456418930688) crossing stripe boundary Those are false alerts; this patch handles them: https://patchwork.kernel.org/patch/8706891/ > checking free space cache > checking fs roots > checking csums > checking root refs > found 689292505473 bytes used err is 0 > total csum bytes: 660112536 > total tree bytes: 1764098048 > total fs tree bytes: 961921024 > total extent tree bytes: 79331328 > btree space waste bytes: 232774315 > file data blocks allocated: 4148513517568 > referenced 972284129280 > > btrfs scrub: > I don't have the output handy but the dmesg output was pairs of > logical blocks, like during the balance, and no errors were corrected.
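The counter-reset workaround mentioned above can be sketched as follows. This is a hedged example, not the poster's exact invocation: /mnt/btrfs is the mount point from the post, and the guard around the call is mine so that the snippet degrades to printing the usage line on machines without a mounted btrfs there.

```shell
# Hedged sketch: `btrfs device stats` prints per-device error counters;
# with -z it also resets them to zero, which is what clears a stale
# "uncorrectable errors" tally after a successful `btrfs replace` and a
# clean scrub. /mnt/btrfs is the mount point from the post; adjust it.
show_and_reset() {
    btrfs device stats "$1"      # show accumulated write/read/flush/corruption counters
    btrfs device stats -z "$1"   # print and zero them (root required)
}

# Only run against a real mounted btrfs; otherwise just show the usage line.
if command -v btrfs >/dev/null 2>&1 && mountpoint -q /mnt/btrfs 2>/dev/null; then
    show_and_reset /mnt/btrfs
else
    echo "usage: btrfs device stats [-z] <path>|<device>"
fi
```

Note the counters are purely informational tallies; zeroing them does not touch any on-disk data or csums.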
csum failed on nonexistent inode
Hi all, I have a BTRFS on disks running in RAID10 meta+data, one of the disks has been going bad and scrub was showing 18 uncorrectable errors (which is weird in RAID10). I tried using --repair-sector with hdparm even though it shouldn't be necessary since BTRFS would overwrite the sector. Repair sector fixed the sector in SMART but BTRFS was still showing 18 uncorr. errors. I finally decided to give up this opportunity to test the error correction property of BTRFS (this is a home system, backed up) and installed a brand new disk in the machine. After running btrfs replace, everything was fine. I decided to run btrfs scrub again and I still had the same 18 uncorrectable errors. Later on, since I had a new disk with more space, I decided to run a balance to free up the new space but the balance stopped with csum errors too. Here are the outputs of multiple programs. How is it possible to get rid of the referenced csum errors if they do not exist? Also, the expected checksum looks suspiciously the same for multiple errors. Could it be bad RAM in that case? Can I convince BTRFS to update the csum? 
# btrfs inspect-internal logical-resolve -v 1809149952 /mnt/btrfs/ ioctl ret=-1, error: No such file or directory # btrfs inspect-internal inode-resolve -v 296 /mnt/btrfs/ ioctl ret=-1, error: No such file or directory dmesg after first bad sector: avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read error corrected: ino 1 off 655368716288 (dev /dev/dm-42 sector 2939136) avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read error corrected: ino 1 off 655368720384 (dev /dev/dm-42 sector 2939144) avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read error corrected: ino 1 off 655368724480 (dev /dev/dm-42 sector 2939152) avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read error corrected: ino 1 off 655368728576 (dev /dev/dm-42 sector 2939160) dmesg after balance: [1738474.444648] BTRFS warning (device dm-40): csum failed ino 296 off 1809195008 csum 1515428513 expected csum 2566472073 [1738474.444649] BTRFS warning (device dm-40): csum failed ino 296 off 1809084416 csum 4147641019 expected csum 1755301217 [1738474.444702] BTRFS warning (device dm-40): csum failed ino 296 off 1809199104 csum 1927504681 expected csum 2566472073 [1738474.444717] BTRFS warning (device dm-40): csum failed ino 296 off 1809211392 csum 3086571080 expected csum 2566472073 [1738474.444917] BTRFS warning (device dm-40): csum failed ino 296 off 1809084416 csum 4147641019 expected csum 1755301217 [1738474.444962] BTRFS warning (device dm-40): csum failed ino 296 off 1809195008 csum 1515428513 expected csum 2566472073 [1738474.444998] BTRFS warning (device dm-40): csum failed ino 296 off 1809199104 csum 1927504681 expected csum 2566472073 [1738474.445034] BTRFS warning (device dm-40): csum failed ino 296 off 1809211392 csum 3086571080 expected csum 2566472073 [1738474.473286] BTRFS warning (device dm-40): csum failed ino 296 off 1809149952 csum 3254083717 expected csum 2566472073 [1738474.473357] BTRFS warning (device dm-40): 
csum failed ino 296 off 1809162240 csum 3157020538 expected csum 2566472073 btrfs check: ./btrfs check /dev/mapper/luksbtrfsdata2 Checking filesystem on /dev/mapper/luksbtrfsdata2 UUID: 805f6ad7-1188-448d-aee4-8ddeeb70c8a7 checking extents bad metadata [1453741768704, 1453741785088) crossing stripe boundary bad metadata [1454487764992, 1454487781376) crossing stripe boundary bad metadata [1454828552192, 1454828568576) crossing stripe boundary bad metadata [1454879735808, 1454879752192) crossing stripe boundary bad metadata [1455087222784, 1455087239168) crossing stripe boundary bad metadata [1456269426688, 1456269443072) crossing stripe boundary bad metadata [1456273227776, 1456273244160) crossing stripe boundary bad metadata [1456404234240, 1456404250624) crossing stripe boundary bad metadata [1456418914304, 1456418930688) crossing stripe boundary checking free space cache checking fs roots checking csums checking root refs found 689292505473 bytes used err is 0 total csum bytes: 660112536 total tree bytes: 1764098048 total fs tree bytes: 961921024 total extent tree bytes: 79331328 btree space waste bytes: 232774315 file data blocks allocated: 4148513517568 referenced 972284129280 btrfs scrub: I don't have the output handy but the dmesg output was pairs of logical blocks, like during the balance, and no errors were corrected. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
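The hunch that "the expected checksum looks suspiciously the same for multiple errors" can be checked mechanically. A hedged sketch follows: the heredoc holds sample warning lines copied from the post, and the /tmp/balance-warnings.log file name is illustrative; in practice you would feed it real dmesg output. Many distinct offsets sharing one expected value points at a single damaged csum tree item (or RAM corruption at the moment the csum was computed) rather than scattered media errors.

```shell
# Tally how often each "expected csum" value appears in the balance warnings.
# Sample lines are copied from the post; replace the heredoc with real dmesg
# output. The log file path is a placeholder.
cat > /tmp/balance-warnings.log <<'EOF'
BTRFS warning (device dm-40): csum failed ino 296 off 1809195008 csum 1515428513 expected csum 2566472073
BTRFS warning (device dm-40): csum failed ino 296 off 1809084416 csum 4147641019 expected csum 1755301217
BTRFS warning (device dm-40): csum failed ino 296 off 1809199104 csum 1927504681 expected csum 2566472073
BTRFS warning (device dm-40): csum failed ino 296 off 1809211392 csum 3086571080 expected csum 2566472073
EOF
grep -o 'expected csum [0-9]*' /tmp/balance-warnings.log \
  | awk '{n[$3]++} END {for (c in n) print n[c], c}' \
  | sort -rn
# prints: 3 2566472073
#         1 1755301217
```

With the full log from the post, 2566472073 dominates, which is consistent with one corrupted csum item covering several adjacent 4 KiB blocks.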