Re: shall distros run btrfsck on boot?
Am Mittwoch, 25. November 2015, 07:32:34 CET schrieb Austin S Hemmelgarn:
> On 2015-11-24 17:26, Eric Sandeen wrote:
> > On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:
> >> if the system was shut down cleanly, you're fine barring software
> >> bugs, but if it crashed, you should be running a check on the FS.
> >
> > Um, no...
> >
> > The *entire point* of having a journaling filesystem is that after a
> > crash or power loss, a journal replay on next mount will bring the
> > metadata into a consistent state.
>
> OK, first, that was in reference to BTRFS, not ext4, and BTRFS is a COW
> filesystem, not a journaling one, which is an important distinction as
> mentioned by Hugo in his reply. Second, there are two reasons that you
> should be running a check even of a journaled filesystem when the system
> crashes (this also applies to COW filesystems, and anything else that
> relies on atomicity of write operations for consistency):
>
> 1. Disks don't atomically write anything bigger than a sector, and may
> not even atomically write the sector itself. This means that it's
> possible to get a partial write to the journal, which in turn has
> significant potential to put the metadata in an inconsistent state when
> the journal gets replayed (IIRC, ext4 has a journal_checksum mount
> option that is supposed to mitigate this possibility). This sounds like
> something that shouldn't happen all that often, but on a busy
> filesystem, the probability is roughly proportional to the size of the
> journal relative to the size of the FS.
>
> 2. If the system crashed, all code running on it immediately before the
> crash is instantly suspect, and you have no way to know for certain that
> something didn't cause random garbage to be written to the disk. On top
> of this, hardware is potentially suspect, and when your hardware is
> misbehaving, then all bets as to consistency are immediately off.

In the case of shaky hardware, a fsck run can report bogus data, i.e.
problems where there are none, or vice versa. If I suspect defective
memory or a controller, I would only check the device on different
hardware, especially before attempting to repair any possible issues.

--
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
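Austin's first point, a torn (partial) journal write slipping through replay, is what per-record checksums such as ext4's journal_checksum option guard against. A minimal sketch of the idea, using an invented record framing (this is illustrative only, not any real journal format):

```python
import struct
import zlib

def journal_record(payload: bytes) -> bytes:
    """Frame a journal record as [length][crc32][payload]."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def replay(journal: bytes):
    """Replay records, stopping at the first torn or corrupt one."""
    ok, off = [], 0
    while off + 8 <= len(journal):
        length, crc = struct.unpack_from("<II", journal, off)
        payload = journal[off + 8:off + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn write detected: discard this and all later records
        ok.append(payload)
        off += 8 + length
    return ok

journal = journal_record(b"set inode 5 nbytes=4096") + journal_record(b"link inode 5")
torn = journal[:-4]  # power loss mid-sector: last record only partially written
print(len(replay(journal)), len(replay(torn)))  # 2 1
```

Without the checksum, replay would have applied the truncated second record as-is, which is exactly the inconsistent-metadata scenario described above.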
Re: btrfs check help
I should probably point out that there is 64GB of RAM in this machine and it's a dual Xeon (LGA2011-3) system. Also, there is only Btrfs served via Samba; the kernel panic was caused by Btrfs (as per what I remember from the log on the screen just before I rebooted) and happened in the middle of the night, when zero (0) clients were connected.

You will find below the full "btrfs check" log for each device, in the order listed by "btrfs fi show". Can I get a strong confirmation that I should run with the "--repair" option on each device? Thanks.

Vincent

Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdp
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdi
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdq
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdh
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdm
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdj
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdo
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
checking fs roots [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdg
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes:
Re: [4.3-rc4] scrubbing aborts before finishing
Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald:
> Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald:
> > I get this:
> >
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/mapper/sata-debian (id 1) history
> >     scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 00:00:00
> >     total bytes scrubbed: 0.00B with 0 errors
> > scrub device /dev/dm-2 (id 2) history
> >     scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 00:01:30
> >     total bytes scrubbed: 23.81GiB with 0 errors
> >
> > For / scrub aborts for the SATA SSD immediately.
> >
> > For /home scrub aborts for both SSDs after some time.
> >
> > merkaba:~> btrfs scrub status -d /home
> > scrub status for […]
> > scrub device /dev/mapper/msata-home (id 1) history
> >     scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 00:01:31
> >     total bytes scrubbed: 22.03GiB with 0 errors
> > scrub device /dev/dm-3 (id 2) history
> >     scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 00:03:34
> >     total bytes scrubbed: 53.30GiB with 0 errors
> >
> > Also single-volume BTRFS is affected:
> >
> > merkaba:~> btrfs scrub status /daten
> > scrub status for […]
> >     scrub started at Thu Oct 22 10:36:38 2015 and was aborted after 00:00:00
> >     total bytes scrubbed: 0.00B with 0 errors
> >
> > No errors in dmesg, btrfs device stat or smartctl -a.
> >
> > Any known issue?
>
> I am still seeing this in 4.3-rc7. It happens that on one SSD BTRFS
> doesn't even start scrubbing. But in the end it aborts scrubbing anyway.
>
> I do not see any other issue so far. But I would really like to be able
> to scrub my BTRFS filesystems completely again. Any hints? Any further
> information needed?
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
>     scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
>     total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
>     scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20
>     total bytes scrubbed: 5.27GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
>     scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
>     total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
>     scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25
>     total bytes scrubbed: 6.59GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
>     scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
>     total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
>     scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
>     total bytes scrubbed: 21.97GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
>     scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:00:00
>     total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) history
>     scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:01:32
>     total bytes scrubbed: 23.63GiB with 0 errors
>
> For the sake of it I am going to btrfs check one of the filesystems where
> BTRFS aborts scrubbing (which is all of the laptop filesystems, not only
> the RAID 1 one).
>
> I will use the /daten filesystem as I can unmount it during laptop
> runtime easily.
There scrubbing aborts immediately:

> merkaba:~> btrfs scrub start /daten
> scrub started on /daten, fsid […] (pid=13861)
> merkaba:~> btrfs scrub status /daten
> scrub status for […]
>     scrub started at Sat Oct 31 12:04:25 2015 and was aborted after 00:00:00
>     total bytes scrubbed: 0.00B with 0 errors
>
> It is a single device:
>
> merkaba:~> btrfs fi sh /daten
> Label: 'daten'  uuid: […]
>     Total devices 1 FS bytes used 227.23GiB
>     devid 1 size 230.00GiB used 230.00GiB path /dev/mapper/msata-daten
>
> btrfs-progs v4.2.2
> merkaba:~> btrfs fi df /daten
> Data, single: total=228.99GiB, used=226.79GiB
> System, single: total=4.00MiB, used=48.00KiB
> Metadata, single: total=1.01GiB, used=449.50MiB
> GlobalReserve, single: total=160.00MiB, used=0.00B
>
> I do not see any output in btrfs check that points to any issue:
>
> merkaba:~> btrfs check /dev/msata/daten
> Checking filesystem on /dev/msata/daten
> UUID: 7918274f-e2ec-4983-bbb0-aa93ef95fcf7
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 243936530607 bytes used err is 0
> total
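The pattern above, polling `btrfs scrub status` and watching for runs that abort after 00:00:00, is easy to automate. A rough sketch; the regex assumes the 2015-era progs output format quoted above, and the helper name is mine:

```python
import re

def scrub_summary(status: str):
    """Extract (device, state, duration, scrubbed) tuples from
    'btrfs scrub status -d' style output."""
    pat = re.compile(
        r"scrub device (\S+) \(id \d+\) (history|status)\s*"
        r"scrub started at .*?(?:running for|was aborted after)\s*(\d+:\d+:\d+)\s*"
        r"total bytes scrubbed: (\S+)", re.S)
    return pat.findall(status)

sample = """\
scrub device /dev/dm-5 (id 1) history
    scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:00:00
    total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/mapper/msata-debian (id 2) status
    scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
    total bytes scrubbed: 21.97GiB with 0 errors
"""
for dev, state, duration, scrubbed in scrub_summary(sample):
    # flag devices where scrub never got going at all
    print(dev, state, duration, scrubbed, "SUSPECT" if duration == "00:00:00" else "")
```

Feeding the real `btrfs scrub status -d /` output through this makes the "aborted after 00:00:00" devices stand out immediately.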
Re: [auto-]defrag, nodatacow - general suggestions? (was: btrfs: poor performance on deleting many large files?)
On Thu, Nov 26, 2015 at 01:23:59AM +0100, Christoph Anton Mitterer wrote:
> 2) Why does nodatacow imply nodatasum and can that ever be decoupled?

Answering the second part first: no, it can't. The issue is that nodatacow
bypasses the transactional nature of the FS, making changes to live data
immediately. This then means that if you modify a nodatacow file, the csum
for that modified section is out of date, and won't be back in sync again
until the latest transaction is committed. So you can end up with an
inconsistent filesystem if there's a crash between the two events.

> For me the checksumming is actually the most important part of btrfs
> (not that I wouldn't like its other features as well)... so turning it
> off is something I really would want to avoid.
>
> Plus it opens questions like: When there are no checksums, how can it
> (in the RAID cases) decide which block is the good one in case of
> corruptions?

It doesn't decide -- both copies look equally good, because there's no
checksum, so if you read the data, the FS will return whatever data was
on the copy it happened to pick.

> 3) When I would actually disable datacow for e.g. a subvolume that
> holds VMs or DBs... what are all the implications?
> Obviously no checksumming, but what happens if I snapshot such a
> subvolume or if I send/receive it?

After snapshotting, modifications are CoWed precisely once, and then it
reverts to nodatacow again. This means that making a snapshot of a
nodatacow object will cause it to fragment as writes are made to it.

> I'd expect that then some kind of CoW needs to take place or does that
> simply not work?
>
> 4) Duncan mentioned that defrag (and I guess that's also for auto-
> defrag) isn't ref-link aware...
> Isn't that somehow a complete showstopper?

It is, but the one attempt at dealing with it caused massive data
corruption, and it was turned off again.
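Hugo's point about why nodatacow forces nodatasum can be sketched in a few lines: with an in-place (nodatacow-style) write, the stored checksum goes stale until the next commit, so a crash in between is indistinguishable from corruption. This is a toy model only; the names are illustrative, not btrfs internals:

```python
import zlib

# Simulated metadata: one stored checksum per 4KiB block, updated only
# at transaction commit.
block = bytearray(b"A" * 4096)
stored_csum = zlib.crc32(block)          # committed with the last transaction

block[0:4] = b"B" * 4                    # nodatacow: data changed in place

# A CoW write would make the new block and its new csum visible atomically
# at commit. Here the data no longer matches the committed csum, so a crash
# at this point would look exactly like on-disk corruption.
print(zlib.crc32(block) != stored_csum)  # True
```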
autodefrag, however, has always been snapshot aware and snapshot safe,
and would be the recommended approach here. (Actually, it was broken in
the same incident I just described -- but fixed again when the broken
patches were reverted.)

> As soon as one uses snapshots, and would defrag or autodefrag any of
> them, space usage would just explode, perhaps to the extent of ENOSPC,
> rendering the fs effectively useless.
>
> That sounds to me like either I can't use ref-links, which are crucial
> not only to snapshots but to every file I copy with cp --reflink=auto
> ... or I can't defrag... which however will sooner or later cause quite
> some fragmentation issues on btrfs?
>
> 5) Especially keeping (4) in mind but also the other comments from
> Duncan and Austin...
> Is auto-defrag now recommended to be generally used?

Absolutely, yes.

It's late for me, and this email was longer than I expected, so I'm going
to stop here, but I'll try to pick it up again and answer your other
questions tomorrow.

Hugo.

> Are both auto-defrag and defrag considered stable to be used? Or are
> there other implications, like when I use compression?
>
> 6) Does defragmentation work with compression? Or is it just filefrag
> which can't cope with it?
>
> Any other combinations or things with the typical btrfs technologies
> (cow/nodatacow, compression, snapshots, subvols, defrag, balance) that
> one can do but which lead to unexpected problems (I, for example,
> wouldn't have expected that defragmentation isn't ref-link aware...
> still kinda shocked ;) )
>
> For example, when I do a balance and change the compression, and I have
> multiple snapshots or files within one subvol that share their blocks...
> would that also lead to copies being made and the space growing
> possibly dramatically?
>
> 7) How does free-space defragmentation happen (or is there even such a
> thing)?
> For example, when I have my big qemu images, *not* using nodatacow, and
> I copy the image e.g. with qemu-img old.img new.img ... and then delete
> the old one.
> Then I'd expect that the new.img is more or less not fragmented... but
> will my free space (from the removed old.img) still be completely
> messed up, sooner or later driving me into problems?
>
> 8) Why does a balance not also defragment? Since everything is copied
> anyway... why not defragment it?
> I somehow would have hoped that a balance cleans up all kinds of
> things... like free space issues and also fragmentation.
>
> Given all these issues... fragmentation, situations in which space may
> grow dramatically where the end-user/admin may not necessarily expect
> it (e.g. the defrag or the balance+compression case?)... btrfs seems to
> require much more in-depth knowledge and especially care (that even
> depends on the type of data) on the end-user/admin side than the
> traditional filesystems.
> Are there for example any general recommendations what to do regularly
> to keep the fs in a clean and proper shape (and I don't count "start
> with a fresh one and
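The ref-link concern in question 4 is easy to model: if defrag rewrites a file's extents without being reflink-aware, every snapshot sharing those extents keeps the old copies, and space usage grows by one full copy per rewritten file. A toy extent model (illustrative accounting only, not btrfs's):

```python
# Each file maps to a list of extent ids; space used is the number of
# distinct extents (assume equal extent sizes for simplicity).
def space_used(files):
    return len({e for extents in files.values() for e in extents})

# A subvolume and its snapshot fully share four extents.
files = {"subvol/img": [1, 2, 3, 4], "snap/img": [1, 2, 3, 4]}
before = space_used(files)          # 4 extents

# Reflink-unaware defrag: rewrite subvol/img into new contiguous extents.
# The snapshot still pins the old ones, so nothing is freed.
files["subvol/img"] = [5, 6, 7, 8]
after = space_used(files)           # 8 extents -- usage doubled

print(before, after)
```

With many snapshots, each defragged file can multiply like this, which is exactly the ENOSPC explosion the question worries about, and why the snapshot-aware autodefrag path matters.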
Re: [PATCH 00/25] Btrfs-convert rework to support native separate
David Sterba wrote on 2015/11/25 13:42 +0100:

On Tue, Nov 24, 2015 at 04:50:00PM +0800, Qu Wenruo wrote:

It seems the conflict is quite huge; your reiserfs support is based on the old behavior, just like what the old ext2 one does: custom extent allocation. I'm afraid the rebase will take a lot of time since I'm completely a newbie about reiserfs... :(

Yeah, the ext2 callbacks are abstracted and replaced by reiserfs implementations, and the abstraction is quite direct. This might be a problem with merging your patchset.

The abstraction is better than I expected, and should be quite handy to use, although a lot of my code will be changed to use it. I may need to change a lot of direct ext2 calls to generic ones, and may even change the generic function calls (no alloc/free, only free space lookup). And some (maybe a lot) of the reiserfs code may be removed during the rework.

As far as the conversion support stays, it's not a problem of course. I don't have a complete picture of all the actual merge conflicts, but the idea is to provide the callback abstraction v2 to allow ext2 and reiser plus allow all the changes of this patchset.

Glad to hear that.

BTW, which reiserfs progs headers are you using? It seems that the headers you are using are quite different from what my distribution is providing, and this makes compiling impossible. For example, in my /usr/include/reiserfs there is no io.h and no reiserfs.h, and no structure named reiserfs_key, only key. Not sure if my progsreiserfs is too old or if there is some other reason. Which progsreiserfs are you using?

Thanks,
Qu
Re: btrfs check help
[...]
> Can I get a strong confirmation that I should run with the "--repair"
> option on each device? Thanks.
>
> Vincent
>
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents [o]
> checking free space cache [.]
> root 5 inode 1341670 errors 400, nbytes wrong
> root 11406 inode 1341670 errors 400, nbytes wrong
[...]

I just remembered that I have seen this kind of error before; luckily, I found the btrfs check output (August 2015) in a backup of an old snapshot. In my case it was on a raid5 fs from November 2013: 7 small txt files (all several hundred bytes), and the 7 errors were repeated for about 10 snapshots. I used

# find . -inum

to find the files. 2 of the 7 were still in the latest/actual subvol and I just recreated them. The errors from the older snapshots are still there, as far as I remember from the last btrfs check I did (with kernel 4.3.0, tools 4.3.x). The fs was converted to raid10 3 months ago.

As I also got other fake errors (as in https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html ), I won't run a repair until I see proof that this 'errors 400, nbytes wrong' is a risk for file-server stability.

I also see that on an archive clone fs with these 10 old snapshots (created via send|receive), there is no error.

In your case, it is likely just 1 small file in the root volume (5) with the same allocation in the other subvol (11406), so maybe you can fix it like I did and avoid running '--repair'.
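For anyone wanting to script the `find . -inum` step across a large tree, here is a rough Python analogue (plain find(1) works just as well; this sketch, with a made-up helper name, only illustrates the inode-to-path lookup on a single filesystem):

```python
import os
import pathlib
import tempfile

def find_by_inum(root, inum):
    """Rough analogue of `find root -inum N`: walk the tree and
    collect every path whose inode number matches."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            p = os.path.join(dirpath, name)
            if os.lstat(p).st_ino == inum:
                hits.append(p)
    return hits

# Demo on a throwaway directory: create a file, look it up by inode.
with tempfile.TemporaryDirectory() as root:
    target = pathlib.Path(root, "a", "broken.txt")
    target.parent.mkdir()
    target.write_text("x")
    inum = os.stat(target).st_ino
    found = find_by_inum(root, inum)
    print(found)
```

On btrfs, remember that inode numbers are only unique within a subvolume, so the walk has to be repeated per subvolume (as the per-root `root 5` / `root 11406` lines in the check output suggest).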
[PATCH V1.1 2/7] btrfs-progs: add kernel alias for each of the features in the list
We should have kept feature names the same across the progs UI and the sysfs UI. For example, progs mixed-bg is /sys/fs/btrfs/features/mixed_groups in sysfs. As these are already released UIs, there is not much that can be done about it, except for creating the alias and making the tools aware of it.

Add a kernel alias for each of the features in the list. E.g., the string within () is the sysfs name for the same feature:

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg (mixed_groups)           - mixed data and metadata block groups (0x4, 2.7.37)
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
raid56 (raid56)                   - raid56 extended format (0x80, 3.9)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

btrfs-convert -O list-all
Filesystem features available:
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

Signed-off-by: Anand Jain
---
V1.1 add signed-off-by

 utils.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 0163915..6d2675d 100644
--- a/utils.c
+++ b/utils.c
@@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
 void btrfs_list_all_fs_features(u64 mask_disallowed)
 {
 	int i;
+	u64 feature_per_sysfs;
+
+	btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
 
 	fprintf(stderr, "Filesystem features available:\n");
 	for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
 		char *is_default = "";
+		char name[256];
 
 		if (mkfs_features[i].flag & mask_disallowed)
 			continue;
 		if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
 			is_default = ", default";
-		fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
-			mkfs_features[i].name,
+		if (mkfs_features[i].flag & feature_per_sysfs)
+			sprintf(name, "%s (%s)",
+				mkfs_features[i].name,
+				mkfs_features[i].name_ker);
+		else
+			sprintf(name, "%s", mkfs_features[i].name);
+		fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
+			name,
 			mkfs_features[i].desc,
 			mkfs_features[i].flag,
 			mkfs_features[i].min_ker_ver,
-- 
2.6.2
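The output logic of the patch can be previewed in a few lines for anyone reviewing the alias formatting; the feature table below is transcribed from the commit message, and the helper name and tuple layout are mine, not btrfs-progs code:

```python
# (progs name, sysfs alias or None, description, flag, min kernel)
features = [
    ("mixed-bg", "mixed_groups", "mixed data and metadata block groups", 0x4, "2.7.37"),
    ("raid56", "raid56", "raid56 extended format", 0x80, "3.9"),
    ("skinny-metadata", "skinny_metadata", "reduced-size metadata extent refs", 0x100, "3.10"),
]

def feature_lines(features):
    """Mimic the patch's %-34s formatting, appending the sysfs alias
    in parentheses when one exists."""
    lines = []
    for name, alias, desc, flag, minver in features:
        shown = f"{name} ({alias})" if alias else name
        lines.append(f"{shown:<34}- {desc} (0x{flag:x}, {minver})")
    return lines

for line in feature_lines(features):
    print(line)
```

This also makes the review point below concrete: in the C version, a `snprintf` into `name[256]` would be the safer choice than `sprintf`, since the alias lengths are not bounded by the format string.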
[RFC PATCH] Btrfs: improve performance on dbench
Kent Overstreet posted some dbench test numbers in the announcement of bcachefs[1], in which btrfs's performance is much worse than that of ext4 and xfs, especially in the case of multiple threads.

This difference can be observed on fast storage; I ran 'dbench -t10 64' with a 1.6T NVMe disk.

Processor: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Memory: 504G

I took some time to dig into it; perf shows that in the case of multiple threads we spend most of our cpu cycles on the spin_lock_irqsave() and spin_unlock_irqrestore() pair, which is called by wait_event() in btree locking.

72.84%  72.84%  dbench  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
        |
        ---native_queued_spin_lock_slowpath
           |
           |--71.64%-- _raw_spin_lock_irqsave
           |          |
           |          |--52.17%-- prepare_to_wait_event
           |          |          |
           |          |          |--94.33%-- btrfs_tree_lock
           |          |          |          |
           |          |          |          |--99.10%-- btrfs_lock_root_node
           |          |          |          |           btrfs_search_slot
           |          |          |          |           |
           |          |          |          |           |--26.44%-- btrfs_lookup_dir_item
           |          |          |          |           |          |
           |          |          |          |           |          |--99.31%-- __btrfs_unlink_inode

Serious contention on the btree lock can also be shown by another fact: if you use a subvolume instead of a directory for each dbench client, the numbers in the multi-threaded case are considerably nicer; for 64 clients,

Throughput 5904.71 MB/sec  64 clients  64 procs  max_latency=816.715 ms

I did a few things to avoid waiting for blocking writers and readers:

1) Use path->leave_spinning=1 as much as possible; this leaves us holding a spinning lock after searching the btree.

2) Find the cases where we don't have to take a blocking lock, for example, we don't need a blocking lock when the parent node has more than 1/4 of the items it can hold.

3) Avoid unnecessary "goto again", e.g. on the btree root level, just update write_lock_level if we already hold BTRFS_WRITE_LOCK.
4) Remove btrfs_set_path_blocking() from btrfs_clear_path_blocking(); this contributes a large part of the improved numbers. That call was introduced to avoid a lockdep warning, but after I turned lockdep on, xfstests didn't report such a warning.

5) Make btrfs_search_forward use a non-sleeping function to find the eb; this fixes a deadlock with the previous changes.

Here are the end results for 64 clients; compared with vanilla 4.2, btrfs runs 15x faster but with higher latency.

                  tput(MB/sec)  max_latency(ms)
xfs                   2742.93        21.855
ext4                  7182.92        19.053
btrfs+subvol w/o      5904.71       816.715
btrfs+dir w/o          122.778      718.674
*btrfs+dir w          1715.77      1366.981

I've marked it as RFC since I'm not confident about the lockdep part. Any comments are welcome!

[1]: https://lkml.org/lkml/2015/8/21/22

Signed-off-by: Liu Bo
---
 fs/btrfs/ctree.c     | 79 +++-
 fs/btrfs/dir-item.c  |  1 +
 fs/btrfs/file-item.c |  3 +-
 fs/btrfs/file.c      |  7 -
 fs/btrfs/inode-map.c |  2 ++
 fs/btrfs/inode.c     |  3 ++
 fs/btrfs/orphan.c    |  2 ++
 fs/btrfs/root-tree.c |  1 +
 fs/btrfs/xattr.c     |  2 ++
 9 files changed, 84 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5b8e235..a27dbae 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -87,7 +87,6 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 		else if (held_rw == BTRFS_READ_LOCK)
 			held_rw = BTRFS_READ_LOCK_BLOCKING;
 	}
-	btrfs_set_path_blocking(p);
 
 	for (i = BTRFS_MAX_LEVEL - 1; i >= 0; i--) {
 		if (p->nodes[i] && p->locks[i]) {
@@ -2536,8 +2535,16 @@ setup_nodes_for_search(struct btrfs_trans_handle *trans,
 
 	if (*write_lock_level < level + 1) {
 		*write_lock_level = level + 1;
-		btrfs_release_path(p);
-		goto again;
+
+		ASSERT(p->locks[level] == BTRFS_WRITE_LOCK ||
+		       p->locks[level] == BTRFS_READ_LOCK);
+
+		/* if it's not the root node or the lock is not WRITE_LOCK */
+		if ((level < BTRFS_MAX_LEVEL - 1 && p->nodes[level + 1]) ||
+		    p->locks[level] != BTRFS_WRITE_LOCK) {
+			btrfs_release_path(p);
+			goto again;
+		}
 	}
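The subvolume-per-client result suggests the win comes from spreading contention across independent tree locks rather than funnelling every client through one root lock. The general effect, same results with less serialization, can be illustrated with a toy sharded-counter model (a generic concurrency sketch, not btrfs code):

```python
import threading

def run(n_threads, n_shards, iters=10000):
    """Workers increment counters under per-shard locks; one shard means
    every worker serializes on the same lock (like one btree root lock),
    more shards spread the contention (like one subvolume per client)."""
    counts = [0] * n_shards
    locks = [threading.Lock() for _ in range(n_shards)]

    def worker(tid):
        for i in range(iters):
            shard = (tid + i) % n_shards
            with locks[shard]:
                counts[shard] += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)

# Sharding changes how often workers collide on a lock, never the result.
print(run(4, 1), run(4, 8))  # 40000 40000
```

The patch series takes the other available route: keep the single tree, but shorten how long each walker holds the lock in its blocking form.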
Re: [PATCH 2/7] btrfs-progs: add kernel alias for each of the features in the list
Liu Bo wrote:

On Wed, Nov 25, 2015 at 08:08:15PM +0800, Anand Jain wrote:

We should have kept feature names the same across the progs UI and the sysfs UI. For example, progs mixed-bg is /sys/fs/btrfs/features/mixed_groups in sysfs. As these are already released UIs, there is not much that can be done about it, except for creating the alias and making the tools aware of it.

Add a kernel alias for each of the features in the list. E.g., the string within () is the sysfs name for the same feature:

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg (mixed_groups)           - mixed data and metadata block groups (0x4, 2.7.37)
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
raid56 (raid56)                   - raid56 extended format (0x80, 3.9)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

btrfs-convert -O list-all
Filesystem features available:
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

You're missing a Signed-off-by here.

oh no. thanks for the catch.
Thanks,
-liubo

---
 utils.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 0163915..6d2675d 100644
--- a/utils.c
+++ b/utils.c
@@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
 void btrfs_list_all_fs_features(u64 mask_disallowed)
 {
 	int i;
+	u64 feature_per_sysfs;
+
+	btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
 
 	fprintf(stderr, "Filesystem features available:\n");
 	for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
 		char *is_default = "";
+		char name[256];
 
 		if (mkfs_features[i].flag & mask_disallowed)
 			continue;
 		if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
 			is_default = ", default";
-		fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
-			mkfs_features[i].name,
+		if (mkfs_features[i].flag & feature_per_sysfs)
+			sprintf(name, "%s (%s)",
+				mkfs_features[i].name,
+				mkfs_features[i].name_ker);
+		else
+			sprintf(name, "%s", mkfs_features[i].name);
+		fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
+			name,
 			mkfs_features[i].desc,
 			mkfs_features[i].flag,
 			mkfs_features[i].min_ker_ver,
-- 
2.6.2
Re: [auto-]defrag, nodatacow - general suggestions? (was: btrfs: poor performance on deleting many large files?)
Hey.

I've worried before about the topics Mitch has raised. Some questions.

1) AFAIU, the fragmentation problem exists especially for those files that see many random writes, especially, but not limited to, big files.

That databases and VMs are affected by this is probably broadly known by now (well, at least by people on this list). But I'd guess there are n other cases where such IO patterns can happen which one simply never notices, while the btrfs continues to degrade.

So is there any general approach towards this?

And what are the actual possible consequences? Is it just that the fs gets slower (due to the fragmentation), or may I even run into other issues, to the point that the space is eaten up or the fs becomes basically unusable?

This is especially important for me, because for some VMs and even DBs I wouldn't want to use nodatacow, because I want to have the checksumming. (I.e. those cases where data integrity is much more important than security.)

2) Why does nodatacow imply nodatasum and can that ever be decoupled?

For me the checksumming is actually the most important part of btrfs (not that I wouldn't like its other features as well)... so turning it off is something I really would want to avoid.

Plus it opens questions like: When there are no checksums, how can it (in the RAID cases) decide which block is the good one in case of corruptions?

3) When I would actually disable datacow for e.g. a subvolume that holds VMs or DBs... what are all the implications?

Obviously no checksumming, but what happens if I snapshot such a subvolume or if I send/receive it? I'd expect that then some kind of CoW needs to take place or does that simply not work?

4) Duncan mentioned that defrag (and I guess that's also for auto-defrag) isn't ref-link aware...

Isn't that somehow a complete showstopper?
As soon as one uses snapshots and defrags or auto-defrags any of them, space usage would just explode, perhaps to the extent of ENOSPC, rendering the fs effectively useless. That sounds to me like either I can't use ref-links (which are crucial not only to snapshots but to every file I copy with cp --reflink=auto), or I can't defrag, which however will sooner or later cause quite some fragmentation issues on btrfs?

5) Especially keeping (4) in mind, but also the other comments from Duncan and Austin: is auto-defrag now recommended to be generally used? Are both auto-defrag and defrag considered stable enough to be used? Or are there other implications, like when I use compression?

6) Does defragmentation work with compression? Or is it just filefrag which can't cope with it?
Are there any other combinations of the typical btrfs technologies (cow/nocow, compression, snapshots, subvolumes, defrag, balance) that one can use but which lead to unexpected problems? (I, for example, wouldn't have expected that defragmentation isn't ref-link aware... still kinda shocked ;) )
For example, when I do a balance and change the compression, and I have multiple snapshots or files within one subvolume that share their blocks, would that also lead to copies being made and the space possibly growing dramatically?

7) How does free-space defragmentation happen (or is there even such a thing)? For example, when I have my big qemu images, *not* using nodatacow, and I copy the image e.g. with qemu-img old.img new.img and then delete the old one, I'd expect that the new.img is more or less not fragmented. But will my free space (from the removed old.img) still be completely messed up, sooner or later driving me into problems?

8) Why does a balance not also defragment? Since everything is copied anyway, why not defragment it? I somehow would have hoped that a balance cleans up all kinds of things, like free space issues and also fragmentation.
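[Editor's note on question 3: the usual way to disable datacow for a VM/DB area is per-directory via the file attribute rather than a mount option. A minimal command sketch follows; the paths are hypothetical, and note that the C attribute only takes effect on files created while the directory already carries it.]

```shell
# Mark a directory NOCOW before putting any files in it; files created
# inside it afterwards inherit the attribute.
mkdir -p /mnt/data/vm-images
chattr +C /mnt/data/vm-images
lsattr -d /mnt/data/vm-images    # the 'C' flag should now be listed

# An existing file cannot simply be flipped to NOCOW; it must be
# rewritten into the marked directory as a fresh copy, e.g.:
# cp --reflink=never old.img /mnt/data/vm-images/old.img
```

Snapshotting such a subvolume still works: the snapshot forces a one-time CoW on the first write to each shared block, after which writes to that block are in-place again.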
Given all these issues (fragmentation, situations in which space may grow dramatically where the end-user/admin may not necessarily expect it, e.g. the defrag or the balance+compression case), btrfs seems to require much more in-depth knowledge, and especially care (which even depends on the type of data), on the end-user/admin side than the traditional filesystems do.

Are there, for example, any general recommendations on what to do regularly to keep the fs in a clean and proper shape? (And I don't count "start with a fresh one and copy the data over" as a valid way.)

Thanks,
Chris.
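[Editor's note: the "what to do regularly" question maps onto maintenance tasks commonly recommended on this list. The sketch below is an assumption about a sensible routine, not official guidance; the mount point and the usage thresholds are placeholders.]

```shell
#!/bin/sh
# Periodic btrfs maintenance sketch for a filesystem mounted at /mnt/data.
MNT=/mnt/data

# 1. Scrub: verify data/metadata checksums; in RAID1 setups, bad copies
#    are repaired from the good mirror.
btrfs scrub start -B "$MNT"

# 2. Filtered balance: rewrite only mostly-empty chunks to return
#    unallocated space, without rewriting the whole filesystem.
btrfs balance start -dusage=50 -musage=50 "$MNT"

# 3. Optional: defragment write-heavy directories, remembering that this
#    breaks reflinks shared with snapshots and can increase space usage.
# btrfs filesystem defragment -r "$MNT/db"
```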
Re: 4.2.6: livelock in recovery (free_reloc_roots)?
On 11/21/2015 10:01 PM, Alexander Fougner wrote, as excerpted:
> This is fixed in btrfs-progs 4.3.1, which allows you to delete a
> device again by the 'missing' keyword.

Thanks Alexander! I had just found the thread reporting the bug, but not the patch with the corresponding btrfs-progs version it was merged in.

Lukas
Re: [PATCH 0/7] Let user specify the kernel version for features
Anand Jain wrote on 2015/11/25 20:08 +0800:

Sometimes users may want to have a btrfs that is supported on multiple kernel versions. A simple example: a USB drive can be used with multiple systems running different kernel versions. Or, in a data center, a SAN LUN could be mounted on systems with different kernel versions.

Thanks for providing comments and feedback. Further to it, here below is a set of patches which introduce a way to specify a kernel version, so that default features can be set based on what features were supported at that kernel version.

With the new -O comp= option, the concern of users who want to make a btrfs for a newer kernel is hugely reduced. But I still prefer such feature alignment to be done only when specified by the user, instead of automatically (yeah, already said several times though). A warning should be enough for the user; sometimes too much automation is not good, especially for tests.

A lot of btrfs-progs changes, like the recent disabling of mixed-bg for small volumes, have already caused regressions in the generic/077 testcase. And Dave is already fed up with such problems from btrfs... Such auto-detection, especially, will make the default behavior more unstable; at least it's not a good idea to me.

Besides this, I'm curious how other filesystems' user tools handle such kernel mismatches, or do they?

Thanks,
Qu

First of all, to let users know what features were supported at which kernel version, patch 1/7 updates -O list-all to list each feature with its version. As we didn't keep the sysfs and progs feature names consistent, to avoid confusion patch 2/7 additionally displays the sysfs feature name in the list-all output. Next, patches 3,4,5/7 are helper functions.
Patches 6,7/7 provide the -O comp= option for mkfs.btrfs and btrfs-convert respectively.

Thanks, Anand

Anand Jain (7):
  btrfs-progs: show the version for -O list-all
  btrfs-progs: add kernel alias for each of the features in the list
  btrfs-progs: make is_numerical non-static
  btrfs-progs: check for numerical in version_to_code()
  btrfs-progs: introduce framework version to features
  btrfs-progs: add -O comp= option for mkfs.btrfs
  btrfs-progs: add -O comp= option for btrfs-convert

 btrfs-convert.c | 21 +
 cmds-replace.c  | 11 ---
 mkfs.c          | 24 ++--
 utils.c         | 58 -
 utils.h         |  2 ++
 5 files changed, 98 insertions(+), 18 deletions(-)
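[Editor's note: the -O comp= option discussed here was a proposal at the time. The already-existing way to build a filesystem for an older kernel is to disable newer default features explicitly with the ^ prefix; a sketch, with the device path as a placeholder and the exact feature set depending on the target kernel:]

```shell
# List the features this mkfs.btrfs knows about, together with the
# kernel version that introduced each one.
mkfs.btrfs -O list-all

# Create a filesystem usable by kernels that predate skinny-metadata
# and extref by turning those default features off.
mkfs.btrfs -O ^skinny-metadata,^extref /dev/sdX
```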
Re: [PATCH] Btrfs: fix error path when failing to submit bio for direct IO write
On Wed, Nov 25, 2015 at 5:58 PM, Liu Bowrote: > On Tue, Nov 24, 2015 at 05:25:18PM +, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit >> bio for direct IO") fixed problems with the error handling code after we >> fail to submit a bio for direct IO. However there were 2 problems that it >> did not address when the failure is due to memory allocation failures for >> direct IO writes: >> >> 1) We considered that there could be only one ordered extent for the whole >>IO range, which is not always true, as we can have multiple; >> >> 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent, >>which can make other tasks running btrfs_wait_logged_extents() hang >>forever, since they wait for that bit to be set. The general assumption >>is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set >>and it precedes setting the bit BTRFS_ORDERED_COMPLETE. >> >> Fix these issues by moving part of the btrfs_endio_direct_write() handler >> into a new helper function and having that new helper function called when >> we fail to allocate memory to submit the bio (and its private object) for >> a direct IO write. 
>> >> Signed-off-by: Filipe Manana >> --- >> fs/btrfs/inode.c | 54 +++--- >> 1 file changed, 27 insertions(+), 27 deletions(-) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index f82d1f4..4f8560c 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio) >> bio_put(bio); >> } >> >> -static void btrfs_endio_direct_write(struct bio *bio) >> +static void btrfs_endio_direct_write_update_ordered(struct inode *inode, >> + const u64 offset, >> + const u64 bytes, >> + const int uptodate) >> { >> - struct btrfs_dio_private *dip = bio->bi_private; >> - struct inode *inode = dip->inode; >> struct btrfs_root *root = BTRFS_I(inode)->root; >> struct btrfs_ordered_extent *ordered = NULL; >> - u64 ordered_offset = dip->logical_offset; >> - u64 ordered_bytes = dip->bytes; >> - struct bio *dio_bio; >> + u64 ordered_offset = offset; >> + u64 ordered_bytes = bytes; >> int ret; >> >> again: >> ret = btrfs_dec_test_first_ordered_pending(inode, , >> _offset, >> ordered_bytes, >> -!bio->bi_error); >> +uptodate); >> if (!ret) >> goto out_test; >> >> @@ -8023,13 +8023,22 @@ out_test: >>* our bio might span multiple ordered extents. 
If we haven't >>* completed the accounting for the whole dio, go back and try again >>*/ >> - if (ordered_offset < dip->logical_offset + dip->bytes) { >> - ordered_bytes = dip->logical_offset + dip->bytes - >> - ordered_offset; >> + if (ordered_offset < offset + bytes) { >> + ordered_bytes = offset + bytes - ordered_offset; >> ordered = NULL; >> goto again; >> } >> - dio_bio = dip->dio_bio; >> +} >> + >> +static void btrfs_endio_direct_write(struct bio *bio) >> +{ >> + struct btrfs_dio_private *dip = bio->bi_private; >> + struct bio *dio_bio = dip->dio_bio; >> + >> + btrfs_endio_direct_write_update_ordered(dip->inode, >> + dip->logical_offset, >> + dip->bytes, >> + !bio->bi_error); >> >> kfree(dip); >> >> @@ -8365,24 +8374,15 @@ free_ordered: >> dip = NULL; >> io_bio = NULL; >> } else { >> - if (write) { >> - struct btrfs_ordered_extent *ordered; >> - >> - ordered = btrfs_lookup_ordered_extent(inode, >> - file_offset); >> - set_bit(BTRFS_ORDERED_IOERR, >flags); >> - /* >> - * Decrements our ref on the ordered extent and removes >> - * the ordered extent from the inode's ordered tree, >> - * doing all the proper resource cleanup such as for >> the >> - * reserved space and waking up any waiters for this >> - * ordered extent (through >> btrfs_remove_ordered_extent). >> - */ >> - btrfs_finish_ordered_io(ordered); >> - } else { >> + if (write) >> +
[PATCH v2] Btrfs: fix error path when failing to submit bio for direct IO write
From: Filipe Manana

Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit bio for direct IO") fixed problems with the error handling code after we fail to submit a bio for direct IO. However there were 2 problems that it did not address when the failure is due to memory allocation failures for direct IO writes:

1) We considered that there could be only one ordered extent for the whole IO range, which is not always true, as we can have multiple;

2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent, which can make other tasks running btrfs_wait_logged_extents() hang forever, since they wait for that bit to be set. The general assumption is that, regardless of an error, BTRFS_ORDERED_IO_DONE is always set and it precedes setting the bit BTRFS_ORDERED_COMPLETE.

Fix these issues by moving part of the btrfs_endio_direct_write() handler into a new helper function and having that new helper function called when we fail to allocate memory to submit the bio (and its private object) for a direct IO write.

Signed-off-by: Filipe Manana
---
V2: Fixed wrong uptodate value passed to helper btrfs_endio_direct_write_update_ordered() (1 vs 0).
 fs/btrfs/inode.c | 54 +++---
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f82d1f4..66106b4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio)
 	bio_put(bio);
 }
 
-static void btrfs_endio_direct_write(struct bio *bio)
+static void btrfs_endio_direct_write_update_ordered(struct inode *inode,
+						    const u64 offset,
+						    const u64 bytes,
+						    const int uptodate)
 {
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct inode *inode = dip->inode;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_ordered_extent *ordered = NULL;
-	u64 ordered_offset = dip->logical_offset;
-	u64 ordered_bytes = dip->bytes;
-	struct bio *dio_bio;
+	u64 ordered_offset = offset;
+	u64 ordered_bytes = bytes;
 	int ret;
 
 again:
 	ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
 						   &ordered_offset,
 						   ordered_bytes,
-						   !bio->bi_error);
+						   uptodate);
 	if (!ret)
 		goto out_test;
 
@@ -8023,13 +8023,22 @@ out_test:
 	 * our bio might span multiple ordered extents. If we haven't
 	 * completed the accounting for the whole dio, go back and try again
 	 */
-	if (ordered_offset < dip->logical_offset + dip->bytes) {
-		ordered_bytes = dip->logical_offset + dip->bytes -
-			ordered_offset;
+	if (ordered_offset < offset + bytes) {
+		ordered_bytes = offset + bytes - ordered_offset;
 		ordered = NULL;
 		goto again;
 	}
-	dio_bio = dip->dio_bio;
+}
+
+static void btrfs_endio_direct_write(struct bio *bio)
+{
+	struct btrfs_dio_private *dip = bio->bi_private;
+	struct bio *dio_bio = dip->dio_bio;
+
+	btrfs_endio_direct_write_update_ordered(dip->inode,
+						dip->logical_offset,
+						dip->bytes,
+						!bio->bi_error);
 
 	kfree(dip);
 
@@ -8365,24 +8374,15 @@ free_ordered:
 		dip = NULL;
 		io_bio = NULL;
 	} else {
-		if (write) {
-			struct btrfs_ordered_extent *ordered;
-
-			ordered = btrfs_lookup_ordered_extent(inode,
-							      file_offset);
-			set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
-			/*
-			 * Decrements our ref on the ordered extent and removes
-			 * the ordered extent from the inode's ordered tree,
-			 * doing all the proper resource cleanup such as for the
-			 * reserved space and waking up any waiters for this
-			 * ordered extent (through btrfs_remove_ordered_extent).
-			 */
-			btrfs_finish_ordered_io(ordered);
-		} else {
+		if (write)
+			btrfs_endio_direct_write_update_ordered(inode,
+							file_offset,
+							dio_bio->bi_iter.bi_size,
+							0);
Re: [PATCH] Btrfs: fix error path when failing to submit bio for direct IO write
On Tue, Nov 24, 2015 at 05:25:18PM +, fdman...@kernel.org wrote: > From: Filipe Manana> > Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit > bio for direct IO") fixed problems with the error handling code after we > fail to submit a bio for direct IO. However there were 2 problems that it > did not address when the failure is due to memory allocation failures for > direct IO writes: > > 1) We considered that there could be only one ordered extent for the whole >IO range, which is not always true, as we can have multiple; > > 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent, >which can make other tasks running btrfs_wait_logged_extents() hang >forever, since they wait for that bit to be set. The general assumption >is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set >and it precedes setting the bit BTRFS_ORDERED_COMPLETE. > > Fix these issues by moving part of the btrfs_endio_direct_write() handler > into a new helper function and having that new helper function called when > we fail to allocate memory to submit the bio (and its private object) for > a direct IO write. 
> > Signed-off-by: Filipe Manana > --- > fs/btrfs/inode.c | 54 +++--- > 1 file changed, 27 insertions(+), 27 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index f82d1f4..4f8560c 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio) > bio_put(bio); > } > > -static void btrfs_endio_direct_write(struct bio *bio) > +static void btrfs_endio_direct_write_update_ordered(struct inode *inode, > + const u64 offset, > + const u64 bytes, > + const int uptodate) > { > - struct btrfs_dio_private *dip = bio->bi_private; > - struct inode *inode = dip->inode; > struct btrfs_root *root = BTRFS_I(inode)->root; > struct btrfs_ordered_extent *ordered = NULL; > - u64 ordered_offset = dip->logical_offset; > - u64 ordered_bytes = dip->bytes; > - struct bio *dio_bio; > + u64 ordered_offset = offset; > + u64 ordered_bytes = bytes; > int ret; > > again: > ret = btrfs_dec_test_first_ordered_pending(inode, , > _offset, > ordered_bytes, > -!bio->bi_error); > +uptodate); > if (!ret) > goto out_test; > > @@ -8023,13 +8023,22 @@ out_test: >* our bio might span multiple ordered extents. 
If we haven't >* completed the accounting for the whole dio, go back and try again >*/ > - if (ordered_offset < dip->logical_offset + dip->bytes) { > - ordered_bytes = dip->logical_offset + dip->bytes - > - ordered_offset; > + if (ordered_offset < offset + bytes) { > + ordered_bytes = offset + bytes - ordered_offset; > ordered = NULL; > goto again; > } > - dio_bio = dip->dio_bio; > +} > + > +static void btrfs_endio_direct_write(struct bio *bio) > +{ > + struct btrfs_dio_private *dip = bio->bi_private; > + struct bio *dio_bio = dip->dio_bio; > + > + btrfs_endio_direct_write_update_ordered(dip->inode, > + dip->logical_offset, > + dip->bytes, > + !bio->bi_error); > > kfree(dip); > > @@ -8365,24 +8374,15 @@ free_ordered: > dip = NULL; > io_bio = NULL; > } else { > - if (write) { > - struct btrfs_ordered_extent *ordered; > - > - ordered = btrfs_lookup_ordered_extent(inode, > - file_offset); > - set_bit(BTRFS_ORDERED_IOERR, >flags); > - /* > - * Decrements our ref on the ordered extent and removes > - * the ordered extent from the inode's ordered tree, > - * doing all the proper resource cleanup such as for the > - * reserved space and waking up any waiters for this > - * ordered extent (through btrfs_remove_ordered_extent). > - */ > - btrfs_finish_ordered_io(ordered); > - } else { > + if (write) > + btrfs_endio_direct_write_update_ordered(inode, > + file_offset, > +
Re: [PATCH v2] Btrfs: fix error path when failing to submit bio for direct IO write
On Tue, Nov 24, 2015 at 11:35:25PM +, fdman...@kernel.org wrote: > From: Filipe Manana> > Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit > bio for direct IO") fixed problems with the error handling code after we > fail to submit a bio for direct IO. However there were 2 problems that it > did not address when the failure is due to memory allocation failures for > direct IO writes: > > 1) We considered that there could be only one ordered extent for the whole >IO range, which is not always true, as we can have multiple; > > 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent, >which can make other tasks running btrfs_wait_logged_extents() hang >forever, since they wait for that bit to be set. The general assumption >is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set >and it precedes setting the bit BTRFS_ORDERED_COMPLETE. > > Fix these issues by moving part of the btrfs_endio_direct_write() handler > into a new helper function and having that new helper function called when > we fail to allocate memory to submit the bio (and its private object) for > a direct IO write. > You can have Reviewed-by: Liu Bo > Signed-off-by: Filipe Manana > --- > > V2: Fixed wrong uptodate value passed to helper > btrfs_endio_direct_write_update_ordered() (1 vs 0). 
> > fs/btrfs/inode.c | 54 +++--- > 1 file changed, 27 insertions(+), 27 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index f82d1f4..66106b4 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio) > bio_put(bio); > } > > -static void btrfs_endio_direct_write(struct bio *bio) > +static void btrfs_endio_direct_write_update_ordered(struct inode *inode, > + const u64 offset, > + const u64 bytes, > + const int uptodate) > { > - struct btrfs_dio_private *dip = bio->bi_private; > - struct inode *inode = dip->inode; > struct btrfs_root *root = BTRFS_I(inode)->root; > struct btrfs_ordered_extent *ordered = NULL; > - u64 ordered_offset = dip->logical_offset; > - u64 ordered_bytes = dip->bytes; > - struct bio *dio_bio; > + u64 ordered_offset = offset; > + u64 ordered_bytes = bytes; > int ret; > > again: > ret = btrfs_dec_test_first_ordered_pending(inode, , > _offset, > ordered_bytes, > -!bio->bi_error); > +uptodate); > if (!ret) > goto out_test; > > @@ -8023,13 +8023,22 @@ out_test: >* our bio might span multiple ordered extents. 
If we haven't >* completed the accounting for the whole dio, go back and try again >*/ > - if (ordered_offset < dip->logical_offset + dip->bytes) { > - ordered_bytes = dip->logical_offset + dip->bytes - > - ordered_offset; > + if (ordered_offset < offset + bytes) { > + ordered_bytes = offset + bytes - ordered_offset; > ordered = NULL; > goto again; > } > - dio_bio = dip->dio_bio; > +} > + > +static void btrfs_endio_direct_write(struct bio *bio) > +{ > + struct btrfs_dio_private *dip = bio->bi_private; > + struct bio *dio_bio = dip->dio_bio; > + > + btrfs_endio_direct_write_update_ordered(dip->inode, > + dip->logical_offset, > + dip->bytes, > + !bio->bi_error); > > kfree(dip); > > @@ -8365,24 +8374,15 @@ free_ordered: > dip = NULL; > io_bio = NULL; > } else { > - if (write) { > - struct btrfs_ordered_extent *ordered; > - > - ordered = btrfs_lookup_ordered_extent(inode, > - file_offset); > - set_bit(BTRFS_ORDERED_IOERR, >flags); > - /* > - * Decrements our ref on the ordered extent and removes > - * the ordered extent from the inode's ordered tree, > - * doing all the proper resource cleanup such as for the > - * reserved space and waking up any waiters for this > - * ordered extent (through btrfs_remove_ordered_extent). > - */ > - btrfs_finish_ordered_io(ordered); > - } else { > + if (write) > +
Re: btrfs: poor performance on deleting many large files
On Mon, 2015-11-23 at 06:29 +, Duncan wrote: > Using subvolumes was the first recommendation I was going to make, too, > so you're on the right track. =:^) > > Also, in case you are using it (you didn't say, but this has been > demonstrated to solve similar issues for others so it's worth > mentioning), try turning btrfs quota functionality off. While the devs > are working very hard on that feature for btrfs, the fact is that it's > simply still buggy and doesn't work reliably anyway, in addition to > triggering scaling issues before they'd otherwise occur. So my > recommendation has been, and remains, unless you're working directly with > the devs to fix quota issues (in which case, thanks!), if you actually > NEED quota functionality, use a filesystem where it works reliably, while > if you don't, just turn it off and avoid the scaling and other issues > that currently still come with it. > I did indeed have quotas turned on for the home directories! Since they were mostly to calculate space used by everyone (since du -hs is so slow) and not actually needed to limit people, I disabled them. > As for defrag, that's quite a topic of its own, with complications > related to snapshots and the nocow file attribute. Very briefly, if you > haven't been running it regularly or using the autodefrag mount option by > default, chances are your available free space is rather fragmented as > well, and while defrag may help, it may not reduce fragmentation to the > degree you'd like. (I'd suggest using filefrag to check fragmentation, > but it doesn't know how to deal with btrfs compression, and will report > heavy fragmentation for compressed files even if they're fine. Since you > use compression, that kind of eliminates using filefrag to actually see > what your fragmentation is.) 
> Additionally, defrag isn't snapshot aware (they tried it for a few
> kernels a couple years ago but it simply didn't scale), so if you're
> using snapshots (as I believe Ubuntu does by default on btrfs, at least
> taking snapshots for upgrade-in-place), using defrag on files that
> exist in the snapshots as well can dramatically increase space usage,
> since defrag will break the reflinks to the snapshotted extents and
> create new extents for defragged files.
>
> Meanwhile, the absolute worst-case fragmentation on btrfs occurs with
> random-internal-rewrite-pattern files (as opposed to never-changed or
> append-only ones). Common examples are database files and VM images. For
> /relatively/ small files, up to say 256 MiB, the autodefrag mount option is
> a reasonably effective solution, but it tends to have scaling issues with
> files over half a GiB, so you can call this a negative recommendation for
> trying that option with half-gig-plus internal-random-rewrite-pattern
> files. There are other mitigation strategies that can be used, but here
> the subject gets complex so I'll not detail them. Suffice it to say that
> if the filesystem in question is used with large VM images or database
> files and you haven't taken specific fragmentation avoidance measures,
> that's very likely a good part of your problem right there, and you can
> call this a hint that further research is called for.
>
> If your half-gig-plus files are mostly write-once, for example most media
> files unless you're doing heavy media editing, however, then autodefrag
> could be a good option in general, as it deals well with such files and
> with random-internal-rewrite-pattern files under a quarter gig or so. Be
> aware, however, that if it's enabled on an already heavily fragmented
> filesystem (as yours likely is), it's likely to actually make performance
> worse until it gets things under control.
> Your best bet in that case, if you have spare devices available to do so,
> is probably to create a fresh btrfs and consistently use autodefrag as
> you populate it from the existing heavily fragmented btrfs. That way,
> it'll never have a chance for the fragmentation to build up in the first
> place, and autodefrag used as a routine mount option should keep it from
> getting bad in normal use.

Thanks for explaining that! Most of these files are written once and then read from for the rest of their "lifetime" until the simulations are done and they get archived/deleted. I'll try leaving autodefrag on and defragging directories over the holiday weekend when no one is using the server. There is some database usage, but I turned off COW for its folder, and it only gets used sporadically, so it shouldn't be a huge factor in day-to-day usage.

Also, is there a recommendation for relatime vs noatime mount options? I don't believe anything that runs on the server needs file access times, so if it can help with performance/disk usage I'm fine with setting it to noatime. I just tried copying a 70GB folder and then rm -rf'ing it, and it didn't appear to impact performance, and I plan to try some larger
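[Editor's note on the atime question: btrfs inherits the kernel's relatime default, but noatime avoids the metadata writes that even relatime's occasional atime updates cause; with snapshots present, those updates additionally CoW metadata blocks, so noatime is the common recommendation on this list. A sketch of how it might be set; the device and mount point are placeholders:]

```shell
# Try it without a reboot first:
mount -o remount,noatime /mnt/data

# Then make it permanent in /etc/fstab, e.g.:
# /dev/sdb1  /mnt/data  btrfs  noatime,autodefrag,compress=lzo  0  0
```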
Re: [PATCH 2/7] btrfs-progs: add kernel alias for each of the features in the list
On Wed, Nov 25, 2015 at 08:08:15PM +0800, Anand Jain wrote:
> We should have maintained the feature names consistently across the progs
> UI and the sysfs UI. For example, progs' mixed-bg is
> /sys/fs/btrfs/features/mixed_groups in sysfs. As these are already-released
> UIs, there is nothing much that can be done about it, except for creating
> the alias and making users aware of it.
>
> Add a kernel alias for each of the features in the list.
>
> e.g.: The string within () is the sysfs name for the same feature:
>
> mkfs.btrfs -O list-all
> Filesystem features available:
> mixed-bg (mixed_groups)           - mixed data and metadata block groups (0x4, 2.7.37)
> extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
> raid56 (raid56)                   - raid56 extended format (0x80, 3.9)
> skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
> no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)
>
> btrfs-convert -O list-all
> Filesystem features available:
> extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
> skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
> no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

You missed a Signed-off-by here.
Thanks,
-liubo

> ---
>  utils.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/utils.c b/utils.c
> index 0163915..6d2675d 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
>  void btrfs_list_all_fs_features(u64 mask_disallowed)
>  {
>  	int i;
> +	u64 feature_per_sysfs;
> +
> +	btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
>  
>  	fprintf(stderr, "Filesystem features available:\n");
>  	for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
>  		char *is_default = "";
> +		char name[256];
>  
>  		if (mkfs_features[i].flag & mask_disallowed)
>  			continue;
>  		if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
>  			is_default = ", default";
> -		fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
> -			mkfs_features[i].name,
> +		if (mkfs_features[i].flag & feature_per_sysfs)
> +			sprintf(name, "%s (%s)",
> +				mkfs_features[i].name,
> +				mkfs_features[i].name_ker);
> +		else
> +			sprintf(name, "%s", mkfs_features[i].name);
> +		fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
> +			name,
>  			mkfs_features[i].desc,
>  			mkfs_features[i].flag,
>  			mkfs_features[i].min_ker_ver,
> -- 
> 2.6.2
Re: Imbalanced RAID1 with three unequal disks
On Wed, Nov 25, 2015 at 12:36:32PM +0100, Mario wrote:
> Hi,
>
> I pushed a subvolume using send/receive to an 8 TB disk, added
> two 4 TB disks and started a balance with conversion to RAID1.
> Afterwards, I got the following:
>
>   Total devices 3 FS bytes used 5.40TiB
>   devid1 size 7.28TiB used 4.54TiB path /dev/mapper/yellow4
>   devid2 size 3.64TiB used 3.17TiB path /dev/mapper/yellow1
>   devid3 size 3.64TiB used 3.17TiB path /dev/mapper/yellow2
>   Btrfs v3.17
>
>   Data, RAID1: total=5.43TiB, used=5.39TiB
>   System, RAID1: total=64.00MiB, used=800.00KiB
>   Metadata, RAID1: total=14.00GiB, used=5.55GiB
>   GlobalReserve, single: total=512.00MiB, used=0.00B
>
> In my understanding, the data isn't properly balanced and I
> only get around 5.9TB of usable space. As suggested in #btrfs,
> I started a second balance without filters and got this:
>
>   Total devices 3 FS bytes used 5.40TiB
>   devid1 size 7.28TiB used 5.41TiB path /dev/mapper/yellow4
>   devid2 size 3.64TiB used 2.71TiB path /dev/mapper/yellow1
>   devid3 size 3.64TiB used 2.71TiB path /dev/mapper/yellow2
>
>   Data, RAID1: total=5.41TiB, used=5.39TiB
>   System, RAID1: total=32.00MiB, used=784.00KiB
>   Metadata, RAID1: total=7.00GiB, used=5.54GiB
>   GlobalReserve, single: total=512.00MiB, used=0.00B
>
>   /dev/mapper/yellow4  7,3T  5,4T  969G  86%  /mnt/yellow
>
> Now, I get 6.3TB of usable space but, in my understanding, I should
> get around 7.28 TB, or am I missing something here? Also, a second
> balance shouldn't change the data distribution, right?

The first balance, because it was converting, didn't get the final outcome right. (Possibly an area for further research on appropriate algorithms for balance.) The second one appears to have done the right thing. It actually threw me slightly, because your free space isn't equal on all the devices, but on reflection, that's expected in this case.
You have dev 1 double the size of devs 2 and 3, so in each RAID-1 block group, one chunk will go on dev 1, and the other chunk will go on one of the other two devices (spread evenly). This means that *eventually* they'll hit all devices with equal free space, but only when everything's filled up completely. It's just on the edge case between getting equal free space on all devices and having unusable space (which you'd have if that 8TB drive were any larger).

So, yes, all normal and all good.

As to remaining free space from df, I'm fairly sure that the algorithm for computing free space for df is just plain wrong. I've spotted it not quite getting the answer right before. That seems to be the case here, too, just more so. You have something just under 2 TiB of usable space left on the FS, according to my calculations.

> I'm using kernel v4.3 with a patch [1] from kernel bugzilla [2] for
> the 8 TB SMR drive. The send/receive of a 5 TB subvolume worked
> flawlessly with the patch. Without it, I got a lot of errors in dmesg
> within the first 200GB of transferred data. The OS is an x86_64
> Ubuntu 15.04.

That's useful to know, in case anyone else shows up with write errors on SMR devices.

   Hugo.

> Thank you!
> Mario
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581&id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=93581

-- 
Hugo Mills             | I was cursed with poetry very young. It creates
hugo@... carfax.org.uk | unrealistic expectations.
http://carfax.org.uk/  |                               Victor Frankenstein
PGP: E2AB1DE4          |                                    Penny Dreadful
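[Editor's note: Hugo's reasoning can be checked with simple arithmetic. With RAID-1, each block group puts its two chunks on the two devices with the most free space, so total usable capacity is the sum of the smaller devices when the largest device is at least as big as all the others combined, and half the grand total otherwise. A shell sketch, using GiB approximations of Mario's nominal 8 TB + 4 TB + 4 TB drives (7450 and 3725 GiB):]

```shell
#!/bin/sh
# RAID-1 usable-capacity estimate for one 7450 GiB and two 3725 GiB devices.
largest=7450
rest=$((3725 + 3725))
total=$((largest + rest))

if [ "$largest" -ge "$rest" ]; then
    # The big disk can mirror everything on the small ones; the small
    # disks' combined size is the limit (here the sizes match exactly,
    # which is Hugo's "any larger and you'd have unusable space" edge case).
    usable=$rest
else
    # Chunks pair up evenly across devices; half the raw total is usable.
    usable=$((total / 2))
fi

echo "usable: ${usable} GiB"
```

This gives roughly the 7.28 TiB total capacity Mario expected; Hugo's "just under 2 TiB" figure is the part of that capacity still free after the data already stored.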
Re: [PATCH 0/7] Let user specify the kernel version for features
Anand Jain wrote on 2015/11/26 14:07 +0800: On 11/26/2015 10:02 AM, Qu Wenruo wrote: Anand Jain wrote on 2015/11/25 20:08 +0800: Sometimes users may want a btrfs to be supported on multiple kernel versions. A simple example: a USB drive can be used with multiple systems running different kernel versions. Or, in a data center, a SAN LUN could be mounted on any system with a different kernel version. Thanks for providing comments and feedback. Further to it, here below is a set of patches which introduce a way to specify a kernel version, so that default features can be set based on what features were supported by that kernel version. With the new -O comp= option, the concern of users who want to make a btrfs for a newer kernel is hugely reduced. No! Actually, the new -O comp= option leaves no concern for users who want to create _a btrfs disk layout which is compatible with more than one kernel_. Above there are two examples of it. Why can't you give a higher kernel version than the current kernel? But I still prefer such feature alignment to be done only when specified by the user, instead of automatically. (Yeah, already said several times though.) A warning should be enough for the user; sometimes too automatic is not good. As said before, we need the latest btrfs-progs on older kernels, for the obvious reason of btrfs-progs bug fixes. We don't have to back-port fixes even in btrfs-progs, as we already do it in the btrfs kernel code. A btrfs-progs should work on any kernel with the "default features as prescribed for that kernel". Let's say we don't do this automatically: then the latest btrfs-progs with a default 'mkfs.btrfs && mount' fails. But a user upgrading btrfs-progs for fsck bug fixes shouldn't find 'default mkfs.btrfs && mount' failing. Nor should they have to use a "new" set of mkfs options to create an all-default FS for an LTS kernel. Default features based on the btrfs-progs version instead of the kernel version makes NO sense. Kernel version never makes sense, especially for non-vanilla kernels. 
And unfortunately, most kernels used in stable distributions are not vanilla. And that's *POINT 1*. That's why I stand against kernel-version-based detection. You can use the stable /sys/fs/btrfs/features/, but kernel version? Not an option, even as a fallback. And adding a warning for not using the latest features which are not in their running kernel is pointless. You didn't get the point of what to WARN about. Not warning users that they are not using the latest features, but warning that some features may prevent the fs from being mounted by the current kernel. That's _not_ a backward-kernel-compatible tool. btrfs-progs should work "for the kernel". We should avoid adding too much intelligence into btrfs-progs. I have fixed too many issues and redesigned progs in this area. Too many bugs were mainly because of the idea of copying and maintaining the same code in btrfs-progs and the btrfs kernel (ref the wiki and my email before). That's a wrong approach. Totally agree with this point. Too much nonsense in the btrfs-progs code was copied from the kernel, and due to lack of updates, it's very buggy now. Just check volume.c for allocating a data chunk. But I didn't see the point related to the feature auto-align here. I don't understand: if the purpose of both of these isn't the same, what is the point in maintaining the same code? It won't save effort, mainly because it's like developing a distributed FS where two parties have to communicate to stay in sync. Which is like using a cannon to shoo a crow. But if the reason was a FUSE-like kernel-free FS (no one said that though), then it's better to do it as a separate project. Especially for tests. It depends on what's being tested, kernel OR progs? It's kernel, not progs. No, both kernel and progs. Just from Dave, even with his typo: "xfstests is not jsut for testing kernel changes - it tests all of the filesystem utilities for regressions, too. And so when inadvertant changes in default behaviour occur, it detects those regressions too." 
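The detection Qu argues for can be sketched. The sysfs directory /sys/fs/btrfs/features is real (the running kernel exposes one entry per supported feature there), but everything else below is an assumption of mine: the function names are invented, and the demo uses a temporary stand-in directory so the sketch runs on any machine rather than requiring a loaded btrfs module.

```python
# Sketch of sysfs-based feature detection, as an alternative to guessing
# from the kernel version. /sys/fs/btrfs/features/ is the real interface;
# the helper names and the fake directory below are illustrative only.
import os
import tempfile

SYSFS_FEATURES = "/sys/fs/btrfs/features"

def kernel_supported_features(sysfs_dir=SYSFS_FEATURES):
    """Features the running kernel advertises, or None if undetectable
    (btrfs module not loaded, or sysfs directory absent)."""
    if not os.path.isdir(sysfs_dir):
        return None
    return sorted(os.listdir(sysfs_dir))

def check_requested(requested, sysfs_dir=SYSFS_FEATURES):
    """Warn, rather than silently adjust, as Qu suggests."""
    supported = kernel_supported_features(sysfs_dir)
    if supported is None:
        return ["cannot detect kernel support; proceeding with defaults"]
    return [f"warning: feature '{f}' may prevent mounting on this kernel"
            for f in requested if f not in supported]

# Demo with a fake sysfs tree advertising two features.
fake = tempfile.mkdtemp()
for feat in ("extended_iref", "skinny_metadata"):
    open(os.path.join(fake, feat), "w").close()

print(check_requested(["skinny_metadata", "no_holes"], sysfs_dir=fake))
# -> ["warning: feature 'no_holes' may prevent mounting on this kernel"]
```

The design point mirrors the thread: a warning leaves mkfs.btrfs behavior predictable, while silent auto-alignment makes the default feature set depend on the running kernel.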
Automatic alignment will keep the default feature set constant for a given kernel version. Further, for testing, using a known set of options is even better. Yeah, a known set of options becomes unknown on different kernels, thanks to the hidden feature alignment, unless you specify it with -O options. That's *POINT 2*: default auto feature alignment makes mkfs.btrfs behavior *unpredictable*. Before auto feature alignment, QA/end-users only needed to check the btrfs-progs announcement to learn of a default behavior change. And after it, wow, QA testers will need to check the feature matrix to know what the default features are on their kernel, not to mention it may even be wrong due to the even less predictable kernel version. That's why I strongly recommend making it just a warning rather than the default behavior. A lot of btrfs-progs changes, like the recent disabling of mixed-bg for small volumes, have already caused a regression in the generic/077 testcase. And Dave is already fed up with such problems from btrfs... I don't know
Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
On 25/11/2015 00:46, Qu Wenruo wrote: > The size seems small enough, I'll try to download it as it's super useful to > debug it. Thanks ! > Nice reproducer. > Is it 100% reproducible or has a chance to reproduce? I tried a second time and got a similar kernel backtrace. > BTW, did you encountered the same btrfsck error "chunk type dismatch" from > Christoph? Yes, that's what drew me to this discussion :>. I also tried the --repair option and that is perhaps what corrupted my FS. -- Laurent. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix error path when failing to submit bio for direct IO write
From: Filipe Manana

Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit bio for direct IO") fixed problems with the error handling code after we fail to submit a bio for direct IO. However there were 2 problems that it did not address when the failure is due to memory allocation failures for direct IO writes: 1) We considered that there could be only one ordered extent for the whole IO range, which is not always true, as we can have multiple; 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent, which can make other tasks running btrfs_wait_logged_extents() hang forever, since they wait for that bit to be set. The general assumption is that, regardless of an error, the bit BTRFS_ORDERED_IO_DONE is always set, and that it precedes setting the bit BTRFS_ORDERED_COMPLETE. Fix these issues by moving part of the btrfs_endio_direct_write() handler into a new helper function and having that new helper function called when we fail to allocate memory to submit the bio (and its private object) for a direct IO write. 
Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 54 +++--- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f82d1f4..4f8560c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio) bio_put(bio); } -static void btrfs_endio_direct_write(struct bio *bio) +static void btrfs_endio_direct_write_update_ordered(struct inode *inode, + const u64 offset, + const u64 bytes, + const int uptodate) { - struct btrfs_dio_private *dip = bio->bi_private; - struct inode *inode = dip->inode; struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_ordered_extent *ordered = NULL; - u64 ordered_offset = dip->logical_offset; - u64 ordered_bytes = dip->bytes; - struct bio *dio_bio; + u64 ordered_offset = offset; + u64 ordered_bytes = bytes; int ret; again: ret = btrfs_dec_test_first_ordered_pending(inode, &ordered, &ordered_offset, ordered_bytes, - !bio->bi_error); + uptodate); if (!ret) goto out_test; @@ -8023,13 +8023,22 @@ out_test: * our bio might span multiple ordered extents. 
If we haven't * completed the accounting for the whole dio, go back and try again */ - if (ordered_offset < dip->logical_offset + dip->bytes) { - ordered_bytes = dip->logical_offset + dip->bytes - - ordered_offset; + if (ordered_offset < offset + bytes) { + ordered_bytes = offset + bytes - ordered_offset; ordered = NULL; goto again; } - dio_bio = dip->dio_bio; +} + +static void btrfs_endio_direct_write(struct bio *bio) +{ + struct btrfs_dio_private *dip = bio->bi_private; + struct bio *dio_bio = dip->dio_bio; + + btrfs_endio_direct_write_update_ordered(dip->inode, + dip->logical_offset, + dip->bytes, + !bio->bi_error); kfree(dip); @@ -8365,24 +8374,15 @@ free_ordered: dip = NULL; io_bio = NULL; } else { - if (write) { - struct btrfs_ordered_extent *ordered; - - ordered = btrfs_lookup_ordered_extent(inode, - file_offset); - set_bit(BTRFS_ORDERED_IOERR, &ordered->flags); - /* -* Decrements our ref on the ordered extent and removes -* the ordered extent from the inode's ordered tree, -* doing all the proper resource cleanup such as for the -* reserved space and waking up any waiters for this -* ordered extent (through btrfs_remove_ordered_extent). -*/ - btrfs_finish_ordered_io(ordered); - } else { + if (write) + btrfs_endio_direct_write_update_ordered(inode, + file_offset, + dio_bio->bi_iter.bi_size, + 1); + else unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
Re: shall distros run btrfsck on boot?
On 2015-11-24 17:26, Eric Sandeen wrote: On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote: if the system was shut down cleanly, you're fine barring software bugs, but if it crashed, you should be running a check on the FS. Um, no... The *entire point* of having a journaling filesystem is that after a crash or power loss, a journal replay on next mount will bring the metadata into a consistent state. OK, first, that was in reference to BTRFS, not ext4, and BTRFS is a COW filesystem, not a journaling one, which is an important distinction as mentioned by Hugo in his reply. Second, there are two reasons that you should be running a check even of a journaled filesystem when the system crashes (this also applies to COW filesystems, and anything else that relies on atomicity of write operations for consistency): 1. Disks don't atomically write anything bigger than a sector, and may not even atomically write the sector itself. This means that it's possible to get a partial write to the journal, which in turn has significant potential to put the metadata in an inconsistent state when the journal gets replayed (IIRC, ext4 has a journal_checksum mount option that is supposed to mitigate this possibility). This sounds like something that shouldn't happen all that often, but on a busy filesystem, the probability is exactly proportionate to the size of the journal relative to the size of the FS. 2. If the system crashed, all code running on it immediately before the crash is instantly suspect, and you have no way to know for certain that something didn't cause random garbage to be written to the disk. On top of this, hardware is potentially suspect, and when your hardware is misbehaving, then all bets as to consistency are immediately off.
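Point 1 above (torn journal writes, and why a checksum like ext4's journal_checksum helps) can be illustrated with a toy record format. This is not ext4's on-disk journal layout; the header format and helper names are invented for the sketch. The idea is just that a CRC computed over the whole record fails verification if only part of the record reached the disk, so replay can skip it instead of applying garbage.

```python
# Toy journal record: 4-byte payload length + 4-byte CRC32 + payload.
# A partially persisted (torn) record fails verification on replay.
import struct
import zlib

def make_record(payload: bytes) -> bytes:
    crc = zlib.crc32(payload)
    return struct.pack("<II", len(payload), crc) + payload

def replay_ok(record: bytes) -> bool:
    if len(record) < 8:
        return False                      # even the header was torn
    length, crc = struct.unpack("<II", record[:8])
    payload = record[8:8 + length]
    return len(payload) == length and zlib.crc32(payload) == crc

rec = make_record(b"update inode 257: size=4096")
assert replay_ok(rec)                     # fully persisted record replays
torn = rec[:len(rec) // 2]                # power loss mid-write
assert not replay_ok(torn)                # torn record detected and skipped
```

Without the checksum, replay of the torn record would write a truncated or stale payload into the metadata, which is exactly the inconsistency Austin describes.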
[PATCH 10/12] Fix btrfs/098 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/098 | 65 ++--- tests/btrfs/098.out | 7 +- 2 files changed, 38 insertions(+), 34 deletions(-) diff --git a/tests/btrfs/098 b/tests/btrfs/098 index 8aef119..4879a90 100755 --- a/tests/btrfs/098 +++ b/tests/btrfs/098 @@ -58,43 +58,49 @@ _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey -# Create our test file with a single 100K extent starting at file offset 800K. -# We fsync the file here to make the fsync log tree gets a single csum item that -# covers the whole 100K extent, which causes the second fsync, done after the -# cloning operation below, to not leave in the log tree two csum items covering -# two sub-ranges ([0, 20K[ and [20K, 100K[)) of our extent. -$XFS_IO_PROG -f -c "pwrite -S 0xaa 800K 100K" \ +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + +# Create our test file with a single 25 block extent starting at file offset +# mapped by 200th block We fsync the file here to make the fsync log tree get a +# single csum item that covers the whole 25 block extent, which causes the +# second fsync, done after the cloning operation below, to not leave in the log +# tree two csum items covering two block sub-ranges ([0, 5[ and [5, 25[)) of our +# extent. +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((200 * $BLOCK_SIZE)) $((25 * $BLOCK_SIZE))" \ -c "fsync" \ - $SCRATCH_MNT/foo | _filter_xfs_io + $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + -# Now clone part of our extent into file offset 400K. This adds a file extent -# item to our inode's metadata that points to the 100K extent we created before, -# using a data offset of 20K and a data length of 20K, so that it refers to -# the sub-range [20K, 40K[ of our original extent. 
-$CLONER_PROG -s $((800 * 1024 + 20 * 1024)) -d $((400 * 1024)) \ - -l $((20 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo +# Now clone part of our extent into file offset mapped by 100th block. This adds +# a file extent item to our inode's metadata that points to the 25 block extent +# we created before, using a data offset of 5 blocks and a data length of 5 +# blocks, so that it refers to the block sub-range [5, 10[ of our original +# extent. +$CLONER_PROG -s $(((200 * $BLOCK_SIZE) + (5 * $BLOCK_SIZE))) \ +-d $((100 * $BLOCK_SIZE)) -l $((5 * $BLOCK_SIZE)) \ +$SCRATCH_MNT/foo $SCRATCH_MNT/foo # Now fsync our file to make sure the extent cloning is durably persisted. This # fsync will not add a second csum item to the log tree containing the checksums -# for the blocks in the sub-range [20K, 40K[ of our extent, because there was +# for the blocks in the block sub-range [5, 10[ of our extent, because there was # already a csum item in the log tree covering the whole extent, added by the # first fsync we did before. $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo -echo "File digest before power failure:" -md5sum $SCRATCH_MNT/foo | _filter_scratch +orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ') # The fsync log replay first processes the file extent item corresponding to the -# file offset 400K (the one which refers to the [20K, 40K[ sub-range of our 100K -# extent) and then processes the file extent item for file offset 800K. It used -# to happen that when processing the later, it erroneously left in the csum tree -# 2 csum items that overlapped each other, 1 for the sub-range [20K, 40K[ and 1 -# for the whole range of our extent. This introduced a problem where subsequent -# lookups for the checksums of blocks within the range [40K, 100K[ of our extent -# would not find anything because lookups in the csum tree ended up looking only -# at the smaller csum item, the one covering the subrange [20K, 40K[. 
This made -# read requests assume an expected checksum with a value of 0 for those blocks, -# which caused checksum verification failure when the read operations finished. +# file offset mapped by 100th block (the one which refers to the [5, 10[ block +# sub-range of our 25 block extent) and then processes the file extent item for +# file offset mapped by 200th block. It used to happen that when processing the +# later, it erroneously left in the csum tree 2 csum items that overlapped each +# other, 1 for the block sub-range [5, 10[ and 1 for the whole range of our +# extent. This introduced a problem where subsequent lookups for the checksums +# of blocks within the block range [10, 25[ of our extent would not find +# anything because lookups in the csum tree ended up looking only at the smaller +# csum item, the one covering the block subrange [5, 10[. This made read +# requests assume an expected checksum with a value of 0 for those blocks, which +# caused checksum
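The conversion this patch series applies everywhere boils down to one piece of arithmetic: xfs_io reports "wrote <n>/<n> bytes at offset <off>", and _filter_xfs_io_blocks_modified rewrites that into a block-size-independent "Blocks modified: [first - last]" line. A minimal sketch of that arithmetic (the real xfstests filter is a text filter in common/filter, not this Python function):

```python
# Sketch of the arithmetic behind _filter_xfs_io_blocks_modified:
# map a byte range onto the inclusive range of file blocks it touches.
def blocks_modified(offset: int, length: int, block_size: int) -> str:
    first = offset // block_size
    last = (offset + length - 1) // block_size   # inclusive last block
    return f"Blocks modified: [{first} - {last}]"

# btrfs/098's write: 25 blocks starting at the 200th block, on 4k blocks.
bs = 4096
print(blocks_modified(200 * bs, 25 * bs, bs))
# -> Blocks modified: [200 - 224]

# The same test on a 64k-block filesystem reports the identical range,
# which is what makes the golden output block-size independent.
print(blocks_modified(200 * 65536, 25 * 65536, 65536))
# -> Blocks modified: [200 - 224]
```

This is why the rewritten tests express every pwrite and clone offset as a multiple of $BLOCK_SIZE: the block-range output then matches the same .out file for any block size.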
[PATCH 05/12] Fix btrfs/056 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified and _filter_od filtering functions to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/056 | 51 ++ tests/btrfs/056.out | 152 +--- 2 files changed, 90 insertions(+), 113 deletions(-) diff --git a/tests/btrfs/056 b/tests/btrfs/056 index 66a59b8..6dc3bfd 100755 --- a/tests/btrfs/056 +++ b/tests/btrfs/056 @@ -68,33 +68,42 @@ test_btrfs_clone_fsync_log_recover() MOUNT_OPTIONS="$MOUNT_OPTIONS $2" _mount_flakey - # Create a file with 4 extents and 1 hole, all with a size of 8Kb each. - # The hole is in the range [16384, 24576[. - $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b 8192 0 8192" \ - -c "pwrite -S 0x02 -b 8192 8192 8192" \ - -c "pwrite -S 0x04 -b 8192 24576 8192" \ - -c "pwrite -S 0x05 -b 8192 32768 8192" \ - $SCRATCH_MNT/foo | _filter_xfs_io - - # Clone destination file, 1 extent of 96kb. - $XFS_IO_PROG -f -c "pwrite -S 0xff -b 98304 0 98304" -c "fsync" \ - $SCRATCH_MNT/bar | _filter_xfs_io - - # Clone second half of the 2nd extent, the 8kb hole, the 3rd extent + BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + + EXTENT_SIZE=$((2 * $BLOCK_SIZE)) + + # Create a file with 4 extents and 1 hole, all with a size of + # 2 blocks each. + # The hole is in the block range [4, 5]. + $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b $EXTENT_SIZE 0 $EXTENT_SIZE" \ + -c "pwrite -S 0x02 -b $EXTENT_SIZE $((2 * $BLOCK_SIZE)) $EXTENT_SIZE" \ + -c "pwrite -S 0x04 -b $EXTENT_SIZE $((6 * $BLOCK_SIZE)) $EXTENT_SIZE" \ + -c "pwrite -S 0x05 -b $EXTENT_SIZE $((8 * $BLOCK_SIZE)) $EXTENT_SIZE" \ + $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + + # Clone destination file, 1 extent of 24 blocks. + $XFS_IO_PROG -f -c "pwrite -S 0xff -b $((24 * $BLOCK_SIZE)) 0 $((24 * $BLOCK_SIZE))" \ +-c "fsync" $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified + + # Clone second half of the 2nd extent, the 2 block hole, the 3rd extent # and the first half of the 4th extent into file bar. 
- $CLONER_PROG -s 12288 -d 0 -l 24576 $SCRATCH_MNT/foo $SCRATCH_MNT/bar + $CLONER_PROG -s $((3 * $BLOCK_SIZE)) -d 0 -l $((6 * $BLOCK_SIZE)) \ +$SCRATCH_MNT/foo $SCRATCH_MNT/bar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar # Test small files too consisting of 1 inline extent - $XFS_IO_PROG -f -c "pwrite -S 0x00 -b 3500 0 3500" -c "fsync" \ - $SCRATCH_MNT/foo2 | _filter_xfs_io + EXTENT_SIZE=$(($BLOCK_SIZE - 48)) + $XFS_IO_PROG -f -c "pwrite -S 0x00 -b $EXTENT_SIZE 0 $EXTENT_SIZE" -c "fsync" \ + $SCRATCH_MNT/foo2 | _filter_xfs_io_blocks_modified - $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 1000 0 1000" -c "fsync" \ - $SCRATCH_MNT/bar2 | _filter_xfs_io + EXTENT_SIZE=$(($BLOCK_SIZE - 1048)) + $XFS_IO_PROG -f -c "pwrite -S 0xcc -b $EXTENT_SIZE 0 $EXTENT_SIZE" -c "fsync" \ + $SCRATCH_MNT/bar2 | _filter_xfs_io_blocks_modified # Clone the entire foo2 file into bar2, overwriting all data in bar2 # and increasing its size. - $CLONER_PROG -s 0 -d 0 -l 3500 $SCRATCH_MNT/foo2 $SCRATCH_MNT/bar2 + EXTENT_SIZE=$(($BLOCK_SIZE - 48)) + $CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo2 $SCRATCH_MNT/bar2 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar2 _flakey_drop_and_remount yes @@ -102,10 +111,10 @@ test_btrfs_clone_fsync_log_recover() # Verify the cloned range was persisted by fsync and the log recovery # code did its work well. 
echo "Verifying file bar content" - od -t x1 $SCRATCH_MNT/bar + od -t x1 $SCRATCH_MNT/bar | _filter_od echo "Verifying file bar2 content" - od -t x1 $SCRATCH_MNT/bar2 + od -t x1 $SCRATCH_MNT/bar2 | _filter_od _unmount_flakey diff --git a/tests/btrfs/056.out b/tests/btrfs/056.out index 1b77ae3..c4c6b2c 100644 --- a/tests/btrfs/056.out +++ b/tests/btrfs/056.out @@ -1,129 +1,97 @@ QA output created by 056 Testing without the NO_HOLES feature -wrote 8192/8192 bytes at offset 0 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 8192/8192 bytes at offset 8192 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 8192/8192 bytes at offset 24576 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 8192/8192 bytes at offset 32768 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 98304/98304 bytes at offset 0 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 3500/3500 bytes at offset 0 -XXX Bytes, X ops;
[PATCH 09/12] Fix btrfs/097 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/097 | 42 +- tests/btrfs/097.out | 7 +-- 2 files changed, 26 insertions(+), 23 deletions(-) diff --git a/tests/btrfs/097 b/tests/btrfs/097 index d9138ea..915ff9d 100755 --- a/tests/btrfs/097 +++ b/tests/btrfs/097 @@ -57,22 +57,29 @@ mkdir $send_files_dir _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount -# Create our test file with a single extent of 64K starting at file offset 128K. -$XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo | _filter_xfs_io +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + +# Create our test file with a single extent of 16 blocks starting at a file +# offset mapped by 32nd block. +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((32 * $BLOCK_SIZE)) $((16 * $BLOCK_SIZE))" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 # Now clone parts of the original extent into lower offsets of the file. # # The first clone operation adds a file extent item to file offset 0 that points -# to our initial extent with a data offset of 16K. The corresponding data back -# reference in the extent tree has an offset of 18446744073709535232, which is -# the result of file_offset - data_offset = 0 - 16K. -# -# The second clone operation adds a file extent item to file offset 16K that -# points to our initial extent with a data offset of 48K. The corresponding data -# back reference in the extent tree has an offset of 18446744073709518848, which -# is the result of file_offset - data_offset = 16K - 48K. +# to our initial extent with a data offset of 4 blocks. The corresponding data back +# reference in the extent tree has a large value for the 'offset' field, which is +# the result of file_offset - data_offset = 0 - (file offset of 4th block). 
For +# example in case of 4k block size, it will be 0 - 16k = 18446744073709535232. + +# The second clone operation adds a file extent item to file offset mapped by +# 4th block that points to our initial extent with a data offset of 12 +# blocks. The corresponding data back reference in the extent tree has a large +# value for the 'offset' field, which is the result of file_offset - data_offset +# = (file offset of 4th block) - (file offset of 12th block). For example in +# case of 4k block size, it will be 16K - 48K = 18446744073709518848. # # Those large back reference offsets (result of unsigned arithmetic underflow) # confused the back reference walking code (used by an incremental send and @@ -83,10 +90,10 @@ _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 # "BTRFS error (device sdc): did not find backref in send_root. inode=257, \ # offset=0, disk_byte=12845056 found extent=12845056" # -$CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \ - $SCRATCH_MNT/foo $SCRATCH_MNT/foo -$CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) -l $((16 * 1024)) \ +$CLONER_PROG -s $(((32 + 4) * $BLOCK_SIZE)) -d 0 -l $((4 * $BLOCK_SIZE)) \ $SCRATCH_MNT/foo $SCRATCH_MNT/foo +$CLONER_PROG -s $(((32 + 12) * $BLOCK_SIZE)) -d $((4 * $BLOCK_SIZE)) \ +-l $((4 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 @@ -94,8 +101,7 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ -f $send_files_dir/2.snap -echo "File digest in the original filesystem:" -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch +orig_hash=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ') # Now recreate the filesystem by receiving both send streams and verify we get # the same file contents that the original filesystem had. 
@@ -106,8 +112,10 @@ _scratch_mount _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap -echo "File digest in the new filesystem:" -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch +hash=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ') +if [ $orig_hash != $hash ]; then + echo "Btrfs send/receive failed: Mismatching hash values detected." +fi status=0 exit diff --git a/tests/btrfs/097.out b/tests/btrfs/097.out index 5e87eb2..c3a19c1 100644 --- a/tests/btrfs/097.out +++ b/tests/btrfs/097.out @@ -1,7 +1,2 @@ QA output created by 097 -wrote 65536/65536 bytes at offset 131072 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -File digest in the original filesystem: -6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo -File digest in the new filesystem: -6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo +Blocks
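The large backref offsets quoted in btrfs/097's comments are 64-bit unsigned underflows of file_offset - data_offset. Reducing the subtraction modulo 2^64, the way a C u64 wraps, reproduces the exact numbers from the test for the 4k-block case (the helper name here is mine; the values are from the test's own comments):

```python
# The backref 'offset' field is a u64, so file_offset - data_offset
# wraps modulo 2^64 when data_offset is larger. Check the two values
# quoted in btrfs/097's comments for the 4k block size case.
U64 = 2 ** 64

def backref_offset(file_offset: int, data_offset: int) -> int:
    return (file_offset - data_offset) % U64    # wraps like a u64 in C

# First clone: file offset 0, data offset 16K.
print(backref_offset(0, 16 * 1024))             # -> 18446744073709535232
# Second clone: file offset 16K, data offset 48K.
print(backref_offset(16 * 1024, 48 * 1024))     # -> 18446744073709518848
```

These wrapped values are what confused the backref-walking code used by incremental send, hence the "did not find backref in send_root" error the test guards against.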
[PATCH 07/12] Fix btrfs/095 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/095 | 113 +--- tests/btrfs/095.out | 10 + 2 files changed, 66 insertions(+), 57 deletions(-) diff --git a/tests/btrfs/095 b/tests/btrfs/095 index 1b4ba90..e73b14e 100755 --- a/tests/btrfs/095 +++ b/tests/btrfs/095 @@ -63,85 +63,100 @@ _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey -# Create prealloc extent covering range [160K, 620K[ -$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) -# Now write to the last 80K of the prealloc extent plus 40K to the unallocated -# space that immediately follows it. This creates a new extent of 40K that spans -# the range [620K, 660K[. -$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io +# Create prealloc extent covering file block range [40, 155[ +$XFS_IO_PROG -f -c "falloc $((40 * $BLOCK_SIZE)) $((115 * $BLOCK_SIZE))" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + +# Now write to the last 20 blocks of the prealloc extent plus 10 blocks to the +# unallocated space that immediately follows it. This creates a new extent of 10 +# blocks that spans the block range [155, 165[. +$XFS_IO_PROG -c "pwrite -S 0xaa $((135 * $BLOCK_SIZE)) $((30 * $BLOCK_SIZE))" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified # At this point, there are now 2 back references to the prealloc extent in our -# extent tree. Both are for our file offset 160K and one relates to a file -# extent item with a data offset of 0 and a length of 380K, while the other -# relates to a file extent item with a data offset of 380K and a length of 80K. +# extent tree. 
Both are for our file offset mapped by the 40th block of the file +# and one relates to a file extent item with a data offset of 0 and a length of +# 95 blocks, while the other relates to a file extent item with a data offset of +# 95 blocks and a length of 20 blocks. # Make sure everything done so far is durably persisted (all back references are # in the extent tree, etc). sync -# Now clone all extents of our file that cover the offset 160K up to its eof -# (660K at this point) into itself at offset 2M. This leaves a hole in the file -# covering the range [660K, 2M[. The prealloc extent will now be referenced by -# the file twice, once for offset 160K and once for offset 2M. The 40K extent -# that follows the prealloc extent will also be referenced twice by our file, -# once for offset 620K and once for offset 2M + 460K. -$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \ - $SCRATCH_MNT/foo - -# Now create one new extent in our file with a size of 100Kb. It will span the -# range [3M, 3M + 100K[. It also will cause creation of a hole spanning the -# range [2M + 460K, 3M[. Our new file size is 3M + 100K. -$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io +# Now clone all extents of our file that cover the file range spanned by 40th +# block up to its eof (165th block at this point) into itself at 512th +# block. This leaves a hole in the file covering the block range [165, 512[. The +# prealloc extent will now be referenced by the file twice, once for offset +# mapped by the 40th block and once for offset mapped by 512th block. The 10 +# blocks extent that follows the prealloc extent will also be referenced twice +# by our file, once for offset mapped by the 155th block and once for offset +# (512 block + 115 blocks) +$CLONER_PROG -s $((40 * $BLOCK_SIZE)) -d $((512 * $BLOCK_SIZE)) -l 0 \ +$SCRATCH_MNT/foo $SCRATCH_MNT/foo + +# Now create one new extent in our file with a size of 25 blocks. 
It will span +# the block range [768, 768 + 25[. It also will cause creation of a hole +# spanning the block range [512 + 115, 768[. Our new file size is the file +# offset mapped by (768 + 25)th block. +$XFS_IO_PROG -c "pwrite -S 0xbb $((768 * $BLOCK_SIZE)) $((25 * $BLOCK_SIZE))" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified # At this point, there are now (in memory) 4 back references to the prealloc # extent. # -# Two of them are for file offset 160K, related to file extent items -# matching the file offsets 160K and 540K respectively, with data offsets of -# 0 and 380K respectively, and with lengths of 380K and 80K respectively. +# Two of them are for file offset mapped by the 40th block, related to file +# extent items matching the file offsets mapped by 40th and 135th block +# respectively, with data offsets of 0 and 95 blocks respectively, and with +# lengths of 95 and 20 blocks respectively. # -# The other two references are for file offset 2M, related to file extent items -# matching the file offsets 2M and 2M + 380K respectively, with
[PATCH 08/12] Fix btrfs/096 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/096 | 45 + tests/btrfs/096.out | 15 +-- 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/tests/btrfs/096 b/tests/btrfs/096 index f5b3a7f..896a209 100755 --- a/tests/btrfs/096 +++ b/tests/btrfs/096 @@ -51,30 +51,35 @@ rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount -# Create our test files. File foo has the same 2K of data at offset 4K as file -# bar has at its offset 0. -$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \ - -c "pwrite -S 0xbb 4k 2K" \ - -c "pwrite -S 0xcc 8K 4K" \ - $SCRATCH_MNT/foo | _filter_xfs_io +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) -# File bar consists of a single inline extent (2K size). -$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \ - $SCRATCH_MNT/bar | _filter_xfs_io +# Create our test files. File foo has the same 2k of data at offset $BLOCK_SIZE +# as file bar has at its offset 0. +$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 $BLOCK_SIZE" \ + -c "pwrite -S 0xbb $BLOCK_SIZE 2k" \ + -c "pwrite -S 0xcc $(($BLOCK_SIZE * 2)) $BLOCK_SIZE" \ + $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified -# Now call the clone ioctl to clone the extent of file bar into file foo at its -# offset 4K. This made file foo have an inline extent at offset 4K, something -# which the btrfs code can not deal with in future IO operations because all -# inline extents are supposed to start at an offset of 0, resulting in all sorts -# of chaos. -# So here we validate that the clone ioctl returns an EOPNOTSUPP, which is what -# it returns for other cases dealing with inlined extents. -$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \ +# File bar consists of a single inline extent (2k in size). 
+$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2k" \ + $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified + +# Now call the clone ioctl to clone the extent of file bar into file +# foo at its $BLOCK_SIZE offset. This made file foo have an inline +# extent at offset $BLOCK_SIZE, something which the btrfs code can not +# deal with in future IO operations because all inline extents are +# supposed to start at an offset of 0, resulting in all sorts of +# chaos. +# So here we validate that the clone ioctl returns an EOPNOTSUPP, +# which is what it returns for other cases dealing with inlined +# extents. +$CLONER_PROG -s 0 -d $BLOCK_SIZE -l 2048 \ $SCRATCH_MNT/bar $SCRATCH_MNT/foo -# Because of the inline extent at offset 4K, the following write made the kernel -# crash with a BUG_ON(). -$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io +# Because of the inline extent at offset $BLOCK_SIZE, the following +# write made the kernel crash with a BUG_ON(). +$XFS_IO_PROG -c "pwrite -S 0xdd $(($BLOCK_SIZE + 2048)) 2k" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified status=0 exit diff --git a/tests/btrfs/096.out b/tests/btrfs/096.out index 235198d..2a4251e 100644 --- a/tests/btrfs/096.out +++ b/tests/btrfs/096.out @@ -1,12 +1,7 @@ QA output created by 096 -wrote 4096/4096 bytes at offset 0 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 2048/2048 bytes at offset 4096 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 4096/4096 bytes at offset 8192 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 2048/2048 bytes at offset 0 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +Blocks modified: [0 - 0] +Blocks modified: [1 - 1] +Blocks modified: [2 - 2] +Blocks modified: [0 - 0] clone failed: Operation not supported -wrote 2048/2048 bytes at offset 6144 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +Blocks modified: [1 - 1] -- 2.1.0 -- To unsubscribe from this list: send the line 
"unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/12] Fix btrfs/055 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified and _filter_od filtering functions to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/055 | 128 ++ tests/btrfs/055.out | 378 +--- 2 files changed, 259 insertions(+), 247 deletions(-) diff --git a/tests/btrfs/055 b/tests/btrfs/055 index c0dd9ed..1f50850 100755 --- a/tests/btrfs/055 +++ b/tests/btrfs/055 @@ -60,88 +60,110 @@ test_btrfs_clone_with_holes() _scratch_mkfs "$1" >/dev/null 2>&1 _scratch_mount - # Create a file with 4 extents and 1 hole, all with a size of 8Kb each. - # The hole is in the range [16384, 24576[. - $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b 8192 0 8192" \ - -c "pwrite -S 0x02 -b 8192 8192 8192" \ - -c "pwrite -S 0x04 -b 8192 24576 8192" \ - -c "pwrite -S 0x05 -b 8192 32768 8192" \ - $SCRATCH_MNT/foo | _filter_xfs_io + BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) - # Clone destination file, 1 extent of 96kb. - $XFS_IO_PROG -s -f -c "pwrite -S 0xff -b 98304 0 98304" \ - $SCRATCH_MNT/bar | _filter_xfs_io + EXTENT_SIZE=$((2 * $BLOCK_SIZE)) - # Clone 2nd extent, 8Kb hole and 3rd extent of foo into bar. - $CLONER_PROG -s 8192 -d 0 -l 24576 $SCRATCH_MNT/foo $SCRATCH_MNT/bar + OFFSET=0 + + # Create a file with 4 extents and 1 hole, all with 2 blocks each. + # The hole is in the block range [4, 5[. 
+ $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + + OFFSET=$(($OFFSET + $EXTENT_SIZE)) + $XFS_IO_PROG -s -f -c "pwrite -S 0x02 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + + OFFSET=$(($OFFSET + 2 * $EXTENT_SIZE)) + $XFS_IO_PROG -s -f -c "pwrite -S 0x04 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + + OFFSET=$(($OFFSET + $EXTENT_SIZE)) + $XFS_IO_PROG -s -f -c "pwrite -S 0x05 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \ +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + + # Clone destination file, 1 extent of 24 blocks. + EXTENT_SIZE=$((24 * $BLOCK_SIZE)) + $XFS_IO_PROG -s -f -c "pwrite -S 0xff -b $EXTENT_SIZE 0 $EXTENT_SIZE" \ + $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified + + # Clone 2nd extent, 2-blocks sized hole and 3rd extent of foo into bar. + $CLONER_PROG -s $((2 * $BLOCK_SIZE)) -d 0 -l $((6 * $BLOCK_SIZE)) \ +$SCRATCH_MNT/foo $SCRATCH_MNT/bar # Verify both extents and the hole were cloned. echo "1) Check both extents and the hole were cloned" - od -t x1 $SCRATCH_MNT/bar + od -t x1 $SCRATCH_MNT/bar | _filter_od - # Cloning range starts at the middle of an hole. - $CLONER_PROG -s 20480 -d 32768 -l 12288 $SCRATCH_MNT/foo \ - $SCRATCH_MNT/bar + # Cloning range starts at the middle of a hole. + $CLONER_PROG -s $((5 * $BLOCK_SIZE)) -d $((8 * $BLOCK_SIZE)) \ +-l $((3 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/bar - # Verify that half of the hole and the following 8Kb extent were cloned. - echo "2) Check half hole and one 8Kb extent were cloned" - od -t x1 $SCRATCH_MNT/bar + # Verify that half of the hole and the following 2 block extent were cloned. + echo "2) Check half hole and the following 2 block extent were cloned" + od -t x1 $SCRATCH_MNT/bar | _filter_od - # Cloning range ends at the middle of an hole. 
- $CLONER_PROG -s 0 -d 65536 -l 20480 $SCRATCH_MNT/foo $SCRATCH_MNT/bar + # Cloning range ends at the middle of a hole. + $CLONER_PROG -s 0 -d $((16 * $BLOCK_SIZE)) -l $((5 * $BLOCK_SIZE)) \ +$SCRATCH_MNT/foo $SCRATCH_MNT/bar - # Verify that 2 extents of 8kb and a 4kb hole were cloned. - echo "3) Check that 2 extents of 8kb eacg and a 4kb hole were cloned" - od -t x1 $SCRATCH_MNT/bar + # Verify that 2 extents of 2 blocks size and a 1-block hole were cloned. + echo "3) Check that 2 extents of 2 blocks each and a hole of 1 block were cloned" + od -t x1 $SCRATCH_MNT/bar | _filter_od - # Create a 24Kb hole at the end of the source file (foo). - $XFS_IO_PROG -c "truncate 65536" $SCRATCH_MNT/foo + # Create a 6-block hole at the end of the source file (foo). + $XFS_IO_PROG -c "truncate $((16 * $BLOCK_SIZE))" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified sync # Now clone a range that overlaps that hole at the end of the foo file. - # It should
[PATCH 11/12] Fix btrfs/103 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/103 | 47 +-- tests/btrfs/103.out | 48 2 files changed, 41 insertions(+), 54 deletions(-) diff --git a/tests/btrfs/103 b/tests/btrfs/103 index 3020c86..a807900 100755 --- a/tests/btrfs/103 +++ b/tests/btrfs/103 @@ -56,31 +56,34 @@ test_clone_and_read_compressed_extent() _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount $mount_opts + BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + # Create a test file with a single extent that is compressed (the # data we write into it is highly compressible no matter which # compression algorithm is used, zlib or lzo). - $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"\ - -c "pwrite -S 0xbb 4K 8K"\ - -c "pwrite -S 0xcc 12K 4K" \ - $SCRATCH_MNT/foo | _filter_xfs_io + $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K $((1 * $BLOCK_SIZE))" \ + -c "pwrite -S 0xbb $((1 * $BLOCK_SIZE)) $((2 * $BLOCK_SIZE))" \ + -c "pwrite -S 0xcc $((3 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \ + $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + # Now clone our extent into an adjacent offset. - $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \ - $SCRATCH_MNT/foo $SCRATCH_MNT/foo + $CLONER_PROG -s $((1 * $BLOCK_SIZE)) -d $((4 * $BLOCK_SIZE)) \ +-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo # Same as before but for this file we clone the extent into a lower # file offset. 
- $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \ - -c "pwrite -S 0xbb 12K 8K"\ - -c "pwrite -S 0xcc 20K 4K"\ - $SCRATCH_MNT/bar | _filter_xfs_io + $XFS_IO_PROG -f \ + -c "pwrite -S 0xaa $((2 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \ + -c "pwrite -S 0xbb $((3 * $BLOCK_SIZE)) $((2 * $BLOCK_SIZE))" \ + -c "pwrite -S 0xcc $((5 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \ + $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified - $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \ + $CLONER_PROG -s $((3 * $BLOCK_SIZE)) -d 0 -l $((2 * $BLOCK_SIZE)) \ $SCRATCH_MNT/bar $SCRATCH_MNT/bar - echo "File digests before unmounting filesystem:" - md5sum $SCRATCH_MNT/foo | _filter_scratch - md5sum $SCRATCH_MNT/bar | _filter_scratch + foo_orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ') + bar_orig_hash=$(md5sum $SCRATCH_MNT/bar | cut -f 1 -d ' ') # Evicting the inode or clearing the page cache before reading again # the file would also trigger the bug - reads were returning all bytes @@ -91,10 +94,18 @@ test_clone_and_read_compressed_extent() # ranges that point to the same compressed extent. _scratch_remount - echo "File digests after mounting filesystem again:" - # Must match the same digests we got before. - md5sum $SCRATCH_MNT/foo | _filter_scratch - md5sum $SCRATCH_MNT/bar | _filter_scratch + foo_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ') + bar_hash=$(md5sum $SCRATCH_MNT/bar | cut -f 1 -d ' ') + + if [ $foo_orig_hash != $foo_hash ]; then + echo "Read operation failed on $SCRATCH_MNT/foo: "\ +"Mimatching hash values detected." + fi + + if [ $bar_orig_hash != $bar_hash ]; then + echo "Read operation failed on $SCRATCH_MNT/bar: "\ +"Mimatching hash values detected." + fi } echo -e "\nTesting with zlib compression..." diff --git a/tests/btrfs/103.out b/tests/btrfs/103.out index f62de2f..964b70f 100644 --- a/tests/btrfs/103.out +++ b/tests/btrfs/103.out @@ -1,41 +1,17 @@ QA output created by 103 Testing with zlib compression... 
-wrote 4096/4096 bytes at offset 0 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 8192/8192 bytes at offset 4096 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 4096/4096 bytes at offset 12288 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 4096/4096 bytes at offset 8192 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 8192/8192 bytes at offset 12288 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -wrote 4096/4096 bytes at offset 20480 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -File digests before unmounting filesystem: -4b985a45790261a706c3ddbf22c5f765 SCRATCH_MNT/foo -fd331e6b7a9ab105f48f71b53162d5b5 SCRATCH_MNT/bar -File digests after
[PATCH 00/12] Fix Btrfs tests to work on non-4k block sized fs instances
This patchset fixes Btrfs tests to work with variable block sizes. It is based off the RFC patch sent during March of this year (https://www.marc.info/?l=linux-btrfs=142736088310300=2). Currently, some of the tests are written with the assumption that 4k is the block size of the filesystem instance. On architectures (e.g. ppc64) with a larger page size (and hence a larger block size), these tests fail because the block boundaries assumed by the tests no longer hold, and hence btrfs_ioctl_clone() (which requires a block-aligned file offset range) returns -EINVAL. To fix the issue, this patchset adds three new filter functions:

1. _filter_xfs_io_blocks_modified
2. _filter_xfs_io_pages_modified
3. _filter_od

P.S.: Since the changes made are trivial, I could have clubbed all the patches into two: a first patch introducing the new filtering functions and a second patch containing the changes made to the tests. If this approach sounds right, I can post a V2 with the two patches containing the relevant changes. 
Chandan Rajendra (12): Filter xfs_io and od's output in units of FS block size and the CPU's page size Fix btrfs/017 to work on non-4k block sized filesystems Fix btrfs/052 to work on non-4k block sized filesystems Fix btrfs/055 to work on non-4k block sized filesystems Fix btrfs/056 to work on non-4k block sized filesystems Fix btrfs/094 to work on non-4k block sized filesystems Fix btrfs/095 to work on non-4k block sized filesystems Fix btrfs/096 to work on non-4k block sized filesystems Fix btrfs/097 to work on non-4k block sized filesystems Fix btrfs/098 to work on non-4k block sized filesystems Fix btrfs/103 to work on non-4k block sized filesystems Fix btrfs/106 to work on non-4k block sized filesystems common/filter | 52 + common/rc | 5 + tests/btrfs/017 | 16 +- tests/btrfs/017.out | 3 +- tests/btrfs/052 | 127 +++- tests/btrfs/052.out | 546 +++- tests/btrfs/055 | 128 +++- tests/btrfs/055.out | 378 ++-- tests/btrfs/056 | 51 +++-- tests/btrfs/056.out | 152 ++- tests/btrfs/094 | 78 +--- tests/btrfs/094.out | 17 +- tests/btrfs/095 | 113 ++- tests/btrfs/095.out | 10 +- tests/btrfs/096 | 45 +++-- tests/btrfs/096.out | 15 +- tests/btrfs/097 | 42 ++-- tests/btrfs/097.out | 7 +- tests/btrfs/098 | 65 --- tests/btrfs/098.out | 7 +- tests/btrfs/103 | 47 +++-- tests/btrfs/103.out | 48 ++--- tests/btrfs/106 | 42 ++-- tests/btrfs/106.out | 14 +- 24 files changed, 1018 insertions(+), 990 deletions(-) -- 2.1.0
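The -EINVAL failure mode described in the cover letter comes from btrfs_ioctl_clone() rejecting unaligned ranges. The predicate below is an editorial illustration of that constraint, not the kernel's actual check:

```shell
# btrfs_ioctl_clone() requires the source offset, destination offset and
# length to all be multiples of the filesystem block size; this predicate
# models that requirement.
range_is_clone_aligned() {
    local block_size=$1 src=$2 dst=$3 len=$4
    [ $(( src % block_size )) -eq 0 ] &&
    [ $(( dst % block_size )) -eq 0 ] &&
    [ $(( len % block_size )) -eq 0 ]
}

# Byte ranges built from 4k multiples are aligned on a 4k filesystem...
range_is_clone_aligned 4096 8192 0 24576 && echo aligned
# ...but not on a 64k-block (ppc64) filesystem, which is why the
# unconverted tests see -EINVAL there.
range_is_clone_aligned 65536 8192 0 24576 || echo misaligned
```

Expressing every offset and length as `N * $BLOCK_SIZE`, as the series does, keeps the predicate true on any block size.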
[PATCH 01/12] Filter xfs_io and od's output in units of FS block size and the CPU's page size
The helpers will be used to make btrfs tests that assume 4k as the block size to work on non-4k blocksized filesystem instances as well. Signed-off-by: Chandan Rajendra--- common/filter | 52 common/rc | 5 + 2 files changed, 57 insertions(+) diff --git a/common/filter b/common/filter index af456c9..faa6f82 100644 --- a/common/filter +++ b/common/filter @@ -229,6 +229,45 @@ _filter_xfs_io_unique() common_line_filter | _filter_xfs_io } +_filter_xfs_io_units_modified() +{ + UNIT=$1 + UNIT_SIZE=$2 + + $AWK_PROG -v unit="$UNIT" -v unit_size=$UNIT_SIZE ' + /wrote/ { + split($2, bytes, "/") + + bytes_written = strtonum(bytes[1]) + + offset = strtonum($NF) + + unit_start = offset / unit_size + unit_start = int(unit_start) + unit_end = (offset + bytes_written - 1) / unit_size + unit_end = int(unit_end) + + printf("%ss modified: [%d - %d]\n", unit, unit_start, unit_end) + + next + } + ' +} + +_filter_xfs_io_blocks_modified() +{ + BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + + _filter_xfs_io_units_modified "Block" $BLOCK_SIZE +} + +_filter_xfs_io_pages_modified() +{ + PAGE_SIZE=$(get_page_size) + + _filter_xfs_io_units_modified "Page" $PAGE_SIZE +} + _filter_test_dir() { sed -e "s,$TEST_DEV,TEST_DEV,g" -e "s,$TEST_DIR,TEST_DIR,g" @@ -323,5 +362,18 @@ _filter_ro_mount() { -e "s/mount: cannot mount block device/mount: cannot mount/g" } +_filter_od() +{ + BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + $AWK_PROG -v block_size=$BLOCK_SIZE ' + /^[0-9]+/ { + offset = strtonum("0"$1); + $1 = sprintf("%o", offset / block_size); + print $0; + } + /\*/ + ' +} + # make sure this script returns success /bin/true diff --git a/common/rc b/common/rc index 4c2f42c..acda6cb 100644 --- a/common/rc +++ b/common/rc @@ -3151,6 +3151,11 @@ get_block_size() echo `stat -f -c %S $1` } +get_page_size() +{ + echo $(getconf PAGE_SIZE) +} + init_rc -- 2.1.0
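To illustrate the filter introduced in this patch, here is a standalone reduction of the same awk logic. It uses `+0` numeric coercion in place of gawk's strtonum, and takes a fixed unit size argument instead of calling get_block_size, so it runs without a scratch mount:

```shell
# Reduced form of _filter_xfs_io_units_modified: turn xfs_io's
# "wrote X/Y bytes at offset Z" lines into inclusive unit ranges.
units_modified() {
    local unit=$1 unit_size=$2
    awk -v unit="$unit" -v unit_size="$unit_size" '
    /wrote/ {
        split($2, bytes, "/")
        bytes_written = bytes[1] + 0        # bytes actually written
        offset = $NF + 0                    # starting byte offset
        unit_start = int(offset / unit_size)
        unit_end = int((offset + bytes_written - 1) / unit_size)
        printf("%ss modified: [%d - %d]\n", unit, unit_start, unit_end)
    }'
}

# An 8k write at offset 0 touches blocks 0 and 1 on a 4k filesystem:
echo "wrote 8192/8192 bytes at offset 0" | units_modified Block 4096
# The same write touches only block 0 when the block size is 64k:
echo "wrote 8192/8192 bytes at offset 0" | units_modified Block 65536
```

This is why the .out files in the series can stay identical across architectures: the expected output is stated in blocks, and the filter collapses whatever byte offsets xfs_io reports into those block ranges.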
[PATCH 03/12] Fix btrfs/052 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/052 | 127 +++- tests/btrfs/052.out | 546 +++- 2 files changed, 323 insertions(+), 350 deletions(-) diff --git a/tests/btrfs/052 b/tests/btrfs/052 index c75193d..55c8332 100755 --- a/tests/btrfs/052 +++ b/tests/btrfs/052 @@ -59,78 +59,105 @@ test_btrfs_clone_same_file() _scratch_mkfs >/dev/null 2>&1 _scratch_mount $MOUNT_OPTIONS - # Create a file with 5 extents, 4 of 8Kb each and 1 of 64Kb. - $XFS_IO_PROG -f -c "pwrite -S 0x01 -b 8192 0 8192" $SCRATCH_MNT/foo \ - | _filter_xfs_io + BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + + EXTENT_SIZE=$((2 * $BLOCK_SIZE)) + + # Create a file with 5 extents, 4 of 2 blocks each and 1 of 16 blocks. + OFFSET=0 + $XFS_IO_PROG -f -c "pwrite -S 0x01 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified sync - $XFS_IO_PROG -c "pwrite -S 0x02 -b 8192 8192 8192" $SCRATCH_MNT/foo \ - | _filter_xfs_io + + OFFSET=$(($OFFSET + $EXTENT_SIZE)) + $XFS_IO_PROG -c "pwrite -S 0x02 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified sync - $XFS_IO_PROG -c "pwrite -S 0x03 -b 8192 16384 8192" $SCRATCH_MNT/foo \ - | _filter_xfs_io + + OFFSET=$(($OFFSET + $EXTENT_SIZE)) + $XFS_IO_PROG -c "pwrite -S 0x03 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified sync - $XFS_IO_PROG -c "pwrite -S 0x04 -b 8192 24576 8192" $SCRATCH_MNT/foo \ - | _filter_xfs_io + + OFFSET=$(($OFFSET + $EXTENT_SIZE)) + $XFS_IO_PROG -c "pwrite -S 0x04 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified sync - $XFS_IO_PROG -c "pwrite -S 0x05 -b 65536 32768 65536" $SCRATCH_MNT/foo \ - | _filter_xfs_io + + OFFSET=$(($OFFSET + $EXTENT_SIZE)) + EXTENT_SIZE=$((16 * $BLOCK_SIZE)) + $XFS_IO_PROG -c "pwrite -S 0x05 -b $EXTENT_SIZE $OFFSET 
$EXTENT_SIZE" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified sync # Digest of initial content. - md5sum $SCRATCH_MNT/foo | _filter_scratch + orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ') # Same source and target ranges - must fail. - $CLONER_PROG -s 8192 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo + $CLONER_PROG -s $((2 * $BLOCK_SIZE)) -d $((2 * $BLOCK_SIZE)) \ +-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo # Check file content didn't change. - md5sum $SCRATCH_MNT/foo | _filter_scratch + hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ') + if [ $orig_hash != $hash ]; then + echo "Cloning same source and target ranges:"\ +"Mimatching hash values detected." + fi # Intersection between source and target ranges - must fail too. - $CLONER_PROG -s 4096 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo + # $CLONER_PROG -s 4096 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo + $CLONER_PROG -s $((1 * $BLOCK_SIZE)) -d $((2 * $BLOCK_SIZE)) \ +-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo # Check file content didn't change. - md5sum $SCRATCH_MNT/foo | _filter_scratch + hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ') + if [ $orig_hash != $hash ]; then + echo "Cloning intersection between source and target ranges:"\ +"Mismatching hash values detected." + fi # Clone an entire extent from a higher range to a lower range. - $CLONER_PROG -s 24576 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo - - # Check entire file, the 8Kb block at offset 0 now has the same content - # as the 8Kb block at offset 24576. - od -t x1 $SCRATCH_MNT/foo + $CLONER_PROG -s $((6 * $BLOCK_SIZE)) -d 0 -l $((2 * $BLOCK_SIZE)) \ +$SCRATCH_MNT/foo $SCRATCH_MNT/foo + # Check entire file, 0th and 1st blocks now have the same content + # as the 6th and 7th blocks. + od -t x1 $SCRATCH_MNT/foo | _filter_od # Clone an entire extent from a lower range to a higher range. 
- $CLONER_PROG -s 8192 -d 16384 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo - - # Check entire file, the 8Kb block at offset 0 now has the same content - # as the 8Kb block at offset 24576, and the 8Kb block at offset 16384 - # now has the same content as the 8Kb block at offset 8192. - od -t x1 $SCRATCH_MNT/foo - - #
[PATCH 02/12] Fix btrfs/017 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/017 | 16 tests/btrfs/017.out | 3 +-- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/tests/btrfs/017 b/tests/btrfs/017 index f8855e3..34c5f0a 100755 --- a/tests/btrfs/017 +++ b/tests/btrfs/017 @@ -63,13 +63,21 @@ rm -f $seqres.full _scratch_mkfs "--nodesize 65536" >>$seqres.full 2>&1 _scratch_mount -$XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) +EXTENT_SIZE=$((2 * $BLOCK_SIZE)) + +$XFS_IO_PROG -f -d -c "pwrite 0 $EXTENT_SIZE" $SCRATCH_MNT/foo \ + | _filter_xfs_io_blocks_modified _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap -$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink -$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink -$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink2 +$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink + +$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo \ +$SCRATCH_MNT/snap/foo-reflink + +$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo \ +$SCRATCH_MNT/snap/foo-reflink2 _run_btrfs_util_prog quota enable $SCRATCH_MNT _run_btrfs_util_prog quota rescan -w $SCRATCH_MNT diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out index f940f3a..503eb88 100644 --- a/tests/btrfs/017.out +++ b/tests/btrfs/017.out @@ -1,5 +1,4 @@ QA output created by 017 -wrote 8192/8192 bytes at offset 0 -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +Blocks modified: [0 - 1] 65536 65536 65536 65536 -- 2.1.0
Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems
On Wednesday 25 Nov 2015 11:51:52 Filipe Manana wrote: > On Wed, Nov 25, 2015 at 11:47 AM, Chandan Rajendra > >wrote: > > On Wednesday 25 Nov 2015 11:11:27 Filipe Manana wrote: > >> Hi Chandan, > >> > >> I can't agree with this change. We're no longer checking that file > >> data is correct after the cloning operations. The md5sum checks were > >> exactly for that. So essentially the test is only verifying the clone > >> operations don't fail with errors, it no longer checks for data > >> corruption... > >> > >> Same comment applies to at least a few other patches in the series. > > > > Hello Filipe, > > > > All the tests where we had md5sum being echoed into output have been > > replaced with code to verify the md5sum values as shown below, > > > > if [ $foo_orig_hash != $foo_hash ]; then > > > > echo "Read operation failed on $SCRATCH_MNT/foo: "\ > > > > "Mismatching hash values detected." > > > > fi > > > > This will cause a diff between the test's ideal output versus the output > > obtained during the test run. > > Right, it compares the digests before and after some operation (which > should always match). However we no longer validate that the file > content is correct before the operation. For some of the tests that is > more important, like the ones that test read corruption after cloning > compressed extents. Filipe, you are right. I will drop the faulty patches and send V2 containing fixes for only btrfs/017, btrfs/055 and btrfs/056. Thanks for providing the review comments. -- chandan
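The digest check under discussion can be sketched as a small helper. As Filipe points out, comparing before/after hashes only detects changes across the operation; it cannot prove the initial content was correct, so tests guarding against read corruption still need a known-good reference. File names below are placeholders:

```shell
# Before/after digest comparison, in the style of the modified tests.
file_hash() {
    md5sum "$1" | cut -f 1 -d ' '
}

check_unchanged() {
    local file=$1 orig_hash=$2 hash
    hash=$(file_hash "$file")
    if [ "$orig_hash" != "$hash" ]; then
        echo "Read operation failed on $file: Mismatching hash values detected."
        return 1
    fi
}

# Usage: snapshot the digest, run the operation, then verify.
f=$(mktemp)
echo "some data" > "$f"
orig=$(file_hash "$f")
# ... a clone/remount operation would happen here ...
check_unchanged "$f" "$orig" && echo unchanged
rm -f "$f"
```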
[PATCH 5/7] btrfs-progs: introduce framework version to features
As discussed in the mailing list, this provides a framework to introduce the feature where mkfs and btrfs-convert can set the default features as per the given mainline kernel version. Suggested-by: David Sterba Signed-off-by: Anand Jain --- utils.c | 23 +++ utils.h | 1 + 2 files changed, 24 insertions(+) diff --git a/utils.c b/utils.c index 216efa6..a9b46b8 100644 --- a/utils.c +++ b/utils.c @@ -3222,3 +3222,26 @@ int btrfs_features_allowed_by_sysfs(u64 *features) closedir(dir); return 0; } + +int btrfs_features_allowed_by_version(char *version, u64 *features) +{ + int i; + int code; + char *ver = strdup(version); + + *features = 0; + code = version_to_code(ver); + free(ver); + if (code < 0) + return code; + + for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) { + ver = strdup(mkfs_features[i].min_ker_ver); + + if (code >= version_to_code(ver)) + *features |= mkfs_features[i].flag; + + free(ver); + } + return 0; +} diff --git a/utils.h b/utils.h index cb20d73..1418e84 100644 --- a/utils.h +++ b/utils.h @@ -106,6 +106,7 @@ void btrfs_process_fs_features(u64 flags); void btrfs_parse_features_to_string(char *buf, u64 flags); u64 btrfs_features_allowed_by_kernel(void); int btrfs_features_allowed_by_sysfs(u64 *features); +int btrfs_features_allowed_by_version(char *version, u64 *features); struct btrfs_mkfs_config { char *label; -- 2.6.2
[PATCH 1/7] btrfs-progs: show the version for -O list-all
Shows the min kernel version in the -O list-all output, e.g. (the version is shown within parentheses): btrfs-convert -O list-all Filesystem features available: extref - increased hardlink limit per file to 65536 (0x40, 3.7, default) skinny-metadata - reduced-size metadata extent refs (0x100, 3.10, default) no-holes- no explicit hole extents for files (0x200, 3.14) mkfs.btrfs -O list-all Filesystem features available: mixed-bg- mixed data and metadata block groups (0x4, 2.7.37) extref - increased hardlink limit per file to 65536 (0x40, 3.7, default) raid56 - raid56 extended format (0x80, 3.9) skinny-metadata - reduced-size metadata extent refs (0x100, 3.10, default) no-holes- no explicit hole extents for files (0x200, 3.14) Signed-off-by: Anand Jain--- utils.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/utils.c b/utils.c index 2710ed7..0163915 100644 --- a/utils.c +++ b/utils.c @@ -657,10 +657,11 @@ void btrfs_list_all_fs_features(u64 mask_disallowed) continue; if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES) is_default = ", default"; - fprintf(stderr, "%-20s- %s (0x%llx%s)\n", + fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n", mkfs_features[i].name, mkfs_features[i].desc, mkfs_features[i].flag, + mkfs_features[i].min_ker_ver, is_default); } } -- 2.6.2
[PATCH 6/7] btrfs-progs: add -O comp= option for mkfs.btrfs
This provides the default feature set by version for mkfs.btrfs, through the new option '-O comp=<x.y.z>', where x.y.z is the minimum kernel version that should be supported. Signed-off-by: Anand Jain--- mkfs.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/mkfs.c b/mkfs.c index 6cb998b..34ba77d 100644 --- a/mkfs.c +++ b/mkfs.c @@ -324,7 +324,9 @@ static void print_usage(int ret) fprintf(stderr, "\t-s|--sectorsize SIZEmin block allocation (may not mountable by current kernel)\n"); fprintf(stderr, "\t-r|--rootdir DIRthe source directory\n"); fprintf(stderr, "\t-K|--nodiscard do not perform whole device TRIM\n"); - fprintf(stderr, "\t-O|--features LIST comma separated list of filesystem features, use '-O list-all' to list features\n"); + fprintf(stderr, "\t-O|--features LIST comma separated list of filesystem features\n"); + fprintf(stderr, "\t use '-O list-all' to list features\n"); + fprintf(stderr, "\t use '-O comp=<x.y.z>' x.y.z is the minimum kernel version to be supported\n"); fprintf(stderr, "\t-U|--uuid UUID specify the filesystem UUID\n"); fprintf(stderr, "\t-q|--quiet no messages except errors\n"); fprintf(stderr, "\t-V|--versionprint the mkfs.btrfs version and exit\n"); @@ -1439,7 +1441,24 @@ int main(int ac, char **av) case 'O': { char *orig = strdup(optarg); char *tmp = orig; - + char *tok; + + tok = strtok(tmp, "="); + if (!strcmp(tok, "comp")) { + tok = strtok(NULL, "="); + if (!tok) { + fprintf(stderr, + "Provide a version for 'comp=' option, ref to 'mkfs.btrfs -O list-all'\n"); + exit(1); + } + if (btrfs_features_allowed_by_version(tok, &features) < 0) { + fprintf(stderr, "Wrong version format: '%s'\n", tok); + exit(1); + } + features &= BTRFS_MKFS_DEFAULT_FEATURES; + goto cont; + } + tmp = orig; tmp = btrfs_parse_fs_features(tmp, &features); if (tmp) { fprintf(stderr, @@ -1448,6 +1467,7 @@ int main(int ac, char **av) free(orig); exit(1); } +cont: free(orig); if (features & BTRFS_FEATURE_LIST_ALL) { btrfs_list_all_fs_features(0); -- 2.6.2
[PATCH 0/7] Let user specify the kernel version for features
Sometimes users may want a btrfs filesystem to be supported on multiple kernel versions. A simple example: a USB drive can be used with multiple systems running different kernel versions, or in a data center a SAN LUN could be mounted on any system with a different kernel version. Thanks for providing comments and feedback. Further to it, below is a set of patches which make it possible to specify a kernel version, so that the default features can be set based on what features were supported at that kernel version. First of all, to let users know which features were supported at which kernel version, Patch 1/7 updates -O list-all to list each feature with its version. As we didn't keep the sysfs and progs feature names consistent, to avoid confusion Patch 2/7 additionally displays the sysfs feature name in the list-all output. Next, Patches 3,4,5/7 are helper functions. Patches 6,7/7 provide the -O comp= option for mkfs.btrfs and btrfs-convert respectively. Thanks, Anand Anand Jain (7): btrfs-progs: show the version for -O list-all btrfs-progs: add kernel alias for each of the features in the list btrfs-progs: make is_numerical non static btrfs-progs: check for numerical in version_to_code() btrfs-progs: introduce framework version to features btrfs-progs: add -O comp= option for mkfs.btrfs btrfs-progs: add -O comp= option for btrfs-convert btrfs-convert.c | 21 + cmds-replace.c | 11 --- mkfs.c | 24 ++-- utils.c | 58 - utils.h | 2 ++ 5 files changed, 98 insertions(+), 18 deletions(-) -- 2.6.2
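The version-gating idea in this cover letter can be sketched as follows, encoding "x.y.z" the way the kernel's KERNEL_VERSION() macro does and enabling only features whose minimum kernel version is not newer than the requested one. The feature/version pairs are taken from the list-all sample in patch 1/7; the helper names are illustrative, not the btrfs-progs API:

```shell
# Encode "x.y.z" as KERNEL_VERSION(x, y, z) = (x << 16) + (y << 8) + z.
version_to_code() {
    local IFS=.
    set -- $1
    echo $(( (${1:-0} << 16) + (${2:-0} << 8) + ${3:-0} ))
}

features_for_version() {
    local want entry name min
    want=$(version_to_code "$1")
    # name:min-kernel pairs from the mkfs.btrfs -O list-all sample
    for entry in mixed-bg:2.7.37 extref:3.7 raid56:3.9 \
            skinny-metadata:3.10 no-holes:3.14; do
        name=${entry%%:*}
        min=${entry#*:}
        if [ "$want" -ge "$(version_to_code "$min")" ]; then
            echo "$name"
        fi
    done
}

# A filesystem meant to be mountable by a 3.9 kernel would get
# mixed-bg, extref and raid56, but not skinny-metadata or no-holes.
features_for_version 3.9
```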
[PATCH 2/7] btrfs-progs: add kernel alias for each of the features in the list
We should have maintained the same feature names across the progs UI and the sysfs UI. For example, progs "mixed-bg" is /sys/fs/btrfs/features/mixed_groups in sysfs. As these are already released UIs, there is not much that can be done about it, except for creating an alias and making the tools aware of it. Add a kernel alias for each of the features in the list.

eg: The string within () is the sysfs name for the same feature:

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg (mixed_groups)           - mixed data and metadata block groups (0x4, 2.7.37)
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
raid56 (raid56)                   - raid56 extended format (0x80, 3.9)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

btrfs-convert -O list-all
Filesystem features available:
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

---
 utils.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 0163915..6d2675d 100644
--- a/utils.c
+++ b/utils.c
@@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
 void btrfs_list_all_fs_features(u64 mask_disallowed)
 {
 	int i;
+	u64 feature_per_sysfs;
+
+	btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
 
 	fprintf(stderr, "Filesystem features available:\n");
 	for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
 		char *is_default = "";
+		char name[256];
 
 		if (mkfs_features[i].flag & mask_disallowed)
 			continue;
 		if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
 			is_default = ", default";
-		fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
-				mkfs_features[i].name,
+		if (mkfs_features[i].flag & feature_per_sysfs)
+			sprintf(name, "%s (%s)",
+				mkfs_features[i].name,
+				mkfs_features[i].name_ker);
+		else
+			sprintf(name, "%s", mkfs_features[i].name);
+		fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
+				name,
 				mkfs_features[i].desc,
 				mkfs_features[i].flag,
 				mkfs_features[i].min_ker_ver,
-- 
2.6.2
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
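The name composition in this patch writes into a fixed `char name[256]` with `sprintf`. A minimal stand-alone sketch of the same idea (the struct and helper are illustrative stand-ins, not the btrfs-progs API), using `snprintf` so the fixed buffer can never overflow:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the patch's mkfs_features entries: the
 * progs UI name plus the kernel/sysfs alias (name_ker in the patch). */
struct fs_feature {
	const char *name;     /* e.g. "mixed-bg" */
	const char *name_ker; /* e.g. "mixed_groups" */
};

/* Compose "name (alias)" when the feature is exported via sysfs,
 * otherwise just the progs name; snprintf bounds the write. */
static void format_feature_name(char *buf, size_t len,
				const struct fs_feature *f, int in_sysfs)
{
	if (in_sysfs && f->name_ker)
		snprintf(buf, len, "%s (%s)", f->name, f->name_ker);
	else
		snprintf(buf, len, "%s", f->name);
}
```

The only behavioral difference from the patch is the bounded write; with a 256-byte buffer and short feature names the `sprintf` in the patch is safe in practice, but `snprintf` costs nothing.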
[PATCH 4/7] btrfs-progs: check for numerical in version_to_code()
As the version is now being passed by the user, it should be checked that it is numerical. We didn't need this before, as the version wasn't passed by the user. So this is not a bug fix.

Signed-off-by: Anand Jain
---
 utils.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/utils.c b/utils.c
index 0e66e2b..216efa6 100644
--- a/utils.c
+++ b/utils.c
@@ -3119,14 +3119,18 @@ static int version_to_code(char *v)
 
 	for (b[i] = strtok_r(v, ".", &save_b);
 	     b[i] != NULL;
-	     b[i] = strtok_r(NULL, ".", &save_b))
+	     b[i] = strtok_r(NULL, ".", &save_b)) {
+		if (!is_numerical(b[i]))
+			return -EINVAL;
 		i++;
+	}
 
+	if (b[1] == NULL)
+		return KERNEL_VERSION(atoi(b[0]), 0, 0);
 	if (b[2] == NULL)
 		return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), 0);
-	else
-		return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
+	return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
 }
 
 static int get_kernel_code()
-- 
2.6.2
[PATCH 3/7] btrfs-progs: make is_numerical non static
Signed-off-by: Anand Jain
---
 cmds-replace.c | 11 ---
 utils.c        | 11 +++
 utils.h        |  1 +
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/cmds-replace.c b/cmds-replace.c
index 9ab8438..86162b6 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -65,17 +65,6 @@ static const char * const replace_cmd_group_usage[] = {
 	NULL
 };
 
-static int is_numerical(const char *str)
-{
-	if (!(*str >= '0' && *str <= '9'))
-		return 0;
-	while (*str >= '0' && *str <= '9')
-		str++;
-	if (*str != '\0')
-		return 0;
-	return 1;
-}
-
 static int dev_replace_cancel_fd = -1;
 static void dev_replace_sigint_handler(int signal)
 {
diff --git a/utils.c b/utils.c
index 6d2675d..0e66e2b 100644
--- a/utils.c
+++ b/utils.c
@@ -178,6 +178,17 @@ int test_uuid_unique(char *fs_uuid)
 	return unique;
 }
 
+int is_numerical(const char *str)
+{
+	if (!(*str >= '0' && *str <= '9'))
+		return 0;
+	while (*str >= '0' && *str <= '9')
+		str++;
+	if (*str != '\0')
+		return 0;
+	return 1;
+}
+
 /*
  * @fs_uuid - if NULL, generates a UUID, returns back the new filesystem UUID
  */
diff --git a/utils.h b/utils.h
index af0aa31..cb20d73 100644
--- a/utils.h
+++ b/utils.h
@@ -271,5 +271,6 @@ const char *get_argv0_buf(void);
 	"-t|--tbytes        show sizes in TiB, or TB with --si"
 
 unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode);
+int is_numerical(const char *str);
 
 #endif
-- 
2.6.2
[PATCH 7/7] btrfs-progs: add -O comp= option for btrfs-convert
The user may want to convert the FS for a minimum kernel version, as they may need to use btrfs on a set of known kernel versions and keep the disk layout compatible.

Signed-off-by: Anand Jain
---
 btrfs-convert.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b0a998b..01b8940 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2879,6 +2879,8 @@ static void print_usage(void)
 	printf("\t-L|--copy-label    use label from converted filesystem\n");
 	printf("\t-p|--progress      show converting progress (default)\n");
 	printf("\t-O|--features LIST comma separated list of filesystem features\n");
+	printf("\t                   use '-O list-all' to list features\n");
+	printf("\t                   use '-O comp=<x.y.z>', x.y.z is the minimum kernel version to be supported\n");
 	printf("\t--no-progress      show only overview, not the detailed progress\n");
 }
 
@@ -2970,6 +2972,24 @@ int main(int argc, char *argv[])
 			case 'O': {
 				char *orig = strdup(optarg);
 				char *tmp = orig;
+				char *tok;
+
+				tok = strtok(tmp, "=");
+				if (!strcmp(tok, "comp")) {
+					tok = strtok(NULL, "=");
+					if (!tok) {
+						fprintf(stderr,
+							"Provide a version for 'comp=' option, ref to 'mkfs.btrfs -O list-all'\n");
+						exit(1);
+					}
+					if (btrfs_features_allowed_by_version(tok, ) < 0) {
+						fprintf(stderr, "Wrong version format: '%s'\n", tok);
+						exit(1);
+					}
+					features &= BTRFS_MKFS_DEFAULT_FEATURES;
+					goto cont;
+				}
+				tmp = orig;
 
 				tmp = btrfs_parse_fs_features(tmp, &features);
 				if (tmp) {
@@ -2979,6 +2999,7 @@ int main(int argc, char *argv[])
 					free(orig);
 					exit(1);
 				}
+cont:
 				free(orig);
 				if (features & BTRFS_FEATURE_LIST_ALL) {
 					btrfs_list_all_fs_features(
-- 
2.6.2
[PATCH v3 1/5] btrfs-progs: introduce framework to check kernel supported features
In newer kernels, the supported kernel features can be known from /sys/fs/btrfs/features; however, this interface was introduced only after 3.14, and most of the incompatible FS features were introduced before 3.14. This patch proposes to maintain a kernel version against each feature in the list, which will be the minimum kernel version needed to use the feature. Further, for features supported later than 3.14 this list can still be updated, so it serves as a repository which can be displayed for easy reference.

Signed-off-by: Anand Jain
---
v3: Mike pointed out that mixed-bg was from version 2.6.37, update it.
v2: Check for the condition where we fail to read the kernel version.
    Now the code will fall back to the defaults as set by the progs.

 utils.c | 80 -
 utils.h |  1 +
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/utils.c b/utils.c
index b754686..cc0bdfb 100644
--- a/utils.c
+++ b/utils.c
@@ -32,10 +32,12 @@
 #include
 #include
 #include
+#include <sys/utsname.h>
 #include
 #include
 #include
 #include
+#include <linux/version.h>
 #include
 
 #include "kerncompat.h"
@@ -567,21 +569,28 @@ out:
 	return ret;
 }
 
+/*
+ * min_ker_ver: update with minimum kernel version at which the feature
+ * was integrated into the mainline. For the transit period, that is
+ * feature not yet in mainline but in mailing list and for testing,
+ * please use "0.0" to indicate the same.
+ */
 static const struct btrfs_fs_feature {
 	const char *name;
 	u64 flag;
 	const char *desc;
+	const char *min_ker_ver;
 } mkfs_features[] = {
 	{ "mixed-bg", BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS,
-		"mixed data and metadata block groups" },
+		"mixed data and metadata block groups", "2.7.37"},
 	{ "extref", BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF,
-		"increased hardlink limit per file to 65536" },
+		"increased hardlink limit per file to 65536", "3.7"},
 	{ "raid56", BTRFS_FEATURE_INCOMPAT_RAID56,
-		"raid56 extended format" },
+		"raid56 extended format", "3.9"},
 	{ "skinny-metadata", BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA,
-		"reduced-size metadata extent refs" },
+		"reduced-size metadata extent refs", "3.10"},
 	{ "no-holes", BTRFS_FEATURE_INCOMPAT_NO_HOLES,
-		"no explicit hole extents for files" },
+		"no explicit hole extents for files", "3.14"},
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
@@ -3077,3 +3086,64 @@ unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode)
 
 	return unit_mode;
 }
+
+static int version_to_code(char *v)
+{
+	int i = 0;
+	char *b[3] = {NULL};
+	char *save_b = NULL;
+
+	for (b[i] = strtok_r(v, ".", &save_b);
+	     b[i] != NULL;
+	     b[i] = strtok_r(NULL, ".", &save_b))
+		i++;
+
+	if (b[2] == NULL)
+		return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), 0);
+	else
+		return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
+
+}
+
+static int get_kernel_code()
+{
+	int ret;
+	struct utsname utsbuf;
+	char *version;
+
+	ret = uname(&utsbuf);
+	if (ret)
+		return -ret;
+
+	if (!strlen(utsbuf.release))
+		return -EINVAL;
+
+	version = strtok(utsbuf.release, "-");
+
+	return version_to_code(version);
+}
+
+u64 btrfs_features_allowed_by_kernel(void)
+{
+	int i;
+	int local_kernel_code = get_kernel_code();
+	u64 features = 0;
+
+	/*
+	 * When system did not provide the kernel version then just
+	 * return 0, the caller has to depend on the intelligence as
+	 * per btrfs-progs version
+	 */
+	if (local_kernel_code <= 0)
+		return 0;
+
+	for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
+		char *ver = strdup(mkfs_features[i].min_ker_ver);
+
+		if (local_kernel_code >= version_to_code(ver))
+			features |= mkfs_features[i].flag;
+
+		free(ver);
+	}
+	return (features);
+}
diff --git a/utils.h b/utils.h
index 192f3d1..9044643 100644
--- a/utils.h
+++ b/utils.h
@@ -104,6 +104,7 @@ void btrfs_list_all_fs_features(u64 mask_disallowed);
 char* btrfs_parse_fs_features(char *namelist, u64 *flags);
 void btrfs_process_fs_features(u64 flags);
 void btrfs_parse_features_to_string(char *buf, u64 flags);
+u64 btrfs_features_allowed_by_kernel(void);
 
 struct btrfs_mkfs_config {
 	char *label;
-- 
2.6.2
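The gating in this patch reduces to a comparison of version codes. A compressed, self-contained sketch (names are illustrative, not the btrfs-progs API; flags and versions are taken from this series, though note the table lists mixed-bg as "2.7.37" while the v3 changelog says 2.6.37, so the sketch uses 2.6.37):

```c
#include <assert.h>

/* Sketch of btrfs_features_allowed_by_kernel(): OR together the flag of
 * every feature whose minimum kernel version is not newer than the
 * running kernel's version code. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

typedef unsigned long long u64;

static const struct {
	const char *name;
	u64 flag;
	int min_code;
} features[] = {
	{ "mixed-bg",        0x4,   KERNEL_VERSION(2, 6, 37) },
	{ "extref",          0x40,  KERNEL_VERSION(3, 7, 0)  },
	{ "raid56",          0x80,  KERNEL_VERSION(3, 9, 0)  },
	{ "skinny-metadata", 0x100, KERNEL_VERSION(3, 10, 0) },
	{ "no-holes",        0x200, KERNEL_VERSION(3, 14, 0) },
};

static u64 features_allowed(int kernel_code)
{
	u64 allowed = 0;
	unsigned int i;

	for (i = 0; i < sizeof(features) / sizeof(features[0]); i++)
		if (kernel_code >= features[i].min_code)
			allowed |= features[i].flag;
	return allowed;
}
```

For example, a 3.10 kernel would be allowed everything except no-holes, while anything older than 2.6.37 would get no incompat features at all, which is what makes the "fall back to progs defaults on failure" case in v2 of the patch important.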
[PATCH 12/12] Fix btrfs/106 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_pages_modified filtering function to print information in terms of file pages rather than file offset.

Signed-off-by: Chandan Rajendra
---
 tests/btrfs/106     | 42 --
 tests/btrfs/106.out | 14 ++
 2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/tests/btrfs/106 b/tests/btrfs/106
index 1670453..a1bf4ec 100755
--- a/tests/btrfs/106
+++ b/tests/btrfs/106
@@ -58,31 +58,37 @@ test_clone_and_read_compressed_extent()
 	_scratch_mkfs >>$seqres.full 2>&1
 	_scratch_mount $mount_opts
 
-	# Create our test file with a single extent of 64Kb that is going to be
-	# compressed no matter which compression algorithm is used (zlib/lzo).
-	$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
-		$SCRATCH_MNT/foo | _filter_xfs_io
-
+	PAGE_SIZE=$(get_page_size)
+
+	# Create our test file with 16 pages worth of data in a single extent
+	# that is going to be compressed no matter which compression algorithm
+	# is used (zlib/lzo).
+	$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K $((16 * $PAGE_SIZE))" \
+		$SCRATCH_MNT/foo | _filter_xfs_io_pages_modified
+
 	# Now clone the compressed extent into an adjacent file offset.
-	$CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
+	$CLONER_PROG -s 0 -d $((16 * $PAGE_SIZE)) -l $((16 * $PAGE_SIZE)) \
 		$SCRATCH_MNT/foo $SCRATCH_MNT/foo
 
-	echo "File digest before unmount:"
-	md5sum $SCRATCH_MNT/foo | _filter_scratch
+	orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
 
 	# Remount the fs or clear the page cache to trigger the bug in btrfs.
-	# Because the extent has an uncompressed length that is a multiple of
-	# 16 pages, all the pages belonging to the second range of the file
-	# (64K to 128K), which points to the same extent as the first range
-	# (0K to 64K), had their contents full of zeroes instead of the byte
-	# 0xaa. This was a bug exclusively in the read path of compressed
-	# extents, the correct data was stored on disk, btrfs just failed to
-	# fill in the pages correctly.
+	# Because the extent has an uncompressed length that is a multiple of 16
+	# pages, all the pages belonging to the second range of the file that is
+	# mapped by the page index range [16, 31], which points to the same
+	# extent as the first file range mapped by the page index range [0, 15],
+	# had their contents full of zeroes instead of the byte 0xaa. This was a
+	# bug exclusively in the read path of compressed extents, the correct
+	# data was stored on disk, btrfs just failed to fill in the pages
+	# correctly.
 	_scratch_remount
 
-	echo "File digest after remount:"
-	# Must match the digest we got before.
-	md5sum $SCRATCH_MNT/foo | _filter_scratch
+	hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
+
+	if [ $orig_hash != $hash ]; then
+		echo "Read operation failed on $SCRATCH_MNT/foo: "\
+			"Mimatching hash values detected."
+	fi
 }
 
 echo -e "\nTesting with zlib compression..."
diff --git a/tests/btrfs/106.out b/tests/btrfs/106.out
index 692108d..eceabfa 100644
--- a/tests/btrfs/106.out
+++ b/tests/btrfs/106.out
@@ -1,17 +1,7 @@
 QA output created by 106
 
 Testing with zlib compression...
-wrote 65536/65536 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digest before unmount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
-File digest after remount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
+Pages modified: [0 - 15]
 
 Testing with lzo compression...
-wrote 65536/65536 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digest before unmount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
-File digest after remount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
+Pages modified: [0 - 15]
-- 
2.1.0
[PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems
This commit makes use of the new _filter_xfs_io_blocks_modified filtering function to print information in terms of file blocks rather than file offset. Signed-off-by: Chandan Rajendra--- tests/btrfs/094 | 78 +++-- tests/btrfs/094.out | 17 +++- 2 files changed, 49 insertions(+), 46 deletions(-) diff --git a/tests/btrfs/094 b/tests/btrfs/094 index 6f6cdeb..868c088 100755 --- a/tests/btrfs/094 +++ b/tests/btrfs/094 @@ -67,36 +67,41 @@ mkdir $send_files_dir _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount "-o compress" -# Create the file with a single extent of 128K. This creates a metadata file -# extent item with a data start offset of 0 and a logical length of 128K. -$XFS_IO_PROG -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" \ - $SCRATCH_MNT/foo | _filter_xfs_io - -# Now rewrite the range 64K to 112K of our file. This will make the inode's -# metadata continue to point to the 128K extent we created before, but now -# with an extent item that points to the extent with a data start offset of -# 112K and a logical length of 16K. -# That metadata file extent item is associated with the logical file offset -# at 176K and covers the logical file range 176K to 192K. -$XFS_IO_PROG -c "pwrite -S 0xbb 64K 112K" -c "fsync" \ - $SCRATCH_MNT/foo | _filter_xfs_io - -# Now rewrite the range 180K to 12K. This will make the inode's metadata -# continue to point the the 128K extent we created earlier, with a single -# extent item that points to it with a start offset of 112K and a logical -# length of 4K. -# That metadata file extent item is associated with the logical file offset -# at 176K and covers the logical file range 176K to 180K. -$XFS_IO_PROG -c "pwrite -S 0xcc 180K 12K" -c "fsync" \ - $SCRATCH_MNT/foo | _filter_xfs_io +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) + +# Create the file with a single extent of 32 blocks. This creates a metadata +# file extent item with a data start offset of 0 and a logical length of +# 32 blocks. 
+$XFS_IO_PROG -f -c "pwrite -S 0xaa $((16 * $BLOCK_SIZE)) $((32 * $BLOCK_SIZE))" \ +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + +# Now rewrite the block range [16, 28[ of our file. This will make +# the inode's metadata continue to point to the single 32 block extent +# we created before, but now with an extent item that points to the +# extent with a data start offset referring to the 28th block and a +# logical length of 4 blocks. +# That metadata file extent item is associated with the block range +# [44, 48[. +$XFS_IO_PROG -c "pwrite -S 0xbb $((16 * $BLOCK_SIZE)) $((28 * $BLOCK_SIZE))" \ +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified + + +# Now rewrite the block range [45, 48[. This will make the inode's +# metadata continue to point the 32 block extent we created earlier, +# with a single extent item that points to it with a start offset +# referring to the 28th block and a logical length of 1 block. +# That metadata file extent item is associated with the block range +# [44, 45[. +$XFS_IO_PROG -c "pwrite -S 0xcc $((45 * $BLOCK_SIZE)) $((3 * $BLOCK_SIZE))" \ +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 -# Now clone that same region of the 128K extent into a new file, so that it +# Now clone that same region of the 32 block extent into a new file, so that it # gets referenced twice and the incremental send operation below decides to # issue a clone operation instead of copying the data. 
touch $SCRATCH_MNT/bar -$CLONER_PROG -s $((176 * 1024)) -d $((176 * 1024)) -l $((4 * 1024)) \ +$CLONER_PROG -s $((44 * $BLOCK_SIZE)) -d $((44 * $BLOCK_SIZE)) -l $BLOCK_SIZE \ $SCRATCH_MNT/foo $SCRATCH_MNT/bar _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 @@ -105,10 +110,11 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ -f $send_files_dir/2.snap -echo "File digests in the original filesystem:" -md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch -md5sum $SCRATCH_MNT/mysnap2/bar | _filter_scratch +# echo "File digests in the original filesystem:" +declare -A src_fs_hash +src_fs_hash[mysnap1_foo]=$(md5sum $SCRATCH_MNT/mysnap1/foo | cut -f 1 -d ' ') +src_fs_hash[mysnap2_foo]=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ') +src_fs_hash[mysnap2_bar]=$(md5sum $SCRATCH_MNT/mysnap2/bar | cut -f 1 -d ' ') # Now recreate the filesystem by receiving both send streams and verify we get # the same file contents that the original filesystem had. @@ -119,10 +125,18 @@ _scratch_mount _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap -echo
Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems
On Wed, Nov 25, 2015 at 11:03 AM, Chandan Rajendrawrote: > This commit makes use of the new _filter_xfs_io_blocks_modified filtering > function to print information in terms of file blocks rather than file > offset. > > Signed-off-by: Chandan Rajendra > --- > tests/btrfs/094 | 78 > +++-- > tests/btrfs/094.out | 17 +++- > 2 files changed, 49 insertions(+), 46 deletions(-) > > diff --git a/tests/btrfs/094 b/tests/btrfs/094 > index 6f6cdeb..868c088 100755 > --- a/tests/btrfs/094 > +++ b/tests/btrfs/094 > @@ -67,36 +67,41 @@ mkdir $send_files_dir > _scratch_mkfs >>$seqres.full 2>&1 > _scratch_mount "-o compress" > > -# Create the file with a single extent of 128K. This creates a metadata file > -# extent item with a data start offset of 0 and a logical length of 128K. > -$XFS_IO_PROG -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" \ > - $SCRATCH_MNT/foo | _filter_xfs_io > - > -# Now rewrite the range 64K to 112K of our file. This will make the inode's > -# metadata continue to point to the 128K extent we created before, but now > -# with an extent item that points to the extent with a data start offset of > -# 112K and a logical length of 16K. > -# That metadata file extent item is associated with the logical file offset > -# at 176K and covers the logical file range 176K to 192K. > -$XFS_IO_PROG -c "pwrite -S 0xbb 64K 112K" -c "fsync" \ > - $SCRATCH_MNT/foo | _filter_xfs_io > - > -# Now rewrite the range 180K to 12K. This will make the inode's metadata > -# continue to point the the 128K extent we created earlier, with a single > -# extent item that points to it with a start offset of 112K and a logical > -# length of 4K. > -# That metadata file extent item is associated with the logical file offset > -# at 176K and covers the logical file range 176K to 180K. > -$XFS_IO_PROG -c "pwrite -S 0xcc 180K 12K" -c "fsync" \ > - $SCRATCH_MNT/foo | _filter_xfs_io > +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT) > + > +# Create the file with a single extent of 32 blocks. 
This creates a metadata > +# file extent item with a data start offset of 0 and a logical length of > +# 32 blocks. > +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((16 * $BLOCK_SIZE)) $((32 * > $BLOCK_SIZE))" \ > +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified > + > +# Now rewrite the block range [16, 28[ of our file. This will make > +# the inode's metadata continue to point to the single 32 block extent > +# we created before, but now with an extent item that points to the > +# extent with a data start offset referring to the 28th block and a > +# logical length of 4 blocks. > +# That metadata file extent item is associated with the block range > +# [44, 48[. > +$XFS_IO_PROG -c "pwrite -S 0xbb $((16 * $BLOCK_SIZE)) $((28 * $BLOCK_SIZE))" > \ > +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified > + > + > +# Now rewrite the block range [45, 48[. This will make the inode's > +# metadata continue to point the 32 block extent we created earlier, > +# with a single extent item that points to it with a start offset > +# referring to the 28th block and a logical length of 1 block. > +# That metadata file extent item is associated with the block range > +# [44, 45[. > +$XFS_IO_PROG -c "pwrite -S 0xcc $((45 * $BLOCK_SIZE)) $((3 * $BLOCK_SIZE))" \ > +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified > > _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 > > -# Now clone that same region of the 128K extent into a new file, so that it > +# Now clone that same region of the 32 block extent into a new file, so that > it > # gets referenced twice and the incremental send operation below decides to > # issue a clone operation instead of copying the data. 
> touch $SCRATCH_MNT/bar > -$CLONER_PROG -s $((176 * 1024)) -d $((176 * 1024)) -l $((4 * 1024)) \ > +$CLONER_PROG -s $((44 * $BLOCK_SIZE)) -d $((44 * $BLOCK_SIZE)) -l > $BLOCK_SIZE \ > $SCRATCH_MNT/foo $SCRATCH_MNT/bar > > _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 > @@ -105,10 +110,11 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f > $send_files_dir/1.snap > _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ > -f $send_files_dir/2.snap > > -echo "File digests in the original filesystem:" > -md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch > -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch > -md5sum $SCRATCH_MNT/mysnap2/bar | _filter_scratch > +# echo "File digests in the original filesystem:" > +declare -A src_fs_hash > +src_fs_hash[mysnap1_foo]=$(md5sum $SCRATCH_MNT/mysnap1/foo | cut -f 1 -d ' ') > +src_fs_hash[mysnap2_foo]=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ') > +src_fs_hash[mysnap2_bar]=$(md5sum $SCRATCH_MNT/mysnap2/bar | cut -f 1 -d ' ') > > # Now recreate the filesystem by receiving
Imbalanced RAID1 with three unequal disks
Hi,

I pushed a subvolume using send/receive to an 8 TB disk, added two 4 TB disks, and started a balance with conversion to RAID1. Afterwards, I got the following:

	Total devices 3 FS bytes used 5.40TiB
	devid 1 size 7.28TiB used 4.54TiB path /dev/mapper/yellow4
	devid 2 size 3.64TiB used 3.17TiB path /dev/mapper/yellow1
	devid 3 size 3.64TiB used 3.17TiB path /dev/mapper/yellow2
	Btrfs v3.17

	Data, RAID1: total=5.43TiB, used=5.39TiB
	System, RAID1: total=64.00MiB, used=800.00KiB
	Metadata, RAID1: total=14.00GiB, used=5.55GiB
	GlobalReserve, single: total=512.00MiB, used=0.00B

In my understanding, the data isn't properly balanced and I only get around 5.9 TB of usable space. As suggested in #btrfs, I started a second balance without filters and got this:

	Total devices 3 FS bytes used 5.40TiB
	devid 1 size 7.28TiB used 5.41TiB path /dev/mapper/yellow4
	devid 2 size 3.64TiB used 2.71TiB path /dev/mapper/yellow1
	devid 3 size 3.64TiB used 2.71TiB path /dev/mapper/yellow2

	Data, RAID1: total=5.41TiB, used=5.39TiB
	System, RAID1: total=32.00MiB, used=784.00KiB
	Metadata, RAID1: total=7.00GiB, used=5.54GiB
	GlobalReserve, single: total=512.00MiB, used=0.00B

	/dev/mapper/yellow4  7,3T  5,4T  969G  86% /mnt/yellow

Now I get 6.3 TB of usable space but, in my understanding, I should get around 7.28 TB, or am I missing something here? Also, a second balance shouldn't change the data distribution, right?

I'm using kernel v4.3 with a patch [1] from the kernel bugzilla [2] for the 8 TB SMR drive. The send/receive of a 5 TB subvolume worked flawlessly with the patch. Without it, I got a lot of errors in dmesg within the first 200 GB of transferred data. The OS is an x86_64 Ubuntu 15.04.

Thank you!
Mario

[1] http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
[2] https://bugzilla.kernel.org/show_bug.cgi?id=93581
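The expectation in Mario's mail can be sanity-checked with a back-of-envelope model (my own sketch, not a btrfs tool): RAID1 keeps two copies of every chunk on two different devices, so usable space is capped at half the raw total, and additionally by what the smaller devices can pair against the largest one. For 7.28 + 3.64 + 3.64 TiB both bounds land at roughly 7.28 TiB of data space, before metadata overhead:

```c
#include <assert.h>

/* Rough btrfs RAID1 capacity estimate for three devices: if the largest
 * device is at least as big as the others combined, every chunk on the
 * smaller devices can mirror against it; otherwise the raw total halves. */
static double raid1_usable(double a, double b, double c)
{
	double total = a + b + c;
	double largest = a;

	if (b > largest)
		largest = b;
	if (c > largest)
		largest = c;

	if (largest >= total - largest)
		return total - largest;
	return total / 2.0;
}
```

So the ~7.28 TiB expectation is reasonable in principle; the gap to the observed 6.3 TB would have to come from how the allocator actually distributed chunks plus the mirrored metadata/system block groups, not from the model itself.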
Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems
On Wed, Nov 25, 2015 at 11:47 AM, Chandan Rajendrawrote: > On Wednesday 25 Nov 2015 11:11:27 Filipe Manana wrote: >> >> Hi Chandan, >> >> I can't agree with this change. We're no longer checking that file >> data is correct after the cloning operations. The md5sum checks were >> exactly for that. So essentially the test is only verifying the clone >> operations don't fail with errors, it no longer checks for data >> corruption... >> >> Same comment applies to at least a few other patches in the series. > > Hello Filipe, > > All the tests where we had md5sum being echoed into output have been replaced > with code to verify the md5sum values as shown below, > > if [ $foo_orig_hash != $foo_hash ]; then > echo "Read operation failed on $SCRATCH_MNT/foo: "\ > "Mimatching hash values detected." > fi > > This will cause a diff between the test's ideal output versus the output > obtained during the test run. Right, it compares the digests before and after some operation (which should always match). However we no longer validate that the file content is correct before the operation. For some of the tests that is more important, like the ones that test read corruption after cloning compressed extents. > > In case of btrfs/094, I have added an associative array to hold the md5sums > and > the file content verification is being performed by the following code, > > for key in "${!src_fs_hash[@]}"; do > if [ ${src_fs_hash[$key]} != ${dst_fs_hash[$key]} ]; then > echo "Mimatching hash value detected against \ > $(echo $key | tr _ /)" > fi > done > > -- > chandan > -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems
On Wednesday 25 Nov 2015 11:11:27 Filipe Manana wrote: > > Hi Chandan, > > I can't agree with this change. We're no longer checking that file > data is correct after the cloning operations. The md5sum checks were > exactly for that. So essentially the test is only verifying the clone > operations don't fail with errors, it no longer checks for data > corruption... > > Same comment applies to at least a few other patches in the series. Hello Filipe, All the tests where we had md5sum being echoed into output have been replaced with code to verify the md5sum values as shown below, if [ $foo_orig_hash != $foo_hash ]; then echo "Read operation failed on $SCRATCH_MNT/foo: "\ "Mimatching hash values detected." fi This will cause a diff between the test's ideal output versus the output obtained during the test run. In case of btrfs/094, I have added an associative array to hold the md5sums and the file content verification is being performed by the following code, for key in "${!src_fs_hash[@]}"; do if [ ${src_fs_hash[$key]} != ${dst_fs_hash[$key]} ]; then echo "Mimatching hash value detected against \ $(echo $key | tr _ /)" fi done -- chandan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/25] Btrfs-convert rework to support native separate
On Tue, Nov 24, 2015 at 04:50:00PM +0800, Qu Wenruo wrote:
> It seems the conflict is quite huge, your reiserfs support is based on
> the old behavior, just like what old ext2 one do: custom extent allocation.
> I'm afraid the rebase will take a lot of time since I'm completely a
> newbie about reiserfs... :(

Yeah, the ext2 callbacks are abstracted and replaced by reiserfs
implementations, and the abstraction is quite direct. This might be a
problem with merging your patchset.

> I may need to change a lot of ext2 direct call to generic one, and may
> even change the generic function calls. (no alloc/free, only free space
> lookup)
>
> And some (maybe a lot) of reiserfs codes may be removed during the rework.

As far as the conversion support stays, it's not a problem of course. I
don't have a complete picture of all the actual merging conflicts, but the
idea is to provide the callback abstraction v2 to allow ext2 and reiserfs
plus allow all the changes of this patchset.
Re: [PATCH 0/7] Let user specify the kernel version for features
On 11/26/2015 10:02 AM, Qu Wenruo wrote:
> Anand Jain wrote on 2015/11/25 20:08 +0800:
>> Sometimes users may want to have a btrfs to be supported on multiple
>> kernel versions. A simple example: a USB drive can be used with
>> multiple systems running different kernel versions. Or in a data
>> center a SAN LUN could be mounted on any system with a different
>> kernel version.

 Thanks for providing comments and feedback.

>> Further to it, here below is a set of patches which will introduce a
>> way to specify a kernel version so that default features can be set
>> based on what features were supported at that kernel version.
>
> With the new -O comp= option, the concern on user who want to make a
> btrfs for newer kernel is hugely reduced.

 NO! Actually the new option -O comp= provides no answer for users who
 want to create _a btrfs disk layout which is compatible with more than
 one kernel_. Above there are two examples of it.

> But I still prefer such feature align to be done only when specified by
> user, instead of automatically. (yeah, already told for several times
> though)
> Warning should be enough for user, sometimes too automatic is not good,

 As said before, we need the latest btrfs-progs on older kernels, for
 the obvious reason of btrfs-progs bug fixes. We don't have to back-port
 fixes on btrfs-progs the way we already do in the btrfs kernel code.

 A btrfs-progs should work on any kernel with the "default features as
 prescribed for that kernel". Let's say we don't do this automatically:
 then the latest btrfs-progs with a default 'mkfs.btrfs && mount' fails.
 But a user upgrading btrfs-progs for fsck bug fixes shouldn't find a
 default 'mkfs.btrfs && mount' failing, nor should they have to use a
 "new" set of mkfs options to create an all-default FS for an LTS
 kernel. Default features based on the btrfs-progs version instead of
 the kernel version makes NO sense. And adding a warning for not using
 the latest features which are not in the running kernel is pointless.
 That's _not_ a backward kernel compatible tool. btrfs-progs should work
 "for the kernel".

> We should avoid adding too much intelligence into btrfs-progs.

 I have fixed too many issues and redesigned progs in this area. Too
 many bugs were mainly because of the idea of copying and maintaining
 the same code in btrfs-progs and the btrfs kernel (ref the wiki and my
 email before). That's a wrong approach. I don't understand: if the
 purpose of the two isn't the same, what is the point in maintaining the
 same code? It won't save effort, mainly because it's like developing a
 distributed FS where two parties have to communicate to stay in sync.
 Which is like using a cannon to shoo a crow. But if the reason was a
 fuse-like kernel-free FS (no one said that though), then it's better
 done as a separate project.

> especially for tests.

 It depends on what's being tested: the kernel or progs? It's the
 kernel, not progs. Automatic detection will keep the default features
 constant for a given kernel version. Further, for testing, using a
 known set of options is even better.

> A lot of btrfs-progs change, like recent disabling mixed-bg for small
> volume has already cause regression in generic/077 testcase.
> And Dave is already fed up with such problem from btrfs...

 I don't know what that regression is about. But in my experience with
 some xfstests test cases, xfstests depends too much on cli output
 strings, which is an easy thing to do but a wrong approach. Those cli
 outputs and their format are NOT APIs, they are UIs. Instead it should
 have used return codes / an FS test interface. That would give
 developers a free hand to change things; otherwise you need to update
 the test cases every time you change the cli _output_.

> Especially such auto-detection will make default behavior more
> unstable, at least not a good idea for me.

 As above. We design with end users and their use cases in mind, not for
 a test suite. If the test suite breaks, fix it.

Thanks, Anand

> Beside this, I'm curious how other filesystem user tools handle such
> kernel mismatch, or do they?
>
> Thanks,
> Qu
>
>> First of all, to let the user know what features were supported at
>> what kernel version, Patch 1/7 updates -O list-all, which will list
>> each feature with its version.
>>
>> As we didn't keep the sysfs and progs feature names consistent, to
>> avoid confusion Patch 2/7 displays the sysfs feature name as well in
>> the list-all output.
>>
>> Next, Patch 3,4,5/7 are helper functions.
>>
>> Patch 6,7/7 provide the -O comp= option for mkfs.btrfs and
>> btrfs-convert respectively.
>>
>> Thanks, Anand
>>
>> Anand Jain (7):
>>   btrfs-progs: show the version for -O list-all
>>   btrfs-progs: add kernel alias for each of the features in the list
>>   btrfs-progs: make is_numerical non static
>>   btrfs-progs: check for numerical in version_to_code()
>>   btrfs-progs: introduce framework version to features
>>   btrfs-progs: add -O comp= option for mkfs.btrfs
>>   btrfs-progs: add -O comp= option for btrfs-convert
>>
>>  btrfs-convert.c | 21 +
>>  cmds-replace.c  | 11 ---
>>  mkfs.c          | 24 ++--
>>  utils.c         | 58