Re: qgroups not enabled, but perf stats reports btrfs_qgroup_release_data and btrfs_qgroup_free_delayed_ref
On 2018/10/9 6:41 AM, Chris Murphy wrote:
> [chris@flap ~]$ sudo perf stat -e 'btrfs:*' -a sleep 70
> ##And then I loaded a few sites in Firefox early on in those 70 seconds.
>
>  Performance counter stats for 'system wide':
>
>          5      btrfs:btrfs_transaction_commit
>         29      btrfs:btrfs_inode_new
>         29      btrfs:btrfs_inode_request
>         25      btrfs:btrfs_inode_evict
>      1,602      btrfs:btrfs_get_extent
>          0      btrfs:btrfs_handle_em_exist
>          1      btrfs:btrfs_get_extent_show_fi_regular
>         88      btrfs:btrfs_truncate_show_fi_regular
>         19      btrfs:btrfs_get_extent_show_fi_inline
>          2      btrfs:btrfs_truncate_show_fi_inline
>        189      btrfs:btrfs_ordered_extent_add
>        189      btrfs:btrfs_ordered_extent_remove
>          9      btrfs:btrfs_ordered_extent_start
>        592      btrfs:btrfs_ordered_extent_put
>      1,207      btrfs:__extent_writepage
>      1,203      btrfs:btrfs_writepage_end_io_hook
>         25      btrfs:btrfs_sync_file
>          0      btrfs:btrfs_sync_fs
>          0      btrfs:btrfs_add_block_group
>      1,508      btrfs:add_delayed_tree_ref
>      1,498      btrfs:run_delayed_tree_ref
>        379      btrfs:add_delayed_data_ref
>        336      btrfs:run_delayed_data_ref
>      1,887      btrfs:add_delayed_ref_head
>      1,839      btrfs:run_delayed_ref_head
>          0      btrfs:btrfs_chunk_alloc
>          0      btrfs:btrfs_chunk_free
>        794      btrfs:btrfs_cow_block
>      6,982      btrfs:btrfs_space_reservation
>          0      btrfs:btrfs_trigger_flush
>          0      btrfs:btrfs_flush_space
>        952      btrfs:btrfs_reserved_extent_alloc
>          0      btrfs:btrfs_reserved_extent_free
>      1,005      btrfs:find_free_extent
>      1,005      btrfs:btrfs_reserve_extent
>        816      btrfs:btrfs_reserve_extent_cluster
>          1      btrfs:btrfs_find_cluster
>          0      btrfs:btrfs_failed_cluster_setup
>          1      btrfs:btrfs_setup_cluster
>      5,952      btrfs:alloc_extent_state
>      6,034      btrfs:free_extent_state
>        374      btrfs:btrfs_work_queued
>        362      btrfs:btrfs_work_sched
>        362      btrfs:btrfs_all_work_done
>        116      btrfs:btrfs_ordered_sched
>          0      btrfs:btrfs_workqueue_alloc
>          0      btrfs:btrfs_workqueue_destroy
>          0      btrfs:btrfs_qgroup_reserve_data
>        201      btrfs:btrfs_qgroup_release_data
>      1,839      btrfs:btrfs_qgroup_free_delayed_ref
>          0      btrfs:btrfs_qgroup_account_extents
>          0      btrfs:btrfs_qgroup_trace_extent
>          0      btrfs:btrfs_qgroup_account_extent
>          0      btrfs:qgroup_update_counters
>          0      btrfs:qgroup_update_reserve
>          0      btrfs:qgroup_meta_reserve
>          0      btrfs:qgroup_meta_convert
>          0      btrfs:qgroup_meta_free_all_pertrans
>          0      btrfs:btrfs_prelim_ref_merge
>          0      btrfs:btrfs_prelim_ref_insert
>      2,663      btrfs:btrfs_inode_mod_outstanding_extents
>          0      btrfs:btrfs_remove_block_group
>          0      btrfs:btrfs_add_unused_block_group
>          0      btrfs:btrfs_skip_unused_block_group
>
>       70.004723586 seconds time elapsed
>
> [chris@flap ~]$
>
>
> Seems like a lot of activity for just a few transactions, but what
> really caught my eye here is the qgroup reporting for a file system
> that has never had qgroups enabled. Is it expected?

Indeed, some of these calls can be avoided: when qgroups are not
enabled, such functions really do nothing.

In the case of btrfs_qgroup_free_delayed_ref(), it doesn't check whether
qgroups are enabled and calls btrfs_qgroup_free_refroot() in all cases,
expecting btrfs_qgroup_free_refroot() to do the check.

However, btrfs_qgroup_free_refroot() doesn't check whether qgroups are
enabled either. It will just find no corresponding qgroup and exit.

Thanks for exposing this bug,
Qu

>
>
> Chris Murphy
>
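For anyone who wants to repeat this check on their own system, here is a minimal sketch (not part of perf or btrfs-progs) that scans perf stat output for qgroup tracepoints with non-zero counts. The function name nonzero_events and the sample text are illustrative assumptions; the counts in the sample are taken from the report above.

```python
import re


def nonzero_events(perf_output, prefix="btrfs:btrfs_qgroup"):
    """Return {event: count} for events starting with `prefix` and count > 0.

    Accepts plain or '>'-quoted perf stat lines such as
    '  1,839      btrfs:btrfs_qgroup_free_delayed_ref'.
    """
    hits = {}
    for line in perf_output.splitlines():
        # Optional quote markers/whitespace, a count with thousands
        # separators, then the event name and nothing else.
        m = re.match(r"^[>\s]*([\d,]+)\s+(\S+)\s*$", line)
        if not m:
            continue
        count = int(m.group(1).replace(",", ""))
        event = m.group(2)
        if event.startswith(prefix) and count:
            hits[event] = count
    return hits


sample = """
     0      btrfs:btrfs_qgroup_reserve_data
   201      btrfs:btrfs_qgroup_release_data
 1,839      btrfs:btrfs_qgroup_free_delayed_ref
 1,839      btrfs:run_delayed_ref_head
     0      btrfs:btrfs_qgroup_account_extents

      70.004723586 seconds time elapsed
"""

for event, count in nonzero_events(sample).items():
    print(event, count)
```

In the numbers above, btrfs_qgroup_free_delayed_ref fires exactly as often as run_delayed_ref_head (1,839 times), which is consistent with the unconditional call described in the reply.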
Re: [PATCH v3 3/6] btrfs-progs: original check: Add ability to detect bad dev extents
On 10/8/18 8:30 PM, Qu Wenruo wrote:
Unlike the lowmem mode check, original mode has no good place to check
for overlapping dev extents.

So this patch introduces a new function, check_dev_extents(), to
handle possible bad dev extents.

Reported-by: Hans van Kranenburg
Signed-off-by: Qu Wenruo

Reviewed-by: Su Yue

---
 check/main.c | 99
 1 file changed, 99 insertions(+)

diff --git a/check/main.c b/check/main.c
index bc2ee22f7943..ff9a785ce555 100644
--- a/check/main.c
+++ b/check/main.c
@@ -8224,6 +8224,99 @@ out:
 	return ret;
 }
 
+/*
+ * Check if all dev extents are valid (not overlap nor beyond device
+ * boundary).
+ *
+ * Dev extents <-> chunk cross checking is already done in check_chunks().
+ */
+static int check_dev_extents(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_root *dev_root = fs_info->dev_root;
+	int ret;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
+
+	btrfs_init_path(&path);
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		error("failed to search device tree: %s", strerror(-ret));
+		goto out;
+	}
+	if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
+		ret = btrfs_next_leaf(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+
+	while (1) {
+		struct btrfs_dev_extent *dev_ext;
+		struct btrfs_device *dev;
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+		dev_ext = btrfs_item_ptr(path.nodes[0], path.slots[0],
+					 struct btrfs_dev_extent);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], dev_ext);
+
+		dev = btrfs_find_device(fs_info, devid, NULL, NULL);
+		if (!dev) {
+			error("failed to find device with devid %llu", devid);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (prev_devid == devid && prev_dev_ext_end > physical_offset) {
+			error(
+"dev extent devid %llu physical offset %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, prev_dev_ext_end);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (physical_offset + physical_len > dev->total_bytes) {
+			error(
+"dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      dev->total_bytes);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+
+		ret = btrfs_next_item(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
 static int check_chunks_and_extents(struct btrfs_fs_info *fs_info)
 {
 	struct rb_root dev_cache;
@@ -8318,6 +8411,12 @@ again:
 		goto out;
 	}
 
+	ret = check_dev_extents(fs_info);
+	if (ret < 0) {
+		err = ret;
+		goto out;
+	}
+
 	ret = check_chunks(&chunk_cache, &block_group_cache, &dev_extent_cache,
 			   NULL, NULL, NULL, 0);
 	if (ret) {
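The core of the check in the patch above can be sketched outside of btrfs-progs as follows. This is only an illustrative model, not the btrfs-progs implementation: it assumes the dev extents are already available as (devid, physical_offset, length) tuples sorted by (devid, offset), which is the order the device tree yields them in, and unlike the patch it collects every problem instead of stopping at the first one. All names here are hypothetical.

```python
def check_dev_extents(extents, device_sizes):
    """Validate dev extents.

    extents: iterable of (devid, physical_offset, length), sorted by
             (devid, physical_offset).
    device_sizes: dict mapping devid -> total_bytes of that device.
    Returns a list of human-readable error strings (empty if all is well).
    """
    errors = []
    prev_devid = None
    prev_end = 0
    for devid, offset, length in extents:
        if devid not in device_sizes:
            errors.append(f"failed to find device with devid {devid}")
            continue
        end = offset + length
        # Same device and this extent starts before the previous one ended.
        if devid == prev_devid and offset < prev_end:
            errors.append(
                f"dev extent devid {devid} offset {offset} overlaps "
                f"previous dev extent end {prev_end}")
        # Extent reaches past the end of the device.
        if end > device_sizes[devid]:
            errors.append(
                f"dev extent devid {devid} offset {offset} len {length} "
                f"is beyond device boundary {device_sizes[devid]}")
        prev_devid, prev_end = devid, end
    return errors


sizes = {1: 1000}
good = [(1, 0, 400), (1, 400, 400)]
overlapping = [(1, 0, 400), (1, 300, 400)]
beyond = [(1, 0, 400), (1, 700, 400)]

print(check_dev_extents(good, sizes))
print(check_dev_extents(overlapping, sizes))
print(check_dev_extents(beyond, sizes))
```

Because the extents are sorted, comparing each extent only with its predecessor on the same device is enough to detect overlaps between adjacent extents, which is why the patch needs just prev_devid and prev_dev_ext_end.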
Re: [PATCH v3 2/6] btrfs-progs: lowmem check: Add check for overlapping dev extents
On 10/8/18 8:30 PM, Qu Wenruo wrote:
Add this check to check_dev_item(), since at that point we're already
iterating dev extents for dev item accounting.

Signed-off-by: Qu Wenruo

LGTM.

Reviewed-by: Su Yue

---
 check/mode-lowmem.c | 34
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1bce44f5658a..07c03cad77af 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4065,6 +4065,8 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	u64 dev_id;
 	u64 used;
 	u64 total = 0;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
 	int ret;
 
 	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
@@ -4086,8 +4088,16 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 		return REFERENCER_MISSING;
 	}
 
-	/* Iterate dev_extents to calculate the used space of a device */
+	/*
+	 * Iterate dev_extents to calculate the used space of a device
+	 *
+	 * Also make sure no dev extents overlap and end beyond device boundary
+	 */
 	while (1) {
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
 		if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
 			goto next;
 
@@ -4099,7 +4109,27 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 
 		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
 				     struct btrfs_dev_extent);
-		total += btrfs_dev_extent_length(path.nodes[0], ptr);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], ptr);
+
+		if (prev_devid == devid && physical_offset < prev_dev_ext_end) {
+			error(
+"dev extent devid %llu offset %llu len %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, physical_len,
+			      prev_dev_ext_end);
+			return ACCOUNTING_MISMATCH;
+		}
+		if (physical_offset + physical_len > total_bytes) {
+			error(
+"dev extent devid %llu offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      total_bytes);
+			return ACCOUNTING_MISMATCH;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+		total += physical_len;
 next:
 		ret = btrfs_next_item(dev_root, &path);
 		if (ret)
Re: [PATCH v3 4/6] btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
On 2018/10/9 6:20 AM, Hans van Kranenburg wrote:
> On 10/08/2018 02:30 PM, Qu Wenruo wrote:
>> Obviously, used bytes can't be larger than total bytes.
>>
>> Signed-off-by: Qu Wenruo
>> ---
>>  check/mode-lowmem.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
>> index 07c03cad77af..1173b963b8f3 100644
>> --- a/check/mode-lowmem.c
>> +++ b/check/mode-lowmem.c
>> @@ -4074,6 +4074,11 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>>  	used = btrfs_device_bytes_used(eb, dev_item);
>>  	total_bytes = btrfs_device_total_bytes(eb, dev_item);
>>
>> +	if (used > total_bytes) {
>> +		error("device %llu has incorrect used bytes %llu > total bytes %llu",
>> +			dev_id, used, total_bytes);
>> +		return ACCOUNTING_MISMATCH;
>
> The message and return code point at an error in accounting logic.

One of the biggest problems in lowmem mode is that we don't always have
the error code we really want.

And that's the case for this error message.
It's indeed not an accounting error, since in that case (just like the
test case image) the used bytes are accounted correctly.

The good news is that the return value is never really used to classify
the error. So as long as the error message makes sense, it's not a big
problem.

Thanks,
Qu

>
> However, if you have a fully allocated device and a DUP chunk ending
> beyond device, then having used > total_bytes is expected...
>
> So maybe there's two possibilities... There's an error in the accounting
> logic, or there's an "over-allocation", which is another type of issue
> which produces used > total with correct accounting logic.
>
>> +	}
>>  	key.objectid = dev_id;
>>  	key.type = BTRFS_DEV_EXTENT_KEY;
>>  	key.offset = 0;
>>
>
>
Re: merge status of per-chunk degradable check [was Re: Which device is missing ?]
On 10/09/2018 02:08 AM, Nicholas D Steeves wrote:
> On Mon, Oct 08, 2018 at 04:10:55PM +, Hugo Mills wrote:
>> On Mon, Oct 08, 2018 at 03:49:53PM +0200, Pierre Couderc wrote:
>>> I am trying to make a "RAID1" with /dev/sda2 and /dev/sdb (or similar).
>>>
>>> But I get strange statuses or errors about "missing devices" and I
>>> do not understand the current situation:
> [...]
>>    Note that, since the main FS is missing a device, it will probably
>> need to be mounted in degraded mode (-o degraded), and that on kernels
>> earlier than (IIRC) 4.14, this can only be done *once* without the FS
>> becoming more or less permanently read-only. On recent kernels, it
>> _should_ be OK.
>>
>> *WARNING ENDS*
>
> I think this was the patch that addressed this?:
>   https://www.spinics.net/lists/linux-btrfs/msg47283.html
>   https://patchwork.kernel.org/patch/7226931/
>
> In my notes it wasn't present in <= 4.14.15, but my notes might be
> wrong. Does this patch resolve the one-shot -o degraded, reboot,
> forever read-only behaviour, or is something else required? When was
> it merged? Has it been or will it be backported to 4.14.x? I'm
> guessing 4.9.x is too far back, but it would be really nice to see it
> there too :-)
>
> Also, will this issue be resolved for linux-4.19? If so I'd like to
> update the Debian btrfs wiki with this good news :-)
[...]
> P.S. Please let me know if you'd prefer for me to shift this
> documentation effort to btrfs.wiki.kernel.org.

Yes, absolutely. This is not specific to how we do things for Debian.
Upstream documentation can help all distros.

-- 
Hans van Kranenburg
merge status of per-chunk degradable check [was Re: Which device is missing ?]
On Mon, Oct 08, 2018 at 04:10:55PM +, Hugo Mills wrote:
> On Mon, Oct 08, 2018 at 03:49:53PM +0200, Pierre Couderc wrote:
> > I am trying to make a "RAID1" with /dev/sda2 and /dev/sdb (or similar).
> >
> > But I get strange statuses or errors about "missing devices" and I
> > do not understand the current situation:
[...]
>    Note that, since the main FS is missing a device, it will probably
> need to be mounted in degraded mode (-o degraded), and that on kernels
> earlier than (IIRC) 4.14, this can only be done *once* without the FS
> becoming more or less permanently read-only. On recent kernels, it
> _should_ be OK.
>
> *WARNING ENDS*

I think this was the patch that addressed this?:
  https://www.spinics.net/lists/linux-btrfs/msg47283.html
  https://patchwork.kernel.org/patch/7226931/

In my notes it wasn't present in <= 4.14.15, but my notes might be
wrong. Does this patch resolve the one-shot -o degraded, reboot,
forever read-only behaviour, or is something else required? When was
it merged? Has it been or will it be backported to 4.14.x? I'm
guessing 4.9.x is too far back, but it would be really nice to see it
there too :-)

Also, will this issue be resolved for linux-4.19? If so I'd like to
update the Debian btrfs wiki with this good news :-)

Finally, is the following a valid workaround for users who don't have
access to a kernel containing this fix:

1. Make a raid1 profile volume (both data and metadata) with >= 3 disks.
2. Lose one disk.
3. Allocator continues to write raid1 chunks instead of single,
   because it is still possible to write one chunk to two disks.
4. Thus reboot twice -> forever read-only averted?

Kind regards,
Nicholas

P.S. Please let me know if you'd prefer for me to shift this
documentation effort to btrfs.wiki.kernel.org.
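To make the idea behind a per-chunk degradable check concrete, here is a toy model in Python. It is a sketch under stated assumptions and not kernel code: chunks are represented as (profile, list of devids), and the TOLERATED_FAILURES table reflects the usual redundancy of each profile as I understand it. A global check compares the number of missing devices against the weakest profile on the filesystem; the per-chunk variant asks, for each chunk, whether that chunk lost more stripes than its own profile can tolerate.

```python
# Assumed number of missing devices each profile survives, per chunk.
TOLERATED_FAILURES = {
    "single": 0,
    "dup": 0,
    "raid0": 0,
    "raid1": 1,
    "raid10": 1,
    "raid5": 1,
    "raid6": 2,
}


def can_mount_degraded_rw(chunks, missing_devids):
    """Return True if every chunk is still usable with the given devices missing.

    chunks: iterable of (profile, [devid, ...]) describing where each
            chunk's stripes live (hypothetical representation).
    """
    missing = set(missing_devids)
    for profile, devids in chunks:
        lost = sum(1 for d in devids if d in missing)
        if lost > TOLERATED_FAILURES[profile]:
            return False
    return True


# Two-disk raid1 that was mounted degraded once with devid 1 missing:
# new chunks could only be written as "single" on the surviving devid 2.
after_degraded_write = [("raid1", [1, 2]), ("single", [2])]
print(can_mount_degraded_rw(after_degraded_write, missing_devids=[1]))

# Three-disk raid1 with devid 1 lost: raid1 chunks can still be
# allocated on devids 2 and 3, so no single chunks appear.
three_disk = [("raid1", [1, 2]), ("raid1", [2, 3])]
print(can_mount_degraded_rw(three_disk, missing_devids=[1]))
```

In this model the single chunk lives entirely on a device that is still present, so a per-chunk check accepts it, whereas a global "single tolerates zero missing devices" rule would not. The three-disk case corresponds to the workaround asked about above.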
Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
On 10/08/2018 03:19 PM, Hans van Kranenburg wrote:
> On 10/08/2018 08:43 AM, Qu Wenruo wrote:
>>
>>
>> On 2018/10/5 6:58 PM, Hans van Kranenburg wrote:
>>> On 10/05/2018 09:51 AM, Qu Wenruo wrote:
>>>> On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
>>>>> This patch set contains an additional fix for a newly exposed bug after
>>>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>>>
>>>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c...@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>>>
>>>> For that bug, did you succeed in reproducing the bug?
>>>
>>> Yes, open the above link and scroll to "Steps to reproduce".
>>
>> That's beyond device boundary one. Also reproduced here.
>> And hand-crafted a super small image as test case for btrfs-progs.
>>
>> But I'm a little curious about the dev extent overlapping case.
>> Have you got one?
>
> Ah ok, I see. No, I didn't do that yet.
>
> By using unmodified tooling, I think this can be done by a combination
> of a few resizings and using very specific balance to cause a fs of
> exactly 7880MiB again with a single 1578MiB gap inside...
>
> I'll try later today to see if I can come up with a recipe for this.

Ok, this is actually pretty simple to do:

---- >8 ----

-# mkdir bork
-# cd bork
-# dd if=/dev/zero of=image bs=1 count=0 seek=1024M
0+0 records in
0+0 records out
0 bytes copied, 0.000183343 s, 0.0 kB/s
-# mkfs.btrfs -d dup -m dup image
-# losetup -f image
-# mount -o space_cache=v2 /dev/loop0 mountpoint
-# dd if=/dev/zero of=mountpoint/flapsie bs=1M
dd: error writing 'mountpoint/flapsie': No space left on device
453+0 records in
452+0 records out
474185728 bytes (474 MB, 452 MiB) copied, 4.07663 s, 116 MB/s

---- >8 ----

-# ./show_usage.py /bork/mountpoint/
Target profile for SYSTEM (chunk tree): DUP
Target profile for METADATA: DUP
Target profile for DATA: DUP
Mixed groups: False

Virtual space usage by block group type:
|
| type            total         used
| ----            -----         ----
| Data        452.31MiB    452.22MiB
| System        8.00MiB     16.00KiB
| Metadata     51.19MiB    656.00KiB

Total raw filesystem size: 1.00GiB
Total raw allocated bytes: 1023.00MiB
Allocatable bytes remaining: 1.00MiB
Unallocatable raw bytes: 0.00B
Unallocatable bytes that can be reclaimed by balancing: 0.00B

Estimated virtual space left to use for metadata: 51.05MiB
Estimated virtual space left to use for data: 96.00KiB

Allocated raw disk bytes by chunk type. Parity is a reserved part of the
allocated bytes, limiting the amount that can be used for data or metadata:
|
| flags           allocated         used       parity
| -----           ---------         ----       ------
| DATA|DUP        904.62MiB    904.44MiB        0.00B
| SYSTEM|DUP       16.00MiB     32.00KiB        0.00B
| METADATA|DUP    102.38MiB      1.28MiB        0.00B

Allocated bytes per device:
|
| devid      total size    allocated path
| -----      ----------    --------- ----
|     1         1.00GiB   1023.00MiB /dev/loop0

Allocated bytes per device, split up per chunk type. Parity bytes are again
part of the total amount of allocated bytes.
|
| Device ID: 1
| | flags           allocated       parity
| | -----           ---------       ------
| | DATA|DUP        904.62MiB        0.00B
| | SYSTEM|DUP       16.00MiB        0.00B
| | METADATA|DUP    102.38MiB        0.00B

Unallocatable raw bytes per device:
|
| devid    unallocatable
| -----    -------------
|     1            0.00B

---- >8 ----

Now we grow the image by exactly 1578MiB (from 1024M to 2602M) and fill
it, so that we have a neat 1578MiB that we're going to free up later to
reproduce the failure:

-# dd if=/dev/zero of=image bs=1 count=0 seek=2602M
0+0 records in
0+0 records out
0 bytes copied, 0.000188621 s, 0.0 kB/s
-# losetup -c /dev/loop0
-# btrfs fi resize max mountpoint/
Resize 'mountpoint/' of 'max'
-# dd if=/dev/zero of=mountpoint/1578MiB bs=1M
dd: error writing 'mountpoint/1578MiB': No space left on device
790+0 records in
789+0 records out
827326464 bytes (827 MB, 789 MiB) copied, 12.2452 s, 67.6 MB/s

---- >8 ----

-# python3
import btrfs
fs = btrfs.FileSystem('/bork/mountpoint/')
for d in fs.dev_extents():
    print("start {} end {} vaddr {}".format(d.paddr, d.paddr + d.length, d.vaddr))

start 1048576 end 11534336 vaddr 547880960
start 11534336 end 22020096 vaddr 547880960
start 22020096 end 30408704 vaddr 22020096
start 30408704 end 38797312 vaddr 22020096
start 38797312 end 92471296 vaddr 30408704
start 92471296 end 146145280 vaddr 30408704
start 146145280 end 213254144 vaddr 84082688
start 213254144 end 280363008 vaddr 84082688
start 280363008 end 397803520 vaddr 151191552
start 397803520 end 515244032 vaddr 151191552
start 515244032 end 632684544 vaddr 268632064
start 632684544 end 750125056 vaddr 268632064
start 750125056 end 867565568 vaddr 386072576
start 867565568 end 985006080 vaddr 386072576
start 985006080 end
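
Since the recipe is about producing a gap of a specific size between dev extents, a small helper for spotting such gaps may be useful. The sketch below is an assumption-laden illustration and does not use python-btrfs: it takes (start, end) physical byte ranges for one device, such as the pairs printed in a listing like the one above, and the example values are made up rather than taken from the real filesystem.

```python
MiB = 1024 * 1024


def find_gaps(extents, device_size):
    """Return unallocated (start, end) ranges on a device.

    extents: iterable of (start, end) physical byte ranges of dev extents.
    device_size: total size of the device in bytes.
    """
    gaps = []
    cursor = 0  # end of the furthest allocation seen so far
    for start, end in sorted(extents):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < device_size:
        gaps.append((cursor, device_size))
    return gaps


# Hypothetical layout: two adjacent extents, a hole, then one more extent.
example = [
    (1 * MiB, 11 * MiB),
    (11 * MiB, 21 * MiB),
    (1599 * MiB, 1700 * MiB),
]

for start, end in find_gaps(example, 2602 * MiB):
    print("gap at {} MiB, size {} MiB".format(start // MiB, (end - start) // MiB))
```

With real data, the (start, end) pairs could be built from d.paddr and d.paddr + d.length in the loop shown in the message.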
[PATCH v2] fstests: btrfs verify hardening agaist duplicate fsid
We have a known bug in btrfs: the device path can be changed after the
device has been mounted. Through this loophole, a newly copied device
appears to be mounted immediately after it has been copied. This test
case reproduces the issue.

For example:

Initially, /dev/mmcblk0p4 is mounted as /:

lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0     179:0    0 29.2G  0 disk
|-mmcblk0p4 179:4    0    4G  0 part /
|-mmcblk0p2 179:2    0  500M  0 part /boot
|-mmcblk0p3 179:3    0  256M  0 part [SWAP]
`-mmcblk0p1 179:1    0  256M  0 part /boot/efi

btrfs fi show
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
	Total devices 1 FS bytes used 1.40GiB
	devid    1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda:

dd if=/dev/mmcblk0 of=/dev/sda

Immediately after the copy completes, the change in the device
superblock is notified, the automount scans the device using btrfs
device scan, and the new device sda becomes the mounted root device.

lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    1 14.9G  0 disk
|-sda4        8:4    1    4G  0 part /
|-sda2        8:2    1  500M  0 part
|-sda3        8:3    1  256M  0 part
`-sda1        8:1    1  256M  0 part
mmcblk0     179:0    0 29.2G  0 disk
|-mmcblk0p4 179:4    0    4G  0 part
|-mmcblk0p2 179:2    0  500M  0 part /boot
|-mmcblk0p3 179:3    0  256M  0 part [SWAP]
`-mmcblk0p1 179:1    0  256M  0 part /boot/efi

btrfs fi show /
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
	Total devices 1 FS bytes used 1.40GiB
	devid    1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty: you can unmount neither /dev/sda4 nor
/dev/mmcblk0p4. And the problem is not solved until you move sda to
another system and change its fsid there using the 'btrfstune -u'
command.

Signed-off-by: Anand Jain
---
 tests/btrfs/173     | 88 +
 tests/btrfs/173.out |  6
 tests/btrfs/group   |  1 +
 3 files changed, 95 insertions(+)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out

diff --git a/tests/btrfs/173 b/tests/btrfs/173
new file mode 100755
index 000000000000..b466ae921e19
--- /dev/null
+++ b/tests/btrfs/173
@@ -0,0 +1,88 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Oracle. All Rights Reserved.
+#
+# FS QA Test 173
+#
+# Fuzzy test for FS image duplication.
+# Could be fixed by
+#	[patch] btrfs: harden agaist duplicate fsid
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+mnt=$TEST_DIR/$seq.mnt
+_cleanup()
+{
+	rm -rf $mnt > /dev/null 2>&1
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_dev_pool 2
+_scratch_dev_pool_get 2
+
+dev_foo=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
+dev_bar=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
+
+echo dev_foo=$dev_foo >> $seqres.full
+echo dev_bar=$dev_bar >> $seqres.full
+echo | tee -a $seqres.full
+
+rm -rf $mnt > /dev/null 2>&1
+mkdir $mnt
+_mkfs_dev $dev_foo
+_mount $dev_foo $mnt
+
+check_btrfs_mount()
+{
+	local x=$(findmnt $mnt | grep -v TARGET | awk '{print $2}')
+	[[ $x == $dev_foo ]] && echo DEV_FOO
+	[[ $x == $dev_bar ]] && echo DEV_BAR
+}
+
+echo MNT $(check_btrfs_mount)
+
+for sb_bytenr in 65536 67108864
+do
+	echo -n "dd status=none if=$dev_foo of=$dev_bar bs=1 "\
+		"seek=$sb_bytenr skip=$sb_bytenr count=4096" >> $seqres.full
+	dd status=none if=$dev_foo of=$dev_bar bs=1 seek=$sb_bytenr \
+		skip=$sb_bytenr count=4096 >> $seqres.full 2>&1
+	echo ..:$? >> $seqres.full
+done
+
+#Original device is mounted, scan of its clone should fail
+$BTRFS_UTIL_PROG device scan $dev_bar >> $seqres.full 2>&1
+echo btrfs device scan dev_bar ...:$?| tee -a $seqres.full
+
+echo MNT $(check_btrfs_mount)
+
+#Original device scan should be successful
+$BTRFS_UTIL_PROG device scan $dev_foo >> $seqres.full 2>&1
+echo btrfs device scan dev_foo ...:$?| tee -a $seqres.full
+
+umount $mnt > /dev/null 2>&1
+_scratch_dev_pool_put
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/173.out b/tests/btrfs/173.out
new file mode 100644
index 000000000000..3c7e3fb4e3f7
--- /dev/null
+++ b/tests/btrfs/173.out
@@ -0,0 +1,6 @@
+QA output created by 173
+
+MNT DEV_FOO
+btrfs device scan dev_bar ...:1
+MNT DEV_FOO
+btrfs device scan dev_foo ...:0
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 45782565c3b7..b2f1393f3e97 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -175,3 +175,4 @@
 170 auto quick snapshot
 171 auto quick
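
The test above clones the superblock copies at byte offsets 65536 and 67108864, which is enough to make the second device carry the same fsid. As a rough illustration of what "same fsid" means on disk, here is a sketch that reads the fsid out of a superblock. It relies on my understanding of the on-disk layout (a 32-byte checksum, then the 16-byte fsid at offset 0x20, and the magic "_BHRfS_M" at offset 0x40 of the superblock); the helper names and the in-memory fake images are hypothetical and only there to keep the example self-contained.

```python
import io
import uuid

PRIMARY_SB_OFFSET = 65536      # first superblock copy, as in the test loop
FSID_OFFSET = 0x20             # follows the 32-byte checksum
MAGIC_OFFSET = 0x40
BTRFS_MAGIC = b"_BHRfS_M"


def read_fsid(dev, sb_offset=PRIMARY_SB_OFFSET):
    """Return the fsid (uuid.UUID) stored in the superblock, or None.

    dev: a seekable binary file object (block device or image file).
    """
    dev.seek(sb_offset)
    sb = dev.read(4096)
    if len(sb) < MAGIC_OFFSET + len(BTRFS_MAGIC):
        return None
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + len(BTRFS_MAGIC)] != BTRFS_MAGIC:
        return None
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])


def make_fake_image(fsid):
    """Build a minimal in-memory image with only fsid and magic filled in."""
    image = bytearray(PRIMARY_SB_OFFSET + 4096)
    base = PRIMARY_SB_OFFSET
    image[base + FSID_OFFSET:base + FSID_OFFSET + 16] = fsid.bytes
    image[base + MAGIC_OFFSET:base + MAGIC_OFFSET + 8] = BTRFS_MAGIC
    return io.BytesIO(bytes(image))


fsid = uuid.UUID("07892354-ddaa-4443-90ea-f76a06accaba")
original = make_fake_image(fsid)
clone = make_fake_image(fsid)          # what a dd copy would look like
blank = io.BytesIO(bytes(PRIMARY_SB_OFFSET + 4096))

print(read_fsid(original) == read_fsid(clone))
print(read_fsid(blank))
```

On a real system the same function could be pointed at an opened device node (this needs read permission on the device), for example to compare two devices before mounting either of them.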
qgroups not enabled, but perf stats reports btrfs_qgroup_release_data and btrfs_qgroup_free_delayed_ref
[chris@flap ~]$ sudo perf stat -e 'btrfs:*' -a sleep 70
##And then I loaded a few sites in Firefox early on in those 70 seconds.

 Performance counter stats for 'system wide':

         5      btrfs:btrfs_transaction_commit
        29      btrfs:btrfs_inode_new
        29      btrfs:btrfs_inode_request
        25      btrfs:btrfs_inode_evict
     1,602      btrfs:btrfs_get_extent
         0      btrfs:btrfs_handle_em_exist
         1      btrfs:btrfs_get_extent_show_fi_regular
        88      btrfs:btrfs_truncate_show_fi_regular
        19      btrfs:btrfs_get_extent_show_fi_inline
         2      btrfs:btrfs_truncate_show_fi_inline
       189      btrfs:btrfs_ordered_extent_add
       189      btrfs:btrfs_ordered_extent_remove
         9      btrfs:btrfs_ordered_extent_start
       592      btrfs:btrfs_ordered_extent_put
     1,207      btrfs:__extent_writepage
     1,203      btrfs:btrfs_writepage_end_io_hook
        25      btrfs:btrfs_sync_file
         0      btrfs:btrfs_sync_fs
         0      btrfs:btrfs_add_block_group
     1,508      btrfs:add_delayed_tree_ref
     1,498      btrfs:run_delayed_tree_ref
       379      btrfs:add_delayed_data_ref
       336      btrfs:run_delayed_data_ref
     1,887      btrfs:add_delayed_ref_head
     1,839      btrfs:run_delayed_ref_head
         0      btrfs:btrfs_chunk_alloc
         0      btrfs:btrfs_chunk_free
       794      btrfs:btrfs_cow_block
     6,982      btrfs:btrfs_space_reservation
         0      btrfs:btrfs_trigger_flush
         0      btrfs:btrfs_flush_space
       952      btrfs:btrfs_reserved_extent_alloc
         0      btrfs:btrfs_reserved_extent_free
     1,005      btrfs:find_free_extent
     1,005      btrfs:btrfs_reserve_extent
       816      btrfs:btrfs_reserve_extent_cluster
         1      btrfs:btrfs_find_cluster
         0      btrfs:btrfs_failed_cluster_setup
         1      btrfs:btrfs_setup_cluster
     5,952      btrfs:alloc_extent_state
     6,034      btrfs:free_extent_state
       374      btrfs:btrfs_work_queued
       362      btrfs:btrfs_work_sched
       362      btrfs:btrfs_all_work_done
       116      btrfs:btrfs_ordered_sched
         0      btrfs:btrfs_workqueue_alloc
         0      btrfs:btrfs_workqueue_destroy
         0      btrfs:btrfs_qgroup_reserve_data
       201      btrfs:btrfs_qgroup_release_data
     1,839      btrfs:btrfs_qgroup_free_delayed_ref
         0      btrfs:btrfs_qgroup_account_extents
         0      btrfs:btrfs_qgroup_trace_extent
         0      btrfs:btrfs_qgroup_account_extent
         0      btrfs:qgroup_update_counters
         0      btrfs:qgroup_update_reserve
         0      btrfs:qgroup_meta_reserve
         0      btrfs:qgroup_meta_convert
         0      btrfs:qgroup_meta_free_all_pertrans
         0      btrfs:btrfs_prelim_ref_merge
         0      btrfs:btrfs_prelim_ref_insert
     2,663      btrfs:btrfs_inode_mod_outstanding_extents
         0      btrfs:btrfs_remove_block_group
         0      btrfs:btrfs_add_unused_block_group
         0      btrfs:btrfs_skip_unused_block_group

      70.004723586 seconds time elapsed

[chris@flap ~]$

Seems like a lot of activity for just a few transactions, but what
really caught my eye here is the qgroup reporting for a file system
that has never had qgroups enabled. Is it expected?

Chris Murphy
Re: [PATCH v3 4/6] btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
On 10/08/2018 02:30 PM, Qu Wenruo wrote:
> Obviously, used bytes can't be larger than total bytes.
>
> Signed-off-by: Qu Wenruo
> ---
>  check/mode-lowmem.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 07c03cad77af..1173b963b8f3 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -4074,6 +4074,11 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>  	used = btrfs_device_bytes_used(eb, dev_item);
>  	total_bytes = btrfs_device_total_bytes(eb, dev_item);
>
> +	if (used > total_bytes) {
> +		error("device %llu has incorrect used bytes %llu > total bytes %llu",
> +			dev_id, used, total_bytes);
> +		return ACCOUNTING_MISMATCH;

The message and return code point at an error in accounting logic.

However, if you have a fully allocated device and a DUP chunk ending
beyond the device, then having used > total_bytes is expected...

So maybe there are two possibilities... There's an error in the
accounting logic, or there's an "over-allocation", which is another type
of issue which produces used > total with correct accounting logic.

> +	}
>  	key.objectid = dev_id;
>  	key.type = BTRFS_DEV_EXTENT_KEY;
>  	key.offset = 0;
>

-- 
Hans van Kranenburg
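
The two possibilities named in the review can be told apart mechanically if the dev extents of the device are at hand. The following is a hedged sketch of that distinction, not what btrfs check does: the function name, the return strings and the (physical_offset, length) representation are all assumptions made for illustration.

```python
def classify_dev_item(used, total_bytes, dev_extents):
    """Classify a dev item whose numbers look suspicious.

    used:        bytes_used recorded in the dev item
    total_bytes: total_bytes recorded in the dev item
    dev_extents: list of (physical_offset, length) for this device
    """
    allocated = sum(length for _, length in dev_extents)
    furthest_end = max((off + length for off, length in dev_extents), default=0)

    # The recorded number disagrees with what the dev extents add up to.
    if used != allocated:
        return "accounting mismatch"
    # The numbers agree, but space was handed out past the device end.
    if used > total_bytes or furthest_end > total_bytes:
        return "over-allocation"
    return "ok"


# Device of 1000 bytes with a second extent that ends at 1200.
extents = [(0, 600), (600, 600)]
print(classify_dev_item(1200, 1000, extents))   # used matches the extents
print(classify_dev_item(1300, 1000, extents))   # used does not match
print(classify_dev_item(600, 1000, [(0, 600)]))
```

In the first call the accounting is internally consistent and only the allocation is wrong, which is the situation described for a DUP chunk ending beyond the device.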
Re: Which device is missing ?
On 10/08/2018 11:21 PM, Hugo Mills wrote:
> On Mon, Oct 08, 2018 at 11:01:35PM +0200, Pierre Couderc wrote:
>> On 10/08/2018 06:14 PM, Hugo Mills wrote:
>>> On Mon, Oct 08, 2018 at 04:10:55PM +, Hugo Mills wrote:
>>>> On Mon, Oct 08, 2018 at 03:49:53PM +0200, Pierre Couderc wrote:
>>>>> I am trying to make a "RAID1" with /dev/sda2 and /dev/sdb (or similar).
>>>>>
>>>>> But I get strange statuses or errors about "missing devices" and I
>>>>> do not understand the current situation:
>>>>>
>>>>>
>>>>> root@server:~# btrfs fi show
>>>>> Label: none  uuid: 28c2b7ab-631c-40a3-bab7-00dac5dd20eb
>>>>>         Total devices 1 FS bytes used 190.91GiB
>>>>>         devid    1 size 1.82TiB used 196.02GiB path /dev/sda2
>>>>>
>>>>> warning, device 1 is missing
>>>>> Label: none  uuid: 2d45149a-fb97-4c2a-bae2-4cfe4e01a8aa
>>>>>         Total devices 2 FS bytes used 116.18GiB
>>>>>         devid    2 size 1.82TiB used 118.03GiB path /dev/sdb
>>>>>         *** Some devices missing
>>>>    This looks like you've created a RAID-1 array with /dev/sda2 and
>>>> /dev/sdb, and then run mkfs.btrfs again on /dev/sda2, overwriting the
>>>> original [part of a] filesystem on /dev/sda2, and replacing it with a
>>>> wholly different filesystem. Since the new FS on /dev/sda2 (UUID
>>>> 28c2...) doesn't have the same UUID as the original FS (UUID 2d45...),
>>>> and the original FS was made of two devices, btrfs fi show is telling
>>>> you that there's some devices missing -- /dev/sda2 is no longer part
>>>> of that FS, and is therefore a missing device.
>>>>
>>>>    I note that you've got data on both filesystems, so they must both
>>>> have been mounted somewhere and had stuff put on them.
>>>>
>>>>    I recommend doing something like this:
>>>>
>>>> # mkdir /media/btrfs/myraid1 /media/btrfs/tmp
>>>> # mount /dev/sdb /media/btrfs/myraid1/
>>>> # mount /dev/sda2 /media/btrfs/tmp/          # mount both filesystems
>>>> # cp /media/btrfs/tmp/* /media/btrfs/myraid1 # put it where you want it
>>>> # umount /media/btrfs/tmp/
>>>> # wipefs /dev/sda2                           # destroy the FS on sda2
>>>> # btrfs replace start 1 /dev/sda2 /media/btrfs/myraid1/
>>>>
>>>>    This will copy all the data from the filesystem on /dev/sda2 into
>>>> the filesystem on /dev/sdb, destroy the FS on sda2, and then use sda2
>>>> as the second device for the main FS.
>>>>
>>>> *WARNING!*
>>>>
>>>>    Note that, since the main FS is missing a device, it will probably
>>>> need to be mounted in degraded mode (-o degraded), and that on kernels
>>>> earlier than (IIRC) 4.14, this can only be done *once* without the FS
>>>> becoming more or less permanently read-only. On recent kernels, it
>>>> _should_ be OK.
>>>>
>>>> *WARNING ENDS*
>>>    Oh, and for the record, to make a RAID-1 filesystem from scratch,
>>> you simply need this:
>>>
>>> # mkfs.btrfs -m raid1 -d raid1 /dev/sda2 /dev/sdb
>>>
>>>    You do not need to run mkfs.btrfs on each device separately.
>>>
>>>    Hugo.
>> Thank you very much. I understand a bit better. I think that I have
>> nothing of interest on /dev/sdb and that its contents are the result
>> of previous trials.
>> And that my system is on /dev/sda2, as:
>>
>> root@server:~# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> udev            3.9G     0  3.9G   0% /dev
>> tmpfs           787M  8.8M  778M   2% /run
>> /dev/sda2       1.9T  193G  1.7T  11% /
>> tmpfs           3.9G     0  3.9G   0% /dev/shm
>> tmpfs           5.0M     0  5.0M   0% /run/lock
>> tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
>> /dev/sda1       511M  5.7M  506M   2% /boot/efi
>> tmpfs           100K     0  100K   0% /var/lib/lxd/shmounts
>> tmpfs           100K     0  100K   0% /var/lib/lxd/devlxd
>> root@server:~#
>>
>> Is that right?
>    Yes, it looks like you're running / from the FS on /dev/sda2.
>
>> If yes, I suppose I should wipe data on /dev/sdb, then build the
>> RAID by expanding /dev/sda2.
>    Correct.
>
>    I would recommend putting a partition table on /dev/sdb, because it
> doesn't take up much space, and it's always easier to have one already
> there when you need it (and there's a few things that can get confused
> if there isn't a partition table).
>
>> So I should:
>>
>> wipefs /dev/sdb
>> btrfs device add /dev/sdb /
>> btrfs balance start -v -mconvert=raid1 -dconvert=raid1 /
>>
>> Does it sound correct? (my kernel is boot/vmlinuz-4.18.0-1-amd64)
>    Yes, exactly.
>
>    Hugo.
>
Thank you very, very much. I am doing it now, as you suggest, with a
partition table on /dev/sdb!
Re: Which device is missing ?
On Mon, Oct 08, 2018 at 11:01:35PM +0200, Pierre Couderc wrote:
> On 10/08/2018 06:14 PM, Hugo Mills wrote:
> >On Mon, Oct 08, 2018 at 04:10:55PM +, Hugo Mills wrote:
> >>On Mon, Oct 08, 2018 at 03:49:53PM +0200, Pierre Couderc wrote:
> >>>I am trying to make a "RAID1" with /dev/sda2 and /dev/sdb (or similar).
> >>>
> >>>But I get strange statuses or errors about "missing devices" and I
> >>>do not understand the current situation:
> >>>
> >>>
> >>>root@server:~# btrfs fi show
> >>>Label: none  uuid: 28c2b7ab-631c-40a3-bab7-00dac5dd20eb
> >>>        Total devices 1 FS bytes used 190.91GiB
> >>>        devid    1 size 1.82TiB used 196.02GiB path /dev/sda2
> >>>
> >>>warning, device 1 is missing
> >>>Label: none  uuid: 2d45149a-fb97-4c2a-bae2-4cfe4e01a8aa
> >>>        Total devices 2 FS bytes used 116.18GiB
> >>>        devid    2 size 1.82TiB used 118.03GiB path /dev/sdb
> >>>        *** Some devices missing
> >>This looks like you've created a RAID-1 array with /dev/sda2 and
> >>/dev/sdb, and then run mkfs.btrfs again on /dev/sda2, overwriting the
> >>original [part of a] filesystem on /dev/sda2, and replacing it with a
> >>wholly different filesystem. Since the new FS on /dev/sda2 (UUID
> >>28c2...) doesn't have the same UUID as the original FS (UUID 2d45...),
> >>and the original FS was made of two devices, btrfs fi show is telling
> >>you that there's some devices missing -- /dev/sda2 is no longer part
> >>of that FS, and is therefore a missing device.
> >>
> >>I note that you've got data on both filesystems, so they must both
> >>have been mounted somewhere and had stuff put on them.
> >>
> >>I recommend doing something like this:
> >>
> >># mkdir /media/btrfs/myraid1 /media/btrfs/tmp
> >># mount /dev/sdb /media/btrfs/myraid1/
> >># mount /dev/sda2 /media/btrfs/tmp/          # mount both filesystems
> >># cp /media/btrfs/tmp/* /media/btrfs/myraid1 # put it where you want it
> >># umount /media/btrfs/tmp/
> >># wipefs /dev/sda2                           # destroy the FS on sda2
> >># btrfs replace start 1 /dev/sda2 /media/btrfs/myraid1/
> >>
> >>This will copy all the data from the filesystem on /dev/sda2 into
> >>the filesystem on /dev/sdb, destroy the FS on sda2, and then use sda2
> >>as the second device for the main FS.
> >>
> >>*WARNING!*
> >>
> >>Note that, since the main FS is missing a device, it will probably
> >>need to be mounted in degraded mode (-o degraded), and that on kernels
> >>earlier than (IIRC) 4.14, this can only be done *once* without the FS
> >>becoming more or less permanently read-only. On recent kernels, it
> >>_should_ be OK.
> >>
> >>*WARNING ENDS*
> >Oh, and for the record, to make a RAID-1 filesystem from scratch,
> >you simply need this:
> >
> ># mkfs.btrfs -m raid1 -d raid1 /dev/sda2 /dev/sdb
> >
> >You do not need to run mkfs.btrfs on each device separately.
> >
> >Hugo.
> Thank you very much. I understand a bit better. I think that I have
> nothing of interest on /dev/sdb and that its contents are the result
> of previous trials.
> And that my system is on /dev/sda2, as:
>
> root@server:~# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> udev            3.9G     0  3.9G   0% /dev
> tmpfs           787M  8.8M  778M   2% /run
> /dev/sda2       1.9T  193G  1.7T  11% /
> tmpfs           3.9G     0  3.9G   0% /dev/shm
> tmpfs           5.0M     0  5.0M   0% /run/lock
> tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
> /dev/sda1       511M  5.7M  506M   2% /boot/efi
> tmpfs           100K     0  100K   0% /var/lib/lxd/shmounts
> tmpfs           100K     0  100K   0% /var/lib/lxd/devlxd
> root@server:~#
>
> Is that right?

   Yes, it looks like you're running / from the FS on /dev/sda2.

> If yes, I suppose I should wipe data on /dev/sdb, then build the
> RAID by expanding /dev/sda2.
Correct. I would recommend putting a partition table on /dev/sdb,
because it doesn't take up much space, and it's always easier to have
one already there when you need it (and there's a few things that can
get confused if there isn't a partition table).

> So I should :
>
> wipefs /dev/sdb
> btrfs device add /dev/sdb /
> btrfs balance start -v -mconvert=raid1 -dconvert=raid1 /
> Does it sound correct ? (my kernel is boot/vmlinuz-4.18.0-1-amd64)

Yes, exactly.

Hugo.

-- 
Hugo Mills             | Yes, this is an example of something that becomes
hugo@... carfax.org.uk | less explosive as a one-to-one cocrystal with TNT.
http://carfax.org.uk/  | (Hexanitrohexaazaisowurtzitane)
PGP: E2AB1DE4          |                                         Derek Lowe
Re: Curious problem: btrfs device stats & unpriviliged access
On 10/08/2018 06:37 PM, Holger Hoffstätte wrote:
> On 10/08/18 17:46, Hans van Kranenburg wrote:
>
>> fs.devices() also looks for dev_items in the chunk tree:
>>
>> https://github.com/knorrie/python-btrfs/blob/master/btrfs/ctree.py#L481
>>
>> So, BOOM! you need root.
>>
>> Or just start a 0, ignore errors and start trying all devids until you
>> found num_devices amount of them that work, yolo.
>
> Since I need to walk /sys/fs/btrfs/ anyway I *think* I can just look
> at the entries in /sys/fs/btrfs//devices/ and query them all
> directly.

But, you still need root for that right? The progs code does a RO open
directly on the block device.

-$ btrfs dev stats /dev/xvdb
ERROR: cannot open /dev/xvdb: Permission denied
ERROR: '/dev/xvdb' is not a mounted btrfs device

stat("/dev/loop0", {st_mode=S_IFBLK|0660, st_rdev=makedev(7, 0), ...}) = 0
stat("/dev/loop0", {st_mode=S_IFBLK|0660, st_rdev=makedev(7, 0), ...}) = 0
open("/dev/loop0", O_RDONLY) = -1 EACCES (Permission denied)

But:

-# btrfs dev stats /dev/xvdb
[/dev/xvdb].write_io_errs    0
[/dev/xvdb].read_io_errs     0
[/dev/xvdb].flush_io_errs    0
[/dev/xvdb].corruption_errs  0
[/dev/xvdb].generation_errs  0

> The skeleton btrfs_exporter already responds to http requests and
> returns dummy metrics, using the official python client library.
> I've found a nice little python sysfs scraper and now only need to
> figure out how best to map the btrfs info in sysfs to useful metrics.
> The privileged access issue would only have come into play much later,
> but it seems I can avoid it after all, which is great.
> I'm also (re-)learning python in the process, so that's the actual
> thing slowing me down.. :)

-- 
Hans van Kranenburg
Re: [PATCH] fstests: btrfs verify hardening agaist duplicate fsid
On 10/06/2018 06:14 PM, Eryu Guan wrote: On Mon, Oct 01, 2018 at 04:44:35PM +0800, Anand Jain wrote: We have a known bug in btrfs, that we let the device path be changed after the device has been mounted. So using this loop hole the new copied device would appears as if its mounted immediately after its been copied. So this test case reproduces this issue. For example: Initially.. /dev/mmcblk0p4 is mounted as / lsblk NAMEMAJ:MIN RM SIZE RO TYPE MOUNTPOINT mmcblk0 179:00 29.2G 0 disk |-mmcblk0p4 179:404G 0 part / |-mmcblk0p2 179:20 500M 0 part /boot |-mmcblk0p3 179:30 256M 0 part [SWAP] `-mmcblk0p1 179:10 256M 0 part /boot/efi btrfs fi show Label: none uuid: 07892354-ddaa-4443-90ea-f76a06accaba Total devices 1 FS bytes used 1.40GiB devid1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4 Copy mmcblk0 to sda dd if=/dev/mmcblk0 of=/dev/sda And immediately after the copy completes the change in the device superblock is notified which the automount scans using btrfs device scan and the new device sda becomes the mounted root device. lsblk NAMEMAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:01 14.9G 0 disk |-sda48:414G 0 part / |-sda28:21 500M 0 part |-sda38:31 256M 0 part `-sda18:11 256M 0 part mmcblk0 179:00 29.2G 0 disk |-mmcblk0p4 179:404G 0 part |-mmcblk0p2 179:20 500M 0 part /boot |-mmcblk0p3 179:30 256M 0 part [SWAP] `-mmcblk0p1 179:10 256M 0 part /boot/efi btrfs fi show / Label: none uuid: 07892354-ddaa-4443-90ea-f76a06accaba Total devices 1 FS bytes used 1.40GiB devid1 size 4.00GiB used 3.00GiB path /dev/sda4 The bug is quite nasty that you can't either unmount /dev/sda4 or /dev/mmcblk0p4. And the problem does not get solved until you take the sda out of the system on to another system to change its fsid using the 'btrfstune -u' command. Signed-off-by: Anand Jain Looks like that the test will break the whole test env as it leaves an unmountable $SCRATCH_MNT. I'd wait for the fix to get in first before merging the test, in case it breaks normal regression tests. 
(I noticed that the test is not in 'auto' group, so it's not that dangerous.) Its possible that its unmountable without the kernel patch. But I am unable to reproduce it consistently with or without the kernel patch. Any idea ways to make it auto for kernels without the patch? Also, it'd be great if test can be reviewed by btrfs folks too! --- tests/btrfs/173 | 72 + tests/btrfs/173.out | 5 tests/btrfs/group | 1 + 3 files changed, 78 insertions(+) create mode 100755 tests/btrfs/173 create mode 100644 tests/btrfs/173.out diff --git a/tests/btrfs/173 b/tests/btrfs/173 new file mode 100755 index ..f59a62e206c3 --- /dev/null +++ b/tests/btrfs/173 @@ -0,0 +1,72 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2018 Oracle. All Rights Reserved. +# +# FS QA Test 173 +# +# Fuzzy test for FS image duplication. +# Could be fixed by +#[patch] btrfs: harden agaist duplicate fsid +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch_dev_pool 2 +_scratch_dev_pool_get 2 + +dev_foo=$(echo $SCRATCH_DEV_POOL | awk '{print $1}' | rev | cut -d"/" -f1 | rev) +dev_bar=$(echo $SCRATCH_DEV_POOL | awk '{print $2}' | rev | cut -d"/" -f1 | rev) This doesn't work if the devices in SCRATCH_DEV_POOL are symlinks, e.g. lvm devices: /dev/mapper/testvg-testlv1, dev_foo is "testvg-testlv1" in this case. Ah, right will fix. + +_mkfs_dev /dev/$dev_foo But /dev/testvg-testlv1 isn't existed. _short_dev and/or _real_dev is useful in this case. e.g. 
dev_foo=$(echo $SCRATCH_DEV_POOL | awk '{print $1}') # dev_foo is like "dm-1" dev_foo=$(_short_dev $dev_foo) # dev_foo is like "/dev/dm-1" dev_foo=$(_real_dev $dev_foo) I changed the code a bit which avoids the split. Pls review if that will be ok. +_mount /dev/$dev_foo $SCRATCH_MNT It'd better to mount non-SCRATCH_DEV to other mount point, e.g. $TEST_DIR/$seq.mnt Will do, any idea why? Isn't the framework automatically try to unmount SCRATCH_MNT. Thanks, Anand Thanks, Eryu + +echo mount before btrfs image clone | tee -a $seqres.full +findmnt /dev/$dev_foo | grep -v TARGET | awk '{print $1" "$2}' | \ + sed -e "s/$dev_foo/dev_foo/g" | _filter_scratch | tee -a $seqres.full +findmnt /dev/$dev_bar | grep -v TARGET | awk '{print $1"
[PATCH v2 rev log added] fstests: btrfs verify hardening agaist duplicate fsid
We have a known bug in btrfs: we let the device path be changed after
the device has been mounted. Using this loophole, a newly copied device
appears as if it is mounted immediately after it has been copied. This
test case reproduces the issue.

For example:

Initially, /dev/mmcblk0p4 is mounted as /

lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0     179:0    0 29.2G  0 disk
|-mmcblk0p4 179:4    0    4G  0 part /
|-mmcblk0p2 179:2    0  500M  0 part /boot
|-mmcblk0p3 179:3    0  256M  0 part [SWAP]
`-mmcblk0p1 179:1    0  256M  0 part /boot/efi

btrfs fi show
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
	Total devices 1 FS bytes used 1.40GiB
	devid    1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda

dd if=/dev/mmcblk0 of=/dev/sda

Immediately after the copy completes, the change in the device
superblock is notified, the automount scans it using btrfs device scan,
and the new device sda becomes the mounted root device.

lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    1 14.9G  0 disk
|-sda4        8:4    1    4G  0 part /
|-sda2        8:2    1  500M  0 part
|-sda3        8:3    1  256M  0 part
`-sda1        8:1    1  256M  0 part
mmcblk0     179:0    0 29.2G  0 disk
|-mmcblk0p4 179:4    0    4G  0 part
|-mmcblk0p2 179:2    0  500M  0 part /boot
|-mmcblk0p3 179:3    0  256M  0 part [SWAP]
`-mmcblk0p1 179:1    0  256M  0 part /boot/efi

btrfs fi show /
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
	Total devices 1 FS bytes used 1.40GiB
	devid    1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty: you can unmount neither /dev/sda4 nor
/dev/mmcblk0p4. The problem is not solved until you move sda to
another system and change its fsid using the 'btrfstune -u' command.

Signed-off-by: Anand Jain
---
v1->v2:
  Don't play around with the dev path; use it as it is.
  Do not use SCRATCH_MNT; instead create the mount point under
  TEST_DIR, and its related changes.
  golden out changes

 tests/btrfs/173     | 88 +
 tests/btrfs/173.out |  6
 tests/btrfs/group   |  1 +
 3 files changed, 95 insertions(+)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out

diff --git a/tests/btrfs/173 b/tests/btrfs/173
new file mode 100755
index ..b466ae921e19
--- /dev/null
+++ b/tests/btrfs/173
@@ -0,0 +1,88 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Oracle. All Rights Reserved.
+#
+# FS QA Test 173
+#
+# Fuzzy test for FS image duplication.
+# Could be fixed by
+#[patch] btrfs: harden agaist duplicate fsid
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+mnt=$TEST_DIR/$seq.mnt
+_cleanup()
+{
+	rm -rf $mnt > /dev/null 2>&1
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_dev_pool 2
+_scratch_dev_pool_get 2
+
+dev_foo=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
+dev_bar=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
+
+echo dev_foo=$dev_foo >> $seqres.full
+echo dev_bar=$dev_bar >> $seqres.full
+echo | tee -a $seqres.full
+
+rm -rf $mnt > /dev/null 2>&1
+mkdir $mnt
+_mkfs_dev $dev_foo
+_mount $dev_foo $mnt
+
+check_btrfs_mount()
+{
+	local x=$(findmnt $mnt | grep -v TARGET | awk '{print $2}')
+	[[ $x == $dev_foo ]] && echo DEV_FOO
+	[[ $x == $dev_bar ]] && echo DEV_BAR
+}
+
+echo MNT $(check_btrfs_mount)
+
+for sb_bytenr in 65536 67108864
+do
+	echo -n "dd status=none if=$dev_foo of=$dev_bar bs=1 "\
+		"seek=$sb_bytenr skip=$sb_bytenr count=4096" >> $seqres.full
+	dd status=none if=$dev_foo of=$dev_bar bs=1 seek=$sb_bytenr \
+		skip=$sb_bytenr count=4096 >> $seqres.full 2>&1
+	echo ..:$? >> $seqres.full
+done
+
+#Original device is mounted, scan of its clone should fail
+$BTRFS_UTIL_PROG device scan $dev_bar >> $seqres.full 2>&1
+echo btrfs device scan dev_bar ...:$?| tee -a $seqres.full
+
+echo MNT $(check_btrfs_mount)
+
+#Original device scan should be successful
+$BTRFS_UTIL_PROG device scan $dev_foo >> $seqres.full 2>&1
+echo btrfs device scan dev_foo ...:$?| tee -a $seqres.full
+
+umount $mnt > /dev/null 2>&1
+_scratch_dev_pool_put
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/173.out b/tests/btrfs/173.out
new file mode 100644
index ..3c7e3fb4e3f7
--- /dev/null
+++ b/tests/btrfs/173.out
@@ -0,0 +1,6 @@
+QA output created by 173
+
+MNT DEV_FOO
+btrfs device scan dev_bar ...:1
+MNT DEV_FOO
+btrfs device scan dev_foo ...:0
diff --git
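For readers who want to try the cloning step outside fstests, here is a minimal Python sketch of what the dd loop in the test does: it copies a fixed-size block at each listed offset from one image to another and leaves everything else in the target alone. The offsets and the 4096-byte size are the ones used in the patch; the function name, the file names and the tiny demo offsets are made up for illustration. This is not part of the patch.

```python
import os
import tempfile

SB_OFFSETS = (65536, 67108864)  # the two offsets the test loops over
SB_SIZE = 4096                  # bytes copied per offset (bs=1 count=4096)


def copy_superblocks(src_path, dst_path, offsets=SB_OFFSETS, size=SB_SIZE):
    """Copy `size` bytes at each offset from src to dst; return offsets copied."""
    copied = []
    # "r+b" opens the target for in-place update without truncating it.
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        for off in offsets:
            src.seek(off)
            block = src.read(size)
            if not block:
                continue  # source is shorter than this offset
            dst.seek(off)
            dst.write(block)
            copied.append(off)
    return copied


# Demo on two small throwaway files, using tiny offsets so it runs anywhere.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "foo.img")  # hypothetical file names
    dst = os.path.join(d, "bar.img")
    with open(src, "wb") as f:
        f.write(bytes(range(256)))
    with open(dst, "wb") as f:
        f.write(b"\0" * 256)
    copied = copy_superblocks(src, dst, offsets=(16, 128), size=8)
    with open(dst, "rb") as f:
        result = f.read()

print(copied, result[16:24].hex())
```

On real block devices this needs write access to the target, and the target will then carry the same fsid as the source, which is exactly the situation the test is meant to provoke.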
Re: Curious problem: btrfs device stats & unpriviliged access
On 10/08/18 17:46, Hans van Kranenburg wrote:
> fs.devices() also looks for dev_items in the chunk tree:
>
> https://github.com/knorrie/python-btrfs/blob/master/btrfs/ctree.py#L481
>
> So, BOOM! you need root.
>
> Or just start a 0, ignore errors and start trying all devids until you
> found num_devices amount of them that work, yolo.

Since I need to walk /sys/fs/btrfs/ anyway I *think* I can just look
at the entries in /sys/fs/btrfs//devices/ and query them all
directly.

The skeleton btrfs_exporter already responds to http requests and
returns dummy metrics, using the official python client library.
I've found a nice little python sysfs scraper and now only need to
figure out how best to map the btrfs info in sysfs to useful metrics.
The privileged access issue would only have come into play much later,
but it seems I can avoid it after all, which is great.
I'm also (re-)learning python in the process, so that's the actual
thing slowing me down..

-h
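The sysfs walk described above could look roughly like the following sketch. It assumes that each mounted filesystem has its own directory under /sys/fs/btrfs containing a devices/ subdirectory with one entry per member device; the function name is made up here, and the demo runs against a throwaway directory tree rather than the real sysfs so that it works on any machine.

```python
import os
import tempfile


def btrfs_sysfs_devices(sysfs_root="/sys/fs/btrfs"):
    """Map each filesystem directory under sysfs_root to its device entry names."""
    result = {}
    try:
        entries = sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        return result  # no btrfs sysfs tree on this machine
    for name in entries:
        devices_dir = os.path.join(sysfs_root, name, "devices")
        # Directories without a devices/ subdirectory (e.g. "features") are skipped.
        if os.path.isdir(devices_dir):
            result[name] = sorted(os.listdir(devices_dir))
    return result


# Demo against a fake tree; the fsid is the one from another thread on this list.
with tempfile.TemporaryDirectory() as root:
    fsid = "28c2b7ab-631c-40a3-bab7-00dac5dd20eb"
    os.makedirs(os.path.join(root, fsid, "devices", "sda2"))
    os.makedirs(os.path.join(root, "features"))
    demo = btrfs_sysfs_devices(root)

print(demo)
```

Listing the directory only needs read access to sysfs. Whether the per-device queries that follow work unprivileged is a separate question, discussed elsewhere in this thread.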
Re: Which device is missing ?
On Mon, Oct 08, 2018 at 04:10:55PM +, Hugo Mills wrote: > On Mon, Oct 08, 2018 at 03:49:53PM +0200, Pierre Couderc wrote: > > I ma trying to make a "RAID1" with /dev/sda2 ans /dev/sdb (or similar). > > > > But I have stranges status or errors about "missing devices" and I > > do not understand the current situation : > > > > > > root@server:~# btrfs fi show > > Label: none uuid: 28c2b7ab-631c-40a3-bab7-00dac5dd20eb > > Total devices 1 FS bytes used 190.91GiB > > devid 1 size 1.82TiB used 196.02GiB path /dev/sda2 > > > > warning, device 1 is missing > > Label: none uuid: 2d45149a-fb97-4c2a-bae2-4cfe4e01a8aa > > Total devices 2 FS bytes used 116.18GiB > > devid 2 size 1.82TiB used 118.03GiB path /dev/sdb > > *** Some devices missing > >This looks like you've created a RAID-1 array with /dev/sda2 and > /dev/sdb, and then run mkfs.btrfs again on /dev/sda2, overwriting the > original [part of a] filesystem on /dev/sda2, and replacing it with a > wholly different filesystem. Since the new FS on /dev/sda2 (UUID > 28c2...) doesn't have the same UUID as the original FS (UUID 2d45...), > and the original FS was made of two devices, btrfs fi show is telling > you that there's some devices missing -- /dev/sda2 is no longer part > of that FS, and is therefore a missing device. > >I note that you've got data on both filesystems, so they must both > have been mounted somewhere and had stuff put on them. > >I recommend doing something like this: > > # mkfs /media/btrfs/myraid1 /media/btrfs/tmp > # mount /dev/sdb /media/btrfs/myraid1/ > # mount /dev/sda2 /media/btrfs/tmp/ # mount both filesystems > # cp /media/btrfs/tmp/* /media/btrfs/myraid1 # put it where you want it > # umount /media/btrfs/tmp/ > # wipefs /dev/sda2 # destroy the FS on sda2 > # btrfs replace start 1 /dev/sda2 /media/btrfs/myraid1/ > >This will copy all the data from the filesystem on /dev/sda2 into > the filesystem on /dev/sdb, destroy the FS on sda2, and then use sda2 > as the second device for the main FS. 
>
> *WARNING!*
>
>    Note that, since the main FS is missing a device, it will probably
> need to be mounted in degraded mode (-o degraded), and that on kernels
> earlier than (IIRC) 4.14, this can only be done *once* without the FS
> becoming more or less permanently read-only. On recent kernels, it
> _should_ be OK.
>
> *WARNING ENDS*

   Oh, and for the record, to make a RAID-1 filesystem from scratch,
you simply need this:

# mkfs.btrfs -m raid1 -d raid1 /dev/sda2 /dev/sdb

   You do not need to run mkfs.btrfs on each device separately.

   Hugo.

-- 
Hugo Mills             | Welcome to Rivendell, Mr Anderson...
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                      Machinae Supremacy, Hybrid
Re: Which device is missing ?
On Mon, Oct 08, 2018 at 03:49:53PM +0200, Pierre Couderc wrote:
> I ma trying to make a "RAID1" with /dev/sda2 ans /dev/sdb (or similar).
>
> But I have stranges status or errors about "missing devices" and I
> do not understand the current situation :
>
>
> root@server:~# btrfs fi show
> Label: none  uuid: 28c2b7ab-631c-40a3-bab7-00dac5dd20eb
>     Total devices 1 FS bytes used 190.91GiB
>     devid    1 size 1.82TiB used 196.02GiB path /dev/sda2
>
> warning, device 1 is missing
> Label: none  uuid: 2d45149a-fb97-4c2a-bae2-4cfe4e01a8aa
>     Total devices 2 FS bytes used 116.18GiB
>     devid    2 size 1.82TiB used 118.03GiB path /dev/sdb
>     *** Some devices missing

   This looks like you've created a RAID-1 array with /dev/sda2 and
/dev/sdb, and then run mkfs.btrfs again on /dev/sda2, overwriting the
original [part of a] filesystem on /dev/sda2, and replacing it with a
wholly different filesystem. Since the new FS on /dev/sda2 (UUID
28c2...) doesn't have the same UUID as the original FS (UUID 2d45...),
and the original FS was made of two devices, btrfs fi show is telling
you that there are some devices missing -- /dev/sda2 is no longer part
of that FS, and is therefore a missing device.

   I note that you've got data on both filesystems, so they must both
have been mounted somewhere and had stuff put on them.

   I recommend doing something like this:

# mkdir -p /media/btrfs/myraid1 /media/btrfs/tmp
# mount /dev/sdb /media/btrfs/myraid1/
# mount /dev/sda2 /media/btrfs/tmp/  # mount both filesystems
# cp -a /media/btrfs/tmp/* /media/btrfs/myraid1 # put it where you want it
# umount /media/btrfs/tmp/
# wipefs -a /dev/sda2                # destroy the FS on sda2
# btrfs replace start 1 /dev/sda2 /media/btrfs/myraid1/

   This will copy all the data from the filesystem on /dev/sda2 into
the filesystem on /dev/sdb, destroy the FS on sda2, and then use sda2
as the second device for the main FS.

*WARNING!*

   Note that, since the main FS is missing a device, it will probably
need to be mounted in degraded mode (-o degraded), and that on kernels
earlier than (IIRC) 4.14, this can only be done *once* without the FS
becoming more or less permanently read-only. On recent kernels, it
_should_ be OK.

*WARNING ENDS*

   Hugo.

[snip]

-- 
Hugo Mills             | UNIX: Japanese brand of food containers
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
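As an illustration of the diagnosis above, here is a small Python sketch that reads `btrfs fi show` output and flags any filesystem that lists fewer devices than its "Total devices" count. The sample text is the output quoted in this thread; the function names are made up for the example, and the parsing only assumes the line shapes visible in that sample.

```python
import re

# Sample copied from the `btrfs fi show` output quoted in this thread.
SAMPLE = """\
Label: none  uuid: 28c2b7ab-631c-40a3-bab7-00dac5dd20eb
    Total devices 1 FS bytes used 190.91GiB
    devid    1 size 1.82TiB used 196.02GiB path /dev/sda2

warning, device 1 is missing
Label: none  uuid: 2d45149a-fb97-4c2a-bae2-4cfe4e01a8aa
    Total devices 2 FS bytes used 116.18GiB
    devid    2 size 1.82TiB used 118.03GiB path /dev/sdb
    *** Some devices missing
"""


def summarize_fi_show(text):
    """Return one dict per filesystem: uuid, expected device count, paths listed."""
    filesystems = []
    current = None
    for line in text.splitlines():
        m = re.search(r"uuid:\s*([0-9a-fA-F-]+)", line)
        if m:
            current = {"uuid": m.group(1), "total": 0, "paths": []}
            filesystems.append(current)
            continue
        if current is None:
            continue
        m = re.search(r"Total devices\s+(\d+)", line)
        if m:
            current["total"] = int(m.group(1))
            continue
        m = re.search(r"devid\s+\d+\s.*\bpath\s+(\S+)", line)
        if m:
            current["paths"].append(m.group(1))
    return filesystems


def incomplete(filesystems):
    """Filesystems that list fewer device paths than 'Total devices' claims."""
    return [fs for fs in filesystems if len(fs["paths"]) < fs["total"]]


for fs in incomplete(summarize_fi_show(SAMPLE)):
    print(fs["uuid"], "lists", len(fs["paths"]), "of", fs["total"], "devices")
```

On the sample, only the 2d45... filesystem is reported: it expects two devices but only /dev/sdb still carries its UUID.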
Re: Curious problem: btrfs device stats & unpriviliged access
On 10/08/2018 05:29 PM, Holger Hoffstätte wrote: > On 10/08/18 16:40, Hans van Kranenburg wrote: >>> Looking at the kernel side of things in fs/btrfs/ioctl.c I see both >>> BTRFS_IOC_TREE_SEARCH[_V2} unconditionally require CAP_SYS_ADMIN. >> >> That's the tree search ioctl, for reading arbitrary metadata. >> >> The device stats ioctl is IOC_GET_DEV_STATS... > > Yeah..OK, that is clear and gave me the hint to ask: why is it > calling this in the first place? And as it turns out [1] is where > it seems to go wrong, as is_block_device() returns 0 (as it should), > fi_args.num_devices is never set (remains 0) and it proceeds to call > everything below, eventually calling the BTRFS_IOC_FS_INFO ioctl in > #1712. And that works fine: > > 1711 if (fi_args->num_devices != 1) { > (gdb) s > 1712 ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args); > (gdb) s > 1713 if (ret < 0) { > (gdb) p ret > $28 = 0 > (gdb) p *fi_args > $30 = { > max_id = 1, > num_devices = 1, > fsid = "z%:\371\315\033A\203\267.\200\255;FH\221", > nodesize = 16384, > sectorsize = 4096, > clone_alignment = 4096, > reserved32 = 0, > reserved = {0 } > } > > It's only when it goes into search_chunk_tree_for_fs_info() > where things fail due to CAP_SYS_ADMIN. > > And all this explains the actual bug: when I call btrfs device stats > not on the mountpoint (as I've been trying all this time) but rather > on the device, it works properly right away as regular user: > > (gdb) set args device stats /dev/loop0 > (gdb) r > Breakpoint 1, get_fs_info (path=path@entry=0x7fffe527 "/dev/loop0", > fi_args=fi_args@entry=0x7fffd400, > di_ret=di_ret@entry=0x7fffd3f0) at utils.c:1652 > 1652 { > (gdb) c > Continuing. > [/dev/loop0].write_io_errs 0 > [/dev/loop0].read_io_errs 0 > [/dev/loop0].flush_io_errs 0 > [/dev/loop0].corruption_errs 0 > [/dev/loop0].generation_errs 0 > [Inferior 1 (process 2805) exited normally] > > So this is simply a discrepancy in handling a device vs. the device(s) > for a mountpoint. 
Apparently, it does a different thing depending on what you point it at.

If you point it at a block device, it will try opening the block device
to find out which devid it has (from the superblock), find out where it
is mounted and then only ask for stats for this one device.

-# btrfs dev stats /dev/xvdc
[/dev/xvdc].write_io_errs    0
[/dev/xvdc].read_io_errs     0
[/dev/xvdc].flush_io_errs    0
[/dev/xvdc].corruption_errs  0
[/dev/xvdc].generation_errs  0

If you point it at a mounted filesystem, it lists stats for all devices.
Since there's no way to get a list of devids, it searches the chunk
tree directly for dev_item objects.

-# btrfs dev stats /mountpoint
[/dev/xvdb].write_io_errs    0
[/dev/xvdb].read_io_errs     0
[/dev/xvdb].flush_io_errs    0
[/dev/xvdb].corruption_errs  0
[/dev/xvdb].generation_errs  0
[/dev/xvdc].write_io_errs    0
[/dev/xvdc].read_io_errs     0
[/dev/xvdc].flush_io_errs    0
[/dev/xvdc].corruption_errs  0
[/dev/xvdc].generation_errs  0
[/dev/xvdd].write_io_errs    0
[/dev/xvdd].read_io_errs     0
[/dev/xvdd].flush_io_errs    0
[/dev/xvdd].corruption_errs  0
[/dev/xvdd].generation_errs  0

I do the same thing in the nagios plugin:

https://github.com/knorrie/python-btrfs/blob/master/examples/nagios/plugins/check_btrfs#L131

fs.devices() also looks for dev_items in the chunk tree:

https://github.com/knorrie/python-btrfs/blob/master/btrfs/ctree.py#L481

So, BOOM! you need root.

Or just start at 0, ignore errors and keep trying devids until you have
found num_devices of them that work, yolo.

>> I can do the device stats ioctl as normal user:
>>
>> import btrfs
>> fs = btrfs.FileSystem('/')
>> btrfs.utils.pretty_print(fs.dev_stats(1))
>>
>>
>> devid: 1
>> nr_items: 5
>> flags: 0
>> write_errs: 0
>> read_errs: 0
>> flush_errs: 0
>> generation_errs: 0
>> corruption_errs: 0
>
> That works for me too, and that's actually the important part. \o/
> Glad we talked about it. :}
>
> -h
>
> [1]
> https://github.com/kdave/btrfs-progs/blob/7faaca0d9f78f7162ae603231f693dd8e1af2a41/utils.c#L1666

-- 
Hans van Kranenburg
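The "try every devid until num_devices of them answer" idea can be sketched in a few lines of Python. This is only an illustration: `find_devids` and `fake_probe` are names invented here, the probe is a stand-in for a real per-device query, and it is an assumption that a nonexistent devid shows up as an OSError.

```python
def find_devids(num_devices, max_id, probe):
    """Probe devids 0..max_id and collect those for which `probe` succeeds.

    `probe(devid)` is supplied by the caller (hypothetical here); it should
    return a result on success and raise OSError for a devid that is absent.
    """
    found = {}
    for devid in range(0, max_id + 1):
        try:
            found[devid] = probe(devid)
        except OSError:
            continue  # no such devid (or not permitted); try the next one
        if len(found) == num_devices:
            break  # all devices accounted for, stop early
    return found


# Stand-in for a real query: pretend only devids 1 and 3 exist.
def fake_probe(devid):
    if devid not in (1, 3):
        raise OSError(19, "No such device")
    return {"devid": devid, "write_errs": 0}


stats = find_devids(num_devices=2, max_id=5, probe=fake_probe)
print(sorted(stats))  # prints [1, 3]
```

With python-btrfs, the probe could presumably be fs.dev_stats, with num_devices and max_id taken from the fs_info() fields shown elsewhere in this thread; that wiring is untested here.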
Re: Curious problem: btrfs device stats & unpriviliged access
On 10/08/18 16:40, Hans van Kranenburg wrote:
>> Looking at the kernel side of things in fs/btrfs/ioctl.c I see both
>> BTRFS_IOC_TREE_SEARCH[_V2} unconditionally require CAP_SYS_ADMIN.
>
> That's the tree search ioctl, for reading arbitrary metadata.
>
> The device stats ioctl is IOC_GET_DEV_STATS...

Yeah..OK, that is clear and gave me the hint to ask: why is it
calling this in the first place? And as it turns out [1] is where
it seems to go wrong, as is_block_device() returns 0 (as it should),
fi_args.num_devices is never set (remains 0) and it proceeds to call
everything below, eventually calling the BTRFS_IOC_FS_INFO ioctl in
#1712. And that works fine:

1711		if (fi_args->num_devices != 1) {
(gdb) s
1712			ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args);
(gdb) s
1713			if (ret < 0) {
(gdb) p ret
$28 = 0
(gdb) p *fi_args
$30 = {
  max_id = 1,
  num_devices = 1,
  fsid = "z%:\371\315\033A\203\267.\200\255;FH\221",
  nodesize = 16384,
  sectorsize = 4096,
  clone_alignment = 4096,
  reserved32 = 0,
  reserved = {0 }
}

It's only when it goes into search_chunk_tree_for_fs_info()
where things fail due to CAP_SYS_ADMIN.

And all this explains the actual bug: when I call btrfs device stats
not on the mountpoint (as I've been trying all this time) but rather
on the device, it works properly right away as regular user:

(gdb) set args device stats /dev/loop0
(gdb) r
Breakpoint 1, get_fs_info (path=path@entry=0x7fffe527 "/dev/loop0",
fi_args=fi_args@entry=0x7fffd400,
    di_ret=di_ret@entry=0x7fffd3f0) at utils.c:1652
1652	{
(gdb) c
Continuing.
[/dev/loop0].write_io_errs    0
[/dev/loop0].read_io_errs     0
[/dev/loop0].flush_io_errs    0
[/dev/loop0].corruption_errs  0
[/dev/loop0].generation_errs  0
[Inferior 1 (process 2805) exited normally]

So this is simply a discrepancy in handling a device vs. the device(s)
for a mountpoint.

> I can do the device stats ioctl as normal user:
>
> import btrfs
> fs = btrfs.FileSystem('/')
> btrfs.utils.pretty_print(fs.dev_stats(1))
>
> devid: 1
> nr_items: 5
> flags: 0
> write_errs: 0
> read_errs: 0
> flush_errs: 0
> generation_errs: 0
> corruption_errs: 0

That works for me too, and that's actually the important part. \o/
Glad we talked about it. :}

-h

[1]
https://github.com/kdave/btrfs-progs/blob/7faaca0d9f78f7162ae603231f693dd8e1af2a41/utils.c#L1666
Re: Curious problem: btrfs device stats & unpriviliged access
On 10/08/2018 04:40 PM, Hans van Kranenburg wrote: > On 10/08/2018 04:27 PM, Holger Hoffstätte wrote: >> (moving the discussion here from GH [1]) >> >> Apparently there is something weird going on with the device stats >> ioctls. I cannot get them to work as regular user, while they work >> for David. A friend confirms the same issue on his system - no access >> as non-root. >> >> So I made a new empty fs, mounted it, built btrfs-progs-4.17.1 with >> debug symbols and stepped into search_chunk_tree_for_fs_info(). >> Everything is fine, all args are correct, right until: >> >> (gdb) s >> 1614 ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, _args); >> (gdb) s >> 1615 if (ret < 0) >> (gdb) p ret >> $4 = -1 >> (gdb) p search_args >> $5 = {key = {tree_id = 3, min_objectid = 1, max_objectid = 1, min_offset >> = 1, >> max_offset = 18446744073709551615, min_transid = 0, max_transid = >> 18446744073709551615, >> min_type = 216, max_type = 216, nr_items = 30, unused = 0, unused1 = 0, >> unused2 = 0, >> unused3 = 0, unused4 = 0}, buf = '\000' } >> >> Looking at the kernel side of things in fs/btrfs/ioctl.c I see both >> BTRFS_IOC_TREE_SEARCH[_V2} unconditionally require CAP_SYS_ADMIN. > > That's the tree search ioctl, for reading arbitrary metadata. > > The device stats ioctl is IOC_GET_DEV_STATS... > > I can do the device stats ioctl as normal user: > > import btrfs > fs = btrfs.FileSystem('/') > btrfs.utils.pretty_print(fs.dev_stats(1)) > > > devid: 1 > nr_items: 5 > flags: 0 > write_errs: 0 > read_errs: 0 > flush_errs: 0 > generation_errs: 0 > corruption_errs: 0 By the way, I can also do BTRFS_IOC_FS_INFO, BTRFS_IOC_DEV_INFO and BTRFS_IOC_SPACE_INFO as normal user. However, while fs_info tells me that there are num_devices devices, there's no place where you can actually get which devids these are, and you need to provide them one by one to dev_info and dev_stats... 
: btrfs.utils.pretty_print(fs.fs_info()) max_id: 1 num_devices: 1 nodesize: 4096 sectorsize: 4096 clone_alignment: 4096 fsid: 91077ca5-6559-4a90-9d03-912d3a33412e btrfs.utils.pretty_print(fs.dev_info(1)) devid: 1 bytes_used: 60699967488 total_bytes: 107374182400 path: /dev/xvda uuid: 7e998baa-b533-4476-9132-d7d748d28044 btrfs.utils.pretty_print(fs.space_info()) - flags: Data, single total_bytes: 54.00GiB used_bytes: 53.27GiB - flags: System, single total_bytes: 32.00MiB used_bytes: 12.00KiB - flags: Metadata, single total_bytes: 2.50GiB used_bytes: 1.30GiB - flags: GlobalReserve, single total_bytes: 181.02MiB used_bytes: 0.00B > >> So why can Dave get his dev stats as unprivileged user? >> Does this work for anybody else? And why? :) >> >> cheers >> Holger >> >> [1] >> https://github.com/prometheus/node_exporter/issues/1100#issuecomment-427823190 >> > > -- Hans van Kranenburg
Re: [PATCH] btrfs: Remove unused variable mode in btrfs_mount
On 15:03 08/10, David Sterba wrote:
> On Fri, Oct 05, 2018 at 07:26:15AM -0500, Goldwyn Rodrigues wrote:
> > Code cleanup.
>
> Have you check when and why the variable become unused? Thanks.

No, I did not check it earlier. git blame points to
312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")

Author cc'd.

-- 
Goldwyn
Re: Curious problem: btrfs device stats & unpriviliged access
On 10/08/2018 04:27 PM, Holger Hoffstätte wrote: > (moving the discussion here from GH [1]) > > Apparently there is something weird going on with the device stats > ioctls. I cannot get them to work as regular user, while they work > for David. A friend confirms the same issue on his system - no access > as non-root. > > So I made a new empty fs, mounted it, built btrfs-progs-4.17.1 with > debug symbols and stepped into search_chunk_tree_for_fs_info(). > Everything is fine, all args are correct, right until: > > (gdb) s > 1614 ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, _args); > (gdb) s > 1615 if (ret < 0) > (gdb) p ret > $4 = -1 > (gdb) p search_args > $5 = {key = {tree_id = 3, min_objectid = 1, max_objectid = 1, min_offset > = 1, > max_offset = 18446744073709551615, min_transid = 0, max_transid = > 18446744073709551615, > min_type = 216, max_type = 216, nr_items = 30, unused = 0, unused1 = 0, > unused2 = 0, > unused3 = 0, unused4 = 0}, buf = '\000' } > > Looking at the kernel side of things in fs/btrfs/ioctl.c I see both > BTRFS_IOC_TREE_SEARCH[_V2} unconditionally require CAP_SYS_ADMIN. That's the tree search ioctl, for reading arbitrary metadata. The device stats ioctl is IOC_GET_DEV_STATS... I can do the device stats ioctl as normal user: import btrfs fs = btrfs.FileSystem('/') btrfs.utils.pretty_print(fs.dev_stats(1)) devid: 1 nr_items: 5 flags: 0 write_errs: 0 read_errs: 0 flush_errs: 0 generation_errs: 0 corruption_errs: 0 > So why can Dave get his dev stats as unprivileged user? > Does this work for anybody else? And why? :) > > cheers > Holger > > [1] > https://github.com/prometheus/node_exporter/issues/1100#issuecomment-427823190 > -- Hans van Kranenburg
Curious problem: btrfs device stats & unpriviliged access
(moving the discussion here from GH [1])

Apparently there is something weird going on with the device stats
ioctls. I cannot get them to work as regular user, while they work
for David. A friend confirms the same issue on his system - no access
as non-root.

So I made a new empty fs, mounted it, built btrfs-progs-4.17.1 with
debug symbols and stepped into search_chunk_tree_for_fs_info().
Everything is fine, all args are correct, right until:

(gdb) s
1614		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &search_args);
(gdb) s
1615		if (ret < 0)
(gdb) p ret
$4 = -1
(gdb) p search_args
$5 = {key = {tree_id = 3, min_objectid = 1, max_objectid = 1, min_offset = 1,
    max_offset = 18446744073709551615, min_transid = 0, max_transid = 18446744073709551615,
    min_type = 216, max_type = 216, nr_items = 30, unused = 0, unused1 = 0, unused2 = 0,
    unused3 = 0, unused4 = 0}, buf = '\000' }

Looking at the kernel side of things in fs/btrfs/ioctl.c I see both
BTRFS_IOC_TREE_SEARCH[_V2] unconditionally require CAP_SYS_ADMIN.

So why can Dave get his dev stats as unprivileged user?
Does this work for anybody else? And why? :)

cheers
Holger

[1]
https://github.com/prometheus/node_exporter/issues/1100#issuecomment-427823190
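A quick way to see whether a process would pass a CAP_SYS_ADMIN check is to look at the effective capability mask in /proc/self/status. The sketch below assumes the usual "CapEff:" hex line and that CAP_SYS_ADMIN is capability number 21; the function name is made up. It is only a rough indicator, since a process inside a user namespace can show the bit set and still be refused by the kernel.

```python
CAP_SYS_ADMIN = 21  # capability number from linux/capability.h


def has_cap_sys_admin(status_text):
    """Return True if the CapEff mask in /proc/<pid>/status text has CAP_SYS_ADMIN."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_SYS_ADMIN))
    return False  # no CapEff line found


# Check the current process where /proc is available.
try:
    with open("/proc/self/status") as f:
        print("CAP_SYS_ADMIN effective:", has_cap_sys_admin(f.read()))
except OSError:
    print("no /proc/self/status on this system")
```

A regular user normally has an all-zero CapEff, which matches the ret = -1 from the tree search ioctl in the gdb session above.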
Which device is missing?
I am trying to make a "RAID1" with /dev/sda2 and /dev/sdb (or similar).

But I get strange statuses or errors about "missing devices" and I do
not understand the current situation:

root@server:~# btrfs fi show
Label: none  uuid: 28c2b7ab-631c-40a3-bab7-00dac5dd20eb
        Total devices 1 FS bytes used 190.91GiB
        devid    1 size 1.82TiB used 196.02GiB path /dev/sda2

warning, device 1 is missing
Label: none  uuid: 2d45149a-fb97-4c2a-bae2-4cfe4e01a8aa
        Total devices 2 FS bytes used 116.18GiB
        devid    2 size 1.82TiB used 118.03GiB path /dev/sdb
        *** Some devices missing

root@server:~# fdisk -l
Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CFF97102-D6B5-4126-B2B4-FA735598D1F0

Device       Start        End    Sectors  Size Type
/dev/sda1     2048    1050623    1048576  512M EFI System
/dev/sda2  1050624 3907026943 3905976320  1.8T Linux filesystem

Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

root@server:~# btrfs fi usage /
Overall:
    Device size:                   1.82TiB
    Device allocated:            196.02GiB
    Device unallocated:            1.63TiB
    Device missing:                  0.00B
    Used:                        191.84GiB
    Free (estimated):              1.63TiB      (min: 835.27GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              263.64MiB      (used: 0.00B)

Data,single: Size:192.01GiB, Used:189.98GiB
   /dev/sda2     192.01GiB

Metadata,DUP: Size:2.00GiB, Used:951.67MiB
   /dev/sda2       4.00GiB

System,DUP: Size:8.00MiB, Used:48.00KiB
   /dev/sda2      16.00MiB

Unallocated:
   /dev/sda2       1.63TiB
root@server:~#
Re: [PATCH 24/42] btrfs: assert on non-empty delayed iputs
On Fri, Sep 28, 2018 at 07:18:03AM -0400, Josef Bacik wrote:
> I ran into an issue where there was some reference being held on an
> inode that I couldn't track. This assert wasn't triggered, but it at
> least rules out we're doing something stupid.
>
> Reviewed-by: Omar Sandoval
> Signed-off-by: Josef Bacik

Reviewed-by: David Sterba
Re: [PATCH 23/42] btrfs: make sure we create all new bgs
On Fri, Sep 28, 2018 at 07:18:02AM -0400, Josef Bacik wrote:
> Allocating new chunks modifies both the extent and chunk tree, which can
> trigger new chunk allocations. So instead of doing list_for_each_safe,
> just do while (!list_empty()) so we make sure we don't exit with other
> pending bg's still on our list.
>
> Reviewed-by: Omar Sandoval
> Reviewed-by: Liu Bo
> Signed-off-by: Josef Bacik

Reviewed-by: David Sterba
Re: [PATCH] btrfs: Remove unused variable mode in btrfs_mount
On Fri, Oct 05, 2018 at 07:26:15AM -0500, Goldwyn Rodrigues wrote:
> Code cleanup.

Have you checked when and why the variable became unused? Thanks.
[PATCH v3 6/6] btrfs-progs: fsck-tests: Add test image for dev extents beyond device boundary
Now two locations can detect such problem, either by device item used/total bytes check, or by early dev extents check against device boundary. The image is hand-crafted image which uses DATA SINGLE chunk to feed btrfs check. As expected, as long as block group item, chunk item, device used bytes matches, older btrfs check can't detect such problem. Signed-off-by: Qu Wenruo --- .../over_dev_boundary.img.xz | Bin 0 -> 1640 bytes tests/fsck-tests/036-bad-dev-extents/test.sh | 20 ++ 2 files changed, 20 insertions(+) create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh diff --git a/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz b/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz new file mode 100644 index ..47cb2a707b0097e369dc088ed0549f847995f136 GIT binary patch literal 1640 zcmV-u2ABE$H+ooF000E$*0e?f03iVu0001VFXf})3;zZsT>wRyj;C3^v%$$4d1oRm zhA1@4%tH=9jYF%IQSpIUKDpjXLRl?q4p$q;1zsY^#9_Lx=#tbGm>@S#e2aqt!?0}y z2BPO%4~c9Q5)jKFD7}DURarKL)`^j{f?s>sEHdzmSvX^98%kGi<_8 zMnXynsC*B7}KE(6w>*wfdb|tw$kt^y>W+TB*?pon1P+)#u#?6bSIG)TooJx~u$# zf^+}xKw`BfI}6=717S~Q%LW1kwY`pz9H{`uNNNFOk2w!VauNEaMfoLj)Z<)!1?F60 zJ+OEA|4$a=9W#XX*l{EG!j^s}p| z0i#_%$Q}d)-EE8#8O5^x$$8Y0l2 zc19e0Et3O3m`pMOqEkasL8VGE+u~2lZn>sRCRj169Z6mQ3*+`D+C#F1V7POV%lx(9cB{WN*9OP%Zbd1VDn(S4HX^ad4-b~#H z@9eUP4AAU`)yRf!k+rrrLSYfBSEi6RE#HtbqyPl11S>RCH zqvJbt22t`FmU^tmTb+7LtpybCB-x1lGSlrpVQ9|6WNBs~q*M-to1gD1l41oiy~}+O?!{68Jvim2*%BznW)@B=3IH)Xi1795q{#>sF*y^0T2@b$`Wqwch?BgN}IR9Ui z;!cs)hJFGsJmFaiUsYrN$c0^BLU^n-B%fagn+jR{?Dq+K%VyMG@pmAOShFY)k8zBxm@7YD zb^ZXU;<`+_B+IV{+A~1Ku zWggqWQd{E%hF}W2ArPwJ33zWP>MIe;YDN8WV+|+a4_c>yFN#ZGN00G#%3HYi z4pePTnc{8k5CjwuN6hudRFcBkvB^pSFufzaaD`-;mPgJ~wr(
[PATCH v3 3/6] btrfs-progs: original check: Add ability to detect bad dev extents
Unlike the lowmem mode check, original mode has no good existing place
to check for overlapping dev extents.

So this patch introduces a new function, check_dev_extents(), to handle
possible bad dev extents.

Reported-by: Hans van Kranenburg
Signed-off-by: Qu Wenruo
---
 check/main.c | 99
 1 file changed, 99 insertions(+)

diff --git a/check/main.c b/check/main.c
index bc2ee22f7943..ff9a785ce555 100644
--- a/check/main.c
+++ b/check/main.c
@@ -8224,6 +8224,99 @@ out:
 	return ret;
 }

+/*
+ * Check if all dev extents are valid (not overlap nor beyond device
+ * boundary).
+ *
+ * Dev extents <-> chunk cross checking is already done in check_chunks().
+ */
+static int check_dev_extents(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_root *dev_root = fs_info->dev_root;
+	int ret;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
+
+	btrfs_init_path(&path);
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		error("failed to search device tree: %s", strerror(-ret));
+		goto out;
+	}
+	if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
+		ret = btrfs_next_leaf(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+
+	while (1) {
+		struct btrfs_dev_extent *dev_ext;
+		struct btrfs_device *dev;
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+		dev_ext = btrfs_item_ptr(path.nodes[0], path.slots[0],
+					 struct btrfs_dev_extent);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], dev_ext);
+
+		dev = btrfs_find_device(fs_info, devid, NULL, NULL);
+		if (!dev) {
+			error("failed to find device with devid %llu", devid);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (prev_devid == devid && prev_dev_ext_end > physical_offset) {
+			error(
+"dev extent devid %llu physical offset %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, prev_dev_ext_end);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (physical_offset + physical_len > dev->total_bytes) {
+			error(
+"dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      dev->total_bytes);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+
+		ret = btrfs_next_item(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
 static int check_chunks_and_extents(struct btrfs_fs_info *fs_info)
 {
 	struct rb_root dev_cache;
@@ -8318,6 +8411,12 @@ again:
 		goto out;
 	}

+	ret = check_dev_extents(fs_info);
+	if (ret < 0) {
+		err = ret;
+		goto out;
+	}
+
 	ret = check_chunks(&chunk_cache, &block_group_cache,
 			   &dev_extent_cache, NULL, NULL, NULL, 0);
 	if (ret) {
-- 
2.19.1
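For readers who want the logic of this check without the tree-walking code, here is a minimal sketch in Python. It is not the btrfs-progs implementation: `DevExtent` and `find_bad_dev_extents()` are hypothetical names, the input is assumed to be already sorted by (devid, physical offset), and unlike the patch, which stops at the first error, the sketch collects every problem it finds.

```python
from typing import Dict, List, NamedTuple


class DevExtent(NamedTuple):
    """Hypothetical stand-in for one DEV_EXTENT item."""
    devid: int
    offset: int   # physical start on the device, in bytes
    length: int   # length of the extent, in bytes


def find_bad_dev_extents(extents: List[DevExtent],
                         total_bytes: Dict[int, int]) -> List[str]:
    """Return one message per invalid dev extent.

    Assumes `extents` is sorted by (devid, offset); `total_bytes`
    maps each devid to the size of that device.
    """
    problems = []
    prev_devid = None
    prev_end = 0
    for ext in extents:
        end = ext.offset + ext.length
        if ext.devid not in total_bytes:
            problems.append(f"devid {ext.devid}: device not found")
            continue
        # Overlap: this extent starts before the previous one
        # on the same device has ended.
        if prev_devid == ext.devid and ext.offset < prev_end:
            problems.append(
                f"devid {ext.devid} offset {ext.offset} overlaps "
                f"previous dev extent end {prev_end}")
        # Boundary: this extent ends past the end of the device.
        if end > total_bytes[ext.devid]:
            problems.append(
                f"devid {ext.devid} offset {ext.offset} len {ext.length} "
                f"is beyond device boundary {total_bytes[ext.devid]}")
        prev_devid = ext.devid
        prev_end = end
    return problems
```

Tracking only the previous extent's end is enough here because of the sorted-input assumption; with unsorted input the overlap test would miss cases.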
[PATCH v3 4/6] btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
Obviously, used bytes can't be larger than total bytes.

Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 07c03cad77af..1173b963b8f3 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4074,6 +4074,11 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	used = btrfs_device_bytes_used(eb, dev_item);
 	total_bytes = btrfs_device_total_bytes(eb, dev_item);
+	if (used > total_bytes) {
+		error("device %llu has incorrect used bytes %llu > total bytes %llu",
+		      dev_id, used, total_bytes);
+		return ACCOUNTING_MISMATCH;
+	}
 	key.objectid = dev_id;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 	key.offset = 0;
-- 
2.19.1
[PATCH v3 2/6] btrfs-progs: lowmem check: Add check for overlapping dev extents
Add such a check to check_dev_item(), since at that point we're already
iterating dev extents for dev item accounting.

Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1bce44f5658a..07c03cad77af 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4065,6 +4065,8 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	u64 dev_id;
 	u64 used;
 	u64 total = 0;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
 	int ret;

 	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
@@ -4086,8 +4088,16 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 		return REFERENCER_MISSING;
 	}

-	/* Iterate dev_extents to calculate the used space of a device */
+	/*
+	 * Iterate dev_extents to calculate the used space of a device
+	 *
+	 * Also make sure no dev extents overlap and end beyond device boundary
+	 */
 	while (1) {
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
 		if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
 			goto next;

@@ -4099,7 +4109,27 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,

 		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
 				     struct btrfs_dev_extent);
-		total += btrfs_dev_extent_length(path.nodes[0], ptr);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], ptr);
+
+		if (prev_devid == devid && physical_offset < prev_dev_ext_end) {
+			error(
+"dev extent devid %llu offset %llu len %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, physical_len,
+			      prev_dev_ext_end);
+			return ACCOUNTING_MISMATCH;
+		}
+		if (physical_offset + physical_len > total_bytes) {
+			error(
+"dev extent devid %llu offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      total_bytes);
+			return ACCOUNTING_MISMATCH;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+		total += physical_len;
 next:
 		ret = btrfs_next_item(dev_root, &path);
 		if (ret)
-- 
2.19.1
[PATCH v3 1/6] btrfs-progs: image: Use correct device size when restoring
When restoring a btrfs image, the total_bytes of the device item is not
updated correctly. In fact total_bytes can be left as 0 for a restored
image.

It doesn't trigger any error because btrfs check never checks
total_bytes of the dev item.
However this is going to change.

Fix it by populating total_bytes of the device item with the end
position of the last dev extent to make later btrfs check happy.

Signed-off-by: Qu Wenruo
---
 image/main.c | 48 +---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/image/main.c b/image/main.c
index 351c5a256938..d5b89bc3149f 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2082,15 +2082,17 @@ static void remap_overlapping_chunks(struct mdrestore_struct *mdres)
 }

 static int fixup_devices(struct btrfs_fs_info *fs_info,
-			 struct mdrestore_struct *mdres, off_t dev_size)
+			 struct mdrestore_struct *mdres, int out_fd)
 {
 	struct btrfs_trans_handle *trans;
 	struct btrfs_dev_item *dev_item;
+	struct btrfs_dev_extent *dev_ext;
 	struct btrfs_path path;
 	struct extent_buffer *leaf;
 	struct btrfs_root *root = fs_info->chunk_root;
 	struct btrfs_key key;
 	u64 devid, cur_devid;
+	u64 dev_size; /* Get from last dev extents */
 	int ret;

 	trans = btrfs_start_transaction(fs_info->tree_root, 1);
@@ -2101,16 +2103,56 @@ static int fixup_devices(struct btrfs_fs_info *fs_info,

 	dev_item = &fs_info->super_copy->dev_item;

+	btrfs_init_path(&path);
 	devid = btrfs_stack_device_id(dev_item);

+	key.objectid = devid;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = (u64)-1;
+
+	ret = btrfs_search_slot(NULL, fs_info->dev_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		error("failed to locate last dev extent of devid %llu: %s",
+		      devid, strerror(-ret));
+		btrfs_release_path(&path);
+		return ret;
+	}
+	if (ret == 0) {
+		error("found invalid dev extent devid %llu offset -1",
+		      devid);
+		btrfs_release_path(&path);
+		return -EUCLEAN;
+	}
+	ret = btrfs_previous_item(fs_info->dev_root, &path, devid,
+				  BTRFS_DEV_EXTENT_KEY);
+	if (ret > 0)
+		ret = -ENOENT;
+	if (ret < 0) {
+		error("failed to locate last dev extent of devid %llu: %s",
+		      devid, strerror(-ret));
+		btrfs_release_path(&path);
+		return ret;
+	}
+
+	btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+	dev_ext = btrfs_item_ptr(path.nodes[0], path.slots[0],
+				 struct btrfs_dev_extent);
+	dev_size = key.offset + btrfs_dev_extent_length(path.nodes[0], dev_ext);
+	btrfs_release_path(&path);
+
 	btrfs_set_stack_device_total_bytes(dev_item, dev_size);
 	btrfs_set_stack_device_bytes_used(dev_item, mdres->alloced_chunks);
+	/* Don't forget to enlarge the real file */
+	ret = ftruncate64(out_fd, dev_size);
+	if (ret < 0) {
+		error("failed to enlarge result image: %s", strerror(errno));
+		return -errno;
+	}

 	key.objectid = BTRFS_DEV_ITEMS_OBJECTID;
 	key.type = BTRFS_DEV_ITEM_KEY;
 	key.offset = 0;

-	btrfs_init_path(&path);

 again:
 	ret = btrfs_search_slot(trans, root, &key, &path, -1, 1);
@@ -2275,7 +2317,7 @@ static int restore_metadump(const char *input, FILE *out, int old_restore,
 		return 1;
 	}

-	ret = fixup_devices(info, &mdrestore, st.st_size);
+	ret = fixup_devices(info, &mdrestore, fileno(out));
 	close_ctree(info->chunk_root);
 	if (ret)
 		goto out;
-- 
2.19.1
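The size calculation in this patch boils down to "take the dev extent with the highest physical offset and use its end as the device size". A minimal sketch of just that arithmetic in Python follows; `restored_device_size()` is a hypothetical name, the input is a plain list of (physical offset, length) pairs for one device, and the tree search and the ftruncate64() call from the patch are deliberately left out.

```python
from typing import Iterable, Tuple


def restored_device_size(dev_extents: Iterable[Tuple[int, int]]) -> int:
    """Return the end position of the last dev extent of one device.

    `dev_extents` holds (physical_offset, length) pairs in bytes and
    may come in any order.
    """
    extents = list(dev_extents)
    if not extents:
        # The patch reports an error when no dev extent can be located.
        raise ValueError("no dev extent found for this device")
    # The "last" dev extent is the one with the highest physical offset.
    offset, length = max(extents, key=lambda ext: ext[0])
    return offset + length
```

The patch gets the same extent by searching for offset (u64)-1 and stepping back one item, which avoids scanning every extent; the sketch scans because it has no sorted tree to lean on.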
[PATCH v3 5/6] btrfs-progs: original check: Add dev_item check for used bytes and total bytes
Signed-off-by: Qu Wenruo
---
 check/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/check/main.c b/check/main.c
index ff9a785ce555..12f12e18a83f 100644
--- a/check/main.c
+++ b/check/main.c
@@ -7938,6 +7938,12 @@ static int check_device_used(struct device_record *dev_rec,
 	struct device_extent_record *dev_extent_rec;
 	u64 total_byte = 0;

+	if (dev_rec->byte_used > dev_rec->total_byte) {
+		error("device %llu has incorrect used bytes %llu > total bytes %llu",
+		      dev_rec->devid, dev_rec->byte_used, dev_rec->total_byte);
+		return -EUCLEAN;
+	}
+
 	cache = search_cache_extent2(&dext_cache->tree, dev_rec->devid, 0);
 	while (cache) {
 		dev_extent_rec = container_of(cache,
-- 
2.19.1
[PATCH v3 0/6] btrfs-progs: check: Detect invalid dev extents and device items
This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/dev_extents_check

Hans van Kranenburg reported a case where the btrfs DUP chunk allocator
could allocate invalid dev extents, which either overlap with existing
dev extents or extend beyond the device boundary.

This patchset enhances the btrfs-progs side to detect such problems,
with a hand-crafted test image for it.

Link: https://www.spinics.net/lists/linux-btrfs/msg82370.html

Changelog:
v2:
  Fix a bug in the 1st patch which makes lowmem mode never check
  overlapping dev extents.
  Fix a test case bug which never passes due to a wrong script.
v3:
  Add btrfs-image fixes to make test cases happy.

Qu Wenruo (6):
  btrfs-progs: image: Use correct device size when restoring
  btrfs-progs: lowmem check: Add check for overlapping dev extents
  btrfs-progs: original check: Add ability to detect bad dev extents
  btrfs-progs: lowmem check: Add dev_item check for used bytes and
    total bytes
  btrfs-progs: original check: Add dev_item check for used bytes and
    total bytes
  btrfs-progs: fsck-tests: Add test image for dev extents beyond device
    boundary

 check/main.c                                  | 105 ++
 check/mode-lowmem.c                           |  39 ++-
 image/main.c                                  |  48 +++-
 .../over_dev_boundary.img.xz                  | Bin 0 -> 1640 bytes
 tests/fsck-tests/036-bad-dev-extents/test.sh  |  20
 5 files changed, 207 insertions(+), 5 deletions(-)
 create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
 create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh

-- 
2.19.1
Re: Monitoring btrfs with Prometheus (and soon OpenMonitoring)
On 2018-10-07 09:37, Holger Hoffstätte wrote:
> The Prometheus statistics collection/aggregation/monitoring/alerting
> system [1] is quite popular, easy to use and will probably be the basis
> for the upcoming OpenMetrics "standard" [2].
>
> Prometheus collects metrics by polling host-local "exporters" that
> respond to http requests; many such exporters exist, from the generic
> node_exporter for OS metrics to all sorts of
> application-/service-specific varieties.
>
> Since btrfs already exposes quite a lot of monitorable and - more
> importantly - actionable runtime information in sysfs it only makes
> sense to expose these metrics for visualization & alerting. I noodled
> over the idea some time ago but got sidetracked, besides not being
> thrilled at all by the idea of doing this in golang (which I *really*
> dislike).
>
> However, exporters can be written in any language as long as they speak
> the standard response protocol, so an alternative would be to use one
> of the other official exporter clients. These provide language-native
> "mini-frameworks" where one only has to fill in the blanks (see [3] for
> examples).
>
> Since the issue just came up in the node_exporter bugtracker [3] I
> figured I ask if anyone here is interested in helping build a proper
> standalone btrfs_exporter in C++?
>
> :D
>
> ..just kidding, I'd probably use python (which I kind of don't really
> know either :) and build on Hans' python-btrfs library for anything not
> covered by sysfs.
>
> Anybody interested in helping? Apparently there are also golang libs
> for btrfs [5] but I don't know anything about them (if you do, please
> comment on the bug), and the idea of adding even more stuff into the
> monolithic, already creaky and somewhat bloated node_exporter is not
> appealing to me.
>
> Potential problems wrt. btrfs are access to root-only information, like
> e.g. the btrfs device stats/errors in the aforementioned bug, since
> exporters are really supposed to run unprivileged due to network
> exposure. The S.M.A.R.T. exporter [6] solves this with dual-process
> contortions; obviously it would be better if all relevant metrics were
> accessible directly in sysfs and not require privileged access, but
> forking a tiny privileged process every polling interval is probably
> not that bad.
>
> All ideas welcome!

You might be interested in what Netdata [1] is doing. We've already got
tracking of space allocations via the sysfs interface (fun fact, you
actually don't have to be root on most systems to read that data), and
also ship some pre-defined alarms that will trigger when the device
gets close to full at a low level (more specifically, if total chunk
allocations exceed 90% of the total space of all the devices in the
volume).

Actual data collection is being done in C (Netdata already has a lot of
infrastructure for parsing things out of /proc or /sys), and there has
been some discussion in the past of adding collection of device error
counters (I've been working on and off on it myself, but I still don't
have a good enough understanding of the C code to get anything actually
working yet).

[1] https://my-netdata.io/
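The alarm condition described above (total chunk allocations exceeding 90% of the total space of all devices) can be sketched as follows. This is an illustration in Python, not Netdata's code, which the message says is written in C: the function names and the dictionary keys are hypothetical, and where the byte counts come from is left to the caller. Reading them from the files under /sys/fs/btrfs/<UUID>/allocation/ is one possibility, but that layout is an assumption here and not something the message spells out.

```python
from typing import Iterable, Mapping


def allocation_ratio(allocated: Mapping[str, int],
                     device_sizes: Iterable[int]) -> float:
    """Fraction of raw device space already handed out to chunks.

    allocated    -- raw bytes allocated on disk per chunk type
                    (hypothetical keys such as "data", "metadata", "system")
    device_sizes -- size in bytes of every device in the volume
    """
    total = sum(device_sizes)
    if total <= 0:
        raise ValueError("volume has no usable device space")
    return sum(allocated.values()) / total


def allocation_alarm(allocated: Mapping[str, int],
                     device_sizes: Iterable[int],
                     threshold: float = 0.90) -> bool:
    """True when chunk allocations exceed `threshold` of the total space."""
    return allocation_ratio(allocated, device_sizes) > threshold
```

Working from chunk allocation rather than from used bytes matches the "low level" wording in the message: a btrfs volume can run out of unallocated space while its existing chunks still have free room in them.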
Re: Understanding BTRFS RAID0 Performance
On 2018-10-05 20:34, Duncan wrote:
> Wilson, Ellis posted on Fri, 05 Oct 2018 15:29:52 + as excerpted:
>
>> Is there any tuning in BTRFS that limits the number of outstanding
>> reads at a time to a small single-digit number, or something else that
>> could be behind small queue depths? I can't otherwise imagine what the
>> difference would be on the read path between ext4 vs btrfs when both
>> are on mdraid.
>
> It seems I forgot to directly answer that question in my first reply.
> Thanks for restating it.
>
> Btrfs doesn't really expose much performance tuning (yet?), at least
> outside the code itself. There are a few very limited knobs, but
> they're just that, few and limited or broad-stroke.
>
> There are mount options like ssd/nossd, ssd_spread/nossd_spread, the
> space_cache set of options (see below), flushoncommit/noflushoncommit,
> commit=, etc (see the btrfs (5) manpage), but nothing really to
> influence stride length, etc, or to optimize chunk placement between
> ssd and non-ssd devices, for instance.
>
> And there's a few filesystem features, normally set at mkfs.btrfs time
> (and thus covered in the mkfs.btrfs manpage) but some of which can be
> tuned later, but generally, the defaults have changed over time to
> reflect the best case, and the older variants are there primarily to
> retain backward compatibility with old kernels and tools that didn't
> handle the newer variants.
>
> That said, as I think about it there are some tunables that may be
> worth experimenting with. Most or all of these are covered in the
> btrfs (5) manpage.
>
> * Given the large device numbers you mention and raid0, you're likely
> dealing with multi-TB-scale filesystems. At this level, the
> space_cache=v2 mount option may be useful. It's not the default yet as
> btrfs check, etc, don't yet handle it, but given your raid0 choice you
> may not be concerned about that. Need only be given once after which
> v2 is "on" for the filesystem until turned off.
>
> * Consider experimenting with the thread_pool=n mount option. I've seen
> very little discussion of this one, but given your interest in
> parallelization, it could make a difference.

Probably not as much as you might think. I'll explain a bit more
further down where this is being mentioned again.

> * Possibly the commit= (default 30) mount option. In theory, upping
> this may allow better write merging, tho your interest seems to be more
> on the read side, and the commit time has consequences at crash time.

Based on my own experience, having a higher commit time doesn't impact
read or write performance much or really help all that much with write
merging. All it really helps with is minimizing overhead, but it's not
even all that great at doing that.

> * The autodefrag mount option may be considered if you do a lot of
> existing file updates, as is common with database or VM image files.
> Due to COW this triggers high fragmentation on btrfs, and autodefrag
> should help control that. Note that autodefrag effectively increases
> the minimum extent size from 4 KiB to, IIRC, 16 MB, tho it may be less,
> and doesn't operate at whole-file size, so larger repeatedly-modified
> files will still have some fragmentation, just not as much. Obviously,
> you wouldn't see the read-time effects of this until the filesystem has
> aged somewhat, so it may not show up on your benchmarks.
>
> (Another option for such files is setting them nocow or using the
> nodatacow mount option, but this turns off checksumming and if it's on,
> compression for those files, and has a few other non-obvious caveats as
> well, so isn't something I recommend. Instead of using nocow, I'd
> suggest putting such files on a dedicated traditional non-cow
> filesystem such as ext4, and I consider nocow at best a workaround
> option for those who prefer to use btrfs as a single big storage pool
> and thus don't want to do the dedicated non-cow filesystem for some
> subset of their files.)
>
> * Not really for reads but for btrfs and any cow-based filesystem, you
> almost certainly want the (not btrfs specific) noatime mount option.

Actually... This can help a bit for some workloads. Just like the
commit time, it comes down to a matter of overhead. Essentially, if you
read a file regularly, then with the default of relatime, you've got a
guaranteed write requiring a commit of the metadata tree once every 24
hours. It's not much to worry about for just one file, but if you're
reading a very large number of files all the time, it can really add
up.

> * While it has serious filesystem integrity implications and thus can't
> be responsibly recommended, there is the nobarrier mount option. But
> if you're already running raid0 on a large number of devices you're
> already gambling with device stability, and this /might/ be an
> additional risk you're willing to take, as it should increase
> performance. But for normal users it's simply not worth the risk, and
> if you do choose to use it, it's at your own risk.

Agreed, if you're running RAID0 with this many drives, nobarrier may be
worth it for a
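The "guaranteed write once every 24 hours" mentioned in the relatime comment above follows from the rule relatime uses to decide whether a read updates the access time. A rough model of that rule in Python, simplified to whole seconds and using made-up function names, might look like this; it is a sketch of the decision logic only and says nothing about how btrfs batches the resulting metadata writes.

```python
DAY = 24 * 60 * 60  # seconds


def relatime_needs_update(atime: int, mtime: int, ctime: int, now: int) -> bool:
    """Decide whether a read at time `now` updates atime under relatime."""
    if mtime >= atime:        # file was modified since it was last read
        return True
    if ctime >= atime:        # inode was changed since it was last read
        return True
    if now - atime >= DAY:    # recorded atime is at least a day old
        return True
    return False


def count_atime_updates(read_times, atime: int = 0, mtime: int = 0,
                        ctime: int = 0) -> int:
    """Count how many reads of an otherwise unmodified file write a new atime."""
    updates = 0
    for now in read_times:
        if relatime_needs_update(atime, mtime, ctime, now):
            atime = now
            updates += 1
    return updates


if __name__ == "__main__":
    # One read per hour over three days.
    hourly_reads = [hour * 3600 for hour in range(1, 73)]
    print(count_atime_updates(hourly_reads))
```

With noatime none of these reads would touch the inode, which is the overhead difference being discussed: one small metadata update per file per day, multiplied by the number of files that are read regularly.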
Re: [PATCH 37/42] btrfs: wakeup cleaner thread when adding delayed iput
On Fri, Sep 28, 2018 at 12:21 PM Josef Bacik wrote:
>
> The cleaner thread usually takes care of delayed iputs, with the
> exception of the btrfs_end_transaction_throttle path. The cleaner
> thread only gets woken up every 30 seconds, so instead wake it up to do
> it's work so that we can free up that space as quickly as possible.
>
> Signed-off-by: Josef Bacik

Reviewed-by: Filipe Manana

> ---
>  fs/btrfs/inode.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2b257d14bd3d..0a1671fb03bf 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3323,6 +3323,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
>         ASSERT(list_empty(&binode->delayed_iput));
>         list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
>         spin_unlock(&fs_info->delayed_iput_lock);
> +       wake_up_process(fs_info->cleaner_kthread);
>  }
>
>  void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
> --
> 2.14.3
>

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
[PATCH v2 4/5] btrfs-progs: original check: Add dev_item check for used bytes and total bytes
Signed-off-by: Qu Wenruo
---
 check/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/check/main.c b/check/main.c
index ff9a785ce555..12f12e18a83f 100644
--- a/check/main.c
+++ b/check/main.c
@@ -7938,6 +7938,12 @@ static int check_device_used(struct device_record *dev_rec,
 	struct device_extent_record *dev_extent_rec;
 	u64 total_byte = 0;

+	if (dev_rec->byte_used > dev_rec->total_byte) {
+		error("device %llu has incorrect used bytes %llu > total bytes %llu",
+		      dev_rec->devid, dev_rec->byte_used, dev_rec->total_byte);
+		return -EUCLEAN;
+	}
+
 	cache = search_cache_extent2(&dext_cache->tree, dev_rec->devid, 0);
 	while (cache) {
 		dev_extent_rec = container_of(cache,
-- 
2.19.1
[PATCH v2 3/5] btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
Obviously, used bytes can't be larger than total bytes.

Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 07c03cad77af..1173b963b8f3 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4074,6 +4074,11 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	used = btrfs_device_bytes_used(eb, dev_item);
 	total_bytes = btrfs_device_total_bytes(eb, dev_item);
+	if (used > total_bytes) {
+		error("device %llu has incorrect used bytes %llu > total bytes %llu",
+		      dev_id, used, total_bytes);
+		return ACCOUNTING_MISMATCH;
+	}
 	key.objectid = dev_id;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 	key.offset = 0;
-- 
2.19.1
[PATCH v2 5/5] btrfs-progs: fsck-tests: Add test image for dev extents beyond device boundary
Now two locations can detect such problem, either by device item used/total bytes check, or by early dev extents check against device boundary. The image is hand-crafted image which uses DATA SINGLE chunk to feed btrfs check. As expected, as long as block group item, chunk item, device used bytes matches, older btrfs check can't detect such problem. Signed-off-by: Qu Wenruo --- .../over_dev_boundary.img.xz | Bin 0 -> 1640 bytes tests/fsck-tests/036-bad-dev-extents/test.sh | 20 ++ 2 files changed, 20 insertions(+) create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh diff --git a/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz b/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz new file mode 100644 index ..47cb2a707b0097e369dc088ed0549f847995f136 GIT binary patch literal 1640 zcmV-u2ABE$H+ooF000E$*0e?f03iVu0001VFXf})3;zZsT>wRyj;C3^v%$$4d1oRm zhA1@4%tH=9jYF%IQSpIUKDpjXLRl?q4p$q;1zsY^#9_Lx=#tbGm>@S#e2aqt!?0}y z2BPO%4~c9Q5)jKFD7}DURarKL)`^j{f?s>sEHdzmSvX^98%kGi<_8 zMnXynsC*B7}KE(6w>*wfdb|tw$kt^y>W+TB*?pon1P+)#u#?6bSIG)TooJx~u$# zf^+}xKw`BfI}6=717S~Q%LW1kwY`pz9H{`uNNNFOk2w!VauNEaMfoLj)Z<)!1?F60 zJ+OEA|4$a=9W#XX*l{EG!j^s}p| z0i#_%$Q}d)-EE8#8O5^x$$8Y0l2 zc19e0Et3O3m`pMOqEkasL8VGE+u~2lZn>sRCRj169Z6mQ3*+`D+C#F1V7POV%lx(9cB{WN*9OP%Zbd1VDn(S4HX^ad4-b~#H z@9eUP4AAU`)yRf!k+rrrLSYfBSEi6RE#HtbqyPl11S>RCH zqvJbt22t`FmU^tmTb+7LtpybCB-x1lGSlrpVQ9|6WNBs~q*M-to1gD1l41oiy~}+O?!{68Jvim2*%BznW)@B=3IH)Xi1795q{#>sF*y^0T2@b$`Wqwch?BgN}IR9Ui z;!cs)hJFGsJmFaiUsYrN$c0^BLU^n-B%fagn+jR{?Dq+K%VyMG@pmAOShFY)k8zBxm@7YD zb^ZXU;<`+_B+IV{+A~1Ku zWggqWQd{E%hF}W2ArPwJ33zWP>MIe;YDN8WV+|+a4_c>yFN#ZGN00G#%3HYi z4pePTnc{8k5CjwuN6hudRFcBkvB^pSFufzaaD`-;mPgJ~wr(
[PATCH v2 1/5] btrfs-progs: lowmem check: Add check for overlapping dev extents
Add such a check to check_dev_item(), since at that point we're already
iterating dev extents for dev item accounting.

Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1bce44f5658a..07c03cad77af 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4065,6 +4065,8 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	u64 dev_id;
 	u64 used;
 	u64 total = 0;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
 	int ret;

 	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
@@ -4086,8 +4088,16 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 		return REFERENCER_MISSING;
 	}

-	/* Iterate dev_extents to calculate the used space of a device */
+	/*
+	 * Iterate dev_extents to calculate the used space of a device
+	 *
+	 * Also make sure no dev extents overlap and end beyond device boundary
+	 */
 	while (1) {
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
 		if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
 			goto next;

@@ -4099,7 +4109,27 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,

 		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
 				     struct btrfs_dev_extent);
-		total += btrfs_dev_extent_length(path.nodes[0], ptr);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], ptr);
+
+		if (prev_devid == devid && physical_offset < prev_dev_ext_end) {
+			error(
+"dev extent devid %llu offset %llu len %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, physical_len,
+			      prev_dev_ext_end);
+			return ACCOUNTING_MISMATCH;
+		}
+		if (physical_offset + physical_len > total_bytes) {
+			error(
+"dev extent devid %llu offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      total_bytes);
+			return ACCOUNTING_MISMATCH;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+		total += physical_len;
 next:
 		ret = btrfs_next_item(dev_root, &path);
 		if (ret)
-- 
2.19.1
[PATCH v2 2/5] btrfs-progs: original check: Add ability to detect bad dev extents
Unlike lowmem mode check, we don't have a good place in original mode to
check for overlapping dev extents.

So this patch introduces a new function, check_dev_extents(), to handle
possible bad dev extents.

Reported-by: Hans van Kranenburg
Signed-off-by: Qu Wenruo
---
 check/main.c | 99
 1 file changed, 99 insertions(+)

diff --git a/check/main.c b/check/main.c
index bc2ee22f7943..ff9a785ce555 100644
--- a/check/main.c
+++ b/check/main.c
@@ -8224,6 +8224,99 @@ out:
 	return ret;
 }
 
+/*
+ * Check if all dev extents are valid (not overlap nor beyond device
+ * boundary).
+ *
+ * Dev extents <-> chunk cross checking is already done in check_chunks().
+ */
+static int check_dev_extents(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_root *dev_root = fs_info->dev_root;
+	int ret;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
+
+	btrfs_init_path(&path);
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		error("failed to search device tree: %s", strerror(-ret));
+		goto out;
+	}
+	if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
+		ret = btrfs_next_leaf(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+
+	while (1) {
+		struct btrfs_dev_extent *dev_ext;
+		struct btrfs_device *dev;
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+		dev_ext = btrfs_item_ptr(path.nodes[0], path.slots[0],
+					 struct btrfs_dev_extent);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], dev_ext);
+
+		dev = btrfs_find_device(fs_info, devid, NULL, NULL);
+		if (!dev) {
+			error("failed to find device with devid %llu", devid);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (prev_devid == devid && prev_dev_ext_end > physical_offset) {
+			error(
+"dev extent devid %llu physical offset %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, prev_dev_ext_end);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (physical_offset + physical_len > dev->total_bytes) {
+			error(
+"dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      dev->total_bytes);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+
+		ret = btrfs_next_item(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
 static int check_chunks_and_extents(struct btrfs_fs_info *fs_info)
 {
 	struct rb_root dev_cache;
@@ -8318,6 +8411,12 @@ again:
 		goto out;
 	}
 
+	ret = check_dev_extents(fs_info);
+	if (ret < 0) {
+		err = ret;
+		goto out;
+	}
+
 	ret = check_chunks(_cache, _group_cache,
 			   _extent_cache, NULL, NULL, NULL, 0);
 	if (ret) {
-- 
2.19.1
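The walk above relies on the device tree returning dev extents sorted by (devid, physical offset), so each extent only has to be compared with the previous one on the same device. A minimal standalone sketch of that idea follows. It is not btrfs-progs code: `struct dev_extent`, `check_sorted_dev_extents()`, the status enum and the `dev_sizes` lookup table are hypothetical stand-ins for illustration. One deliberate difference from the patch is that the boundary test is written so that `offset + len` cannot wrap around on a crafted image.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one dev extent item. */
struct dev_extent {
	uint64_t devid;
	uint64_t offset;	/* physical start on the device */
	uint64_t len;
};

enum dev_extent_status {
	DEV_EXTENT_OK = 0,
	DEV_EXTENT_NO_DEVICE,		/* devid has no known device */
	DEV_EXTENT_OVERLAP,		/* starts before the previous extent ends */
	DEV_EXTENT_BEYOND_DEVICE,	/* ends after the device's total bytes */
};

/*
 * exts must be sorted by (devid, offset). dev_sizes[devid] holds the total
 * bytes of each device (an assumption of this sketch, not how btrfs-progs
 * looks devices up). On failure, *bad_index is set to the offending entry.
 */
enum dev_extent_status
check_sorted_dev_extents(const struct dev_extent *exts, size_t nr,
			 const uint64_t *dev_sizes, size_t nr_sizes,
			 size_t *bad_index)
{
	uint64_t prev_devid = 0;
	uint64_t prev_end = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct dev_extent *e = &exts[i];
		enum dev_extent_status st = DEV_EXTENT_OK;

		if (e->devid >= nr_sizes)
			st = DEV_EXTENT_NO_DEVICE;
		else if (e->devid == prev_devid && e->offset < prev_end)
			st = DEV_EXTENT_OVERLAP;
		else if (e->len > dev_sizes[e->devid] ||
			 e->offset > dev_sizes[e->devid] - e->len)
			/* same as offset + len > size, without overflow */
			st = DEV_EXTENT_BEYOND_DEVICE;

		if (st != DEV_EXTENT_OK) {
			if (bad_index)
				*bad_index = i;
			return st;
		}
		/* Remember this extent so the next one can be compared to it. */
		prev_devid = e->devid;
		prev_end = e->offset + e->len;
	}
	return DEV_EXTENT_OK;
}
```

Updating `prev_devid` and `prev_end` at the end of every iteration is what makes the overlap test effective; if they stay at their initial values, the comparison never fires.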
[PATCH v2 0/5] btrfs-progs: check: Detect invalid dev extents and device items
This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/dev_extents_check

Hans van Kranenburg reported a case where the btrfs DUP chunk allocator
could allocate invalid dev extents, which either overlap with existing dev
extents or extend beyond the device boundary.

This patchset enhances the btrfs-progs side to detect such problems, with a
hand-crafted test image for it.

Link: https://www.spinics.net/lists/linux-btrfs/msg82370.html

Changelog:
v2:
  Fix a bug in the 1st patch which made lowmem mode never check for
  overlapping dev extents.
  Fix a test case bug where the test never passed due to a wrong script.

Qu Wenruo (5):
  btrfs-progs: lowmem check: Add check for overlapping dev extents
  btrfs-progs: original check: Add ability to detect bad dev extents
  btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
  btrfs-progs: original check: Add dev_item check for used bytes and total bytes
  btrfs-progs: fsck-tests: Add test image for dev extents beyond device boundary

 check/main.c | 105 ++
 check/mode-lowmem.c | 39 ++-
 .../over_dev_boundary.img.xz | Bin 0 -> 1640 bytes
 tests/fsck-tests/036-bad-dev-extents/test.sh | 20
 4 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
 create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh

-- 
2.19.1
[PATCH] Btrfs: fix warning when replaying log after fsync of a tmpfile
From: Filipe Manana

When replaying a log which contains a tmpfile (which necessarily has a
link count of 0) we end up calling inc_nlink(), at
fs/btrfs/tree-log.c:replay_one_buffer(), which produces a warning like
the following:

[195191.943673] WARNING: CPU: 0 PID: 6924 at fs/inode.c:342 inc_nlink+0x33/0x40
[195191.943674] Modules linked in: btrfs dm_flakey dm_mod xor raid6_pq libcrc32c kvm_intel bochs_drm ttm kvm drm_kms_helper drm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper joydev sg button evdev pcspkr qemu_fw_cfg serio_raw parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sd_mod virtio_scsi ata_generic virtio_pci virtio_ring virtio ata_piix floppy crc32c_intel libata psmouse e1000 scsi_mod i2c_piix4 [last unloaded: btrfs]
[195191.943723] CPU: 0 PID: 6924 Comm: mount Not tainted 4.19.0-rc6-btrfs-next-38 #1
[195191.943724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[195191.943726] RIP: 0010:inc_nlink+0x33/0x40
[195191.943727] Code: c0 74 07 83 c0 01 89 47 48 c3 f6 87 d1 00 00 00 04 74 17 48 8b 47 28 f0 48 83 a8 70 07 00 00 01 8b 47 48 83 c0 01 89 47 48 c3 <0f> 0b eb e5 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 ff 05 54
[195191.943728] RSP: 0018:b96e425e3870 EFLAGS: 00010246
[195191.943730] RAX: RBX: 8c0d1e6af4f0 RCX: 0006
[195191.943731] RDX: RSI: RDI: 8c0d1e6af4f0
[195191.943731] RBP: 0097 R08: 0001 R09:
[195191.943732] R10: R11: R12: b96e425e3a60
[195191.943733] R13: 8c0d10cff0c8 R14: 8c0d0d515348 R15: 8c0d78a1b3f8
[195191.943735] FS: 7f570ee24480() GS:8c0dfb20() knlGS:
[195191.943736] CS: 0010 DS: ES: CR0: 80050033
[195191.943737] CR2: 5593286277c8 CR3: bb8f2006 CR4: 003606f0
[195191.943739] DR0: DR1: DR2:
[195191.943740] DR3: DR6: fffe0ff0 DR7: 0400
[195191.943741] Call Trace:
[195191.943778] replay_one_buffer+0x797/0x7d0 [btrfs]
[195191.943802] walk_up_log_tree+0x1c1/0x250 [btrfs]
[195191.943809] ? rcu_read_lock_sched_held+0x3f/0x70
[195191.943825] walk_log_tree+0xae/0x1d0 [btrfs]
[195191.943840] btrfs_recover_log_trees+0x1d7/0x4d0 [btrfs]
[195191.943856] ? replay_dir_deletes+0x280/0x280 [btrfs]
[195191.943870] open_ctree+0x1c3b/0x22a0 [btrfs]
[195191.943887] btrfs_mount_root+0x6b4/0x800 [btrfs]
[195191.943894] ? rcu_read_lock_sched_held+0x3f/0x70
[195191.943899] ? pcpu_alloc+0x55b/0x7c0
[195191.943906] ? mount_fs+0x3b/0x140
[195191.943908] mount_fs+0x3b/0x140
[195191.943912] ? __init_waitqueue_head+0x36/0x50
[195191.943916] vfs_kern_mount+0x62/0x160
[195191.943927] btrfs_mount+0x134/0x890 [btrfs]
[195191.943936] ? rcu_read_lock_sched_held+0x3f/0x70
[195191.943938] ? pcpu_alloc+0x55b/0x7c0
[195191.943943] ? mount_fs+0x3b/0x140
[195191.943952] ? btrfs_remount+0x570/0x570 [btrfs]
[195191.943954] mount_fs+0x3b/0x140
[195191.943956] ? __init_waitqueue_head+0x36/0x50
[195191.943960] vfs_kern_mount+0x62/0x160
[195191.943963] do_mount+0x1f9/0xd40
[195191.943967] ? memdup_user+0x4b/0x70
[195191.943971] ksys_mount+0x7e/0xd0
[195191.943974] __x64_sys_mount+0x21/0x30
[195191.943977] do_syscall_64+0x60/0x1b0
[195191.943980] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[195191.943983] RIP: 0033:0x7f570e4e524a
[195191.943985] Code: 48 8b 0d 51 fc 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48
[195191.943986] RSP: 002b:7ffd83589478 EFLAGS: 0206 ORIG_RAX: 00a5
[195191.943989] RAX: ffda RBX: 563f335b2060 RCX: 7f570e4e524a
[195191.943990] RDX: 563f335b2240 RSI: 563f335b2280 RDI: 563f335b2260
[195191.943992] RBP: R08: R09: 0020
[195191.943993] R10: c0ed R11: 0206 R12: 563f335b2260
[195191.943994] R13: 563f335b2240 R14: R15:
[195191.944002] irq event stamp: 8688
[195191.944010] hardirqs last enabled at (8687): [] console_unlock+0x503/0x640
[195191.944012] hardirqs last disabled at (8688): [] trace_hardirqs_off_thunk+0x1a/0x1c
[195191.944018] softirqs last enabled at (8638): [] __set_page_dirty_nobuffers+0x101/0x150
[195191.944020] softirqs last disabled at (8634): [] wb_wakeup_delayed+0x2e/0x60
[195191.944022] ---[ end trace 5d6e873a9a0b811a ]---

This happens because the inode does not have the flag I_LINKABLE set,
Re: [PATCH 1/5] btrfs-progs: lowmem check: Add check for overlapping dev extents
On 2018/10/8 5:28 PM, Su Yue wrote:
>
>
> On 10/8/18 3:00 PM, Qu Wenruo wrote:
>> Add such check at check_dev_item(), since at that timing we're also
>> iterating dev extents for dev item accounting.
>>
>> Signed-off-by: Qu Wenruo
>> ---
>>  check/mode-lowmem.c | 32 ++--
>>  1 file changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
>> index 1bce44f5658a..d387423639e6 100644
>> --- a/check/mode-lowmem.c
>> +++ b/check/mode-lowmem.c
>> @@ -4065,6 +4065,8 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>>  	u64 dev_id;
>>  	u64 used;
>>  	u64 total = 0;
>> +	u64 prev_devid = 0;
>> +	u64 prev_dev_ext_end = 0;
>
> Those two new variables aren't assigned anymore in the patch...

Oh, what am I doing... (facepalm)

I'll fix this along with the test case bug.

Thanks for reviewing,
Qu

>
> Thanks,
> Su
>>  	int ret;
>>  	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
>> @@ -4086,8 +4088,16 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>>  		return REFERENCER_MISSING;
>>  	}
>> -	/* Iterate dev_extents to calculate the used space of a device */
>> +	/*
>> +	 * Iterate dev_extents to calculate the used space of a device
>> +	 *
>> +	 * Also make sure no dev extents overlap and end beyond device boundary
>> +	 */
>>  	while (1) {
>> +		u64 devid;
>> +		u64 physical_offset;
>> +		u64 physical_len;
>> +
>>  		if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
>>  			goto next;
>> @@ -4099,7 +4109,25 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>>  		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
>>  				     struct btrfs_dev_extent);
>> -		total += btrfs_dev_extent_length(path.nodes[0], ptr);
>> +		devid = key.objectid;
>> +		physical_offset = key.offset;
>> +		physical_len = btrfs_dev_extent_length(path.nodes[0], ptr);
>> +
>> +		if (prev_devid == devid && physical_offset < prev_dev_ext_end) {
>> +			error(
>> +"dev extent devid %llu offset %llu len %llu overlap with previous dev extent end %llu",
>> +			      devid, physical_offset, physical_len,
>> +			      prev_dev_ext_end);
>> +			return ACCOUNTING_MISMATCH;
>> +		}
>> +		if (physical_offset + physical_len > total_bytes) {
>> +			error(
>> +"dev extent devid %llu offset %llu len %llu is beyond device boundary %llu",
>> +			      devid, physical_offset, physical_len,
>> +			      total_bytes);
>> +			return ACCOUNTING_MISMATCH;
>> +		}
>> +		total += physical_len;
>>  next:
>>  		ret = btrfs_next_item(dev_root, &path);
>>  		if (ret)
>>
>
>
[PATCH] generic: test mounting filesystem after fsync of a tmpfile
From: Filipe Manana

Test that if we fsync a tmpfile, without adding a hard link to it, and then
power fail, we will be able to mount the filesystem without triggering any
crashes, warnings or corruptions.

This test is motivated by an issue in btrfs where this scenario triggered a
warning (without any side effects). The following linux kernel patch fixes
the issue in btrfs:

  "Btrfs: fix warning when replaying log after fsync of a tmpfile"

Signed-off-by: Filipe Manana
---
 tests/generic/506 | 58 +++
 tests/generic/506.out | 3 +++
 tests/generic/group | 1 +
 3 files changed, 62 insertions(+)
 create mode 100755 tests/generic/506
 create mode 100644 tests/generic/506.out

diff --git a/tests/generic/506 b/tests/generic/506
new file mode 100755
index ..7d28d3b0
--- /dev/null
+++ b/tests/generic/506
@@ -0,0 +1,58 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test No. 506
+#
+# Test that if we fsync a tmpfile, without adding a hard link to it, and then
+# power fail, we will be able to mount the filesystem without triggering any
+# crashes, warnings or corruptions.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	_cleanup_flakey
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_xfs_io_command "-T"
+_require_dm_target flakey
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_init_flakey
+_mount_flakey
+
+# Create our tmpfile, write some data to it and fsync it. We want a power
+# failure to happen after the fsync, so that we have an inode with a link
+# count of 0 in our log/journal.
+$XFS_IO_PROG -T \
+	-c "pwrite -S 0xab 0 64K" \
+	-c "fsync" \
+	$SCRATCH_MNT | _filter_xfs_io
+
+# Simulate a power failure and mount the filesystem to check that it succeeds.
+_flakey_drop_and_remount
+
+_unmount_flakey
+
+status=0
+exit
diff --git a/tests/generic/506.out b/tests/generic/506.out
new file mode 100644
index ..f522e663
--- /dev/null
+++ b/tests/generic/506.out
@@ -0,0 +1,3 @@
+QA output created by 506
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/generic/group b/tests/generic/group
index 4da0e188..2e2a6247 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -508,3 +508,4 @@
 503 auto quick dax punch collapse zero
 504 auto quick locks
 505 shutdown auto quick metadata
+506 auto quick log
-- 
2.11.0
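For readers less familiar with xfs_io, the tmpfile step of the test (open with -T, pwrite with pattern 0xab, fsync) corresponds roughly to the C sketch below. This is an illustration under assumptions, not part of the test: `write_and_fsync_tmpfile()` is a hypothetical helper name, the `O_TMPFILE` fallback value is the asm-generic one and is not valid on every architecture, and the sketch does not simulate the power failure, which the test gets from dm-flakey.

```c
#define _GNU_SOURCE		/* needed for O_TMPFILE; must precede the includes */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_TMPFILE
/* Fallback if feature macros were fixed earlier; asm-generic value only. */
#define O_TMPFILE (020000000 | O_DIRECTORY)
#endif

/*
 * Create an unnamed file (link count 0) inside dir, write len bytes of 0xab
 * to it and fsync it. Returns 0 on success, -1 with errno set on failure.
 */
int write_and_fsync_tmpfile(const char *dir, size_t len)
{
	char buf[4096];
	size_t done = 0;
	int saved_errno;
	int fd;

	fd = open(dir, O_TMPFILE | O_RDWR, 0600);
	if (fd < 0)
		return -1;

	memset(buf, 0xab, sizeof(buf));
	while (done < len) {
		size_t chunk = len - done;
		ssize_t n;

		if (chunk > sizeof(buf))
			chunk = sizeof(buf);
		n = write(fd, buf, chunk);
		if (n <= 0) {
			if (n == 0)
				errno = EIO;
			goto fail;
		}
		done += (size_t)n;
	}
	/* After this point the inode with link count 0 is in the log/journal. */
	if (fsync(fd) < 0)
		goto fail;
	return close(fd);

fail:
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return -1;
}
```

Calling `write_and_fsync_tmpfile(mnt, 64 * 1024)` on a mounted filesystem mirrors the `pwrite -S 0xab 0 64K` plus `fsync` commands; whether `O_TMPFILE` is supported depends on the filesystem and kernel, which is why the test guards it with `_require_xfs_io_command "-T"`.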
Re: [PATCH 1/5] btrfs-progs: lowmem check: Add check for overlapping dev extents
On 10/8/18 3:00 PM, Qu Wenruo wrote:
> Add such check at check_dev_item(), since at that timing we're also
> iterating dev extents for dev item accounting.
>
> Signed-off-by: Qu Wenruo
> ---
>  check/mode-lowmem.c | 32 ++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 1bce44f5658a..d387423639e6 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -4065,6 +4065,8 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>  	u64 dev_id;
>  	u64 used;
>  	u64 total = 0;
> +	u64 prev_devid = 0;
> +	u64 prev_dev_ext_end = 0;

Those two new variables aren't assigned anymore in the patch...

Thanks,
Su

>  	int ret;
>  	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
> @@ -4086,8 +4088,16 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>  		return REFERENCER_MISSING;
>  	}
> -	/* Iterate dev_extents to calculate the used space of a device */
> +	/*
> +	 * Iterate dev_extents to calculate the used space of a device
> +	 *
> +	 * Also make sure no dev extents overlap and end beyond device boundary
> +	 */
>  	while (1) {
> +		u64 devid;
> +		u64 physical_offset;
> +		u64 physical_len;
> +
>  		if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
>  			goto next;
> @@ -4099,7 +4109,25 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
>  		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
>  				     struct btrfs_dev_extent);
> -		total += btrfs_dev_extent_length(path.nodes[0], ptr);
> +		devid = key.objectid;
> +		physical_offset = key.offset;
> +		physical_len = btrfs_dev_extent_length(path.nodes[0], ptr);
> +
> +		if (prev_devid == devid && physical_offset < prev_dev_ext_end) {
> +			error(
> +"dev extent devid %llu offset %llu len %llu overlap with previous dev extent end %llu",
> +			      devid, physical_offset, physical_len,
> +			      prev_dev_ext_end);
> +			return ACCOUNTING_MISMATCH;
> +		}
> +		if (physical_offset + physical_len > total_bytes) {
> +			error(
> +"dev extent devid %llu offset %llu len %llu is beyond device boundary %llu",
> +			      devid, physical_offset, physical_len,
> +			      total_bytes);
> +			return ACCOUNTING_MISMATCH;
> +		}
> +		total += physical_len;
>  next:
>  		ret = btrfs_next_item(dev_root, &path);
>  		if (ret)
Re: [PATCH 5/5] btrfs-progs: fsck-tests: Add test image for dev extents beyond device boundary
On 10/8/18 3:00 PM, Qu Wenruo wrote:
> Now two locations can detect such problem, either by device item
> used/total bytes check, or by early dev extents check against device
> boundary.
>
> The image is hand-crafted image which uses DATA SINGLE chunk to feed
> btrfs check.
> As expected, as long as block group item, chunk item, device used bytes
> matches, older btrfs check can't detect such problem.
>
> Signed-off-by: Qu Wenruo
> ---
>  .../over_dev_boundary.img.xz | Bin 0 -> 1640 bytes
>  tests/fsck-tests/036-bad-dev-extents/test.sh | 19 ++
>  2 files changed, 19 insertions(+)
>  create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
>  create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh
>
> diff --git a/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz b/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
> new file mode 100644
> index ..47cb2a707b0097e369dc088ed0549f847995f136
> GIT binary patch
> literal 1640
> [... binary patch data snipped ...]

I checked out your branch and ran the test, but it failed.
Should it be run_must_fail instead?

> +}
> +
> +check_all_images
[PATCH 4/5] btrfs-progs: original check: Add dev_item check for used bytes and total bytes
Signed-off-by: Qu Wenruo
---
 check/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/check/main.c b/check/main.c
index ff9a785ce555..12f12e18a83f 100644
--- a/check/main.c
+++ b/check/main.c
@@ -7938,6 +7938,12 @@ static int check_device_used(struct device_record *dev_rec,
 	struct device_extent_record *dev_extent_rec;
 	u64 total_byte = 0;
 
+	if (dev_rec->byte_used > dev_rec->total_byte) {
+		error("device %llu has incorrect used bytes %llu > total bytes %llu",
+		      dev_rec->devid, dev_rec->byte_used, dev_rec->total_byte);
+		return -EUCLEAN;
+	}
+
 	cache = search_cache_extent2(_cache->tree, dev_rec->devid, 0);
 	while (cache) {
 		dev_extent_rec = container_of(cache,
-- 
2.19.0
[PATCH 2/5] btrfs-progs: original check: Add ability to detect bad dev extents
Unlike lowmem mode check, we don't have a good place in original mode to
check for overlapping dev extents.

So this patch introduces a new function, check_dev_extents(), to handle
possible bad dev extents.

Reported-by: Hans van Kranenburg
Signed-off-by: Qu Wenruo
---
 check/main.c | 99
 1 file changed, 99 insertions(+)

diff --git a/check/main.c b/check/main.c
index bc2ee22f7943..ff9a785ce555 100644
--- a/check/main.c
+++ b/check/main.c
@@ -8224,6 +8224,99 @@ out:
 	return ret;
 }
 
+/*
+ * Check if all dev extents are valid (not overlap nor beyond device
+ * boundary).
+ *
+ * Dev extents <-> chunk cross checking is already done in check_chunks().
+ */
+static int check_dev_extents(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_root *dev_root = fs_info->dev_root;
+	int ret;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
+
+	btrfs_init_path(&path);
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		error("failed to search device tree: %s", strerror(-ret));
+		goto out;
+	}
+	if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
+		ret = btrfs_next_leaf(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+
+	while (1) {
+		struct btrfs_dev_extent *dev_ext;
+		struct btrfs_device *dev;
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+		dev_ext = btrfs_item_ptr(path.nodes[0], path.slots[0],
+					 struct btrfs_dev_extent);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], dev_ext);
+
+		dev = btrfs_find_device(fs_info, devid, NULL, NULL);
+		if (!dev) {
+			error("failed to find device with devid %llu", devid);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (prev_devid == devid && prev_dev_ext_end > physical_offset) {
+			error(
+"dev extent devid %llu physical offset %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, prev_dev_ext_end);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		if (physical_offset + physical_len > dev->total_bytes) {
+			error(
+"dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      dev->total_bytes);
+			ret = -EUCLEAN;
+			goto out;
+		}
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+
+		ret = btrfs_next_item(dev_root, &path);
+		if (ret < 0) {
+			error("failed to find next leaf: %s", strerror(-ret));
+			goto out;
+		}
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
 static int check_chunks_and_extents(struct btrfs_fs_info *fs_info)
 {
 	struct rb_root dev_cache;
@@ -8318,6 +8411,12 @@ again:
 		goto out;
 	}
 
+	ret = check_dev_extents(fs_info);
+	if (ret < 0) {
+		err = ret;
+		goto out;
+	}
+
 	ret = check_chunks(_cache, _group_cache,
 			   _extent_cache, NULL, NULL, NULL, 0);
 	if (ret) {
-- 
2.19.0
[PATCH 3/5] btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
Obviously, used bytes can't be larger than total bytes.

Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index d387423639e6..c50e34236ac8 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4074,6 +4074,11 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	used = btrfs_device_bytes_used(eb, dev_item);
 	total_bytes = btrfs_device_total_bytes(eb, dev_item);
+	if (used > total_bytes) {
+		error("device %llu has incorrect used bytes %llu > total bytes %llu",
+		      dev_id, used, total_bytes);
+		return ACCOUNTING_MISMATCH;
+	}
 
 	key.objectid = dev_id;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 	key.offset = 0;
-- 
2.19.0
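As a rough illustration of how this check sits next to the existing dev item accounting, the sketch below validates a device's used bytes in two steps: used must not exceed total, and the lengths of the device's dev extents must add up to used. It is a simplified model rather than btrfs-progs code: `check_dev_item_accounting()` and its enum are hypothetical names, and the final comparison of the summed lengths against used is an assumption, since the quoted hunks only show the sum being accumulated.

```c
#include <stddef.h>
#include <stdint.h>

enum dev_item_status {
	DEV_ITEM_OK = 0,
	DEV_ITEM_USED_EXCEEDS_TOTAL,	/* used bytes > total bytes */
	DEV_ITEM_USED_MISMATCH,		/* sum of dev extent lengths != used */
};

/*
 * used and total_bytes come from the device item; extent_lens holds the
 * length of every dev extent that belongs to this device.
 */
enum dev_item_status
check_dev_item_accounting(uint64_t used, uint64_t total_bytes,
			  const uint64_t *extent_lens, size_t nr)
{
	uint64_t sum = 0;
	size_t i;

	/* The new early check: reject the item before walking any extents. */
	if (used > total_bytes)
		return DEV_ITEM_USED_EXCEEDS_TOTAL;

	for (i = 0; i < nr; i++) {
		/* A sum that would wrap around can never match used. */
		if (extent_lens[i] > UINT64_MAX - sum)
			return DEV_ITEM_USED_MISMATCH;
		sum += extent_lens[i];
	}
	return sum == used ? DEV_ITEM_OK : DEV_ITEM_USED_MISMATCH;
}
```

The early return keeps the error message specific: a device item whose used bytes already exceed its total bytes is reported as such, instead of surfacing later as a generic accounting mismatch.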
[PATCH 5/5] btrfs-progs: fsck-tests: Add test image for dev extents beyond device boundary
Now two locations can detect such a problem: either the device item
used/total bytes check, or the early dev extents check against the device
boundary.

The image is a hand-crafted image which uses a DATA SINGLE chunk to feed
btrfs check.
As expected, as long as the block group item, chunk item and device used
bytes match, older btrfs check can't detect such a problem.

Signed-off-by: Qu Wenruo
---
 .../over_dev_boundary.img.xz | Bin 0 -> 1640 bytes
 tests/fsck-tests/036-bad-dev-extents/test.sh | 19 ++
 2 files changed, 19 insertions(+)
 create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
 create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh

diff --git a/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz b/tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
new file mode 100644
index ..47cb2a707b0097e369dc088ed0549f847995f136
GIT binary patch
literal 1640
zcmV-u2ABE$H+ooF000E$*0e?f03iVu0001VFXf})3;zZsT>wRyj;C3^v%$$4d1oRm
zhA1@4%tH=9jYF%IQSpIUKDpjXLRl?q4p$q;1zsY^#9_Lx=#tbGm>@S#e2aqt!?0}y
z2BPO%4~c9Q5)jKFD7}DURarKL)`^j{f?s>sEHdzmSvX^98%kGi<_8
zMnXynsC*B7}KE(6w>*wfdb|tw$kt^y>W+TB*?pon1P+)#u#?6bSIG)TooJx~u$#
zf^+}xKw`BfI}6=717S~Q%LW1kwY`pz9H{`uNNNFOk2w!VauNEaMfoLj)Z<)!1?F60
zJ+OEA|4$a=9W#XX*l{EG!j^s}p|
z0i#_%$Q}d)-EE8#8O5^x$$8Y0l2
zc19e0Et3O3m`pMOqEkasL8VGE+u~2lZn>sRCRj169Z6mQ3*+`D+C#F1V7POV%lx(9cB{WN*9OP%Zbd1VDn(S4HX^ad4-b~#H
z@9eUP4AAU`)yRf!k+rrrLSYfBSEi6RE#HtbqyPl11S>RCH
zqvJbt22t`FmU^tmTb+7LtpybCB-x1lGSlrpVQ9|6WNBs~q*M-to1gD1l41oiy~}+O?!{68Jvim2*%BznW)@B=3IH)Xi1795q{#>sF*y^0T2@b$`Wqwch?BgN}IR9Ui
z;!cs)hJFGsJmFaiUsYrN$c0^BLU^n-B%fagn+jR{?Dq+K%VyMG@pmAOShFY)k8zBxm@7YD
zb^ZXU;<`+_B+IV{+A~1Ku
zWggqWQd{E%hF}W2ArPwJ33zWP>MIe;YDN8WV+|+a4_c>yFN#ZGN00G#%3HYi
z4pePTnc{8k5CjwuN6hudRFcBkvB^pSFufzaaD`-;mPgJ~wr(
[PATCH 1/5] btrfs-progs: lowmem check: Add check for overlapping dev extents
Add such check at check_dev_item(), since at that timing we're also
iterating dev extents for dev item accounting.

Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1bce44f5658a..d387423639e6 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4065,6 +4065,8 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 	u64 dev_id;
 	u64 used;
 	u64 total = 0;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
 	int ret;
 
 	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
@@ -4086,8 +4088,16 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 		return REFERENCER_MISSING;
 	}
 
-	/* Iterate dev_extents to calculate the used space of a device */
+	/*
+	 * Iterate dev_extents to calculate the used space of a device
+	 *
+	 * Also make sure no dev extents overlap and end beyond device boundary
+	 */
 	while (1) {
+		u64 devid;
+		u64 physical_offset;
+		u64 physical_len;
+
 		if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
 			goto next;
 
@@ -4099,7 +4109,25 @@ static int check_dev_item(struct btrfs_fs_info *fs_info,
 
 		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
 				     struct btrfs_dev_extent);
-		total += btrfs_dev_extent_length(path.nodes[0], ptr);
+		devid = key.objectid;
+		physical_offset = key.offset;
+		physical_len = btrfs_dev_extent_length(path.nodes[0], ptr);
+
+		if (prev_devid == devid && physical_offset < prev_dev_ext_end) {
+			error(
+"dev extent devid %llu offset %llu len %llu overlap with previous dev extent end %llu",
+			      devid, physical_offset, physical_len,
+			      prev_dev_ext_end);
+			return ACCOUNTING_MISMATCH;
+		}
+		if (physical_offset + physical_len > total_bytes) {
+			error(
+"dev extent devid %llu offset %llu len %llu is beyond device boundary %llu",
+			      devid, physical_offset, physical_len,
+			      total_bytes);
+			return ACCOUNTING_MISMATCH;
+		}
+		total += physical_len;
 next:
 		ret = btrfs_next_item(dev_root, &path);
 		if (ret)
-- 
2.19.0
[PATCH 0/5] btrfs-progs: check: Detect invalid dev extents and device items
This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/dev_extents_check

Hans van Kranenburg reported a case where the btrfs DUP chunk allocator
could allocate invalid dev extents, which either overlap with existing dev
extents or extend beyond the device boundary.

This patchset enhances the btrfs-progs side to detect such problems, with a
hand-crafted test image for it.

Qu Wenruo (5):
  btrfs-progs: lowmem check: Add check for overlapping dev extents
  btrfs-progs: original check: Add ability to detect bad dev extents
  btrfs-progs: lowmem check: Add dev_item check for used bytes and total bytes
  btrfs-progs: original check: Add dev_item check for used bytes and total bytes
  btrfs-progs: fsck-tests: Add test image for dev extents beyond device boundary

 check/main.c | 105 ++
 check/mode-lowmem.c | 37 +-
 .../over_dev_boundary.img.xz | Bin 0 -> 1640 bytes
 tests/fsck-tests/036-bad-dev-extents/test.sh | 19
 4 files changed, 159 insertions(+), 2 deletions(-)
 create mode 100644 tests/fsck-tests/036-bad-dev-extents/over_dev_boundary.img.xz
 create mode 100755 tests/fsck-tests/036-bad-dev-extents/test.sh

-- 
2.19.0
Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
On 2018/10/5 6:58 PM, Hans van Kranenburg wrote:
> On 10/05/2018 09:51 AM, Qu Wenruo wrote:
>>
>>
>> On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
>>> This patch set contains an additional fix for a newly exposed bug after
>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>
>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c...@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>
>> For that bug, did you succeeded in reproducing the bug?
>
> Yes, open the above link and scroll to "Steps to reproduce".

That's the beyond-device-boundary one.
I also reproduced it here, and hand-crafted a super small image as a test
case for btrfs-progs.

But I'm a little curious about the dev extent overlapping case.
Have you got one?

Thanks,
Qu

>
> o/
>
>> I'm adding dev extent overlap checking in btrfs_verify_dev_extents() and
>> btrfs-progs.
>> I think it could help to detect such problem.
>>
>> Thanks,
>> Qu
>>
>>>
>>> The DUP fix is "fix more DUP stripe size handling". I did that one
>>> before starting to change more things so it can be applied to earlier
>>> LTS kernels.
>>>
>>> Besides that patch, which is fixing the bug in a way that is least
>>> intrusive, I added a bunch of other patches to help getting the chunk
>>> allocator code in a state that is a bit less error-prone and
>>> bug-attracting.
>>>
>>> When running this and trying the reproduction scenario, I can now see
>>> that the created DUP device extent is 827326464 bytes long, which is
>>> good.
>>>
>>> I wrote and tested this on top of linus 4.19-rc5. I still need to create
>>> a list of related use cases and test more things to at least walk
>>> through a bunch of obvious use cases to see if there's nothing exploding
>>> too quickly with these changes. However, I'm quite confident about it,
>>> so I'm sharing all of it already.
>>>
>>> Any feedback and review is appreciated. Be gentle and keep in mind that
>>> I'm still very much in a learning stage regarding kernel development.
>>>
>>> The stable patches handling workflow is not 100% clear to me yet. I
>>> guess I have to add a Fixes: in the DUP patch which points to the
>>> previous commit 92e222df7b.
>>>
>>> Moo!,
>>> Knorrie
>>>
>>> Hans van Kranenburg (6):
>>>   btrfs: alloc_chunk: do not refurbish num_bytes
>>>   btrfs: alloc_chunk: improve chunk size variable name
>>>   btrfs: alloc_chunk: fix more DUP stripe size handling
>>>   btrfs: fix ncopies raid_attr for RAID56
>>>   btrfs: introduce nparity raid_attr
>>>   btrfs: alloc_chunk: rework chunk/stripe calculations
>>>
>>>  fs/btrfs/volumes.c | 84 +++---
>>>  fs/btrfs/volumes.h | 4 ++-
>>>  2 files changed, 45 insertions(+), 43 deletions(-)
>>>
>>
>