3.15 btrfs free space cache oops
When running MonetDB against a BTRFS RAID-0 set over 4 SSDs [1] on 3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal pagefault in memcpy():

(gdb) list *(__btrfs_write_out_cache+0x3e4)
0x81365984 is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:521).
516			if (io_ctl->index >= io_ctl->num_pages)
517				return -ENOSPC;
518			io_ctl_map_page(io_ctl, 0);
519		}
520
521		memcpy(io_ctl->cur, bitmap, PAGE_CACHE_SIZE);
522		io_ctl_set_crc(io_ctl, io_ctl->index - 1);
523		if (io_ctl->index < io_ctl->num_pages)
524			io_ctl_map_page(io_ctl, 0);
525		return 0;

I can try to reproduce it if more data is useful?

Thanks,
  Daniel

-- [1]

mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata /dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2
mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread

-- [2]

BUG: unable to handle kernel paging request at 0020
IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
PGD 3bca02c067 PUD 3bcf5fb067 PMD 0
Oops: [#1] SMP
Modules linked in:
CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7
Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013
task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000
RIP: 0010:[8135a374] [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
RSP: 0018:8809aefcfc40 EFLAGS: 00010246
RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200
RDX: 1000 RSI: 0020 RDI: 884fb9321000
RBP: 8809aefcfd48 R08: 0200 R09:
R10: R11: 884fb9320ffc R12: 8831e3303740
R13: 880100579970 R14: 880bb38061c0 R15: 0020
FS: 7fb9447ed700() GS:884bbfc8() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0020 CR3: 00329b71c000 CR4: 000407e0
Stack:
 8809aefcfc90 0011 000e 884fbbc2c870
 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec
 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000
Call Trace:
 [81a75b4b] ? _raw_spin_lock+0xb/0x20
 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0
 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670
 [813199c5] commit_cowonly_roots+0x195/0x250
 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0
 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70
 [8132b6b2] btrfs_sync_file+0x182/0x2a0
 [8114a450] do_fsync+0x50/0x80
 [8114a6de] SyS_fdatasync+0xe/0x20
 [81a766e6] system_call_fastpath+0x1a/0x1f
Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85 fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3 48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
 RSP 8809aefcfc40
CR2: 0020
--
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] btrfs send/receive, page allocation failure
On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:
>> Can you try the following patch and confirm if it helps?
>> https://patchwork.kernel.org/patch/4705171/
>
> This one applies without problems, I didn't build it because I saw v4. The v4 patch I get:
>
> + patch -p1 -F1 -s
> 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
>
> Should v4 alone be applied over 3.16.0? Or each version in succession?

Alone. How did you try to apply it to 3.16?

Try

cd source_dir
git am patchfile

if you didn't (e.g. you used patch command directly).

thanks

> Chris Murphy

--
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
[PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

- Simplify and unify the description of both man and usage.
- Fix to show -m and -d is not exclusive with path|uuid|device|label.
- Add the description about short options for --mounted and --all-devices, -m and -d respectively.
- Move the descriptions of options to Options section.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 Documentation/btrfs-filesystem.txt | 22 ++
 cmds-filesystem.c | 15 ++-
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
index c9c0b00..fe68496 100644
--- a/Documentation/btrfs-filesystem.txt
+++ b/Documentation/btrfs-filesystem.txt
@@ -20,15 +20,21 @@ SUBCOMMAND
 *df* path [path...]::
 Show space usage information for a mount point.
 
-*show* [--mounted|--all-devices|path|uuid|device|label]::
-Show the btrfs filesystem with some additional info.
+*show* [-d|-m] [path|uuid|device|label]::
+Show the structure of btrfs filesystem(s).
 +
-If no option nor path|uuid|device|label is passed, btrfs shows
-information of all the btrfs filesystem both mounted and unmounted.
-If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
-filesystem(s);
-If '--all-devices' is passed, all the devices under /dev are scanned;
-otherwise the devices list is extracted from the /proc/partitions file.
+If none of 'path|uuid|device|label' is passed, btrfs shows
+information of all the btrfs filesystems both mounted and unmounted.
++
+The show command finds btrfs filesystems by scanning all the devices
+in /proc/partitions by default.
++
+`Options`
++
+-d|--alldevices
+scan all the devices under /dev
+-m|--mounted
+scan only mounted filesystems
 
 *sync* path::
 Force a sync for the filesystem identified by path.
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 38011e5..5a80a98 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -578,11 +578,16 @@ out:
 }
 
 static const char * const cmd_show_usage[] = {
-	"btrfs filesystem show [options] [path|uuid|device|label]",
-	"Show the structure of a filesystem",
-	"-d|--all-devices show only disks under /dev containing btrfs filesystem",
-	"-m|--mounted show only mounted btrfs",
-	"If no argument is given, structure of all present filesystems is shown.",
+	"btrfs filesystem show [-d|-m] [path|uuid|device|label]",
+	"Show the structure of btrfs filesystem(s).",
+	"If none of 'path|uuid|device|label' is passed, btrfs shows",
+	"information of all the btrfs filesystems both mounted and unmounted.",
+	"",
+	"The show command finds btrfs filesystems by scanning all the devices",
+	"in /proc/partitions by default.",
+	"",
+	"-d|--all-devices scan all the devices under /dev",
+	"-m|--mounted scan only mounted filesystems",
 	NULL
 };
 
--
1.9.3
[PATCH 2/3] btrfs-progs: avoid to use numeric literal for the size of uuid buffer
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Replace a numeric literal with a more descriptive macro for the size of the uuid buffer.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 cmds-filesystem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 5a80a98..7633f1f 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -603,7 +603,7 @@ static int cmd_show(int argc, char **argv)
 	char mp[BTRFS_PATH_NAME_MAX + 1];
 	char path[PATH_MAX];
 	__u8 fsid[BTRFS_FSID_SIZE];
-	char uuid_buf[37];
+	char uuid_buf[BTRFS_UUID_UNPARSED_SIZE];
 	int found = 0;
 
 	while (1) {
--
1.9.3
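The literal 37 replaced by this patch is the space needed for a UUID in its textual form: 32 hex digits, 4 dashes, and a terminating NUL. The following is a minimal userspace sketch of that arithmetic, not btrfs-progs code; uuid_to_text() and both macros are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_UUID_RAW_SIZE      16  /* bytes in a binary UUID (assumed name) */
#define SKETCH_UUID_UNPARSED_SIZE 37  /* 32 hex digits + 4 dashes + NUL (assumed name) */

/* Format a 16-byte binary UUID as the usual 8-4-4-4-12 lowercase hex string. */
static void uuid_to_text(const unsigned char raw[SKETCH_UUID_RAW_SIZE],
			 char out[SKETCH_UUID_UNPARSED_SIZE])
{
	int n = snprintf(out, SKETCH_UUID_UNPARSED_SIZE,
			 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
			 "%02x%02x%02x%02x%02x%02x",
			 raw[0], raw[1], raw[2], raw[3], raw[4], raw[5],
			 raw[6], raw[7], raw[8], raw[9], raw[10], raw[11],
			 raw[12], raw[13], raw[14], raw[15]);

	/* The text always needs exactly SIZE - 1 characters plus the NUL. */
	assert(n == SKETCH_UUID_UNPARSED_SIZE - 1);
	assert(strlen(out) == SKETCH_UUID_UNPARSED_SIZE - 1);
}
```

A buffer declared with the named size documents this constraint at the point of use, which is all the patch changes.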
[PATCH 3/3] btrfs-progs: Show error message if btrfs filesystem show failed to find any btrfs filesystem
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Current btrfs doesn't display any error message if this command failed to find any btrfs filesystem corresponding to path|uuid|device|label which user specified.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 cmds-filesystem.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 7633f1f..2f78e24 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -695,6 +695,7 @@ static int cmd_show(int argc, char **argv)
 	ret = btrfs_scan_kernel(search);
 	if (search && !ret) {
 		/* since search is found we are done */
+		found = 1;
 		goto out;
 	}
 
@@ -729,6 +730,15 @@ devs_only:
 		btrfs_close_devices(fs_devices);
 	}
 out:
+	if (search && !found) {
+		fprintf(stderr,
+			"ERROR: Couldn't find any btrfs filesystem "
+			"matches with '%s'.\n", search);
+		fprintf(stderr,
+			"Please check if both '%s' and the range of scanning "
+			"are correct.\n", search);
+	}
+
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	free_seen_fsid();
 	return ret;
--
1.9.3
[PATCH 4/4] btrfs: update sprout seed pointer when seed fs is relinquished
We are not updating the sprout fs seed pointer when all seed devices are replaced. This patch checks whether all seed devices have been replaced and then updates the sprout pointer accordingly.

The same reproducer as in the previous patch applies here. Notice that btrfs_close_devices() checks whether a seed fs is present and, without this patch, spits out the error:

int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
{
	::
	seed_devices = fs_devices->seed;
	::
	while (seed_devices) {
		fs_devices = seed_devices;
		seed_devices = fs_devices->seed;
		__btrfs_close_devices(fs_devices);
		free_fs_devices(fs_devices);
	}

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f098ae7..bfdc11f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1992,6 +1992,25 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 			btrfs_scratch_superblock(srcdev);
 	}
 
+	/* unless fs_devices is seed fs, num_devices shouldn't go
+	 * zero
+	 */
+	BUG_ON(!fs_devices->num_devices && !fs_devices->seeding);
+
+	/* if this is no devs we rather delete the fs_devices */
+	if (!fs_devices->num_devices) {
+		struct btrfs_fs_devices *tmp_fs_devices;
+
+		tmp_fs_devices = fs_info->fs_devices;
+		while (tmp_fs_devices) {
+			if (tmp_fs_devices->seed == fs_devices) {
+				tmp_fs_devices->seed = fs_devices->seed;
+				break;
+			}
+			tmp_fs_devices = tmp_fs_devices->seed;
+		}
+		fs_devices->seed = NULL;
+	}
 	call_rcu(&srcdev->rcu, free_device);
 }
--
2.0.0.153.g79d
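The pointer update in this patch amounts to unlinking one element from a singly linked chain (sprout -> seed -> seed -> ...). Below is a minimal userspace sketch of that step under simplifying assumptions: struct fs_devices_sketch and unlink_empty_seed() are hypothetical stand-ins, and locking, RCU and freeing are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct btrfs_fs_devices. */
struct fs_devices_sketch {
	int num_devices;                 /* devices still attached */
	struct fs_devices_sketch *seed;  /* next seed fs in the chain, or NULL */
};

/*
 * Walk the chain starting at the sprout and splice out 'victim', which
 * must have no devices left. Returns 1 if a predecessor was found and
 * relinked, 0 otherwise. The victim's own seed pointer is always cleared
 * so it no longer references the rest of the chain.
 */
static int unlink_empty_seed(struct fs_devices_sketch *sprout,
			     struct fs_devices_sketch *victim)
{
	struct fs_devices_sketch *cur = sprout;
	int unlinked = 0;

	assert(victim != NULL && victim->num_devices == 0);

	while (cur) {
		if (cur->seed == victim) {
			cur->seed = victim->seed;  /* bypass the victim */
			unlinked = 1;
			break;
		}
		cur = cur->seed;
	}
	victim->seed = NULL;
	return unlinked;
}
```

Because the predecessor inherits the victim's seed pointer, any seed filesystems further down the chain stay reachable, which is the concern raised in review of the earlier single-patch version.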
[PATCH 3/4] btrfs: fix rw_devices miss match after seed replace
reproducer:

mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 3882 at fs/btrfs/volumes.c:892 __btrfs_close_devices+0x1c8/0x200 [btrfs]()

which is

WARN_ON(fs_devices->rw_devices);

The problem here is that we did not add one to rw_devices when we replaced the seed device with a writable device.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/dev-replace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..fb0a7fa 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -562,6 +562,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	if (fs_info->fs_devices->latest_bdev == src_device->bdev)
 		fs_info->fs_devices->latest_bdev = tgt_device->bdev;
 	list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
+	if (src_device->fs_devices->seeding)
+		fs_info->fs_devices->rw_devices++;
 
 	/* replace the sysfs entry */
 	btrfs_kobj_rm_device(fs_info, src_device);
--
2.0.0.153.g79d
[PATCH 1/4] btrfs: preparatory to make btrfs_rm_dev_replace_srcdev() seed aware
There is no logical change in this patch, just a preparatory patch, so that changes can be easily reasoned.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5f634b6..5fd0132 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1960,19 +1960,23 @@ error_undo:
 void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 				 struct btrfs_device *srcdev)
 {
+	struct btrfs_fs_devices *fs_devices;
+
 	WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
 
+	fs_devices = fs_info->fs_devices;
+
 	list_del_rcu(&srcdev->dev_list);
 	list_del_rcu(&srcdev->dev_alloc_list);
-	fs_info->fs_devices->num_devices--;
+	fs_devices->num_devices--;
 	if (srcdev->missing) {
-		fs_info->fs_devices->missing_devices--;
-		fs_info->fs_devices->rw_devices++;
+		fs_devices->missing_devices--;
+		fs_devices->rw_devices++;
 	}
 	if (srcdev->can_discard)
-		fs_info->fs_devices->num_can_discard--;
+		fs_devices->num_can_discard--;
 	if (srcdev->bdev) {
-		fs_info->fs_devices->open_devices--;
+		fs_devices->open_devices--;
 
 		/*
 		 * zero out the old super if it is not writable
--
2.0.0.153.g79d
[PATCH 2/4] btrfs: replace seed device followed by unmount causes kernel WARNING
reproducer:

mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::

__btrfs_close_devices()
::
	WARN_ON(fs_devices->open_devices);

After the seed device has been replaced the new target device is no more a seed device. So we need to update the device numbers in the fs_devices as pointed by the fs_info.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5fd0132..f098ae7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1964,7 +1964,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 
 	WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
 
-	fs_devices = fs_info->fs_devices;
+	/*
+	 * in case of fs with no seed, srcdev->fs_devices will point
+	 * to fs_devices of fs_info. However when the dev being replaced is
+	 * a seed dev it will point to the seed's local fs_devices. In short
+	 * srcdev will have its correct fs_devices in both the cases.
+	 */
+	fs_devices = srcdev->fs_devices;
 
 	list_del_rcu(&srcdev->dev_list);
 	list_del_rcu(&srcdev->dev_alloc_list);
--
2.0.0.153.g79d
Re: [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING
I have sent out the patch-set

  [PATCH 1/4] btrfs: preparatory to make btrfs_rm_dev_replace_srcdev() seed aware

in replacement for this patch. Kindly use/review the above patch set.

Thanks. Anand

On 31/07/2014 16:45, Anand Jain wrote:
> On 30/07/2014 15:42, Miao Xie wrote:
>> On Fri, 25 Jul 2014 20:33:34 +0800, Anand Jain wrote:
>>> After the seed device has been replaced the new target device is no more a seed device. So we need to bring that state in the fs_devices.
>>>
>>> reproducer:
>>> mount /dev/sdb /btrfs
>>> btrfs dev add /dev/sdc /btrfs
>>> btrfs rep start -B /dev/sdb /dev/sdd /btrfs
>>> umount /btrfs
>>>
>>> WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
>>> ::
>>>
>>> __btrfs_close_devices()
>>> ::
>>> 	WARN_ON(fs_devices->open_devices);
>>> 	WARN_ON(fs_devices->rw_devices);
>>>
>>> per the btrfs-devlist tool (to dump fs_devices and btrfs_device from the kernel) the num_device, open_devices, rw_devices are still at 1 but the total_device is at 2, even after the seed device has been replaced in the above example.
>>>
>>> Signed-off-by: Anand Jain anand.j...@oracle.com
>>> ---
>>>  fs/btrfs/dev-replace.c | 13 +
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>> index eea26e1..a144bb1 100644
>>> --- a/fs/btrfs/dev-replace.c
>>> +++ b/fs/btrfs/dev-replace.c
>>> @@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>>>
>>>  	btrfs_rm_dev_replace_blocked(fs_info);
>>>
>>> +	/*
>>> +	 * if we are replacing a seed device with a writable device
>>> +	 * then FS won't be a seeding FS any more.
>>> +	 */
>>> +	if (src_device->fs_devices->seeding && !src_device->writeable) {
>>
>> First, why not move this code into btrfs_rm_dev_replace_srcdev()?
>>
>> Then if the first condition is true, the second one (!src_device->writeable) must be true because all the devices in the seed fs_device must be read-only. so only the first check is enough.
>>
>>> +		fs_info->fs_devices->rw_devices++;
>>
>> If src is missing dev, we would increase it twice.
>>
>>> +		fs_info->fs_devices->num_devices++;
>>> +		fs_info->fs_devices->open_devices++;
>>> +
>>> +		fs_info->fs_devices->seeding = 0;
>>> +		fs_info->fs_devices->seed = NULL;
>>
>> In fact, we may have several seed fs_devices in one fs, and the seed fs_device which includes src might not the first one, so assign seed to be NULL would break the seed fs_device list.
>
> Yep I had question when writing this patch but later decided to reset seed and seeding. if I am not wrong don't reset seeding and seed will do as well.
>
> Thanks for reviewing. Anand
>
>> Thanks
>> Miao
>>
>>> +	}
>>> +
>>>  	btrfs_rm_dev_replace_srcdev(fs_info, src_device);
>>>
>>>  	btrfs_rm_dev_replace_unblocked(fs_info);
Re: [PATCH] Btrfs: don't monopolize a core when evicting inode
(2014/08/08 10:47), Filipe Manana wrote:
> If an inode has a very large number of extent maps, we can spend a lot of time freeing them, which triggers a soft lockup warning. Therefore reschedule if we need to when freeing the extent maps while evicting the inode.
>
> I could trigger this all the time by running xfstests/generic/299 on a file system with the no-holes feature enabled. That test creates an inode with 11386677 extent maps.
>
> $ mkfs.btrfs -f -O no-holes $TEST_DEV
> $ MKFS_OPTIONS="-O no-holes" ./check generic/299
> generic/299 382s ...
> Message from syslogd@debian-vm3 at Aug 7 10:44:29 ...
> kernel:[85304.208017] BUG: soft lockup - CPU#0 stuck for 22s! [umount:25330]
> 384s
> Ran: generic/299
> Passed all 1 tests
>
> $ dmesg
> (...)
> [86304.300017] BUG: soft lockup - CPU#0 stuck for 23s! [umount:25330]
> (...)
> [86304.300036] Call Trace:
> [86304.300036] [81698ba9] __slab_free+0x54/0x295
> [86304.300036] [a02ee9cc] ? free_extent_map+0x5c/0xb0 [btrfs]
> [86304.300036] [811a6cd2] kmem_cache_free+0x282/0x2a0
> [86304.300036] [a02ee9cc] free_extent_map+0x5c/0xb0 [btrfs]
> [86304.300036] [a02e3775] btrfs_evict_inode+0xd5/0x660 [btrfs]
> [86304.300036] [811e7c8d] ? __inode_wait_for_writeback+0x6d/0xc0
> [86304.300036] [816a389b] ? _raw_spin_unlock+0x2b/0x40
> [86304.300036] [811d8cbb] evict+0xab/0x180
> [86304.300036] [811d8dce] dispose_list+0x3e/0x60
> [86304.300036] [811d9b04] evict_inodes+0xf4/0x110
> [86304.300036] [811bd953] generic_shutdown_super+0x53/0x110
> [86304.300036] [811bdaa6] kill_anon_super+0x16/0x30
> [86304.300036] [a02a78ba] btrfs_kill_super+0x1a/0xa0 [btrfs]
> [86304.300036] [811bd3a9] deactivate_locked_super+0x59/0x80
> [86304.300036] [811be44e] deactivate_super+0x4e/0x70
> [86304.300036] [811dec14] mntput_no_expire+0x174/0x1f0
> [86304.300036] [811deab7] ? mntput_no_expire+0x17/0x1f0
> [86304.300036] [811e0517] SyS_umount+0x97/0x100
> (...)
>
> Signed-off-by: Filipe Manana fdman...@suse.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

> ---
>  fs/btrfs/inode.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8ad3ea9..00b4bd3 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4718,6 +4718,11 @@ static void evict_inode_truncate_pages(struct inode *inode)
>  		clear_bit(EXTENT_FLAG_LOGGING, &em->flags);
>  		remove_extent_mapping(map_tree, em);
>  		free_extent_map(em);
> +		if (need_resched()) {
> +			write_unlock(&map_tree->lock);
> +			cond_resched();
> +			write_lock(&map_tree->lock);
> +		}
>  	}
>  	write_unlock(&map_tree->lock);
>
> @@ -4740,6 +4745,7 @@ static void evict_inode_truncate_pages(struct inode *inode)
>  				 &cached_state, GFP_NOFS);
>  		free_extent_state(state);
>
> +		cond_resched();
>  		spin_lock(&io_tree->lock);
>  	}
>  	spin_unlock(&io_tree->lock);
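The fix follows a common pattern: while draining a very long list under a lock, periodically drop the lock, let other work run, and retake it. The kernel primitives need_resched() and cond_resched() are not available in userspace, so the sketch below is only a loose analogue that assumes a pthread mutex and sched_yield(); struct node, drain_list() and the fixed batch size are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical list element standing in for an extent map. */
struct node {
	struct node *next;
};

/*
 * Free every node on the list while holding 'lock', but release the lock
 * and yield the CPU after each 'batch' nodes so one long drain does not
 * monopolize a core. Returns how many times the lock was dropped.
 */
static size_t drain_list(struct node **head, pthread_mutex_t *lock, size_t batch)
{
	size_t yields = 0;
	size_t since_yield = 0;

	assert(head != NULL && lock != NULL && batch > 0);

	pthread_mutex_lock(lock);
	while (*head) {
		struct node *n = *head;

		*head = n->next;
		free(n);

		/* Only bother yielding if there is still work left. */
		if (++since_yield >= batch && *head) {
			pthread_mutex_unlock(lock);
			sched_yield();
			pthread_mutex_lock(lock);
			since_yield = 0;
			yields++;
		}
	}
	pthread_mutex_unlock(lock);
	return yields;
}
```

Note that the loop re-reads the list head after retaking the lock, since another thread may have changed the list in the meantime; the kernel loop likewise looks up the next extent map from the tree on each iteration.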
Re: [PATCH] Btrfs: clone, don't create invalid hole extent map
(2014/08/08 10:47), Filipe Manana wrote:
> When cloning a file that consists of an inline extent, we were creating an extent map that represents a non-existing trailing hole starting at a file offset that isn't a multiple of the sector size. This happened because when processing an inline extent we weren't aligning the extent's length to the sector size, and therefore incorrectly treating the range [inline_extent_length; sector_size[ as a hole.
>
> Signed-off-by: Filipe Manana fdman...@suse.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

> ---
>  fs/btrfs/ioctl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index d490abd..6e3a0d1 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3494,7 +3494,8 @@ process_slot:
>  				btrfs_mark_buffer_dirty(leaf);
>  				btrfs_release_path(path);
>
> -				last_dest_end = new_key.offset + datal;
> +				last_dest_end = ALIGN(new_key.offset + datal,
> +						      root->sectorsize);
>  				ret = clone_finish_inode_update(trans, inode,
>  								last_dest_end,
>  								destoff, olen);
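The effect of the one-line change can be checked with plain arithmetic. The sketch below is a userspace illustration, not kernel code: align_up() mimics the rounding that the kernel's ALIGN() performs for power-of-two alignments, and the helper names and the 4096-byte sector size in the usage note are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* End of the destination range as computed before the patch. */
static uint64_t dest_end_before(uint64_t key_offset, uint64_t datal)
{
	return key_offset + datal;
}

/* End of the destination range as computed after the patch. */
static uint64_t dest_end_after(uint64_t key_offset, uint64_t datal,
			       uint64_t sectorsize)
{
	return align_up(key_offset + datal, sectorsize);
}
```

For a 100-byte inline extent at offset 0 with 4096-byte sectors, the old computation ends the range at 100, leaving bytes 100 to 4095 to be treated as a hole, while the aligned value ends it at 4096, a sector boundary.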
File system stuck in scrub
Hello,

I started a scrub of one of my btrfs filesystems and then had to restart the system. `systemctl restart` seemed to terminate all processes, but then got stuck at the end. The disk activity LED was still flashing rapidly at that point, so I assume that the active scrub was preventing the reboot (is that a bug or a feature?).

In any case, I could not wait for that so I power cycled. But now my file system seems to be stuck in a scrub that can neither be completed nor cancelled:

$ sudo btrfs scrub status /home/nikratio/
scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
	scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
	total bytes scrubbed: 209.97GiB with 0 errors

$ date
Sun Aug 10 22:00:44 PDT 2014

$ sudo btrfs scrub cancel /home/nikratio/
ERROR: scrub cancel failed on /home/nikratio/: not running

$ sudo btrfs scrub start /home/nikratio/
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /home/nikratio/'.
To see the status use 'btrfs scrub status [-d] /home/nikratio/'.

Note that the scrub was started more than 3 hours ago, but claims to have been running for only 1562 seconds.

I then figured that maybe I need to run btrfsck. This gave the following output:

checking extents
checking free space cache
checking fs roots
root 5 inode 3149791 errors 400, nbytes wrong
root 5 inode 3150233 errors 400, nbytes wrong
root 5 inode 3150238 errors 400, nbytes wrong
[102 similar lines]
Checking filesystem on /dev/mapper/vg0-nikratio_crypt
UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
free space inode generation (0) did not match free space cache generation (161262)
free space inode generation (0) did not match free space cache generation (75485)
free space inode generation (0) did not match free space cache generation (79599)
free space inode generation (0) did not match free space cache generation (72280)
free space inode generation (0) did not match free space cache generation (79599)
free space inode generation (0) did not match free space cache generation (25866)
free space inode generation (0) did not match free space cache generation (12255)
free space inode generation (0) did not match free space cache generation (72521)
free space inode generation (0) did not match free space cache generation (161286)
free space inode generation (0) did not match free space cache generation (28716)
free space inode generation (0) did not match free space cache generation (161481)
found 216444746042 bytes used err is 1
total csum bytes: 383160676
total tree bytes: 875753472
total fs tree bytes: 284246016
total extent tree bytes: 69320704
btree space waste bytes: 205021777
file data blocks allocated: 3701556121600
 referenced 388107321344
Btrfs v3.14.1

So nothing about the scrub, but apparently some other errors.

Can someone tell me:

* Should I be able to restart while a scrub is in progress, or is that deliberately prevented by btrfs?
* How can I resume or cancel the scrub?
* Is it more risky to leave the above errors uncorrected, or to run btrfsck with --repair?

I'm using kernel 3.14.

Thanks!
-Nikolaus

--
GPG encrypted emails preferred.
Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
Re: File system stuck in scrub
On Mon, Aug 11, 2014 at 08:12:46AM -0700, Nikolaus Rath wrote:
> I started a scrub of one of my btrfs filesystem and then had to restart the system. `systemctl restart` seemed to terminate all processes, but then got stuck at the end. The disk activity led was still flashing rapidly at that point, so I assume that the active scrub was preventing the reboot (is that a bug or a feature?).

Shouldn't have stopped it.

> In any case, I could not wait for that so I power cycled. But now my file system seems to be stuck in a scrub that can neither be completed nor cancelled:
>
> $ sudo btrfs scrub status /home/nikratio/
> scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> 	scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
> 	total bytes scrubbed: 209.97GiB with 0 errors
>
> $ date
> Sun Aug 10 22:00:44 PDT 2014
>
> $ sudo btrfs scrub cancel /home/nikratio/
> ERROR: scrub cancel failed on /home/nikratio/: not running
>
> $ sudo btrfs scrub start /home/nikratio/
> ERROR: scrub is already running.
> To cancel use 'btrfs scrub cancel /home/nikratio/'.
> To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
>
> Note that the scrub was started more than 3 hours ago, but claims to have been running for only 1562 seconds.

This is a regrettably common problem -- fortunately with a simple solution. The userspace scrub monitor died in the reboot, leaving the status file present. If you delete the status file, which is in /var/lib/btrfs/, that should allow you to start a new scrub.

> I then figured that maybe I need to run btrfsck. This gave the following output:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149791 errors 400, nbytes wrong
> root 5 inode 3150233 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> [102 similar lines]
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> free space inode generation (0) did not match free space cache generation (161262)
[snip]
> found 216444746042 bytes used err is 1
> total csum bytes: 383160676
> total tree bytes: 875753472
> total fs tree bytes: 284246016
> total extent tree bytes: 69320704
> btree space waste bytes: 205021777
> file data blocks allocated: 3701556121600
>  referenced 388107321344
> Btrfs v3.14.1
>
> So nothing about the scrub, but apparently some other errors.

The free space inode generation errors are harmless. The wrong nbytes is probably not horrifically damaging, but I don't know so much about that one.

> Can someone tell me:
>
> * Should I be able to restart while a scrub is in progress, or is that deliberately prevented by btrfs?

Restart the machine? Yes.

> * How can I resume or cancel the scrub?

It's probably simply not running -- see above.

> * Is it more risky to leave the above errors uncorrected, or to run btrfsck with --repair?

I would, I think, leave them.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- We are all lying in the gutter, but some of us are looking ---
at the stars.
Re: File system stuck in scrub
Hi,

On Mon, 2014-08-11 at 08:12 -0700, Nikolaus Rath wrote:
> Hello,
>
> I started a scrub of one of my btrfs filesystem and then had to restart the system. `systemctl restart` seemed to terminate all processes, but then got stuck at the end. The disk activity led was still flashing rapidly at that point, so I assume that the active scrub was preventing the reboot (is that a bug or a feature?).

This sounds like a bug - I know that e.g. the rebalance operation is designed so that you can shutdown/reboot during the operation, and it will complete following a reboot. But I'm not familiar with the code in question.

> In any case, I could not wait for that so I power cycled. But now my file system seems to be stuck in a scrub that can neither be completed nor cancelled:
>
> $ sudo btrfs scrub status /home/nikratio/
> scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> 	scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
> 	total bytes scrubbed: 209.97GiB with 0 errors
>
> $ date
> Sun Aug 10 22:00:44 PDT 2014
>
> $ sudo btrfs scrub cancel /home/nikratio/
> ERROR: scrub cancel failed on /home/nikratio/: not running
>
> $ sudo btrfs scrub start /home/nikratio/
> ERROR: scrub is already running.
> To cancel use 'btrfs scrub cancel /home/nikratio/'.
> To see the status use 'btrfs scrub status [-d] /home/nikratio/'.

My guess is that this is a mismatch between some state stored by the userspace tools and the state in the kernel. One of the things you can try is to delete the files /var/lib/btrfs/scrub.status.* - that will force the btrfs tools to get the current status from the kernel (you will lose some statistics and scrub history.)

Running 'btrfs scrub status /home/nikratio/' after this should simply say 'no stats available', and you can start a new scrub later if you like.

> I then figured that maybe I need to run btrfsck. This gave the following output:

As long as you didn't use --repair, this shouldn't break anything... Note that btrfsck has to be run on an *unmounted* filesystem to give useful results.

> * Is it more risky to leave the above errors uncorrected, or to run btrfsck with --repair?

There probably aren't any issues on the filesystem that the runtime btrfs code can't handle. Don't run with --repair, at least not yet.

> I'm using kernel 3.14.
>
> Thanks!
> -Nikolaus

--
Calvin Walton calvin.wal...@kepstin.ca
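The cleanup suggested above is normally a one-line `rm`, but the same step can be sketched in C for use inside a maintenance tool. This is only an illustration under stated assumptions: remove_stale_scrub_status() is a hypothetical helper, it takes the directory as a parameter rather than hard-coding /var/lib/btrfs, and it makes no attempt to check that no scrub is actually running.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Remove every file matching <dir>/scrub.status.* (the naming used for the
 * userspace scrub status files mentioned in this thread).
 * Returns the number of files removed, 0 if none matched, -1 on error.
 */
static int remove_stale_scrub_status(const char *dir)
{
	char pattern[4096];
	glob_t matches;
	int removed = 0;
	int n, rc;

	assert(dir != NULL);

	n = snprintf(pattern, sizeof(pattern), "%s/scrub.status.*", dir);
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return -1;  /* directory name too long for the buffer */

	rc = glob(pattern, 0, NULL, &matches);
	if (rc == GLOB_NOMATCH)
		return 0;
	if (rc != 0)
		return -1;

	for (size_t i = 0; i < matches.gl_pathc; i++) {
		if (unlink(matches.gl_pathv[i]) == 0)
			removed++;
		else
			perror(matches.gl_pathv[i]);
	}
	globfree(&matches);
	return removed;
}
```

Called as remove_stale_scrub_status("/var/lib/btrfs") with sufficient privileges, it has the same effect as deleting the files by hand, including the loss of statistics and scrub history noted above.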
Re: File system stuck in scrub
On Mon, Aug 11, 2014 at 11:45:45AM -0400, Calvin Walton wrote:
> > $ sudo btrfs scrub start /home/nikratio/
> > ERROR: scrub is already running.
> > To cancel use 'btrfs scrub cancel /home/nikratio/'.
> > To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
>
> My guess is that this is a mismatch between some state stored by the userspace tools and the state in the kernel. One of the things you can try is to delete the files /var/lib/btrfs/scrub.status.* - that will force the btrfs tools to get the current status from the kernel (you will lose some statistics and scrub history.)

No need to really delete it, just changing one character will do :)
http://marc.merlins.org/perso/btrfs/post_2014-04-26_Btrfs-Tips_-Cancel-A-Btrfs-Scrub-That-Is-Already-Stopped.html

Cheers,
Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: [BUG] btrfs send/receive, page allocation failure
On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:
> On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:
>>> Can you try the following patch and confirm if it helps?
>>> https://patchwork.kernel.org/patch/4705171/
>>
>> This one applies without problems, I didn't build it because I saw v4. The v4 patch I get:
>>
>> + patch -p1 -F1 -s
>> 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
>>
>> Should v4 alone be applied over 3.16.0? Or each version in succession?
>
> Alone.
> How did you try to apply it to 3.16?

rpmbuild
https://fedoraproject.org/wiki/Building_a_custom_kernel

Gist is, save the patch, create patch filename entry into kernel.spec, then run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 patch applies, as does Liu Bo's patch from July 29 "Btrfs: fix regression of btrfs device replace" which I was also going to test. But the v4 patch isn't applying.

> Try
>
> cd source_dir
> git am patchfile
>
> if you didn't (e.g. you used patch command directly).

I'd kinda prefer to build an rpm since I need to test it on baremetal for this bug, and a VM for the device replace bug.

Chris Murphy
Re: [BUG] btrfs send/receive, page allocation failure
On Mon, Aug 11, 2014 at 4:55 PM, Chris Murphy li...@colorremedies.com wrote:
> On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:
>> On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:
>>>> Can you try the following patch and confirm if it helps?
>>>> https://patchwork.kernel.org/patch/4705171/
>>>
>>> This one applies without problems, I didn't build it because I saw v4. The v4 patch I get:
>>>
>>> + patch -p1 -F1 -s
>>> 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
>>>
>>> Should v4 alone be applied over 3.16.0? Or each version in succession?
>>
>> Alone.
>> How did you try to apply it to 3.16?
>
> rpmbuild
> https://fedoraproject.org/wiki/Building_a_custom_kernel
>
> Gist is, save the patch, create patch filename entry into kernel.spec, then run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 patch applies, as does Liu Bo's patch from July 29 "Btrfs: fix regression of btrfs device replace" which I was also going to test. But the v4 patch isn't applying.
>
>> Try
>>
>> cd source_dir
>> git am patchfile
>>
>> if you didn't (e.g. you used patch command directly).
>
> I'd kinda prefer to build an rpm since I need to test it on baremetal for this bug, and a VM for the device replace bug.

Sorry, I don't know anything about fedora's way of kernel patching. Either way, it seems the problem is simple to solve:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git checkout v3.16
git am /path/to/my_patch_file
git diff HEAD^..HEAD > /tmp/diff

The resulting patch file [1] /tmp/diff then applies cleanly with patch -p1 -F1 -s

[1] https://friendpaste.com/Bgwdjk31P3pZHtArr341G

> Chris Murphy

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
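The steps Filipe lists can be wrapped in a small shell function for anyone who needs to repeat them for several patches. This is a sketch under assumptions rather than a recommended workflow: `regen_patch` and its argument order are hypothetical, it expects an existing clone, and the patch file should be given as an absolute path because `git -C` changes directory before the path is read. It only uses `git checkout --detach`, `git am` and `git diff`, exactly as in the message above.

```shell
#!/bin/sh
# Sketch only: regenerate a plain diff of a mailed patch against a tag.
# regen_patch is a hypothetical helper name.
# Usage: regen_patch /path/to/clone TAG /abs/path/to/patch.mbox /tmp/diff
regen_patch() {
    repo="$1"; tag="$2"; patch_file="$3"; out="$4"
    # Start from the exact tagged tree (detached HEAD, no branch needed).
    git -C "$repo" checkout --quiet --detach "$tag" || return 1
    # Apply the mailed patch as a commit on top of the tag.
    git -C "$repo" am --quiet "$patch_file" || return 1
    # Write the last commit as a diff that 'patch -p1' can consume.
    git -C "$repo" diff HEAD^..HEAD > "$out"
}
```

For the case in this thread the call would look like `regen_patch ~/src/linux v3.16 /path/to/my_patch_file /tmp/diff`, where `~/src/linux` is an assumed clone location. If `git am` stops on a conflict, the function returns non-zero and leaves the repository in the interrupted state for inspection (`git am --abort` undoes it).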
Re: [BUG] btrfs send/receive, page allocation failure
On Aug 11, 2014, at 10:04 AM, Filipe David Manana fdman...@gmail.com wrote:
> On Mon, Aug 11, 2014 at 4:55 PM, Chris Murphy li...@colorremedies.com wrote:
>> On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:
>>> On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:
>>>>> Can you try the following patch and confirm if it helps?
>>>>> https://patchwork.kernel.org/patch/4705171/
>>>>
>>>> This one applies without problems, I didn't build it because I saw v4. The v4 patch I get:
>>>>
>>>> + patch -p1 -F1 -s
>>>> 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
>>>>
>>>> Should v4 alone be applied over 3.16.0? Or each version in succession?
>>>
>>> Alone.
>>> How did you try to apply it to 3.16?
>>
>> rpmbuild
>> https://fedoraproject.org/wiki/Building_a_custom_kernel
>>
>> Gist is, save the patch, create patch filename entry into kernel.spec, then run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 patch applies, as does Liu Bo's patch from July 29 "Btrfs: fix regression of btrfs device replace" which I was also going to test. But the v4 patch isn't applying.
>>
>>> Try
>>>
>>> cd source_dir
>>> git am patchfile
>>>
>>> if you didn't (e.g. you used patch command directly).
>>
>> I'd kinda prefer to build an rpm since I need to test it on baremetal for this bug, and a VM for the device replace bug.
>
> Sorry, I don't know anything about fedora's way of kernel patching. Either way, it seems the problem is simple to solve:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git checkout v3.16
> git am /path/to/my_patch_file
> git diff HEAD^..HEAD > /tmp/diff
>
> The resulting patch file [1] /tmp/diff then applies cleanly with patch -p1 -F1 -s
>
> [1] https://friendpaste.com/Bgwdjk31P3pZHtArr341G

OK that friendpaste is completely different than the [PATCH v4] email.

# from above URL
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6528aa6..95891c0 100644

# from [PATCH v4] email
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3c63b29..b29fc5c 100644

The line numbers are all completely different also. I'll try the patch from the above URL.

Chris Murphy
Re: [BUG] btrfs send/receive, page allocation failure
On 2014-08-11 10:25, Chris Murphy wrote:
> On Aug 11, 2014, at 10:04 AM, Filipe David Manana fdman...@gmail.com wrote:
>> On Mon, Aug 11, 2014 at 4:55 PM, Chris Murphy li...@colorremedies.com wrote:
>>> On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:
>>>> On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:
>>>>>> Can you try the following patch and confirm if it helps?
>>>>>> https://patchwork.kernel.org/patch/4705171/
>>>>>
>>>>> This one applies without problems, I didn't build it because I saw v4. The v4 patch I get:
>>>>>
>>>>> + patch -p1 -F1 -s
>>>>> 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
>>>>>
>>>>> Should v4 alone be applied over 3.16.0? Or each version in succession?
>>>>
>>>> Alone.
>>>> How did you try to apply it to 3.16?
>>>
>>> rpmbuild
>>> https://fedoraproject.org/wiki/Building_a_custom_kernel
>>>
>>> Gist is, save the patch, create patch filename entry into kernel.spec, then run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 patch applies, as does Liu Bo's patch from July 29 "Btrfs: fix regression of btrfs device replace" which I was also going to test. But the v4 patch isn't applying.
>>>
>>>> Try
>>>>
>>>> cd source_dir
>>>> git am patchfile
>>>>
>>>> if you didn't (e.g. you used patch command directly).
>>>
>>> I'd kinda prefer to build an rpm since I need to test it on baremetal for this bug, and a VM for the device replace bug.
>>
>> Sorry, I don't know anything about fedora's way of kernel patching. Either way, it seems the problem is simple to solve:
>>
>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> git checkout v3.16
>> git am /path/to/my_patch_file
>> git diff HEAD^..HEAD > /tmp/diff
>>
>> The resulting patch file [1] /tmp/diff then applies cleanly with patch -p1 -F1 -s
>>
>> [1] https://friendpaste.com/Bgwdjk31P3pZHtArr341G
>
> OK that friendpaste is completely different than the [PATCH v4] email.
>
> # from above URL
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 6528aa6..95891c0 100644
>
> # from [PATCH v4] email
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 3c63b29..b29fc5c 100644
>
> The line numbers are all completely different also. I'll try the patch from the above URL.

The above friendpaste URL patch has applied, and I'm now building.

Chris Murphy
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
On 8/11/14, 2:11 AM, Satoru Takeuchi wrote:
> From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>
> - Simplify and unify the description of both man and usage.
> - Fix to show -m and -d is not exclusive with path|uuid|device|label.
> - Add the description about short options for --mounted and --all-devices, -m and -d respectively.
> - Move the descriptions of options to Options section.
>
> Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
> ---
>  Documentation/btrfs-filesystem.txt | 22 ++
>  cmds-filesystem.c | 15 ++
>  2 files changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
> index c9c0b00..fe68496 100644
> --- a/Documentation/btrfs-filesystem.txt
> +++ b/Documentation/btrfs-filesystem.txt
> @@ -20,15 +20,21 @@ SUBCOMMAND
>  *df* path [path...]::
>  Show space usage information for a mount point.
>
> -*show* [--mounted|--all-devices|path|uuid|device|label]::
> -Show the btrfs filesystem with some additional info.
> +*show* [-d|-m] [path|uuid|device|label]::
> +Show the structure of btrfs filesystem(s).
>  +
> -If no option nor path|uuid|device|label is passed, btrfs shows
> -information of all the btrfs filesystem both mounted and unmounted.
> -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
> -filesystem(s);
> -If '--all-devices' is passed, all the devices under /dev are scanned;
> -otherwise the devices list is extracted from the /proc/partitions file.
> +If none of 'path|uuid|device|label' is passed, btrfs shows
> +information of all the btrfs filesystems both mounted and unmounted.

that doesn't seem quite correct;

# btrfs filesystem show -m

does not specify 'path|uuid|device|label' but it only shows mounted filesystems, not all filesystems.

As I understand it, the -d and -m options control how the command finds devices; the 'path|uuid|device|label' argument is used as a filter for what is found.

> ++
> +The show command finds btrfs filesystems by scanning all the devices
> +in /proc/partitions by default.

I think I would document it something like this:

show [-m|-d] [path|uuid|device|label]

Show the structure of btrfs filesystem(s).

By default, the show command scans all devices found in /proc/partitions.
If [-d|--all-devices] is specified, all devices found under /dev are scanned.
If [-m|--mounted] is specified, only mounted (btrfs?) devices are scanned.

By default, the structure of all discovered filesystems is shown.
If any one of [path|uuid|device|label] is specified, only filesystems matching that identifier are shown.

(What seems to be missing, though, is why would the user ever choose to use '-d?')

-Eric
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
On 8/11/14, 10:05 AM, Eric Sandeen wrote:
> On 8/11/14, 2:11 AM, Satoru Takeuchi wrote:
>> From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>>
>> - Simplify and unify the description of both man and usage.
>> - Fix to show -m and -d is not exclusive with path|uuid|device|label.
>> - Add the description about short options for --mounted and --all-devices, -m and -d respectively.
>> - Move the descriptions of options to Options section.
>>
>> Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>> ---
>>  Documentation/btrfs-filesystem.txt | 22 ++
>>  cmds-filesystem.c | 15 ++
>>  2 files changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
>> index c9c0b00..fe68496 100644
>> --- a/Documentation/btrfs-filesystem.txt
>> +++ b/Documentation/btrfs-filesystem.txt
>> @@ -20,15 +20,21 @@ SUBCOMMAND
>>  *df* path [path...]::
>>  Show space usage information for a mount point.
>>
>> -*show* [--mounted|--all-devices|path|uuid|device|label]::
>> -Show the btrfs filesystem with some additional info.
>> +*show* [-d|-m] [path|uuid|device|label]::
>> +Show the structure of btrfs filesystem(s).
>>  +
>> -If no option nor path|uuid|device|label is passed, btrfs shows
>> -information of all the btrfs filesystem both mounted and unmounted.
>> -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
>> -filesystem(s);
>> -If '--all-devices' is passed, all the devices under /dev are scanned;
>> -otherwise the devices list is extracted from the /proc/partitions file.
>> +If none of 'path|uuid|device|label' is passed, btrfs shows
>> +information of all the btrfs filesystems both mounted and unmounted.
>
> that doesn't seem quite correct;
>
> # btrfs filesystem show -m
>
> does not specify 'path|uuid|device|label' but it only shows mounted filesystems, not all filesystems.
>
> As I understand it, the -d and -m options control how the command finds devices; the 'path|uuid|device|label' argument is used as a filter for what is found.
>
>> ++
>> +The show command finds btrfs filesystems by scanning all the devices
>> +in /proc/partitions by default.
>
> I think I would document it something like this:
>
> show [-m|-d] [path|uuid|device|label]
>
> Show the structure of btrfs filesystem(s).
>
> By default, the show command scans all devices found in /proc/partitions.
> If [-d|--all-devices] is specified, all devices found under /dev are scanned.
> If [-m|--mounted] is specified, only mounted (btrfs?) devices are scanned.
>
> By default, the structure of all discovered filesystems is shown.
> If any one of [path|uuid|device|label] is specified, only filesystems matching that identifier are shown.
>
> (What seems to be missing, though, is why would the user ever choose to use '-d?')

Incidentally, there is some strange behavior here when looking for multiple filesystems which match.

Make 2 filesystems w/ the same label:

[root@bp-05 tmp]# btrfs filesystem label /dev/sdc1 testlabel2
[root@bp-05 tmp]# btrfs filesystem label /dev/sdc5 testlabel2

Show matching filesystems:

[root@bp-05 tmp]# btrfs filesystem show testlabel2
Label: 'testlabel2'  uuid: 8c6ec835-5628-439b-9749-d92f62573ce8
        Total devices 1 FS bytes used 112.00KiB
        devid    1 size 30.00GiB used 2.04GiB path /dev/sdc5

Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
        Total devices 1 FS bytes used 384.00KiB
        devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1

Btrfs v3.14.2

That works fine, but if one is mounted:

[root@bp-05 tmp]# mount /dev/sdc1 /mnt/test

only the mounted filesystem is shown:

[root@bp-05 tmp]# btrfs filesystem show testlabel2
Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
        Total devices 1 FS bytes used 384.00KiB
        devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1

Btrfs v3.14.2

That's unexpected. Mount the other fs, and both are shown again:

[root@bp-05 tmp]# mount /dev/sdc5 /mnt/scratch
[root@bp-05 tmp]# btrfs filesystem show testlabel2
Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
        Total devices 1 FS bytes used 384.00KiB
        devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1

Label: 'testlabel2'  uuid: 8c6ec835-5628-439b-9749-d92f62573ce8
        Total devices 1 FS bytes used 384.00KiB
        devid    1 size 30.00GiB used 2.04GiB path /dev/sdc5

Btrfs v3.14.2

-Eric
Announcement: buttersink - like rsync for btrfs snapshots
I've written a utility to help me with using btrfs send and receive for backups or other synchronization, and I'd love to get feedback on it.

As of this release, buttersink will synchronize a set of read-only snapshots in a btrfs filesystem to an Amazon S3 bucket, and vice-versa. It intelligently picks parent snapshots to diff from, so that a minimal amount of data needs to be sent over the wire and stored in the backend.

The utility is on PyPI as buttersink, and the GitHub page is here: https://github.com/AmesCornish/buttersink.

Thanks in advance for any feedback!

- Ames
Re: Announcement: buttersink - like rsync for btrfs snapshots
How has it been for reliability? I wrote a btrsync app a while back, and the app /itself/ worked fine, but the btrfs send / btrfs receive itself proved problematic. Since btrfs would keep a partial receive - with no easy way to tell whether a receive WAS partial or full - I would inevitably end up with interrupted sends causing a problem that couldn't be resolved without manually deleting snapshots on the target end haphazardly until I nailed the incomplete one.

On 08/11/2014 01:49 PM, Ames Cornish wrote:
> I've written a utility to help me with using btrfs send and receive for backups or other synchronization, and I'd love to get feedback on it.
>
> As of this release, buttersink will synchronize a set of read-only snapshots in a btrfs filesystem to an Amazon S3 bucket, and vice-versa. It intelligently picks parent snapshots to diff from, so that a minimal amount of data needs to be sent over the wire and stored in the backend.
>
> The utility is on PyPi as buttersink, and the GitHub page is here: https://github.com/AmesCornish/buttersink.
>
> Thanks in advance for any feedback!
>
> - Ames
Re: Announcement: buttersink - like rsync for btrfs snapshots
To any core btrfs devs who are listening and care - the unreliability of btrfs send/receive is IMO the single biggest roadblock to adoption of btrfs as a serious next-gen FS. I can live with occasional corner-case performance issues, I can even live with (very) occasional filesystem corruption... IF I can rely on replication to keep my data safe on another box. Without the replication, there's just no reasonable case to be made to replace ZFS.

On 08/11/2014 02:05 PM, Ames Cornish wrote:
> Jim,
>
> btrfs send reliability has been an issue, though I've been able to successfully use it for my backups. buttersink usually detects the errors and will either move the destination snapshot to mark it as partial/failed (for btrfs), or cancel and delete the partial upload (for S3). I've also found that it helps to wait a while (e.g. 30 seconds) after any volume deletes before trying the send/sync.
>
> I hope btrfs-progs will get more reliable, too.
>
> - Ames
Re: Announcement: buttersink - like rsync for btrfs snapshots
Jim,

btrfs send reliability has been an issue, though I've been able to successfully use it for my backups. buttersink usually detects the errors and will either move the destination snapshot to mark it as partial/failed (for btrfs), or cancel and delete the partial upload (for S3). I've also found that it helps to wait a while (e.g. 30 seconds) after any volume deletes before trying the send/sync.

I hope btrfs-progs will get more reliable, too.

- Ames
Large files, nodatacow and fragmentation
I've been playing with btrfs as a backing store for my KVM images. I've used 'chattr +C' on the directory where those images are stored. You can see my recipe below [1]. I've read the gotchas found here [2].

I'm having continuing performance issues inside the Guest VM that is created inside the btrfs subvolume, using a qcow2 format. I'm having a hard time determining whether the issues are related to KVM or btrfs, or if this is even a reasonable topic of discussion.

I've seen the comments on this list saying that if I want a COW filesystem with sparse files, that I'd be better off with ZFS. I'd like to use an in-tree COW filesystem, but if it's just not gonna happen yet with btrfs, I guess that's just the way it is.

That being said, how would I determine what the root issue is? Specifically, the qcow2 file in question seems to have increasing fragmentation, even with the No_COW attr.

[1]
$ mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd
$ mount /dev/sda /mnt
$ cd /mnt
$ btrfs subvolume create __data
$ btrfs subvolume create __data/libvirt
$ cd /
$ umount /mnt
$ mount /dev/sda /var/lib/libvirt
$ chattr +C /var/lib/libvirt/images
$ cp /run/media/rbellamy/433acf1d-a1a4-4596-a6a7-005e643b24e0/libvirt/images/atlas.qcow2 /var/lib/libvirt/images/
$ filefrag /var/lib/libvirt/images/atlas.qcow2
/var/lib/libvirt/images/atlas.qcow2: 0 extents found
[START UP THE VM - DO SOME THINGS]
$ filefrag /var/lib/libvirt/images/atlas.qcow2
/var/lib/libvirt/images/atlas.qcow2: 12236 extents found
[START UP THE VM - DO SOME THINGS]
$ filefrag /var/lib/libvirt/images/atlas.qcow2
/var/lib/libvirt/images/atlas.qcow2: 34988 extents found

[2] https://btrfs.wiki.kernel.org/index.php/Gotchas
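One way to start narrowing this down is to record the extent count at regular intervals and compare the jumps with guest activity and snapshot times. The sketch below only parses the one-line `filefrag` summary shown in the recipe above; the function names `extent_count` and `log_extents` are hypothetical, and it does not diagnose the cause by itself. It is also worth confirming with `lsattr` that the image file itself (not only the directory) carries the `C` attribute.

```shell
#!/bin/sh
# Sketch only: helpers for tracking fragmentation of one file over time.
# extent_count and log_extents are hypothetical names.

# Read a filefrag summary line ("PATH: N extents found") on stdin and
# print N. Handles the singular form "1 extent found" as well.
extent_count() {
    sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found$/\1/p'
}

# Print "<seconds since epoch> <extent count>" for the given file.
# Requires filefrag (e2fsprogs) and sufficient privileges to run it.
log_extents() {
    printf '%s %s\n' "$(date +%s)" "$(filefrag "$1" | extent_count)"
}
```

Calling `log_extents /var/lib/libvirt/images/atlas.qcow2` from cron or a loop while the guest runs gives a simple time series; `filefrag -v` on the same file lists the individual extents if more detail is needed.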
Re: Large files, nodatacow and fragmentation
On Mon, 11 Aug 2014 11:36:46 -0700 G. Richard Bellamy rbell...@pteradigm.com wrote:
> I've been playing with btrfs as a backing store for my KVM images. I've used 'chattr +C' on the directory where those images are stored. You can see my recipe below [1]. I've read the gotchas found here [2]
>
> I'm having continuing performance issues inside the Guest VM that is created inside the btrfs subvolume, using a qcow2 format. I'm having a hard time determining whether the issues are related to KVM or btrfs, or if this is even a reasonable topic of discussion.
>
> I've seen the comments on this list saying that if I want a COW filesystem with sparse files, that I'd be better off with ZFS. I'd like to use an in-tree COW filesystem, but if it's just not gonna happen yet with btrfs, I guess that's just the way it is.
>
> That being said, how would I determine what the root issue is? Specifically, the qcow2 file in question seems to have increasing fragmentation, even with the No_COW attr.

First of all, why do you require a COW filesystem in the first place... if all you do is just use it in a NoCOW mode?

Second, why qcow2? It can also have internal fragmentation which is unlikely to do anything good for performance.

Try RAW format images; to reduce the space requirements, with the latest Qemu/KVM you can pass-through TRIM command from inside the VM guest (at least in the IDE controller mode) so that the backing filesystem will unmap areas that are no longer in use inside the VM, in effect re-sparsifying the image. This is VERY nifty. But yeah this can cause some fragmentation even with NoCOW.

In my personal use case NoCOW is only utilized partly, because all subvolumes with running VMs are being snapshotted about every 30 minutes, and those snapshots are kept for two weeks. The performance is passable; at least when using KVM's cache=writeback mode (or less safe ones).

-- 
With respect,
Roman
Re: Ideas for a feature implementation
On Aug 10, 2014, at 8:53 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> Another thing that isn't listed there, that I would personally love to see is support for secure file deletion. To be truly secure though, this would need to hook into the COW logic so that files marked for secure deletion can't be reflinked (maybe make them automatically NOCOW instead, and don't allow snapshots?), and when they get written to, the blocks that get COW'ed have the old block overwritten.

If the file is reflinked or snapshotted, then can it be secure deleted? Because what does it mean to secure delete a file when there's a completely independent file pointing to the same physical blocks? What if someone else owns that independent file? Does the reflink copy get rm'd as well? Or does the file remain, but its blocks are zero'd/corrupted?

For SSDs, whether it's an overwrite or an FITRIM ioctl it's an open question when the data is actually irretrievable. It may be seconds, but could be much longer (hours?) so I'm not sure if it's useful. On HDDs using SMR it's not necessarily a given an overwrite will work there either.

Chris Murphy
Re: [PATCH] Btrfs: fix compressed write corruption on enospc
On 08/10/2014 10:55 AM, Liu Bo wrote:
> On Thu, Aug 07, 2014 at 10:02:15AM -0400, Chris Mason wrote:
>> On 08/07/2014 04:20 AM, Miao Xie wrote:
>>> On Thu, 7 Aug 2014 15:50:30 +0800, Liu Bo wrote:
>>>>>> [90496.156016] kworker/u8:14 D 880044e38540 0 21050 2 0x
>>>>>> [90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
>>>>>> [90496.159320] 88022880f990 0002 880407f649b0 88022880ffd8
>>>>>> [90496.160997] 880044e38000 00013040 880044e38000 7fff
>>>>>> [90496.162686] 880301383aa0 0002 814705d0 880301383a98
>>>>>> [90496.164360] Call Trace:
>>>>>> [90496.166028] [814705d0] ? michael_mic.part.6+0x21/0x21
>>>>>> [90496.167854] [81470fd0] schedule+0x64/0x66
>>>>>> [90496.169574] [814705ff] schedule_timeout+0x2f/0x114
>>>>>> [90496.171221] [8106479a] ? wake_up_process+0x2f/0x32
>>>>>> [90496.172867] [81062c3b] ? get_parent_ip+0xd/0x3c
>>>>>> [90496.174472] [81062ce5] ? preempt_count_add+0x7b/0x8e
>>>>>> [90496.176053] [814717f3] __wait_for_common+0x11e/0x163
>>>>>> [90496.177619] [814717f3] ? __wait_for_common+0x11e/0x163
>>>>>> [90496.179173] [810647aa] ? wake_up_state+0xd/0xd
>>>>>> [90496.180728] [81471857] wait_for_completion+0x1f/0x21
>>>>>> [90496.182285] [c044e3b3] btrfs_async_run_delayed_refs+0xbf/0xd9 [btrfs]
>>>>>> [90496.183833] [c04624e1] __btrfs_end_transaction+0x2b6/0x2ec [btrfs]
>>>>>> [90496.185380] [c0462522] btrfs_end_transaction+0xb/0xd [btrfs]
>>>>>> [90496.186940] [c0451742] find_free_extent+0x8a9/0x976 [btrfs]
>>>>>> [90496.189464] [c0451990] btrfs_reserve_extent+0x6f/0x119 [btrfs]
>>>>>> [90496.191326] [c0466b45] cow_file_range+0x1a6/0x377 [btrfs]
>>>>>> [90496.193080] [c047adc4] ? extent_write_locked_range+0x10c/0x11e [btrfs]
>>>>>> [90496.194659] [c04677e4] submit_compressed_extents+0x100/0x412 [btrfs]
>>>>>> [90496.196225] [8120e344] ? debug_smp_processor_id+0x17/0x19
>>>>>> [90496.197776] [c0467b78] async_cow_submit+0x82/0x87 [btrfs]
>>>>>> [90496.199383] [c048644b] normal_work_helper+0x153/0x224 [btrfs]
>>>>>> [90496.200944] [81052d8c] process_one_work+0x16f/0x2b8
>>>>>> [90496.202483] [81053636] worker_thread+0x27b/0x32e
>>>>>> [90496.204000] [810533bb] ? cancel_delayed_work_sync+0x10/0x10
>>>>>> [90496.205514] [81058012] kthread+0xb2/0xba
>>>>>> [90496.207040] [8147] ? ap_handle_dropped_data+0xf/0xc8
>>>>>> [90496.208565] [81057f60] ? __kthread_parkme+0x62/0x62
>>>>>> [90496.210096] [81473f6c] ret_from_fork+0x7c/0xb0
>>>>>> [90496.211618] [81057f60] ? __kthread_parkme+0x62/0x62
>>>>>
>>>>> Ok, this should explain the hang. submit_compressed_extents is calling cow_file_range with a locked page.
>>>>>
>>>>> cow_file_range is trying to find a free extent and in the process is calling btrfs_end_transaction, which is running the async delayed refs, which is trying to write dirty pages, which is waiting for your locked page.
>>>>>
>>>>> I should be able to reproduce this ;)
>>>>>
>>>>> This part of the trace is relatively new because Liu Bo's patch made us redirty the pages, making it more likely that we'd try to write them during commit.
>>>>>
>>>>> But, at the end of the day we have a fundamental deadlock with committing a transaction while holding a locked page from an ordered file.
>>>>>
>>>>> For now, I'm ripping out the strict ordered file and going back to a best-effort filemap_flush like ext4 is using.
>>>>
>>>> I think I've figured the deadlock out, this is obviously a race case, really hard to reproduce :-(
>>>>
>>>> So it turns out to be related to workqueues -- now a kthread can process work_struct queued in different workqueues, so we can explain the deadlock as such,
>>>>
>>>> (1) The btrfs-delalloc workqueue gets a compressed extent to process with all its pages locked during this, and it runs into reading the free space cache inode, and then waits on lock_page().
>>>>
>>>> (2) Reading that free space cache inode comes to the submit part, and we have an indirect, two-step endio path for it: with the first endio we come to end_workqueue_bio() and queue a work item in the btrfs-endio-meta workqueue, and it will run the real endio() for us, but ONLY when it's processed.
>>>>
>>>> So the problem is a kthread can serve several workqueues, which means work items in btrfs-endio-meta workqueues and work items in btrfs-flush_delalloc workqueues can be in the same processing list of a kthread. When btrfs-flush_delalloc waits for the compressed page and btrfs-endio-meta comes after it, it hangs.
>>>
>>> I don't think it is right. All the btrfs workqueues have the RECLAIM flag, which means each btrfs workqueue has its own rescue worker. So the problem you said should not happen.
>>
>> Right, I traded some emails with Tejun about this and spent a few days trying to prove the workqueues were doing the wrong thing. It will end up spawning another worker thread for the new work, and it won't get queued up behind the existing thread.
>>
>> If both work items went to the same workqueue, you'd definitely be right.
Re: [BUG] btrfs send/receive, page allocation failure
On Aug 11, 2014, at 10:41 AM, li...@colorremedies.com wrote:
>> https://friendpaste.com/Bgwdjk31P3pZHtArr341G
>
> OK that friendpaste is completely different than the [PATCH v4] email.
>
> # from above URL
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 6528aa6..95891c0 100644

I can confirm that the above patch fixes the reported bug.

I couldn't get this one to apply so it's not tested:
http://www.spinics.net/lists/linux-btrfs/msg36556.html

Chris Murphy
Re: [BUG] btrfs send/receive, page allocation failure
On Mon, Aug 11, 2014 at 9:30 PM, Chris Murphy li...@colorremedies.com wrote:
> On Aug 11, 2014, at 10:41 AM, li...@colorremedies.com wrote:
>>> https://friendpaste.com/Bgwdjk31P3pZHtArr341G
>>
>> OK that friendpaste is completely different than the [PATCH v4] email.
>>
>> # from above URL
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index 6528aa6..95891c0 100644
>
> I can confirm that the above patch fixes the reported bug.
>
> I couldn't get this one to apply so it's not tested:
> http://www.spinics.net/lists/linux-btrfs/msg36556.html

It's exactly the same as the diff you tried. The git am command is able to deal with any fuzz while the patch command can't (or not always at least). That patch is based on the integration branch, while the diff I pasted (and showed you how to generate) is for the v3.16 tag from Linus' repository.

Thanks for testing and reporting back.

> Chris Murphy

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
Re: 40TB volume taking over 16 hours to mount, any ideas?
As I hate when a thread is left hanging, you deserve to know what happened in the end. You likely already guessed, but anyway: I nuked the filesystem and started over.

After some internal discussion in the company, we decided to move to ZFS for now. However, we will keep an eye on btrfs, and will likely deploy it to some smaller system for further testing.

Thank you all for your help!

Sincerely,
Ildefonso
Re: Large files, nodatacow and fragmentation
On Mon, Aug 11, 2014 at 12:14 PM, Roman Mamedov r...@romanrm.net wrote: First of all, why do you require a COW filesystem in the first place... if all you do is just use it in a NoCOW mode? Second, why qcow2? It can also have internal fragmentation which is unlikely to do anything good for performance. Both great questions. I'm experimenting with btrfs, and the various permutations of btrfs with KVM. So, why btrfs vs lvm or ext4: 1. Because nocow isn't all I'm doing with that filesystem. 2. I like the way btrfs subvolumes work, vs lvm. I can have the nocow files in one subvolume, and still get great snapshot performance out of others. 3. I get the performance of a raid10 without the lvm management overhead. Online rebalancing. Easy online resizing. 4. And frankly, I just kinda want to make it work. Try RAW format images; to reduce the space requirements, with the latest Qemu/KVM you can pass-through TRIM command from inside the VM guest (at least in the IDE controller mode) so that the backing filesystem will unmap areas that are no longer in use inside the VM, in effect re-sparsifying the image. This is VERY nifty. But yeah this can cause some fragmentation even with NoCOW. In my personal use case NoCOW is only utilized partly, because all subvolumes with running VMs are being snapshotted about every 30 minutes, and those snapshots are kept for two weeks. The performance is passable; at least when using KVM's cache=writeback mode (or less safer ones). I've done my reading of qcow2 vs raw and that indicated that while there is better performance using raw, it's not significant enough to bypass the ability to take a qemu snapshot. I've not done the analysis myself, so I could be reading things wrong. There's a great thread, Are nocow files snapshot-aware? [1]. My take from that reading is that doing a btrfs snapshot of a nocow file seems like it's reasonable on a semi-regular basis, but DON'T do it every 30 seconds. 
Also that whole thread is predicated on the idea that your nocow files are themselves managed by a process/system that can read and write to them atomically, thus I decided against using the raw format. -- With respect, Roman Thanks Roman. But really we haven't addressed my original question, which is - how would I determine the root cause of the fragmentation in this nocow file on top of a btrfs subvolume? [1] http://www.spinics.net/lists/linux-btrfs/msg31341.html Kind Regards, Richard -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
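A first step toward answering the fragmentation question is to measure it repeatedly, for example before and after a snapshot or a burst of guest writes. The following is a minimal sketch, assuming a Linux host and a filesystem that supports the FIEMAP ioctl; the function name `count_extents` is hypothetical. It only reports how many extents a file currently has and does not by itself identify the cause.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC */

/* Hypothetical helper: number of extents mapped for a file, or -1 on error. */
long count_extents(const char *path)
{
    struct fiemap fm;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* cover the whole file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data before mapping */
    fm.fm_extent_count = 0;            /* 0: report only the extent count */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long)fm.fm_mapped_extents;
}
```

Logging this count for the image file at intervals, alongside when snapshots are taken, may help narrow down which operation makes the extent count jump.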
Re: [PATCH] Btrfs: fix regression of btrfs device replace
On Jul 29, 2014, at 5:09 AM, Liu Bo bo.li@oracle.com wrote:
> Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac
> (btrfs: dev replace should replace the sysfs entry) added the missing sysfs
> entry in the process of device replace, but didn't take missing devices
> into account, so now we have
>
> BUG: unable to handle kernel NULL pointer dereference at 0088
> IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> ...
>
> To reproduce it,
> 1. mkfs.btrfs -f disk1 disk2
> 2. mkfs.ext4 disk1
> 3. mount disk2 /mnt -odegraded
> 4. btrfs replace start -B 1 disk3 /mnt
> --
>
> This fixes the problem.
>
> Reported-by: Chris Murphy li...@colorremedies.com
> Signed-off-by: Liu Bo bo.li@oracle.com
> ---
>  fs/btrfs/sysfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 7869936..12e5355 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
>      if (!fs_info->device_dir_kobj)
>          return -EINVAL;
>
> -    if (one_device) {
> +    if (one_device && one_device->bdev) {
>          disk = one_device->bdev->bd_part;
>          disk_kobj = &part_to_dev(disk)->kobj;

Applied to 3.16.0 and tested, problem is fixed.

Chris Murphy
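The guard added by the patch above (checking the device pointer and its bdev before dereferencing) can be illustrated outside the kernel. This is a user-space sketch only: the types `fake_device` and `fake_bdev` and the helper `device_link_to_remove` are hypothetical stand-ins, not the real kernel structures, and the real function also handles the case where no single device is passed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct fake_bdev {
    const char *part_name;   /* plays the role of bdev->bd_part */
};

struct fake_device {
    struct fake_bdev *bdev;  /* NULL when the device is missing (degraded mount) */
};

/*
 * Returns the partition name whose sysfs link would be removed, or NULL
 * when there is nothing to remove. The second half of the condition
 * mirrors the patch: a missing device has no bdev, so it must not be
 * dereferenced.
 */
const char *device_link_to_remove(const struct fake_device *one_device)
{
    if (one_device && one_device->bdev)
        return one_device->bdev->part_name;
    return NULL;
}
```

With only the `one_device` check, a device whose `bdev` is NULL would be dereferenced, which corresponds to the NULL pointer dereference in the report.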
Re: Large files, nodatacow and fragmentation
On Aug 11, 2014, at 1:14 PM, Roman Mamedov r...@romanrm.net wrote:
> Second, why qcow2? It can also have internal fragmentation which is unlikely to do anything good for performance.

It really depends on what version of libvirt and qemu-img you've got. I did some testing during Fedora 20 prior to release, and the best results for my configuration (a laptop with an HDD at that time) were: btrfs host, +C qcow2, btrfs guest, both 16KB leaf size, and the drive pointing to the qcow2 file with cache policy set to unsafe. Even when obliterating the VM while writing data, I never lost the guest Btrfs file system. Not that I recommend it; the cache policy is unsafe after all. I did lose some data but it was limited to commit time.

We're not talking huge differences. The metric I was using was the time to install Fedora 20, based on the installer log start/stop time for the unattended portion of the install. It also matters somewhat to pre-allocate metadata when creating the qcow2 file.

I also tested XFS on XFS and ext4 on ext4, also in qcow2. And also on raw images. And also on LVs. I'd think the LV would have been faster since it completely eliminates one of the file systems (there is no host fs). Anyway, what I determined was that the only way to know is to actually test your workload, or a good approximation of it, with various configurations.

Another test is LVM thinp LVs once libvirt has support for using them (which may already have happened; I haven't revisited this since my Oct 2013 testing), because those snapshots should be as usable as Btrfs snapshots, unlike conventional LVM snapshots, which are slow and need explicit preallocation.

Chris Murphy
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
Hi Eric, (2014/08/12 2:05), Eric Sandeen wrote: On 8/11/14, 2:11 AM, Satoru Takeuchi wrote: From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com - Simplify and unify the description of both man and usage. - Fix to show -m and -d is not exclusive with path|uuid|device|label. - Add the description about short options for --mounted and --all-devices, -m and -d respectively. - Move the descriptions of options to Options section. Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com --- Documentation/btrfs-filesystem.txt | 22 ++ cmds-filesystem.c | 15 ++- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt index c9c0b00..fe68496 100644 --- a/Documentation/btrfs-filesystem.txt +++ b/Documentation/btrfs-filesystem.txt @@ -20,15 +20,21 @@ SUBCOMMAND *df* path [path...]:: Show space usage information for a mount point. -*show* [--mounted|--all-devices|path|uuid|device|label]:: -Show the btrfs filesystem with some additional info. +*show* [-d|-m] [path|uuid|device|label]:: +Show the structure of btrfs filesystem(s). + -If no option nor path|uuid|device|label is passed, btrfs shows -information of all the btrfs filesystem both mounted and unmounted. -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs -filesystem(s); -If '--all-devices' is passed, all the devices under /dev are scanned; -otherwise the devices list is extracted from the /proc/partitions file. +If none of 'path|uuid|device|label' is passed, btrfs shows +information of all the btrfs filesystems both mounted and unmounted. that doesn't seem quite correct; # btrfs filesystem show -m does not specify 'path|uuid|device|label' but it only shows mounted filesystems, not all filesystems. Oh, I forgot to add [ and ]. As I understand it, the -d and -m options control how the command finds devices; the 'path|uuid|device|label' argument is used as a filter for what is found. Yes, my understanding is so too. 
++ +The show command finds btrfs filesystems by scanning all the devices +in /proc/partitions by default. I think I would document it something like this: show [-m|-d] [path|uuid|device|label] Show the structure of btrfs filesystem(s). By default, the show command scans all devices found in /proc/partitions. If [-d|--all-devices] is specified, all devices found under /dev are scanned. If [-m|--mounted] is specified, only mounted (btrfs?) devices are scanned. By default, the structure of all discovered filesystems is shown. If any one of [path|uuid|device|label] is specified, only filesystems matching that identifier are shown. OK, I'll fix my patch based on your comment. # Of course, I'll replace (btrfs?) with something proper words. Can I add your Signed-off-by to my v2 patch? (What seems to be missing, though, is why would the user ever choose to use '-d?') I'm not sure. I guess, for example, in large systems, -d takes too many times for scanning all devices under /dev or something? Thank you for your comments! Satoru -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
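The two-step behaviour described in the proposed text above (discovery controlled by -m/-d, then an optional identifier used as a filter) can be modelled in a few lines. This is a hedged sketch of the documented semantics only, not btrfs-progs code: `scan_mode`, `fs_entry`, `matches_filter` and `count_shown` are hypothetical names, and matching by path or device is left out.

```c
#include <stddef.h>
#include <string.h>

/* How devices are discovered; mirrors the proposed option text only. */
enum scan_mode {
    SCAN_DEFAULT,      /* devices listed in /proc/partitions */
    SCAN_ALL_DEVICES,  /* -d / --all-devices: everything under /dev */
    SCAN_MOUNTED       /* -m / --mounted: mounted filesystems only */
};

/* Hypothetical record for one discovered filesystem. */
struct fs_entry {
    const char *label;
    const char *uuid;
    int mounted;
};

/* Step 2: the optional argument filters what step 1 discovered. */
static int matches_filter(const struct fs_entry *fs, const char *filter)
{
    if (filter == NULL)
        return 1;  /* no identifier given: show everything discovered */
    return strcmp(fs->label, filter) == 0 || strcmp(fs->uuid, filter) == 0;
}

/* Count the filesystems that would be shown for a given mode and filter. */
size_t count_shown(const struct fs_entry *all, size_t n,
                   enum scan_mode mode, const char *filter)
{
    size_t shown = 0;

    for (size_t i = 0; i < n; i++) {
        /* Step 1: discovery. Only -m narrows the set in this model. */
        if (mode == SCAN_MOUNTED && !all[i].mounted)
            continue;
        if (matches_filter(&all[i], filter))
            shown++;
    }
    return shown;
}
```

Keeping discovery and filtering separate is what makes `-m` and `-d` combinable with an identifier rather than exclusive with it.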
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
Hi Eric, (2014/08/12 2:14), Eric Sandeen wrote: On 8/11/14, 10:05 AM, Eric Sandeen wrote: On 8/11/14, 2:11 AM, Satoru Takeuchi wrote: From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com - Simplify and unify the description of both man and usage. - Fix to show -m and -d is not exclusive with path|uuid|device|label. - Add the description about short options for --mounted and --all-devices, -m and -d respectively. - Move the descriptions of options to Options section. Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com --- Documentation/btrfs-filesystem.txt | 22 ++ cmds-filesystem.c | 15 ++- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt index c9c0b00..fe68496 100644 --- a/Documentation/btrfs-filesystem.txt +++ b/Documentation/btrfs-filesystem.txt @@ -20,15 +20,21 @@ SUBCOMMAND *df* path [path...]:: Show space usage information for a mount point. -*show* [--mounted|--all-devices|path|uuid|device|label]:: -Show the btrfs filesystem with some additional info. +*show* [-d|-m] [path|uuid|device|label]:: +Show the structure of btrfs filesystem(s). + -If no option nor path|uuid|device|label is passed, btrfs shows -information of all the btrfs filesystem both mounted and unmounted. -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs -filesystem(s); -If '--all-devices' is passed, all the devices under /dev are scanned; -otherwise the devices list is extracted from the /proc/partitions file. +If none of 'path|uuid|device|label' is passed, btrfs shows +information of all the btrfs filesystems both mounted and unmounted. that doesn't seem quite correct; # btrfs filesystem show -m does not specify 'path|uuid|device|label' but it only shows mounted filesystems, not all filesystems. As I understand it, the -d and -m options control how the command finds devices; the 'path|uuid|device|label' argument is used as a filter for what is found. 
++ +The show command finds btrfs filesystems by scanning all the devices +in /proc/partitions by default. I think I would document it something like this: show [-m|-d] [path|uuid|device|label] Show the structure of btrfs filesystem(s). By default, the show command scans all devices found in /proc/partitions. If [-d|--all-devices] is specified, all devices found under /dev are scanned. If [-m|--mounted] is specified, only mounted (btrfs?) devices are scanned. By default, the structure of all discovered filesystems is shown. If any one of [path|uuid|device|label] is specified, only filesystems matching that identifier are shown. (What seems to be missing, though, is why would the user ever choose to use '-d?') Incidentally, there is some strange behavior here when looking for multiple filesystems which match. Make 2 filesystems w/ the same label: [root@bp-05 tmp]# btrfs filesystem label /dev/sdc1 testlabel2 [root@bp-05 tmp]# btrfs filesystem label /dev/sdc5 testlabel2 Show matching filesytems: [root@bp-05 tmp]# btrfs filesystem show testlabel2 Label: 'testlabel2' uuid: 8c6ec835-5628-439b-9749-d92f62573ce8 Total devices 1 FS bytes used 112.00KiB devid1 size 30.00GiB used 2.04GiB path /dev/sdc5 Label: 'testlabel2' uuid: a43cd507-02a2-46d2-a754-322cb7bdc346 Total devices 1 FS bytes used 384.00KiB devid1 size 30.00GiB used 2.04GiB path /dev/sdc1 Btrfs v3.14.2 That works fine, but if one is mounted: [root@bp-05 tmp]# mount /dev/sdc1 /mnt/test only the mounted filesystem is shown: [root@bp-05 tmp]# btrfs filesystem show testlabel2 Label: 'testlabel2' uuid: a43cd507-02a2-46d2-a754-322cb7bdc346 Total devices 1 FS bytes used 384.00KiB devid1 size 30.00GiB used 2.04GiB path /dev/sdc1 Btrfs v3.14.2 That's unexpected. 
Mount the other fs, and both are shown again: [root@bp-05 tmp]# mount /dev/sdc5 /mnt/scratch [root@bp-05 tmp]# btrfs filesystem show testlabel2 Label: 'testlabel2' uuid: a43cd507-02a2-46d2-a754-322cb7bdc346 Total devices 1 FS bytes used 384.00KiB devid1 size 30.00GiB used 2.04GiB path /dev/sdc1 Label: 'testlabel2' uuid: 8c6ec835-5628-439b-9749-d92f62573ce8 Total devices 1 FS bytes used 384.00KiB devid1 size 30.00GiB used 2.04GiB path /dev/sdc5 Btrfs v3.14.2 I'll dig into it. Thank you for let me know. Satoru -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
On Aug 11, 2014, at 4:51 PM, Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com wrote:
> Hi Eric,

...

> OK, I'll fix my patch based on your comment.
> # Of course, I'll replace (btrfs?) with something proper words.

I assumed it only scans btrfs but didn't know for sure :)

> Can I add your Signed-off-by to my v2 patch?

Oh, sure, if you use my text, that makes sense.

thanks,
-Eric

>> (What seems to be missing, though, is why would the user ever choose to use '-d?')
>
> I'm not sure. I guess, for example, in large systems, -d takes too many times for scanning all devices under /dev or something?
>
> Thank you for your comments!
> Satoru
>
>> -Eric
Re: [PATCH] Btrfs: fix regression of btrfs device replace
Hi Liu,

(2014/08/12 6:41), Chris Murphy wrote:
> On Jul 29, 2014, at 5:09 AM, Liu Bo bo.li@oracle.com wrote:
>> Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac
>> (btrfs: dev replace should replace the sysfs entry) added the missing sysfs
>> entry in the process of device replace, but didn't take missing devices
>> into account, so now we have
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0088
>> IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
>> ...
>>
>> To reproduce it,
>> 1. mkfs.btrfs -f disk1 disk2
>> 2. mkfs.ext4 disk1
>> 3. mount disk2 /mnt -odegraded
>> 4. btrfs replace start -B 1 disk3 /mnt
>> --
>>
>> This fixes the problem.
>>
>> Reported-by: Chris Murphy li...@colorremedies.com
>> Signed-off-by: Liu Bo bo.li@oracle.com
>> ---
>>  fs/btrfs/sysfs.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 7869936..12e5355 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
>>      if (!fs_info->device_dir_kobj)
>>          return -EINVAL;
>>
>> -    if (one_device) {
>> +    if (one_device && one_device->bdev) {
>>          disk = one_device->bdev->bd_part;
>>          disk_kobj = &part_to_dev(disk)->kobj;
>
> Applied to 3.16.0 and tested, problem is fixed.
>
> Chris Murphy

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

I confirmed both of the following:

- This problem happens with 3.16, and
- This problem doesn't happen with 3.16 + your patch.

Thanks,
Satoru
Re: Ideas for a feature implementation
On 08/11/2014 04:27 PM, Chris Murphy wrote:
> On Aug 10, 2014, at 8:53 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
>> Another thing that isn't listed there, that I would personally love to see is support for secure file deletion. To be truly secure though, this would need to hook into the COW logic so that files marked for secure deletion can't be reflinked (maybe make them automatically NOCOW instead, and don't allow snapshots?), and when they get written to, the blocks that get COW'ed have the old block overwritten.
>
> If the file is reflinked or snapshot, then can it be secure deleted? Because what does it mean to secure delete a file when there's a completely independent file pointing to the same physical blocks? What if someone else owns that independent file? Does the reflink copy get rm'd as well? Or does the file remain, but its blocks are zero'd/corrupted?

The semantics that I would expect would be that the extents can't be reflinked, and when snapshotted the whole file gets COW'ed, and then inherits the secure deletion flag, possibly with another flag saying that the user can't disable the secure deletion flag.

> For SSDs, whether it's an overwrite or an FITRIM ioctl it's an open question when the data is actually irretrievable. It may be seconds, but could be much longer (hours?) so I'm not sure if it's useful. On HDD's using SMR it's not necessarily a given an overwrite will work there either.

By secure deletion, I don't mean making the data absolutely unrecoverable by any means. I mean making it functionally impractical for someone without low-level access to and/or extensive knowledge of the hardware to recover the data; that is, more secure than simply unlinking the file, but obviously less than (for example) the application of thermite to the disk platters. I'm talking about the rough equivalent of wiping the data from RAM. Anyone who is truly security minded should be using whole disk encryption anyway, but even then you have the data accessible from the running OS.
Re: Blocked tasks on 3.15.1
The blocked tasks issue that got significantly worse in 3.15 -- did anything go into 3.16 related to this? I didn't see a single mention of btrfs in Linus' 3.16 announcement, so I don't know whether it should be better, the same, or worse in this respect...

I haven't seen a definite statement about this on this list, either. Can someone more familiar with the state of development comment on this?

Charles
--
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
Re: [PATCH] Btrfs: fix compressed write corruption on enospc
On Sun, 10 Aug 2014 22:55:44 +0800, Liu Bo wrote:
>>>>> This part of the trace is relatively new because Liu Bo's patch made us redirty the pages, making it more likely that we'd try to write them during commit.
>>>>>
>>>>> But, at the end of the day we have a fundamental deadlock with committing a transaction while holding a locked page from an ordered file.
>>>>>
>>>>> For now, I'm ripping out the strict ordered file and going back to a best-effort filemap_flush like ext4 is using.
>>>>
>>>> I think I've figured the deadlock out, this is obviously a race case, really hard to reproduce :-(
>>>>
>>>> So it turns out to be related to workqueues -- now a kthread can process work_struct queued in different workqueues, so we can explain the deadlock as such,
>>>>
>>>> (1) btrfs-delalloc workqueue gets a compressed extent to process with all its pages locked during this, and it runs into read free space cache inode, and then wait on lock_page().
>>>>
>>>> (2) Reading that free space cache inode comes to submit part, and we have a indirect twice endio way for it, with the first endio we come to end_workqueue_bio() and queue a work in btrfs-endio-meta workqueue, and it will run the real endio() for us, but ONLY when it's processed.
>>>>
>>>> So the problem is a kthread can serve several workqueues, which means works in btrfs-endio-meta workqueues and works in btrfs-flush_delalloc workqueues can be in the same processing list of a kthread. When btrfs-flush_delalloc waits for the compressed page and btrfs-endio-meta comes after it, it hangs.
>>>
>>> I don't think it is right. All the btrfs workqueue has RECLAIM flag, which means each btrfs workqueue has its own rescue worker. So the problem you said should not happen.
>>
>> Right, I traded some emails with Tejun about this and spent a few days trying to prove the workqueues were doing the wrong thing. It will end up spawning another worker thread for the new work, and it won't get queued up behind the existing thread.
>>
>> If both work items went to the same workqueue, you'd definitely be right.
>>
>> I've got a patch to change the flush-delalloc code so we don't do the file writes during commit. It seems like the only choice right now.
>
> Not the only choice any more ;)
>
> It turns out to be related to async_cow's ordered list, say we have two async_cow works on the wq->ordered_list, and the first work (named A) finishes its ->ordered_func() and ->ordered_free(), and the second work (B) starts B's ->ordered_func() which gets to read free space cache inode, where it queues a work on @endio_meta_workers, but this work happens to be the same address with A's work.
>
> So now the situation is,
>
> (1) in kthread's looping worker_thread(), work A is actually running its job,
> (2) however, work A has freed its memory but kthread still want to use this address of memory, which means worker->current_work is still A's address.
> (3) B's readahead for free space cache inode happens to queue a work whose address of memory is just the previous address of A's work, which means another worker's ->current_work is also A's address.
> (4) as in btrfs we all use function normal_work_helper(), so worker->current_func is fixed here.
> (5) worker_thread()
>       ->process_one_work()
>         ->find_worker_executing_work()
>           (find a collision, another work returns)
>
> Then we saw the hang.

Here is my understanding of what you said: the same worker dealt with work A and work B, and a third work, which was introduced by work B and has the same virtual memory address as work A, was also inserted into the work list of that worker. But work B was waiting for the third work at that time, so a deadlock happened. Am I right?

If I'm right, I think what you said is impossible. Before we deal with work B, we should already have invoked spin_unlock_irq(&pool->lock), which implies a memory barrier: all changes that happen before the unlock should complete before the unlock. That is, the address in current_work should be the address of work B, so when we insert the third work, which was introduced by work B, we should not find the address of work A in current_work of work B's worker.

I cannot reproduce the problem on my machine, so I have not verified whether what I said is right or not. Please correct me if I am wrong.

Thanks
Miao
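The collision being debated above depends on how a busy worker is recognised: by the address of the work item plus its function. The following user-space sketch models only that comparison. The names `work_item`, `worker`, `find_worker_executing` and `shared_helper` are hypothetical simplifications, not the kernel's definitions, and the sketch takes no position on whether the stale address can actually be observed, which is the point under dispute.

```c
#include <stddef.h>

typedef void (*work_func_t)(void *);

/* Simplified work item: only the function pointer matters here. */
struct work_item {
    work_func_t func;
};

/* Simplified worker: remembers what it is currently executing. */
struct worker {
    struct work_item *current_work;  /* address of the item being run */
    work_func_t current_func;        /* its function, cached at start */
};

/* Model of the lookup: a worker matches on address AND function. */
struct worker *find_worker_executing(struct worker *workers, size_t n,
                                     const struct work_item *work)
{
    for (size_t i = 0; i < n; i++) {
        if (workers[i].current_work == work &&
            workers[i].current_func == work->func)
            return &workers[i];
    }
    return NULL;
}

/* Every item uses the same helper, as described for normal_work_helper(). */
void shared_helper(void *unused)
{
    (void)unused;
}
```

If a new item is allocated at the address a worker still records as `current_work`, and both use the same helper function, this lookup cannot tell them apart, so the new item would be treated as already running on that worker. A different address, or a different function, avoids the match.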
Re: Blocked tasks on 3.15.1
On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
> The blocked tasks issue that got significantly worse in 3.15 -- did anything go into 3.16 related to this? I didn't see a single btrfs in Linus' 3.16 announcement, so I don't know whether it should be better, the same, or worse in this respect...
>
> I haven't seen a definite statement about this on this list, either. Can someone more familiar with the state of development comment on this?

Good news is that we've figured out the bug and the patch is already under testing :-)

thanks,
-liubo
Re: [PATCH] Btrfs: fix regression of btrfs device replace
On Tue, Aug 12, 2014 at 11:25:00AM +0900, Satoru Takeuchi wrote:
> Hi Liu,
>
> (2014/08/12 6:41), Chris Murphy wrote:
>> On Jul 29, 2014, at 5:09 AM, Liu Bo bo.li@oracle.com wrote:
>>> Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac
>>> (btrfs: dev replace should replace the sysfs entry) added the missing sysfs
>>> entry in the process of device replace, but didn't take missing devices
>>> into account, so now we have
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 0088
>>> IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
>>> ...
>>>
>>> To reproduce it,
>>> 1. mkfs.btrfs -f disk1 disk2
>>> 2. mkfs.ext4 disk1
>>> 3. mount disk2 /mnt -odegraded
>>> 4. btrfs replace start -B 1 disk3 /mnt
>>> --
>>>
>>> This fixes the problem.
>>>
>>> Reported-by: Chris Murphy li...@colorremedies.com
>>> Signed-off-by: Liu Bo bo.li@oracle.com
>>> ---
>>>  fs/btrfs/sysfs.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>>> index 7869936..12e5355 100644
>>> --- a/fs/btrfs/sysfs.c
>>> +++ b/fs/btrfs/sysfs.c
>>> @@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
>>>      if (!fs_info->device_dir_kobj)
>>>          return -EINVAL;
>>>
>>> -    if (one_device) {
>>> +    if (one_device && one_device->bdev) {
>>>          disk = one_device->bdev->bd_part;
>>>          disk_kobj = &part_to_dev(disk)->kobj;
>>
>> Applied to 3.16.0 and tested, problem is fixed.
>>
>> Chris Murphy
>
> Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
> Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>
> I confirmed both
>
> - This problem happens with 3.16, and
> - This problem doesn't happen with 3.16 + your patch.

Thanks for your testing!

-liubo
Re: Ideas for a feature implementation
On Aug 11, 2014, at 8:27 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: On 08/11/2014 04:27 PM, Chris Murphy wrote: On Aug 10, 2014, at 8:53 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: Another thing that isn't listed there, that I would personally love to see is support for secure file deletion. To be truly secure though, this would need to hook into the COW logic so that files marked for secure deletion can't be reflinked (maybe make the automatically NOCOW instead, and don't allow snapshots?), and when they get written to, the blocks that get COW'ed have the old block overwritten. If the file is reflinked or snapshot, then it can it be secure deleted? Because what does it mean to secure delete a file when there's a completely independent file pointing to the same physical blocks? What if someone else owns that independent file? Does the reflink copy get rm'd as well? Or does the file remain, but its blocks are zero'd/corrupted? The semantics that I would expect would be that the extents can't be reflinked, and when snapshotted the whole file gets COW'ed, and then inherits the secure deletion flag, possibly with another flag saying that the user can't disable the secure deletion flag. Ahh OK I was thinking of a secure delete command (or an option to rm that indicates secure delete). You're suggesting one or more flags that makes for secure file handling, not just delete, affecting: a.) copied b.) moved, c.) snapshot/reflinked, d.) deleted. So if deleted, a regular rm would see the xattr and do a secure delete; and the xattr would inhibit or limit the others. While a reflink or normal copy could be inhibited, the snapshot case seems more difficult because it just creates a new tree. It's not scanning the tree for files/folders with xattr, which would have to be done to go retroactively remove the file set with the secure delete flag - could be really slow. And what if the snapshot is made read-only? Strictly secure delete, e.g. 
rm -s, would be more straightforward than a flag affecting other filesystem operations. For SSDs, whether it's an overwrite or an FITRIM ioctl it's an open question when the data is actually irretrievable. It may be seconds, but could be much longer (hours?) so I'm not sure if it's useful. On HDD's using SMR it's not necessarily a given an overwrite will work there either. By secure deletion, I don't mean make the data absolutely unrecoverable by any means, I mean make it functionally impractical for someone without low-level access to and/or extensive knowledge of the hardware to recover the data; that is, more secure than simply unlinking the file, but obviously less than (for example) the application of thermite to the disk platters. I'm talking the rough equivalent of wiping the data from RAM. Anyone who is truly security minded should be using whole disk encryption anyway, but even then you have the data accessible from the running OS. Seems straightforward for any file system already supporting discard. This even has a useful application for thinly provisioned storage and large files where you'd want the underlying logical layer to free up extents sooner than later - even if you didn't care about the security aspect. But for that matter, on SSDs right now you can rm the file and then fstrim the file system to get the same effect. Chris Murphy-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
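For comparison with the "rm, then fstrim" approach mentioned above, a plain overwrite-then-unlink can be sketched as follows. This is a minimal illustration under loud assumptions: the helper name `overwrite_and_unlink` is hypothetical, it uses only POSIX calls, and it is only meaningful where writes land in place (for example a NOCOW file with no snapshots or reflinks). On a COW filesystem the zeros may be written to new blocks while the old ones survive, and the SSD and SMR caveats raised in the thread still apply.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero a file's contents in place, then unlink it.
 * Returns 0 on success and -1 on any failure. */
int overwrite_and_unlink(const char *path)
{
    char zeros[4096];
    struct stat st;
    off_t left;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    memset(zeros, 0, sizeof(zeros));
    for (left = st.st_size; left > 0; ) {
        size_t chunk = left < (off_t)sizeof(zeros) ? (size_t)left
                                                   : sizeof(zeros);
        ssize_t done = write(fd, zeros, chunk);

        if (done <= 0) {  /* treat short-circuit and errors as failure */
            close(fd);
            return -1;
        }
        left -= done;
    }

    /* Push the zeros to stable storage before the name disappears. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return unlink(path);
}
```

The ordering is the point of the sketch: the overwrite is flushed with fsync() before unlink(), so the name is never removed while the old contents are still the only copy on disk.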
Re: 40TB volume taking over 16 hours to mount, any ideas?
Jose Ildefonso Camargo Tolosa posted on Mon, 11 Aug 2014 16:33:36 -0500 as excerpted:
> As I hate it when a thread is left hanging, you deserve to know what happened in the end. You have likely guessed already, but anyway: I nuked the filesystem and started over.
>
> After some internal discussion in the company, we decided to move to ZFS for now. However, we will keep an eye on btrfs, and will likely deploy it to some smaller system for further testing.
>
> Thank you all for your help!

Thank you too. =:^)

Sounds like a sane decision for the time being.

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: Blocked tasks on 3.15.1
Liu Bo posted on Tue, 12 Aug 2014 10:56:42 +0800 as excerpted:
> On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
>> The blocked tasks issue that got significantly worse in 3.15 -- did anything go into 3.16 related to this? I didn't see a single btrfs in Linus' 3.16 announcement, so I don't know whether it should be better, the same, or worse in this respect...
>>
>> I haven't seen a definite statement about this on this list, either. Can someone more familiar with the state of development comment on this?
>
> Good news is that we've figured out the bug and the patch is already under testing :-)

IOW, it's not in 3.16.0, but will hopefully make it into 3.16.2 (it'll likely be too late for 3.16.1).

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: Blocked tasks on 3.15.1
On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
> The blocked tasks issue that got significantly worse in 3.15 -- did anything go into 3.16 related to this? I didn't see a single "btrfs" in Linus' 3.16 announcement, so I don't know whether it should be better, the same, or worse in this respect...
>
> I haven't seen a definite statement about this on this list, either.

Yes, 3.15 is unusable for some workloads, mine included. Go back to 3.14 until there is a patch in 3.16. There isn't one quite yet, but hopefully there will be very soon.

Note that 3.16.0 is actually worse than 3.15 for me.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
3.15 btrfs free space cache oops
When running MonetDB on a BTRFS RAID-0 set across 4 SSDs [1] on 3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal pagefault in memcpy():

(gdb) list *(__btrfs_write_out_cache+0x3e4)
0x81365984 is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:521).
516             if (io_ctl->index >= io_ctl->num_pages)
517                     return -ENOSPC;
518             io_ctl_map_page(io_ctl, 0);
519     }
520
521     memcpy(io_ctl->cur, bitmap, PAGE_CACHE_SIZE);
522     io_ctl_set_crc(io_ctl, io_ctl->index - 1);
523     if (io_ctl->index < io_ctl->num_pages)
524             io_ctl_map_page(io_ctl, 0);
525     return 0;

I can try to reproduce it if more data would be useful.

Thanks,
  Daniel

-- [1]

mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata /dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2
mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread

-- [2]

BUG: unable to handle kernel paging request at 0020
IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
PGD 3bca02c067 PUD 3bcf5fb067 PMD 0
Oops: [#1] SMP
Modules linked in:
CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7
Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013
task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000
RIP: 0010:[8135a374] [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
RSP: 0018:8809aefcfc40 EFLAGS: 00010246
RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200
RDX: 1000 RSI: 0020 RDI: 884fb9321000
RBP: 8809aefcfd48 R08: 0200 R09:
R10: R11: 884fb9320ffc R12: 8831e3303740
R13: 880100579970 R14: 880bb38061c0 R15: 0020
FS: 7fb9447ed700() GS:884bbfc8() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0020 CR3: 00329b71c000 CR4: 000407e0
Stack:
 8809aefcfc90 0011 000e 884fbbc2c870
 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec
 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000
Call Trace:
 [81a75b4b] ? _raw_spin_lock+0xb/0x20
 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0
 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670
 [813199c5] commit_cowonly_roots+0x195/0x250
 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0
 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70
 [8132b6b2] btrfs_sync_file+0x182/0x2a0
 [8114a450] do_fsync+0x50/0x80
 [8114a6de] SyS_fdatasync+0xe/0x20
 [81a766e6] system_call_fastpath+0x1a/0x1f
Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85 fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3 48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
 RSP 8809aefcfc40
CR2: 0020

-- 
Daniel J Blueman