Re: Blocked for more than 120 seconds
On 28.11.2011 10:29, Chris Samuel wrote:
> Hi Tobias,
>
> On Mon, 28 Nov 2011, 19:16:25 EST, Tobias <tra...@robotech.de> wrote:
>> The problem occurs on the stock ubuntu kernel 2.6.38-8, 3.0.0-12, 3.0.0-13 and on my self-compiled 3.1.2.
>
> There's a lot of work gone into btrfs in 3.2, it would be interesting to know (speaking as just another user) whether it still occurs with 3.2-rc3.

I tried 3.2-rc3 tonight but the messages are still there:

[46203.412044] INFO: task rsync:1653 blocked for more than 120 seconds.
[46203.412056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46203.412066] rsync D 8801d7d51aa0 0 1653 1647 0x
[46203.412073] 8800042b1d98 0086
[46203.412079] 8801d7d516e0 8800042b1fd8 8800042b1fd8 8800042b1fd8
[46203.412084] 88023212db80 8801d7d516e0 0282 000122103228
[46203.412090] Call Trace:
[46203.412101] [8161259f] schedule+0x3f/0x60
[46203.412126] [a01fd1bd] wait_current_trans.isra.22+0x9d/0x100 [btrfs]
[46203.412132] [81085a40] ? add_wait_queue+0x60/0x60
[46203.412148] [a01fe7f5] start_transaction+0x135/0x2b0 [btrfs]
[46203.412154] [8117bd5a] ? kern_path_create+0x8a/0x120
[46203.412171] [a01fec43] btrfs_start_transaction+0x13/0x20 [btrfs]
[46203.412188] [a020a885] btrfs_link+0xa5/0x1a0 [btrfs]
[46203.412193] [81178a91] vfs_link+0x101/0x190
[46203.412197] [8117ce88] sys_linkat+0x168/0x180
[46203.412200] [8117cebe] sys_link+0x1e/0x20
[46203.412205] [8161c442] system_call_fastpath+0x16/0x1b
[46563.412042] INFO: task btrfs-delalloc-:31614 blocked for more than 120 seconds.
[46563.412054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46563.412064] btrfs-delalloc- D 8801f8529aa0 0 31614 2 0x
[46563.412071] 8801330bdc50 0046 0004
[46563.412077] 8801f85296e0 8801330bdfd8 8801330bdfd8 8801330bdfd8
[46563.412082] 88023212db80 8801f85296e0 0282 000122103228
[46563.412088] Call Trace:
[46563.412098] [8161259f] schedule+0x3f/0x60
[46563.412124] [a01fd1bd] wait_current_trans.isra.22+0x9d/0x100 [btrfs]
[46563.412130] [81085a40] ? add_wait_queue+0x60/0x60
[46563.412146] [a01fe8b0] start_transaction+0x1f0/0x2b0 [btrfs]
[46563.412163] [a01fe9c5] btrfs_join_transaction+0x15/0x20 [btrfs]
[46563.412179] [a0206423] compress_file_range+0x2d3/0x610 [btrfs]
[46563.412197] [a0206795] async_cow_start+0x35/0x50 [btrfs]
[46563.412213] [a02269ba] worker_loop+0x16a/0x560 [btrfs]
[46563.412231] [a0226850] ? btrfs_queue_worker+0x300/0x300 [btrfs]
[46563.412236] [81084fac] kthread+0x8c/0xa0
[46563.412241] [8161e5b4] kernel_thread_helper+0x4/0x10
[46563.412246] [81084f20] ? flush_kthread_worker+0xa0/0xa0
[46563.412250] [8161e5b0] ? gs_change+0x13/0x13
[46563.412255] INFO: task flush-btrfs-1:323 blocked for more than 120 seconds.
[46563.412263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46563.412273] flush-btrfs-1 D 8802129bb180 0 323 2 0x
[46563.412278] 8800209c58e0 0046 8800 8800218a2710
[46563.412284] 8802129badc0 8800209c5fd8 8800209c5fd8 8800209c5fd8
[46563.412290] 8802321444a0 8802129badc0 8800209c58e0 00018108f20d
[46563.412295] Call Trace:
[46563.412300] [8110dc90] ? __lock_page+0x70/0x70
[46563.412305] [8161259f] schedule+0x3f/0x60
[46563.412309] [8161264f] io_schedule+0x8f/0xd0
[46563.412313] [8110dc9e] sleep_on_page+0xe/0x20
[46563.412317] [81612d1a] __wait_on_bit_lock+0x5a/0xc0
[46563.412321] [8110dc87] __lock_page+0x67/0x70
[46563.412325] [81085a80] ? autoremove_wake_function+0x40/0x40
[46563.412342] [a021c945] extent_write_cache_pages.isra.21.constprop.31+0x215/0x3f0 [btrfs]
[46563.412361] [a021cd65] extent_writepages+0x45/0x60 [btrfs]
[46563.412378] [a0201350] ? acls_after_inode_item+0xc0/0xc0 [btrfs]
[46563.412382] [81085614] ? bit_waitqueue+0x14/0xc0
[46563.412398] [a0200448] btrfs_writepages+0x28/0x30 [btrfs]
[46563.412403] [811195f1] do_writepages+0x21/0x40
[46563.412409] [81194c70] writeback_single_inode+0x180/0x430
[46563.412413] [81195336] writeback_sb_inodes+0x1b6/0x270
[46563.412418] [8119548e] __writeback_inodes_wb+0x9e/0xd0
[46563.412422] [8119573b] wb_writeback+0x27b/0x330
[46563.412427] [81187352] ? get_nr_dirty_inodes+0x52/0x80
[46563.412432] [8119588f] wb_check_old_data_flush+0x9f/0xb0
[46563.412436] [81196731] wb_do_writeback+0x151/0x1d0
[46563.412441] [81611fd4] ?
Re: Honest timeline for btrfsck
Any update on the state of btrfsck?

Thanks,
Clemens

2011/10/31 David Summers <bt...@summers5913.freeserve.co.uk>:
> On 05/10/11 07:16, Chris Mason wrote:
>> So over the next two weeks I'm juggling the merge window and the fsck release. My goal is to demo fsck at linuxcon europe. Thanks again for all of your patience and help with Btrfs!
>
> Any chance of a copy of your talk at linuxcon? ;)
>
> David.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix submit_worker congestion
On 29.11.2011 22:47, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote:
>> Write bios are submitted from the submit_worker. The worker pumps down bios into the block layer until it signals a congestion. At least this is the theory. In practice submit_bio just blocks before any signalling happens. As the bios are queued per device, this can lead to a situation where only one device is served until all bios are submitted, and only then the next device is served. This is obviously suboptimal.
>>
>> This patch just throws out the congestion detection and reschedules the worker every 8 requests. This way, all devices can be kept busy. This is only a temporary fix until the block layer provides a non-blocking submit_bio. Then the whole submit_worker mechanism can be killed.
>
> The problem with the every 8 requests logic is that we've still got a pretty good chance of getting stuck behind get_request_wait. The way the elevator batching works is that it should give us a batch of requests, and once that batch is done we wait.
>
> If we jump around every 8 requests, we've turned this:
>
> [ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ]

currently, it's more like

[ dev A bio 1 - 5000, dev B bio 1-5000 ]

> into:
>
> [ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ]

so this is a great improvement :)

> They look like the same IO, but if we wait for a request when we do (dev B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without all the other dev A bios we had queued.
>
> As you said in IRC, we'd be better off with one thread per device or (my preference) with a real non-blocking submit_bio.
>
> What kind of results did you get with your test from bumping the nr_requests?

what nr_requests do you mean? btrfs_async_submit_limit?

Arne

> -chris
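To make the two orderings discussed above concrete, here is a minimal userspace sketch in C. It is not the btrfs submit_worker code and it does not model the block layer or get_request_wait; the names submit_order, first_served and NDEV are hypothetical and exist only for this illustration. It records which device each bio would be sent to, either draining one device queue completely (batch 0) or moving on to the next device after a fixed number of bios.

```c
#include <assert.h>
#include <stddef.h>

#define NDEV 2  /* number of simulated devices (assumption for this sketch) */

/* Return 1 if any simulated device still has queued bios. */
static int any_left(const size_t left[NDEV])
{
    for (int d = 0; d < NDEV; d++)
        if (left[d])
            return 1;
    return 0;
}

/*
 * Record the device index chosen for each submitted bio in `order`.
 * pending[d] is the number of bios queued for device d. `batch` is how
 * many bios go to one device before switching to the next one;
 * batch == 0 means "drain the device completely before switching".
 * Returns the number of entries written (at most `cap`).
 */
size_t submit_order(const size_t pending[NDEV], size_t batch,
                    int *order, size_t cap)
{
    size_t left[NDEV];
    size_t n = 0;
    int d;

    assert(order != NULL);
    for (d = 0; d < NDEV; d++)
        left[d] = pending[d];

    d = 0;
    while (n < cap && any_left(left)) {
        size_t take = left[d];

        if (batch && take > batch)
            take = batch;
        for (size_t i = 0; i < take && n < cap; i++) {
            order[n++] = d;
            left[d]--;
        }
        d = (d + 1) % NDEV;  /* move on to the next device queue */
    }
    return n;
}

/* Index of the first bio sent to `dev`, or n if it never gets one. */
size_t first_served(const int *order, size_t n, int dev)
{
    for (size_t i = 0; i < n; i++)
        if (order[i] == dev)
            return i;
    return n;
}
```

With 5000 bios queued on each of two devices, first_served(order, n, 1) is 5000 when batch is 0 and 8 when batch is 8, which is the difference Arne describes. What the sketch cannot show is Chris's concern: once submission blocks waiting for a request on one device, the small batch already sent to the other device may be dispatched without the rest of its queue.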
[PATCH] Canonicalise BTRFS: and Btrfs: to btrfs:
Currently there are 3 different capitalisations of "btrfs:" used in printk()'s, "BTRFS:" (3 occurrences), "Btrfs:" (1 occurrence) and "btrfs:" (77 occurrences).

It's best to have them all the same for consistency, so we canonicalise the two minority cases to "btrfs:".

Signed-off-by: Chris Samuel <ch...@csamuel.org>
---
 fs/btrfs/disk-io.c |    4 ++--
 fs/btrfs/inode.c   |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 94abc25..f4d419b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2106,7 +2106,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	features = btrfs_super_incompat_flags(disk_super) &
 		~BTRFS_FEATURE_INCOMPAT_SUPP;
 	if (features) {
-		printk(KERN_ERR "BTRFS: couldn't mount because of "
+		printk(KERN_ERR "btrfs: couldn't mount because of "
 		       "unsupported optional features (%Lx).\n",
 		       (unsigned long long)features);
 		err = -EINVAL;
@@ -2122,7 +2122,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	features = btrfs_super_compat_ro_flags(disk_super) &
 		~BTRFS_FEATURE_COMPAT_RO_SUPP;
 	if (!(sb->s_flags & MS_RDONLY) && features) {
-		printk(KERN_ERR "BTRFS: couldn't mount RDWR because of "
+		printk(KERN_ERR "btrfs: couldn't mount RDWR because of "
 		       "unsupported option features (%Lx).\n",
 		       (unsigned long long)features);
 		err = -EINVAL;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ad26b1..399da74 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5057,7 +5057,7 @@ not_found_em:
 insert:
 	btrfs_release_path(path);
 	if (em->start > start || extent_map_end(em) <= start) {
-		printk(KERN_ERR "Btrfs: bad extent! em: [%llu %llu] passed "
+		printk(KERN_ERR "btrfs: bad extent! em: [%llu %llu] passed "
 		       "[%llu %llu]\n", (unsigned long long)em->start,
 		       (unsigned long long)em->len,
 		       (unsigned long long)start,
@@ -6693,7 +6693,7 @@ void btrfs_destroy_inode(struct inode *inode)

 	spin_lock(&root->orphan_lock);
 	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-		printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n",
+		printk(KERN_INFO "btrfs: inode %llu still on the orphan list\n",
 		       (unsigned long long)btrfs_ino(inode));
 		list_del_init(&BTRFS_I(inode)->i_orphan);
 	}
--
1.7.4.1

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
[PATCH] Prefix mount messages with btrfs: for clarity
Currently when mounting a btrfs filesystem a user searching dmesg has no obvious string to search for as currently we report something cryptic like:

[ 5775.216078] device label DR devid 1 transid 15757 /dev/sdh1

It would be much nicer if there was some mention of btrfs to find (as other filesystems generally provide), so this change prefixes that message with "btrfs: " to make it appear as:

[ 5775.216078] btrfs: device label DR devid 1 transid 15757 /dev/sdh1

Signed-off-by: Chris Samuel <ch...@csamuel.org>
---
 fs/btrfs/volumes.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c37433d..bc08087 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -722,9 +722,9 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	transid = btrfs_super_generation(disk_super);
 	if (disk_super->label[0])
-		printk(KERN_INFO "device label %s ", disk_super->label);
+		printk(KERN_INFO "btrfs: device label %s ", disk_super->label);
 	else
-		printk(KERN_INFO "device fsid %pU ", disk_super->fsid);
+		printk(KERN_INFO "btrfs: device fsid %pU ", disk_super->fsid);
 	printk(KERN_CONT "devid %llu transid %llu %s\n",
 	       (unsigned long long)devid, (unsigned long long)transid, path);
 	ret = device_list_add(path, disk_super, devid, fs_devices_ret);
--
1.7.4.1
Re: [PATCH] Canonicalise BTRFS: and Btrfs: to btrfs:
On Wed, 30 Nov 2011 10:05:21 PM Chris Samuel wrote:
> Currently there are 3 different capitalisations of "btrfs:" used in printk()'s, "BTRFS:" (3 occurrences), "Btrfs:" (1 occurrence) and "btrfs:" (77 occurrences).

Unfortunately both this and the "[PATCH] Prefix mount messages with btrfs: for clarity" patch got munged into QP by GnuPG and Kmail. :-(

Sending without signing doesn't appear to have that effect so I'll resend unsigned.

cheers,
Chris
[RESEND] [PATCH] Canonicalise BTRFS: and Btrfs: to btrfs:
Currently there are 3 different capitalisations of "btrfs:" used in printk()'s, "BTRFS:" (3 occurrences), "Btrfs:" (1 occurrence) and "btrfs:" (77 occurrences).

It's best to have them all the same for consistency, so we canonicalise the two minority cases to "btrfs:".

Signed-off-by: Chris Samuel <ch...@csamuel.org>
---
 fs/btrfs/disk-io.c |    4 ++--
 fs/btrfs/inode.c   |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 94abc25..f4d419b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2106,7 +2106,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	features = btrfs_super_incompat_flags(disk_super) &
 		~BTRFS_FEATURE_INCOMPAT_SUPP;
 	if (features) {
-		printk(KERN_ERR "BTRFS: couldn't mount because of "
+		printk(KERN_ERR "btrfs: couldn't mount because of "
 		       "unsupported optional features (%Lx).\n",
 		       (unsigned long long)features);
 		err = -EINVAL;
@@ -2122,7 +2122,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	features = btrfs_super_compat_ro_flags(disk_super) &
 		~BTRFS_FEATURE_COMPAT_RO_SUPP;
 	if (!(sb->s_flags & MS_RDONLY) && features) {
-		printk(KERN_ERR "BTRFS: couldn't mount RDWR because of "
+		printk(KERN_ERR "btrfs: couldn't mount RDWR because of "
 		       "unsupported option features (%Lx).\n",
 		       (unsigned long long)features);
 		err = -EINVAL;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ad26b1..399da74 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5057,7 +5057,7 @@ not_found_em:
 insert:
 	btrfs_release_path(path);
 	if (em->start > start || extent_map_end(em) <= start) {
-		printk(KERN_ERR "Btrfs: bad extent! em: [%llu %llu] passed "
+		printk(KERN_ERR "btrfs: bad extent! em: [%llu %llu] passed "
 		       "[%llu %llu]\n", (unsigned long long)em->start,
 		       (unsigned long long)em->len,
 		       (unsigned long long)start,
@@ -6693,7 +6693,7 @@ void btrfs_destroy_inode(struct inode *inode)

 	spin_lock(&root->orphan_lock);
 	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-		printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n",
+		printk(KERN_INFO "btrfs: inode %llu still on the orphan list\n",
 		       (unsigned long long)btrfs_ino(inode));
 		list_del_init(&BTRFS_I(inode)->i_orphan);
 	}
--
1.7.4.1
[RESEND] [PATCH] Prefix mount messages with btrfs: for clarity
Currently when mounting a btrfs filesystem a user searching dmesg has no obvious string to search for as currently we report something cryptic like:

[ 5775.216078] device label DR devid 1 transid 15757 /dev/sdh1

It would be much nicer if there was some mention of btrfs to find (as other filesystems generally provide), so this change prefixes that message with "btrfs: " to make it appear as:

[ 5775.216078] btrfs: device label DR devid 1 transid 15757 /dev/sdh1

Signed-off-by: Chris Samuel <ch...@csamuel.org>
---
 fs/btrfs/volumes.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c37433d..bc08087 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -722,9 +722,9 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	transid = btrfs_super_generation(disk_super);
 	if (disk_super->label[0])
-		printk(KERN_INFO "device label %s ", disk_super->label);
+		printk(KERN_INFO "btrfs: device label %s ", disk_super->label);
 	else
-		printk(KERN_INFO "device fsid %pU ", disk_super->fsid);
+		printk(KERN_INFO "btrfs: device fsid %pU ", disk_super->fsid);
 	printk(KERN_CONT "devid %llu transid %llu %s\n",
 	       (unsigned long long)devid, (unsigned long long)transid, path);
 	ret = device_list_add(path, disk_super, devid, fs_devices_ret);
--
1.7.4.1
Re: btrfs encryption problems
Is anyone able to reproduce the problems that I described? I can help if anyone is interested.

> The hard drive is brand new. I also plugged it in directly via eSATA, checked the SMART data and ran some tests, and it passed all of them. If I format the drive again I will have a working file system for sure, but some of my data that is not on the backup disk will be gone. Some time ago I had an encrypted btrfs system on openSUSE Tumbleweed, and it was lost in the same way as this external hard drive. I also had various btrfs hard drives without encryption and they never failed,

--
Thanks
Re: btrfs encryption problems
On Fri, Nov 25, 2011 at 10:40:00AM +, 810d4rk wrote:
>> My mistake, the same printks are printed when the encryption key is incorrect, I've seen that here.
>>
>> It looks like you have some ugly hardware errors. The kernel cannot read from the drive, so it cannot guess the file system on it. If the data is valuable, you could try to ddrescue the drive to a bigger one (750GB... and that will take time...) and attempt to mount the rescued data. If the drive is in a USB enclosure, you could plug it directly via SATA to the system (maybe it has issues?).
>
> The hard drive is brand new. I also plugged it in directly via eSATA, checked the SMART data and ran some tests, and it passed all of them. If I format the drive again I will have a working file system for sure, but some of my data that is not on the backup disk will be gone. Some time ago I had an encrypted btrfs system on openSUSE Tumbleweed, and it was lost in the same way as this external hard drive. I also had various btrfs hard drives without encryption and they never failed, and I will make an image of the disk, but that will be later because I don't have a backup hard drive bigger than 750 GB. Maybe btrfs fsck can restore my disk when it is officially released?

If you plug it in directly with esata, do the IO errors go away? If so, please post the kernel messages from that.

-chris
Re: Blocked for more than 120 seconds
On Wed, Nov 30, 2011 at 10:44:15AM +0100, Tobias wrote:
> On 28.11.2011 10:29, Chris Samuel wrote:
>> Hi Tobias,
>>
>> On Mon, 28 Nov 2011, 19:16:25 EST, Tobias <tra...@robotech.de> wrote:
>>> The problem occurs on the stock ubuntu kernel 2.6.38-8, 3.0.0-12, 3.0.0-13 and on my self-compiled 3.1.2.
>>
>> There's a lot of work gone into btrfs in 3.2, it would be interesting to know (speaking as just another user) whether it still occurs with 3.2-rc3.
>
> I tried 3.2-rc3 tonight but the messages are still there:

We see a bunch of procs stuck waiting to start a transaction, but we don't see why they are waiting.

Could you please capture a sysrq-t during this? That will show us all the waiters everywhere. We're really looking for the one proc stuck in btrfs_commit_transaction, he's the key to the stalls.

-chris
Re: [PATCH] Btrfs: fix submit_worker congestion
On Wed, Nov 30, 2011 at 11:30:01AM +0100, Arne Jansen wrote:
> On 29.11.2011 22:47, Chris Mason wrote:
>> On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote:
>>> Write bios are submitted from the submit_worker. The worker pumps down bios into the block layer until it signals a congestion. At least this is the theory. In practice submit_bio just blocks before any signalling happens. As the bios are queued per device, this can lead to a situation where only one device is served until all bios are submitted, and only then the next device is served. This is obviously suboptimal.
>>>
>>> This patch just throws out the congestion detection and reschedules the worker every 8 requests. This way, all devices can be kept busy. This is only a temporary fix until the block layer provides a non-blocking submit_bio. Then the whole submit_worker mechanism can be killed.
>>
>> The problem with the every 8 requests logic is that we've still got a pretty good chance of getting stuck behind get_request_wait. The way the elevator batching works is that it should give us a batch of requests, and once that batch is done we wait.
>>
>> If we jump around every 8 requests, we've turned this:
>>
>> [ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ]
>
> currently, it's more like
>
> [ dev A bio 1 - 5000, dev B bio 1-5000 ]
>
>> into:
>>
>> [ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ]
>
> so this is a great improvement :)

;)

>> They look like the same IO, but if we wait for a request when we do (dev B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without all the other dev A bios we had queued.
>>
>> As you said in IRC, we'd be better off with one thread per device or (my preference) with a real non-blocking submit_bio.
>>
>> What kind of results did you get with your test from bumping the nr_requests?
>
> what nr_requests do you mean? btrfs_async_submit_limit?

/sys/block/xxx/queue/nr_requests

-chris
Re: [RFC][PATCH] Sector Size check during Mount
On Tue, Nov 29, 2011 at 05:44:12PM -0800, Keith Mannthey wrote:
> Gracefully fail when trying to mount a BTRFS file system that has a sectorsize smaller than PAGE_SIZE.
>
> On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel then boot into a 64K PAGE_SIZE kernel. Presently open_ctree fails in an endless loop and hangs the machine in this situation.
>
> My debugging has shown this Sector size < Page size to be a non-trivial situation and a graceful exit from the situation would be nice for the time being.

Thanks. The large block size patches should finally go into 3.3, which will make a big difference.

-chris
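The check described in the patch could look roughly like the following userspace sketch. The patch itself is not quoted in this message, so the helper name check_sectorsize and the choice of -EINVAL as the error code are assumptions made for illustration, not the posted code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: refuse a filesystem whose sectorsize is smaller
 * than the page size of the running kernel, so that the caller can fail
 * the mount cleanly instead of continuing into open_ctree.
 * Returns 0 if the combination is acceptable, -EINVAL otherwise.
 */
int check_sectorsize(uint32_t sectorsize, uint32_t page_size)
{
    if (sectorsize < page_size)
        return -EINVAL;  /* e.g. 4k sectors on a 64K PAGE_SIZE kernel */
    return 0;
}
```

For the PPC case in the report, check_sectorsize(4096, 65536) returns -EINVAL, while a filesystem made and mounted with the same 4k page size passes.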
Re: [PATCH 1/1] btrfs: btrfs_calc_avail_data_space cope with no read/write devices V2
On Tue, Nov 29, 2011 at 09:26:01AM +0800, Li Zefan wrote:
> This patch has the same problem as your previous one, that it will set f_bavail to 0. I've sent out a new patch yesterday.

Ahh, sounds great thanks. Often a patch is a good way to start a discussion towards a more correct patch. Especially when one is not an expert in the area.

-apw
[PATCH] Btrfs: deal with enospc from dirtying inodes properly
Now that we're properly keeping track of delayed inode space we've been getting a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is because a bunch of people call mark_inode_dirty, which is void so we can't return ENOSPC. This needs to be fixed in a few areas

1) file_update_time - this updates the mtime and such when writing to a file, which will call mark_inode_dirty. So copy file_update_time into btrfs so we can call btrfs_dirty_inode directly and return an error if we get one appropriately.

2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't setting ->setattr for symlinks, even though we should have been. This catches one of the cases where we were getting errors in mark_inode_dirty.

3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly instead of mark_inode_dirty. This lets us return errors properly for truncate and chown/anything related to setattr.

4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and print an error if we have one. The only remaining user we can't control for this is touch_atime(), but we don't really want to keep people from walking down the tree if we don't have space to save the atime update, so just complain but don't worry about it.

With this patch xfstests 83 complains a handful of times instead of hundreds of times. Thanks,

Signed-off-by: Josef Bacik <jo...@redhat.com>
---
 fs/btrfs/ctree.h |3 +-
 fs/btrfs/file.c |6 +++-
 fs/btrfs/inode.c | 80 +-
 fs/btrfs/super.c | 13 -
 4 files changed, 80 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 04a5dfc..8823822 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2689,7 +2689,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
 void btrfs_evict_inode(struct inode *inode);
 int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc);
-void btrfs_dirty_inode(struct inode *inode, int flags);
+int btrfs_dirty_inode(struct inode *inode);
+int btrfs_update_time(struct file *file);
 struct inode *btrfs_alloc_inode(struct super_block *sb);
 void btrfs_destroy_inode(struct inode *inode);
 int btrfs_drop_inode(struct inode *inode);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f2e9282..cc7492c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1387,7 +1387,11 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 		goto out;
 	}

-	file_update_time(file);
+	err = btrfs_update_time(file);
+	if (err) {
+		mutex_unlock(&inode->i_mutex);
+		goto out;
+	}
 	BTRFS_I(inode)->sequence++;

 	start_pos = round_down(pos, root->sectorsize);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e16215f..9ae9c2e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -38,6 +38,7 @@
 #include <linux/falloc.h>
 #include <linux/slab.h>
 #include <linux/ratelimit.h>
+#include <linux/mount.h>
 #include "compat.h"
 #include "ctree.h"
 #include "disk-io.h"
@@ -3386,7 +3387,7 @@ static int btrfs_setsize(struct inode *inode, loff_t newsize)
 			return ret;
 		}

-		mark_inode_dirty(inode);
+		ret = btrfs_dirty_inode(inode);
 	} else {

 		/*
@@ -3426,9 +3427,9 @@ static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)

 	if (attr->ia_valid) {
 		setattr_copy(inode, attr);
-		mark_inode_dirty(inode);
+		err = btrfs_dirty_inode(inode);

-		if (attr->ia_valid & ATTR_MODE)
+		if (!err && attr->ia_valid & ATTR_MODE)
 			err = btrfs_acl_chmod(inode);
 	}

@@ -4204,42 +4205,80 @@ int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc)
  * FIXME, needs more benchmarking...there are no reasons other than performance
  * to keep or drop this code.
  */
-void btrfs_dirty_inode(struct inode *inode, int flags)
+int btrfs_dirty_inode(struct inode *inode)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_trans_handle *trans;
 	int ret;

 	if (BTRFS_I(inode)->dummy_inode)
-		return;
+		return 0;

 	trans = btrfs_join_transaction(root);
-	BUG_ON(IS_ERR(trans));
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);

 	ret = btrfs_update_inode(trans, root, inode);
 	if (ret && ret == -ENOSPC) {
 		/* whoops, lets try again with the full transaction */
 		btrfs_end_transaction(trans, root);
 		trans = btrfs_start_transaction(root, 1);
-		if (IS_ERR(trans)) {
-			printk_ratelimited(KERN_ERR "btrfs: fail to "
-				       "dirty inode %llu error %ld\n",
-
Resize command syntax wrong?
Currently the resize command is under filesystem, and takes a path to the mounted filesystem. This seems wrong to me. Shouldn't it be under device, and take a path to a device to resize? Otherwise, how can a resize operation when you have multiple devices make any sense?
Re: Resize command syntax wrong?
Hello Phillip, you wrote on 30.11.11:

> Currently the resize command is under filesystem, and takes a path to the mounted filesystem. This seems wrong to me. Shouldn't it be under device, and take a path to a device to resize?

No - it's a filesystem operation.

For example: you start with a system of 2 disks. They get filled nearly simultaneously. Then you add a 3rd disk (which is empty at that time). Now it's a good idea to run balance to even out the usage.

Best regards!
Helmut
Re: Resize command syntax wrong?
On 30 Nov 2011 19:59:00 +0100 Helmut Hullen <hul...@t-online.de> wrote:
>> Currently the resize command is under filesystem, and takes a path to the mounted filesystem. This seems wrong to me. Shouldn't it be under device, and take a path to a device to resize?
>
> No - it's a filesystem operation.

Are you sure about that?

> For example: you start with a system of 2 disks. They get filled nearly simultaneously. Then you add a 3rd disk (which is empty at that time). Now it's a good idea to run balance to even out the usage.

What if I need to replace an individual device with a smaller or a larger one?

--
With respect,
Roman

~~~
Stallman had a printer, with code he could not see.
So he began to tinker, and set the software free.
Re: Resize command syntax wrong?
On Thursday, 01 December, 2011 01:15:47 you wrote:
> On 30 Nov 2011 19:59:00 +0100 Helmut Hullen <hul...@t-online.de> wrote:
>>> Currently the resize command is under filesystem, and takes a path to the mounted filesystem. This seems wrong to me. Shouldn't it be under device, and take a path to a device to resize?
>>
>> No - it's a filesystem operation.
>
> Are you sure about that?

I confirm that. In fact "btrfs filesystem resize" doesn't change the device(s). It only expands or shrinks the filesystem.

Of course, if you want to expand the filesystem, you have to expand the underlying device *before*. Conversely, if you want to shrink the filesystem, you must shrink the filesystem first and only then shrink the device.

>> For example: you start with a system of 2 disks. They get filled nearly simultaneously. Then you add a 3rd disk (which is empty at that time). Now it's a good idea to run balance to even out the usage.
>
> What if I need to replace an individual device with a smaller or a larger one?

This is a simpler case. As a general rule:

# btrfs device add <new device> <btrfs root>
# btrfs device delete <old device> <btrfs root>

Removing the device may be blocked in some RAID setups.

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
Re: Resize command syntax wrong?
Hello Roman, you wrote on 01.12.11:

> What if I need to replace an individual device with a smaller or a larger one?

1) add the new device
2) balance (maybe it's not necessary)
3) run remove for the individual device
4) remove it
5) balance

Best regards!
Helmut
Re: Resize command syntax wrong?
On 30 Nov 2011 20:43:00 +0100 Helmut Hullen <hul...@t-online.de> wrote:
> Hello Roman, you wrote on 01.12.11:
>
>> What if I need to replace an individual device with a smaller or a larger one?
>
> 1) add the new device
> 2) balance (maybe it's not necessary)
> 3) run remove for the individual device
> 4) remove it
> 5) balance

Okay, adding a new device wasn't the best example to explain my point. What I meant is resizing a BTRFS partition, enlarging it or shrinking it as needed, while still on the same device.

Of course in the enlarge scenario the partition (or the LV) is resized upwards first, and then the filesystem, and on shrinking it's vice versa.

Suppose I used half of a 1000GB disk for BTRFS (a 500GB partition), and the second half for something else. Now I want to remove this other partition, and make BTRFS occupy the whole disk.

Resizing in both 'directions' seems to work very well on single-device BTRFS filesystems, and it's also very useful that BTRFS is almost the only modern FS (besides ext4) that can be shrunk. But with multi-device filesystems, don't you agree it's non-obvious how (or whether it is even possible) to resize the areas that BTRFS occupies on individual devices?

--
With respect,
Roman
mkfs.btrfs failure on ARM
I'm hitting an error with the default mkfs.btrfs in debian wheezy:

ford:~# uname -a
Linux ford.blinkenlights.nl 3.1.0-1-kirkwood #1 Tue Nov 15 00:17:24 UTC 2011 armv5tel GNU/Linux
ford:~/btrfs-progs# dpkg -l | grep btrfs
ii btrfs-tools 0.19+2005-1 Checksumming Copy on Write Filesystem utilities
ford:~# mkfs.btrfs /dev/vgroot/home
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
mkfs.btrfs: volumes.c:1575: btrfs_read_chunk_tree: Assertion `!(ret)' failed.

A fresh clone gives me:

ford:~/btrfs-progs# ./mkfs.btrfs /dev/vgroot/home
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
mkfs.btrfs: volumes.c:846: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted

Tested with gcc 4.4 and 4.6.

I did a git bisect and ended up here:

ford:~/btrfs-progs# git bisect bad
4e64e05c6b8d9a1ed30bb6eda30ef8e93c6af260 is the first bad commit
commit 4e64e05c6b8d9a1ed30bb6eda30ef8e93c6af260
Author: Donggeun Kim dg77@samsung.com
Date: Thu Jul 8 09:17:59 2010 +

    btrfs-progs: Add new feature to mkfs.btrfs to make file system image file from source directory

Anybody got a clue as to what the problem might be?

--
Sten Spans

There is a crack in everything, that's how the light gets in.
Leonard Cohen - Anthem
Re: btrfs encryption problems
I plugged it in directly via SATA and this is what I get from the 3.1 kernel:

[ 577.850429] ata3: exception Emask 0x10 SAct 0x0 SErr 0x405 action 0xe frozen
[ 577.850433] ata3: irq_stat 0x0040, connection status changed
[ 577.850436] ata3: SError: { PHYRdyChg CommWake DevExch }
[ 577.850443] ata3: hard resetting link
[ 581.768015] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 581.829148] ata3.00: ATA-8: WDC WD7500BPVT-22HXZT1, 01.01A01, max UDMA/133
[ 581.829151] ata3.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 581.833146] ata3.00: configured for UDMA/133
[ 581.848043] ata3: EH complete
[ 581.848134] scsi 2:0:0:0: Direct-Access ATA WDC WD7500BPVT-2 01.0 PQ: 0 ANSI: 5
[ 581.848250] sd 2:0:0:0: [sdb] 1465149168 512-byte logical blocks: (750 GB/698 GiB)
[ 581.848253] sd 2:0:0:0: [sdb] 4096-byte physical blocks
[ 581.848300] sd 2:0:0:0: [sdb] Write Protect is off
[ 581.848302] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 581.848323] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 581.921417] sdb: sdb1
[ 581.921642] sd 2:0:0:0: [sdb] Attached SCSI disk
[ 660.040263] EXT4-fs (dm-4): VFS: Can't find ext4 filesystem

> If you plug it in directly with esata, do the IO errors go away? If so,
> please post the kernel messages from that.
>
> -chris

--
Thanks
Re: [PATCH 02/20] Btrfs: initialize new bitmaps' list
On Nov 29, 2011, Christian Brunner c...@muc.de wrote:

> When I'm doing heavy reading in our ceph cluster, the load and wait-io
> on the patched servers is higher than on the unpatched ones.

That's unexpected.

> This seems to be coming from btrfs-endio-1, a kernel thread that has
> not caught my attention on unpatched systems yet.

I suppose I could wave my hands while explaining that you're getting higher data throughput, so it's natural that it would take up more resources, but that explanation doesn't satisfy me. I suppose allocation might have got slightly more CPU intensive in some cases, as we now use bitmaps where before we'd only use the cheaper-to-allocate extents. But that's unsatisfying as well.

> Do you have any idea what's going on here?

Sorry, not really.

> (Please note that the filesystem is still unmodified - metadata
> overhead is large).

Speaking of metadata overhead, I found out that the bitmap-enabling patch is not enough for a metadata balance to get rid of excess metadata block groups. I had to apply patch #16 to get it again.

It sort of makes sense: without patch 16, too often will we get to the end of the list of metadata block groups and advance from LOOP_FIND_IDEAL to LOOP_CACHING_WAIT (skipping NOWAIT after we've cached free space for all block groups), and if we get to the end of that loop as well (how? I couldn't quite figure out, but it only seems to happen under high contention) we'll advance to LOOP_ALLOC_CHUNK and end up unnecessarily allocating a new chunk.

Patch 16 makes sure we don't jump ahead during LOOP_CACHING_WAIT, so we won't get new chunks unless they can really help us keep the system going.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [PATCH 0/5] fix bugs of sub transid -- WARNING: at fs/btrfs/ctree.c:432
On 11/30/2011 12:17 AM, David Sterba wrote:
> On Tue, Nov 29, 2011 at 09:18:35AM +0800, Liu Bo wrote:
> > a) For the first one (last_snapshot bug),
> >
> > The test involves three processes (derived from Chris):
> >
> > mkfs.btrfs /dev/xxx
> > mount /dev/xxx /mnt
> >
> > 1) run compilebench -i 30 --makej -D /mnt
> >    Let compilebench run until it starts the create phase.
> > 2) run synctest -f -u -n 200 -t 3 /mnt
> > 3) for x in `seq 1 200` ; do btrfs subvol snap /mnt /mnt/snap$x ; sleep 0.5 ; done
>
> I have hit the following 2 warnings during this test. Phase 1 was at
> the compile stage; 2 and 3 were running. I did not see them during the
> first run, and other activity on the filesystem was 'du -sh /mnt'.
>
> mount options: compress-force=lzo,discard,space_cache,autodefrag,inode_cache
>
> Label: none uuid: 79f4160b-81f8-46ed-968c-968cb17a2e87
>         Total devices 4 FS bytes used 7.76GB
>         devid 4 size 13.96GB used 2.26GB path /dev/sdb4
>         devid 3 size 13.96GB used 2.26GB path /dev/sdb3
>         devid 2 size 13.96GB used 3.00GB path /dev/sdb2
>         devid 1 size 13.96GB used 3.02GB path /dev/sdb1
>
> fresh and default mkfs
>
> 430 WARN_ON(root->ref_cows && trans->transaction->transid !=
> 431         root->fs_info->running_transaction->transid);
> 432 WARN_ON(root->ref_cows && trans->transid root->last_trans);

Hi, David,

This is an oversight on my part. The warning is meant to check whether this ref_cows root is in this transaction, so we can change it to

WARN_ON(root->ref_cows && trans->transaction->transid root->last_trans);

BTW, did you catch any parent transid mismatch during the test?
:)

thanks,
liubo

> [20433.473713] ------------[ cut here ]------------
> [20433.478825] WARNING: at fs/btrfs/ctree.c:432 __btrfs_cow_block+0x429/0x5e0 [btrfs]()
> [20433.487148] Hardware name: Santa Rosa platform
> [20433.487150] Modules linked in: btrfs aoe sr_mod ide_cd_mod cdrom loop [last unloaded: btrfs]
> [20433.487162] Pid: 12099, comm: btrfs Tainted: GW 3.1.0-default+ #80
> [20433.487165] Call Trace:
> [20433.487174] [81051c0f] warn_slowpath_common+0x7f/0xc0
> [20433.487179] [81051c6a] warn_slowpath_null+0x1a/0x20
> [20433.487190] [a00d6909] __btrfs_cow_block+0x429/0x5e0 [btrfs]
> [20433.487196] [8108a429] ? trace_hardirqs_off_caller+0x29/0xc0
> [20433.487201] [8108a7ed] ? lock_release_holdtime+0x3d/0x1c0
> [20433.487218] [a0129cc0] ? btrfs_set_lock_blocking_rw+0x50/0xb0 [btrfs]
> [20433.487230] [a00d6c66] btrfs_cow_block+0x1a6/0x3d0 [btrfs]
> [20433.487236] [818b90cb] ? _raw_write_unlock+0x2b/0x50
> [20433.487247] [a00dae70] btrfs_search_slot+0x300/0xd20 [btrfs]
> [20433.487262] [a00eedcf] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
> [20433.487279] [a00fd186] btrfs_update_inode_item+0x66/0x120 [btrfs]
> [20433.487296] [a00fe63b] btrfs_update_inode+0xab/0xc0 [btrfs]
> [20433.487313] [a0134e61] ? lookup_free_ino_inode+0x51/0xe0 [btrfs]
> [20433.487327] [a00ef515] btrfs_save_ino_cache+0x145/0x2f0 [btrfs]
> [20433.487342] [a00f7464] ? commit_fs_roots+0xa4/0x1c0 [btrfs]
> [20433.487357] [a00f7494] commit_fs_roots+0xd4/0x1c0 [btrfs]
> [20433.487373] [a00f86a4] btrfs_commit_transaction+0x454/0x900 [btrfs]
> [20433.487378] [8108a7ed] ? lock_release_holdtime+0x3d/0x1c0
> [20433.487395] [a0126e08] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> [20433.487400] [81076550] ? wake_up_bit+0x40/0x40
> [20433.487405] [8134738e] ? do_raw_spin_unlock+0x5e/0xb0
> [20433.487421] [a0126ec8] btrfs_mksubvol+0x358/0x360 [btrfs]
> [20433.487427] [8110ece3] ? might_fault+0x53/0xb0
> [20433.487443] [a0126fd0] btrfs_ioctl_snap_create_transid+0x100/0x160 [btrfs]
> [20433.487448] [8110ece3] ? might_fault+0x53/0xb0
> [20433.487464] [a01271ad] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> [20433.487482] [a0128a78] btrfs_ioctl+0x588/0x1080 [btrfs]
> [20433.487487] [818bd9c0] ? do_page_fault+0x2d0/0x580
> [20433.487492] [8107d6cf] ? local_clock+0x6f/0x80
> [20433.487498] [811473e8] do_vfs_ioctl+0x98/0x560
> [20433.487502] [818b9fd9] ? retint_swapgs+0x13/0x1b
> [20433.487507] [811478ff] sys_ioctl+0x4f/0x80
> [20433.487512] [818c21c2] system_call_fastpath+0x16/0x1b
> [20433.487515] ---[ end trace d93007cf8d0a8eac ]---
> [20433.487576] ------------[ cut here ]------------
> [20433.487587] WARNING: at fs/btrfs/ctree.c:432 __btrfs_cow_block+0x429/0x5e0 [btrfs]()
> [20433.487590] Hardware name: Santa Rosa platform
> [20433.487592] Modules linked in: btrfs aoe sr_mod ide_cd_mod cdrom loop [last unloaded: btrfs]
> [20433.487601] Pid: 12099, comm: btrfs Tainted: GW 3.1.0-default+ #80
> [20433.487603] Call Trace:
> [20433.487608] [81051c0f] warn_slowpath_common+0x7f/0xc0
> [20433.487613] [81051c6a] warn_slowpath_null+0x1a/0x20
> [20433.487623]
Re: Resize command syntax wrong?
Hello, Phillip,

You wrote on 30.11.11:

> > You start with a system of 2 disks. They get filled nearly
> > simultaneously. Then you add a 3rd disk (which is empty at that time).
> > Now it's a good idea to run balance for equalizing the filling.
>
> balance != resize

I know.

For example: start with 1 disk of 2 GByte and 1 disk of 4 GByte. Fill the filesystem with 2 GByte of data; each disk gets 1 GByte. Add a disk of 10 GByte and run balance: each disk gets about 700 MByte.

That has nothing to do with resize.

Best regards!
Helmut
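The numbers in this example follow from dividing the data evenly across the disks. The sketch below reproduces only that idealized arithmetic; it assumes 2 GByte means 2048 MByte and that the data ends up spread perfectly evenly, which is the message's simplification rather than something balance is known to guarantee. The function name is hypothetical.

```shell
#!/bin/sh
# Idealized even split of data across disks (integer MByte).
# per_disk_mb is a hypothetical helper; real balance results may differ.
per_disk_mb() {
    total_mb=$1   # total data stored, in MByte
    disks=$2      # number of disks in the filesystem
    echo $(( total_mb / disks ))
}

per_disk_mb 2048 2   # two disks: prints 1024 (1 GByte each)
per_disk_mb 2048 3   # three disks after balance: prints 682 (about 700 MByte)
```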