Couple of problems regarding btrfs qgroup show reliability
Here is a simplified excerpt of my backup bash script:

CURRENT_TIME=$(date +%Y-%m-%d_%H:%M-%S)
# LAST_TIME contains the timestamp of the last backup, in the same format as $CURRENT_TIME

btrfs subvolume snapshot -r /mnt/root/@home /mnt/root/@home-backup-$CURRENT_TIME
sync

# Define space check variables
btrfs quota enable /mnt/root
SUBVOLUME_ID=$(btrfs subvolume list /mnt/root | grep $CURRENT_TIME | awk '{print $2}')
ABSOLUTE_SIZE=$(btrfs qgroup show /mnt/root | grep 0/$SUBVOLUME_ID | awk '{print $2}')
RELATIVE_SIZE=$(btrfs qgroup show /mnt/root | grep 0/$SUBVOLUME_ID | awk '{print $3}')
FREE_SPACE=$(df -B1 /mnt/backup | tail -1 | awk '{print $4}')

# Now I want to check whether there is enough space on /mnt/backup for sending
# the incremental part to /mnt/backup (let us assume that no snapshots more
# recent than @home-backup-$LAST_TIME have been made), so I did the following
# in my backup script:
if (( $FREE_SPACE > $RELATIVE_SIZE )); then
    btrfs send -p /mnt/root/@home-backup-$LAST_TIME /mnt/root/@home-backup-$CURRENT_TIME | btrfs receive /mnt/backup
fi

# For the initial bootstrapping I chose
if (( $FREE_SPACE > $ABSOLUTE_SIZE )); then
    btrfs send /mnt/root/@home-backup-$CURRENT_TIME | btrfs receive /mnt/backup
fi

Now I have a couple of questions:

1.) Does it matter when I enable btrfs quota, even if it is enabled for the first time in the backup script? Does this have any influence on the values determined for $ABSOLUTE_SIZE and $RELATIVE_SIZE?

2.) Does btrfs implement some way to show free space on its own, or do I have to rely on df?

3.) Is the logic right for the incremental backup space check? The unshared space should be more or less what is transmitted by btrfs send, right, since we already have the last snapshot on the backup drive? If this isn't the right approach, how do I get the size difference of two specific snapshots, say @home-backup-$CURRENT_TIME and @home-backup-$LAST_TIME?

4.)
Out of curiosity I checked the ABSOLUTE_SIZE values of the sent snapshot on the backup device too; in theory they should be equal, right? But for some reason they are not equal at all, and neither are the RELATIVE_SIZE values. Checking the ABSOLUTE_SIZE with du seems to indicate that the value on the backup device is right (2.6 GB), but on the internal drive the value of $ABSOLUTE_SIZE is 2.0 GB. How can that be? Of course the RELATIVE_SIZE can vary a bit, depending on what snapshots reside on the same drive, but let us assume no confounding factors; then they should be roughly of the same magnitude. And the ABSOLUTE_SIZE values should definitely be equal on both the backup drive and the internal hard drive for the same snapshot. Where am I wrong?

5.) I understand that btrfs snapshot delete breaks the RELATIVE_SIZE, at least this is noted in the wiki. Is this still true, and will it be resolved soon? The wiki also notes: "After deleting a subvolume, you must manually delete the associated qgroup." How would I do that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
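Regarding the space check and question 5, here is a hedged sketch of what the corrected comparison and the manual qgroup cleanup could look like. The qgroup ID 0/257 and all byte counts are made-up example values, not from this thread, and the btrfs command is shown commented out since it needs a real filesystem with quotas enabled:

```shell
#!/usr/bin/env bash
# After deleting a snapshot, its level-0 qgroup is left behind and can be
# removed by hand (example ID; requires quotas enabled on the filesystem):
#   btrfs qgroup destroy 0/257 /mnt/root

# The space check itself needs an explicit comparison operator; with
# example byte counts standing in for the real qgroup/df output:
FREE_SPACE=5000000000      # bytes free on /mnt/backup (example value)
RELATIVE_SIZE=120000000    # exclusive bytes of the snapshot (example value)

if (( FREE_SPACE > RELATIVE_SIZE )); then
    echo "enough space for incremental send"
fi
```

Whether the exclusive ("relative") size is a good predictor of the send stream size is exactly what question 3 asks; the sketch only fixes the shell arithmetic, not that assumption.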
[PATCH] btrfs: Drop stray check of fixup_workers creation
The issue was introduced in a79b7d4b3e8118f265dcb4bdf9a572c392f02708, adding allocation of extent_workers, so this stray check is surely not meant to be a check of something else.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82021
Reported-by: Maks Naumov maksq...@ukr.net
Signed-off-by: Andrey Utkin andrey.krieger.ut...@gmail.com
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..1881713 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2601,7 +2601,7 @@ int open_ctree(struct super_block *sb,
 	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
 	      fs_info->caching_workers && fs_info->readahead_workers &&
 	      fs_info->fixup_workers && fs_info->delayed_workers &&
-	      fs_info->fixup_workers && fs_info->extent_workers &&
+	      fs_info->extent_workers &&
 	      fs_info->qgroup_rescan_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
--
1.8.5.5
Re: [PATCH] btrfs: Drop stray check of fixup_workers creation
On 8/9/14, 6:51 AM, Andrey Utkin wrote:
> The issue was introduced in a79b7d4b3e8118f265dcb4bdf9a572c392f02708, adding allocation of extent_workers, so this stray check is surely not meant to be a check of something else.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82021
> Reported-by: Maks Naumov maksq...@ukr.net
> Signed-off-by: Andrey Utkin andrey.krieger.ut...@gmail.com

Yup, harmless but unneeded. However, might as well put the extent_workers && qgroup_rescan_workers checks on the same line now... Could probably do a V2 or fix it on commit, but anyway:

Reviewed-by: Eric Sandeen sand...@redhat.com

> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 08e65e9..1881713 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2601,7 +2601,7 @@ int open_ctree(struct super_block *sb,
>  	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
>  	      fs_info->caching_workers && fs_info->readahead_workers &&
>  	      fs_info->fixup_workers && fs_info->delayed_workers &&
> -	      fs_info->fixup_workers && fs_info->extent_workers &&
> +	      fs_info->extent_workers &&
>  	      fs_info->qgroup_rescan_workers)) {
>  		err = -ENOMEM;
>  		goto fail_sb_buffer;
> --
> 1.8.5.5
Re: 40TB volume taking over 16 hours to mount, any ideas?
Hello,

On Sat, Aug 09, 2014 at 01:38:34PM +1000, Russell Coker wrote:
> On Fri, 8 Aug 2014 16:35:29 Jose Ildefonso Camargo Tolosa wrote:
>> Then, after reading here and there, decided to try to use a newer kernel, tried 3.15.8. Well, it is still mounting after ~16 hours, and I got messages like these at first:
>
> I recommend trying a 3.14 kernel. I had ongoing problems with kernels before 3.14 which included infinite loops in kernel space. Based on reports on this list I haven't been inclined to test 3.15 kernels. But 3.14 has been working well for me on many systems.

I'm in a similar position with a filesystem that won't mount except read-only, but am already on 3.14 and am also wondering whether to try a 3.16 kernel.

https://bugzilla.kernel.org/show_bug.cgi?id=81981

Jose, maybe you could try -oro in the hope of at least getting back to a read-only mount?

Cheers,
Andy

--
I remember the first time I made love. Perhaps it was not love exactly but I made it and it still works. — The League Against Tedium
Re: 40TB volume taking over 16 hours to mount, any ideas?
On Sat, Aug 9, 2014 at 9:32 AM, Andy Smith a...@strugglers.net wrote:
> Hello,
>
> On Sat, Aug 09, 2014 at 01:38:34PM +1000, Russell Coker wrote:
>> On Fri, 8 Aug 2014 16:35:29 Jose Ildefonso Camargo Tolosa wrote:
>>> Then, after reading here and there, decided to try to use a newer kernel, tried 3.15.8. Well, it is still mounting after ~16 hours, and I got messages like these at first:
>>
>> I recommend trying a 3.14 kernel. I had ongoing problems with kernels before 3.14 which included infinite loops in kernel space. Based on reports on this list I haven't been inclined to test 3.15 kernels. But 3.14 has been working well for me on many systems.
>
> I'm in a similar position with a filesystem that won't mount except read-only, but am already on 3.14 and am also wondering whether to try a 3.16 kernel.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=81981
>
> Jose, maybe you could try -oro in the hope of at least getting back to a read-only mount?

Will try 3.14; ro would be good enough for me, provided that I can resize the filesystem. If I can do that, I can create a new one and copy all data (hopefully faster than moving ~11TB of data through the network).
Re: 40TB volume taking over 16 hours to mount, any ideas?
Re-sending to list.

On Sat, Aug 9, 2014 at 9:58 AM, Jose Ildefonso Camargo Tolosa ildefonso.cama...@gmail.com wrote:
> On Sat, Aug 9, 2014 at 9:32 AM, Andy Smith a...@strugglers.net wrote:
>> Hello,
>>
>> On Sat, Aug 09, 2014 at 01:38:34PM +1000, Russell Coker wrote:
>>> On Fri, 8 Aug 2014 16:35:29 Jose Ildefonso Camargo Tolosa wrote:
>>>> Then, after reading here and there, decided to try to use a newer kernel, tried 3.15.8. Well, it is still mounting after ~16 hours, and I got messages like these at first:
>>>
>>> I recommend trying a 3.14 kernel. I had ongoing problems with kernels before 3.14 which included infinite loops in kernel space. Based on reports on this list I haven't been inclined to test 3.15 kernels. But 3.14 has been working well for me on many systems.
>>
>> I'm in a similar position with a filesystem that won't mount except read-only, but am already on 3.14 and am also wondering whether to try a 3.16 kernel.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=81981
>>
>> Jose, maybe you could try -oro in the hope of at least getting back to a read-only mount?
>
> Will try 3.14; ro would be good enough for me, provided that I can resize the filesystem. If I can do that, I can create a new one and copy all data (hopefully faster than moving ~11TB of data through the network).

Or maybe 3.16? *sigh* I have them both ready, but I am not sure which one to try. My fear is: if I go to 3.16 (still in development), would I be able to go back to, say, 3.14 and work with the filesystem there? According to documents, the disk format is stable now. What do you say, 3.14 or 3.16 for my next attempt? (I have just today; if I can't get this FS back to life today, I will blow it away and start over, with the ~1.5 weeks copy period ahead of me.)

--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
Re: 40TB volume taking over 16 hours to mount, any ideas?
Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 11:06:37 -0500 as excerpted:

> 3.16 (still in development)

?? 3.16 has been out for nearly a week now and we're nearing half-way thru the 3.17 commit-window.

Based on the kernel git I have here, Linus' commit officially changing the makefile entry to 3.16 was on Sunday, Aug 3, at 15:25:02 -0700. The last pre-3.16 commit was a merge of two timer-related fixes from the tip-tree at 9:58:20 -0700 that morning.

So where does your "still in development" come from?

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: 40TB volume taking over 16 hours to mount, any ideas?
On Sat, Aug 9, 2014 at 12:01 PM, Duncan 1i5t5.dun...@cox.net wrote:
> Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 11:06:37 -0500 as excerpted:
>
>> 3.16 (still in development)
>
> ?? 3.16 has been out for nearly a week now and we're nearing half-way thru the 3.17 commit-window.
>
> Based on the kernel git I have here, Linus' commit officially changing the makefile entry to 3.16 was on Sunday, Aug 3, at 15:25:02 -0700. The last pre-3.16 commit was a merge of two timer-related fixes from the tip-tree at 9:58:20 -0700 that morning.
>
> So where does your "still in development" come from?

Well, maybe not the right word, but here is what kernel.org says about mainline kernels: "Mainline tree is maintained by Linus Torvalds. It's the tree where all new features are introduced and where all the exciting new development happens. New mainline kernels are released every 2-3 months." So, there you go: all new features are introduced, and where all the exciting new development happens. So... development is quite active on mainline kernels.
[PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums
Under rare circumstances we can end up leaving 2 versions of a checksum for the same file extent range.

The reason for this is that after calling btrfs_next_leaf we process slot 0 of the leaf it returns, instead of processing the slot set in path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after btrfs_next_leaf() releases the path and before it searches for the next leaf, another task might cause a split of the next leaf, which migrates some of its keys to the leaf we were processing before calling btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the same leaf but with path->slots[0] having a slot number corresponding to the first new key it got, that is, a slot number that didn't exist before calling btrfs_next_leaf(), as the leaf now has more keys than it had before. So we must really process the returned leaf starting at path->slots[0] always, as it isn't always 0, and the key at slot 0 can have an offset much lower than our search offset/bytenr.

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

  Leaf N:

    slot = 0                           slot = btrfs_header_nritems() - 1
  |-------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
  |-------------------------------------------------------------------|

  Leaf N + 1:

    slot = 0                            slot = btrfs_header_nritems() - 1
  |--------------------------------------------------------------------|
  | [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
  |--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to find the next highest key, which releases the current path and then searches for that next key. However after releasing the path and before finding that next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore btrfs_next_leaf() will return us a path again with leaf N but with the slot pointing to its new last key (CSUM CSUM 40161280).
This new version of leaf N is then:

    slot = 0                       slot = btrfs_header_nritems() - 2
                                      slot = btrfs_header_nritems() - 1
  |----------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]    |
  |                                    [(CSUM CSUM 40161280), size 32]   |
  |----------------------------------------------------------------------|

And incorrectly using slot 0 makes us set next_offset to 39239680, and we jump into the insert: label, which will set tmp to:

    tmp = min((sums->len - total_bytes) >> blocksize_bits,
              (next_offset - file_key.offset) >> blocksize_bits) =
          min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
          min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and ins_size = csum_size * tmp = 4 * 4 = 16 bytes.

In other words, we insert a new csum item in the tree with key (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong, because the item with key (CSUM CSUM 40161280) (the one that was moved from leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288 bytes of our data and won't get those old checksums removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree, and breaks the logical rule:

   Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get the checksum of any of the blocks with logical offset of 40161280, 40165376 or 40169472 (the last 3 4kb blocks of file data) will get the old checksums.
Cc: sta...@vger.kernel.org
Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/file-item.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index a1f97de..7897dcd 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -746,7 +746,7 @@ again:
 		found_next = 1;
 		if (ret != 0)
 			goto insert;
-		slot = 0;
+		slot = path->slots[0];
 	}
 	btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
 	if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
--
1.9.1
Re: [PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums
I'm getting on a plane right now to kiss you, be prepared.

Thanks,

Josef

Filipe Manana fdman...@suse.com wrote:
> Under rare circumstances we can end up leaving 2 versions of a checksum for the same file extent range.
>
> The reason for this is that after calling btrfs_next_leaf we process slot 0 of the leaf it returns, instead of processing the slot set in path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after btrfs_next_leaf() releases the path and before it searches for the next leaf, another task might cause a split of the next leaf, which migrates some of its keys to the leaf we were processing before calling btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the same leaf but with path->slots[0] having a slot number corresponding to the first new key it got, that is, a slot number that didn't exist before calling btrfs_next_leaf(), as the leaf now has more keys than it had before. So we must really process the returned leaf starting at path->slots[0] always, as it isn't always 0, and the key at slot 0 can have an offset much lower than our search offset/bytenr.
>
> [...]
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Filipe Manana fdman...@suse.com
> ---
>  fs/btrfs/file-item.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index a1f97de..7897dcd 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -746,7 +746,7 @@ again:
>  		found_next = 1;
>  		if (ret != 0)
>  			goto insert;
> -		slot = 0;
> +		slot = path->slots[0];
>  	}
>  	btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
>  	if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
> --
> 1.9.1
Re: [PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums
On Sat, Aug 09, 2014 at 09:22:27PM +0100, Filipe Manana wrote:
> (100 lines of detailed explanations snipped)
> -		slot = 0;
> +		slot = path->slots[0];

And this is why trying to rank kernel contributions by number of lines or characters is a very poor guide of the actual work accomplished and owed credit.

Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: [PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums
On 08/09/2014 04:22 PM, Filipe Manana wrote:
> Under rare circumstances we can end up leaving 2 versions of a checksum for the same file extent range.
>
> The reason for this is that after calling btrfs_next_leaf we process slot 0 of the leaf it returns, instead of processing the slot set in path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after btrfs_next_leaf() releases the path and before it searches for the next leaf, another task might cause a split of the next leaf, which migrates some of its keys to the leaf we were processing before calling btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the same leaf but with path->slots[0] having a slot number corresponding to the first new key it got, that is, a slot number that didn't exist before calling btrfs_next_leaf(), as the leaf now has more keys than it had before. So we must really process the returned leaf starting at path->slots[0] always, as it isn't always 0, and the key at slot 0 can have an offset much lower than our search offset/bytenr.

And the bug goes all the way back to 2007. I'd like to blame Yan Zheng, but it was in my original code too.

Great find and explanation, I've added this to my merge window pull.

Thanks!

-chris
Re: 40TB volume taking over 16 hours to mount, any ideas?
The 3.14.16 test is on its way; it already started with this:

[19732.769100] BTRFS: device fsid 7356e329-62ba-49fb-83cc-f6b91ac3b581 devid 1 transid 111580 /dev/sdb1
[19732.769429] BTRFS info (device sdb1): enabling auto recovery
[19732.769433] BTRFS info (device sdb1): force clearing of disk cache
[20050.137779] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[20050.139361] Not tainted 3.14.16-031416-generic #201408072035
[20050.140704] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[20050.142422] btrfs-transacti D 818118e0 0 7353 2 0x
[20050.142430] 880450afddc8 0002 880450afdd68 880450afdfd8
[20050.142434] 00014500 00014500 88046985e380 8804602018e0
[20050.142437] 880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[20050.142440] Call Trace:
[20050.142447] [8175b0c9] schedule+0x29/0x70
[20050.142473] [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[20050.142482] [810b4e10] ? __wake_up_sync+0x20/0x20
[20050.142493] [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[20050.142504] [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[20050.142507] [8108fd89] kthread+0xc9/0xe0
[20050.142509] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20050.142513] [817681bc] ret_from_fork+0x7c/0xb0
[20050.142515] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20170.194168] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[20170.195747] Not tainted 3.14.16-031416-generic #201408072035
[20170.197090] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[20170.198815] btrfs-transacti D 818118e0 0 7353 2 0x
[20170.198820] 880450afddc8 0002 880450afdd68 880450afdfd8
[20170.198822] 00014500 00014500 88046985e380 8804602018e0
[20170.198824] 880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[20170.198824] Call Trace:
[20170.198831] [8175b0c9] schedule+0x29/0x70
[20170.198856] [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[20170.198861] [810b4e10] ? __wake_up_sync+0x20/0x20
[20170.198875] [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[20170.198886] [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[20170.198889] [8108fd89] kthread+0xc9/0xe0
[20170.198891] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20170.198895] [817681bc] ret_from_fork+0x7c/0xb0
[20170.198897] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20290.250561] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[20290.252140] Not tainted 3.14.16-031416-generic #201408072035
[20290.253483] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[20290.282212] btrfs-transacti D 818118e0 0 7353 2 0x
[20290.282216] 880450afddc8 0002 880450afdd68 880450afdfd8
[20290.282219] 00014500 00014500 88046985e380 8804602018e0
[20290.282221] 880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[20290.282221] Call Trace:
[20290.282227] [8175b0c9] schedule+0x29/0x70
[20290.282253] [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[20290.282262] [810b4e10] ? __wake_up_sync+0x20/0x20
[20290.282272] [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[20290.282283] [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[20290.282286] [8108fd89] kthread+0xc9/0xe0
[20290.282289] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20290.282292] [817681bc] ret_from_fork+0x7c/0xb0
[20290.282294] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0

I'll allow it to run for a few hours, and then will report. On a side-note, I ran 'btrfs check' and it returned so many errors that it went out of my console's history... unfortunately I didn't redirect its output to a file (big mistake); I didn't think it would be so big.
Anyway, part of the output:

(older output lost due to term size)
root 5 inode 94906683 errors 200, dir isize wrong
root 5 inode 94906716 errors 200, dir isize wrong
root 5 inode 94906730 errors 200, dir isize wrong
root 5 inode 94906735 errors 200, dir isize wrong
root 5 inode 94906758 errors 200, dir isize wrong
(...)
root 5 inode 94928259 errors 200, dir isize wrong
root 5 inode 94928286 errors 200, dir isize wrong
root 5 inode 94928311 errors 200, dir isize wrong
root 5 inode 94928321 errors 200, dir isize wrong
root 5 inode 133964681 errors 200, dir isize wrong
root 5 inode 133964684 errors 200, dir isize wrong
root 5 inode 142590710 errors 200, dir isize wrong
root 5 inode 144973646 errors 200, dir isize wrong
root 5 inode 146401067 errors 100, file extent discount
root 5 inode 146401080 errors 100, file extent discount
root 5 inode
Re: 40TB volume taking over 16 hours to mount, any ideas?
And it is still going. Although the hung task messages stopped long ago (behavior similar to 3.15), it hasn't finished mounting, mount is still taking 100% CPU, *and* I can't see any disk activity at all. Last hung task message:

[21131.749759] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[21131.828755] Not tainted 3.14.16-031416-generic #201408072035
[21131.868788] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[21131.947525] btrfs-transacti D 818118e0 0 7353 2 0x
[21131.947530] 880450afddc8 0002 880450afdd68 880450afdfd8
[21131.947535] 00014500 00014500 88046985e380 8804602018e0
[21131.947540] 880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[21131.947544] Call Trace:
[21131.947551] [8175b0c9] schedule+0x29/0x70
[21131.947577] [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[21131.947581] [810b4e10] ? __wake_up_sync+0x20/0x20
[21131.947591] [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[21131.947601] [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[21131.947604] [8108fd89] kthread+0xc9/0xe0
[21131.947606] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[21131.947610] [817681bc] ret_from_fork+0x7c/0xb0
[21131.947612] [8108fcc0] ? flush_kthread_worker+0xb0/0xb0

Do you think I will have better luck with 3.16? Or maybe this filesystem has so many errors (remember the btrfs check output) that it will take a really long time to mount because it is trying to correct them?

Thanks!

Ildefonso
Re: 40TB volume taking over 16 hours to mount, any ideas?
Marc MERLIN posted on Sat, 09 Aug 2014 11:21:13 -0700 as excerpted:

> You could argue that since 3.16.0 does not have the recently found deadlock patch that's been plaguing 15 and 16 (14 not as much for me), it's not usable for some (it ran about 1 day on my laptop before deadlocking, and maybe an hour at most on my server). I sure hope that deadlock patch is going to be added to the 3.16.x tree; I'm not super stoked with being stuck at 3.14.

Well, yes. It'll almost certainly make it to the stable series including 3.16.x shortly after it ends up in the 3.17 development tree.

But the switch to worker-threads was only with 3.15, so anything previous to that doesn't need it (thus 3.14 working well for you; previous versions had other bugs). And 3.15 isn't a long-term-stable, and Greg KH already warned that the just-Friday-released 3.15.9 is its penultimate release and people should be thinking about switching to 3.16. So pre-3.15 the patch isn't needed, and whether it'll make it into 3.15.10, the last 3.15-series release, is questionable at this point, so 3.17-development or presumably 3.16.1 or 3.16.2 looks to be the soonest it'll possibly happen for people not willing to cherry-pick the patch from the list as soon as posted.

FWIW, 3.15 (where I didn't have time to try the development series and only upgraded about the time it came out) and the 3.16 development series including the 3.16.0 release have worked well enough for me. But my btrfs are all on ssd, the ones I regularly mount all being raid1-pairs, and apparently on my 6-core at least, the bug is hard enough to trigger on ssd and I don't routinely push them hard enough to have seen it, thus explaining why I've not had problems with 3.15 and the 3.16 series up thru the 3.16.0 release, beyond an instance that was either right about 3.15 release or in 3.14, and might have been a one-off, as it certainly was for me.
Tho while the problem has been pretty well traced so we know what it is, I'm not sure that a full patch for it has yet been posted on the list, has it? I think it was nailed down too late in the week to prepare and pre-post test a patch before the weekend.

So I'd expect to see the patch on the list on Tuesday or so, just in time to make the last bit of the 3.17 commit window (tho it's a stable-candidate fix so could go in later as well), but likely too late to make 3.15.10 and 3.16.1, so 3.17-rc1 or 3.16.2 it'll likely be.

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: 40TB volume taking over 16 hours to mount, any ideas?
Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 13:38:46 -0500 as excerpted:

> On Sat, Aug 9, 2014 at 12:01 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 11:06:37
>> -0500 as excerpted:
>>> 3.16 (still in development) ??
>>
>> 3.16 has been out for nearly a week now and we're nearing half-way
>> thru the 3.17 commit-window. Based on the kernel git I have here,
>> Linus' commit officially changing the makefile entry to 3.16 was on
>> Sunday, Aug 3, at 15:25:02 -0700. The last pre-3.16 commit was a
>> merge of two timer-related fixes from the tip tree at 9:58:20 -0700
>> that morning. So where does your "still in development" come from?
>
> Well, maybe not the right word, but here is what kernel.org says about
> mainline kernels: "Mainline tree is maintained by Linus Torvalds. It's
> the tree where all new features are introduced and where all the
> exciting new development happens. New mainline kernels are released
> every 2-3 months." So, there you go: "all new features are
> introduced", and "where all the exciting new development happens".
> So... development is quite active on mainline kernels.

But 3.16.0 is out, and the real active development happens in the commit window, pre-rc1. A kernel doesn't really /start/ settling down until rc3 or so, and isn't reasonably stable until rc5 or so (tho rc5 is a little late to start testing and reporting bugs in time to have them fixed by release; it's really best to start testing around rc3 or so, at which point any really bad data-eating bugs should be either fixed or at least published, so the risk is dramatically lower than during the commit window itself). From rc5 on thru rc7 or 8 and release, unless you're one of the ones still waiting on a bug found earlier to be fixed, it's generally quite stable and boring. So by the time of the actual .0 release, it really is quite stable, and no longer a development kernel.
Sure, Greg KH's stable-series kernel releases stabilize it further, but that's exactly what they are: stable series, not development series. There's really no development going into a release after rc1, tho occasionally something that needs to come after everything else is slipped in during the first couple of days after rc1, still well before rc2. The .0 release marks the end of the post-development stabilization period, so .0 really is no longer a development kernel at all, even if, for a series that's not a long-term-stable candidate, there are a few more weekly stable-series updates before support ceases (about ten; 3.15.10 was announced to be the last one for 3.15, alongside the Friday-released 3.15.9).
Re: 40TB volume taking over 16 hours to mount, any ideas?
On Sat, Aug 9, 2014 at 11:21 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> But from rc5 on thru rc7 or 8 and release, unless you're one of the
> ones still waiting on a bug found earlier to be fixed, it's generally
> quite stable and boring. So by the time of actual .0 release, it
> really is quite stable, and no longer development kernel.
>
> Sure, Greg KH's stable series kernel releases stabilize it further,
> but that's exactly what they are, stable series, not development
> series, and there's really no development going into it generally from
> rc1 on, tho occasionally something that needs to come after everything
> else is slipped in in the first couple days after rc1, but still well
> before rc2, and the .0 release signifies the end of the post
> development stabilization period such that .0 really is no longer a
> development kernel at all, even if there are a few more weekly
> stable-series updates (about 10, 3.15.10 was announced to be the last
> one for 3.15, with the Friday-released 3.15.9) before support ceases
> if it's not a long-term-stable candidate.

I can't say I've observed that to be the case with Btrfs. I know there is a core group of developers working very hard on testing the Btrfs updates in the _rc kernels, but once that .0 kernel hits the streets, the extra exposure to all the various combinations of hardware and options has been known to turn up new issues. I think this is nearly unavoidable given the pace of Btrfs development.