Couple of problems regarding btrfs qgroup show reliability

2014-08-09 Thread GEO
Here is a simplified excerpt of my backup bash script:

CURRENT_TIME=$(date +%Y-%m-%d_%H:%M-%S)
# LAST_TIME variable contains the timestamp of the last backup in the same
# format as $CURRENT_TIME

btrfs subvolume snapshot -r /mnt/root/@home /mnt/root/@home-backup-$CURRENT_TIME
sync

# Define space check variables

btrfs quota enable /mnt/root
SUBVOLUME_ID=$(btrfs subvolume list /mnt/root | grep $CURRENT_TIME | awk '{print $2}')
ABSOLUTE_SIZE=$(btrfs qgroup show /mnt/root | grep -w 0/$SUBVOLUME_ID | awk '{print $2}')
RELATIVE_SIZE=$(btrfs qgroup show /mnt/root | grep -w 0/$SUBVOLUME_ID | awk '{print $3}')
FREE_SPACE=$(df -B1 /mnt/backup | tail -1 | awk '{print $4}')

# Now I want to check if there is enough space on /mnt/backup for sending the
# incremental part to /mnt/backup (let us assume that no snapshots more recent
# than @home-backup-$LAST_TIME have been made), so I did the following in my
# backup script:

if (( $FREE_SPACE > $RELATIVE_SIZE )); then
   btrfs send -p /mnt/root/@home-backup-$LAST_TIME /mnt/root/@home-backup-$CURRENT_TIME | btrfs receive /mnt/backup
fi

# For the initial bootstrapping I chose:

if (( $FREE_SPACE > $ABSOLUTE_SIZE )); then
   btrfs send /mnt/root/@home-backup-$CURRENT_TIME | btrfs receive /mnt/backup
fi
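Selecting the qgroup row with a plain substring `grep 0/$SUBVOLUME_ID` can also hit longer IDs that share the prefix (0/2570 when the ID is 257). A minimal C sketch of an exact match, assuming each data row of `btrfs qgroup show` looks like `0/257 2621440 16384` (qgroupid, referenced bytes, exclusive bytes), which is what the script's `awk '{print $2}'` and `awk '{print $3}'` rely on. The name `parse_qgroup_line` is hypothetical and not part of btrfs-progs.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one data row of `btrfs qgroup show`, assumed
 * to be "<level>/<id> <rfer> <excl>" with sizes in raw bytes.
 * Returns 1 and stores the sizes only if the row is exactly qgroup
 * 0/<subvol_id>; header and separator rows fail sscanf() and return 0.
 */
int parse_qgroup_line(const char *line, unsigned long long subvol_id,
                      unsigned long long *rfer, unsigned long long *excl)
{
    unsigned long long level, id, r, e;

    if (sscanf(line, "%llu/%llu %llu %llu", &level, &id, &r, &e) != 4)
        return 0;
    if (level != 0 || id != subvol_id)   /* exact match: 0/2570 != 0/257 */
        return 0;
    *rfer = r;   /* referenced bytes: the script's ABSOLUTE_SIZE */
    *excl = e;   /* exclusive bytes: the script's RELATIVE_SIZE */
    return 1;
}
```

A caller would feed it each line of `btrfs qgroup show /mnt/root` (for example read with `fgets()` from a `popen()` stream) and stop at the first match.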


Now I have a couple of questions:

1.) Does it matter when I enable btrfs quota, even if it is enabled for the
first time in the backup script itself? Does this have any influence on the
values determined for $ABSOLUTE_SIZE and $RELATIVE_SIZE?

2.) Does btrfs implement some way to show free space on its own, or do I have
to rely on df?

3.) Is the logic of the incremental backup space check right? The unshared
space should be more or less what btrfs send transmits, right, since we
already have the last snapshot on the backup drive? If this isn't the right
approach, how do I get the size difference between two specific snapshots,
say @home-backup-$CURRENT_TIME and @home-backup-$LAST_TIME?

4.) Out of curiosity I also checked the ABSOLUTE_SIZE value of the sent
snapshot on the backup device. In theory the two should be equal, right? But
for some reason they are not equal at all, and neither are the RELATIVE_SIZE
values.
Checking the size with du seems to indicate that the value on the backup
device is right (2.6 GB), but on the internal drive $ABSOLUTE_SIZE is 2.0 GB.
How can that be?
Of course RELATIVE_SIZE can vary a bit, depending on which other snapshots
reside on the same drive, but assuming no confounding factors, the two values
should at least be of the same order of magnitude. And ABSOLUTE_SIZE should
definitely be equal on the backup drive and the internal hard drive for the
same snapshot.
Where am I wrong?

5.) I understand that deleting a snapshot (btrfs subvolume delete) breaks
RELATIVE_SIZE; at least this is noted in the wiki. Is this still true, and
will it be resolved soon? The wiki also notes: "After deleting a subvolume,
you must manually delete the associated qgroup." How would I do that?
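On question 2: df gets its numbers from the statfs()/statvfs() system call, so a program can ask for the same figure directly. A minimal sketch, assuming the "Available" column the script reads with `awk '{print $4}'` corresponds to `f_bavail * f_frsize` (as with GNU df on Linux). On btrfs this value is itself an estimate, so it is no more exact than df. The names `avail_bytes` and `has_room`, and the `margin` parameter, are hypothetical.

```c
#include <sys/statvfs.h>

/*
 * Bytes available to unprivileged users on the filesystem containing
 * `path`: f_bavail blocks of f_frsize bytes each, roughly what
 * `df -B1` prints in its "Available" column. Returns -1 on error.
 */
long long avail_bytes(const char *path)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;
    return (long long)st.f_bavail * (long long)st.f_frsize;
}

/*
 * Hypothetical check mirroring `(( FREE_SPACE > SIZE ))` in the script,
 * with an optional safety margin in bytes (an assumption, not a btrfs rule).
 */
int has_room(const char *path, long long size, long long margin)
{
    long long avail = avail_bytes(path);

    return avail >= 0 && avail > size + margin;
}
```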
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Drop stray check of fixup_workers creation

2014-08-09 Thread Andrey Utkin
The issue was introduced in a79b7d4b3e8118f265dcb4bdf9a572c392f02708,
adding allocation of extent_workers, so this stray check is surely not
meant to be a check of something else.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82021
Reported-by: Maks Naumov <maksq...@ukr.net>
Signed-off-by: Andrey Utkin <andrey.krieger.ut...@gmail.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..1881713 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2601,7 +2601,7 @@ int open_ctree(struct super_block *sb,
 	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
 	      fs_info->caching_workers && fs_info->readahead_workers &&
 	      fs_info->fixup_workers && fs_info->delayed_workers &&
-	      fs_info->fixup_workers && fs_info->extent_workers &&
+	      fs_info->extent_workers &&
 	      fs_info->qgroup_rescan_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
-- 
1.8.5.5
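The stray operand is harmless, since `&&` simply tests `fs_info->fixup_workers` twice, but a long hand-written chain makes such duplicates easy to miss. As an illustration only (this is not how open_ctree() is written), a check over an array of pointers lists each allocation once; `all_allocated` is a hypothetical helper.

```c
#include <stddef.h>

/* Returns 1 if every pointer in ptrs[0..n) is non-NULL, else 0. */
int all_allocated(void *const *ptrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ptrs[i] == NULL)
            return 0;
    return 1;
}
```

In open_ctree() terms, the array would hold the worker pointers, and the -ENOMEM path would be taken when the helper returns 0.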



Re: [PATCH] btrfs: Drop stray check of fixup_workers creation

2014-08-09 Thread Eric Sandeen
On 8/9/14, 6:51 AM, Andrey Utkin wrote:
> The issue was introduced in a79b7d4b3e8118f265dcb4bdf9a572c392f02708,
> adding allocation of extent_workers, so this stray check is surely not
> meant to be a check of something else.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82021
> Reported-by: Maks Naumov <maksq...@ukr.net>
> Signed-off-by: Andrey Utkin <andrey.krieger.ut...@gmail.com>

Yup, harmless but unneeded.

However, might as well put the extent_workers && qgroup_rescan_workers checks
on the same line now...

Could probably do a V2 or fix it on commit, but anyway:

Reviewed-by: Eric Sandeen <sand...@redhat.com>



Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Andy Smith
Hello,

On Sat, Aug 09, 2014 at 01:38:34PM +1000, Russell Coker wrote:
> On Fri, 8 Aug 2014 16:35:29 Jose Ildefonso Camargo Tolosa wrote:
>> Then, after reading here and there, decided to try to use a newer
>> kernel, tried 3.15.8.  Well, it is still mounting after ~16 hours, and
>> I got messages like these at first:
>
> I recommend trying a 3.14 kernel.  I had ongoing problems with kernels before
> 3.14 which included infinite loops in kernel space.  Based on reports on this
> list I haven't been inclined to test 3.15 kernels.  But 3.14 has been working
> well for me on many systems.

I'm in a similar position with a filesystem that won't mount except
read-only, but am already on 3.14 and am also wondering whether to
try a 3.16 kernel.

https://bugzilla.kernel.org/show_bug.cgi?id=81981

Jose, maybe you could try -oro in the hope of at least getting back
to a read-only mount?

Cheers,
Andy

-- 
I remember the first time I made love.  Perhaps it was not love exactly but I
 made it and it still works. — The League Against Tedium


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Jose Ildefonso Camargo Tolosa
On Sat, Aug 9, 2014 at 9:32 AM, Andy Smith <a...@strugglers.net> wrote:
> Hello,
>
> On Sat, Aug 09, 2014 at 01:38:34PM +1000, Russell Coker wrote:
>> On Fri, 8 Aug 2014 16:35:29 Jose Ildefonso Camargo Tolosa wrote:
>>> Then, after reading here and there, decided to try to use a newer
>>> kernel, tried 3.15.8.  Well, it is still mounting after ~16 hours, and
>>> I got messages like these at first:
>>
>> I recommend trying a 3.14 kernel.  I had ongoing problems with kernels before
>> 3.14 which included infinite loops in kernel space.  Based on reports on this
>> list I haven't been inclined to test 3.15 kernels.  But 3.14 has been working
>> well for me on many systems.
>
> I'm in a similar position with a filesystem that won't mount except
> read-only, but am already on 3.14 and am also wondering whether to
> try a 3.16 kernel.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=81981
>
> Jose, maybe you could try -oro in the hope of at least getting back
> to a read-only mount?

Will try 3.14. A read-only mount would be good enough for me, provided that I
can resize the filesystem: if I can do that, I can create a new one and copy
all the data (hopefully faster than moving ~11TB of data through the
network).


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Jose Ildefonso Camargo Tolosa
Re-sending to list.

Or maybe 3.16? (sigh) I have them both ready, but I am not sure
which one to try. My fear is: if I go to 3.16 (still in
development), will I be able to go back to, say, 3.14 and work with
the filesystem there? According to the documentation, the disk format is
stable now.

What do you say, 3.14 or 3.16 for my next attempt? (I only have today; if
I can't get this FS back to life today, I will blow it away and start
over, with the ~1.5-week copy period ahead of me.)

-- 
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Duncan
Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 11:06:37 -0500 as
excerpted:

> 3.16 (still in development)

??

3.16 has been out for nearly a week now and we're nearing half-way thru 
the 3.17 commit-window.  Based on the kernel git I have here, Linus' 
commit officially changing the makefile entry to 3.16 was on Sunday, Aug 
3, at 15:25:02 -0700.

The last pre-3.16 commit was a merge of two timer-related fixes from the 
tip-tree at 9:58:20 -0700 that morning.

So where does your "still in development" come from?

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Jose Ildefonso Camargo Tolosa
On Sat, Aug 9, 2014 at 12:01 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 11:06:37 -0500 as
> excerpted:
>
>> 3.16 (still in development)
>
> ??
>
> 3.16 has been out for nearly a week now and we're nearing half-way thru
> the 3.17 commit-window.  Based on the kernel git I have here, Linus'
> commit officially changing the makefile entry to 3.16 was on Sunday, Aug
> 3, at 15:25:02 -0700.
>
> The last pre-3.16 commit was a merge of two timer-related fixes from the
> tip-tree at 9:58:20 -0700 that morning.
>
> So where does your "still in development" come from?


Well, maybe not the right word, but here is what kernel.org says about
mainline kernels:

"Mainline tree is maintained by Linus Torvalds. It's the tree where
all new features are introduced and where all the exciting new
development happens. New mainline kernels are released every 2-3
months."

So, there you go: "all new features are introduced" and "where all the
exciting new development happens".

So... development is quite active on mainline kernels.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums

2014-08-09 Thread Filipe Manana
Under rare circumstances we can end up leaving 2 versions of a checksum
for the same file extent range.

The reason for this is that after calling btrfs_next_leaf() we process
slot 0 of the leaf it returns, instead of processing the slot set in
path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
btrfs_next_leaf() releases the path and before it searches for the next
leaf, another task might cause a split of the next leaf, which migrates
some of its keys to the leaf we were processing before calling
btrfs_next_leaf(). In this case btrfs_next_leaf() returns the same leaf
again, but with path->slots[0] set to the slot of the first new key it
received, that is, a slot number that didn't exist before calling
btrfs_next_leaf(), as the leaf now has more keys than it had before. So
we must always process the returned leaf starting at path->slots[0], as
it isn't always 0, and the key at slot 0 can have an offset much lower
than our search offset/bytenr.
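To make the slot mix-up concrete, here is a toy userspace model (not btrfs code; `struct toy_path` and both helpers are hypothetical). A leaf is modelled as a sorted array of csum key offsets, using the offsets from the example scenario in this message with the "..." items left out: after the concurrent push_leaf_left(), btrfs_next_leaf() hands back leaf N with the slot pointing at the migrated key, so reading slot 0 yields an offset below the search start of 40157184.

```c
#include <stdint.h>

/*
 * Toy stand-in for a btrfs path: the key offsets of one leaf, sorted
 * ascending, and the slot that btrfs_next_leaf() left in path->slots[0].
 */
struct toy_path {
    const uint64_t *keys;
    int slot;
};

/* Old behaviour: always resume at slot 0 of the returned leaf. */
uint64_t resume_key_buggy(const struct toy_path *p)
{
    return p->keys[0];
}

/* Fixed behaviour: resume at the slot recorded in the path. */
uint64_t resume_key_fixed(const struct toy_path *p)
{
    return p->keys[p->slot];
}
```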

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

  Leaf N:

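    slot = 0                           slot = btrfs_header_nritems() - 1
  |-------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
  |-------------------------------------------------------------------|

  Leaf N + 1:

    slot = 0                            slot = btrfs_header_nritems() - 1
  |--------------------------------------------------------------------|
  | [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
  |--------------------------------------------------------------------|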

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
find the next highest key, which releases the current path and then searches
for that next key. However, after releasing the path and before finding that
next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
btrfs_next_leaf() returns a path pointing to leaf N again, but with the slot
pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
is then:

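    slot = 0                           slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
  |----------------------------------------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
  |----------------------------------------------------------------------------------------------------|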

And incorrectly using slot 0 makes us set next_offset to 39239680 and jump
into the insert: label, which will set tmp to:

tmp = min((sums->len - total_bytes) >> blocksize_bits,
          (next_offset - file_key.offset) >> blocksize_bits) =
      min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
      min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and

   ins_size = csum_size * tmp = 4 * 4 = 16 bytes.
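The wraparound above can be reproduced in userspace with `uint64_t` standing in for the kernel's u64. This is a sketch of the `tmp` formula only, with names borrowed from the commit message; it is not the file-item.c code, and `csum_insert_blocks` is a hypothetical name. With next_offset taken from slot 0 (39239680) the subtraction wraps and min() picks all 4 blocks; with the key at path->slots[0] (40161280) the result is capped at 1 block.

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Number of checksum blocks the insert: path would cover, following
 * tmp = min(...) from the commit message. All math is unsigned 64-bit,
 * so next_offset < file_offset wraps around instead of going negative.
 */
uint64_t csum_insert_blocks(uint64_t sums_len, uint64_t total_bytes,
                            uint64_t next_offset, uint64_t file_offset,
                            unsigned int blocksize_bits)
{
    return min_u64((sums_len - total_bytes) >> blocksize_bits,
                   (next_offset - file_offset) >> blocksize_bits);
}
```

Multiplying the result by a 4-byte csum_size gives the 16-byte ins_size in the buggy case.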

In other words, we insert a new csum item in the tree with key
(CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
for all the data (4 blocks of 4096 bytes each = sums->len). This is wrong,
because the item with key (CSUM CSUM 40161280) (the one that was moved from
leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
bytes of our data, and those old checksums won't get removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
and breaks the logical rule:

   Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get
the checksum of any of the blocks with logical offset of 40161280, 40165376
or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.
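The rule above can be checked mechanically. A minimal sketch over a simplified in-memory list of csum items (the `struct csum_range` type and `csum_ranges_ok` are hypothetical), assuming 4-byte crc32c checksums over 4 KiB blocks as in the example, so the old item of size 32 at 40161280 covers 8 blocks, i.e. 32768 bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified csum item: key offset and bytes of data covered. */
struct csum_range {
    uint64_t offset;   /* key offset: logical bytenr of the first block */
    uint64_t len;      /* data bytes covered = item_size / csum_size * blocksize */
};

/*
 * Returns 1 if items sorted by offset satisfy
 * Key_N+1.offset >= Key_N.offset + length covered by item N, else 0.
 */
int csum_ranges_ok(const struct csum_range *items, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        if (items[i + 1].offset < items[i].offset + items[i].len)
            return 0;
    return 1;
}
```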

Cc: sta...@vger.kernel.org
Signed-off-by: Filipe Manana <fdman...@suse.com>
---
 fs/btrfs/file-item.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index a1f97de..7897dcd 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -746,7 +746,7 @@ again:
 			found_next = 1;
 			if (ret != 0)
 				goto insert;
-			slot = 0;
+			slot = path->slots[0];
 		}
 		btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
 		if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
-- 
1.9.1



Re: [PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums

2014-08-09 Thread Josef Bacik
I'm getting on a plane right now to kiss you, be prepared.  Thanks,

Josef


Re: [PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums

2014-08-09 Thread Marc MERLIN
On Sat, Aug 09, 2014 at 09:22:27PM +0100, Filipe Manana wrote:

(100 lines of detailed explanations snipped)

> -			slot = 0;
> +			slot = path->slots[0];

And this is why, trying to rank kernel contributions by number of
lines or characters is a very poor guide of the actual work accomplished
and owed credit.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: [PATCH] Btrfs: fix csum tree corruption, duplicate and outdated checksums

2014-08-09 Thread Chris Mason


On 08/09/2014 04:22 PM, Filipe Manana wrote:
> Under rare circumstances we can end up leaving 2 versions of a checksum
> for the same file extent range.
>
> The reason for this is that after calling btrfs_next_leaf we process
> slot 0 of the leaf it returns, instead of processing the slot set in
> path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
> btrfs_next_leaf() releases the path and before it searches for the next
> leaf, another task might cause a split of the next leaf, which migrates
> some of its keys to the leaf we were processing before calling
> btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
> same leaf but with path->slots[0] having a slot number corresponding
> to the first new key it got, that is, a slot number that didn't exist
> before calling btrfs_next_leaf(), as the leaf now has more keys than
> it had before. So we must really process the returned leaf starting at
> path->slots[0] always, as it isn't always 0, and the key at slot 0 can
> have an offset much lower than our search offset/bytenr.

And the bug goes all the way back to 2007.  I'd like to blame Yan Zheng,
but it was in my original code too.

Great find and explanation, I've added this to my merge window pull.
Thanks!

-chris


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Jose Ildefonso Camargo Tolosa
3.14.16 test is on its way, it already started with this:

[19732.769100] BTRFS: device fsid 7356e329-62ba-49fb-83cc-f6b91ac3b581 devid 1 transid 111580 /dev/sdb1
[19732.769429] BTRFS info (device sdb1): enabling auto recovery
[19732.769433] BTRFS info (device sdb1): force clearing of disk cache
[20050.137779] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[20050.139361]   Not tainted 3.14.16-031416-generic #201408072035
[20050.140704] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20050.142422] btrfs-transacti D 818118e0 0  7353  2 0x
[20050.142430]  880450afddc8 0002 880450afdd68 880450afdfd8
[20050.142434]  00014500 00014500 88046985e380 8804602018e0
[20050.142437]  880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[20050.142440] Call Trace:
[20050.142447]  [8175b0c9] schedule+0x29/0x70
[20050.142473]  [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[20050.142482]  [810b4e10] ? __wake_up_sync+0x20/0x20
[20050.142493]  [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[20050.142504]  [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[20050.142507]  [8108fd89] kthread+0xc9/0xe0
[20050.142509]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20050.142513]  [817681bc] ret_from_fork+0x7c/0xb0
[20050.142515]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20170.194168] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[20170.195747]   Not tainted 3.14.16-031416-generic #201408072035
[20170.197090] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20170.198815] btrfs-transacti D 818118e0 0  7353  2 0x
[20170.198820]  880450afddc8 0002 880450afdd68 880450afdfd8
[20170.198822]  00014500 00014500 88046985e380 8804602018e0
[20170.198824]  880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[20170.198824] Call Trace:
[20170.198831]  [8175b0c9] schedule+0x29/0x70
[20170.198856]  [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[20170.198861]  [810b4e10] ? __wake_up_sync+0x20/0x20
[20170.198875]  [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[20170.198886]  [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[20170.198889]  [8108fd89] kthread+0xc9/0xe0
[20170.198891]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20170.198895]  [817681bc] ret_from_fork+0x7c/0xb0
[20170.198897]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20290.250561] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[20290.252140]   Not tainted 3.14.16-031416-generic #201408072035
[20290.253483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20290.282212] btrfs-transacti D 818118e0 0  7353  2 0x
[20290.282216]  880450afddc8 0002 880450afdd68 880450afdfd8
[20290.282219]  00014500 00014500 88046985e380 8804602018e0
[20290.282221]  880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[20290.282221] Call Trace:
[20290.282227]  [8175b0c9] schedule+0x29/0x70
[20290.282253]  [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[20290.282262]  [810b4e10] ? __wake_up_sync+0x20/0x20
[20290.282272]  [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[20290.282283]  [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[20290.282286]  [8108fd89] kthread+0xc9/0xe0
[20290.282289]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[20290.282292]  [817681bc] ret_from_fork+0x7c/0xb0
[20290.282294]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0


I'll allow it to run for a few hours, and then will report.

On a side note, I ran 'btrfs check' and it returned so many errors that
they scrolled out of my console's history... unfortunately I didn't
redirect its output to a file (big mistake); I didn't think it would be
so big. Anyway, part of the output:

( older output lost due to term size )
root 5 inode 94906683 errors 200, dir isize wrong
root 5 inode 94906716 errors 200, dir isize wrong
root 5 inode 94906730 errors 200, dir isize wrong
root 5 inode 94906735 errors 200, dir isize wrong
root 5 inode 94906758 errors 200, dir isize wrong
(...)
root 5 inode 94928259 errors 200, dir isize wrong
root 5 inode 94928286 errors 200, dir isize wrong
root 5 inode 94928311 errors 200, dir isize wrong
root 5 inode 94928321 errors 200, dir isize wrong
root 5 inode 133964681 errors 200, dir isize wrong
root 5 inode 133964684 errors 200, dir isize wrong
root 5 inode 142590710 errors 200, dir isize wrong
root 5 inode 144973646 errors 200, dir isize wrong
root 5 inode 146401067 errors 100, file extent discount
root 5 inode 146401080 errors 100, file extent discount
root 5 inode 

Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Jose Ildefonso Camargo Tolosa
And it is still going. Although the hung task messages stopped long ago
(behavior similar to 3.15), it hasn't finished mounting: mount is still
taking 100% CPU, *and* I can't see any disk activity at all.
Last hung task message:

[21131.749759] INFO: task btrfs-transacti:7353 blocked for more than 120 seconds.
[21131.828755]   Not tainted 3.14.16-031416-generic #201408072035
[21131.868788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21131.947525] btrfs-transacti D 818118e0 0  7353  2 0x
[21131.947530]  880450afddc8 0002 880450afdd68 880450afdfd8
[21131.947535]  00014500 00014500 88046985e380 8804602018e0
[21131.947540]  880450afddd8 8808642fc000 8802aa5b8800 880450afde00
[21131.947544] Call Trace:
[21131.947551]  [8175b0c9] schedule+0x29/0x70
[21131.947577]  [a01040ed] btrfs_commit_transaction+0x25d/0xa00 [btrfs]
[21131.947581]  [810b4e10] ? __wake_up_sync+0x20/0x20
[21131.947591]  [a0101e45] transaction_kthread+0x1d5/0x250 [btrfs]
[21131.947601]  [a0101c70] ? open_ctree+0x20d0/0x20d0 [btrfs]
[21131.947604]  [8108fd89] kthread+0xc9/0xe0
[21131.947606]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0
[21131.947610]  [817681bc] ret_from_fork+0x7c/0xb0
[21131.947612]  [8108fcc0] ? flush_kthread_worker+0xb0/0xb0

Do you think I will have better luck with 3.16? Or maybe this filesystem
has so many errors (remember the btrfs check output) that it will take a
really long time to mount because it is trying to correct them?

Thanks!

Ildefonso


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Duncan
Marc MERLIN posted on Sat, 09 Aug 2014 11:21:13 -0700 as excerpted:

> You could argue that since 3.16.0 does not have the recently found
> deadlock patch that's been plaguing 15 and 16 (14 not as much for me),
> it's not usable for some (it ran about 1 day on my laptop before
> deadlocking, and maybe an hour at most on my server).
>
> I sure hope that deadlock patch is going to be added to the 3.16.x tree;
> I'm not super stoked about being stuck at 3.14.

Well, yes.

It'll almost certainly make it into the stable series, including 3.16.x,
shortly after it lands in the 3.17 development tree. But the switch to
worker threads only happened with 3.15, so nothing earlier needs it (which
is why 3.14 works well for you; previous versions had other bugs). 3.15
isn't a long-term-stable series, and Greg KH has already warned that
3.15.9, released just Friday, is its penultimate release and that people
should be thinking about switching to 3.16. Since pre-3.15 kernels don't
need the patch and it's questionable whether it will make 3.15.10, the
last 3.15-series release, 3.17-development or presumably 3.16.1 or 3.16.2
looks to be the soonest it can possibly arrive for people not willing to
cherry-pick the patch from the list as soon as it's posted.

FWIW, 3.15 (where I didn't have time to try the development series and 
only upgraded about the time it came out) and the 3.16 development series, 
including the 3.16.0 release, have worked well enough for me.  But my 
btrfs filesystems are all on ssd, the ones I regularly mount all being 
raid1 pairs, and apparently, on my 6-core at least, the bug is hard enough 
to trigger on ssd that I don't routinely push them hard enough to have 
seen it.  That explains why I've not had problems with 3.15 and the 3.16 
series up thru the 3.16.0 release, beyond one instance either right about 
the 3.15 release or in 3.14, which might have been a one-off, as it 
certainly was for me.

While the problem has been traced well enough that we know what it is, 
I'm not sure a full patch for it has been posted to the list yet.  Has 
it?  I think it was nailed down too late in the week to prepare and 
pre-post-test a patch before the weekend.  So I'd expect to see the patch 
on the list Tuesday or so, just in time to make the last bit of the 3.17 
commit window (tho it's a stable-candidate fix, so it could go in later 
as well), but likely too late for 3.15.10 and 3.16.1, so it'll likely be 
3.17-rc1 or 3.16.2.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Duncan
Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 13:38:46 -0500 as
excerpted:

 On Sat, Aug 9, 2014 at 12:01 PM, Duncan 1i5t5.dun...@cox.net wrote:
 Jose Ildefonso Camargo Tolosa posted on Sat, 09 Aug 2014 11:06:37 -0500
 as excerpted:

 3.16 (still in development)

 ??

 3.16 has been out for nearly a week now and we're nearing half-way thru
 the 3.17 commit-window.  Based on the kernel git I have here, Linus'
 commit officially changing the makefile entry to 3.16 was on Sunday,
 Aug 3, at 15:25:02 -0700.

 The last pre-3.16 commit was a merge of two timer-related fixes from
 the tip-tree at 9:58:20 -0700 that morning.

 So where does your still in development come from?


 Well, maybe not the right word, but here is what kernel.org says about
 mainline kernels:
 
 Mainline tree is maintained by Linus Torvalds. It's the tree where all
 new features are introduced and where all the exciting new development
 happens. New mainline kernels are released every 2-3 months.
 
 So, there you go: all new features are introduced, and where all the
 exciting new development happens.
 
 So... development is quite active on mainline kernels.

But 3.16.0 is out, and the really active development happens in the 
commit window before rc1.  A kernel doesn't really /start/ settling down 
until rc3 or so, and isn't reasonably stable until rc5 or so.  (rc5 is a 
little late to start testing if you want the bugs you report fixed by 
release; it's really best to start testing around rc3, at which point any 
really bad data-eating bugs should be either fixed or at least published, 
so the risk is dramatically lower than during the commit window itself.)  
But from rc5 on thru rc7 or 8 and
release, unless you're one of the ones still waiting on a bug found 
earlier to be fixed, it's generally quite stable and boring.

So by the time of actual .0 release, it really is quite stable, and no 
longer development kernel.  Sure, Greg KH's stable series kernel releases 
stabilize it further, but that's exactly what they are, stable series, 
not development series, and there's really no development going into it 
generally from rc1 on, tho occasionally something that needs to come 
after everything else is slipped in in the first couple days after rc1, 
but still well before rc2, and the .0 release signifies the end of the 
post development stabilization period such that .0 really is no longer a 
development kernel at all, even if there are a few more weekly stable-
series updates (about 10, 3.15.10 was announced to be the last one for 
3.15, with the Friday-released 3.15.9) before support ceases if it's not 
a long-term-stable candidate.




Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Mitch Harder
On Sat, Aug 9, 2014 at 11:21 PM, Duncan 1i5t5.dun...@cox.net wrote:

   But from rc5 on thru rc7 or 8 and
 release, unless you're one of the ones still waiting on a bug found
 earlier to be fixed, it's generally quite stable and boring.

 So by the time of actual .0 release, it really is quite stable, and no
 longer development kernel.  Sure, Greg KH's stable series kernel releases
 stabilize it further, but that's exactly what they are, stable series,
 not development series, and there's really no development going into it
 generally from rc1 on, tho occasionally something that needs to come
 after everything else is slipped in in the first couple days after rc1,
 but still well before rc2, and the .0 release signifies the end of the
 post development stabilization period such that .0 really is no longer a
 development kernel at all, even if there are a few more weekly stable-
 series updates (about 10, 3.15.10 was announced to be the last one for
 3.15, with the Friday-released 3.15.9) before support ceases if it's not
 a long-term-stable candidate.


I can't say I've observed that to be the case with Btrfs.  I know
there is a core group of developers working very hard on testing the
Btrfs updates in the _rc kernels, but once that .0 kernel hits the
streets, the extra exposure to all the various combinations of
hardware and options has been known to discover new issues.  I think
this is nearly unavoidable given the pace of Btrfs development.