[PATCH] btrfs: Handle uninitialised inode eviction

2016-06-28 Thread Nikolay Borisov
The code flow in btrfs_new_inode allows for btrfs_evict_inode to be
called with a not fully initialised inode (e.g. the ->root member not
being set). This can happen when btrfs_set_inode_index in
btrfs_new_inode fails, which in turn calls iput for the newly
allocated inode; the VFS then calls into btrfs_evict_inode, leading to
a null pointer dereference. To handle this situation, check whether the
passed inode has ->root set and just free it in case it doesn't.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/inode.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

Hello, 

I believe this fixes the issue reported in 
http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809
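
To illustrate the failure mode in user-space terms, here is a simplified
sketch (not the kernel code): an object gets torn down before all of its
fields are initialised, so the teardown path has to tolerate a NULL
member, which is what the hunk below adds to btrfs_evict_inode.

#include <stdlib.h>

struct my_root;                 /* opaque, stands in for btrfs_root */

struct my_inode {
        struct my_root *root;   /* set late during initialisation */
};

static void my_evict(struct my_inode *inode)
{
        if (!inode->root) {     /* partially initialised: just free it */
                free(inode);
                return;
        }
        /* normal eviction work that dereferences inode->root ... */
        free(inode);
}

int main(void)
{
        struct my_inode *inode = calloc(1, sizeof(*inode));

        if (!inode)
                return 1;
        /* pretend btrfs_set_inode_index() failed before ->root was set */
        my_evict(inode);        /* must not crash on a NULL ->root */
        return 0;
}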

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954720b8..b51723811d01 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5159,11 +5159,18 @@ void btrfs_evict_inode(struct inode *inode)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_block_rsv *rsv, *global_rsv;
int steal_from_global = 0;
-   u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
+   u64 min_size;
int ret;
 
trace_btrfs_inode_evict(inode);
 
+   if (!root) {
+   kmem_cache_free(btrfs_inode_cachep, BTRFS_I(inode));
+   return;
+   }
+
+   min_size = btrfs_calc_trunc_metadata_size(root, 1);
+
evict_inode_truncate_pages(inode);
 
if (inode->i_nlink &&
-- 
2.5.0



[PATCH 1/2] btrfs: fix fsfreeze hang caused by delayed iputs deal

2016-06-28 Thread Wang Xiaoguang
When running fstests generic/068, we sometimes got the WARNING below:
  xfs_io  D 8800331dbb20 0  6697   6693 0x0080
  8800331dbb20 88007acfc140 880034d895c0 8800331dc000
  880032d243e8 fffe 880032d24400 0001
  8800331dbb38 816a9045 880034d895c0 8800331dbba8
  Call Trace:
  [] schedule+0x35/0x80
  [] rwsem_down_read_failed+0xf2/0x140
  [] ? __filemap_fdatawrite_range+0xd1/0x100
  [] call_rwsem_down_read_failed+0x18/0x30
  [] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
  [] percpu_down_read+0x35/0x50
  [] __sb_start_write+0x2c/0x40
  [] start_transaction+0x2a5/0x4d0 [btrfs]
  [] btrfs_join_transaction+0x17/0x20 [btrfs]
  [] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
  [] evict+0xba/0x1a0
  [] iput+0x196/0x200
  [] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
  [] btrfs_commit_transaction+0x928/0xa80 [btrfs]
  [] btrfs_freeze+0x30/0x40 [btrfs]
  [] freeze_super+0xf0/0x190
  [] do_vfs_ioctl+0x4a5/0x5c0
  [] ? do_audit_syscall_entry+0x66/0x70
  [] ? syscall_trace_enter_phase1+0x11f/0x140
  [] SyS_ioctl+0x79/0x90
  [] do_syscall_64+0x62/0x110
  [] entry_SYSCALL64_slow_path+0x25/0x25

From this warning, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() will call btrfs_commit_transaction() again. If
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it'll call start_transaction(), which will try to take the SB_FREEZE_FS
lock again, and then a deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not make
sure all metadata is updated. See the race window below in freeze_super():
sync_filesystem(sb);
|
| race window
| In this period, cleaner_kthread() may be scheduled to
| run, and it calls btrfs_delete_unused_bgs(), which will
| add some delayed iputs.
|
sb->s_writers.frozen = SB_FREEZE_FS;
sb_wait_write(sb, SB_FREEZE_FS);
if (sb->s_op->freeze_fs) {
/* freeze_fs will call btrfs_commit_transaction() */
ret = sb->s_op->freeze_fs(sb);

So if btrfs is doing a freeze job, we should block
btrfs_delete_unused_bgs() to avoid adding delayed iputs.

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/disk-io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 863bf7a..fdbe0df 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1846,8 +1846,11 @@ static int cleaner_kthread(void *arg)
 * after acquiring fs_info->delete_unused_bgs_mutex. So we
 * can't hold, nor need to, fs_info->cleaner_mutex when deleting
 * unused block groups.
+*
 */
+   __sb_start_write(root->fs_info->sb, SB_FREEZE_WRITE, true);
btrfs_delete_unused_bgs(root->fs_info);
+   __sb_end_write(root->fs_info->sb, SB_FREEZE_WRITE);
 sleep:
if (!again) {
set_current_state(TASK_INTERRUPTIBLE);
-- 
2.9.0





[PATCH 2/2] btrfs: fix free space calculation in dump_space_info()

2016-06-28 Thread Wang Xiaoguang
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8550a0e..520ba8f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7747,8 +7747,8 @@ static void dump_space_info(struct btrfs_space_info 
*info, u64 bytes,
printk(KERN_INFO "BTRFS: space_info %llu has %llu free, is %sfull\n",
   info->flags,
   info->total_bytes - info->bytes_used - info->bytes_pinned -
-  info->bytes_reserved - info->bytes_readonly,
-  (info->full) ? "" : "not ");
+  info->bytes_reserved - info->bytes_readonly -
+  info->bytes_may_use, (info->full) ? "" : "not ");
printk(KERN_INFO "BTRFS: space_info total=%llu, used=%llu, pinned=%llu, 
"
   "reserved=%llu, may_use=%llu, readonly=%llu\n",
   info->total_bytes, info->bytes_used, info->bytes_pinned,
-- 
2.9.0
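
For reference, the corrected free-space figure is just the total minus all
of the accounted fields, including the bytes_may_use term this patch adds;
a standalone sketch with made-up values (field names follow the
btrfs_space_info members used in the diff):

#include <stdio.h>

int main(void)
{
        unsigned long long total_bytes    =  10ULL << 30;  /* 10 GiB of chunk space */
        unsigned long long bytes_used     =   6ULL << 30;
        unsigned long long bytes_pinned   =   1ULL << 30;
        unsigned long long bytes_reserved = 512ULL << 20;
        unsigned long long bytes_readonly = 256ULL << 20;
        unsigned long long bytes_may_use  = 768ULL << 20;  /* the missing term */

        unsigned long long free_bytes = total_bytes - bytes_used - bytes_pinned -
                                        bytes_reserved - bytes_readonly -
                                        bytes_may_use;

        printf("space_info has %llu free\n", free_bytes);  /* 1610612736 (1.5 GiB) */
        return 0;
}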





Re: Kernel bug during RAID1 replace

2016-06-28 Thread Chris Murphy
On Tue, Jun 28, 2016 at 4:52 PM, Saint Germain  wrote:

> Well I made a ddrescue image of both drives (only one error on sdb
> during ddrescue copy) and started the computer again (after
> disconnecting the old drives).

What was the error? Any kernel message at the time of this error?



> I don't know if I should continue trying to repair this RAID1 or if I
> should just cp/rsync to a new BTRFS volume and get done with it.

Well, for sure you should already prepare to lose this volume, so
whatever backup you need, do it yesterday.

> On the other hand it seems interesting to repair instead of just giving
> up. It gives a good look at BTRFS resiliency/reliability.

On the one hand Btrfs shouldn't become inconsistent in the first
place, that's the design goal. On the other hand, I'm finding from the
problems reported on the list that Btrfs increasingly mounts at least
read only and allows getting data off, even when the file system isn't
fully functional or repairable.

In your case, once there are metadata problems even with raid 1, it's
difficult at best. But once you have the backup you could try some
other things once it's certain the hardware isn't adding to the
problems, which I'm still not yet certain of.



>
> Here is the log from the mount to the scrub aborting and the result
> from smartctl.
>
> Thanks for your precious help so far.
>
>
> BTRFS error (device sdb1): cleaner transaction attach returned -30

Not sure what this is. The Btrfs cleaner is used to remove snapshots,
decrement extent reference count, and if the count is 0, then free up
that space. So, why is it running? I don't know what -30 means.


> BTRFS info (device sdb1): disk space caching is enabled
> BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 
> 7928, corrupt 1714507, gen 1335
> BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
> 21622, gen 24

I missed something the first time around in these messages: the
generation error. Both drives have generation errors. A generation
error on a single drive means that drive was not successfully being
written to or was missing. For it to happen on both drives is bad. If
it happens to just one drive, once it reappears it will be passively
caught up to the other one as reads happen, but best practice for now
requires the user to run scrub or balance. If that doesn't happen and
a 2nd drive vanishes or has write errors that cause generation
mismatches, now both drives are simultaneously behind and ahead of
each other. Some commits went to one drive, some went to the other.
And right now Btrfs totally flips out and will irreparably get
corrupted.

So I have to ask: was this volume ever mounted degraded? If not, you
really need to look at the logs and find out why the drives weren't being
written to. sdb shows lots of write, flush, corruption and generation
errors, so it seems like it was having a hardware issue. But then sda
has only corruptions and generation problems, as if it wasn't even
connected or powered on.

Or another possibility is that one of the drives was previously cloned
(block copied) or snapshotted via LVM, and you ran into the block-level
copies gotcha:
https://btrfs.wiki.kernel.org/index.php/Gotchas



> BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev 
> /dev/sdb1, sector 54528696, root 5, inode 3434831, offset 479232, length 
> 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)

Some extent data and its checksum don't match, on sdb. So this file is
considered corrupt. Maybe the data is OK and the checksum is wrong?

> btrfs_dev_stat_print_on_error: 164 callbacks suppressed
> BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 
> 7928, corrupt 1714508, gen 1335
> scrub_handle_errored_block: 164 callbacks suppressed
> BTRFS error (device sdb1): unable to fixup (regular) error at logical 
> 93445255168 on dev /dev/sdb1

And it can't be fixed, because...

> BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev 
> /dev/sda1, sector 77669048, root 5, inode 3434831, offset 479232, length 
> 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)


The same block on sda also doesn't match its checksum. So either both
checksums are wrong, or both copies of the data are wrong.

You can make these errors "go away" by using btrfs check --repair
--init-csum-tree, but what this does is totally paper over any
real corruptions. You will have no idea if the files are really corrupt or
not without checking them. Looks like most of the messages have to do
with files, not metadata, although I didn't look at every single line.

I think the generations between the two drives are too far off for them
to be put back together again. But if the --init-csum-tree starts to
clean up the data related errors, you could use rsync -c to compare
the files to a backup and see if they are the same and further inspect
to see if they're corrupt or not.

You definitely don't want corrup

Re: Bug in 'btrfs filesystem du' ?

2016-06-28 Thread Andrei Borzenkov
28.06.2016 20:20, Andrei Borzenkov wrote:
> 28.06.2016 19:55, Henk Slager wrote:
>> On Tue, Jun 28, 2016 at 2:56 PM, M G Berberich  
>> wrote:
>>> Hello,
>>>
>>> Am Montag, den 27. Juni schrieb Henk Slager:
 On Mon, Jun 27, 2016 at 3:33 PM, M G Berberich  
 wrote:
> Am Montag, den 27. Juni schrieb M G Berberich:
>> after a balance ‘btrfs filesystem du’ probably shows false data about
>> shared data.
>
> Oh, I forgot: I have btrfs-progs v4.5.2 and kernel 4.6.2.

 With  btrfs-progs v4.6.1 and kernel 4.7-rc5, the numbers are correct
 about shared data.
>>>
>>> I tested with kernels 4.6.3 and 4.7-rc5 and with btrfs-progs 4.5.2 and
>>> 4.61.
>> Also with kernel 4.6.2-1-default and btrfs-progs v4.5.3+20160516
>> (current stock opensuse tumbleweed) I cannot reproduce the problem.
>>
> 
> I confirm the same behavior on openSUSE Tumbleweed with kernel 4.6.2-1.2
> and btrfsprogs 4.5.3-1.2 using provided script.
> 

I realized that it sounded ambiguous. I do see the reported bug on
openSUSE Tumbleweed using the same versions as you. After rebalance
shared data disappears completely in du output.



[PATCH 21/20] xfs/128: cycle_mount the scratch device, not the test device

2016-06-28 Thread Darrick J. Wong
This test uses the scratch device, so cycle that, not the test dev.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/128 |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/xfs/128 b/tests/xfs/128
index 8758d7e..2e756d5 100755
--- a/tests/xfs/128
+++ b/tests/xfs/128
@@ -66,7 +66,7 @@ _pwrite_byte 0x61 0 $((blks * blksz)) $testdir/file1 >> 
$seqres.full
 _cp_reflink $testdir/file1 $testdir/file2
 _cp_reflink $testdir/file2 $testdir/file3
 _cp_reflink $testdir/file3 $testdir/file4
-_test_cycle_mount
+_scratch_cycle_mount
 free_blocks1=$(stat -f $testdir -c '%f')
 
 md5sum $testdir/file1 | _filter_scratch
@@ -82,7 +82,7 @@ c04=$(_md5_checksum $testdir/file4)
 echo "CoW the reflink copies"
 _pwrite_byte 0x62 $blksz $blksz $testdir/file2 >> $seqres.full
 _pwrite_byte 0x63 $(( blksz * (blks - 1) )) $blksz $testdir/file3 >> 
$seqres.full
-_test_cycle_mount
+_scratch_cycle_mount
 free_blocks2=$(stat -f $testdir -c '%f')
 
 md5sum $testdir/file1 | _filter_scratch
@@ -97,11 +97,12 @@ c14=$(_md5_checksum $testdir/file4)
 
 echo "Defragment"
 lsattr -l $testdir/ | _filter_scratch | _filter_spaces
+filefrag -v $testdir/file* >> $seqres.full
 $XFS_FSR_PROG -v -d $testdir/file1 >> $seqres.full
 $XFS_FSR_PROG -v -d $testdir/file2 >> $seqres.full # fsr probably breaks the 
link
 $XFS_FSR_PROG -v -d $testdir/file3 >> $seqres.full # fsr probably breaks the 
link
 $XFS_FSR_PROG -v -d $testdir/file4 >> $seqres.full # fsr probably ignores this 
file
-_test_cycle_mount
+_scratch_cycle_mount
 free_blocks3=$(stat -f $testdir -c '%f')
 
 md5sum $testdir/file1 | _filter_scratch


[PATCH] Btrfs: fix read_node_slot to return errors

2016-06-28 Thread Liu Bo
We use read_node_slot() to read a btree node, and it has two failure cases:
a) the slot is out of range, which means 'no such entry'
b) we fail to read the block, due to a checksum failure, corrupted
   content, or a block that is not marked uptodate.
But we're currently returning NULL in both cases. This change makes it
return -ENOENT in case a) and -EIO in case b), and fixes its callers,
as well as btrfs_search_forward()'s caller, to catch the new errors.

The problem was reported by Peter Becker, and I managed to hit
the same BUG_ON by mounting my fuzz image.

Reported-by: Peter Becker 
Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c| 63 +
 fs/btrfs/tree-log.c |  4 
 2 files changed, 48 insertions(+), 19 deletions(-)
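
For anyone unfamiliar with the idiom this switches to: instead of a bare
NULL, the errno is encoded in the returned pointer via ERR_PTR() /
IS_ERR() / PTR_ERR() (as in <linux/err.h>), so callers can tell 'no such
entry' (-ENOENT) apart from a failed read (-EIO). A minimal user-space
sketch of the pattern (an assumed simplification, not the btrfs code):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)     { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int fake_block = 42;     /* stands in for an extent buffer */

/* stand-in for read_node_slot(): out-of-range slot vs. failed read */
static void *read_slot(int slot, int nritems, int read_ok)
{
        if (slot < 0 || slot >= nritems)
                return ERR_PTR(-ENOENT);
        if (!read_ok)
                return ERR_PTR(-EIO);
        return &fake_block;
}

int main(void)
{
        void *eb = read_slot(5, 3, 1);  /* slot out of range */

        if (IS_ERR(eb))
                printf("error: %ld\n", PTR_ERR(eb));  /* prints -2 (ENOENT) */
        return 0;
}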

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a85cf7d..63510e0 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1858,7 +1858,6 @@ static void root_sub_used(struct btrfs_root *root, u32 
size)
 
 /* given a node and slot number, this reads the blocks it points to.  The
  * extent buffer is returned with a reference taken (but unlocked).
- * NULL is returned on error.
  */
 static noinline struct extent_buffer *read_node_slot(struct btrfs_root *root,
   struct extent_buffer *parent, int slot)
@@ -1866,19 +1865,16 @@ static noinline struct extent_buffer 
*read_node_slot(struct btrfs_root *root,
int level = btrfs_header_level(parent);
struct extent_buffer *eb;
 
-   if (slot < 0)
-   return NULL;
-   if (slot >= btrfs_header_nritems(parent))
-   return NULL;
+   if (slot < 0 || slot >= btrfs_header_nritems(parent))
+   return ERR_PTR(-ENOENT);
 
BUG_ON(level == 0);
 
eb = read_tree_block(root, btrfs_node_blockptr(parent, slot),
 btrfs_node_ptr_generation(parent, slot));
-   if (IS_ERR(eb) || !extent_buffer_uptodate(eb)) {
-   if (!IS_ERR(eb))
-   free_extent_buffer(eb);
-   eb = NULL;
+   if (!IS_ERR(eb) && !extent_buffer_uptodate(eb)) {
+   free_extent_buffer(eb);
+   eb = ERR_PTR(-EIO);
}
 
return eb;
@@ -1931,8 +1927,8 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
 
/* promote the child to a root */
child = read_node_slot(root, mid, 0);
-   if (!child) {
-   ret = -EROFS;
+   if (IS_ERR(child)) {
+   ret = PTR_ERR(child);
btrfs_handle_fs_error(root->fs_info, ret, NULL);
goto enospc;
}
@@ -1970,6 +1966,9 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
return 0;
 
left = read_node_slot(root, parent, pslot - 1);
+   if (IS_ERR(left))
+   left = NULL;
+
if (left) {
btrfs_tree_lock(left);
btrfs_set_lock_blocking(left);
@@ -1980,7 +1979,11 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
goto enospc;
}
}
+
right = read_node_slot(root, parent, pslot + 1);
+   if (IS_ERR(right))
+   right = NULL;
+
if (right) {
btrfs_tree_lock(right);
btrfs_set_lock_blocking(right);
@@ -2135,6 +2138,8 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
return 1;
 
left = read_node_slot(root, parent, pslot - 1);
+   if (IS_ERR(left))
+   left = NULL;
 
/* first, try to make some room in the middle buffer */
if (left) {
@@ -2185,6 +2190,8 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
free_extent_buffer(left);
}
right = read_node_slot(root, parent, pslot + 1);
+   if (IS_ERR(right))
+   right = NULL;
 
/*
 * then try to empty the right most buffer into the middle
@@ -3773,7 +3780,11 @@ static int push_leaf_right(struct btrfs_trans_handle 
*trans, struct btrfs_root
btrfs_assert_tree_locked(path->nodes[1]);
 
right = read_node_slot(root, upper, slot + 1);
-   if (right == NULL)
+   /*
+* slot + 1 is not valid or we fail to read the right node,
+* no big deal, just return.
+*/
+   if (IS_ERR(right))
return 1;
 
btrfs_tree_lock(right);
@@ -4003,7 +4014,11 @@ static int push_leaf_left(struct btrfs_trans_handle 
*trans, struct btrfs_root
btrfs_assert_tree_locked(path->nodes[1]);
 
left = read_node_slot(root, path->nodes[1], slot - 1);
-   if (left == NULL)
+   /*
+* slot - 1 is not valid or we fail to read the left node,
+* no big deal, just return.
+*/
+   if (IS_ERR(left))
return 1;
 
btrfs_tree_lock(left);
@@ -

Re: invalid opcode 0000 / kernel bug with defect HDD

2016-06-28 Thread Liu Bo
On Tue, Jun 28, 2016 at 10:16:58AM +0200, Peter Becker wrote:
> Cause of kernel bugs was a defective HDD (/dev/sdd).

Thanks for reporting this bug. I can reproduce it; it's due to the fact
that we cannot read a valid btree node from the underlying disks, which
comes from a defective HDD in your case. But no worry, I'll send a patch
to return -EIO to the command line. :)

BTW, we're lucky that we can still mount it, which means the major part of
the metadata is valid.

Thanks,

-liubo

> 
> The kernel BUG:
> 
> May 16 07:41:38 nas kernel: [37168.832800]
> btrfs_dev_stat_print_on_error: 470 callbacks suppressed
> May 16 07:41:38 nas kernel: [37168.832806] BTRFS error (device sdd):
> bdev /dev/sdb errs: wr 49293, rd 567248, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832843] BTRFS error (device sdd):
> bdev /dev/sdf errs: wr 0, rd 537544, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832887] BTRFS error (device sdd):
> bdev /dev/sdb errs: wr 49293, rd 567249, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832893] BTRFS error (device sdd):
> bdev /dev/sdf errs: wr 0, rd 537545, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832969] BTRFS error (device sdd):
> bdev /dev/sdb errs: wr 49293, rd 567250, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832977] BTRFS error (device sdd):
> bdev /dev/sdf errs: wr 0, rd 537546, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832987] BTRFS error (device sdd):
> bdev /dev/sdb errs: wr 49293, rd 567251, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.832992] BTRFS error (device sdd):
> bdev /dev/sdf errs: wr 0, rd 537547, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.862127] BTRFS error (device sdd):
> bdev /dev/sdf errs: wr 0, rd 537548, flush 0, corrupt 0, gen 0
> May 16 07:41:38 nas kernel: [37168.862188] BTRFS error (device sdd):
> bdev /dev/sdb errs: wr 49293, rd 567252, flush 0, corrupt 0, gen 0
> May 16 07:41:42 nas kernel: [37173.103386] [ cut here 
> ]
> May 16 07:41:42 nas kernel: [37173.103414] kernel BUG at
> /home/kernel/COD/linux/fs/btrfs/ctree.c:5201!
> May 16 07:41:42 nas kernel: [37173.103434] invalid opcode:  [#1] SMP
> May 16 07:41:42 nas kernel: [37173.103450] Modules linked in: cpuid
> xt_nat veth xt_addrtype xt_conntrack br_netfilter dm_thin_pool
> dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvram msr
> input_leds joydev hid_generic usbhid hid xt_CHECKSUM iptable_mangle
> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp
> llc iptable_filter ip_tables x_tables autofs4 eeepc_wmi asus_wmi
> sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm
> irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd
> dm_multipath serio_raw snd_hda_codec_hdmi snd_hda_codec_realtek
> snd_hda_codec_generic snd_hda_intel snd_hda_codec bnep rfcomm
> snd_hda_core bluetooth snd_hwdep snd_pcm snd_seq_midi
> snd_seq_midi_event snd_rawmidi lpc_ich snd_seq snd_seq_device mei_me
> snd_timer mei snd soundcore mac_hid shpchp nfsd auth_rpcgss parport_pc
> nfs_acl ppdev nfs nct6775 hwmon_vid binfmt_misc coretemp lockd grace
> lp parport sunrpc fscache btrfs xor raid6_pq nls_iso8859_1 dm_mirror
> dm_region_hash dm_log uas usb_storage i915 e1000e psmouse ahci libahci
> i2c_algo_bit drm_kms_helper ptp pps_core syscopyarea sysfillrect
> sysimgblt fb_sys_fops drm video fjes wmi
> May 16 07:41:42 nas kernel: [37173.103893] CPU: 1 PID: 17784 Comm:
> btrfs Tainted: G U  W   4.5.4-040504-generic #201605120823
> May 16 07:41:42 nas kernel: [37173.103916] Hardware name: ASUS All
> Series/H87I-PLUS, BIOS 2003 11/05/2014
> May 16 07:41:42 nas kernel: [37173.103932] task: 88020501c240 ti:
> 880161f18000 task.ti: 880161f18000
> May 16 07:41:42 nas kernel: [37173.103950] RIP:
> 0010:[]  []
> btrfs_search_forward+0x24d/0x330 [btrfs]
> May 16 07:41:42 nas kernel: [37173.103995] RSP: 0018:880161f1bc38
> EFLAGS: 00010246
> May 16 07:41:42 nas kernel: [37173.104009] RAX:  RBX:
>  RCX: 0001
> May 16 07:41:42 nas kernel: [37173.104029] RDX: 0001 RSI:
> 091572628000 RDI: 8801ff2fa368
> May 16 07:41:42 nas kernel: [37173.104048] RBP: 880161f1bc98 R08:
> 091571c0 R09: 0915b1c0
> May 16 07:41:42 nas kernel: [37173.104067] R10: 880161f1ba30 R11:
>  R12: 8801d71b6930
> May 16 07:41:42 nas kernel: [37173.104086] R13: 0001 R14:
>  R15: 
> May 16 07:41:42 nas kernel: [37173.104106] FS:  7ff94d968900()
> GS:88021fb0() knlGS:
> May 16 07:41:42 nas kernel: [37173.104129] CS:  0010 DS:  ES: 
> CR0: 80050033
> May 16 07:41:42 nas kernel: [37173.104144] CR2: 7ff94ca80a00 CR3:
> 000214a96000 CR4: 000406e0
> May 16

Re: Kernel bug during RAID1 replace

2016-06-28 Thread Saint Germain
On Mon, 27 Jun 2016 20:14:58 -0600, Chris Murphy
 wrote :

> On Mon, Jun 27, 2016 at 6:49 PM, Saint Germain 
> wrote:
> 
> >
> > I've tried both option and launched a replace, but I got the same
> > error (replace is cancelled, kernel bug).
> > I will let these options on and attempt a ddrescue on /dev/sda
> > to /dev/sdd.
> > Then I will disconnect /dev/sda and reboot and see if it works
> > better.
> 
> Sounds reasonable. Just make sure the file system is already unmounted
> when you use ddrescue because otherwise you're block copying it while
> it could be modified while rw mounted (generation number tends to get
> incremented while rw mounted).
> 
> 

Well I made a ddrescue image of both drives (only one error on sdb
during ddrescue copy) and started the computer again (after
disconnecting the old drives).

However the errors remain there, and I still cannot scrub (scrub is
aborted) nor delete the files which have errors (the drive is remounted
read-only if I try to delete them).

I don't know if I should continue trying to repair this RAID1 or if I
should just cp/rsync to a new BTRFS volume and get done with it.
On the other hand it seems interesting to repair instead of just giving
up. It gives a good look at BTRFS resiliency/reliability.

Here is the log from the mount to the scrub aborting and the result
from smartctl.

Thanks for your precious help so far.


BTRFS error (device sdb1): cleaner transaction attach returned -30
BTRFS info (device sdb1): disk space caching is enabled
BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 7928, 
corrupt 1714507, gen 1335
BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21622, gen 24
scrub_handle_errored_block: 164 callbacks suppressed
BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev 
/dev/sdb1, sector 54528696, root 5, inode 3434831, offset 479232, length 4096, 
links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)
btrfs_dev_stat_print_on_error: 164 callbacks suppressed
BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 7928, 
corrupt 1714508, gen 1335
scrub_handle_errored_block: 164 callbacks suppressed
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
93445255168 on dev /dev/sdb1
BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 7928, 
corrupt 1714509, gen 1335
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
93445259264 on dev /dev/sdb1
BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev 
/dev/sda1, sector 77669048, root 5, inode 3434831, offset 479232, length 4096, 
links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21623, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
93445255168 on dev /dev/sda1
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21624, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
93445259264 on dev /dev/sda1
BTRFS warning (device sdb1): checksum error at logical 136349810688 on dev 
/dev/sda1, sector 140429952, root 5, inode 4265283, offset 0, length 4096, 
links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21625, gen 24
BTRFS warning (device sdb1): checksum error at logical 136349929472 on dev 
/dev/sda1, sector 140430184, root 5, inode 4265283, offset 118784, length 4096, 
links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS warning (device sdb1): checksum error at logical 136350060544 on dev 
/dev/sda1, sector 140430440, root 5, inode 4265283, offset 249856, length 4096, 
links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21626, gen 24
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21627, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
136349810688 on dev /dev/sda1
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
136350060544 on dev /dev/sda1
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
136349929472 on dev /dev/sda1
BTRFS warning (device sdb1): checksum error at logical 136349814784 on dev 
/dev/sda1, sector 140429960, root 5, inode 4265283, offset 4096, length 4096, 
links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21628, gen 24
BTRFS warning (device sdb1): checksum error at logical 136350064640 on dev 
/dev/sda1, sector 140430448, root 5, inode 4265283, offset 253952, length 4096, 
links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
21629, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 
136349814784 on dev /dev/sda1
BTRFS warnin

Re: Btrfs full balance command fails due to ENOSPC (bug 121071)

2016-06-28 Thread Henk Slager
On Tue, Jun 28, 2016 at 3:46 PM, Francesco Turco  wrote:
> On 2016-06-27 23:26, Henk Slager wrote:
>> btrfs-debug does not show metadata ans system chunks; the balancing
>> problem might come from those.
>> This script does show all chunks:
>> https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
>>
>> You might want to use vrange or drange balance filters so that you can
>> just target a certain chunk and maybe that gives a hint where the
>> problem might be. But anyhow, the behavior experienced is a bug.
>
> Updated the bug with the output log from your script. I simply ran it as:
>
> ./show_usage.py /
>
> I don't know how to use vrange/drange balance filters. Can you show me
> how to do that, please?

The original dmesg log shows that balance gets into trouble at block group
46435139584. This is SYSTEM|DUP, and in the later blockgroup list
generated with Hans' py script it is not there anymore under this same
vaddr, so btrfs (or a new manual balance) has managed to relocate it,
despite the ENOSPC.

One theory I once had is that at the beginning of the disk there were
(or are) small chunks of 8MiB, whereas the rest of the disk has 1G or at
least bigger chunks once the fs gets used and filled up. Those initial
small chunks tend to be system and/or metadata. If then later, after
heavy use, a full balance relocates the small chunks, there is
unallocated space, but virtually nothing fits there if the policy
is 'first allocate big chunks'. So here the allocator mechanism could
then report an ENOSPC, assuming that it doesn't try exhaustively in
order to keep the code simple and fast.
But it is only a theory; one would need to traceback etc. in such a case.
I never had such a case, so I can't prove it.

Suppose you want to relocate the metadata blockgroup (you have only
one, it is the same location in the 2 lists from the bug report)

btrfs balance start -v -mvrange=29360128..29360129 /

This 1-byte range is in the virtual address range 29360128 ..
29360128+1G-1, so it will relocate the metadata blockgroup. After a
successful balance, you will see its vaddr increased and its device
addresses (paddr) also changed.

If you want to balance based on device address and, for example,
relocate just one of the DUP copies of the metadata:
btrfs balance start -v -mdevid=1,drange=37748736..37748737 /

All this does not solve the bug, but hopefully gives us a better
understanding of cases where balance fails and no file creation is
possible anymore.



>
> --
> Website: http://www.fturco.net/
> GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34


[PATCH] Btrfs: fix double free of fs root

2016-06-28 Thread Liu Bo
I got this warning while mounting a btrfs image,

[ 3020.509606] [ cut here ]
[ 3020.510107] WARNING: CPU: 3 PID: 5581 at lib/idr.c:1051 ida_remove+0xca/0x190
[ 3020.510853] ida_remove called for id=42 which is not allocated.
[ 3020.511466] Modules linked in:
[ 3020.511802] CPU: 3 PID: 5581 Comm: mount Not tainted 4.7.0-rc5+ #274
[ 3020.512438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.8.2-20150714_191134- 04/01/2014
[ 3020.513385]  0286 21295d86 88006c66b8f0 
8182ba5a
[ 3020.514153]   0009 88006c66b930 
810e0ed7
[ 3020.514928]  041b 8289a8c0 88007f437880 

[ 3020.515717] Call Trace:
[ 3020.515965]  [] dump_stack+0xc9/0x13f
[ 3020.516487]  [] __warn+0x147/0x160
[ 3020.517005]  [] warn_slowpath_fmt+0x5f/0x80
[ 3020.517572]  [] ida_remove+0xca/0x190
[ 3020.518075]  [] free_anon_bdev+0x2c/0x60
[ 3020.518609]  [] free_fs_root+0x13f/0x160
[ 3020.519138]  [] btrfs_get_fs_root+0x379/0x3d0
[ 3020.519710]  [] ? __mutex_unlock_slowpath+0x155/0x2c0
[ 3020.520366]  [] open_ctree+0x2e91/0x3200
[ 3020.520965]  [] btrfs_mount+0x1322/0x15b0
[ 3020.521536]  [] ? kmemleak_alloc_percpu+0x44/0x170
[ 3020.522167]  [] ? lockdep_init_map+0x61/0x210
[ 3020.522780]  [] mount_fs+0x49/0x2c0
[ 3020.523305]  [] vfs_kern_mount+0xac/0x1b0
[ 3020.523872]  [] btrfs_mount+0x421/0x15b0
[ 3020.524402]  [] ? kmemleak_alloc_percpu+0x44/0x170
[ 3020.525045]  [] ? lockdep_init_map+0x61/0x210
[ 3020.525657]  [] ? lockdep_init_map+0x61/0x210
[ 3020.526289]  [] mount_fs+0x49/0x2c0
[ 3020.526803]  [] vfs_kern_mount+0xac/0x1b0
[ 3020.527365]  [] do_mount+0x41a/0x1770
[ 3020.527899]  [] ? strndup_user+0x6d/0xc0
[ 3020.528447]  [] ? memdup_user+0x78/0xb0
[ 3020.528987]  [] SyS_mount+0x150/0x160
[ 3020.529493]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd

It turns out that we free the fs root twice: btrfs_init_fs_root() calls
free_anon_bdev(root->anon_dev), and later btrfs_get_fs_root() calls
free_fs_root(), which does another free_anon_bdev(), and it ends up with
the above warning.

Instead of resetting root->anon_dev to 0 after free_anon_bdev(), we can
let btrfs_init_fs_root() return directly on failure, since its callers
already handle the cleanup by calling free_fs_root().

Signed-off-by: Liu Bo 
---
 fs/btrfs/disk-io.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)
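
To make the ownership rule explicit, here is a user-space sketch (an
assumed simplification, not the btrfs code) of the pattern the patch
moves to: on failure the init function frees nothing it has attached to
the object, and the caller performs the single cleanup pass, so no
resource is released twice.

#include <stdlib.h>

struct fs_root {
        int *anon_dev;                  /* resource acquired during init */
};

static int init_fs_root(struct fs_root *root)
{
        root->anon_dev = malloc(sizeof(*root->anon_dev));
        if (!root->anon_dev)
                return -1;              /* caller will run free_fs_root() */

        /*
         * Pretend a later init step fails; do NOT free root->anon_dev
         * here, the caller owns the cleanup.
         */
        return -1;
}

static void free_fs_root(struct fs_root *root)
{
        free(root->anon_dev);           /* the single point of cleanup */
        free(root);
}

int main(void)
{
        struct fs_root *root = calloc(1, sizeof(*root));

        if (!root)
                return 1;
        if (init_fs_root(root) < 0)
                free_fs_root(root);     /* anon_dev freed exactly once */
        return 0;
}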

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60ce119..6c88c63 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1600,14 +1600,14 @@ int btrfs_init_fs_root(struct btrfs_root *root)
 
ret = get_anon_bdev(&root->anon_dev);
if (ret)
-   goto free_writers;
+   goto fail;
 
mutex_lock(&root->objectid_mutex);
ret = btrfs_find_highest_objectid(root,
&root->highest_objectid);
if (ret) {
mutex_unlock(&root->objectid_mutex);
-   goto free_root_dev;
+   goto fail;
}
 
ASSERT(root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID);
@@ -1615,14 +1615,8 @@ int btrfs_init_fs_root(struct btrfs_root *root)
mutex_unlock(&root->objectid_mutex);
 
return 0;
-
-free_root_dev:
-   free_anon_bdev(root->anon_dev);
-free_writers:
-   btrfs_free_subvolume_writers(root->subv_writers);
 fail:
-   kfree(root->free_ino_ctl);
-   kfree(root->free_ino_pinned);
+   /* the caller is responsible to call free_fs_root */
return ret;
 }
 
-- 
2.5.5



btrfs-check: Fix bitflipped keys from bad RAM

2016-06-28 Thread Otto Kekäläinen
Hello!

A patch with this subject was submitted in May 2014:
http://www.spinics.net/lists/linux-btrfs/msg33777.html

I don't see it among any of the ~360 open issues at
https://bugzilla.kernel.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=btrfs

Unless somebody objects, I'll file it as a NEW issue with patch.


I think the work done by Hugo Mills for this one is important and it
would be a pity if those patches were forgotten. It is one of those
things where btrfs could outperform ZFS, which has no bitflip
recovery. Btrfs could have it, and it would be great.

I personally came across a machine with a bitflipped index and I would
love to test these patches. I am however reluctant to invest time in
it if there is no issue in the bug tracker and no visible progress.
Without proper tracking, all debugging/feedback would be in vain.


Re: Btrfs check command fails with "assertion failed" error

2016-06-28 Thread Francesco Turco
On 2016-06-28 20:20, Francesco Turco wrote:
> So I'm going to submit a bug as you suggested.

Done: https://bugzilla.kernel.org/show_bug.cgi?id=12

I also found another similar bug:
https://bugzilla.kernel.org/show_bug.cgi?id=104821

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34


Re: Btrfs check command fails with "assertion failed" error

2016-06-28 Thread Chris Murphy
On Tue, Jun 28, 2016 at 12:20 PM, Francesco Turco  wrote:
> On 2016-06-28 20:05, Chris Murphy wrote:
>> Well it probably shouldn't crash but the question is why is device 4
>> missing? We have no information what version of btrfs-progs or kernel
>> is used, or what the layout of this volume is: how many devices,
>> what's the profile breakdown, etc. Are you attempting to fix a
>> degraded volume and are the minimum number of devices present? Btrfs
>> fi show would be useful for this.
>>
>> If it's relatively recent version of btrfs-progs then I'd file a bug
>> just because it shouldn't crash, it should give some sort of coherent
>> message about why it can't proceed.
>
> Sorry for the missing information. Here you are:
>
> - linux-libre: 4.6.2
> - btrfs-progs: 4.5.3
>
> # btrfs filesystem show /dev/loop0
> warning, device 4 is missing
> Label: none  uuid: 34fb5b58-f50f-47c3-a5b8-91d81a30eade
> Total devices 2 FS bytes used 5.17GiB
> devid1 size 30.00GiB used 30.00GiB path /dev/loop0
> *** Some devices missing
>
> If I remember correctly I extended that root filesystem with some
> additional space from a file in the home directory, in the hope of
> fixing a problem with btrfs balance and not enough space. I don't have
> the additional file anymore, so I probably won't be able to mount this
> image file anymore. Anyway as you said btrfs-check shouldn't crash.
>
> So I'm going to submit a bug as you suggested.

Yeah depending on the layout, if the only copy of some important
metadata is on the missing device, the btrfs check will naturally
fail. It's just that to a human the messages are out of order...

warning, device 4 is missing
(therefore)
Unable to find block group for 0
(And therefore fail to proceed... please find missing device 4) or some such.


-- 
Chris Murphy


Re: Btrfs check command fails with "assertion failed" error

2016-06-28 Thread Francesco Turco
On 2016-06-28 20:05, Chris Murphy wrote:
> Well it probably shouldn't crash but the question is why is device 4
> missing? We have no information what version of btrfs-progs or kernel
> is used, or what the layout of this volume is: how many devices,
> what's the profile breakdown, etc. Are you attempting to fix a
> degraded volume and are the minimum number of devices present? Btrfs
> fi show would be useful for this.
> 
> If it's relatively recent version of btrfs-progs then I'd file a bug
> just because it shouldn't crash, it should give some sort of coherent
> message about why it can't proceed.

Sorry for the missing information. Here you are:

- linux-libre: 4.6.2
- btrfs-progs: 4.5.3

# btrfs filesystem show /dev/loop0
warning, device 4 is missing
Label: none  uuid: 34fb5b58-f50f-47c3-a5b8-91d81a30eade
Total devices 2 FS bytes used 5.17GiB
devid1 size 30.00GiB used 30.00GiB path /dev/loop0
*** Some devices missing

If I remember correctly I extended that root filesystem with some
additional space from a file in the home directory, in the hope of
fixing a problem with btrfs balance and not enough space. I don't have
the additional file anymore, so I probably won't be able to mount this
image file anymore. Anyway as you said btrfs-check shouldn't crash.

So I'm going to submit a bug as you suggested.

Thanks.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34



Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 29/06/16 04:01, Chris Murphy wrote:
> Just wiping the slate clean to summarize:
> 
> 
> 1. We have a consistent ~1 in 3 maybe 1 in 2, reproducible corruption
> of *data extent* parity during a scrub with raid5. Goffredo and I have
> both reproduced it. It's a big bug. It might still be useful if
> someone else can reproduce it too.
> 
> Goffredo, can you file a bug at bugzilla.kernel.org and reference your
> bug thread?  I don't know if the key developers know about this, it
> might be worth pinging them on IRC once the bug is filed.
> 
> Unknown if it affects balance, or raid 6. And if it affects raid 6, is
> p or q corrupted, or both? Unknown how this manifests on metadata
> raid5 profile (only tested was data raid5). Presumably if there is
> metadata corruption that's fixed during a scrub, and its parity is
> overwritten with corrupt parity, the next time there's a degraded
> state, the file system would face plant somehow. And we've seen quite
> a few degraded raid5's (and even 6's) face plant in inexplicable ways
> and we just kinda go, shit. Which is what the fs is doing when it
> encounters a pile of csum errors. It treats the csum errors as a
> signal to disregard the fs rather than maybe only being suspicious of
> the fs. Could it turn out that these file systems were recoverable,
> just that Btrfs wasn't tolerating any csum error and wouldn't proceed
> further?

I believe this is the same case for RAID6, based on my experiences. I
actually wondered if the system halts were the result of a TON of csum
errors being reported - not of the errors themselves. Just about every
system hang where CPU usage went to 100% on all cores and the system just
stopped was after a flood of csum errors. If there were only one or two
(or I copied data off via a network connection where the read rate was
slower), I found I had a MUCH lower chance of the system locking up.

In fact, now that I think about it, when I was copying data to an
external USB drive (maxed out at ~30MB/sec), I still got csum errors -
but the system never hung.

Every crash ended with the last line along the lines of "Stopped
recurring error. Your system needs rebooting". I wonder whether, if this
error reporting were altered, the system wouldn't go down.

Of course I have no way of testing this.


-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Btrfs check command fails with "assertion failed" error

2016-06-28 Thread Chris Murphy
On Tue, Jun 28, 2016 at 9:58 AM, Francesco Turco  wrote:
> When trying to repair a btrfs root filesystem I get the following error
> message:
>
> # losetup -f /home/fturco/Buffer/root-20160616.img
> # btrfs check --repair /dev/loop0
> enabling repair mode
> warning, device 4 is missing
> Checking filesystem on /dev/loop0
> UUID: 34fb5b58-f50f-47c3-a5b8-91d81a30eade
> checking extents
> Unable to find block group for 0
> extent-tree.c:289: find_search_start: Assertion `1` failed.

Well it probably shouldn't crash but the question is why is device 4
missing? We have no information what version of btrfs-progs or kernel
is used, or what the layout of this volume is: how many devices,
what's the profile breakdown, etc. Are you attempting to fix a
degraded volume and are the minimum number of devices present? Btrfs
fi show would be useful for this.

If it's relatively recent version of btrfs-progs then I'd file a bug
just because it shouldn't crash, it should give some sort of coherent
message about why it can't proceed.



-- 
Chris Murphy


Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Chris Murphy
Just wiping the slate clean to summarize:


1. We have a consistent ~1 in 3 maybe 1 in 2, reproducible corruption
of *data extent* parity during a scrub with raid5. Goffredo and I have
both reproduced it. It's a big bug. It might still be useful if
someone else can reproduce it too.

Goffredo, can you file a bug at bugzilla.kernel.org and reference your
bug thread?  I don't know if the key developers know about this, it
might be worth pinging them on IRC once the bug is filed.

Unknown if it affects balance, or raid 6. And if it affects raid 6, is
p or q corrupted, or both? Unknown how this manifests on metadata
raid5 profile (only tested was data raid5). Presumably if there is
metadata corruption that's fixed during a scrub, and its parity is
overwritten with corrupt parity, the next time there's a degraded
state, the file system would face plant somehow. And we've seen quite
a few degraded raid5's (and even 6's) face plant in inexplicable ways
and we just kinda go, shit. Which is what the fs is doing when it
encounters a pile of csum errors. It treats the csum errors as a
signal to disregard the fs rather than maybe only being suspicious of
the fs. Could it turn out that these file systems were recoverable,
just that Btrfs wasn't tolerating any csum error and wouldn't proceed
further?

2. The existing scrub code computes parity on-the-fly, compares it
with what's on-disk, and overwrites if there's a mismatch. If there's
a mismatch, there's no message anywhere. It's a feature request to get
a message on parity mismatches. An additional feature request would be
to get a parity_error counter along the lines of the other error
counters we have for scrub stats and dev stats.

3. I think it's a more significant change to get parity checksums
stored somewhere. Right now the csum tree holds item type EXTENT_CSUM
but parity is not an extent, it's also not data, it's a variant of
data. So it seems to me we'd need a new item type PARITY_CSUM to get
it into the existing csum tree. And I'm not sure what incompatibility
that brings; presumably older kernels could mount such a volume ro
safely, but shouldn't write to it, including btrfs check --repair
should probably fail.


Chris Murphy


Re: Bug in 'btrfs filesystem du' ?

2016-06-28 Thread Andrei Borzenkov
28.06.2016 19:55, Henk Slager wrote:
> On Tue, Jun 28, 2016 at 2:56 PM, M G Berberich  
> wrote:
>> Hello,
>>
>> Am Montag, den 27. Juni schrieb Henk Slager:
>>> On Mon, Jun 27, 2016 at 3:33 PM, M G Berberich  
>>> wrote:
 Am Montag, den 27. Juni schrieb M G Berberich:
> after a balance ‘btrfs filesystem du’ probably shows false data about
> shared data.

 Oh, I forgot: I have btrfs-progs v4.5.2 and kernel 4.6.2.
>>>
>>> With  btrfs-progs v4.6.1 and kernel 4.7-rc5, the numbers are correct
>>> about shared data.
>>
>> I tested with kernels 4.6.3 and 4.7-rc5 and with btrfs-progs 4.5.2 and
>> 4.61.
> Also with kernel 4.6.2-1-default and btrfs-progs v4.5.3+20160516
> (current stock opensuse tumbleweed) I cannot reproduce the problem.
> 

I confirm the same behavior on openSUSE Tumbleweed with kernel 4.6.2-1.2
and btrfsprogs 4.5.3-1.2 using the provided script.

>  The later kernel with two patches to make the kernel work:
>> https://lkml.org/lkml/2016/6/1/310 https://lkml.org/lkml/2016/6/1/311 .
> ... so these seem to cause the problem
> 
> 
>> You can see the script¹ I used (do-btrfs-du-test) and the logs at
>> http://m-berberich.de/btrfs/
>>
>> In all four cases, ‘btrfs fi du -s .’ reports
>>
>>  Total   Exclusive  Set shared  Filename
>>   59.38MiB   0.00B29.69MiB  .
>>
>> befor balance and
>>
>>  Total   Exclusive  Set shared  Filename
>>   59.38MiB59.38MiB   0.00B  .
>>
>> after balance.
>>
>> Disclaimer: The script works for me, no guaranty at all.
>>
>> MfG
>> bmg
>> __
>> ¹ Disclaimer: The script works for me, no guaranty at all.
>> --
>> „Des is völlig wurscht, was heut beschlos- | M G Berberich
>>  sen wird: I bin sowieso dagegn!“  | m...@m-berberich.de
>> (SPD-Stadtrat Kurt Schindler; Regensburg)  |
> 



Re: Bug in 'btrfs filesystem du' ?

2016-06-28 Thread Henk Slager
On Tue, Jun 28, 2016 at 2:56 PM, M G Berberich  wrote:
> Hello,
>
> Am Montag, den 27. Juni schrieb Henk Slager:
>> On Mon, Jun 27, 2016 at 3:33 PM, M G Berberich  
>> wrote:
>> > Am Montag, den 27. Juni schrieb M G Berberich:
>> >> after a balance ‘btrfs filesystem du’ probably shows false data about
>> >> shared data.
>> >
>> > Oh, I forgot: I have btrfs-progs v4.5.2 and kernel 4.6.2.
>>
>> With  btrfs-progs v4.6.1 and kernel 4.7-rc5, the numbers are correct
>> about shared data.
>
> I tested with kernels 4.6.3 and 4.7-rc5 and with btrfs-progs 4.5.2 and
> 4.61.
Also with kernel 4.6.2-1-default and btrfs-progs v4.5.3+20160516
(current stock opensuse tumbleweed) I cannot reproduce the problem.

>  The later kernel with two patches to make the kernel work:
> https://lkml.org/lkml/2016/6/1/310 https://lkml.org/lkml/2016/6/1/311 .
... so these seem to cause the problem


> You can see the script¹ I used (do-btrfs-du-test) and the logs at
> http://m-berberich.de/btrfs/
>
> In all four cases, ‘btrfs fi du -s .’ reports
>
>  Total   Exclusive  Set shared  Filename
>   59.38MiB   0.00B29.69MiB  .
>
> befor balance and
>
>  Total   Exclusive  Set shared  Filename
>   59.38MiB59.38MiB   0.00B  .
>
> after balance.
>
> Disclaimer: The script works for me, no guaranty at all.
>
> MfG
> bmg
> __
> ¹ Disclaimer: The script works for me, no guaranty at all.
> --
> „Des is völlig wurscht, was heut beschlos- | M G Berberich
>  sen wird: I bin sowieso dagegn!“  | m...@m-berberich.de
> (SPD-Stadtrat Kurt Schindler; Regensburg)  |


Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 28/06/16 22:25, Austin S. Hemmelgarn wrote:
> On 2016-06-28 08:14, Steven Haigh wrote:
>> On 28/06/16 22:05, Austin S. Hemmelgarn wrote:
>>> On 2016-06-27 17:57, Zygo Blaxell wrote:
 On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>  wrote:
>> On 2016-06-25 12:44, Chris Murphy wrote:
>>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>>>  wrote:
>>>
>>> OK but hold on. During scrub, it should read data, compute checksums
>>> *and* parity, and compare those to what's on-disk - > EXTENT_CSUM in
>>> the checksum tree, and the parity strip in the chunk tree. And if
>>> parity is wrong, then it should be replaced.
>>
>> Except that's horribly inefficient.  With limited exceptions
>> involving
>> highly situational co-processors, computing a checksum of a parity
>> block is
>> always going to be faster than computing parity for the stripe.  By
>> using
>> that to check parity, we can safely speed up the common case of near
>> zero
>> errors during a scrub by a pretty significant factor.
>
> OK I'm in favor of that. Although somehow md gets away with this by
> computing and checking parity for its scrubs, and still manages to
> keep drives saturated in the process - at least HDDs, I'm not sure how
> it fares on SSDs.

 A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
 one at more than 10GB/sec.  Maybe a bottleneck is within reach of an
 array of SSDs vs. a slow CPU.
>>> OK, great for people who are using modern desktop or server CPU's.  Not
>>> everyone has that luxury, and even on many such CPU's, it's _still_
>>> faster to computer CRC32c checksums.  On top of that, we don't appear to
>>> be using the in-kernel parity-raid libraries (or if we are, I haven't
>>> been able to find where we are calling the functions for it), so we
>>> don't necessarily get assembly optimized or co-processor accelerated
>>> computation of the parity itself.  The other thing that I didn't mention
>>> above though, is that computing parity checksums will always take less
>>> time than computing parity, because you have to process significantly
>>> less data.  On a 4 disk RAID5 array, you're processing roughly 2/3 as
>>> much data to do the parity checksums instead of parity itself, which
>>> means that the parity computation would need to be 200% faster than the
>>> CRC32c computation to break even, and this margin gets bigger and bigger
>>> as you add more disks.
>>>
>>> On small arrays, this obviously won't have much impact.  Once you start
>>> to scale past a few TB though, even a few hundred MB/s faster processing
>>> means a significant decrease in processing time.  Say you have a CPU
>>> which gets about 12.0GB/s for RAID5 parity, and and about 12.25GB/s for
>>> CRC32c (~2% is a conservative ratio assuming you use the CRC32c
>>> instruction and assembly optimized RAID5 parity computations on a modern
>>> x86_64 processor (the ratio on both the mobile Core i5 in my laptop and
>>> the Xeon E3 in my home server is closer to 5%)).  Assuming those
>>> numbers, and that we're already checking checksums on non-parity blocks,
>>> processing 120TB of data in a 4 disk array (which gives 40TB of parity
>>> data, so 160TB total) gives:
>>> For computing the parity to scrub:
>>> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
>>> regular data
>>> 120TB / 12GB    = 10000.0 seconds for processing parity of all stripes
>>> = 19795.9 seconds total
>>> ~ 5.4 hours total
>>>
>>> For computing csums of the parity:
>>> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
>>> regular data
>>> 40TB / 12.25GB  =  3265.3 seconds for processing CRC32c csums of all the
>>> parity data
>>> = 13061.2 seconds total
>>> ~ 3.6 hours total
>>>
>>> The checksum based computation is approximately 34% faster than the
>>> parity computation.  Much of this of course is that you have to process
>>> the regular data twice for the parity computation method (once for
>>> csums, once for parity).  You could probably do one pass computing both
>>> values, but that would need to be done carefully; and, without
>>> significant optimization, would likely not get you much benefit other
>>> than cutting the number of loads in half.
>>
>> And it all means jack shit because you don't get the data to disk that
>> quick. Who cares if it's 500% faster - if it still saturates the
>> throughput of the actual drives, what difference does it make?
> It has less impact on everything else running on the system at the time
> because it uses less CPU time and potentially less memory.  This is the
> exact same reason that you want your RAID parity computation performance
> as good as possible, the less time the CPU spends on that, the more it
> can spend on other th
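
For what it's worth, the arithmetic quoted above can be reproduced with a
few lines of C (decimal units, 1 TB = 1000 GB; the 12.25 GB/s CRC32c and
12 GB/s parity throughput figures are the assumptions from the quoted
message, not measurements):

#include <stdio.h>

int main(void)
{
        const double data_tb = 120.0, parity_tb = 40.0;   /* 4-disk RAID5 */
        const double crc_gbps = 12.25, parity_gbps = 12.0;

        double csum_data   = data_tb   * 1000.0 / crc_gbps;     /*  9795.9 s */
        double parity_all  = data_tb   * 1000.0 / parity_gbps;  /* 10000.0 s */
        double csum_parity = parity_tb * 1000.0 / crc_gbps;     /*  3265.3 s */

        double recompute = csum_data + parity_all;    /* recompute parity during scrub */
        double csum_only = csum_data + csum_parity;   /* checksum the parity instead   */

        printf("recompute parity: %.1f s (%.1f h)\n", recompute, recompute / 3600.0);
        printf("checksum parity:  %.1f s (%.1f h)\n", csum_only, csum_only / 3600.0);
        printf("checksum method is %.0f%% faster\n",
               100.0 * (recompute - csum_only) / recompute);
        return 0;
}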

Btrfs check command fails with "assertion failed" error

2016-06-28 Thread Francesco Turco
When trying to repair a btrfs root filesystem I get the following error
message:

# losetup -f /home/fturco/Buffer/root-20160616.img
# btrfs check --repair /dev/loop0
enabling repair mode
warning, device 4 is missing
Checking filesystem on /dev/loop0
UUID: 34fb5b58-f50f-47c3-a5b8-91d81a30eade
checking extents
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x44882e]
btrfs(btrfs_reserve_extent+0xaa9)[0x44d639]
btrfs(btrfs_alloc_free_block+0x5f)[0x44d6ff]
btrfs(__btrfs_cow_block+0xc4)[0x43e9d4]
btrfs(btrfs_cow_block+0x35)[0x43efd5]
btrfs[0x4442e6]
btrfs(btrfs_commit_transaction+0x95)[0x446115]
btrfs[0x42b58e]
btrfs(cmd_check+0x76c)[0x42c60c]
btrfs(main+0x7b)[0x40ac3b]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f272f658741]
btrfs(_start+0x29)[0x40ad39]

This is an unmountable filesystem:

# mount /dev/loop0 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

# dmesg | tail
[ 9185.503571] scsi 8:0:0:1: Direct-Access Samsung  File-CD Gadget
 PQ: 0 ANSI: 2
[ 9185.513943] sd 8:0:0:0: [sdb] Attached SCSI removable disk
[ 9185.514565] sd 8:0:0:1: [sdc] Attached SCSI removable disk
[13773.533435] usb 1-4: USB disconnect, device number 8
[31454.672925] loop: module loaded
[31455.600069] BTRFS: device fsid 34fb5b58-f50f-47c3-a5b8-91d81a30eade
devid 1 transid 81460 /dev/loop0
[32584.640398] BTRFS info (device loop0): disk space caching is enabled
[32584.640403] BTRFS: has skinny extents
[32584.654196] BTRFS: failed to read the system array on loop0
[32584.666774] BTRFS: open_ctree failed

Fortunately I have no valuable content in that image file and I restored
the system from a recent backup, but I still wonder if there's a bug
somewhere in btrfs-check.

Do you need any other debug information, perhaps?

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs full balance command fails due to ENOSPC (bug 121071)

2016-06-28 Thread Francesco Turco
On 2016-06-27 23:47, Hans van Kranenburg wrote:
> Since the existence of python-btrfs, it has gathered a few useful
> example scripts:
> 
> git clone https://github.com/knorrie/python-btrfs
> cd python-btrfs/examples/
> (get root prompt)
> 
> ./show_usage.py /mountpoint <- view sorted by 'virtual' address space
> ./show_dev_extents.py /mountpoint <- view sorted by physical layout
> 
> The show_usage in the btrfs-heatmap repo is almost gone. I'm currently
> replacing all the proof of concept playing around stuff in there with
> dedicated png-creation code that uses the python-btrfs lib.

I also issued the show_dev_extents.py command as you suggested. Hope it
helps.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs full balance command fails due to ENOSPC (bug 121071)

2016-06-28 Thread Francesco Turco
On 2016-06-27 23:26, Henk Slager wrote:
> btrfs-debug does not show metadata and system chunks; the balancing
> problem might come from those.
> This script does show all chunks:
> https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
> 
> You might want to use vrange or drange balance filters so that you can
> just target a certain chunk and maybe that gives a hint where the
> problem might be. But anyhow, the behavior experienced is a bug.

Updated the bug with the output log from your script. I simply ran it as:

./show_usage.py /

I don't know how to use vrange/drange balance filters. Can you show me
how to do that, please?

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in 'btrfs filesystem du' ?

2016-06-28 Thread M G Berberich
Hello,

Am Montag, den 27. Juni schrieb Henk Slager:
> On Mon, Jun 27, 2016 at 3:33 PM, M G Berberich  
> wrote:
> > Am Montag, den 27. Juni schrieb M G Berberich:
> >> after a balance ‘btrfs filesystem du’ probably shows false data about
> >> shared data.
> >
> > Oh, I forgot: I have btrfs-progs v4.5.2 and kernel 4.6.2.
> 
> With  btrfs-progs v4.6.1 and kernel 4.7-rc5, the numbers are correct
> about shared data.

I tested with kernels 4.6.3 and 4.7-rc5 and with btrfs-progs 4.5.2 and
4.6.1. The latter kernel needed two patches to work:
https://lkml.org/lkml/2016/6/1/310 https://lkml.org/lkml/2016/6/1/311 .

You can see the script¹ I used (do-btrfs-du-test) and the logs at
http://m-berberich.de/btrfs/

In all four cases, ‘btrfs fi du -s .’ reports

 Total   Exclusive  Set shared  Filename
  59.38MiB   0.00B29.69MiB  .

before balance, and

 Total   Exclusive  Set shared  Filename
  59.38MiB59.38MiB   0.00B  .

after balance.


MfG
bmg
__
¹ Disclaimer: The script works for me, no guarantee at all.
-- 
„Des is völlig wurscht, was heut beschlos- | M G Berberich
 sen wird: I bin sowieso dagegn!“  | m...@m-berberich.de
(SPD-Stadtrat Kurt Schindler; Regensburg)  | 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs_evict_inode doesn't handle not fully initialized inodes.

2016-06-28 Thread Nikolay Borisov
Hello, 

On kernel 4.4.9 I've observed the following oops: 

[3248626.755570] BUG: unable to handle kernel NULL pointer dereference at 
035c
[3248626.755839] IP: [] btrfs_evict_inode+0x2f/0x610 [btrfs]
[3248626.756079] PGD 1eaf8d067 PUD 4096a0067 PMD 0 
[3248626.756383] Oops: 0000 [#1] SMP 
[3248626.756637] Modules linked in: 
[3248626.760475] CPU: 6 PID: 16899 Comm: rsync Tainted: PW  O
4.4.9-clouder1 #20
[3248626.760647] Hardware name: Supermicro 
X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
[3248626.760932] task: 880338268000 ti: 8802a4f04000 task.ti: 
8802a4f04000
[3248626.761102] RIP: 0010:[]  [] 
btrfs_evict_inode+0x2f/0x610 [btrfs]
[3248626.761447] RSP: 0018:8802a4f07b88  EFLAGS: 00010286
[3248626.761613] RAX:  RBX: 880011548fa0 RCX: 
0034
[3248626.761784] RDX: 88047fffa780 RSI: 0735 RDI: 
880011549150
[3248626.761954] RBP: 8802a4f07c28 R08: ea0009baa1d0 R09: 

[3248626.762127] R10: 0001 R11: 0001 R12: 
880011549270
[3248626.762298] R13: a0970e40 R14: a0970e40 R15: 
8802a4f07c88
[3248626.762469] FS:  7f7dc9c3e700() GS:88047fcc() 
knlGS:
[3248626.762642] CS:  0010 DS:  ES:  CR0: 80050033
[3248626.762810] CR2: 035c CR3: 000103ca8000 CR4: 
000406e0
[3248626.762980] Stack:
[3248626.763143]  8803cdee9870 0001 8802a4f07c08 
811c95f9
[3248626.763495]  8800115491f0   
880011549150
[3248626.763846]  880338268000 81095940 8802a4f07bd8 
8802a4f07bd8
[3248626.764195] Call Trace:
[3248626.764361]  [] ? __inode_wait_for_writeback+0x69/0xc0
[3248626.764534]  [] ? wake_atomic_t_function+0x40/0x40
[3248626.764707]  [] evict+0xc6/0x1c0
[3248626.764874]  [] iput+0x198/0x270
[3248626.765043]  [] ? alloc_inode+0x3a/0x90
[3248626.765221]  [] btrfs_new_inode+0x47c/0x610 [btrfs]
[3248626.765400]  [] ? btrfs_find_free_objectid+0x55/0x70 
[btrfs]
[3248626.765582]  [] ? btrfs_find_free_ino+0x117/0x130 [btrfs]
[3248626.765764]  [] btrfs_symlink+0xfc/0x3e0 [btrfs]
[3248626.765931]  [] vfs_symlink+0x9d/0xd0
[3248626.766094]  [] SyS_symlinkat+0xc5/0xf0
[3248626.766258]  [] SyS_symlink+0x16/0x20
[3248626.766422]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[3248626.766586] Code: 41 57 41 56 41 55 41 54 53 48 83 ec 78 66 66 66 66 90 48 
89 7d 98 48 89 fb 48 8b 87 50 fe ff ff 48 81 eb b0 01 00 00 48 89 45 88 <8b> 90 
5c 03 00 00 8b 05 ad 53 08 00 89 55 84 89 45 c0 85 c0 0f 
[3248626.769978] RIP  [] btrfs_evict_inode+0x2f/0x610 [btrfs]
[3248626.770205]  RSP 
[3248626.770366] CR2: 035c

And right before it in the dmesg there were multiple errors like:
BTRFS error (device loop9): bad fsid on block 502972416

The RIP points to: 
/home/projects/linux-stable/fs/btrfs/ctree.h: 3391
0xa0901bcf :  mov 0x35c(%rax),%edx

which is btrfs_calc_trunc_metadata_size. This corresponds to the
root->nodesize access. Essentially the root of the inode being passed is NULL,
as evidenced by the content of RAX. Furthermore, the btrfs_inode->vfs_inode has
its various fields set to default initialization values. Looking further into
the call stack, it seems that btrfs_new_inode fails in one of its steps and
calls iput. Concretely, I believe this is the culprit:

	ret = btrfs_set_inode_index(dir, index);
	if (ret) {
		btrfs_free_path(path);
		iput(inode);
	}

In this case, if btrfs_set_inode_index fails and we call iput, then
btrfs_evict_inode is going to be called with an uninitialized inode,
which in turn leads to the NULL pointer deref.

The only bogus value both inode structures have is the index_cnt:
18446744073709551615, which is 2^64 - 1, i.e. (u64)-1.

I'm happy to provide further info if necessary to help fix this. 

Regards, 
Nikolay 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Austin S. Hemmelgarn

On 2016-06-28 08:14, Steven Haigh wrote:

On 28/06/16 22:05, Austin S. Hemmelgarn wrote:

On 2016-06-27 17:57, Zygo Blaxell wrote:

On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:

On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
 wrote:

On 2016-06-25 12:44, Chris Murphy wrote:

On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
 wrote:

OK but hold on. During scrub, it should read data, compute checksums
*and* parity, and compare those to what's on-disk - > EXTENT_CSUM in
the checksum tree, and the parity strip in the chunk tree. And if
parity is wrong, then it should be replaced.


Except that's horribly inefficient.  With limited exceptions involving
highly situational co-processors, computing a checksum of a parity
block is
always going to be faster than computing parity for the stripe.  By
using
that to check parity, we can safely speed up the common case of near
zero
errors during a scrub by a pretty significant factor.


OK I'm in favor of that. Although somehow md gets away with this by
computing and checking parity for its scrubs, and still manages to
keep drives saturated in the process - at least HDDs, I'm not sure how
it fares on SSDs.


A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
one at more than 10GB/sec.  Maybe a bottleneck is within reach of an
array of SSDs vs. a slow CPU.

OK, great for people who are using modern desktop or server CPU's.  Not
everyone has that luxury, and even on many such CPU's, it's _still_
faster to compute CRC32c checksums.  On top of that, we don't appear to
be using the in-kernel parity-raid libraries (or if we are, I haven't
been able to find where we are calling the functions for it), so we
don't necessarily get assembly optimized or co-processor accelerated
computation of the parity itself.  The other thing that I didn't mention
above though, is that computing parity checksums will always take less
time than computing parity, because you have to process significantly
less data.  On a 4 disk RAID5 array, you're processing roughly 2/3 as
much data to do the parity checksums instead of parity itself, which
means that the parity computation would need to be 200% faster than the
CRC32c computation to break even, and this margin gets bigger and bigger
as you add more disks.

On small arrays, this obviously won't have much impact.  Once you start
to scale past a few TB though, even a few hundred MB/s faster processing
means a significant decrease in processing time.  Say you have a CPU
which gets about 12.0GB/s for RAID5 parity, and about 12.25GB/s for
CRC32c (~2% is a conservative ratio assuming you use the CRC32c
instruction and assembly optimized RAID5 parity computations on a modern
x86_64 processor (the ratio on both the mobile Core i5 in my laptop and
the Xeon E3 in my home server is closer to 5%)).  Assuming those
numbers, and that we're already checking checksums on non-parity blocks,
processing 120TB of data in a 4 disk array (which gives 40TB of parity
data, so 160TB total) gives:
For computing the parity to scrub:
120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
regular data
120TB / 12GB    = 10000 seconds for processing parity of all stripes
= 19795.9 seconds total
~ 5.4 hours total

For computing csums of the parity:
120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
regular data
40TB / 12.25GB  =  3265.3 seconds for processing CRC32c csums of all the
parity data
= 13061.2 seconds total
~ 3.6 hours total

The checksum based computation is approximately 34% faster than the
parity computation.  Much of this of course is that you have to process
the regular data twice for the parity computation method (once for
csums, once for parity).  You could probably do one pass computing both
values, but that would need to be done carefully; and, without
significant optimization, would likely not get you much benefit other
than cutting the number of loads in half.


And it all means jack shit because you don't get the data to disk that
quick. Who cares if its 500% faster - if it still saturates the
throughput of the actual drives, what difference does it make?
It has less impact on everything else running on the system at the time 
because it uses less CPU time and potentially less memory.  This is the 
exact same reason that you want your RAID parity computation performance 
as good as possible, the less time the CPU spends on that, the more it 
can spend on other things.  On top of that, there are high-end systems 
that do have SSD's that can get multiple GB/s of data transfer per 
second, and NVDIMM's are starting to become popular in the server 
market, and those give you data transfer speeds equivalent to regular 
memory bandwidth (which can be well over 20GB/s on decent hardware (I've 
got a relatively inexpensive system using DDR3-1866 RAM that has roughly 
22-24GB/s of memory bandwidth)).  Looking

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 28/06/16 22:05, Austin S. Hemmelgarn wrote:
> On 2016-06-27 17:57, Zygo Blaxell wrote:
>> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
>>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>>>  wrote:
 On 2016-06-25 12:44, Chris Murphy wrote:
> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>  wrote:
>
> OK but hold on. During scrub, it should read data, compute checksums
> *and* parity, and compare those to what's on-disk - > EXTENT_CSUM in
> the checksum tree, and the parity strip in the chunk tree. And if
> parity is wrong, then it should be replaced.

 Except that's horribly inefficient.  With limited exceptions involving
 highly situational co-processors, computing a checksum of a parity
 block is
 always going to be faster than computing parity for the stripe.  By
 using
 that to check parity, we can safely speed up the common case of near
 zero
 errors during a scrub by a pretty significant factor.
>>>
>>> OK I'm in favor of that. Although somehow md gets away with this by
>>> computing and checking parity for its scrubs, and still manages to
>>> keep drives saturated in the process - at least HDDs, I'm not sure how
>>> it fares on SSDs.
>>
>> A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
>> one at more than 10GB/sec.  Maybe a bottleneck is within reach of an
>> array of SSDs vs. a slow CPU.
> OK, great for people who are using modern desktop or server CPU's.  Not
> everyone has that luxury, and even on many such CPU's, it's _still_
> faster to computer CRC32c checksums.  On top of that, we don't appear to
> be using the in-kernel parity-raid libraries (or if we are, I haven't
> been able to find where we are calling the functions for it), so we
> don't necessarily get assembly optimized or co-processor accelerated
> computation of the parity itself.  The other thing that I didn't mention
> above though, is that computing parity checksums will always take less
> time than computing parity, because you have to process significantly
> less data.  On a 4 disk RAID5 array, you're processing roughly 2/3 as
> much data to do the parity checksums instead of parity itself, which
> means that the parity computation would need to be 200% faster than the
> CRC32c computation to break even, and this margin gets bigger and bigger
> as you add more disks.
> 
> On small arrays, this obviously won't have much impact.  Once you start
> to scale past a few TB though, even a few hundred MB/s faster processing
> means a significant decrease in processing time.  Say you have a CPU
> which gets about 12.0GB/s for RAID5 parity, and about 12.25GB/s for
> CRC32c (~2% is a conservative ratio assuming you use the CRC32c
> instruction and assembly optimized RAID5 parity computations on a modern
> x86_64 processor (the ratio on both the mobile Core i5 in my laptop and
> the Xeon E3 in my home server is closer to 5%)).  Assuming those
> numbers, and that we're already checking checksums on non-parity blocks,
> processing 120TB of data in a 4 disk array (which gives 40TB of parity
> data, so 160TB total) gives:
> For computing the parity to scrub:
> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
> regular data
> 120TB / 12GB    = 10000 seconds for processing parity of all stripes
> = 19795.9 seconds total
> ~ 5.4 hours total
> 
> For computing csums of the parity:
> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
> regular data
> 40TB / 12.25GB  =  3265.3 seconds for processing CRC32c csums of all the
> parity data
> = 13061.2 seconds total
> ~ 3.6 hours total
> 
> The checksum based computation is approximately 34% faster than the
> parity computation.  Much of this of course is that you have to process
> the regular data twice for the parity computation method (once for
> csums, once for parity).  You could probably do one pass computing both
> values, but that would need to be done carefully; and, without
> significant optimization, would likely not get you much benefit other
> than cutting the number of loads in half.

And it all means jack shit because you don't get the data to disk that
quick. Who cares if its 500% faster - if it still saturates the
throughput of the actual drives, what difference does it make?

I'm all for actual solutions, but the nirvana fallacy seems to apply here...

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897



signature.asc
Description: OpenPGP digital signature


Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Austin S. Hemmelgarn

On 2016-06-27 17:57, Zygo Blaxell wrote:

On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:

On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
 wrote:

On 2016-06-25 12:44, Chris Murphy wrote:

On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
 wrote:

OK but hold on. During scrub, it should read data, compute checksums
*and* parity, and compare those to what's on-disk - > EXTENT_CSUM in
the checksum tree, and the parity strip in the chunk tree. And if
parity is wrong, then it should be replaced.


Except that's horribly inefficient.  With limited exceptions involving
highly situational co-processors, computing a checksum of a parity block is
always going to be faster than computing parity for the stripe.  By using
that to check parity, we can safely speed up the common case of near zero
errors during a scrub by a pretty significant factor.


OK I'm in favor of that. Although somehow md gets away with this by
computing and checking parity for its scrubs, and still manages to
keep drives saturated in the process - at least HDDs, I'm not sure how
it fares on SSDs.


A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
one at more than 10GB/sec.  Maybe a bottleneck is within reach of an
array of SSDs vs. a slow CPU.
OK, great for people who are using modern desktop or server CPU's.  Not 
everyone has that luxury, and even on many such CPU's, it's _still_ 
faster to compute CRC32c checksums.  On top of that, we don't appear to 
be using the in-kernel parity-raid libraries (or if we are, I haven't 
been able to find where we are calling the functions for it), so we 
don't necessarily get assembly optimized or co-processor accelerated 
computation of the parity itself.  The other thing that I didn't mention 
above though, is that computing parity checksums will always take less 
time than computing parity, because you have to process significantly 
less data.  On a 4 disk RAID5 array, you're processing roughly 2/3 as 
much data to do the parity checksums instead of parity itself, which 
means that the parity computation would need to be 200% faster than the 
CRC32c computation to break even, and this margin gets bigger and bigger 
as you add more disks.
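
As a side note, here is a minimal sketch of what "using the CRC32c
instruction" amounts to on x86_64 with SSE4.2, via the compiler intrinsics in
<nmmintrin.h>.  The ~0 seed, the final inversion and the 4KiB buffer are
illustrative assumptions, not necessarily btrfs's exact on-disk checksum
convention; build with gcc -O2 -msse4.2.

/* Sketch: CRC32c over a buffer using the SSE4.2 crc32 instruction. */
#include <nmmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t crc32c_hw(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	/* 8 bytes at a time, then the tail byte by byte. */
	while (len >= 8) {
		uint64_t v;

		memcpy(&v, p, 8);
		crc = (uint32_t)_mm_crc32_u64(crc, v);
		p += 8;
		len -= 8;
	}
	while (len--)
		crc = _mm_crc32_u8(crc, *p++);
	return crc;
}

int main(void)
{
	unsigned char block[4096];	/* stand-in for a parity block */
	uint32_t crc;

	memset(block, 0xab, sizeof(block));
	crc = ~crc32c_hw(~0u, block, sizeof(block));
	printf("crc32c = 0x%08x\n", crc);
	return 0;
}

The point is just that the per-word cost is a single instruction, which is why
checksumming a parity block is cheap compared to recomputing the parity.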


On small arrays, this obviously won't have much impact.  Once you start 
to scale past a few TB though, even a few hundred MB/s faster processing 
means a significant decrease in processing time.  Say you have a CPU 
which gets about 12.0GB/s for RAID5 parity, and about 12.25GB/s for 
CRC32c (~2% is a conservative ratio assuming you use the CRC32c 
instruction and assembly optimized RAID5 parity computations on a modern 
x86_64 processor (the ratio on both the mobile Core i5 in my laptop and 
the Xeon E3 in my home server is closer to 5%)).  Assuming those 
numbers, and that we're already checking checksums on non-parity blocks, 
processing 120TB of data in a 4 disk array (which gives 40TB of parity 
data, so 160TB total) gives:

For computing the parity to scrub:
120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the 
regular data

120TB / 12GB    = 10000 seconds for processing parity of all stripes
= 19795.9 seconds total
~ 5.4 hours total

For computing csums of the parity:
120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the 
regular data
40TB / 12.25GB  =  3265.3 seconds for processing CRC32c csums of all the 
parity data

= 13061.2 seconds total
~ 3.6 hours total

The checksum based computation is approximately 34% faster than the 
parity computation.  Much of this of course is that you have to process 
the regular data twice for the parity computation method (once for 
csums, once for parity).  You could probably do one pass computing both 
values, but that would need to be done carefully; and, without 
significant optimization, would likely not get you much benefit other 
than cutting the number of loads in half.
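
For anyone who wants to re-derive the figures above, a tiny sketch that just
redoes the arithmetic from the assumed throughputs in this message (12GB/s
RAID5 parity, 12.25GB/s CRC32c, 120TB of data plus 40TB of parity on a 4-disk
RAID5); the inputs are assumptions from the discussion, not measurements.

/* Sketch: re-derive the scrub-time estimates from the assumed figures. */
#include <stdio.h>

int main(void)
{
	const double data_tb = 120.0, parity_tb = 40.0;		/* decimal TB */
	const double crc_gbps = 12.25, parity_gbps = 12.0;	/* GB/s */
	const double tb = 1000.0;				/* GB per TB */

	/* Method 1: checksum all data, then recompute parity of every stripe. */
	double recompute = data_tb * tb / crc_gbps + data_tb * tb / parity_gbps;

	/* Method 2: checksum all data, then checksum the parity blocks. */
	double csum_only = data_tb * tb / crc_gbps + parity_tb * tb / crc_gbps;

	printf("recompute parity: %.1f s (%.1f h)\n", recompute, recompute / 3600);
	printf("checksum parity:  %.1f s (%.1f h)\n", csum_only, csum_only / 3600);
	printf("checksum method saves %.0f%% of the time\n",
	       100.0 * (recompute - csum_only) / recompute);
	return 0;
}

It prints roughly 19795.9s (~5.5h) versus 13061.2s (~3.6h), i.e. the ~34%
figure above.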

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Austin S. Hemmelgarn

On 2016-06-27 23:17, Zygo Blaxell wrote:

On Mon, Jun 27, 2016 at 08:39:21PM -0600, Chris Murphy wrote:

On Mon, Jun 27, 2016 at 7:52 PM, Zygo Blaxell
 wrote:

On Mon, Jun 27, 2016 at 04:30:23PM -0600, Chris Murphy wrote:

Btrfs does have something of a work around for when things get slow,
and that's balance, read and rewrite everything. The write forces
sector remapping by the drive firmware for bad sectors.


It's a crude form of "resilvering" as ZFS calls it.


In what manner is it crude?


Balance relocates extents, looks up backrefs, and rewrites metadata, all
of which are extra work above what is required by resilvering (and extra
work that is proportional to the number of backrefs and the (currently
extremely poor) performance of the backref walking code, so snapshots
and large files multiply the workload).

Resilvering should just read data, reconstruct it from a mirror if
necessary, and write it back to the original location (or read one
mirror and rewrite the other).  That's more like what scrub does, except
scrub rewrites only the blocks it couldn't read (or that failed csum).
It's worth pointing out that balance was not designed for resilvering, 
it was designed for reshaping arrays, converting replication profiles, 
and compaction at the chunk level.  Balance is not a resilvering tool, 
that just happens to be a useful side effect of running a balance 
(actually, so is the chunk level compaction).



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: invalid opcode 0000 / kernel bug with defect HDD

2016-06-28 Thread Duncan
Peter Becker posted on Tue, 28 Jun 2016 10:16:58 +0200 as excerpted:

> Cause of kernel bugs was a defective HDD (/dev/sdd).

Just a short note to mention that invalid opcode 0000 doesn't say much.  
I'm just another user and list regular, but apparently, opcode 0000 is 
used as a deliberate way to trigger a kernel BUG_ON when the kernel hits 
code that it isn't expected to ever hit in normal operation.  Given that 
common usage, opcode 0000 itself says very little except that something 
went wrong, which we already know given the triggered backtrace.

While the 0000 opcode isn't of much help, devs can make more sense of 
the backtrace, particularly in cases such as this where it has the file 
and line number that triggered the BUG_ON.  That and the other 
information provided does help quite a bit to track down the bug.

(At first I was wondering about all these opcode 0000 traces too, until 
someone explained that to me.)
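
For anyone curious, a minimal user-space sketch of the mechanism described
above, assuming x86_64 and GCC or Clang: on x86 the kernel's BUG()/BUG_ON()
emits a ud2 instruction, which the CPU reports as an invalid-opcode exception
(hence the "invalid opcode: 0000" line in an oops); executed in user space,
the same instruction simply kills the process with SIGILL.

/* Sketch: what a BUG_ON() boils down to on x86 -- executing ud2. */
#include <stdio.h>

static void my_bug_on(int condition)
{
	if (condition) {
		fprintf(stderr, "BUG: condition hit, executing ud2\n");
		__asm__ volatile("ud2");
	}
}

int main(void)
{
	my_bug_on(0);		/* condition false: nothing happens */
	puts("still alive");
	my_bug_on(1);		/* dies with SIGILL (illegal instruction) */
	puts("never reached");
	return 0;
}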

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/4] btrfs feature testing + props fix

2016-06-28 Thread Eryu Guan
On Mon, Jun 27, 2016 at 04:14:10PM -0400, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> Hi all -
> 
> Thanks, Eryu, for the review.  The btrfs feature testing changes were a
> patchset I wrote three years ago, and it looks like significant cleanup
> has happened in the xfstests since then.  I'm sorry for the level of the
> review you had to do for them, but do appreciate that you did.

No problem :) Thanks for updating the tests! I'll review again, but that
may take some time.

Thanks,
Eryu

> 
> This version should fix the outstanding issues, including some issues
> with the tests themselves, where e.g. the 32MB reserved size was file
> system-size (and implementation) dependent.  Most notably, since these
> tests share some common functionality that ultimately hit ~250 lines, I
> chose to create a new common/btrfs library.  Other than that, I tried to
> meet the level of consistency you were looking for with just printing
> errors instead of failing, not depending on error codes, etc.
> 
> Thanks,
> 
> -Jeff
> 
> ---
> 
> Jeff Mahoney (4):
>   btrfs/048: extend _filter_btrfs_prop_error to handle additional errors
>   btrfs/124: test global metadata reservation reporting
>   btrfs/125: test sysfs exports of allocation and device membership info
>   btrfs/126,127,128: test feature ioctl and sysfs interfaces
> 
>  .gitignore   |   1 +
>  common/btrfs | 253 
> +++
>  common/config|   7 +-
>  common/filter.btrfs  |  10 +-
>  src/Makefile |   3 +-
>  src/btrfs_ioctl_helper.c | 220 +
>  tests/btrfs/048  |   6 +-
>  tests/btrfs/048.out  |   4 +-
>  tests/btrfs/124  |  84 
>  tests/btrfs/124.out  |   1 +
>  tests/btrfs/125  | 177 +
>  tests/btrfs/125.out  |   1 +
>  tests/btrfs/126  | 244 +
>  tests/btrfs/126.out  |   1 +
>  tests/btrfs/127  | 166 +++
>  tests/btrfs/127.out  |   1 +
>  tests/btrfs/128  | 128 
>  tests/btrfs/128.out  |   1 +
>  tests/btrfs/group|   5 +
>  19 files changed, 1302 insertions(+), 11 deletions(-)
>  create mode 100644 common/btrfs
>  create mode 100644 src/btrfs_ioctl_helper.c
>  create mode 100755 tests/btrfs/124
>  create mode 100644 tests/btrfs/124.out
>  create mode 100755 tests/btrfs/125
>  create mode 100644 tests/btrfs/125.out
>  create mode 100755 tests/btrfs/126
>  create mode 100644 tests/btrfs/126.out
>  create mode 100755 tests/btrfs/127
>  create mode 100644 tests/btrfs/127.out
>  create mode 100755 tests/btrfs/128
>  create mode 100644 tests/btrfs/128.out
> 
> -- 
> 1.8.5.6
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


invalid opcode 0000 / kernel bug with defect HDD

2016-06-28 Thread Peter Becker
Cause of kernel bugs was a defective HDD (/dev/sdd).

The kernel BUG:

May 16 07:41:38 nas kernel: [37168.832800]
btrfs_dev_stat_print_on_error: 470 callbacks suppressed
May 16 07:41:38 nas kernel: [37168.832806] BTRFS error (device sdd):
bdev /dev/sdb errs: wr 49293, rd 567248, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832843] BTRFS error (device sdd):
bdev /dev/sdf errs: wr 0, rd 537544, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832887] BTRFS error (device sdd):
bdev /dev/sdb errs: wr 49293, rd 567249, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832893] BTRFS error (device sdd):
bdev /dev/sdf errs: wr 0, rd 537545, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832969] BTRFS error (device sdd):
bdev /dev/sdb errs: wr 49293, rd 567250, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832977] BTRFS error (device sdd):
bdev /dev/sdf errs: wr 0, rd 537546, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832987] BTRFS error (device sdd):
bdev /dev/sdb errs: wr 49293, rd 567251, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.832992] BTRFS error (device sdd):
bdev /dev/sdf errs: wr 0, rd 537547, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.862127] BTRFS error (device sdd):
bdev /dev/sdf errs: wr 0, rd 537548, flush 0, corrupt 0, gen 0
May 16 07:41:38 nas kernel: [37168.862188] BTRFS error (device sdd):
bdev /dev/sdb errs: wr 49293, rd 567252, flush 0, corrupt 0, gen 0
May 16 07:41:42 nas kernel: [37173.103386] [ cut here ]
May 16 07:41:42 nas kernel: [37173.103414] kernel BUG at
/home/kernel/COD/linux/fs/btrfs/ctree.c:5201!
May 16 07:41:42 nas kernel: [37173.103434] invalid opcode: 0000 [#1] SMP
May 16 07:41:42 nas kernel: [37173.103450] Modules linked in: cpuid
xt_nat veth xt_addrtype xt_conntrack br_netfilter dm_thin_pool
dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvram msr
input_leds joydev hid_generic usbhid hid xt_CHECKSUM iptable_mangle
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp
llc iptable_filter ip_tables x_tables autofs4 eeepc_wmi asus_wmi
sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd
dm_multipath serio_raw snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_intel snd_hda_codec bnep rfcomm
snd_hda_core bluetooth snd_hwdep snd_pcm snd_seq_midi
snd_seq_midi_event snd_rawmidi lpc_ich snd_seq snd_seq_device mei_me
snd_timer mei snd soundcore mac_hid shpchp nfsd auth_rpcgss parport_pc
nfs_acl ppdev nfs nct6775 hwmon_vid binfmt_misc coretemp lockd grace
lp parport sunrpc fscache btrfs xor raid6_pq nls_iso8859_1 dm_mirror
dm_region_hash dm_log uas usb_storage i915 e1000e psmouse ahci libahci
i2c_algo_bit drm_kms_helper ptp pps_core syscopyarea sysfillrect
sysimgblt fb_sys_fops drm video fjes wmi
May 16 07:41:42 nas kernel: [37173.103893] CPU: 1 PID: 17784 Comm:
btrfs Tainted: G U  W   4.5.4-040504-generic #201605120823
May 16 07:41:42 nas kernel: [37173.103916] Hardware name: ASUS All
Series/H87I-PLUS, BIOS 2003 11/05/2014
May 16 07:41:42 nas kernel: [37173.103932] task: 88020501c240 ti:
880161f18000 task.ti: 880161f18000
May 16 07:41:42 nas kernel: [37173.103950] RIP:
0010:[]  []
btrfs_search_forward+0x24d/0x330 [btrfs]
May 16 07:41:42 nas kernel: [37173.103995] RSP: 0018:880161f1bc38
EFLAGS: 00010246
May 16 07:41:42 nas kernel: [37173.104009] RAX:  RBX:
 RCX: 0001
May 16 07:41:42 nas kernel: [37173.104029] RDX: 0001 RSI:
091572628000 RDI: 8801ff2fa368
May 16 07:41:42 nas kernel: [37173.104048] RBP: 880161f1bc98 R08:
091571c0 R09: 0915b1c0
May 16 07:41:42 nas kernel: [37173.104067] R10: 880161f1ba30 R11:
 R12: 8801d71b6930
May 16 07:41:42 nas kernel: [37173.104086] R13: 0001 R14:
 R15: 
May 16 07:41:42 nas kernel: [37173.104106] FS:  7ff94d968900()
GS:88021fb0() knlGS:
May 16 07:41:42 nas kernel: [37173.104129] CS:  0010 DS:  ES: 
CR0: 80050033
May 16 07:41:42 nas kernel: [37173.104144] CR2: 7ff94ca80a00 CR3:
000214a96000 CR4: 000406e0
May 16 07:41:42 nas kernel: [37173.104164] Stack:
May 16 07:41:42 nas kernel: [37173.104173]  00ff8801
880161f1bcd7 8800d7699800 c3ff88010001
May 16 07:41:42 nas kernel: [37173.104203]  0119
 7539a5b8 8801d71b6930
May 16 07:41:42 nas kernel: [37173.104233]  880161f1bd30
 880161f1bcd7 
May 16 07:41:42 nas kernel: [37173.104262] Call Trace:
May 16 07:41:42 nas kernel: [37173.104329]  []
search_ioctl+0xed/0x1b0 [btrfs]
May 16 07:41:42 nas kernel: [37173.104388]  []
btrfs_ioctl_tree_search+0x7