[PATCH 4/4] btrfs: qgroup: Use generation aware subtree swap to mark dirty extents

2018-09-03 Thread Qu Wenruo
Before this patch, for quota enabled balance, btrfs needs to mark the
whole subtree dirty for quota.

E.g.
OO = Old tree blocks (from file tree)
NN = New tree blocks (from tree reloc tree)

File tree (src) Tree reloc tree (dst)
OO (a)  NN (a)
   /  \/  \
 (b) OOOO (c)(b) NNNN (c)
/  \  /  \  /  \  /  \
   OO  OO OO OO (d)OO  OO OO NN (d)

For old balance + quota case, quota will mark the whole src and dst tree
dirty, including all the 3 old tree blocks in tree reloc tree.

It's doable for small file tree or new tree blocks are all
located at lower level.

But for large file tree or new tree blocks are all located at higher
level, this will lead to mark the whole tree dirty, and be unbelievably
slow.

This patch will change how we handle such balance with quota enabled
case.

Now we will search from (b) and (c) for any new tree blocks whose generation
is equal to @last_snapshot, and only mark them dirty.

In above case, we only need to trace tree blocks NN(b), NN(c) and NN(d).
(NN(a) will be traced when CoW happens for nodeptr modification).
And also for tree blocks OO(b), OO(c), OO(d). (OO(a) will be traced when
CoW happens for nodeptr modification)

For above case, we could skip 3 tree blocks, but for larger tree, we can
skip tons of unmodified tree blocks, and hugely speed up balance.

This patch will introduce a new function,
btrfs_qgroup_trace_subtree_swap(), which will do the following main
work:

1) Read out real root eb
   And setup basic dst_path for later calls
2) Call qgroup_trace_new_subtree_blocks()
   To trace all new tree blocks in tree reloc tree and their counter
   parts in file tree.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/qgroup.c | 94 +++
 fs/btrfs/qgroup.h | 10 +
 fs/btrfs/relocation.c | 11 ++---
 3 files changed, 107 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 31d2a70c9f27..d71c2e523c27 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1942,6 +1942,100 @@ static int qgroup_trace_new_subtree_blocks(struct 
btrfs_trans_handle* trans,
return ret;
 }
 
+/*
+ * For balance subtree swap.
+ *
+ * Will go down the tree block pointed by @dst_eb (pointed by @dst_parent and
+ * @dst_slot), and find any tree blocks whose generation is at @last_snapshot,
+ * and then go down @src_eb (pointed by @src_parent and @src_slot) to find
+ * the conterpart of the tree block, then mark both tree blocks as qgroup 
dirty,
+ * and skip all tree blocks whose generation is smaller than last_snapshot.
+ *
+ * This would skip tons of tree blocks of original 
btrfs_qgroup_trace_subtree(),
+ * which is the culprit of super slow balance if the file tree is large.
+ *
+ * @src_parent, @src_slot: pointer to src (file tree) eb.
+ * @dst_parent, @dst_slot: pointer to dst (tree reloc tree) eb.
+ */
+int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
+   struct extent_buffer *src_parent, int src_slot,
+   struct extent_buffer *dst_parent, int dst_slot,
+   u64 last_snapshot)
+{
+   struct btrfs_fs_info *fs_info = trans->fs_info;
+   struct btrfs_path *dst_path = NULL;
+   struct btrfs_key first_key;
+   struct extent_buffer *src_eb = NULL;
+   struct extent_buffer *dst_eb = NULL;
+   u64 child_gen;
+   u64 child_bytenr;
+   int level;
+   int ret;
+
+   if (!test_bit(BTRFS_FS_QUOTA_ENABLED, _info->flags))
+   return 0;
+
+   /* Wrong parameter order */
+   BUG_ON(btrfs_node_ptr_generation(src_parent, src_slot) >
+   btrfs_node_ptr_generation(dst_parent, dst_slot));
+
+   /* Read out real @src_eb, pointed by @src_parent and @src_slot */
+   child_bytenr = btrfs_node_blockptr(src_parent, src_slot);
+   child_gen = btrfs_node_ptr_generation(src_parent, src_slot);
+   btrfs_node_key_to_cpu(src_parent, _key, src_slot);
+
+   src_eb = read_tree_block(fs_info, child_bytenr, child_gen,
+   btrfs_header_level(src_parent) - 1, _key);
+   if (IS_ERR(src_eb)) {
+   ret = PTR_ERR(src_eb);
+   goto out;
+   }
+
+   /* Read out real @dst_eb, pointed by @src_parent and @src_slot */
+   child_bytenr = btrfs_node_blockptr(dst_parent, dst_slot);
+   child_gen = btrfs_node_ptr_generation(dst_parent, dst_slot);
+   btrfs_node_key_to_cpu(dst_parent, _key, dst_slot);
+
+   dst_eb = read_tree_block(fs_info, child_bytenr, child_gen,
+   btrfs_header_level(dst_parent) - 1, _key);
+   if (IS_ERR(dst_eb)) {
+   ret = PTR_ERR(dst_eb);
+   goto out;
+   }
+
+   if (!extent_buffer_uptodate(src_eb) || !extent_buffer_uptodate(dst_eb)) 
{

[PATCH 3/4] btrfs: qgroup: Introduce function to find all new tree blocks of tree reloc tree

2018-09-03 Thread Qu Wenruo
Introduce new function, qgroup_trace_new_subtree_blocks(), to iterate
all new tree blocks in a tree reloc tree.
So that qgroup could skip unrelated tree blocks during balance, which
should hugely speedup balance speed when quota is enabled.

The function qgroup_trace_new_subtree_blocks() itself only cares about
new tree blocks in tree reloc tree.

All its main works are:

1) Read out tree blocks according to parent pointers

2) Do recursive depth-first search
   Will call the same function on all its children tree blocks, with
   search level set to current level -1.
   And will also skip all children whose generation is smaller than
   @last_snapshot.

3) Call qgroup_trace_extent_swap() to trace tree blocks

So although we have parameter list related to source file tree, it's not
used at all, but only passed to qgroup_trace_extent_swap().
Thus despite the tree read code, the core should be pretty short and all
about recursive depth-first search.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/qgroup.c | 114 ++
 1 file changed, 114 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e748fdd3a30c..31d2a70c9f27 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1828,6 +1828,120 @@ static int qgroup_trace_extent_swap(struct 
btrfs_trans_handle* trans,
return ret;
 }
 
+/*
+ * Helper function to do recursive generation aware depth-first search, to
+ * locate all new tree blocks in a subtree of tree reloc tree.
+ *
+ * E.g. (OO = Old tree blocks, NN = New tree blocks, whose gen == 
last_snapshot)
+ *   Tree reloc tree
+ * L2 NN (a)
+ *  /\
+ * L1OONN (b)
+ *  /  \  /  \
+ * L0  OO  OOOO  NN
+ *   (c) (d)
+ * If we pass:
+ * @dst_path = [ nodes[1] = NN(b), nodes[0] = NULL ],
+ * @cur_level = 1
+ * @root_level = 1
+ *
+ * We will iterate through tree blocks NN(b), NN(d) and info qgroup to trace
+ * above tree blocks along with their counter parts in file tree.
+ * While during search, old tree blocsk OO(c) will be skiped as tree block swap
+ * won't affect OO(c).
+ */
+static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans,
+  struct extent_buffer *src_eb,
+  struct btrfs_path *dst_path,
+  int cur_level, int root_level,
+  u64 last_snapshot)
+{
+   struct btrfs_fs_info *fs_info = trans->fs_info;
+   struct extent_buffer *eb;
+   bool need_cleanup = false;
+   int ret = 0;
+   int i;
+
+   /* Read the tree block if needed */
+   if (dst_path->nodes[cur_level] == NULL) {
+   struct btrfs_key first_key;
+   int parent_slot;
+   u64 child_gen;
+   u64 child_bytenr;
+
+   /*
+* We need to get child blockptr/gen from parent before
+* we can read it.
+ */
+   eb = dst_path->nodes[cur_level + 1];
+   parent_slot = dst_path->slots[cur_level + 1];
+   child_bytenr = btrfs_node_blockptr(eb, parent_slot);
+   child_gen = btrfs_node_ptr_generation(eb, parent_slot);
+   btrfs_node_key_to_cpu(eb, _key, parent_slot);
+
+   /* This node is old, no need to trace */
+   if (child_gen < last_snapshot)
+   goto out;
+
+   eb = read_tree_block(fs_info, child_bytenr, child_gen,
+cur_level, _key);
+   if (IS_ERR(eb)) {
+   ret = PTR_ERR(eb);
+   goto out;
+   } else if (!extent_buffer_uptodate(eb)) {
+   free_extent_buffer(eb);
+   ret = -EIO;
+   goto out;
+   }
+
+   dst_path->nodes[cur_level] = eb;
+   dst_path->slots[cur_level] = 0;
+
+   btrfs_tree_read_lock(eb);
+   btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+   dst_path->locks[cur_level] = BTRFS_READ_LOCK_BLOCKING;
+   need_cleanup = true;
+   }
+
+   /* Now record this tree block and its counter part for qgroups */
+   ret = qgroup_trace_extent_swap(trans, src_eb, dst_path, cur_level,
+  root_level);
+   if (ret < 0)
+   goto cleanup;
+
+   eb = dst_path->nodes[cur_level];
+
+   if (cur_level > 0) {
+   /* Iterate all children tree blocks */
+   for (i = 0; i < btrfs_header_nritems(eb); i++) {
+   /* Skip old tree blocks as they won't be swapped */
+   if (btrfs_node_ptr_generation(eb, i) < last_snapshot)
+   continue;
+   dst_path->slots[cur_level] = i;
+
+   /* Recursive 

[PATCH 2/4] btrfs: qgroup: Introduce function to trace two swaped extents

2018-09-03 Thread Qu Wenruo
Introduce a new function, qgroup_trace_extent_swap(), which will be used
later for balance qgroup speedup.

The basis idea of balance is swapping tree blocks between tree reloc
tree and the real file tree.

The swap will happen in highest tree block, but there may be a lot of
tree blocks involved.

For example:
 OO = Old tree blocks
 NN = New tree blocks allocated during balance

  File tree (257)   Tree reloc tree for 257
L2  OONN
  /\/\
L1  OO  OO (a)OO  NN (a)
   / \ / \   / \ / \
L0   OO   OO OO   OO   OO   OO NN   NN
 (b)  (c)  (b)  (c)

When calling qgroup_trace_extent_swap(), we will pass:
@src_eb = OO(a)
@dst_path = [ nodes[1] = NN(a), nodes[0] = NN(c) ]
@dst_level = 0
@root_level = 1

In that case, qgroup_trace_extent_swap() will search from OO(a) to
reach OO(c), then mark both OO(c) and NN(c) as qgroup dirty.

The main work of qgroup_trace_extent_swap() can be split into 3 parts:

1) Tree search from @src_eb
   It should acts as a simplified btrfs_search_slot().
   The key for search can be extracted from @dst_path->nodes[dst_level]
   (first key).

2) Mark the final tree blocks in @src_path and @dst_path qgroup dirty
   NOTE: In above case, OO(a) and NN(a) won't be marked qgroup dirty.
   They should be marked during preivous (@dst_level = 1) iteration.

3) Mark file extents in leaves dirty
   We don't have good way to pick out new file extents only.
   So we still follow the old method by scanning all file extents in
   the leave.

This function can free us from keeping two pathes, thus later we only need
to care about how to iterate all new tree blocks in tree reloc tree.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/qgroup.c | 115 ++
 1 file changed, 115 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 5977eedc00d8..e748fdd3a30c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1713,6 +1713,121 @@ static int adjust_slots_upwards(struct btrfs_path 
*path, int root_level)
return 0;
 }
 
+/*
+ * helper function to trace a subtree tree block swap.
+ *
+ * The swapped tree block is marked by @dst_path->nodes[level].
+ * Thus @dst_path should have been populated already.
+ *
+ * While for src tree block, we only need the root eb as @src_eb.
+ * The needed path and search will done by this function.
+ */
+static int qgroup_trace_extent_swap(struct btrfs_trans_handle* trans,
+   struct extent_buffer *src_eb,
+   struct btrfs_path *dst_path,
+   int dst_level, int root_level)
+{
+   struct btrfs_key key;
+   struct btrfs_path *src_path;
+   struct btrfs_fs_info *fs_info = trans->fs_info;
+   u32 nodesize = fs_info->nodesize;
+   int cur_level = root_level;
+   int ret;
+
+   BUG_ON(dst_level > root_level);
+   /* Level mismatch */
+   if (btrfs_header_level(src_eb) != root_level)
+   return -EINVAL;
+
+   src_path = btrfs_alloc_path();
+   if (!src_path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   if (dst_level)
+   btrfs_node_key_to_cpu(dst_path->nodes[dst_level], , 0);
+   else
+   btrfs_item_key_to_cpu(dst_path->nodes[dst_level], , 0);
+
+   /* For src_path */
+   extent_buffer_get(src_eb);
+   src_path->nodes[root_level] = src_eb;
+   src_path->slots[root_level] = 0;
+   src_path->locks[root_level] = 0;
+
+   /* A simplified version of btrfs_search_slot() */
+   while (cur_level >= dst_level) {
+   if (src_path->nodes[cur_level] == NULL) {
+   struct btrfs_key first_key;
+   struct extent_buffer *eb;
+   int parent_slot;
+   u64 child_gen;
+   u64 child_bytenr;
+
+   eb = src_path->nodes[cur_level + 1];
+   parent_slot = src_path->slots[cur_level + 1];
+   child_bytenr = btrfs_node_blockptr(eb, parent_slot);
+   child_gen = btrfs_node_ptr_generation(eb, parent_slot);
+   btrfs_node_key_to_cpu(eb, _key, parent_slot);
+
+   eb = read_tree_block(fs_info, child_bytenr, child_gen,
+cur_level, _key);
+   if (IS_ERR(eb)) {
+   ret = PTR_ERR(eb);
+   goto out;
+   } else if (!extent_buffer_uptodate(eb)) {
+   free_extent_buffer(eb);
+   ret = -EIO;
+   goto out;
+   }
+
+

[PATCH 0/4] btrfs: quota: Skip unmodified tree blocks for balance

2018-09-03 Thread Qu Wenruo
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
The base commit is v4.19-rc1 tag.

There are a lot of reports of system hang for balance on quota enabled
fs.
It's most obvious for large fs.

The hang is caused by tons of unmodified extents marked as qgroup dirty.
One of the unmodified extent source is tree blocks.
(BTW, other sources includes unmodified file extent items, and tree
reloc tree drop subtree)

E.g.
OO = Old tree blocks from file tree
NN = New tree blocks from tree reloc tree

file treetree reloc tree
   OO (a)  NN (a)
  /  \/  \
(b) OOOO (c)(b) NNNN (c)
   / \   / \   / \   / \
 OO  OO OO  OO   OO  OO OO  NN
(d) (e) (f) (g) (d) (e) (f) (g)

In above case, balance will modify nodeptr in OO(a) to point NN(b) and
NN(c), and modify NN(a) to point to OO(B) and OO(c).

Before this patch, quota will mark the whole subtree from its parent
down to the leaves as dirty.
So btrfs quota need to trace all tree block from (a) to (g).

However tree blocks (d) (e) (f) are shared between both trees, thus
there is no need to trace those 3 tree blocks.

This patchset will change how this work by only tracing modified tree
blocks in tree reloc tree, and their counter parts in file tree.

Nodeptr swap will happen for tree blocks (b) and (c) in both tree.

For tree block (b), in tree reloc tree we could find that all its
children's generation is smaller than last_snapshot, thus no need to
trace them, only need to trace NN(b), and its counter part OO(b).

For tree block (c), in tree reloc tree, we find its child NN(g) need
tracing, and for tree block NN(g), there is no child need to trace.

So for subtree starting at tree block NN(c), we need to trace NN(c) and
NN(g), along with its counter part OO(c) and OO(c).

With this patch, we could skip tree blocks OO(d)~OO(f) in above example,
thus reduce some some overhead caused by qgroup.

The improvement is mostly related to metadata relocation.
If there is some high level tree blocks get relocated but its children are
still unmodified, we could save a lot of time.

Even for the worst case, it should be no worse than original full
subtree marking method.

Qu Wenruo (4):
  btrfs: qgroup: Introduce trace event to analyse the number of dirty
extents accounted
  btrfs: qgroup: Introduce function to trace two swaped extents
  btrfs: qgroup: Introduce function to find all new tree blocks of tree
reloc tree
  btrfs: qgroup: Use generation aware subtree swap to mark dirty extents

 fs/btrfs/qgroup.c| 327 +++
 fs/btrfs/qgroup.h|  10 ++
 fs/btrfs/relocation.c|  11 +-
 include/trace/events/btrfs.h |  21 +++
 4 files changed, 361 insertions(+), 8 deletions(-)

-- 
2.18.0



[PATCH 1/4] btrfs: qgroup: Introduce trace event to analyse the number of dirty extents accounted

2018-09-03 Thread Qu Wenruo
Number of qgroup dirty extents is directly linked to the performance
overhead, so add a new trace event, trace_qgroup_num_dirty_extents(), to
record how many dirty extents is processed in
btrfs_qgroup_account_extents().

This will be pretty handy to analyse later balance performance
improvement.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/qgroup.c|  4 
 include/trace/events/btrfs.h | 21 +
 2 files changed, 25 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 4353bb69bb86..5977eedc00d8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2133,6 +2133,7 @@ int btrfs_qgroup_account_extents(struct 
btrfs_trans_handle *trans)
struct btrfs_delayed_ref_root *delayed_refs;
struct ulist *new_roots = NULL;
struct rb_node *node;
+   u64 num_dirty_extents = 0;
u64 qgroup_to_skip;
int ret = 0;
 
@@ -2142,6 +2143,7 @@ int btrfs_qgroup_account_extents(struct 
btrfs_trans_handle *trans)
record = rb_entry(node, struct btrfs_qgroup_extent_record,
  node);
 
+   num_dirty_extents++;
trace_btrfs_qgroup_account_extents(fs_info, record);
 
if (!ret) {
@@ -2187,6 +2189,8 @@ int btrfs_qgroup_account_extents(struct 
btrfs_trans_handle *trans)
kfree(record);
 
}
+   trace_qgroup_num_dirty_extents(fs_info, trans->transid,
+  num_dirty_extents);
return ret;
 }
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index b401c4e36394..79253666d1d0 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1575,6 +1575,27 @@ DEFINE_EVENT(btrfs_qgroup_extent, 
btrfs_qgroup_trace_extent,
TP_ARGS(fs_info, rec)
 );
 
+TRACE_EVENT(qgroup_num_dirty_extents,
+
+   TP_PROTO(const struct btrfs_fs_info *fs_info, u64 transid,
+u64 num_dirty_extents),
+
+   TP_ARGS(fs_info, transid, num_dirty_extents),
+
+   TP_STRUCT__entry_btrfs(
+   __field(u64, transid)
+   __field(u64, num_dirty_extents  )
+   ),
+
+   TP_fast_assign_btrfs(fs_info,
+   __entry->transid   = transid;
+   __entry->num_dirty_extents = num_dirty_extents;
+   ),
+
+   TP_printk_btrfs("transid=%llu num_dirty_extents=%llu",
+   __entry->transid, __entry->num_dirty_extents)
+);
+
 TRACE_EVENT(btrfs_qgroup_account_extent,
 
TP_PROTO(const struct btrfs_fs_info *fs_info, u64 transid, u64 bytenr,
-- 
2.18.0



Re: Does ssd auto detected work for microSD cards?

2018-09-03 Thread Chris Murphy
On Mon, Sep 3, 2018 at 7:53 PM, GWB  wrote:
> Curious instance here, but perhaps this is the expected behaviour:
>
> mount | grep btrfs
> /dev/sdb3 on / type btrfs (rw,ssd,subvol=@)
> /dev/sdb3 on /home type btrfs (rw,ssd,subvol=@home)
> /dev/sde1 on /media/gwb09/btrfs-32G-MicroSDc type btrfs
> (rw,nosuid,nodev,uhelper=udisks2)
>
> This is on an Ubuntu 14 client.
>
> /dev/sdb is indeed an ssd, a Samsung 850 EVO 500Gig, where Ubuntu runs
> on btrfs root.   It appears btrfs did indeed auto detected an ssd
> drive.   However:
>
> /dev/sde is a micro SD card (32Gig Samsung) sitting in a USB 3 card
> reader, inserted into a USB 3 card slot.  But ssh is not detected.
>
> So is that the expected behavior?

cat /sys/block/sde/queue/rotational

That's what Btrfs uses for detection. I'm willing to bet the SD Card
slot is not using the mmc driver, but instead USB and therefore always
treated as a rotational device.


> If not, does it make a difference?
>
> Would it be best to mount an sd card with ssd_spread?

For the described use case, it probably doesn't make much of a
difference. It sounds like these are fairly large contiguous files,
ZFS send files.

I think for both SDXC and eMMC, F2FS is probably more applicable
overall than Btrfs due to its reduced wandering trees problem. But
again for your use case it may not matter much.



> Yet another side note: both btrfs and zfs are now "housed" at Oracle
> (and most of java, correct?).

Not really. The ZFS we care about now is OpenZFS, forked from Oracle's
ZFS. And a bunch of people not related to Oracle do that work. And
Btrfs has a wide assortment of developers: Facebook, SUSE, Fujitsu,
Oracle, and more.


-- 
Chris Murphy


Does ssd auto detected work for microSD cards?

2018-09-03 Thread GWB
Curious instance here, but perhaps this is the expected behaviour:

mount | grep btrfs
/dev/sdb3 on / type btrfs (rw,ssd,subvol=@)
/dev/sdb3 on /home type btrfs (rw,ssd,subvol=@home)
/dev/sde1 on /media/gwb09/btrfs-32G-MicroSDc type btrfs
(rw,nosuid,nodev,uhelper=udisks2)

This is on an Ubuntu 14 client.

/dev/sdb is indeed an ssd, a Samsung 850 EVO 500Gig, where Ubuntu runs
on btrfs root.   It appears btrfs did indeed auto detected an ssd
drive.   However:

/dev/sde is a micro SD card (32Gig Samsung) sitting in a USB 3 card
reader, inserted into a USB 3 card slot.  But ssh is not detected.

So is that the expected behavior?

If not, does it make a difference?

Would it be best to mount an sd card with ssd_spread?

The wiki suggests this, but I thought I would check:

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS

I think this is an older btrfs userland tools bundle:

btrfs --version
Btrfs v3.12

It's whatever is still in the Ubuntu 14 standard (universal?) repository.

Here's the tl/dr.  This is incidental info, and probably does not
affect the question above, or answer.

I use btrfs on Ubuntu root, and then zfs for home (lots of data, just
shy of 2 TB on this laptop).  I snapshot and send the zfs filesystems
onto a btrfs formatted sd card, and then use zfs receive for the
backup zpools:

sudo zfs send -D -v -i 20180830042620u zpb9/home2@20180904005531u >
zfs-send-zpb9-home2-20180904005531u

(on the sd card, copy to the back up zpools, and then:)

zfs receive -v zpf3/BackUpPoolHome < zfs-send-zpb9-home2-20180904005531u

No complaints at all about btrfs for the root file system, and
apt-btrfs-snapshot works great for rolling back from failed upgrades.
My servers are Solaris and Ubuntu, both of which support zfs, as long
as I don't upgrade beyond the "dreaded" point of no return for "open
source", zpool version 28 and zfs version 5.

When one server is upgraded to Ubuntu 18, I will try again to use
btrfs on the larger Toshiba Hard Drives (6TB and 8TB, either the x300
NAS or Desktop models).  I don't want to try that with Ubuntu 14 given
the older userland tools.

Many thanks.  Please point me to the correct mail list if this is not
the right one.

Yet another side note: both btrfs and zfs are now "housed" at Oracle
(and most of java, correct?).  Any chance of Solaris 11 getting btrfs
in the kernel?  I'm guessing not for Sparc, but it might help x86
Solaris users migrate to Oracle Linux.  At this point, I think Ubuntu
is the only distribution with a version of both btrfs and zfs in the
kernel.  But not Oracle.

Gordon Bynum


Re: RAID1 & BTRFS critical (device sda2): corrupt leaf, bad key order

2018-09-03 Thread Qu Wenruo


On 2018/9/3 下午10:18, Etienne Champetier wrote:
> Hello btfrs hackers,
> 
> I have a computer acting as backup server with BTRFS RAID1, and I
> would like to know the different options to rebuild this RAID
> (I saw this thread
> https://www.spinics.net/lists/linux-btrfs/msg68679.html but there was
> no raid 1)
> 
> # uname -a
> Linux servmaison 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> # btrfs --version
> btrfs-progs v4.4
> 
> # dmesg
> ...
> [ 1955.581972] BTRFS critical (device sda2): corrupt leaf, bad key
> order: block=6020235362304,root=1, slot=63
> [ 1955.582299] BTRFS critical (device sda2): corrupt leaf, bad key
> order: block=6020235362304,root=1, slot=63

Please provide the following dump:

# btrfs inspect dump-tree -t root /dev/sda2
# btrfs inspect dump-tree -b 6020235362304 /dev/sda2

The first one is to inspect current root tree to find any corruption.
The second one is to show that corrupted tree block, and compare with
first output to make sure if it's already in the committed tree.

And the critical error message already shows what's causing the bug,
your root tree is corrupted, some keys are not in correct order.

All the following errors could be caused by this problem.

[snip]
> 
> # btrfs fi show /
> Label: none  uuid: 4917db5e-fc20-4369-9556-83082a32d4cd
> Total devices 2 FS bytes used 2.25TiB
> devid1 size 3.64TiB used 2.34TiB path /dev/sda2
> devid2 size 3.64TiB used 2.34TiB path /dev/sdb2
> 
> # btrfs device stats /
> [/dev/sda2].write_io_errs   0
> [/dev/sda2].read_io_errs0
> [/dev/sda2].flush_io_errs   0
> [/dev/sda2].corruption_errs 0
> [/dev/sda2].generation_errs 0
> [/dev/sdb2].write_io_errs   0
> [/dev/sdb2].read_io_errs0
> [/dev/sdb2].flush_io_errs   0
> [/dev/sdb2].corruption_errs 0
> [/dev/sdb2].generation_errs 0
> 
> device stats report no errors :(
> 
> # btrfs fi df /
> Data, RAID1: total=2.32TiB, used=2.23TiB
> System, RAID1: total=96.00MiB, used=368.00KiB
> Metadata, RAID1: total=22.00GiB, used=19.12GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> # btrfs scrub status /
> scrub status for 4917db5e-fc20-4369-9556-83082a32d4cd
> scrub started at Mon Sep  3 05:32:52 2018, interrupted after
> 00:27:35, not running
> total bytes scrubbed: 514.05GiB with 0 errors
> 
> I've already tried 2 times to run btrfs scrub (after reboot), but it
> stops before the end, with the previous dmesg error
> 
> My question is what is the safest way to rebuild this BTRFS RAID1?

It depends on the inspect dump-tree output.

> I haven't tried "btrfs check --repair" yet

We need "btrfs check --readonly" output to verify if the bad key order
is the only problem.

If it's the only problem, "btrfs check --repair" indeed could fix it.

Thanks,
Qu

> (I can boot on a more up to date Linux live if it helps)
> 
> Thanks
> Etienne
> 



signature.asc
Description: OpenPGP digital signature


Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-03 Thread Christoph Anton Mitterer
Hey.


On Fri, 2018-08-31 at 10:33 +0800, Su Yue wrote:
> Can you please fetch btrfs-progs from my repo and run lowmem check
> in readonly?
> Repo: https://github.com/Damenly/btrfs-progs/tree/lowmem_debug
> It's based on v4.17.1 plus additonal output for debug only.

I've adapted your patch to 4.17 from Debian (i.e. not the 4.17.1).


First I ran it again with the pristine 4.17 from Debian:
# btrfs check --mode=lowmem /dev/mapper/system ; echo $?
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
checking extents
checking free space cache
checking fs roots
ERROR: errors found in fs roots
found 435924422656 bytes used, error(s) found
total csum bytes: 423418948
total tree bytes: 2218328064
total fs tree bytes: 1557168128
total extent tree bytes: 125894656
btree space waste bytes: 429599230
file data blocks allocated: 5193373646848
 referenced 555255164928
[ 1248.687628] [ cut here ]
[ 1248.688352] generic_make_request: Trying to write to read-only block-device 
dm-0 (partno 0)
[ 1248.689127] WARNING: CPU: 3 PID: 933 at 
/build/linux-LgHyGB/linux-4.17.17/block/blk-core.c:2180 
generic_make_request_checks+0x43d/0x610
[ 1248.689909] Modules linked in: dm_crypt algif_skcipher af_alg dm_mod 
snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl snd_hda_codec_generic 
x86_pkg_temp_thermal intel_powerclamp i915 iwlwifi btusb coretemp btrtl btbcm 
uvcvideo kvm_intel snd_hda_intel btintel videobuf2_vmalloc bluetooth 
snd_hda_codec kvm videobuf2_memops videobuf2_v4l2 videobuf2_common cfg80211 
snd_hda_core irqbypass videodev jitterentropy_rng drm_kms_helper 
crct10dif_pclmul snd_hwdep crc32_pclmul drbg ghash_clmulni_intel intel_cstate 
snd_pcm ansi_cprng ppdev intel_uncore drm media ecdh_generic iTCO_wdt snd_timer 
iTCO_vendor_support rtsx_pci_ms crc16 snd intel_rapl_perf memstick joydev 
mei_me rfkill evdev soundcore sg parport_pc pcspkr serio_raw fujitsu_laptop mei 
i2c_algo_bit parport shpchp sparse_keymap pcc_cpufreq lpc_ich button
[ 1248.693639]  video battery ac ip_tables x_tables autofs4 btrfs 
zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic 
raid1 raid0 multipath linear md_mod sd_mod uas usb_storage crc32c_intel 
rtsx_pci_sdmmc mmc_core ahci xhci_pci libahci aesni_intel ehci_pci aes_x86_64 
libata crypto_simd xhci_hcd ehci_hcd cryptd glue_helper psmouse i2c_i801 
scsi_mod rtsx_pci e1000e usbcore usb_common
[ 1248.696956] CPU: 3 PID: 933 Comm: btrfs Not tainted 4.17.0-3-amd64 #1 Debian 
4.17.17-1
[ 1248.698118] Hardware name: FUJITSU LIFEBOOK E782/FJNB253, BIOS Version 2.11 
07/15/2014
[ 1248.699299] RIP: 0010:generic_make_request_checks+0x43d/0x610
[ 1248.700495] RSP: 0018:ac89827c7d88 EFLAGS: 00010286
[ 1248.701702] RAX:  RBX: 98f4848a9200 RCX: 0006
[ 1248.702930] RDX: 0007 RSI: 0082 RDI: 98f49e2d6730
[ 1248.704170] RBP: 98f484f6d460 R08: 033e R09: 00aa
[ 1248.705422] R10: ac89827c7e60 R11:  R12: 
[ 1248.706675] R13: 0001 R14:  R15: 
[ 1248.707928] FS:  7f92842018c0() GS:98f49e2c() 
knlGS:
[ 1248.709190] CS:  0010 DS:  ES:  CR0: 80050033
[ 1248.710448] CR2: 55fc6fe1a5b0 CR3: 000407f62001 CR4: 001606e0
[ 1248.711707] Call Trace:
[ 1248.712960]  ? do_writepages+0x4b/0xe0
[ 1248.714201]  ? blkdev_readpages+0x20/0x20
[ 1248.715441]  ? do_writepages+0x4b/0xe0
[ 1248.716684]  generic_make_request+0x64/0x400
[ 1248.717935]  ? finish_wait+0x80/0x80
[ 1248.719181]  ? mempool_alloc+0x67/0x1a0
[ 1248.720425]  ? submit_bio+0x6c/0x140
[ 1248.721663]  submit_bio+0x6c/0x140
[ 1248.722902]  submit_bio_wait+0x53/0x80
[ 1248.724139]  blkdev_issue_flush+0x7c/0xb0
[ 1248.725377]  blkdev_fsync+0x2f/0x40
[ 1248.726612]  do_fsync+0x38/0x60
[ 1248.727849]  __x64_sys_fsync+0x10/0x20
[ 1248.729086]  do_syscall_64+0x55/0x110
[ 1248.730323]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1248.731565] RIP: 0033:0x7f928354d161
[ 1248.732805] RSP: 002b:7ffd35e3f5d8 EFLAGS: 0246 ORIG_RAX: 
004a
[ 1248.734067] RAX: ffda RBX: 55fc09c0c260 RCX: 7f928354d161
[ 1248.735342] RDX: 55fc09c13e28 RSI: 55fc0899f820 RDI: 0004
[ 1248.736614] RBP: 55fc09c0c2d0 R08: 0005 R09: 55fc09c0da70
[ 1248.738001] R10: 009e R11: 0246 R12: 
[ 1248.739272] R13: 55fc0899d213 R14: 55fc09c0c290 R15: 0001
[ 1248.740542] Code: 24 54 03 00 00 48 8d 74 24 08 48 89 df c6 05 3e 03 d9 00 
01 e8 d5 63 01 00 44 89 e2 48 89 c6 48 c7 c7 80 e1 e6 ad e8 a3 4e d1 ff <0f> 0b 
4c 8b 63 08 e9 7b fc ff ff 80 3d 15 03 d9 00 00 0f 85 94
[ 1248.741909] ---[ end trace c2f580dbd579028c ]---
1

Not really sure why btrfs-check apparently tries to write to the device

Re: IO errors when building RAID1.... ?

2018-09-03 Thread Chris Murphy
On Mon, Sep 3, 2018 at 4:23 AM, Adam Borowski  wrote:
> On Sun, Sep 02, 2018 at 09:15:25PM -0600, Chris Murphy wrote:
>> For > 10 years drive firmware handles bad sector remapping internally.
>> It remaps the sector logical address to a reserve physical sector.
>>
>> NTFS and ext[234] have a means of accepting a list of bad sectors, and
>> will avoid using them. Btrfs doesn't. But also ZFS, XFS, APFS, HFS+
>> and I think even FAT, lack this capability.
>
> 
> FAT entry FF7 (FAT12)/FFF7 (FAT16)/...
> 

Oh yeah even Linux mkdosfs does have -c option to check for bad
sectors and presumably will remove them from use. It doesn't accept a
separate list though, like badblocks + mke2fs.

-- 
Chris Murphy


Re: RAID1 & BTRFS critical (device sda2): corrupt leaf, bad key order

2018-09-03 Thread Chris Murphy
On Mon, Sep 3, 2018 at 7:52 AM, Etienne Champetier
 wrote:
> Hello linux-btfrs,
>
> I have a computer acting as backup server with BTRFS RAID1, and I
> would like to know the different options to rebuild this RAID
> (I saw this thread
> https://www.spinics.net/lists/linux-btrfs/msg68679.html but there was
> no raid 1)
>
> # uname -a
> Linux servmaison 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.4
>
> # dmesg
> ...
> [ 1955.581972] BTRFS critical (device sda2): corrupt leaf, bad key
> order: block=6020235362304,root=1, slot=63
> [ 1955.582299] BTRFS critical (device sda2): corrupt leaf, bad key
> order: block=6020235362304,root=1, slot=63
> [ 1955.582414] [ cut here ]
> [ 1955.582452] WARNING: CPU: 0 PID: 2071 at
> /build/linux-osVS4h/linux-4.4.0/fs/btrfs/extent-tree.c:2930
> btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
> [ 1955.582454] BTRFS: Transaction aborted (error -5)
> [ 1955.582456] Modules linked in: eeepc_wmi asus_wmi sparse_keymap
> ppdev intel_rapl x86_pkg_temp_thermal snd_hda_codec_hdmi
> snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp
> snd_hda_intel snd_hda_codec bridge kvm_intel crct10dif_pclmul stp
> crc32_pclmul kvm snd_hda_core snd_hwdep llc ghash_clmulni_intel
> irqbypass snd_pcm input_leds serio_raw snd_timer 8250_fintek snd
> mei_me ie31200_edac mei lpc_ich mac_hid soundcore edac_core parport_pc
> shpchp parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4
> btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
> async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
> hid_generic usbhid pata_acpi hid i915 aesni_intel i2c_algo_bit
> aes_x86_64 glue_helper
> [ 1955.582509]  drm_kms_helper lrw gf128mul ablk_helper syscopyarea
> cryptd sysfillrect sysimgblt fb_sys_fops ahci drm r8169 libahci mii
> wmi fjes video
> [ 1955.582522] CPU: 0 PID: 2071 Comm: kworker/u8:1 Not tainted
> 4.4.0-134-generic #160-Ubuntu
> [ 1955.582524] Hardware name: System manufacturer System Product
> Name/P8H77-M PRO, BIOS 1003 10/12/2012
> [ 1955.582546] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
> [ 1955.582548]  0286 e1236dd013ef459f 88034938fc98
> 814039f3
> [ 1955.582552]  88034938fce0 c03ff478 88034938fcd0
> 81084982
> [ 1955.582555]  880405a62980 8804076f7800 8803e6c6d0a0
> 02ec
> [ 1955.582558] Call Trace:
> [ 1955.582566]  [] dump_stack+0x63/0x90
> [ 1955.582571]  [] warn_slowpath_common+0x82/0xc0
> [ 1955.582574]  [] warn_slowpath_fmt+0x5c/0x80
> [ 1955.582592]  [] ?
> __btrfs_run_delayed_refs+0xce7/0x1220 [btrfs]
> [ 1955.582608]  [] btrfs_run_delayed_refs+0x26b/0x2a0 
> [btrfs]
> [ 1955.582624]  [] delayed_ref_async_start+0x37/0x90 [btrfs]
> [ 1955.582643]  [] btrfs_scrubparity_helper+0xcf/0x320 
> [btrfs]
> [ 1955.582661]  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
> [ 1955.582666]  [] process_one_work+0x16b/0x490
> [ 1955.582670]  [] worker_thread+0x4b/0x4d0
> [ 1955.582674]  [] ? process_one_work+0x490/0x490
> [ 1955.582677]  [] kthread+0xe7/0x100
> [ 1955.582680]  [] ? kthread_create_on_node+0x1e0/0x1e0
> [ 1955.582685]  [] ret_from_fork+0x55/0x80
> [ 1955.582689]  [] ? kthread_create_on_node+0x1e0/0x1e0
> [ 1955.582691] ---[ end trace cc65b5ec2d2430fc ]---
> [ 1955.582694] BTRFS: error (device sda2) in
> btrfs_run_delayed_refs:2930: errno=-5 IO failure
> [ 1955.582743] BTRFS info (device sda2): forced readonly
> [ 1955.595017] BTRFS critical (device sda2): corrupt leaf, bad key
> order: block=6020235362304,root=1, slot=63
> [ 1955.595106] BTRFS: error (device sda2) in
> btrfs_run_delayed_refs:2930: errno=-5 IO failure
> [ 1955.604374] BTRFS critical (device sda2): corrupt leaf, bad key
> order: block=6020235362304,root=1, slot=63
> [ 1955.60] BTRFS: error (device sda2) in
> btrfs_run_delayed_refs:2930: errno=-5 IO failure
> [ 1955.605331] BTRFS warning (device sda2): failed setting block group
> ro, ret=-30
> [ 1955.605334] BTRFS warning (device sda2): failed setting block group
> ro, ret=-30
>
> # btrfs fi show /
> Label: none  uuid: 4917db5e-fc20-4369-9556-83082a32d4cd
> Total devices 2 FS bytes used 2.25TiB
> devid1 size 3.64TiB used 2.34TiB path /dev/sda2
> devid2 size 3.64TiB used 2.34TiB path /dev/sdb2
>
> # btrfs device stats /
> [/dev/sda2].write_io_errs   0
> [/dev/sda2].read_io_errs0
> [/dev/sda2].flush_io_errs   0
> [/dev/sda2].corruption_errs 0
> [/dev/sda2].generation_errs 0
> [/dev/sdb2].write_io_errs   0
> [/dev/sdb2].read_io_errs0
> [/dev/sdb2].flush_io_errs   0
> [/dev/sdb2].corruption_errs 0
> [/dev/sdb2].generation_errs 0
>
> device stats report no errors :(
>
> # btrfs fi df /
> Data, RAID1: total=2.32TiB, used=2.23TiB
> System, RAID1: total=96.00MiB, used=368.00KiB
> Metadata, RAID1: total=22.00GiB, used=19.12GiB
> 

RAID1 & BTRFS critical (device sda2): corrupt leaf, bad key order

2018-09-03 Thread Etienne Champetier
Hello btfrs hackers,

I have a computer acting as backup server with BTRFS RAID1, and I
would like to know the different options to rebuild this RAID
(I saw this thread
https://www.spinics.net/lists/linux-btrfs/msg68679.html but there was
no raid 1)

# uname -a
Linux servmaison 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.4

# dmesg
...
[ 1955.581972] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.582299] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.582414] [ cut here ]
[ 1955.582452] WARNING: CPU: 0 PID: 2071 at
/build/linux-osVS4h/linux-4.4.0/fs/btrfs/extent-tree.c:2930
btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
[ 1955.582454] BTRFS: Transaction aborted (error -5)
[ 1955.582456] Modules linked in: eeepc_wmi asus_wmi sparse_keymap
ppdev intel_rapl x86_pkg_temp_thermal snd_hda_codec_hdmi
snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp
snd_hda_intel snd_hda_codec bridge kvm_intel crct10dif_pclmul stp
crc32_pclmul kvm snd_hda_core snd_hwdep llc ghash_clmulni_intel
irqbypass snd_pcm input_leds serio_raw snd_timer 8250_fintek snd
mei_me ie31200_edac mei lpc_ich mac_hid soundcore edac_core parport_pc
shpchp parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid pata_acpi hid i915 aesni_intel i2c_algo_bit
aes_x86_64 glue_helper
[ 1955.582509]  drm_kms_helper lrw gf128mul ablk_helper syscopyarea
cryptd sysfillrect sysimgblt fb_sys_fops ahci drm r8169 libahci mii
wmi fjes video
[ 1955.582522] CPU: 0 PID: 2071 Comm: kworker/u8:1 Not tainted
4.4.0-134-generic #160-Ubuntu
[ 1955.582524] Hardware name: System manufacturer System Product
Name/P8H77-M PRO, BIOS 1003 10/12/2012
[ 1955.582546] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 1955.582548]  0286 e1236dd013ef459f 88034938fc98
814039f3
[ 1955.582552]  88034938fce0 c03ff478 88034938fcd0
81084982
[ 1955.582555]  880405a62980 8804076f7800 8803e6c6d0a0
02ec
[ 1955.582558] Call Trace:
[ 1955.582566]  [] dump_stack+0x63/0x90
[ 1955.582571]  [] warn_slowpath_common+0x82/0xc0
[ 1955.582574]  [] warn_slowpath_fmt+0x5c/0x80
[ 1955.582592]  [] ?
__btrfs_run_delayed_refs+0xce7/0x1220 [btrfs]
[ 1955.582608]  [] btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]
[ 1955.582624]  [] delayed_ref_async_start+0x37/0x90 [btrfs]
[ 1955.582643]  [] btrfs_scrubparity_helper+0xcf/0x320 [btrfs]
[ 1955.582661]  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[ 1955.582666]  [] process_one_work+0x16b/0x490
[ 1955.582670]  [] worker_thread+0x4b/0x4d0
[ 1955.582674]  [] ? process_one_work+0x490/0x490
[ 1955.582677]  [] kthread+0xe7/0x100
[ 1955.582680]  [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1955.582685]  [] ret_from_fork+0x55/0x80
[ 1955.582689]  [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1955.582691] ---[ end trace cc65b5ec2d2430fc ]---
[ 1955.582694] BTRFS: error (device sda2) in
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 1955.582743] BTRFS info (device sda2): forced readonly
[ 1955.595017] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.595106] BTRFS: error (device sda2) in
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 1955.604374] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.60] BTRFS: error (device sda2) in
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 1955.605331] BTRFS warning (device sda2): failed setting block group
ro, ret=-30
[ 1955.605334] BTRFS warning (device sda2): failed setting block group
ro, ret=-30

# btrfs fi show /
Label: none  uuid: 4917db5e-fc20-4369-9556-83082a32d4cd
Total devices 2 FS bytes used 2.25TiB
devid1 size 3.64TiB used 2.34TiB path /dev/sda2
devid2 size 3.64TiB used 2.34TiB path /dev/sdb2

# btrfs device stats /
[/dev/sda2].write_io_errs   0
[/dev/sda2].read_io_errs0
[/dev/sda2].flush_io_errs   0
[/dev/sda2].corruption_errs 0
[/dev/sda2].generation_errs 0
[/dev/sdb2].write_io_errs   0
[/dev/sdb2].read_io_errs0
[/dev/sdb2].flush_io_errs   0
[/dev/sdb2].corruption_errs 0
[/dev/sdb2].generation_errs 0

device stats report no errors :(

# btrfs fi df /
Data, RAID1: total=2.32TiB, used=2.23TiB
System, RAID1: total=96.00MiB, used=368.00KiB
Metadata, RAID1: total=22.00GiB, used=19.12GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs scrub status /
scrub status for 4917db5e-fc20-4369-9556-83082a32d4cd
scrub started at Mon Sep  3 05:32:52 2018, interrupted after
00:27:35, not running
total bytes scrubbed: 514.05GiB with 0 errors

I've already tried 

Re: [PATCH 12/35] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-09-03 Thread Nikolay Borisov



On 30.08.2018 20:42, Josef Bacik wrote:
> + if (flush_state == ALLOC_CHUNK_FORCE && !commit_cycles)
> + flush_state++;

This is a bit obscure. So if we allocated a chunk and !commit_cycles
just break from the loop? What's the reasoning behind this ?


RAID1 & BTRFS critical (device sda2): corrupt leaf, bad key order

2018-09-03 Thread Etienne Champetier
Hello linux-btfrs,

I have a computer acting as backup server with BTRFS RAID1, and I
would like to know the different options to rebuild this RAID
(I saw this thread
https://www.spinics.net/lists/linux-btrfs/msg68679.html but there was
no raid 1)

# uname -a
Linux servmaison 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.4

# dmesg
...
[ 1955.581972] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.582299] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.582414] [ cut here ]
[ 1955.582452] WARNING: CPU: 0 PID: 2071 at
/build/linux-osVS4h/linux-4.4.0/fs/btrfs/extent-tree.c:2930
btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
[ 1955.582454] BTRFS: Transaction aborted (error -5)
[ 1955.582456] Modules linked in: eeepc_wmi asus_wmi sparse_keymap
ppdev intel_rapl x86_pkg_temp_thermal snd_hda_codec_hdmi
snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp
snd_hda_intel snd_hda_codec bridge kvm_intel crct10dif_pclmul stp
crc32_pclmul kvm snd_hda_core snd_hwdep llc ghash_clmulni_intel
irqbypass snd_pcm input_leds serio_raw snd_timer 8250_fintek snd
mei_me ie31200_edac mei lpc_ich mac_hid soundcore edac_core parport_pc
shpchp parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid pata_acpi hid i915 aesni_intel i2c_algo_bit
aes_x86_64 glue_helper
[ 1955.582509]  drm_kms_helper lrw gf128mul ablk_helper syscopyarea
cryptd sysfillrect sysimgblt fb_sys_fops ahci drm r8169 libahci mii
wmi fjes video
[ 1955.582522] CPU: 0 PID: 2071 Comm: kworker/u8:1 Not tainted
4.4.0-134-generic #160-Ubuntu
[ 1955.582524] Hardware name: System manufacturer System Product
Name/P8H77-M PRO, BIOS 1003 10/12/2012
[ 1955.582546] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 1955.582548]  0286 e1236dd013ef459f 88034938fc98
814039f3
[ 1955.582552]  88034938fce0 c03ff478 88034938fcd0
81084982
[ 1955.582555]  880405a62980 8804076f7800 8803e6c6d0a0
02ec
[ 1955.582558] Call Trace:
[ 1955.582566]  [] dump_stack+0x63/0x90
[ 1955.582571]  [] warn_slowpath_common+0x82/0xc0
[ 1955.582574]  [] warn_slowpath_fmt+0x5c/0x80
[ 1955.582592]  [] ?
__btrfs_run_delayed_refs+0xce7/0x1220 [btrfs]
[ 1955.582608]  [] btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]
[ 1955.582624]  [] delayed_ref_async_start+0x37/0x90 [btrfs]
[ 1955.582643]  [] btrfs_scrubparity_helper+0xcf/0x320 [btrfs]
[ 1955.582661]  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[ 1955.582666]  [] process_one_work+0x16b/0x490
[ 1955.582670]  [] worker_thread+0x4b/0x4d0
[ 1955.582674]  [] ? process_one_work+0x490/0x490
[ 1955.582677]  [] kthread+0xe7/0x100
[ 1955.582680]  [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1955.582685]  [] ret_from_fork+0x55/0x80
[ 1955.582689]  [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1955.582691] ---[ end trace cc65b5ec2d2430fc ]---
[ 1955.582694] BTRFS: error (device sda2) in
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 1955.582743] BTRFS info (device sda2): forced readonly
[ 1955.595017] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.595106] BTRFS: error (device sda2) in
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 1955.604374] BTRFS critical (device sda2): corrupt leaf, bad key
order: block=6020235362304,root=1, slot=63
[ 1955.60] BTRFS: error (device sda2) in
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 1955.605331] BTRFS warning (device sda2): failed setting block group
ro, ret=-30
[ 1955.605334] BTRFS warning (device sda2): failed setting block group
ro, ret=-30

# btrfs fi show /
Label: none  uuid: 4917db5e-fc20-4369-9556-83082a32d4cd
Total devices 2 FS bytes used 2.25TiB
devid1 size 3.64TiB used 2.34TiB path /dev/sda2
devid2 size 3.64TiB used 2.34TiB path /dev/sdb2

# btrfs device stats /
[/dev/sda2].write_io_errs   0
[/dev/sda2].read_io_errs0
[/dev/sda2].flush_io_errs   0
[/dev/sda2].corruption_errs 0
[/dev/sda2].generation_errs 0
[/dev/sdb2].write_io_errs   0
[/dev/sdb2].read_io_errs0
[/dev/sdb2].flush_io_errs   0
[/dev/sdb2].corruption_errs 0
[/dev/sdb2].generation_errs 0

device stats report no errors :(

# btrfs fi df /
Data, RAID1: total=2.32TiB, used=2.23TiB
System, RAID1: total=96.00MiB, used=368.00KiB
Metadata, RAID1: total=22.00GiB, used=19.12GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs scrub status /
scrub status for 4917db5e-fc20-4369-9556-83082a32d4cd
scrub started at Mon Sep  3 05:32:52 2018, interrupted after
00:27:35, not running
total bytes scrubbed: 514.05GiB with 0 errors

I've already tried 2 

Re: [PATCH 06/35] btrfs: check if free bgs for commit

2018-09-03 Thread Nikolay Borisov



On  3.09.2018 12:06, Nikolay Borisov wrote:
> 
> 
> On 30.08.2018 20:41, Josef Bacik wrote:
>> may_commit_transaction will skip committing the transaction if we don't
>> have enough pinned space or if we're trying to find space for a SYSTEM
>> chunk.  However if we have pending free block groups in this transaction
>> we still want to commit as we may be able to allocate a chunk to make
>> our reservation.  So instead of just returning ENOSPC, check if we have
>> free block groups pending, and if so commit the transaction to allow us
>> to use that free space.
>>
>> Signed-off-by: Josef Bacik 
>> ---
>>  fs/btrfs/extent-tree.c | 18 --
>>  1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 6e7f350754d2..80615a579b18 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4804,6 +4804,7 @@ static int may_commit_transaction(struct btrfs_fs_info 
>> *fs_info,
>>  struct btrfs_trans_handle *trans;
>>  u64 bytes;
>>  u64 reclaim_bytes = 0;
>> +bool do_commit = true;
>>  
>>  trans = (struct btrfs_trans_handle *)current->journal_info;
>>  if (trans)
> 
> While you are at it, does this check even make sense, since the
> transaction handle is  acquired proper later I think this can be removed?
> 
>> @@ -4832,8 +4833,10 @@ static int may_commit_transaction(struct 
>> btrfs_fs_info *fs_info,
>>   * See if there is some space in the delayed insertion reservation for
>>   * this reservation.
>>   */
>> -if (space_info != delayed_rsv->space_info)
>> -return -ENOSPC;
>> +if (space_info != delayed_rsv->space_info) {
>> +do_commit = false;
>> +goto commit;
>> +}
>>  
>>  spin_lock(_rsv->lock);
>>  reclaim_bytes += delayed_rsv->reserved;
>> @@ -4848,15 +4851,18 @@ static int may_commit_transaction(struct 
>> btrfs_fs_info *fs_info,
>>  
>>  if (__percpu_counter_compare(_info->total_bytes_pinned,
>> bytes,
>> -   BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {
>> -return -ENOSPC;
>> -}
>> -
>> +   BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0)
>> +do_commit = false;
>>  commit:
>>  trans = btrfs_join_transaction(fs_info->extent_root);
>>  if (IS_ERR(trans))
>>  return -ENOSPC;
>>  
>> +if (!do_commit &&> +!test_bit(BTRFS_TRANS_HAVE_FREE_BGS, 
>> >transaction->flags)) {


offtopic: And yet on a more different note, do we really need the
BTRFS_TRANS_HAVE_FREE_BFS flag when we can just check
fs_info->free_chunk_space ?

>> +btrfs_end_transaction(trans);
>> +return -ENOSPC;
>> +}
>>  return btrfs_commit_transaction(trans);
>>  }
>>  
>>
> 


Re: [PATCH 3/3] btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly

2018-09-03 Thread Qu Wenruo



On 2018/9/3 下午5:46, Nikolay Borisov wrote:
> Instead of returning an error value and using one of the parameters for
> returning the actual object we are interested in just refactor the
> function to directly return btrfs_device *. Also bubble up the error
> handling for the special BTRFS_ERROR_DEV_MISSING_NOT_FOUND value into
> btrfs_rm_device. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  fs/btrfs/dev-replace.c |  8 
>  fs/btrfs/volumes.c | 40 +++-
>  fs/btrfs/volumes.h |  6 +++---
>  3 files changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index dec01970d8c5..4e2b67d06305 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -409,10 +409,10 @@ int btrfs_dev_replace_start(struct btrfs_fs_info 
> *fs_info,
>   struct btrfs_device *tgt_device = NULL;
>   struct btrfs_device *src_device = NULL;
>  
> - ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
> - srcdev_name, _device);
> - if (ret)
> - return ret;
> + src_device = btrfs_find_device_by_devspec(fs_info, srcdevid,
> +   srcdev_name);
> + if (IS_ERR(src_device))
> + return PTR_ERR(src_device);
>  
>   ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
>   src_device, _device);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6202aa15d0d7..ce336b39fa0f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1877,10 +1877,16 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, 
> const char *device_path,
>   if (ret)
>   goto out;
>  
> - ret = btrfs_find_device_by_devspec(fs_info, devid, device_path,
> -);
> - if (ret)
> + device = btrfs_find_device_by_devspec(fs_info, devid, device_path);
> +
> + if (IS_ERR(device)) {
> + if (PTR_ERR(device) == -ENOENT &&
> + strcmp(device_path, "missing") == 0)
> + ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
> + else
> + ret = PTR_ERR(device);
>   goto out;
> + }
>  
>   if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state)) {
>   ret = BTRFS_ERROR_DEV_TGT_REPLACE;
> @@ -2153,30 +2159,22 @@ btrfs_find_device_missing_or_by_path(struct 
> btrfs_fs_info *fs_info,
>  /*
>   * Lookup a device given by device id, or the path if the id is 0.
>   */
> -int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
> -  const char *devpath,
> -  struct btrfs_device **device)
> +struct btrfs_device *
> +btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
> +  const char *devpath)
>  {
> - int ret = 0;
> + struct btrfs_device *device;
>  
>   if (devid) {
> - *device = btrfs_find_device(fs_info, devid, NULL, NULL);
> - if (!*device)
> - ret = -ENOENT;
> + device = btrfs_find_device(fs_info, devid, NULL, NULL);
> + if (!device)
> + return ERR_PTR(-ENOENT);
>   } else {
>   if (!devpath || !devpath[0])
> - return -EINVAL;
> -
> - *device = btrfs_find_device_missing_or_by_path(fs_info, 
> devpath);
> - if (IS_ERR(*device)) {
> - if (PTR_ERR(*device) == -ENOENT &&
> - strcmp(devpath, "missing") == 0)
> - ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
> - else
> - ret = PTR_ERR(*device);
> - }
> + return ERR_PTR(-EINVAL);
> + device = btrfs_find_device_missing_or_by_path(fs_info, devpath);
>   }
> - return ret;
> + return device;
>  }
>  
>  /*
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index e7811473024d..aefce895e994 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -410,9 +410,9 @@ int btrfs_close_devices(struct btrfs_fs_devices 
> *fs_devices);
>  void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step);
>  void btrfs_assign_next_active_device(struct btrfs_device *device,
>struct btrfs_device *this_dev);
> -int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
> -  const char *devpath,
> -  struct btrfs_device **device);
> +struct btrfs_device *btrfs_find_device_by_devspec(struct btrfs_fs_info 
> *fs_info,
> +   u64 devid,
> +   const char *devpath);
>  

Re: [PATCH 2/3] btrfs: Make btrfs_find_device_missing_or_by_path return directly a device

2018-09-03 Thread Qu Wenruo



On 2018/9/3 下午5:46, Nikolay Borisov wrote:
> This function returns a numeric error value and additionally the
> device found in one of its input parameters. Simplify this by making
> the function directly return a pointer to btrfs_device. Additionally
> adjust the caller to handle the case when we want to remove the
> 'missing' device and ENOENT is returned to return the expected
> positive error value, parsed by progs. Finally, unexport the function
> since it's not called outside of volume.c. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: Qu Wenruo 

Just changed when BTRFS_ERROR_DEV_MISSING_NOT_FOUND is returned, and
since btrfs_find_device_missing_or_by_path() is only called in
btrfs_find_device_by_devspec(), there is indeed no functional change.

Thanks,
Qu

> ---
>  fs/btrfs/volumes.c | 33 ++---
>  fs/btrfs/volumes.h |  3 ---
>  2 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 715ea45c6c28..6202aa15d0d7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2123,11 +2123,11 @@ btrfs_find_device_by_path(struct btrfs_fs_info 
> *fs_info,
>   return device;
>  }
>  
> -int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
> -  const char *device_path,
> -  struct btrfs_device **device)
> +static struct btrfs_device *
> +btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
> +  const char *device_path)
>  {
> - *device = NULL;
> + struct btrfs_device *device = NULL;
>   if (strcmp(device_path, "missing") == 0) {
>   struct list_head *devices;
>   struct btrfs_device *tmp;
> @@ -2136,20 +2136,18 @@ int btrfs_find_device_missing_or_by_path(struct 
> btrfs_fs_info *fs_info,
>   list_for_each_entry(tmp, devices, dev_list) {
>   if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>   >dev_state) && !tmp->bdev) {
> - *device = tmp;
> + device = tmp;
>   break;
>   }
>   }
>  
> - if (!*device)
> - return BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
> + if (!device)
> + return ERR_PTR(-ENOENT);
>   } else {
> - *device = btrfs_find_device_by_path(fs_info, device_path);
> - if (IS_ERR(*device))
> - return PTR_ERR(*device);
> + device = btrfs_find_device_by_path(fs_info, device_path);
>   }
>  
> - return 0;
> + return device;
>  }
>  
>  /*
> @@ -2159,10 +2157,9 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info 
> *fs_info, u64 devid,
>const char *devpath,
>struct btrfs_device **device)
>  {
> - int ret;
> + int ret = 0;
>  
>   if (devid) {
> - ret = 0;
>   *device = btrfs_find_device(fs_info, devid, NULL, NULL);
>   if (!*device)
>   ret = -ENOENT;
> @@ -2170,8 +2167,14 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info 
> *fs_info, u64 devid,
>   if (!devpath || !devpath[0])
>   return -EINVAL;
>  
> - ret = btrfs_find_device_missing_or_by_path(fs_info, devpath,
> -device);
> + *device = btrfs_find_device_missing_or_by_path(fs_info, 
> devpath);
> + if (IS_ERR(*device)) {
> + if (PTR_ERR(*device) == -ENOENT &&
> + strcmp(devpath, "missing") == 0)
> + ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
> + else
> + ret = PTR_ERR(*device);
> + }
>   }
>   return ret;
>  }
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 23e9285d88de..e7811473024d 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -410,9 +410,6 @@ int btrfs_close_devices(struct btrfs_fs_devices 
> *fs_devices);
>  void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step);
>  void btrfs_assign_next_active_device(struct btrfs_device *device,
>struct btrfs_device *this_dev);
> -int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
> -  const char *device_path,
> -  struct btrfs_device **device);
>  int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
>const char *devpath,
>struct btrfs_device **device);
> 


Re: [PATCH 1/3] btrfs: Make btrfs_find_device_by_path return struct btrfs_device

2018-09-03 Thread Qu Wenruo



On 2018/9/3 下午5:46, Nikolay Borisov wrote:
> Currently this function returns an error code as well as uses one of
> its arguments as a return value for struct btrfs_device. Change the
> function so that it returns btrfs_device directly and use the usual
> "encode error in pointer" mechanics if something goes wrong. No
> functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  fs/btrfs/volumes.c | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index da86706123ff..715ea45c6c28 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2096,9 +2096,9 @@ void btrfs_destroy_dev_replace_tgtdev(struct 
> btrfs_device *tgtdev)
>   call_rcu(>rcu, free_device_rcu);
>  }
>  
> -static int btrfs_find_device_by_path(struct btrfs_fs_info *fs_info,
> -  const char *device_path,
> -  struct btrfs_device **device)
> +static struct btrfs_device *
> +btrfs_find_device_by_path(struct btrfs_fs_info *fs_info,
> +   const char *device_path)
>  {
>   int ret = 0;
>   struct btrfs_super_block *disk_super;
> @@ -2106,21 +2106,21 @@ static int btrfs_find_device_by_path(struct 
> btrfs_fs_info *fs_info,
>   u8 *dev_uuid;
>   struct block_device *bdev;
>   struct buffer_head *bh;
> + struct btrfs_device *device;
>  
> - *device = NULL;
>   ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
>   fs_info->bdev_holder, 0, , );
>   if (ret)
> - return ret;
> + return ERR_PTR(ret);
>   disk_super = (struct btrfs_super_block *)bh->b_data;
>   devid = btrfs_stack_device_id(_super->dev_item);
>   dev_uuid = disk_super->dev_item.uuid;
> - *device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
> + device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
>   brelse(bh);
> - if (!*device)
> - ret = -ENOENT;
> + if (!device)
> + device = ERR_PTR(-ENOENT);
>   blkdev_put(bdev, FMODE_READ);
> - return ret;
> + return device;
>  }
>  
>  int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
> @@ -2143,11 +2143,13 @@ int btrfs_find_device_missing_or_by_path(struct 
> btrfs_fs_info *fs_info,
>  
>   if (!*device)
>   return BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
> -
> - return 0;
>   } else {
> - return btrfs_find_device_by_path(fs_info, device_path, device);
> + *device = btrfs_find_device_by_path(fs_info, device_path);
> + if (IS_ERR(*device))
> + return PTR_ERR(*device);
>   }
> +
> + return 0;
>  }
>  
>  /*
> 


[PATCH v3] btrfs-progs: defrag: open files RO on new enough kernels

2018-09-03 Thread Adam Borowski
Defragging an executable conflicts both way with it being run, resulting in
ETXTBSY.  This either makes defrag fail or prevents the program from being
executed.

Kernels 4.19-rc1 and later allow defragging files you could have possibly
opened rw, even if the passed descriptor is ro (commit 616d374efa23).

Signed-off-by: Adam Borowski 
---
v2: more eloquent description; root can't defrag RO on old kernels (unlike
dedupe)
v3: more eloquentier description; s/defrag_ro/defrag_open_mode/

 cmds-filesystem.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 06c8311b..99e2aec0 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -39,12 +40,14 @@
 #include "list_sort.h"
 #include "disk-io.h"
 #include "help.h"
+#include "fsfeatures.h"
 
 /*
  * for btrfs fi show, we maintain a hash of fsids we've already printed.
  * This way we don't print dups if a given FS is mounted more than once.
  */
 static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
+static mode_t defrag_open_mode = O_RDONLY;
 
 static const char * const filesystem_cmd_group_usage[] = {
"btrfs filesystem []  []",
@@ -877,7 +880,7 @@ static int defrag_callback(const char *fpath, const struct 
stat *sb,
if ((typeflag == FTW_F) && S_ISREG(sb->st_mode)) {
if (defrag_global_verbose)
printf("%s\n", fpath);
-   fd = open(fpath, O_RDWR);
+   fd = open(fpath, defrag_open_mode);
if (fd < 0) {
goto error;
}
@@ -914,6 +917,9 @@ static int cmd_filesystem_defrag(int argc, char **argv)
int compress_type = BTRFS_COMPRESS_NONE;
DIR *dirstream;
 
+   if (get_running_kernel_version() < KERNEL_VERSION(4,19,0))
+   defrag_open_mode = O_RDWR;
+
/*
 * Kernel has a different default (256K) that is supposed to be safe,
 * but it does not defragment very well. The 32M will likely lead to
@@ -1014,7 +1020,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
int defrag_err = 0;
 
dirstream = NULL;
-   fd = open_file_or_dir(argv[i], );
+   fd = open_file_or_dir3(argv[i], , defrag_open_mode);
if (fd < 0) {
error("cannot open %s: %m", argv[i]);
ret = -errno;
-- 
2.19.0.rc1



Re: [PATCH v2] btrfs-progs: defrag: open files RO on new enough kernels

2018-09-03 Thread Nikolay Borisov



On  3.09.2018 14:31, Adam Borowski wrote:
> Defragging an executable conflicts both way with it being run, resulting in
> ETXTBSY.  This either makes defrag fail or prevents the program from being
> executed.
> 
> Kernels 4.19-rc1 and later allow defragging files you could have possibly
> opened rw, even if the passed descriptor is ro.
> 
> Signed-off-by: Adam Borowski 

So this commit really seems to be the userspace counterpart of
616d374efa23 ("btrfs: allow defrag on a file opened read-only that has
rw permissions") as such IMO it will be good if you reference it.


> ---
> v2: more eloquent description; root can't defrag RO on old kernels (unlike
> dedupe)
> 
> 
>  cmds-filesystem.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 06c8311b..17e992a3 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -39,12 +40,14 @@
>  #include "list_sort.h"
>  #include "disk-io.h"
>  #include "help.h"
> +#include "fsfeatures.h"
>  
>  /*
>   * for btrfs fi show, we maintain a hash of fsids we've already printed.
>   * This way we don't print dups if a given FS is mounted more than once.
>   */
>  static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
> +static mode_t defrag_ro = O_RDONLY;
>  
>  static const char * const filesystem_cmd_group_usage[] = {
>   "btrfs filesystem []  []",
> @@ -877,7 +880,7 @@ static int defrag_callback(const char *fpath, const 
> struct stat *sb,
>   if ((typeflag == FTW_F) && S_ISREG(sb->st_mode)) {
>   if (defrag_global_verbose)
>   printf("%s\n", fpath);
> - fd = open(fpath, O_RDWR);
> + fd = open(fpath, defrag_ro);
>   if (fd < 0) {
>   goto error;
>   }
> @@ -914,6 +917,9 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   int compress_type = BTRFS_COMPRESS_NONE;
>   DIR *dirstream;
>  
> + if (get_running_kernel_version() < KERNEL_VERSION(4,19,0))
> + defrag_ro = O_RDWR;

I completely missed those lines in the previous posting, so alongside
the context information of the kernel commit this makes sense. However,
defrag_ro is a really bad name because there are case where, well, it's
not going to be ro, is it ? How about something like "defrag_open_mode"
or "defrag_open_flags" or something neutral which won't be implying the
mode of operation.

> +
>   /*
>* Kernel has a different default (256K) that is supposed to be safe,
>* but it does not defragment very well. The 32M will likely lead to
> @@ -1014,7 +1020,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   int defrag_err = 0;
>  
>   dirstream = NULL;
> - fd = open_file_or_dir(argv[i], );
> + fd = open_file_or_dir3(argv[i], , defrag_ro);
>   if (fd < 0) {
>   error("cannot open %s: %m", argv[i]);
>   ret = -errno;
> 


[PATCH v2] btrfs-progs: defrag: open files RO on new enough kernels

2018-09-03 Thread Adam Borowski
Defragging an executable conflicts both way with it being run, resulting in
ETXTBSY.  This either makes defrag fail or prevents the program from being
executed.

Kernels 4.19-rc1 and later allow defragging files you could have possibly
opened rw, even if the passed descriptor is ro.

Signed-off-by: Adam Borowski 
---
v2: more eloquent description; root can't defrag RO on old kernels (unlike
dedupe)


 cmds-filesystem.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 06c8311b..17e992a3 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -39,12 +40,14 @@
 #include "list_sort.h"
 #include "disk-io.h"
 #include "help.h"
+#include "fsfeatures.h"
 
 /*
  * for btrfs fi show, we maintain a hash of fsids we've already printed.
  * This way we don't print dups if a given FS is mounted more than once.
  */
 static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
+static mode_t defrag_ro = O_RDONLY;
 
 static const char * const filesystem_cmd_group_usage[] = {
"btrfs filesystem []  []",
@@ -877,7 +880,7 @@ static int defrag_callback(const char *fpath, const struct 
stat *sb,
if ((typeflag == FTW_F) && S_ISREG(sb->st_mode)) {
if (defrag_global_verbose)
printf("%s\n", fpath);
-   fd = open(fpath, O_RDWR);
+   fd = open(fpath, defrag_ro);
if (fd < 0) {
goto error;
}
@@ -914,6 +917,9 @@ static int cmd_filesystem_defrag(int argc, char **argv)
int compress_type = BTRFS_COMPRESS_NONE;
DIR *dirstream;
 
+   if (get_running_kernel_version() < KERNEL_VERSION(4,19,0))
+   defrag_ro = O_RDWR;
+
/*
 * Kernel has a different default (256K) that is supposed to be safe,
 * but it does not defragment very well. The 32M will likely lead to
@@ -1014,7 +1020,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
int defrag_err = 0;
 
dirstream = NULL;
-   fd = open_file_or_dir(argv[i], );
+   fd = open_file_or_dir3(argv[i], , defrag_ro);
if (fd < 0) {
error("cannot open %s: %m", argv[i]);
ret = -errno;
-- 
2.19.0.rc1



Re: [PATCH 2/2] btrfs-progs: defrag: open files RO on new enough kernels or if root

2018-09-03 Thread Adam Borowski
On Mon, Sep 03, 2018 at 02:04:23PM +0300, Nikolay Borisov wrote:
> On  3.09.2018 13:14, Adam Borowski wrote:
> > -   fd = open(fpath, O_RDWR);
> > +   fd = open(fpath, defrag_ro);
> 
> Looking at the kernel code I think this is in fact incorrect, because in
> ioctl.c we have:
> 
> if (!(file->f_mode & FMODE_WRITE)) {
> 
> ret = -EINVAL;
> 
> goto out;
> 
> }
> 
> So it seems a hard requirement to have opened a file for RW when you
> want to defragment it.

Oif!  I confused this with dedup, which does allow root to dedup RO even on
old kernels.  Good catch.


-- 
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄ • use glitches to walk on water [Mt14:25-26]


Re: [PATCH 2/2] btrfs-progs: defrag: open files RO on new enough kernels or if root

2018-09-03 Thread Adam Borowski
On Mon, Sep 03, 2018 at 02:01:21PM +0300, Nikolay Borisov wrote:
> On  3.09.2018 13:14, Adam Borowski wrote:
> > Fixes EXTXBSY races.
> 
> You have to be more eloquent than that and explain at least one race
> condition.

If you try to defrag an executable that's currently running:

ERROR: cannot open XXX: Text file busy
total 1 failures

If you try to run an executable that's being defragged:

-bash: XXX: Text file busy

The former tends to be a long-lasting condition but has only benign fallout
(executables almost never get fragmented, not recompressing a single file is
not the end of the world), the latter is only a brief window of time but has
potential for data loss.

> > +static mode_t defrag_ro = O_RDONLY;
> 
> This brings no value whatsoever, just use O_RDONLY directly

On old kernels it gets overwritten with:

> > +   if (get_running_kernel_version() < KERNEL_VERSION(4,19,0) && getuid())
> > +   defrag_ro = O_RDWR;


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄ • use glitches to walk on water [Mt14:25-26]


Re: [PATCH 2/2] btrfs-progs: defrag: open files RO on new enough kernels or if root

2018-09-03 Thread Nikolay Borisov



On  3.09.2018 13:14, Adam Borowski wrote:
> Fixes EXTXBSY races.
> 
> Signed-off-by: Adam Borowski 
> ---
>  cmds-filesystem.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 06c8311b..4c9df69f 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -39,12 +40,14 @@
>  #include "list_sort.h"
>  #include "disk-io.h"
>  #include "help.h"
> +#include "fsfeatures.h"
>  
>  /*
>   * for btrfs fi show, we maintain a hash of fsids we've already printed.
>   * This way we don't print dups if a given FS is mounted more than once.
>   */
>  static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
> +static mode_t defrag_ro = O_RDONLY;
>  
>  static const char * const filesystem_cmd_group_usage[] = {
>   "btrfs filesystem []  []",
> @@ -877,7 +880,7 @@ static int defrag_callback(const char *fpath, const 
> struct stat *sb,
>   if ((typeflag == FTW_F) && S_ISREG(sb->st_mode)) {
>   if (defrag_global_verbose)
>   printf("%s\n", fpath);
> - fd = open(fpath, O_RDWR);
> + fd = open(fpath, defrag_ro);

Looking at the kernel code I think this is in fact incorrect, because in
ioctl.c we have:

if (!(file->f_mode & FMODE_WRITE)) {

ret = -EINVAL;

goto out;

}

So it seems a hard requirement to have opened a file for RW when you
want to defragment it.

>   if (fd < 0) {
>   goto error;
>   }
> @@ -914,6 +917,9 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   int compress_type = BTRFS_COMPRESS_NONE;
>   DIR *dirstream;
>  
> + if (get_running_kernel_version() < KERNEL_VERSION(4,19,0) && getuid())
> + defrag_ro = O_RDWR;
> +
>   /*
>* Kernel has a different default (256K) that is supposed to be safe,
>* but it does not defragment very well. The 32M will likely lead to
> @@ -1014,7 +1020,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   int defrag_err = 0;
>  
>   dirstream = NULL;
> - fd = open_file_or_dir(argv[i], );
> + fd = open_file_or_dir3(argv[i], , defrag_ro);
>   if (fd < 0) {
>   error("cannot open %s: %m", argv[i]);
>   ret = -errno;
> 


Re: [PATCH 2/2] btrfs-progs: defrag: open files RO on new enough kernels or if root

2018-09-03 Thread Nikolay Borisov



On  3.09.2018 13:14, Adam Borowski wrote:
> Fixes EXTXBSY races.

You have to be more eloquent than that and explain at least one race
condition.


> 
> Signed-off-by: Adam Borowski 
> ---
>  cmds-filesystem.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 06c8311b..4c9df69f 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -39,12 +40,14 @@
>  #include "list_sort.h"
>  #include "disk-io.h"
>  #include "help.h"
> +#include "fsfeatures.h"
>  
>  /*
>   * for btrfs fi show, we maintain a hash of fsids we've already printed.
>   * This way we don't print dups if a given FS is mounted more than once.
>   */
>  static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
> +static mode_t defrag_ro = O_RDONLY;

This brings no value whatsoever, just use O_RDONLY directly

>  
>  static const char * const filesystem_cmd_group_usage[] = {
>   "btrfs filesystem []  []",
> @@ -877,7 +880,7 @@ static int defrag_callback(const char *fpath, const 
> struct stat *sb,
>   if ((typeflag == FTW_F) && S_ISREG(sb->st_mode)) {
>   if (defrag_global_verbose)
>   printf("%s\n", fpath);
> - fd = open(fpath, O_RDWR);
> + fd = open(fpath, defrag_ro);
>   if (fd < 0) {
>   goto error;
>   }
> @@ -914,6 +917,9 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   int compress_type = BTRFS_COMPRESS_NONE;
>   DIR *dirstream;
>  
> + if (get_running_kernel_version() < KERNEL_VERSION(4,19,0) && getuid())
> + defrag_ro = O_RDWR;
> +
>   /*
>* Kernel has a different default (256K) that is supposed to be safe,
>* but it does not defragment very well. The 32M will likely lead to
> @@ -1014,7 +1020,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   int defrag_err = 0;
>  
>   dirstream = NULL;
> - fd = open_file_or_dir(argv[i], );
> + fd = open_file_or_dir3(argv[i], , defrag_ro);
>   if (fd < 0) {
>   error("cannot open %s: %m", argv[i]);
>   ret = -errno;
> 


Re: IO errors when building RAID1.... ?

2018-09-03 Thread Adam Borowski
On Sun, Sep 02, 2018 at 09:15:25PM -0600, Chris Murphy wrote:
> For > 10 years drive firmware handles bad sector remapping internally.
> It remaps the sector logical address to a reserve physical sector.
> 
> NTFS and ext[234] have a means of accepting a list of bad sectors, and
> will avoid using them. Btrfs doesn't. But also ZFS, XFS, APFS, HFS+
> and I think even FAT, lack this capability.


FAT entry FF7 (FAT12)/FFF7 (FAT16)/...



-- 
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄ • use glitches to walk on water [Mt14:25-26]


[PATCH 1/2] btrfs-progs: fix kernel version parsing on some versions past 3.0

2018-09-03 Thread Adam Borowski
The code fails if the third section is missing (like "4.18") or is followed
by anything but "." or "-".  This happens for example if we're not exactly
at a tag and CONFIG_LOCALVERSION_AUTO=n (which results in "4.18.5+").

Signed-off-by: Adam Borowski 
---
 fsfeatures.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fsfeatures.c b/fsfeatures.c
index 7d85d60f..66111bf4 100644
--- a/fsfeatures.c
+++ b/fsfeatures.c
@@ -216,11 +216,8 @@ u32 get_running_kernel_version(void)
return (u32)-1;
version |= atoi(tmp) << 8;
tmp = strtok_r(NULL, ".", );
-   if (tmp) {
-   if (!string_is_numerical(tmp))
-   return (u32)-1;
+   if (tmp && string_is_numerical(tmp))
version |= atoi(tmp);
-   }
 
return version;
 }
-- 
2.19.0.rc1



[PATCH 2/2] btrfs-progs: defrag: open files RO on new enough kernels or if root

2018-09-03 Thread Adam Borowski
Fixes EXTXBSY races.

Signed-off-by: Adam Borowski 
---
 cmds-filesystem.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 06c8311b..4c9df69f 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -39,12 +40,14 @@
 #include "list_sort.h"
 #include "disk-io.h"
 #include "help.h"
+#include "fsfeatures.h"
 
 /*
  * for btrfs fi show, we maintain a hash of fsids we've already printed.
  * This way we don't print dups if a given FS is mounted more than once.
  */
 static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
+static mode_t defrag_ro = O_RDONLY;
 
 static const char * const filesystem_cmd_group_usage[] = {
"btrfs filesystem []  []",
@@ -877,7 +880,7 @@ static int defrag_callback(const char *fpath, const struct 
stat *sb,
if ((typeflag == FTW_F) && S_ISREG(sb->st_mode)) {
if (defrag_global_verbose)
printf("%s\n", fpath);
-   fd = open(fpath, O_RDWR);
+   fd = open(fpath, defrag_ro);
if (fd < 0) {
goto error;
}
@@ -914,6 +917,9 @@ static int cmd_filesystem_defrag(int argc, char **argv)
int compress_type = BTRFS_COMPRESS_NONE;
DIR *dirstream;
 
+   if (get_running_kernel_version() < KERNEL_VERSION(4,19,0) && getuid())
+   defrag_ro = O_RDWR;
+
/*
 * Kernel has a different default (256K) that is supposed to be safe,
 * but it does not defragment very well. The 32M will likely lead to
@@ -1014,7 +1020,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
int defrag_err = 0;
 
dirstream = NULL;
-   fd = open_file_or_dir(argv[i], );
+   fd = open_file_or_dir3(argv[i], , defrag_ro);
if (fd < 0) {
error("cannot open %s: %m", argv[i]);
ret = -errno;
-- 
2.19.0.rc1



[PATCH] btrfs-progs: tests: Add test for missing device delete error value

2018-09-03 Thread Nikolay Borisov
Add a test which ensures the kernel returns the correct error value
when missing device removal is requested. This test verifies that kernel
refactoring didn't broken the return value.

Signed-off-by: Nikolay Borisov 
---
 tests/misc-tests/011-delete-missing-device/test.sh | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tests/misc-tests/011-delete-missing-device/test.sh 
b/tests/misc-tests/011-delete-missing-device/test.sh
index 4c976421c091..b799a25c201d 100755
--- a/tests/misc-tests/011-delete-missing-device/test.sh
+++ b/tests/misc-tests/011-delete-missing-device/test.sh
@@ -44,6 +44,21 @@ test_delete_missing()
run_check_umount_test_dev
 }
 
+test_missing_error()
+{
+   run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV"
+   run_check_mount_test_dev
+   local out
+   out=$(run_mustfail_stdout "Unexpected success" \
+   $SUDO_HELPER "$TOP/btrfs" device remove missing "$TEST_MNT")
+
+   if ! echo "$out" | grep -q "no missing devices found to remove"; then
+   _fail "IOCTL returned unexpected error value"
+   fi
+
+   run_check_umount_test_dev
+}
+
 setup_loopdevs 4
 prepare_loopdevs
 dev1=${loopdevs[1]}
@@ -53,5 +68,7 @@ TEST_DEV=$dev1
 test_do_mkfs -m raid1 -d raid1
 test_wipefs
 test_delete_missing
+test_missing_error
+
 
 cleanup_loopdevs
-- 
2.7.4



[PATCH 0/3] cleanup couple of device-related functions' retval

2018-09-03 Thread Nikolay Borisov
Currently btrfs_find_device_by_path, btrfs_find_device_missing_or_by_path and 
btrfs_find_device_by_devspec are called in a chain and they all return an 
integer value to signal error and at the same time use one of their parameters
as an output. This patch set refactors those functions starting from the 
bottom, gradually making them return a pointer to btrfs_device. This is 
sufficient to convey an error when it occurs as well as return the actual 
device we are looking for. One added benefit is that the ioctl-specific positive
return value BTRFS_ERROR_DEV_MISSING_NOT_FOUND is now returned from 
btrfs_rm_device rather than from some internal function. 

Additionally I'll be sending a patch to progs, adding a test ensuring that 
BTRFS_ERROR_MISSING_NOT_FOUND is returned appropriately. 

This survived both my btrfs-progs test as well as xfstest run. No functional 
changes.


Nikolay Borisov (3):
  btrfs: Make btrfs_find_device_by_path return struct btrfs_device
  btrfs: Make btrfs_find_device_missing_or_by_path return directly a
device
  btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly

 fs/btrfs/dev-replace.c |  8 ++---
 fs/btrfs/volumes.c | 73 ++
 fs/btrfs/volumes.h |  9 ++
 3 files changed, 45 insertions(+), 45 deletions(-)

-- 
2.17.1



[PATCH 3/3] btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly

2018-09-03 Thread Nikolay Borisov
Instead of returning an error value and using one of the parameters for
returning the actual object we are interested in just refactor the
function to directly return btrfs_device *. Also bubble up the error
handling for the special BTRFS_ERROR_DEV_MISSING_NOT_FOUND value into
btrfs_rm_device. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/dev-replace.c |  8 
 fs/btrfs/volumes.c | 40 +++-
 fs/btrfs/volumes.h |  6 +++---
 3 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index dec01970d8c5..4e2b67d06305 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -409,10 +409,10 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
struct btrfs_device *tgt_device = NULL;
struct btrfs_device *src_device = NULL;
 
-   ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
-   srcdev_name, _device);
-   if (ret)
-   return ret;
+   src_device = btrfs_find_device_by_devspec(fs_info, srcdevid,
+ srcdev_name);
+   if (IS_ERR(src_device))
+   return PTR_ERR(src_device);
 
ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
src_device, _device);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6202aa15d0d7..ce336b39fa0f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1877,10 +1877,16 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, 
const char *device_path,
if (ret)
goto out;
 
-   ret = btrfs_find_device_by_devspec(fs_info, devid, device_path,
-  );
-   if (ret)
+   device = btrfs_find_device_by_devspec(fs_info, devid, device_path);
+
+   if (IS_ERR(device)) {
+   if (PTR_ERR(device) == -ENOENT &&
+   strcmp(device_path, "missing") == 0)
+   ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
+   else
+   ret = PTR_ERR(device);
goto out;
+   }
 
if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state)) {
ret = BTRFS_ERROR_DEV_TGT_REPLACE;
@@ -2153,30 +2159,22 @@ btrfs_find_device_missing_or_by_path(struct 
btrfs_fs_info *fs_info,
 /*
  * Lookup a device given by device id, or the path if the id is 0.
  */
-int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
-const char *devpath,
-struct btrfs_device **device)
+struct btrfs_device *
+btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
+const char *devpath)
 {
-   int ret = 0;
+   struct btrfs_device *device;
 
if (devid) {
-   *device = btrfs_find_device(fs_info, devid, NULL, NULL);
-   if (!*device)
-   ret = -ENOENT;
+   device = btrfs_find_device(fs_info, devid, NULL, NULL);
+   if (!device)
+   return ERR_PTR(-ENOENT);
} else {
if (!devpath || !devpath[0])
-   return -EINVAL;
-
-   *device = btrfs_find_device_missing_or_by_path(fs_info, 
devpath);
-   if (IS_ERR(*device)) {
-   if (PTR_ERR(*device) == -ENOENT &&
-   strcmp(devpath, "missing") == 0)
-   ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
-   else
-   ret = PTR_ERR(*device);
-   }
+   return ERR_PTR(-EINVAL);
+   device = btrfs_find_device_missing_or_by_path(fs_info, devpath);
}
-   return ret;
+   return device;
 }
 
 /*
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e7811473024d..aefce895e994 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -410,9 +410,9 @@ int btrfs_close_devices(struct btrfs_fs_devices 
*fs_devices);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step);
 void btrfs_assign_next_active_device(struct btrfs_device *device,
 struct btrfs_device *this_dev);
-int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
-const char *devpath,
-struct btrfs_device **device);
+struct btrfs_device *btrfs_find_device_by_devspec(struct btrfs_fs_info 
*fs_info,
+ u64 devid,
+ const char *devpath);
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
const u64 *devid,
const u8 *uuid);
-- 
2.17.1



[PATCH 2/3] btrfs: Make btrfs_find_device_missing_or_by_path return directly a device

2018-09-03 Thread Nikolay Borisov
This function returns a numeric error value and additionally the
device found in one of its input parameters. Simplify this by making
the function directly return a pointer to btrfs_device. Additionally
adjust the caller to handle the case when we want to remove the
'missing' device and ENOENT is returned to return the expected
positive error value, parsed by progs. Finally, unexport the function
since it's not called outside of volume.c. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/volumes.c | 33 ++---
 fs/btrfs/volumes.h |  3 ---
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 715ea45c6c28..6202aa15d0d7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2123,11 +2123,11 @@ btrfs_find_device_by_path(struct btrfs_fs_info *fs_info,
return device;
 }
 
-int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
-const char *device_path,
-struct btrfs_device **device)
+static struct btrfs_device *
+btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
+const char *device_path)
 {
-   *device = NULL;
+   struct btrfs_device *device = NULL;
if (strcmp(device_path, "missing") == 0) {
struct list_head *devices;
struct btrfs_device *tmp;
@@ -2136,20 +2136,18 @@ int btrfs_find_device_missing_or_by_path(struct 
btrfs_fs_info *fs_info,
list_for_each_entry(tmp, devices, dev_list) {
if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>dev_state) && !tmp->bdev) {
-   *device = tmp;
+   device = tmp;
break;
}
}
 
-   if (!*device)
-   return BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
+   if (!device)
+   return ERR_PTR(-ENOENT);
} else {
-   *device = btrfs_find_device_by_path(fs_info, device_path);
-   if (IS_ERR(*device))
-   return PTR_ERR(*device);
+   device = btrfs_find_device_by_path(fs_info, device_path);
}
 
-   return 0;
+   return device;
 }
 
 /*
@@ -2159,10 +2157,9 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info 
*fs_info, u64 devid,
 const char *devpath,
 struct btrfs_device **device)
 {
-   int ret;
+   int ret = 0;
 
if (devid) {
-   ret = 0;
*device = btrfs_find_device(fs_info, devid, NULL, NULL);
if (!*device)
ret = -ENOENT;
@@ -2170,8 +2167,14 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info 
*fs_info, u64 devid,
if (!devpath || !devpath[0])
return -EINVAL;
 
-   ret = btrfs_find_device_missing_or_by_path(fs_info, devpath,
-  device);
+   *device = btrfs_find_device_missing_or_by_path(fs_info, 
devpath);
+   if (IS_ERR(*device)) {
+   if (PTR_ERR(*device) == -ENOENT &&
+   strcmp(devpath, "missing") == 0)
+   ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
+   else
+   ret = PTR_ERR(*device);
+   }
}
return ret;
 }
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 23e9285d88de..e7811473024d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -410,9 +410,6 @@ int btrfs_close_devices(struct btrfs_fs_devices 
*fs_devices);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step);
 void btrfs_assign_next_active_device(struct btrfs_device *device,
 struct btrfs_device *this_dev);
-int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
-const char *device_path,
-struct btrfs_device **device);
 int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
 const char *devpath,
 struct btrfs_device **device);
-- 
2.17.1



[PATCH 1/3] btrfs: Make btrfs_find_device_by_path return struct btrfs_device

2018-09-03 Thread Nikolay Borisov
Currently this function returns an error code as well as uses one of
its arguments as a return value for struct btrfs_device. Change the
function so that it returns btrfs_device directly and use the usual
"encode error in pointer" mechanics if something goes wrong. No
functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/volumes.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index da86706123ff..715ea45c6c28 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2096,9 +2096,9 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device 
*tgtdev)
call_rcu(>rcu, free_device_rcu);
 }
 
-static int btrfs_find_device_by_path(struct btrfs_fs_info *fs_info,
-const char *device_path,
-struct btrfs_device **device)
+static struct btrfs_device *
+btrfs_find_device_by_path(struct btrfs_fs_info *fs_info,
+ const char *device_path)
 {
int ret = 0;
struct btrfs_super_block *disk_super;
@@ -2106,21 +2106,21 @@ static int btrfs_find_device_by_path(struct 
btrfs_fs_info *fs_info,
u8 *dev_uuid;
struct block_device *bdev;
struct buffer_head *bh;
+   struct btrfs_device *device;
 
-   *device = NULL;
ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
fs_info->bdev_holder, 0, , );
if (ret)
-   return ret;
+   return ERR_PTR(ret);
disk_super = (struct btrfs_super_block *)bh->b_data;
devid = btrfs_stack_device_id(_super->dev_item);
dev_uuid = disk_super->dev_item.uuid;
-   *device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+   device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
brelse(bh);
-   if (!*device)
-   ret = -ENOENT;
+   if (!device)
+   device = ERR_PTR(-ENOENT);
blkdev_put(bdev, FMODE_READ);
-   return ret;
+   return device;
 }
 
 int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
@@ -2143,11 +2143,13 @@ int btrfs_find_device_missing_or_by_path(struct 
btrfs_fs_info *fs_info,
 
if (!*device)
return BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
-
-   return 0;
} else {
-   return btrfs_find_device_by_path(fs_info, device_path, device);
+   *device = btrfs_find_device_by_path(fs_info, device_path);
+   if (IS_ERR(*device))
+   return PTR_ERR(*device);
}
+
+   return 0;
 }
 
 /*
-- 
2.17.1



Re: [PATCH 08/35] btrfs: release metadata before running delayed refs

2018-09-03 Thread Nikolay Borisov



On 30.08.2018 20:41, Josef Bacik wrote:
> We want to release the unused reservation we have since it refills the
> delayed refs reserve, which will make everything go smoother when
> running the delayed refs if we're short on our reservation.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/transaction.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 99741254e27e..ebb0c0405598 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1915,6 +1915,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
> *trans)
>   return ret;
>   }
>  
> + btrfs_trans_release_metadata(trans);
> + trans->block_rsv = NULL;
> +
>   /* make a pass through all the delayed refs we have so far
>* any runnings procs may add more while we are here
>*/
> @@ -1924,9 +1927,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
> *trans)
>   return ret;
>   }
>  
> - btrfs_trans_release_metadata(trans);
> - trans->block_rsv = NULL;
> -
>   cur_trans = trans->transaction;
>  
>   /*
> 


Re: [PATCH 06/35] btrfs: check if free bgs for commit

2018-09-03 Thread Nikolay Borisov



On 30.08.2018 20:41, Josef Bacik wrote:
> may_commit_transaction will skip committing the transaction if we don't
> have enough pinned space or if we're trying to find space for a SYSTEM
> chunk.  However if we have pending free block groups in this transaction
> we still want to commit as we may be able to allocate a chunk to make
> our reservation.  So instead of just returning ENOSPC, check if we have
> free block groups pending, and if so commit the transaction to allow us
> to use that free space.
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/extent-tree.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6e7f350754d2..80615a579b18 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4804,6 +4804,7 @@ static int may_commit_transaction(struct btrfs_fs_info 
> *fs_info,
>   struct btrfs_trans_handle *trans;
>   u64 bytes;
>   u64 reclaim_bytes = 0;
> + bool do_commit = true;
>  
>   trans = (struct btrfs_trans_handle *)current->journal_info;
>   if (trans)

While you are at it, does this check even make sense, since the
transaction handle is  acquired proper later I think this can be removed?

> @@ -4832,8 +4833,10 @@ static int may_commit_transaction(struct btrfs_fs_info 
> *fs_info,
>* See if there is some space in the delayed insertion reservation for
>* this reservation.
>*/
> - if (space_info != delayed_rsv->space_info)
> - return -ENOSPC;
> + if (space_info != delayed_rsv->space_info) {
> + do_commit = false;
> + goto commit;
> + }
>  
>   spin_lock(_rsv->lock);
>   reclaim_bytes += delayed_rsv->reserved;
> @@ -4848,15 +4851,18 @@ static int may_commit_transaction(struct 
> btrfs_fs_info *fs_info,
>  
>   if (__percpu_counter_compare(_info->total_bytes_pinned,
>  bytes,
> -BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0) {
> - return -ENOSPC;
> - }
> -
> +BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0)
> + do_commit = false;
>  commit:
>   trans = btrfs_join_transaction(fs_info->extent_root);
>   if (IS_ERR(trans))
>   return -ENOSPC;
>  
> + if (!do_commit &&> +!test_bit(BTRFS_TRANS_HAVE_FREE_BGS, 
> >transaction->flags)) {
> + btrfs_end_transaction(trans);
> + return -ENOSPC;
> + }
>   return btrfs_commit_transaction(trans);
>  }
>  
> 


Re: IO errors when building RAID1.... ?

2018-09-03 Thread Pierre Couderc




On 09/03/2018 05:15 AM, Chris Murphy wrote:

On Sat, Sep 1, 2018 at 1:03 AM, Pierre Couderc  wrote:


On 08/31/2018 08:52 PM, Chris Murphy wrote:


Bad sector which is failing write. This is fatal, there isn't anything
the block layer or Btrfs (or ext4 or XFS) can do about it. Well,
ext234 do have an option to scan for bad sectors and create a bad
sector map which then can be used at mkfs time, and ext234 will avoid
using those sectors. And also the md driver has a bad sector option
for the same, and does remapping. But XFS and Btrfs don't do that.

If the drive is under warranty, get it swapped out, this is definitely
a warranty covered problem.





Thank you very much.

Once upon a time...(I am old), there were lists of bad sectors, and the
software did avoid wrting in them. It seems to have disappeared. For which
reason ? Maybe because these errors occur so  rarely, that it is not worth
the trouble ?

For > 10 years drive firmware handles bad sector remapping internally.
It remaps the sector logical address to a reserve physical sector.

NTFS and ext[234] have a means of accepting a list of bad sectors, and
will avoid using them. Btrfs doesn't. But also ZFS, XFS, APFS, HFS+
and I think even FAT, lack this capability. I'm not aware of any file
system that once had bad sector tracking, that has since dropped the
capability.


Thank you, you are very clear.