Timeline or roadmap to split lowmem mode and original mode check?

2018-01-16 Thread Qu Wenruo
Hi David,

There is the long planned work to split the original mode and lowmem
mode check into their own .[ch] files.

I found it harder and harder to locate repair functions for original and
lowmem mode when writing fixes for them.

I wonder if it's a good time to start the split work, and if you're OK
with it, my plan is to:

1) Start the split work based on 4.14.1

2) Split check code into:
   check/common.[ch]<<< Not sure how many functions will be there
   check/lowmem.[ch]
   check/origin.[ch]
   check/main.c

Currently most fsck fixes and enhancement are from Fujitsu guys who I'm
quite familiar with, so rebasing their code won't be a big problem for me.

If this works for you, I can start the work asap.

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature


[PATCH 0/3] Lowmem fsck repair to fix filetype mismatch

2018-01-16 Thread Qu Wenruo
Sebastian reported a filesystem corruption where DIR_INDEX has wrong
filetype against INODE_ITEM.

Lowmem mode normally handles such problem by checking DIR_INDEX,
DIR_ITEM and INODE_REF/INODE_ITEM to determine the correct file type.
In such case, lowmem mode fsck can get the correct filetype.

When fixing the problem, lowmem mode will try to re-insert correct
(DIR_INDEX, DIR_ITEM, INODE_REF) tuple, and if existing correct
DIR_ITEM and INODE_REF is found, btrfs_link() will just skip and only
insert correct DIR_INDEX.

However, when inserting correct DIR_INDEX, due to extra DIR_INDEX
validation, incorrect one will be skiped and correct one will be
inserted after invalid one.

This leads to lowmem mode repair to create duplicated DIR_INDEX.

This patch will fix it by removing the whole (DIR_INDEX, DIR_ITEM,
INODE_REF) tuple before inserting correct tuple.
And the removing part, btrfs_unlink(), will be enhanced to handle
incorrect tuple member more robust.

Please note that, due a bug in lowmem mode repair, btrfs check will
still show "error(s) found in fs tree" even repair is done successfully.

And test case for this repair still needs extra work for original mode
to support such repair, or test case won't pass original mode test.

Qu Wenruo (3):
  btrfs-progs: lowmem fsck: Remove corupted link before re-add correct
link
  btrfs-progs: dir-item: Allow dir item and dir index lookup to ignore
found problem
  btrfs-progs: dir-item: Make btrfs_delete_one_dir_name more robust to
handle corrupted name len

 cmds-check.c   |  6 +-
 convert/main.c |  2 +-
 ctree.h|  4 ++--
 dir-item.c | 45 +++--
 inode.c| 14 +++---
 5 files changed, 50 insertions(+), 21 deletions(-)

-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: dir-item: Allow dir item and dir index lookup to ignore found problem

2018-01-16 Thread Qu Wenruo
For functions btrfs_lookup_dir_index() and btrfs_lookup_dir_item(), they
will check if the dir item/index is valid before doing further check.

The behavior is pretty good for healthy fs, but calling them on
corrupted fs with incorrect dir item/index will also return NULL, making
repair code unable to reuse them.

This patch add a new parameter @verify to address the problem.
For normal operation, @verify should be set to true to keep the old
behavior.

While for some functions used in repair, like btrfs_unlink(), they can
set @verify to false, so repair code can locate corrupted dir index/item
and do what they should do (either repair or just delete).

Signed-off-by: Qu Wenruo 
---
 cmds-check.c   |  2 +-
 convert/main.c |  2 +-
 ctree.h|  4 ++--
 dir-item.c | 34 ++
 inode.c| 14 +++---
 5 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index f302724dd840..59575506c83e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -3074,7 +3074,7 @@ static int delete_dir_index(struct btrfs_root *root,
btrfs_init_path(&path);
di = btrfs_lookup_dir_index(trans, root, &path, backref->dir,
backref->name, backref->namelen,
-   backref->index, -1);
+   backref->index, -1, false);
if (IS_ERR(di)) {
ret = PTR_ERR(di);
btrfs_release_path(&path);
diff --git a/convert/main.c b/convert/main.c
index 89f9261172ca..102ba4fc5ae2 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1580,7 +1580,7 @@ static int do_rollback(const char *devname)
/* Search the image file */
root_dir = btrfs_root_dirid(&image_root->root_item);
dir = btrfs_lookup_dir_item(NULL, image_root, &path, root_dir,
-   image_name, strlen(image_name), 0);
+   image_name, strlen(image_name), 0, true);
 
if (!dir || IS_ERR(dir)) {
btrfs_release_path(&path);
diff --git a/ctree.h b/ctree.h
index ef422ea60031..02529f4cb021 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2679,12 +2679,12 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct 
btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path, u64 dir,
 const char *name, int name_len,
-int mod);
+int mod, bool verify);
 struct btrfs_dir_item *btrfs_lookup_dir_index(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  struct btrfs_path *path, u64 dir,
  const char *name, int name_len,
- u64 index, int mod);
+ u64 index, int mod, bool verify);
 int btrfs_delete_one_dir_name(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  struct btrfs_path *path,
diff --git a/dir-item.c b/dir-item.c
index 462546c0eaf4..31ad1eca29d5 100644
--- a/dir-item.c
+++ b/dir-item.c
@@ -24,7 +24,7 @@
 
 static struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root 
*root,
  struct btrfs_path *path,
- const char *name, int name_len);
+ const char *name, int name_len, bool verify);
 
 static struct btrfs_dir_item *insert_with_overflow(struct btrfs_trans_handle
   *trans,
@@ -43,7 +43,8 @@ static struct btrfs_dir_item *insert_with_overflow(struct 
btrfs_trans_handle
ret = btrfs_insert_empty_item(trans, root, path, cpu_key, data_size);
if (ret == -EEXIST) {
struct btrfs_dir_item *di;
-   di = btrfs_match_dir_item_name(root, path, name, name_len);
+   di = btrfs_match_dir_item_name(root, path, name, name_len,
+  true);
if (di)
return ERR_PTR(-EEXIST);
ret = btrfs_extend_item(root, path, data_size);
@@ -196,7 +197,7 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct 
btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path, u64 dir,
 const char *name, int name_len,
-int mod)
+int mod, bool verify)
 {
int ret;
struct btrfs_key key;
@@ -227,14 +228,31 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct 
btrfs_trans_handle *trans,

[PATCH 1/3] btrfs-progs: lowmem fsck: Remove corupted link before re-add correct link

2018-01-16 Thread Qu Wenruo
For repair_ternary_lowmem() used in lowmem mode, if it found 1 of
DIR_INDEX/DIR_ITEM/INODE_REF missing, it will try to insert correct
link.

However for case like invalid type in DIR_INDEX, we should delete the
corrupted DIR_INDEX first before inserting the correct link.

This patch will remove the corrupted link before re-insert.
This should solve the duplicated DIR_INDEX problem in old lowmem mode
repair.

Cc: Sebastian Andrzej Siewior 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 7fc30da83ea1..f302724dd840 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4997,6 +4997,10 @@ int repair_ternary_lowmem(struct btrfs_root *root, u64 
dir_ino, u64 ino,
goto out;
}
if (stage == 1) {
+   ret = btrfs_unlink(trans, root, ino, dir_ino, index, name,
+  name_len, 0);
+   if (ret)
+   goto out;
ret = btrfs_add_link(trans, root, ino, dir_ino, name, name_len,
   filetype, &index, 1, 1);
goto out;
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: dir-item: Make btrfs_delete_one_dir_name more robust to handle corrupted name len

2018-01-16 Thread Qu Wenruo
Function btrfs_delete_one_dir_name() will check if the dir_item is the
last content of the item, and delete the whole item if needed.

However if @name_len of one dir_item/dir_index is corrupted and larger
than the item size, the function will still try to treat it as partly
remove, which will screw up the whole leaf.

This patch will enhance the item deletion check, to cover corrupted name
len, so in that case we just delete the whole item.

Signed-off-by: Qu Wenruo 
---
 dir-item.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/dir-item.c b/dir-item.c
index 31ad1eca29d5..7ce3d2a40433 100644
--- a/dir-item.c
+++ b/dir-item.c
@@ -281,7 +281,6 @@ int btrfs_delete_one_dir_name(struct btrfs_trans_handle 
*trans,
  struct btrfs_path *path,
  struct btrfs_dir_item *di)
 {
-
struct extent_buffer *leaf;
u32 sub_item_len;
u32 item_len;
@@ -291,7 +290,15 @@ int btrfs_delete_one_dir_name(struct btrfs_trans_handle 
*trans,
sub_item_len = sizeof(*di) + btrfs_dir_name_len(leaf, di) +
btrfs_dir_data_len(leaf, di);
item_len = btrfs_item_size_nr(leaf, path->slots[0]);
-   if (sub_item_len == item_len) {
+
+   /*
+* If @sub_item_len is longer than @item_len, then it means the
+* name_len is just corrupted.
+* No good idea to know if there is anything we can recover from
+* the corrupted item.
+* Just delete the item.
+*/
+   if (sub_item_len >= item_len) {
ret = btrfs_del_item(trans, root, path);
} else {
unsigned long ptr = (unsigned long)di;
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Add chunk allocation ENOSPC debug message for enospc_debug mount option

2018-01-16 Thread Qu Wenruo


On 2018年01月16日 21:47, Nikolay Borisov wrote:
> 
> 
> On 15.01.2018 08:13, Qu Wenruo wrote:
>> Enospc_debug makes extent allocator to print more debug messages,
>> however for chunk allocation, there is no debug message for enospc_debug
>> at all.
>>
>> This patch will add message for the following parts of chunk allocator:
>>
>> 1) No rw device at all
>>Quite rare, but at least output one message for this case.
>>
>> 2) No enough space for some device
>>This debug message is quite handy for unbalanced disks with stripe
>>based profiles (RAID0/10/5/6).
>>
>> 3) Not enough free devices
>>This debug message should tell us if current chunk allocator is
>>working correctly on minimal device requirement.
>>
>> Although under most case, we will hit other ENOSPC before we even hit a
>> chunk allocator ENOSPC, but such debug info won't help.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/volumes.c | 19 +--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index a25684287501..664d8a1b90b3 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4622,8 +4622,11 @@ static int __btrfs_alloc_chunk(struct 
>> btrfs_trans_handle *trans,
>>  
>>  BUG_ON(!alloc_profile_is_valid(type, 0));
>>  
>> -if (list_empty(&fs_devices->alloc_list))
>> +if (list_empty(&fs_devices->alloc_list)) {
>> +if (btrfs_test_opt(info, ENOSPC_DEBUG))
>> +btrfs_warn(info, "%s: No writable device", __func__);
> 
> perhaps this shouldn't be gated on ENOSPC_DEBUG if it's a warning, or if
> it's to be gated then make it a DEBUG.

Because the case of no writeable device is rare.

But change it to debug seems good.

> 
>>  return -ENOSPC;
>> +}
>>  
>>  index = __get_raid_index(type);
>>  
>> @@ -4705,8 +4708,14 @@ static int __btrfs_alloc_chunk(struct 
>> btrfs_trans_handle *trans,
>>  if (ret == 0)
>>  max_avail = max_stripe_size * dev_stripes;
>>  
>> -if (max_avail < BTRFS_STRIPE_LEN * dev_stripes)
>> +if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) {
>> +if (btrfs_test_opt(info, ENOSPC_DEBUG))
>> +btrfs_debug(info,
>> +"%s: devid %llu has no free space, have=%llu want=%u",
>> +__func__, device->devid, max_avail,
>> +BTRFS_STRIPE_LEN * dev_stripes);
> 
> Here we have a debug output gated on ENOSCP_DEBUG so let's be consistent
> (hence my previous comment)
>>  continue;
>> +}
>>  
>>  if (ndevs == fs_devices->rw_devices) {
>>  WARN(1, "%s: found more than %llu devices\n",
>> @@ -4731,6 +4740,12 @@ static int __btrfs_alloc_chunk(struct 
>> btrfs_trans_handle *trans,
>>  
>>  if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
>>  ret = -ENOSPC;
>> +if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
>> +btrfs_debug(info,
>> +"%s: not enough devices with free space: have=%d minimal=%d 
>> increment=%d",
>> +__func__, ndevs, devs_min,
>> +devs_increment * sub_stripes);
> 
> Without looking at the code it's not really obvious what increment is.
> Perhaps you can use a more descriptive word?

"increment" is indeed less meaningful.

I'll change it to only output "minimal" just min(minimal, devs_min).

Thanks,
Qu

> 
>> +}
>>  goto error;
>>  }
>>  
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


[PATCH] Fstests: btrfs/011 fix device mounted when tests aborts

2018-01-16 Thread Liu Bo
One of btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a
non-SCRATCH_DEV device as the first one when doing mkfs, and this makes
_require_scratch{_nocheck} confused since it checks mount point with
SCRATCH_DEV only.

This adds _scratch_umount to cleanup() to umount SCRATCH_MNT by
011 itself.

Signed-off-by: Liu Bo 
---
 tests/btrfs/011 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/btrfs/011 b/tests/btrfs/011
index 28f1388..f4c5309 100755
--- a/tests/btrfs/011
+++ b/tests/btrfs/011
@@ -51,6 +51,7 @@ _cleanup()
fi
wait
rm -f $tmp.tmp
+   _scratch_unmount > /dev/null 2>&1
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fstests: btrfs/027 unmount scratch device if test fails

2018-01-16 Thread Liu Bo
This test, btrfs/027, runs tests against different raid profiles
in a loop, if one of them aborts, it also fails the following ones
with errors like,

Test -m raid10 -d raid10
ERROR: /dev/xxx is mounted
Test -m raid5 -d raid5
ERROR: /dev/xxx is mounted
Test -m raid6 -d raid6
ERROR: /dev/xxx is mounted

_scratch_unmount is added to avoid the above.

Signed-off-by: Liu Bo 
---
 tests/btrfs/027 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/btrfs/027 b/tests/btrfs/027
index 625a27f..689cd4c 100755
--- a/tests/btrfs/027
+++ b/tests/btrfs/027
@@ -95,6 +95,7 @@ run_test()
$SCRATCH_MNT >>$seqres.full 2>&1
if [ $? -ne 0 ]; then
echo "btrfs replace failed"
+   _scratch_unmount
_spare_dev_put
_scratch_dev_pool_put
return
@@ -102,6 +103,7 @@ run_test()
$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >>$seqres.full 2>&1
if [ $? -ne 0 ]; then
echo "btrfs scrub failed"
+   _scratch_unmount
_spare_dev_put
_scratch_dev_pool_put
return
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: correct wrong comment about magic number of index_cnt

2018-01-16 Thread David Sterba
On Fri, Jan 12, 2018 at 11:08:03AM +0800, Su Yue wrote:
> There is no function named btrfs_get_inode_index_count.
> Explanation for magic number index_cnt=2 in btrfs_new_inode() is
> actually located in btrfs_set_inode_index_count().
> 
> So replace 'btrfs_get_inode_index_count' in the comment by
> 'btrfs_set_inode_index_count'.
> 
> Signed-off-by: Su Yue 

Added to 4.16 queue, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Streamline btrfs_delalloc_reserve_metadata initial operations

2018-01-16 Thread David Sterba
On Fri, Jan 12, 2018 at 04:21:05PM +0200, Nikolay Borisov wrote:
> The behavior of btrfs_delalloc_reserve_metadata depends on whether
> the inode we are allocating for is the freespace inode or not. As it
> stands if we are the free node we set 'flush' and 'delalloc_lock'
> variable to certain values. Subsequently we check the values of those
> vars and act accordingly. Instead, simplify things by having 1 if
> which checks whether we are the freespace inode or not and do any
> specific operation in either branches of that if. This makes the code
> a bit easier to understand, as an added bonus it also shrinks the
> compiled size:
> 
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-17 (-17)
> Function old new   delta
> btrfs_delalloc_reserve_metadata 18761859 -17
> Total: Before=85966, After=85949, chg -0.02%

This looks too fine grained and IMHO not useful to mention. The overall
module size delta is interesting when compared between the base nad pull
request, but not for individual patches, namely if it's just 17 bytes.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Add chunk allocation ENOSPC debug message for enospc_debug mount option

2018-01-16 Thread Nikolay Borisov


On 15.01.2018 08:13, Qu Wenruo wrote:
> Enospc_debug makes extent allocator to print more debug messages,
> however for chunk allocation, there is no debug message for enospc_debug
> at all.
> 
> This patch will add message for the following parts of chunk allocator:
> 
> 1) No rw device at all
>Quite rare, but at least output one message for this case.
> 
> 2) No enough space for some device
>This debug message is quite handy for unbalanced disks with stripe
>based profiles (RAID0/10/5/6).
> 
> 3) Not enough free devices
>This debug message should tell us if current chunk allocator is
>working correctly on minimal device requirement.
> 
> Although under most case, we will hit other ENOSPC before we even hit a
> chunk allocator ENOSPC, but such debug info won't help.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/volumes.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a25684287501..664d8a1b90b3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4622,8 +4622,11 @@ static int __btrfs_alloc_chunk(struct 
> btrfs_trans_handle *trans,
>  
>   BUG_ON(!alloc_profile_is_valid(type, 0));
>  
> - if (list_empty(&fs_devices->alloc_list))
> + if (list_empty(&fs_devices->alloc_list)) {
> + if (btrfs_test_opt(info, ENOSPC_DEBUG))
> + btrfs_warn(info, "%s: No writable device", __func__);

perhaps this shouldn't be gated on ENOSPC_DEBUG if it's a warning, or if
it's to be gated then make it a DEBUG.

>   return -ENOSPC;
> + }
>  
>   index = __get_raid_index(type);
>  
> @@ -4705,8 +4708,14 @@ static int __btrfs_alloc_chunk(struct 
> btrfs_trans_handle *trans,
>   if (ret == 0)
>   max_avail = max_stripe_size * dev_stripes;
>  
> - if (max_avail < BTRFS_STRIPE_LEN * dev_stripes)
> + if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) {
> + if (btrfs_test_opt(info, ENOSPC_DEBUG))
> + btrfs_debug(info,
> + "%s: devid %llu has no free space, have=%llu want=%u",
> + __func__, device->devid, max_avail,
> + BTRFS_STRIPE_LEN * dev_stripes);

Here we have a debug output gated on ENOSCP_DEBUG so let's be consistent
(hence my previous comment)
>   continue;
> + }
>  
>   if (ndevs == fs_devices->rw_devices) {
>   WARN(1, "%s: found more than %llu devices\n",
> @@ -4731,6 +4740,12 @@ static int __btrfs_alloc_chunk(struct 
> btrfs_trans_handle *trans,
>  
>   if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
>   ret = -ENOSPC;
> + if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
> + btrfs_debug(info,
> + "%s: not enough devices with free space: have=%d minimal=%d 
> increment=%d",
> + __func__, ndevs, devs_min,
> + devs_increment * sub_stripes);

Without looking at the code it's not really obvious what increment is.
Perhaps you can use a more descriptive word?

> + }
>   goto error;
>   }
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recommendations for balancing as part of regular maintenance?

2018-01-16 Thread Austin S. Hemmelgarn

On 2018-01-16 01:45, Chris Murphy wrote:

On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster  wrote:

On 13 Jan 2018, at 17:09, Chris Murphy wrote:


On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
 wrote:


To that end, I propose the following text for the FAQ:

Q: Do I need to run a balance regularly?

A: While not strictly necessary for normal operations, running a filtered
balance regularly can help prevent your filesystem from ending up with
ENOSPC issues.  The following command run daily on each BTRFS volume
should
be more than sufficient for most users:

`btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`


Daily? Seems excessive.

I've got multiple Btrfs file systems that I haven't balanced, full or
partial, in a year. And I have no problems. One is a laptop which
accumulates snapshots until roughly 25% free space remains and then
most of the snapshots are deleted, except the most recent few, all at
one time. I'm not experiencing any problems so far. The other is a NAS
and it's multiple copies, with maybe 100-200 snapshots. One backup
volume is 99% full, there's no more unallocated free space, I delete
snapshots only to make room for btrfs send receive to keep pushing the
most recent snapshot from the main volume to the backup. Again no
problems.

I really think suggestions this broad are just going to paper over
bugs or design flaws, we won't see as many bug reports and then real
problems won't get fixed.


This is just an answer to a FAQ. This is not Austin or anyone else trying to
telling you or anyone else that you should do this. It should be clear that
there is an implied caveat along the lines of: "There are other ways to
manage allocation besides regular balancing. This recommendation is a
For-Dummies-kinda default that should work well enough if you don't have
another strategy better adapted to your situation." If this implication is
not obvious enough then we can add something explicit.


It's an upstream answer to a frequently asked question. It's rather
official, or about as close as it gets to it.




I also thing the time based method is too subjective. What about the
layout means a balance is needed? And if it's really a suggestion, why
isn't there a chron or systemd unit that just does this for the user,
in btrfs-progs, working and enabled by default?


As a newcomer to BTRFS, I was astonished to learn that it demands each user
figure out some workaround for what is, in my judgement, a required but
missing feature, i.e. a defect, a bug. At present the docs are pretty
confusing for someone trying to deal with it on their own.

Unless some better fix is in the works, this _should_ be a systemd unit or
something. Until then, please put it in FAQ.


At least openSUSE has a systemd unit for a long time now, but last
time I checked (a bit over a year ago) it's disabled by default. Why?

And insofar as I'm aware, openSUSE users aren't having big problems
related to lack of balancing, they have problems due to the lack of
balancing combined with schizo snapper defaults, which are these days
masked somewhat by turning on quotas so snapper can be more accurate
about cleaning up.
And in turn causing other issues because of the quotas, but that's 
getting OT...


Basically the scripted balance tells me two things:
a. Something is broken (still)
b. None of the developers has time to investigate coherent bug reports
about a. and fix/refine it.
I don't entirely agree here.  The issue is essentially inherent in the 
very design of the two-stage allocator itself, so it's not really 
something that can just be fixed by some simple surface patch.  The only 
real options I see to fix it are either:

1. Redesign the allocator
or:
2. figure out some way to handle this generically and automatically.

The first case is pretty much immediately out because it will almost 
certainly require a breaking change in the on-disk format.  The second 
is extremely challenging to do right, and likely to cause some 
significant controversy among list regulars (I for one don't want the FS 
doing stuff behind my back that impacts performance, and I have a 
feeling that quite a lot of other people here don't either).


Given that, I would say time is only a (probably small) part of it. 
This is not an easy thing to fix given the current situation, and 
difficult problems tend to sit around with no progress for very long 
periods of time in open source development.


And therefore papering over the problem is all we have. Basically it's
a sledgehammer approach.
How exactly is this any different than requiring a user to manually 
scrub things to check data that's not being actively used?  Or requiring 
manual invocation of defragmentation?  Or even batch deduplication?


All of those are manually triggered solutions to 'problems' with the 
filesystem, just like this is.  The only difference is that people are 
used to needing to manually defrag disks, and reasonably used to the 
need for manual scrubs

Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Qu Wenruo


On 2018年01月16日 20:15, Sebastian Andrzej Siewior wrote:
> On 2018-01-16 19:20:56 [+0800], Qu Wenruo wrote:
>> This is a very minor problem.
>>
>> btrfs check --repair should handle it without problem.
> 
> awesome, thank you! After the --repair completed, the following didn't
> report the error anymore. And now I was able to remove the two files.
> Thank you.

And thanks for your report on this issue too, I'm working on the generic
repair function for such type mismatch and name len mismatch.

It would be soon to see such repair function in btrfs-progs, and no need
to bother using any hard coded fixer.

(Just a little spoiler, the repair functionality will land in lowmem
mode first, as lowmem mode already have such infrastructure to address
it, although if executed with mainline btrfs-progs, it will just worse
the problem)

Thanks,
Qu

> 
>> Thanks,
>> Qu
> 
> Sebastian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Sebastian Andrzej Siewior
On 2018-01-16 19:20:56 [+0800], Qu Wenruo wrote:
> This is a very minor problem.
> 
> btrfs check --repair should handle it without problem.

awesome, thank you! After the --repair completed, the following didn't
report the error anymore. And now I was able to remove the two files.
Thank you.

> Thanks,
> Qu

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/4] btrfs: cleanup btrfs_mount() using btrfs_mount_root()

2018-01-16 Thread Anand Jain



On 01/16/2018 03:26 AM, David Sterba wrote:

On Fri, Jan 12, 2018 at 06:14:40PM +0800, Anand Jain wrote:


Misono,

   This change is causing subsequent (subvol) mount to fail when device
   option is specified. The simplest eg for failure is ..
 mkfs.btrfs -qf /dev/sdc /dev/sdb
 mount -o device=/dev/sdb /dev/sdc /btrfs
 mount -o device=/dev/sdb /dev/sdc /btrfs1
mount: /dev/sdc is already mounted or /btrfs1 busy

Looks like
  blkdev_get_by_path() <-- is failing.
  btrfs_scan_one_device()
  btrfs_parse_early_options()
  btrfs_mount()

   Which is due to different holders (viz. btrfs_root_fs_type and
   btrfs_fs_type) one is used for vfs_mount and other for scan,
   so they form different holders and can't let EXCL open which
   is needed for both scan and open.


This looks close to what I see in the random test failures. I've
reverted your patch "btrfs: optimize move uuid_mutex closer to the
critical section" as I bisected to it. The uuid mutex around
blkdev_get_path probably protected the concurrent mount and scan so they
did not ask for EXCL at the same time.

Reverting (or removing the patch from the current misc-next) queue is
simpler for me ATM as I want to get to a stable base now, we can add it
later if we understand the issue with the mount/scan.


 Right. I don't see above test case failing on your branch [1] which
 does not have the uuid_mutex patch. Quite strange, there isn't any
 concurrency (mount and scan) in this test case.
 [1]
   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next

 Ran xfstests, got stuck at btrfs/011 failures, (and will wait for
 Liubo's v2 patch). OR is there any other test case you were referring
 to as random test failures ?

Thanks, Anand


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Qu Wenruo


On 2018年01月16日 18:49, Sebastian Andrzej Siewior wrote:
> On 2018-01-16 18:22:14 [+0800], Qu Wenruo wrote:
>> Btrfs check is always recommended before really mount it.
> 
>  # btrfs check -p /dev/sdb4 
>  Checking filesystem on /dev/sdb4
>  UUID: b3bfb56e-d445-4335-93f0-c1fb2d1f6df1
>  checking extents [O]
>  cache and super generation don't match, space cache will be invalidated
>  root 5 inode 57648595 errors 200, dir isize wrong

This is a very minor problem.

btrfs check --repair should handle it without problem.

Or just ignore it if you don't trust check --repair.

> 
>  ERROR: errors found in fs roots
>  found 64009740288 bytes used, error(s) found
>  total csum bytes: 61644600
>  total tree bytes: 865746944
>  total fs tree bytes: 693469184
>  total extent tree bytes: 84905984
>  btree space waste bytes: 225096302
>  file data blocks allocated: 63203807232
>   referenced 63142268928
> 
>> However as long as there is no other problem, btrfs check should not
>> report anything wrong.
> 
> Is it important to note that the filesystem is a raid1 one?

Nope, the repair code handles any chunk profile well.

Thanks,
Qu

> 
>> Thanks,
>> Qu
> 
> Sebastian
> 



signature.asc
Description: OpenPGP digital signature


Re: Recommendations for balancing as part of regular maintenance?

2018-01-16 Thread Andrei Borzenkov
On Tue, Jan 16, 2018 at 9:45 AM, Chris Murphy  wrote:
...
>>
>> Unless some better fix is in the works, this _should_ be a systemd unit or
>> something. Until then, please put it in FAQ.
>
> At least openSUSE has a systemd unit for a long time now, but last
> time I checked (a bit over a year ago) it's disabled by default. Why?
>

It is now enabled by default on Tumbleweed and hence likely on SLE/Leap 15.

> And insofar as I'm aware, openSUSE users aren't having big problems
> related to lack of balancing, they have problems due to the lack of
> balancing combined with schizo snapper defaults, which are these days
> masked somewhat by turning on quotas so snapper can be more accurate
> about cleaning up.
>

Not only that but also making snapshot policy less aggressive - now
(in Tumbleweed/Leap 42.3) periodical snapshots are turned off by
default, only configuration changes via YaST/package updates via
zypper trigger snapshot creation.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Sebastian Andrzej Siewior
On 2018-01-16 18:22:14 [+0800], Qu Wenruo wrote:
> Btrfs check is always recommended before really mount it.

 # btrfs check -p /dev/sdb4 
 Checking filesystem on /dev/sdb4
 UUID: b3bfb56e-d445-4335-93f0-c1fb2d1f6df1
 checking extents [O]
 cache and super generation don't match, space cache will be invalidated
 root 5 inode 57648595 errors 200, dir isize wrong

 ERROR: errors found in fs roots
 found 64009740288 bytes used, error(s) found
 total csum bytes: 61644600
 total tree bytes: 865746944
 total fs tree bytes: 693469184
 total extent tree bytes: 84905984
 btree space waste bytes: 225096302
 file data blocks allocated: 63203807232
  referenced 63142268928

> However as long as there is no other problem, btrfs check should not
> report anything wrong.

Is it important to note that the filesystem is a raid1 one?

> Thanks,
> Qu

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Qu Wenruo


On 2018年01月16日 18:10, Sebastian Andrzej Siewior wrote:
> On 2018-01-16 13:19:50 [+0800], Qu Wenruo wrote:
>> Usage:
>> ./btrfs-corrupt-block -X 
> # ./btrfs-corrupt-block -X /dev/sdb4
> Fix is done successfully
> 
> may I mount it now or should I run btrfs check or something first?

Btrfs check is always recommended before really mount it.

However as long as there is no other problem, btrfs check should not
report anything wrong.

Thanks,
Qu
> 
>> Thanks,
>> Qu
> 
> Sebastian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Sebastian Andrzej Siewior
On 2018-01-16 13:19:50 [+0800], Qu Wenruo wrote:
> Usage:
> ./btrfs-corrupt-block -X 
# ./btrfs-corrupt-block -X /dev/sdb4
Fix is done successfully

may I mount it now or should I run btrfs check or something first?

> Thanks,
> Qu

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: invalid files names, btrfs check can't repair it

2018-01-16 Thread Sebastian Andrzej Siewior
On 16 January 2018 05:51:23 CET, Qu Wenruo  wrote:
>Since there is no other hit, I assume it's root subvolume (5), but I
>still need the extra confirm since the fix will be hard-coded.

Yes, 5 is correct.

>
>Thanks,
>Qu



Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html