Re: Scrub aborts due to corrupt leaf

2018-08-28 Thread Qu Wenruo


On 2018/8/28 at 9:56 PM, Chris Murphy wrote:
> On Tue, Aug 28, 2018 at 7:42 AM, Qu Wenruo  wrote:
>>
>>
>> On 2018/8/28 at 9:29 PM, Larkin Lowrey wrote:
>>> On 8/27/2018 10:12 PM, Larkin Lowrey wrote:
 On 8/27/2018 12:46 AM, Qu Wenruo wrote:
>
>> The system uses ECC memory and edac-util has not reported any errors.
>> However, I will run a memtest anyway.
> So it should not be the memory problem.
>
> BTW, what's the current generation of the fs?
>
> # btrfs inspect dump-super  | grep generation
>
> The corrupted leaf has generation 2862; I'm not sure how recently the
> corruption happened.

 generation              358392
 chunk_root_generation   357256
 cache_generation        358392
 uuid_tree_generation    358392
 dev_item.generation     0

 I don't recall the last time I ran a scrub but I doubt it has been
 more than a year.

 I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
 everything up.
>>>
>>> No such luck:
>>>
>>> Creating a new CRC tree
>>> Checking filesystem on /dev/Cached/Backups
>>> UUID: acff5096-1128-4b24-a15e-4ba04261edc3
>>> Reinitialize checksum tree
>>> csum result is 0 for block 2412149436416
>>> extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28
>>
>> It's ENOSPC, meaning btrfs can't find enough space for the new csum tree
>> blocks.
> 
> Seems bogus, there's >4TiB unallocated.

What a shame.
Btrfs won't try to allocate a new chunk when it's allocating new tree
blocks for metadata trees (extent, csum, etc.).

One quick (and dirty) way to work around this limitation is to use the
following patch:
--
diff --git a/extent-tree.c b/extent-tree.c
index 5d49af5a901e..0a1d21a8d148 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2652,17 +2652,15 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
profile = BTRFS_BLOCK_GROUP_METADATA | alloc_profile;
}

-   if (root->ref_cows) {
-   if (!(profile & BTRFS_BLOCK_GROUP_METADATA)) {
-   ret = do_chunk_alloc(trans, info,
-num_bytes,
-BTRFS_BLOCK_GROUP_METADATA);
-   BUG_ON(ret);
-   }
+   if (!(profile & BTRFS_BLOCK_GROUP_METADATA)) {
ret = do_chunk_alloc(trans, info,
-num_bytes + SZ_2M, profile);
+num_bytes,
+BTRFS_BLOCK_GROUP_METADATA);
BUG_ON(ret);
}
+   ret = do_chunk_alloc(trans, info,
+num_bytes + SZ_2M, profile);
+   BUG_ON(ret);

WARN_ON(num_bytes < info->sectorsize);
ret = find_free_extent(trans, root, num_bytes, empty_size,
--

Thanks,
Qu

> 
>> Label: none  uuid: acff5096-1128-4b24-a15e-4ba04261edc3
>>    Total devices 1 FS bytes used 66.61TiB
>>    devid    1 size 72.77TiB used 68.03TiB path /dev/mapper/Cached-Backups
>>
>> Data, single: total=67.80TiB, used=66.52TiB
>> System, DUP: total=40.00MiB, used=7.41MiB
>> Metadata, DUP: total=98.50GiB, used=95.21GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Even if all metadata is only csum tree, and ~200GiB needs to be
> written, there's plenty of free space for it.
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/6] btrfs-progs: Variant fixes for fuzz-tests

2018-08-28 Thread Qu Wenruo
Gentle ping.

These fixes are pretty small, and I'd like to see them merged before I
have to rebase them yet again.

Thanks,
Qu

On 2018/8/3 at 1:50 PM, Qu Wenruo wrote:
> This can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/fixes_for_fuzz_test
> 
> The base HEAD is:
> commit d7a1b84756157d544a9ddc399ef48c6132eaafcf (david/devel)
> Author: Qu Wenruo 
> Date:   Thu Jul 5 15:37:31 2018 +0800
> 
> btrfs-progs: check/original: Don't overwrite return value when we failed 
> to repair
> 
> 
> Thanks for the already merged fixes for fuzz/003, the remaining part is
> pretty small now, +20/-7.
> 
> Most of the fixes are for fuzz/003, just a small batch of BUG_ON()
> removals. (Patches 1~3 and 5)
> 
> There is also a fix for a fuzz/003 dead loop. (Patch 4)
> 
> Finally, there is a fix for fuzz/007; the bug is a segfault triggered by
> accessing a poisoned list_head, caused by a double list free. (Patch 6)
> 
> Now the fuzz-tests should finally pass without problems.
> 
> Qu Wenruo (6):
>   btrfs-progs: Exit gracefully if we hit ENOSPC when allocating tree
> block
>   btrfs-progs: Exit gracefully when failed to repair root dir item
>   btrfs-progs: Don't report dirty leaked eb using BUG_ON
>   btrfs-progs: Fix infinite loop when failed to repair bad key order
>   btrfs-progs: Exit gracefully when we failed to alloc dev extent
>   btrfs-progs: rescue-super: Don't double free fs_devices
> 
>  check/main.c| 10 +-
>  extent-tree.c   |  5 -
>  extent_io.c |  6 +-
>  super-recover.c |  3 ---
>  volumes.c   |  3 ++-
>  5 files changed, 20 insertions(+), 7 deletions(-)
> 





[PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better

2018-08-28 Thread Qu Wenruo
Function btrfs_trim_fs() doesn't handle errors in a consistent way: if an
error happens while trimming existing block groups, it will skip the
remaining block groups and continue to trim unallocated space for each
device.

The return value will then only reflect the final error from the device
trimming.

This patch will fix such behavior by:

1) Recording the last error from block group or device trimming
   So the return value will also reflect the last error during trimming,
   making developers more aware of the problem.

2) Continuing trimming if we can
   If we fail to trim one block group or device, we can still try the
   next block group or device.

3) Reporting the number of failures during block group and device trimming
   This keeps the output less noisy, while still giving the user a brief
   summary of what went wrong.

Such behavior avoids confusion in cases like a failure to trim the
first block group, after which only unallocated space would be trimmed.
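The three numbered points amount to a small control-flow change in the trim loop; here is a minimal, hedged sketch in Python (hypothetical helper names, modeling the strategy rather than the kernel code):

```python
def trim_all(block_groups, trim_one):
    """Patched strategy: keep going after a failure, count the failures,
    and remember the last error so the caller still sees something went
    wrong. trim_one(bg) returns 0 on success or a negative errno."""
    failed = 0
    last_err = 0
    for bg in block_groups:
        err = trim_one(bg)
        if err:
            failed += 1
            last_err = err   # record, but continue with the next group
    return failed, last_err

def trim_until_error(block_groups, trim_one):
    """Old strategy for contrast: abort the loop on the first failure."""
    for bg in block_groups:
        err = trim_one(bg)
        if err:
            return err
    return 0
```

With the `trim_all` shape, one bad block group no longer prevents the rest of the filesystem from being trimmed.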

Reported-by: Chris Murphy 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c | 57 ++
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de6f75f5547b..7768f206196a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10832,6 +10832,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
return ret;
 }
 
+/*
+ * Trim the whole fs, by:
+ * 1) Trimming free space in each block group
+ * 2) Trimming unallocated space in each device
+ *
+ * Will try to continue trimming even if we failed to trim one block group or
+ * device.
+ * The return value will be the last error during trim.
+ * Or 0 if nothing wrong happened.
+ */
 int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 {
struct btrfs_block_group_cache *cache = NULL;
@@ -10842,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
u64 end;
u64 trimmed = 0;
u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+   u64 bg_failed = 0;
+   u64 dev_failed = 0;
+   int bg_ret = 0;
+   int dev_ret = 0;
int ret = 0;
 
/*
@@ -10852,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
else
cache = btrfs_lookup_block_group(fs_info, range->start);
 
-   while (cache) {
+   for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
break;
@@ -10866,45 +10880,56 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
if (!block_group_cache_done(cache)) {
ret = cache_block_group(cache, 0);
if (ret) {
-   btrfs_put_block_group(cache);
-   break;
+   bg_failed++;
+   bg_ret = ret;
+   continue;
}
ret = wait_block_group_cache_done(cache);
if (ret) {
-   btrfs_put_block_group(cache);
-   break;
+   bg_failed++;
+   bg_ret = ret;
+   continue;
}
}
-   ret = btrfs_trim_block_group(cache,
-&group_trimmed,
-start,
-end,
-range->minlen);
+   ret = btrfs_trim_block_group(cache, &group_trimmed,
+   start, end, range->minlen);
 
trimmed += group_trimmed;
if (ret) {
-   btrfs_put_block_group(cache);
-   break;
+   bg_failed++;
+   bg_ret = ret;
+   continue;
}
}
-
-   cache = next_block_group(fs_info, cache);
}
 
+   if (bg_failed)
+   btrfs_warn(fs_info,
+   "failed to trim %llu block group(s), last error was %d",
+  bg_failed, bg_ret);
mutex_lock(&fs_info->fs_devices->device_list_mutex);
devices = &fs_info->fs_devices->alloc_list;
list_for_each_entry(device, devices, dev_alloc_list) {
ret = btrfs_trim_free_extents(device, range->minlen,

[PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs

2018-08-28 Thread Qu Wenruo
[BUG]
fstrim on some btrfs filesystems only trims the unallocated space,
without trimming any space in existing block groups.

[CAUSE]
Before the fstrim_range is passed to btrfs_trim_fs(), it gets truncated
to the range [0, super->total_bytes).
So later btrfs_trim_fs() will only be able to trim block groups in the
range [0, super->total_bytes).

However, for btrfs any bytenr aligned to the sector size is valid: since
btrfs uses its own logical address space, there is nothing limiting
where block groups can be placed.

For a btrfs that is balanced routinely, it's quite easy for all block
groups to be relocated so that their bytenrs start beyond
super->total_bytes.

In that case, btrfs will not trim the existing block groups.

[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
can get the unmodified range, which is normally set to [0, U64_MAX].
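A hedged toy model of the [CAUSE] above: the sizes are illustrative (a 72 TiB device with a block group balanced out to logical 80 TiB), and `old_fitrim_range` mimics only the clamping that btrfs_ioctl_fitrim() used to do:

```python
U64_MAX = 2**64 - 1
TiB = 2**40

def old_fitrim_range(start, length, total_bytes):
    # Old ioctl behavior: clamp the trim range to the device size.
    return start, min(length, total_bytes - start)

def range_covers(start, length, bg_bytenr):
    # A block group is considered for trimming only if its bytenr
    # falls inside the requested range.
    return start <= bg_bytenr < start + length

bg_bytenr = 80 * TiB                      # relocated beyond total_bytes
start, length = old_fitrim_range(0, U64_MAX, 72 * TiB)
```

After the old clamping, `range_covers(start, length, bg_bytenr)` is False; for the unmodified [0, U64_MAX] range it is True, which is exactly what the fix restores.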

Reported-by: Chris Murphy 
Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
Cc:  # v4.0+
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c | 10 +-
 fs/btrfs/ioctl.c   | 11 +++
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7768f206196a..d1478d66c7a5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10851,21 +10851,13 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
u64 start;
u64 end;
u64 trimmed = 0;
-   u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
u64 bg_failed = 0;
u64 dev_failed = 0;
int bg_ret = 0;
int dev_ret = 0;
int ret = 0;
 
-   /*
-* try to trim all FS space, our block group may start from non-zero.
-*/
-   if (range->len == total_bytes)
-   cache = btrfs_lookup_first_block_group(fs_info, range->start);
-   else
-   cache = btrfs_lookup_block_group(fs_info, range->start);
-
+   cache = btrfs_lookup_first_block_group(fs_info, range->start);
for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 63600dc2ac4c..8165a4bfa579 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -491,7 +491,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
struct fstrim_range range;
u64 minlen = ULLONG_MAX;
u64 num_devices = 0;
-   u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
int ret;
 
if (!capable(CAP_SYS_ADMIN))
@@ -515,11 +514,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
return -EOPNOTSUPP;
if (copy_from_user(&range, arg, sizeof(range)))
return -EFAULT;
-   if (range.start > total_bytes ||
-   range.len < fs_info->sb->s_blocksize)
+
+   /*
+* NOTE: Don't truncate the range using super->total_bytes.
+* Bytenr of btrfs block group is in btrfs logical address space,
+* which can be any sector size aligned bytenr in [0, U64_MAX].
+*/
+   if (range.len < fs_info->sb->s_blocksize)
return -EINVAL;
 
-   range.len = min(range.len, total_bytes - range.start);
range.minlen = max(range.minlen, minlen);
ret = btrfs_trim_fs(fs_info, &range);
if (ret < 0)
-- 
2.18.0



[PATCH v3 0/2] btrfs: trim enhancement to allow btrfs really trim block groups

2018-08-28 Thread Qu Wenruo
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/trim_fix
Which is based on v4.19-rc1 tag.

This patchset introduces two enhancements: one to output better error
messages during trim, the other to ensure we can really trim block
groups whose logical bytenrs are beyond the physical device size.

These two patches have been in the wild for a long time and are pretty
small; the 2nd patch in fact fixes a regression, and we already have a
test case for it (btrfs/156).

Changelog:
v2:
  Only report the total number of errors and the first errno, to make it
  less noisy.
  Change the message level from warning to debug.
v3:
  Rebase onto v4.19-rc1.
  Change the message level back from debug to warning, since the output
  is less noisy now and only reports the totals of failed block groups
  and devices.

Qu Wenruo (2):
  btrfs: Enhance btrfs_trim_fs function to handle error better
  btrfs: Ensure btrfs_trim_fs can trim the whole fs

 fs/btrfs/extent-tree.c | 67 ++
 fs/btrfs/ioctl.c   | 11 ---
 2 files changed, 49 insertions(+), 29 deletions(-)

-- 
2.18.0



Re: Scrub aborts due to corrupt leaf

2018-08-28 Thread Qu Wenruo


On 2018/8/28 at 9:56 PM, Chris Murphy wrote:
> On Tue, Aug 28, 2018 at 7:42 AM, Qu Wenruo  wrote:
>>
>>
>> On 2018/8/28 at 9:29 PM, Larkin Lowrey wrote:
>>> On 8/27/2018 10:12 PM, Larkin Lowrey wrote:
 On 8/27/2018 12:46 AM, Qu Wenruo wrote:
>
>> The system uses ECC memory and edac-util has not reported any errors.
>> However, I will run a memtest anyway.
> So it should not be the memory problem.
>
> BTW, what's the current generation of the fs?
>
> # btrfs inspect dump-super  | grep generation
>
> The corrupted leaf has generation 2862; I'm not sure how recently the
> corruption happened.

 generation              358392
 chunk_root_generation   357256
 cache_generation        358392
 uuid_tree_generation    358392
 dev_item.generation     0

 I don't recall the last time I ran a scrub but I doubt it has been
 more than a year.

 I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
 everything up.
>>>
>>> No such luck:
>>>
>>> Creating a new CRC tree
>>> Checking filesystem on /dev/Cached/Backups
>>> UUID: acff5096-1128-4b24-a15e-4ba04261edc3
>>> Reinitialize checksum tree
>>> csum result is 0 for block 2412149436416
>>> extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28
>>
>> It's ENOSPC, meaning btrfs can't find enough space for the new csum tree
>> blocks.
> 
> Seems bogus, there's >4TiB unallocated.

Pretty strange.

This means either the chunk allocator doesn't work, or something else is
wrong.

I'll take a look into this problem.

Thanks,
Qu

> 
>> Label: none  uuid: acff5096-1128-4b24-a15e-4ba04261edc3
>>    Total devices 1 FS bytes used 66.61TiB
>>    devid    1 size 72.77TiB used 68.03TiB path /dev/mapper/Cached-Backups
>>
>> Data, single: total=67.80TiB, used=66.52TiB
>> System, DUP: total=40.00MiB, used=7.41MiB
>> Metadata, DUP: total=98.50GiB, used=95.21GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Even if all metadata is only csum tree, and ~200GiB needs to be
> written, there's plenty of free space for it.
> 
> 
> 





Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Chris Murphy
On Tue, Aug 28, 2018 at 1:14 PM, Menion  wrote:
> You are correct, indeed in order to cleanup you need
>
> 1) someone realizes that snapshots have been created
> 2) apt-btrfs-snapshot is manually installed on the system
>
> Assuming also that the snapshots created during do-release-upgrade are
> managed for auto cleanup

Ha! I should have read all the emails.

Anyway, good sleuthing. I think it's a good idea to file a bug report
on it, so at the least other people can fix it manually.


-- 
Chris Murphy


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Chris Murphy
On Tue, Aug 28, 2018 at 8:56 AM, Menion  wrote:
> [sudo] password for menion:
> ID  gen top level   path
> --  --- -   
> 257 600627  5   /@
> 258 600626  5   /@home
> 296 599489  5
> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
> 297 599489  5
> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
> 298 599489  5
> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30
>
> So, there are snapshots, right?

Yep. So you can use 'sudo btrfs fi du -s ' to get a
report on how much exclusive space is being used by each of those
snapshots, and I'll bet it all adds up to about 10G or whatever you're
missing.
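As a rough mental model of what "exclusive" means here: an extent's bytes count as exclusive to a subvolume only if no other subvolume references that extent. A hedged sketch follows; the `(extent_id, size)` data shape is invented for illustration and is not btrfs's real accounting:

```python
from collections import Counter

def exclusive_bytes(snapshots):
    """Per subvolume, sum the sizes of extents no other subvolume shares.
    `snapshots` maps a subvolume name to a set of (extent_id, size) pairs."""
    refs = Counter(ext for exts in snapshots.values() for ext in exts)
    return {name: sum(size for (eid, size) in exts if refs[(eid, size)] == 1)
            for name, exts in snapshots.items()}

# Two subvolumes sharing one 100-byte extent, each holding one private extent:
demo = {"@": {(1, 100), (2, 50)}, "@snapshot": {(1, 100), (3, 70)}}
```

Deleting `@snapshot` in this model frees only its 70 exclusive bytes; the shared 100-byte extent stays referenced by `@`.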

> The time stamp is from when I launched
> do-release-upgrade, but it didn't ask anything about snapshots, nor
> did I ask for them.

Yep, not sure what's creating them or what the cleanup policy is (if
there is one). So it's worth asking in an Ubuntu forum what these
snapshots are, where they came from, and what cleans them up so you
don't run out of space, or otherwise how to configure it if you want
more space just because.

I mean, it's a neat idea. But also it needs to clean up after itself
if for no other reason than to avoid user confusion :-)


> If it is confirmed, how can I remove the unwanted snapshots, keeping
> the current "visible" filesystem contents?
> Sorry, I am still learning BTRFS and I would like to avoid mistakes
> Bye

You can definitely use Btrfs specific tools to get rid of the
snapshots and not piss off Btrfs at all. However, if you delete them
behind the back of the thing that created them in the first place, it
might get pissed off if they just suddenly go missing. Sometimes those
tools want to do the cleanups themselves because they're tracking the
snapshots and their purposes. So if the snapshots just go away, it's
like having the rug pulled out from under them.

Anyway:

'sudo btrfs sub del ' will delete it.


Also, I can't tell you for sure how much write amplification Btrfs
contributes in your use case on eMMC compared to F2FS. Btrfs has a
"wandering trees" problem that F2FS doesn't have to the same degree.
It's probably not a big deal on other kinds of SSDs like SATA/SAS
and NVMe. But on eMMC? If it were an SD card I'd say you can keep using
Btrfs, and maybe mitigate the wandering trees with compression to
reduce overall writes. But if your eMMC is soldered onto a board, I
might consider F2FS instead, and Btrfs for other things.


-- 
Chris Murphy


Re: DRDY errors are not consistent with scrub results

2018-08-28 Thread Chris Murphy
On Tue, Aug 28, 2018 at 5:04 PM, Cerem Cem ASLAN  wrote:
> What I want to achieve is to add the problematic disk as
> raid1 and see how/when it fails and how BTRFS recovers from these failures.
> While the party goes on, the main system shouldn't be interrupted
> since this is a production system. For example, I would never expect
> to end up in such a readonly state while trying to add a disk
> with "unknown health" to the system. Was it somewhat expected?

I don't know. I also can't tell you how LVM or mdraid behave in the
same situation either though. For sure I've come across bug reports
where underlying devices go read only and the file system falls over
totally and developers shrug and say they can't do anything.

This situation is a little different and difficult. You're starting
out with a one drive setup so the profile is single/DUP or
single/single, and that doesn't change when adding. So the 2nd drive
is actually *mandatory* for a brief period of time before you've made
it raid1 or higher. It's a developer question what is the design, and
if this is a bug: maybe the device being added should be written to
with placeholder supers or even just zeros in all the places for 'dev
add' metadata, and only if that succeeds, to then write real updated
supers to all devices. It's possible the 'dev add' presently writes
updated supers to all devices at the same time, and has a brief period
where the state is fragile and if it fails, it goes read only to
prevent damaging the file system.

Anyway, without a call trace, no idea why it ended up read only. So I
have to speculate.


>
> Although we know that the disk is about to fail, it still survives.

That's a very tenuous rationalization: a drive that rejects even a
single write is considered failed by the md driver. Btrfs is still
very tolerant of this, so if the device had been added successfully and you were
running in production, you should expect to see thousands of write
errors dumped to the kernel log because Btrfs never ejects a bad drive
still. It keeps trying. And keeps reporting the failures. And all
those errors being logged can end up causing more write demand if the
logs are on the same volume as the failing device, even more errors to
record, and you get an escalating situation with heavy log writing.


> Shouldn't we expect, in such a scenario, that when the system tries to read
> or write some data from/to that BROKEN_DISK and recognizes that it
> failed, it will try to recover that part of the data from GOOD_DISK and
> try to store the recovered data in some other part of the
> BROKEN_DISK?

Nope. Btrfs can only write supers to fixed locations on the drive,
same as any other file system. Btrfs metadata could possibly go
elsewhere because it doesn't have fixed locations, but Btrfs doesn't
do bad sector tracking. So once it decides metadata goes in location
X, if X reports a write error it will not try to write elsewhere, and
insofar as I'm aware ext4, XFS, LVM, and md don't either; md does
have an optional bad block map it will use for tracking bad sectors
and remap to known good sectors. Normally the drive firmware should do
this, and when that fails the drive is considered toast for production
purposes.

> Or did I misunderstand the whole thing?

Well in a way this is sorta user sabotage. It's a valid test and I'd
say ideally things should fail safely, rather than fall over. But at
the same time it's not wrong for developers to say: "look if you add a
bad device there's a decent chance we're going face plant and go read
only to avoid causing worse problems, so next time you should qualify
the drive before putting it into production."

I'm willing to bet all the other file system devs would say something
like that even if Btrfs devs think something better could happen, it's
probably not a super high priority.




-- 
Chris Murphy


Re: DRDY errors are not consistent with scrub results

2018-08-28 Thread Cerem Cem ASLAN
What I want to achieve is to add the problematic disk as
raid1 and see how/when it fails and how BTRFS recovers from these failures.
While the party goes on, the main system shouldn't be interrupted
since this is a production system. For example, I would never expect
to end up in such a readonly state while trying to add a disk
with "unknown health" to the system. Was it somewhat expected?

Although we know that the disk is about to fail, it still survives.
Shouldn't we expect, in such a scenario, that when the system tries to read
or write some data from/to that BROKEN_DISK and recognizes that it
failed, it will try to recover that part of the data from GOOD_DISK and
try to store the recovered data in some other part of the
BROKEN_DISK? Or did I misunderstand the whole thing?
Chris Murphy wrote the following on Wed, 29 Aug 2018 at 00:07:
>
> On Tue, Aug 28, 2018 at 12:50 PM, Cerem Cem ASLAN  
> wrote:
> > I've successfully moved everything to another disk. (The only hard
> > part was configuring the kernel parameters, as my root partition was
> > on LVM which is on LUKS partition. Here are the notes, if anyone
> > needs: 
> > https://github.com/ceremcem/smith-sync/blob/master/create-bootable-backup.md)
> >
> > Now I'm looking for trouble :) I tried to convert my new system (booted
> > with new disk) into raid1 coupled with the problematic old disk. To do
> > so, I issued:
> >
> > sudo btrfs device add /dev/mapper/master-root /mnt/peynir/
> > /dev/mapper/master-root appears to contain an existing filesystem (btrfs).
> > ERROR: use the -f option to force overwrite of /dev/mapper/master-root
> > aea@aea3:/mnt$ sudo btrfs device add /dev/mapper/master-root /mnt/peynir/ -f
> > ERROR: error adding device '/dev/mapper/master-root': Input/output error
> > aea@aea3:/mnt$ sudo btrfs device add /dev/mapper/master-root /mnt/peynir/
> > sudo: unable to open /var/lib/sudo/ts/aea: Read-only file system
> >
> > Now I ended up with a readonly file system. Isn't it possible to add a
> > device to a running system?
>
> Yes.
>
> The problem is the 2nd error message:
>
> ERROR: error adding device '/dev/mapper/master-root': Input/output error
>
> So you need to look in dmesg to see what Btrfs kernel messages
> occurred at that time. I'm gonna guess it's a failed write. You have a
> few of those in the smartctl log output. Any time a write failure
> happens, the operation is always fatal regardless of the file system.
>
>
>
> --
> Chris Murphy


WARNING in clone_finish_inode_update while de-duplicating with bedup

2018-08-28 Thread Sam Tygier

Hello,

I get the following WARNING while de-duplicating with bedup 0.10.1. I am
running Debian with a backports kernel:
Linux lithium 4.17.0-0.bpo.1-amd64 #1 SMP Debian 4.17.8-1~bpo9+1 
(2018-07-23) x86_64 GNU/Linux


I ran:
sudo bedup scan /media/btrfs/
sudo bedup dedupe /media/btrfs/

/media/btrfs has 4 drives in btrfs raid1

Aug 19 03:32:39 lithium kernel: BTRFS: Transaction aborted (error -28)
Aug 19 03:32:39 lithium kernel: WARNING: CPU: 2 PID: 4204 at 
/build/linux-hvYKKE/linux-4.17.8/fs/btrfs/ioctl.c:3249 
clone_finish_inode_update+0xf3/0x140 [btrfs]
Aug 19 03:32:39 lithium kernel: Modules linked in: fuse ufs qnx4 hfsplus 
hfs minix ntfs vfat msdos fat jfs xfs dm_mod xt_multiport iptable_filter 
iTCO_wdt iTCO_vendor_support ppdev evdev intel_powerclamp squashfs 
ir_rc6_decoder pcspkr serio_raw sg rc_rc6_mce lpc_ich shpchp fintek_cir 
parport_pc rc_core parport video button f71882fg lm78 hwmon_vid coretemp 
nfsd auth_rpcgss nfs_acl loop lockd grace sunrpc ip_tables x_tables 
autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd 
glue_helper aes_x86_64 btrfs xor zstd_decompress zstd_compress xxhash 
raid6_pq libcrc32c crc32c_generic sd_mod ahci libahci libata i2c_i801 
psmouse scsi_mod uhci_hcd ehci_pci ehci_hcd e1000e usbcore usb_common 
thermal
Aug 19 03:32:39 lithium kernel: CPU: 2 PID: 4204 Comm: bedup Not tainted 
4.17.0-0.bpo.1-amd64 #1 Debian 4.17.8-1~bpo9+1

Aug 19 03:32:39 lithium kernel: Hardware name:  /, BIOS 4.6.5 12/11/2012
Aug 19 03:32:39 lithium kernel: RIP: 
0010:clone_finish_inode_update+0xf3/0x140 [btrfs]

Aug 19 03:32:39 lithium kernel: RSP: 0018:a2e44931fc38 EFLAGS: 00010282
Aug 19 03:32:39 lithium kernel: RAX:  RBX: 
ffe4 RCX: 0006
Aug 19 03:32:39 lithium kernel: RDX: 0007 RSI: 
0086 RDI: 93571fd16730
Aug 19 03:32:39 lithium kernel: RBP: a2e44931fc68 R08: 
0001 R09: 0492
Aug 19 03:32:39 lithium kernel: R10: 935687ff6ea8 R11: 
0492 R12: 93561add38f0
Aug 19 03:32:39 lithium kernel: R13: 0438 R14: 
93552159d288 R15: 935603ef4a10
Aug 19 03:32:39 lithium kernel: FS:  7fb3dffe6700() 
GS:93571fd0() knlGS:
Aug 19 03:32:39 lithium kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Aug 19 03:32:39 lithium kernel: CR2: 7fb3d87d5024 CR3: 
000170bcc000 CR4: 06e0

Aug 19 03:32:39 lithium kernel: Call Trace:
Aug 19 03:32:39 lithium kernel:  btrfs_clone+0x938/0x10e0 [btrfs]
Aug 19 03:32:39 lithium kernel:  btrfs_clone_files+0x16f/0x370 [btrfs]
Aug 19 03:32:39 lithium kernel:  vfs_clone_file_range+0x120/0x200
Aug 19 03:32:39 lithium kernel:  ioctl_file_clone+0x9f/0x100
Aug 19 03:32:39 lithium kernel:  ? __vma_rb_erase+0x11a/0x230
Aug 19 03:32:39 lithium kernel:  do_vfs_ioctl+0x341/0x620
Aug 19 03:32:39 lithium kernel:  ? do_munmap+0x34a/0x460
Aug 19 03:32:39 lithium kernel:  ksys_ioctl+0x70/0x80
Aug 19 03:32:39 lithium kernel:  __x64_sys_ioctl+0x16/0x20
Aug 19 03:32:39 lithium kernel:  do_syscall_64+0x55/0x110
Aug 19 03:32:39 lithium kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 19 03:32:39 lithium kernel: RIP: 0033:0x7fb3deda9dd7
Aug 19 03:32:39 lithium kernel: RSP: 002b:7fffd9e39e48 EFLAGS: 
0246 ORIG_RAX: 0010
Aug 19 03:32:39 lithium kernel: RAX: ffda RBX: 
40049409 RCX: 7fb3deda9dd7
Aug 19 03:32:39 lithium kernel: RDX: 0014 RSI: 
40049409 RDI: 0016
Aug 19 03:32:39 lithium kernel: RBP: 557b716730e0 R08: 
 R09: 7fffd9e39c20
Aug 19 03:32:39 lithium kernel: R10: 0100 R11: 
0246 R12: 7fffd9e39e70
Aug 19 03:32:39 lithium kernel: R13: 557b7189f4c0 R14: 
0016 R15: 0001
Aug 19 03:32:39 lithium kernel: Code: 89 c7 e9 67 ff ff ff 49 8b 44 24 
50 f0 48 0f ba a8 30 17 00 00 02 72 15 83 fb fb 74 3b 89 de 48 c7 c7 78 
1f 70 c0 e8 3d a5 5d f9 <0f> 0b 89 d9 4c 89 e7 ba b1 0c 00 00 48 c7 c6 
50 62 6f c0 e8 cf

Aug 19 03:32:39 lithium kernel: ---[ end trace d8e04102b2b7c95a ]---
Aug 19 03:32:39 lithium kernel: BTRFS: error (device sda2) in 
clone_finish_inode_update:3249: errno=-28 No space left

Aug 19 03:32:39 lithium kernel: BTRFS info (device sda2): forced readonly
Aug 19 03:32:39 lithium kernel: BTRFS error (device sda2): pending csums 
is 275275776
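The -28 in both the transaction abort and the final error line is a negated errno value, the same ENOSPC seen in the '--init-csum-tree' thread earlier in this digest; decoding it:

```python
import errno
import os

# Kernel functions return negative errno values; -28 decodes to ENOSPC.
code = -28
assert errno.errorcode[-code] == "ENOSPC"
print(os.strerror(-code))  # "No space left on device" on Linux
```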


Bedup crashed, I guess because the WARNING forced the filesystem readonly.

Deduplicated:
- '/media/btrfs/foo1/a/b/c/bar.mkv'
- '/media/btrfs/foo2/a/b/c/d/e/bar.mkv'
03:27:43 Size group 64/17378 (1344397706) sampled 158 hashed 150 freed 
355814871095

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/bedup/tracking.py", line 
609, in dedup_tracked1

    dedup_fileset(ds, fileset, fd_names, fd_inodes, size)
  File "/usr/local/lib/python3.5/dist-packages/bedup/tracking.py", line 
632, in dedup_fileset

    deduped = clone_data(dest=dfd, src=sfd, 

Re: DRDY errors are not consistent with scrub results

2018-08-28 Thread Chris Murphy
On Tue, Aug 28, 2018 at 12:50 PM, Cerem Cem ASLAN  wrote:
> I've successfully moved everything to another disk. (The only hard
> part was configuring the kernel parameters, as my root partition was
> on LVM which is on LUKS partition. Here are the notes, if anyone
> needs: 
> https://github.com/ceremcem/smith-sync/blob/master/create-bootable-backup.md)
>
> Now I'm looking for trouble :) I tried to convert my new system (booted
> with new disk) into raid1 coupled with the problematic old disk. To do
> so, I issued:
>
> sudo btrfs device add /dev/mapper/master-root /mnt/peynir/
> /dev/mapper/master-root appears to contain an existing filesystem (btrfs).
> ERROR: use the -f option to force overwrite of /dev/mapper/master-root
> aea@aea3:/mnt$ sudo btrfs device add /dev/mapper/master-root /mnt/peynir/ -f
> ERROR: error adding device '/dev/mapper/master-root': Input/output error
> aea@aea3:/mnt$ sudo btrfs device add /dev/mapper/master-root /mnt/peynir/
> sudo: unable to open /var/lib/sudo/ts/aea: Read-only file system
>
> Now I ended up with a readonly file system. Isn't it possible to add a
> device to a running system?

Yes.

The problem is the 2nd error message:

ERROR: error adding device '/dev/mapper/master-root': Input/output error

So you need to look in dmesg to see what Btrfs kernel messages
occurred at that time. I'm gonna guess it's a failed write. You have a
few of those in the smartctl log output. Any time a write failure
happens, the operation is always fatal regardless of the file system.



-- 
Chris Murphy


Re: Strange behavior (possible bugs) in btrfs

2018-08-28 Thread Jayashree Mohan
Hi Filipe,

This is to follow up the status of crash consistency bugs we reported
on btrfs. We see that there has been a patch(not in the kernel yet)
(https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg77875.html)
that resolves one of the reported bugs. However, the other bugs we
reported still exist on the latest kernel (4.19-rc1), even with the
submitted patch. Here is the list of other inconsistencies we
reported, along with the workload to reproduce them :
https://www.spinics.net/lists/linux-btrfs/msg77219.html

We just wanted to ensure that resolving these are on your to-do list.
Additionally, if there are more patches queued to address these
issues, please let us know.

Thanks,
Jayashree Mohan




On Fri, May 11, 2018 at 10:45 AM Filipe Manana  wrote:
>
> On Mon, Apr 30, 2018 at 5:04 PM, Vijay Chidambaram  wrote:
> > Hi,
> >
> > We found two more cases where the btrfs behavior is a little strange.
> > In one case, an fsync-ed file goes missing after a crash. In the
> > other, a renamed file shows up in both directories after a crash.
> >
> > Workload 1:
> >
> > mkdir A
> > mkdir B
> > mkdir A/C
> > creat B/foo
> > fsync B/foo
> > link B/foo A/C/foo
> > fsync A
> > -- crash --
> >
> > Expected state after recovery:
> > B B/foo A A/C exist
> >
> > What we find:
> > Only B B/foo exist
> >
> > A is lost even after explicit fsync to A.
> >
> > Workload 2:
> >
> > mkdir A
> > mkdir A/C
> > rename A/C B
> > touch B/bar
> > fsync B/bar
> > rename B/bar A/bar
> > rename A B (replacing B with A at this point)
> > fsync B/bar
> > -- crash --
> >
> > Expected contents after recovery:
> > A/bar
> >
> > What we find after recovery:
> > A/bar
> > B/bar
> >
> > We think this breaks rename's atomicity guarantee. bar should be
> > present in either A or B, but now it is present in both.
>
> I'll take a look at these, and all the other potential issues you
> reported in other threads, next week and let you know.
> Thanks.
>
> >
> > Thanks,
> > Vijay
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”
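For reference, the two workloads quoted above can be replayed as a plain shell script. This is a sketch assuming GNU coreutils >= 8.24, whose `sync FILE` issues an fsync on that file or directory; the crash step itself cannot be triggered from userspace, which is why the reporters used a dedicated crash-consistency harness for that part.

```shell
set -e
dir=$(mktemp -d)

# Workload 1: hard link into A/C, then fsync the directory A.
( cd "$dir" && mkdir w1 && cd w1
  mkdir A B A/C
  : > B/foo           # creat B/foo
  sync B/foo          # fsync B/foo
  ln B/foo A/C/foo    # link B/foo A/C/foo
  sync A              # fsync A
)                     # -- crash would happen here --

# Workload 2: move bar out of B, then rename A over the now-empty B.
( cd "$dir" && mkdir w2 && cd w2
  mkdir -p A/C
  mv -T A/C B         # rename A/C B
  : > B/bar           # touch B/bar
  sync B/bar          # fsync B/bar
  mv B/bar A/bar      # rename B/bar A/bar
  mv -T A B           # rename A B (replaces the empty B)
  sync B/bar          # fsync B/bar
)                     # -- crash would happen here --

# Without a crash, the live state matches the expected post-recovery state:
[ -f "$dir/w1/A/C/foo" ] && echo "w1: A, A/C/foo, B/foo all present"
[ -f "$dir/w2/B/bar" ] && [ ! -e "$dir/w2/A" ] && echo "w2: only B/bar present"
rm -rf "$dir"
```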


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Austin S. Hemmelgarn

On 2018-08-28 15:14, Menion wrote:

You are correct, indeed in order to cleanup you need

1) someone realize that snapshots have been created
2) apt-btrfs-snapshot is manually installed on the system
Your second requirement is only needed if you want the nice automated 
cleanup.  There's absolutely nothing preventing you from manually 
removing the snapshots.


Assuming also that the snapshots created during do-release-upgrade are 
managed for auto cleanup


On Tuesday, August 28, 2018, Noah Massey wrote:


On Tue, Aug 28, 2018 at 1:25 PM Menion <men...@gmail.com> wrote:
 >
 > Ok, I have removed the snapshot and the free expected space is
here, thank you!
 > As a side note: apt-btrfs-snapshot was not installed, but it is
 > present in Ubuntu repository and I have used it (and I like the idea
 > of automatic snapshot during upgrade)
 > This means that the do-release-upgrade does its own job on BTRFS,
 > silently, which I believe is not good from the usability perspective,

You are correct. DistUpgradeController.py from python3-distupgrade
imports 'apt_btrfs_snapshot', which I read as coming from
/usr/lib/python3/dist-packages/apt_btrfs_snapshot.py, supplied by
apt-btrfs-snapshot, but I missed the fact that python3-distupgrade
ships its own
/usr/lib/python3/dist-packages/DistUpgrade/apt_btrfs_snapshot.py

So now it looks like that cannot be easily disabled, and without the
apt-btrfs-snapshot package scheduling cleanups it's not ever
automatically removed?

 > just google it, there is no mention of this behaviour
 > On Tue, Aug 28, 2018 at 7:07 PM Austin S. Hemmelgarn
 > <ahferro...@gmail.com> wrote:
 > >
 > > On 2018-08-28 12:05, Noah Massey wrote:
 > > > On Tue, Aug 28, 2018 at 11:47 AM Austin S. Hemmelgarn
 > > > <ahferro...@gmail.com> wrote:
 > > >>
 > > >> On 2018-08-28 11:27, Noah Massey wrote:
 > > >>> On Tue, Aug 28, 2018 at 10:59 AM Menion <men...@gmail.com> wrote:
 > > 
 > >  [sudo] password for menion:
 > >  ID      gen     top level       path
 > >  --      ---     -       
 > >  257     600627  5               /@
 > >  258     600626  5               /@home
 > >  296     599489  5
 > > 
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
 > >  297     599489  5
 > > 
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
 > >  298     599489  5
 > > 
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30
 > > 
 > >  So, there are snapshots, right? The time stamp is when I
have launched
 > >  do-release-upgrade, but it didn't ask anything about
snapshot, neither
 > >  I asked for it.
 > > >>>
 > > >>> This is an Ubuntu thing
 > > >>> `apt show apt-btrfs-snapshot`
 > > >>> which "will create a btrfs snapshot of the root filesystem
each time
 > > >>> that apt installs/removes/upgrades a software package."
 > > >> Not Ubuntu, Debian.  It's just that Ubuntu installs and
configures the
 > > >> package by default, while Debian does not.
 > > >
 > > > Ubuntu also maintains the package, and I did not find it in
Debian repositories.
 > > > I think it's also worth mentioning that these snapshots were
created
 > > > by the do-release-upgrade script using the package directly,
not as a
 > > > result of the apt configuration. Meaning if you do not want a
snapshot
 > > > taken prior to upgrade, you have to remove the apt-btrfs-snapshot
 > > > package prior to running the upgrade script. You cannot just
update
 > > > /etc/apt/apt.conf.d/80-btrfs-snapshot
 > > Hmm... I could have sworn that it was in the Debian repositories.
 > >
 > > That said, it's kind of stupid that the snapshot is not trivially
 > > optional for a release upgrade.  Yes, that's where it's
arguably the
 > > most important, but it's still kind of stupid to have to remove a
 > > package to get rid of that behavior and then reinstall it again
afterwards.





Re: DRDY errors are not consistent with scrub results

2018-08-28 Thread Cerem Cem ASLAN
I've successfully moved everything to another disk. (The only hard
part was configuring the kernel parameters, as my root partition was
on LVM which is on LUKS partition. Here are the notes, if anyone
needs: 
https://github.com/ceremcem/smith-sync/blob/master/create-bootable-backup.md)

Now I'm looking for trouble :) I tried to convert my new system (booted
with new disk) into raid1 coupled with the problematic old disk. To do
so, I issued:

sudo btrfs device add /dev/mapper/master-root /mnt/peynir/
/dev/mapper/master-root appears to contain an existing filesystem (btrfs).
ERROR: use the -f option to force overwrite of /dev/mapper/master-root
aea@aea3:/mnt$ sudo btrfs device add /dev/mapper/master-root /mnt/peynir/ -f
ERROR: error adding device '/dev/mapper/master-root': Input/output error
aea@aea3:/mnt$ sudo btrfs device add /dev/mapper/master-root /mnt/peynir/
sudo: unable to open /var/lib/sudo/ts/aea: Read-only file system

Now I ended up with a readonly file system. Isn't it possible to add a
device to a running system?

On Tue, Aug 28, 2018 at 04:08, Chris Murphy wrote:
>
> On Mon, Aug 27, 2018 at 6:49 PM, Cerem Cem ASLAN  
> wrote:
> > Thanks for your guidance, I'll get the device replaced first thing in
> > the morning.
> >
> > Here is balance results which I think resulted not too bad:
> >
> > sudo btrfs balance start /mnt/peynir/
> > WARNING:
> >
> > Full balance without filters requested. This operation is very
> > intense and takes potentially very long. It is recommended to
> > use the balance filters to narrow down the balanced data.
> > Use 'btrfs balance start --full-balance' option to skip this
> > warning. The operation will start in 10 seconds.
> > Use Ctrl-C to stop it.
> > 10 9 8 7 6 5 4 3 2 1
> > Starting balance without any filters.
> > Done, had to relocate 18 out of 18 chunks
> >
> > I suppose this means I've not lost any data, but I'm doubtful due
> > to the previous `smartctl ...` results.
>
>
> OK so nothing fatal anyway. We'd have to see any kernel messages that
> appeared during the balance to see if there were read or write errors,
> but presumably any failure means the balance fails so... might get you
> by for a while actually.
>
>
>
>
>
>
>
> --
> Chris Murphy


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Noah Massey
On Tue, Aug 28, 2018 at 1:25 PM Menion  wrote:
>
> Ok, I have removed the snapshot and the free expected space is here, thank 
> you!
> As a side note: apt-btrfs-snapshot was not installed, but it is
> present in Ubuntu repository and I have used it (and I like the idea
> of automatic snapshot during upgrade)
> This means that the do-release-upgrade does its own job on BTRFS,
> silently, which I believe is not good from the usability perspective,

You are correct. DistUpgradeController.py from python3-distupgrade
imports 'apt_btrfs_snapshot', which I read as coming from
/usr/lib/python3/dist-packages/apt_btrfs_snapshot.py, supplied by
apt-btrfs-snapshot, but I missed the fact that python3-distupgrade
ships its own /usr/lib/python3/dist-packages/DistUpgrade/apt_btrfs_snapshot.py

So now it looks like that cannot be easily disabled, and without the
apt-btrfs-snapshot package scheduling cleanups it's not ever
automatically removed?

> just google it, there is no mention of this behaviour
> On Tue, Aug 28, 2018 at 7:07 PM Austin S. Hemmelgarn
>  wrote:
> >
> > On 2018-08-28 12:05, Noah Massey wrote:
> > > On Tue, Aug 28, 2018 at 11:47 AM Austin S. Hemmelgarn
> > >  wrote:
> > >>
> > >> On 2018-08-28 11:27, Noah Massey wrote:
> > >>> On Tue, Aug 28, 2018 at 10:59 AM Menion  wrote:
> > 
> >  [sudo] password for menion:
> >  ID  gen top level   path
> >  --  --- -   
> >  257 600627  5   /@
> >  258 600626  5   /@home
> >  296 599489  5
> >  /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
> >  297 599489  5
> >  /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
> >  298 599489  5
> >  /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30
> > 
> >  So, there are snapshots, right? The time stamp is when I have launched
> >  do-release-upgrade, but it didn't ask anything about snapshot, neither
> >  I asked for it.
> > >>>
> > >>> This is an Ubuntu thing
> > >>> `apt show apt-btrfs-snapshot`
> > >>> which "will create a btrfs snapshot of the root filesystem each time
> > >>> that apt installs/removes/upgrades a software package."
> > >> Not Ubuntu, Debian.  It's just that Ubuntu installs and configures the
> > >> package by default, while Debian does not.
> > >
> > > Ubuntu also maintains the package, and I did not find it in Debian 
> > > repositories.
> > > I think it's also worth mentioning that these snapshots were created
> > > by the do-release-upgrade script using the package directly, not as a
> > > result of the apt configuration. Meaning if you do not want a snapshot
> > > taken prior to upgrade, you have to remove the apt-btrfs-snapshot
> > > package prior to running the upgrade script. You cannot just update
> > > /etc/apt/apt.conf.d/80-btrfs-snapshot
> > Hmm... I could have sworn that it was in the Debian repositories.
> >
> > That said, it's kind of stupid that the snapshot is not trivially
> > optional for a release upgrade.  Yes, that's where it's arguably the
> > most important, but it's still kind of stupid to have to remove a
> > package to get rid of that behavior and then reinstall it again afterwards.


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Menion
Ok, I have removed the snapshot and the free expected space is here, thank you!
As a side note: apt-btrfs-snapshot was not installed, but it is
present in Ubuntu repository and I have used it (and I like the idea
of automatic snapshot during upgrade)
This means that the do-release-upgrade does its own job on BTRFS,
silently, which I believe is not good from the usability perspective,
just google it, there is no mention of this behaviour
On Tue, Aug 28, 2018 at 7:07 PM Austin S. Hemmelgarn
 wrote:
>
> On 2018-08-28 12:05, Noah Massey wrote:
> > On Tue, Aug 28, 2018 at 11:47 AM Austin S. Hemmelgarn
> >  wrote:
> >>
> >> On 2018-08-28 11:27, Noah Massey wrote:
> >>> On Tue, Aug 28, 2018 at 10:59 AM Menion  wrote:
> 
>  [sudo] password for menion:
>  ID  gen top level   path
>  --  --- -   
>  257 600627  5   /@
>  258 600626  5   /@home
>  296 599489  5
>  /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
>  297 599489  5
>  /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
>  298 599489  5
>  /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30
> 
>  So, there are snapshots, right? The time stamp is when I have launched
>  do-release-upgrade, but it didn't ask anything about snapshot, neither
>  I asked for it.
> >>>
> >>> This is an Ubuntu thing
> >>> `apt show apt-btrfs-snapshot`
> >>> which "will create a btrfs snapshot of the root filesystem each time
> >>> that apt installs/removes/upgrades a software package."
> >> Not Ubuntu, Debian.  It's just that Ubuntu installs and configures the
> >> package by default, while Debian does not.
> >
> > Ubuntu also maintains the package, and I did not find it in Debian 
> > repositories.
> > I think it's also worth mentioning that these snapshots were created
> > by the do-release-upgrade script using the package directly, not as a
> > result of the apt configuration. Meaning if you do not want a snapshot
> > taken prior to upgrade, you have to remove the apt-btrfs-snapshot
> > package prior to running the upgrade script. You cannot just update
> > /etc/apt/apt.conf.d/80-btrfs-snapshot
> Hmm... I could have sworn that it was in the Debian repositories.
>
> That said, it's kind of stupid that the snapshot is not trivially
> optional for a release upgrade.  Yes, that's where it's arguably the
> most important, but it's still kind of stupid to have to remove a
> package to get rid of that behavior and then reinstall it again afterwards.


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Austin S. Hemmelgarn

On 2018-08-28 12:05, Noah Massey wrote:

On Tue, Aug 28, 2018 at 11:47 AM Austin S. Hemmelgarn
 wrote:


On 2018-08-28 11:27, Noah Massey wrote:

On Tue, Aug 28, 2018 at 10:59 AM Menion  wrote:


[sudo] password for menion:
ID  gen top level   path
--  --- -   
257 600627  5   /@
258 600626  5   /@home
296 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
297 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
298 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30

So, there are snapshots, right? The time stamp is when I have launched
do-release-upgrade, but it didn't ask anything about snapshot, neither
I asked for it.


This is an Ubuntu thing
`apt show apt-btrfs-snapshot`
which "will create a btrfs snapshot of the root filesystem each time
that apt installs/removes/upgrades a software package."

Not Ubuntu, Debian.  It's just that Ubuntu installs and configures the
package by default, while Debian does not.


Ubuntu also maintains the package, and I did not find it in Debian repositories.
I think it's also worth mentioning that these snapshots were created
by the do-release-upgrade script using the package directly, not as a
result of the apt configuration. Meaning if you do not want a snapshot
taken prior to upgrade, you have to remove the apt-btrfs-snapshot
package prior to running the upgrade script. You cannot just update
/etc/apt/apt.conf.d/80-btrfs-snapshot

Hmm... I could have sworn that it was in the Debian repositories.

That said, it's kind of stupid that the snapshot is not trivially 
optional for a release upgrade.  Yes, that's where it's arguably the 
most important, but it's still kind of stupid to have to remove a 
package to get rid of that behavior and then reinstall it again afterwards.


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Noah Massey
On Tue, Aug 28, 2018 at 11:47 AM Austin S. Hemmelgarn
 wrote:
>
> On 2018-08-28 11:27, Noah Massey wrote:
> > On Tue, Aug 28, 2018 at 10:59 AM Menion  wrote:
> >>
> >> [sudo] password for menion:
> >> ID  gen top level   path
> >> --  --- -   
> >> 257 600627  5   /@
> >> 258 600626  5   /@home
> >> 296 599489  5
> >> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
> >> 297 599489  5
> >> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
> >> 298 599489  5
> >> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30
> >>
> >> So, there are snapshots, right? The time stamp is when I have launched
> >> do-release-upgrade, but it didn't ask anything about snapshot, neither
> >> I asked for it.
> >
> > This is an Ubuntu thing
> > `apt show apt-btrfs-snapshot`
> > which "will create a btrfs snapshot of the root filesystem each time
> > that apt installs/removes/upgrades a software package."
> Not Ubuntu, Debian.  It's just that Ubuntu installs and configures the
> package by default, while Debian does not.

Ubuntu also maintains the package, and I did not find it in Debian repositories.
I think it's also worth mentioning that these snapshots were created
by the do-release-upgrade script using the package directly, not as a
result of the apt configuration. Meaning if you do not want a snapshot
taken prior to upgrade, you have to remove the apt-btrfs-snapshot
package prior to running the upgrade script. You cannot just update
/etc/apt/apt.conf.d/80-btrfs-snapshot

>
> This behavior in general is not specific to Debian either, a lot of
> distributions are either working on or already have this type of
> functionality, because it's the only sane and correct way to handle
> updates short of rebuilding the entire system from scratch.

Yup. Everyone in their own way, plus all the home-brews.


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Austin S. Hemmelgarn

On 2018-08-28 11:27, Noah Massey wrote:

On Tue, Aug 28, 2018 at 10:59 AM Menion  wrote:


[sudo] password for menion:
ID  gen top level   path
--  --- -   
257 600627  5   /@
258 600626  5   /@home
296 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
297 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
298 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30

So, there are snapshots, right? The time stamp is when I have launched
do-release-upgrade, but it didn't ask anything about snapshot, neither
I asked for it.


This is an Ubuntu thing
`apt show apt-btrfs-snapshot`
which "will create a btrfs snapshot of the root filesystem each time
that apt installs/removes/upgrades a software package."
Not Ubuntu, Debian.  It's just that Ubuntu installs and configures the 
package by default, while Debian does not.


This behavior in general is not specific to Debian either, a lot of 
distributions are either working on or already have this type of 
functionality, because it's the only sane and correct way to handle 
updates short of rebuilding the entire system from scratch.



During the do-release-upgrade I got some issues due to the (very) bad
behaviour of the script in remote terminal, then I have fixed
everything manually and now the filesystem is operational in bionic
version
If it is confirmed, how can I remove the unwanted snapshot, keeping
the current "visible" filesystem contents


By default, the package runs a weekly cron job to clean up old
snapshots. (Defaults to 90d, but you can configure that in
APT::Snapshots::MaxAge.) Alternatively, you can clean up with the
command yourself. Run `sudo apt-btrfs-snapshot list`, and then `sudo
apt-btrfs-snapshot delete `


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Noah Massey
On Tue, Aug 28, 2018 at 10:59 AM Menion  wrote:
>
> [sudo] password for menion:
> ID  gen top level   path
> --  --- -   
> 257 600627  5   /@
> 258 600626  5   /@home
> 296 599489  5
> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
> 297 599489  5
> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
> 298 599489  5
> /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30
>
> So, there are snapshots, right? The time stamp is when I have launched
> do-release-upgrade, but it didn't ask anything about snapshot, neither
> I asked for it.

This is an Ubuntu thing
`apt show apt-btrfs-snapshot`
which "will create a btrfs snapshot of the root filesystem each time
that apt installs/removes/upgrades a software package."

> During the do-release-upgrade I got some issues due to the (very) bad
> behaviour of the script in remote terminal, then I have fixed
> everything manually and now the filesystem is operational in bionic
> version
> If it is confirmed, how can I remove the unwanted snapshot, keeping
> the current "visible" filesystem contents

By default, the package runs a weekly cron job to clean up old
snapshots. (Defaults to 90d, but you can configure that in
APT::Snapshots::MaxAge.) Alternatively, you can clean up with the
command yourself. Run `sudo apt-btrfs-snapshot list`, and then `sudo
apt-btrfs-snapshot delete `

~ Noah
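If you'd rather script the cleanup, the snapshot names can be pulled straight out of a `btrfs subvolume list`-style listing. This is a sketch: the sample rows mirror the listing earlier in the thread, and on a live system you would pipe `sudo btrfs subvolume list /` into the same awk instead.

```shell
# Print only the @apt-snapshot-* subvolume paths, one per line,
# e.g. to feed each one to `apt-btrfs-snapshot delete`.
awk '/@apt-snapshot/ {print $NF}' <<'EOF'
257 600627 5 /@
258 600626 5 /@home
296 599489 5 /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
297 599489 5 /@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
EOF
```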


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Menion
[sudo] password for menion:
ID  gen top level   path
--  --- -   
257 600627  5   /@
258 600626  5   /@home
296 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:29:55
297 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:30:08
298 599489  5
/@apt-snapshot-release-upgrade-bionic-2018-08-27_15:33:30

So, there are snapshots, right? The time stamps are from when I
launched do-release-upgrade, but it didn't ask anything about
snapshots, nor did I ask for them.
During the do-release-upgrade I had some issues due to the (very) bad
behaviour of the script in a remote terminal, but I fixed everything
manually and the filesystem is now operational on the bionic release.
If this is confirmed, how can I remove the unwanted snapshots while
keeping the current "visible" filesystem contents?
Sorry, I am still learning BTRFS and I would like to avoid mistakes
Bye
On Tue, Aug 28, 2018 at 3:47 PM Chris Murphy
 wrote:
>
> On Tue, Aug 28, 2018 at 3:34 AM, Menion  wrote:
> > Hi all
> > I have run a distro upgrade on my Ubuntu 16.04 that runs ppa kernel
> > 4.17.2 with btrfsprogs 4.17.0
> > The root filesystem is BTRFS single created by the Ubuntu Xenial
> > installer (so on kernel 4.4.0) on an internal mmc, located in
> > /dev/mmcblk0p3
> > After the upgrade I have cleaned apt cache and checked the free space,
> > the results were odd, following some checks (shrinked), followed by
> > more comments:
>
> Do you know if you're using Timeshift? I'm not sure if it's enabled by
> default on Ubuntu when using Btrfs, but you may have snapshots.
>
> 'sudo btrfs sub list -at /'
>
> That should show all subvolumes (includes snapshots).
>
>
>
> > [48479.254106] BTRFS info (device mmcblk0p3): 17 enospc errors during 
> > balance
>
> Probably soft enospc errors it was able to work around.
>
>
> --
> Chris Murphy


Re: Scrub aborts due to corrupt leaf

2018-08-28 Thread Chris Murphy
On Tue, Aug 28, 2018 at 7:42 AM, Qu Wenruo  wrote:
>
>
> On 2018/8/28 下午9:29, Larkin Lowrey wrote:
>> On 8/27/2018 10:12 PM, Larkin Lowrey wrote:
>>> On 8/27/2018 12:46 AM, Qu Wenruo wrote:

> The system uses ECC memory and edac-util has not reported any errors.
> However, I will run a memtest anyway.
 So it should not be the memory problem.

 BTW, what's the current generation of the fs?

 # btrfs inspect dump-super  | grep generation

 The corrupted leaf has generation 2862; I'm not sure how recently the
 corruption happened.
>>>
>>> generation  358392
>>> chunk_root_generation   357256
>>> cache_generation358392
>>> uuid_tree_generation358392
>>> dev_item.generation 0
>>>
>>> I don't recall the last time I ran a scrub but I doubt it has been
>>> more than a year.
>>>
>>> I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
>>> everything up.
>>
>> No such luck:
>>
>> Creating a new CRC tree
>> Checking filesystem on /dev/Cached/Backups
>> UUID: acff5096-1128-4b24-a15e-4ba04261edc3
>> Reinitialize checksum tree
>> csum result is 0 for block 2412149436416
>> extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28
>
> It's ENOSPC, meaning btrfs can't find enough space for the new csum tree
> blocks.

Seems bogus, there's >4TiB unallocated.

>Label: none  uuid: acff5096-1128-4b24-a15e-4ba04261edc3
>Total devices 1 FS bytes used 66.61TiB
>devid1 size 72.77TiB used 68.03TiB path /dev/mapper/Cached-Backups
>
>Data, single: total=67.80TiB, used=66.52TiB
>System, DUP: total=40.00MiB, used=7.41MiB
>Metadata, DUP: total=98.50GiB, used=95.21GiB
>GlobalReserve, single: total=512.00MiB, used=0.00B

Even if all metadata is only csum tree, and ~200GiB needs to be
written, there's plenty of free space for it.
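The estimate is easy to sanity-check with quick arithmetic, assuming the default crc32c checksums (4 bytes per 4 KiB data block) and DUP metadata keeping two copies of every tree block:

```shell
# csum tree size ~= data_size / 1024; DUP doubles the on-disk cost.
awk 'BEGIN {
    data_gib = 66.52 * 1024            # 66.52 TiB of data, in GiB
    csum_gib = data_gib * 4 / 4096     # one 4-byte csum per 4 KiB block
    printf "csum tree: ~%.1f GiB, ~%.1f GiB with DUP\n", csum_gib, 2 * csum_gib
}'
```

That lands well under the ~197 GiB of DUP metadata chunks already allocated (2 x 98.50 GiB), which is why the ENOSPC from `--init-csum-tree` looks bogus.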



-- 
Chris Murphy


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Chris Murphy
On Tue, Aug 28, 2018 at 3:34 AM, Menion  wrote:
> Hi all
> I have run a distro upgrade on my Ubuntu 16.04 that runs ppa kernel
> 4.17.2 with btrfsprogs 4.17.0
> The root filesystem is BTRFS single created by the Ubuntu Xenial
> installer (so on kernel 4.4.0) on an internal mmc, located in
> /dev/mmcblk0p3
> After the upgrade I have cleaned apt cache and checked the free space,
> the results were odd, following some checks (shrinked), followed by
> more comments:

Do you know if you're using Timeshift? I'm not sure if it's enabled by
default on Ubuntu when using Btrfs, but you may have snapshots.

'sudo btrfs sub list -at /'

That should show all subvolumes (includes snapshots).



> [48479.254106] BTRFS info (device mmcblk0p3): 17 enospc errors during balance

Probably soft enospc errors it was able to work around.


-- 
Chris Murphy


Re: Scrub aborts due to corrupt leaf

2018-08-28 Thread Qu Wenruo


On 2018/8/28 下午9:29, Larkin Lowrey wrote:
> On 8/27/2018 10:12 PM, Larkin Lowrey wrote:
>> On 8/27/2018 12:46 AM, Qu Wenruo wrote:
>>>
 The system uses ECC memory and edac-util has not reported any errors.
 However, I will run a memtest anyway.
>>> So it should not be the memory problem.
>>>
>>> BTW, what's the current generation of the fs?
>>>
>>> # btrfs inspect dump-super  | grep generation
>>>
>>> The corrupted leaf has generation 2862; I'm not sure how recently the
>>> corruption happened.
>>
>> generation  358392
>> chunk_root_generation   357256
>> cache_generation    358392
>> uuid_tree_generation    358392
>> dev_item.generation 0
>>
>> I don't recall the last time I ran a scrub but I doubt it has been
>> more than a year.
>>
>> I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
>> everything up.
> 
> No such luck:
> 
> Creating a new CRC tree
> Checking filesystem on /dev/Cached/Backups
> UUID: acff5096-1128-4b24-a15e-4ba04261edc3
> Reinitialize checksum tree
> csum result is 0 for block 2412149436416
> extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28

It's ENOSPC, meaning btrfs can't find enough space for the new csum tree
blocks.

I could try to enhance the behavior, changing the current approach to
delete the old tree blocks first and then refill.

But this needs some extra time to implement.

BTW, from the line number, it's not the latest btrfs-progs.

Thanks,
Qu

> btrfs(+0x1da16)[0x55cc43796a16]
> btrfs(btrfs_alloc_free_block+0x207)[0x55cc4379c177]
> btrfs(+0x1602f)[0x55cc4378f02f]
> btrfs(btrfs_search_slot+0xed2)[0x55cc43790be2]
> btrfs(btrfs_csum_file_block+0x48f)[0x55cc437a213f]
> btrfs(+0x55cef)[0x55cc437cecef]
> btrfs(cmd_check+0xd49)[0x55cc437ddbc9]
> btrfs(main+0x81)[0x55cc4378b4d1]
> /lib64/libc.so.6(__libc_start_main+0xeb)[0x7f4717e6324b]
> btrfs(_start+0x2a)[0x55cc4378b5ea]
> Aborted (core dumped)
> 
> --Larkin





Re: Scrub aborts due to corrupt leaf

2018-08-28 Thread Larkin Lowrey

On 8/27/2018 10:12 PM, Larkin Lowrey wrote:

On 8/27/2018 12:46 AM, Qu Wenruo wrote:



The system uses ECC memory and edac-util has not reported any errors.
However, I will run a memtest anyway.

So it should not be the memory problem.

BTW, what's the current generation of the fs?

# btrfs inspect dump-super  | grep generation

The corrupted leaf has generation 2862; I'm not sure how recently the
corruption happened.


generation  358392
chunk_root_generation   357256
cache_generation    358392
uuid_tree_generation    358392
dev_item.generation 0

I don't recall the last time I ran a scrub but I doubt it has been 
more than a year.


I am running 'btrfs check --init-csum-tree' now. Hopefully that clears 
everything up.


No such luck:

Creating a new CRC tree
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Reinitialize checksum tree
csum result is 0 for block 2412149436416
extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28
btrfs(+0x1da16)[0x55cc43796a16]
btrfs(btrfs_alloc_free_block+0x207)[0x55cc4379c177]
btrfs(+0x1602f)[0x55cc4378f02f]
btrfs(btrfs_search_slot+0xed2)[0x55cc43790be2]
btrfs(btrfs_csum_file_block+0x48f)[0x55cc437a213f]
btrfs(+0x55cef)[0x55cc437cecef]
btrfs(cmd_check+0xd49)[0x55cc437ddbc9]
btrfs(main+0x81)[0x55cc4378b4d1]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x7f4717e6324b]
btrfs(_start+0x2a)[0x55cc4378b5ea]
Aborted (core dumped)

--Larkin


Re: 14Gb of space lost after distro upgrade on BTFS root partition (long thread with logs)

2018-08-28 Thread Qu Wenruo


On 2018/8/28 下午9:07, Menion wrote:
> Ok, thanks for your reply
> This is a root FS, how can I defragment it?

If it's a rootfs, then it's a little strange.

Normally the package manager should overwrite the whole file during a
package update transaction, so extent booking doesn't happen as
frequently as in other workloads.

One solution is booting from other device, and try defrag.

BTW, are there any snapshots in the fs?

One way to determine that is with btrfs ins dump-tree:

# btrfs ins dump-tree -t root 

The above command can be executed on a mounted device, and the root
tree dump doesn't contain confidential info except subvolume names.

If there are no extra subvolumes at all, then trying to defrag may make sense.

Thanks,
Qu

> If I try to launch it I get this output:
> 
> menion@Menionubuntu:~$ sudo btrfs filesystem defragment -r /
> ERROR: defrag failed on /bin/bash: Text file busy
> ERROR: defrag failed on /bin/dash: Text file busy

Re: 14Gb of space lost after distro upgrade on BTRFS root partition (long thread with logs)

2018-08-28 Thread Menion
OK, thanks for your reply.
This is a root FS, how can I defragment it?
If I try to launch it I get this output:

menion@Menionubuntu:~$ sudo btrfs filesystem defragment -r /
ERROR: defrag failed on /bin/bash: Text file busy
ERROR: defrag failed on /bin/dash: Text file busy
ERROR: defrag failed on /bin/btrfs: Text file busy
ERROR: defrag failed on /lib/systemd/systemd: Text file busy
ERROR: defrag failed on /lib/systemd/systemd-journald: Text file busy
ERROR: defrag failed on /lib/systemd/systemd-logind: Text file busy
ERROR: defrag failed on /lib/systemd/systemd-resolved: Text file busy
ERROR: defrag failed on /lib/systemd/systemd-timesyncd: Text file busy
ERROR: defrag failed on /lib/systemd/systemd-udevd: Text file busy
ERROR: defrag failed on /lib/x86_64-linux-gnu/ld-2.27.so: Text file busy

Bye
On Tue, Aug 28, 2018 at 13:54, Qu Wenruo wrote:
>
>
>
> On 2018/8/28 下午5:34, Menion wrote:
> > Hi all
> > I have run a distro upgrade on my Ubuntu 16.04 that runs ppa kernel
> > 4.17.2 with btrfsprogs 4.17.0
> > The root filesystem is BTRFS single created by the Ubuntu Xenial
> > installer (so on kernel 4.4.0) on an internal mmc, located in
> > /dev/mmcblk0p3
> > After the upgrade I have cleaned apt cache and checked the free space,
> > the results were odd, following some checks (shrinked), followed by
> > more comments:
> >
> > root@Menionubuntu:/home/menion# df -h
> > Filesystem  Size  Used Avail Use% Mounted on
> > ...
> > /dev/mmcblk0p3   28G   24G  2.7G  90% /
> >
> > root@Menionubuntu:/home/menion# btrfs fi usage /usr
> > Overall:
> > Device size:  27.07GiB
> > Device allocated: 25.28GiB
> > Device unallocated:1.79GiB
> > Device missing:  0.00B
> > Used: 23.88GiB
> > Free (estimated):  2.69GiB  (min: 2.69GiB)
> > Data ratio:   1.00
> > Metadata ratio:   1.00
> > Global reserve:   72.94MiB  (used: 0.00B)
> >
> > Data,single: Size:24.00GiB, Used:23.10GiB
> >/dev/mmcblk0p3 24.00GiB
> >
> > Metadata,single: Size:1.25GiB, Used:801.97MiB
> >/dev/mmcblk0p3  1.25GiB
> >
> > System,single: Size:32.00MiB, Used:16.00KiB
> >/dev/mmcblk0p3 32.00MiB
> >
> > Unallocated:
> >/dev/mmcblk0p3  1.79GiB
> >
> > root@Menionubuntu:/home/menion# btrfs fi df /mnt
> > Data, single: total=24.00GiB, used=23.10GiB
> > System, single: total=32.00MiB, used=16.00KiB
> > Metadata, single: total=1.25GiB, used=801.92MiB
> > GlobalReserve, single: total=72.89MiB, used=0.00B
> >
> > The different ways to check the free space are coherent, but if I
> > check the directories usage on root, surprise:
> >
> > root@Menionubuntu:/home/menion# du -x -s -h /*
> > 17M /bin
> > 189M/boot
> > 36K /dead.letter
> > 0   /dev
> > 18M /etc
> > 6.1G/home
> > 4.0K/initrd.img
> > 4.0K/initrd.img.old
> > 791M/lib
> > 8.3M/lib64
> > 0   /media
> > 4.0K/mnt
> > 0   /opt
> > du: cannot access '/proc/24660/task/24660/fd/3': No such file or directory
> > du: cannot access '/proc/24660/task/24660/fdinfo/3': No such file or 
> > directory
> > du: cannot access '/proc/24660/fd/3': No such file or directory
> > du: cannot access '/proc/24660/fdinfo/3': No such file or directory
> > 0   /proc
> > 2.9M/root
> > 2.9M/run
> > 17M /sbin
> > 4.0K/snap
> > 0   /srv
> > 0   /sys
> > 0   /tmp
> > 6.1G/usr
> > 2.0G/var
> > 4.0K/vmlinuz
> > 4.0K/vmlinuz.old
> > 4.0K/webmin-setup.out
> >
> > The computed usage is 15Gb which is what I expected, so there are 9Gb
> > lost somewhere.
> > I have run scrub and then full balance with:
>
> I think this is related to btrfs CoW and extent booking.
>
> One simple example would be:
>
> xfs_io -f -c "pwrite 0 128k" -c "sync" -c "pwrite 0 64K" \
> /mnt/btrfs/file1
>
> The resulting "/mnt/btrfs/file1" will show a size of only 128K in du,
> but its on-disk usage is 128K + 64K.
>
> The first 128K is the data written by the first "pwrite" command; it
> created a full 128K extent on disk.
> The 2nd pwrite command then created a new 64K extent, which is the
> default data CoW behavior.
> The first half of the original 128K extent is no longer referenced by
> anyone, but it still takes up space.
>
> The above btrfs extent booking behavior can cause a lot of wasted space
> even when there is only a single subvolume without any snapshots.
>
> In that case, defrag rather than balance should be your friend to free
> up some space.
>
> Thanks,
> Qu
>
> >
> > btrfs scrub start /
> > btrfs balance start /
> > The balance freed 100Mb of space, it was running in background so I
> > have checked dmesg when "btrfs balance status" said that was completed
> >
> > dmesg of balance:
> >
> > [47264.250141] BTRFS info (device mmcblk0p3): relocating block group
> > 37154193408 

Re: 14Gb of space lost after distro upgrade on BTRFS root partition (long thread with logs)

2018-08-28 Thread Qu Wenruo


On 2018/8/28 下午5:34, Menion wrote:
> Hi all
> I have run a distro upgrade on my Ubuntu 16.04 that runs ppa kernel
> 4.17.2 with btrfsprogs 4.17.0
> The root filesystem is BTRFS single created by the Ubuntu Xenial
> installer (so on kernel 4.4.0) on an internal mmc, located in
> /dev/mmcblk0p3
> After the upgrade I have cleaned apt cache and checked the free space,
> the results were odd, following some checks (shrinked), followed by
> more comments:
> 
> root@Menionubuntu:/home/menion# df -h
> Filesystem  Size  Used Avail Use% Mounted on
> ...
> /dev/mmcblk0p3   28G   24G  2.7G  90% /
> 
> root@Menionubuntu:/home/menion# btrfs fi usage /usr
> Overall:
> Device size:  27.07GiB
> Device allocated: 25.28GiB
> Device unallocated:1.79GiB
> Device missing:  0.00B
> Used: 23.88GiB
> Free (estimated):  2.69GiB  (min: 2.69GiB)
> Data ratio:   1.00
> Metadata ratio:   1.00
> Global reserve:   72.94MiB  (used: 0.00B)
> 
> Data,single: Size:24.00GiB, Used:23.10GiB
>/dev/mmcblk0p3 24.00GiB
> 
> Metadata,single: Size:1.25GiB, Used:801.97MiB
>/dev/mmcblk0p3  1.25GiB
> 
> System,single: Size:32.00MiB, Used:16.00KiB
>/dev/mmcblk0p3 32.00MiB
> 
> Unallocated:
>/dev/mmcblk0p3  1.79GiB
> 
> root@Menionubuntu:/home/menion# btrfs fi df /mnt
> Data, single: total=24.00GiB, used=23.10GiB
> System, single: total=32.00MiB, used=16.00KiB
> Metadata, single: total=1.25GiB, used=801.92MiB
> GlobalReserve, single: total=72.89MiB, used=0.00B
> 
> The different ways to check the free space are coherent, but if I
> check the directories usage on root, surprise:
> 
> root@Menionubuntu:/home/menion# du -x -s -h /*
> 17M /bin
> 189M/boot
> 36K /dead.letter
> 0   /dev
> 18M /etc
> 6.1G/home
> 4.0K/initrd.img
> 4.0K/initrd.img.old
> 791M/lib
> 8.3M/lib64
> 0   /media
> 4.0K/mnt
> 0   /opt
> du: cannot access '/proc/24660/task/24660/fd/3': No such file or directory
> du: cannot access '/proc/24660/task/24660/fdinfo/3': No such file or directory
> du: cannot access '/proc/24660/fd/3': No such file or directory
> du: cannot access '/proc/24660/fdinfo/3': No such file or directory
> 0   /proc
> 2.9M/root
> 2.9M/run
> 17M /sbin
> 4.0K/snap
> 0   /srv
> 0   /sys
> 0   /tmp
> 6.1G/usr
> 2.0G/var
> 4.0K/vmlinuz
> 4.0K/vmlinuz.old
> 4.0K/webmin-setup.out
> 
> The computed usage is 15Gb which is what I expected, so there are 9Gb
> lost somewhere.
> I have run scrub and then full balance with:

I think this is related to btrfs CoW and extent booking.

One simple example would be:

xfs_io -f -c "pwrite 0 128k" -c "sync" -c "pwrite 0 64K" \
/mnt/btrfs/file1

The resulting "/mnt/btrfs/file1" will show a size of only 128K in du,
but its on-disk usage is 128K + 64K.

The first 128K is the data written by the first "pwrite" command; it
created a full 128K extent on disk.
The 2nd pwrite command then created a new 64K extent, which is the
default data CoW behavior.
The first half of the original 128K extent is no longer referenced by
anyone, but it still takes up space.

The above btrfs extent booking behavior can cause a lot of wasted space
even when there is only a single subvolume without any snapshots.

In that case, defrag rather than balance should be your friend to free
up some space.

Thanks,
Qu

> 
> btrfs scrub start /
> btrfs balance start /
> The balance freed 100Mb of space, it was running in background so I
> have checked dmesg when "btrfs balance status" said that was completed
> 
> dmesg of balance:
> 
> [47264.250141] BTRFS info (device mmcblk0p3): relocating block group
> 37154193408 flags system
> [47264.592082] BTRFS info (device mmcblk0p3): relocating block group
> 36046897152 flags data
> [47271.499809] BTRFS info (device mmcblk0p3): found 73 extents
> [47272.329921] BTRFS info (device mmcblk0p3): found 60 extents
> [47272.471059] BTRFS info (device mmcblk0p3): relocating block group
> 35778461696 flags metadata
> [47280.530041] BTRFS info (device mmcblk0p3): found 3199 extents
> [47280.735667] BTRFS info (device mmcblk0p3): relocating block group
> 34704719872 flags data
> [47301.460523] BTRFS info (device mmcblk0p3): relocating block group
> 37221302272 flags data
> [47306.038404] BTRFS info (device mmcblk0p3): found 5 extents
> [47306.481371] BTRFS info (device mmcblk0p3): found 5 extents
> [47306.673135] BTRFS info (device mmcblk0p3): relocating block group
> 37187747840 flags system
> [47306.874874] BTRFS info (device mmcblk0p3): found 1 extents
> [47307.073288] BTRFS info (device mmcblk0p3): relocating block group
> 34704719872 flags data
> [47371.059074] BTRFS info (device mmcblk0p3): found 16258 extents
> [47388.191208] BTRFS info (device mmcblk0p3): found 

Re: [PATCH] btrfs: extent-tree: Detect bytes_may_use underflow earlier

2018-08-28 Thread Qu Wenruo



On 2018/8/28 下午7:48, Nikolay Borisov wrote:
> 
> 
> On 28.08.2018 14:46, Qu Wenruo wrote:
>>
>>
>> On 2018/8/28 下午4:48, Nikolay Borisov wrote:
>>>
>>>
>>> On 28.08.2018 09:46, Qu Wenruo wrote:
 Although we have space_info::bytes_may_use underflow detection in
 btrfs_free_reserved_data_space_noquota(), we have more callers who are
 subtracting number from space_info::bytes_may_use.

 So instead of doing underflow detection for every caller, introduce a
 new wrapper update_bytes_may_use() to replace open coded bytes_may_use
 modifiers.

 This also introduce a macro to declare more wrappers, but currently
 space_info::bytes_may_use is the mostly interesting one.

 Signed-off-by: Qu Wenruo 
>>>
>>> The more important question is why underflows happen in the first place?
>>> And this explanation is missing from the changelog.
>>
>> As in current mainline/misc-next kernel, there is no (at least obvious)
>> underflow.
>>
>> So there is no explanation for the non-exist underflow.
>>
>>>
>>> As far as I can see this underflow seems to only affect the data
>>> space_info, since the metadata one is only modified in
>>> __reserve_metadata_bytes and due to overcommit it's generally higher
>>> than what is actually being used so it seems unlikely it can underflow.
>>
>> Yep, for data space info.
>>
>>> IMO this is also useful information to put in the commit message.
>>
>> For the full explanation, it's related to this patch:
>> [PATCH v2] btrfs: Always check nocow for quota enabled case to make sure
>> we won't reserve unnecessary data space
>>
>> And the reason why I'm digging into bytes_may_use underflow and how
>> above patch is causing problem can be found in the commit message of
>> this patch:
>> [PATCH RFC] btrfs: clone: Flush data before doing clone
>>
>> The short summary is:
>> --
>> Due to the limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
>> can still fall back to CoW even only (unrelated) part of the
>> preallocated extent is shared.
>>
>> This makes the follow case to do unnecessary CoW:
>>
>>  # xfs_io -f -c "falloc 0 2M" $mnt/file
>>  # xfs_io -c "pwrite 0 1M" $mnt/file
>>  # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
>>  # sync
>>
>> The pwrite will still be CoWed, since at writeback time, the
>> preallocated extent is already shared, btrfs_cross_ref_exist() will
>> return 1 and make run_delalloc_nocow() fall back to cow_file_range().
>> -
>>
>> Combined these 2 pieces, it would be:
>> If we do early nocow detection at __btrfs_buffered_write() time, and at
>> delalloc time btrfs decides to fall back to CoW, then we could underflow
>> data space bytes_may_use.
>>
>> So it's a little hard to explain in such early detection patch.
>>
>> I'm pretty happy to add extra explanation into the commit message, but
>> I'm all ears for any advice on what should be put into the commit message.
> 
> So it seems this patch on its own doesn't make much sense it needs to be
> coupled with the other one in one series.

I'm OK with combining it into one series, but this patch also makes
sense on its own, acting as an extra safety net or validation code to
detect underflow earlier (even though we don't have an underflow yet).

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
 ---
  fs/btrfs/extent-tree.c | 44 +++---
  1 file changed, 28 insertions(+), 16 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index de6f75f5547b..10b58f231350 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -51,6 +51,21 @@ enum {
CHUNK_ALLOC_FORCE = 2,
  };
  
 +/* Helper function to detect various space info bytes underflow */
 +#define DECLARE_SPACE_INFO_UPDATE(name)   
 \
 +static inline void update_##name(struct btrfs_space_info *sinfo,  \
 +   s64 bytes) \
 +{ \
 +  if (bytes < 0 && sinfo->name < -bytes) {\
 +  WARN_ON(1); \
 +  sinfo->name = 0;\
 +  return; \
 +  }   \
 +  sinfo->name += bytes;   \
 +}
 +
 +DECLARE_SPACE_INFO_UPDATE(bytes_may_use);
 +
  static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
   struct btrfs_delayed_ref_node *node, u64 parent,
   u64 root_objectid, u64 owner_objectid,
 @@ -4221,7 +4236,7 @@ int btrfs_alloc_data_chunk_ondemand(struct 
 btrfs_inode *inode, u64 bytes)
  data_sinfo->flags, bytes, 1);
return 

Re: [PATCH] btrfs: extent-tree: Detect bytes_may_use underflow earlier

2018-08-28 Thread Nikolay Borisov



On 28.08.2018 14:46, Qu Wenruo wrote:
> 
> 
> On 2018/8/28 下午4:48, Nikolay Borisov wrote:
>>
>>
>> On 28.08.2018 09:46, Qu Wenruo wrote:
>>> Although we have space_info::bytes_may_use underflow detection in
>>> btrfs_free_reserved_data_space_noquota(), we have more callers who are
>>> subtracting number from space_info::bytes_may_use.
>>>
>>> So instead of doing underflow detection for every caller, introduce a
>>> new wrapper update_bytes_may_use() to replace open coded bytes_may_use
>>> modifiers.
>>>
>>> This also introduce a macro to declare more wrappers, but currently
>>> space_info::bytes_may_use is the mostly interesting one.
>>>
>>> Signed-off-by: Qu Wenruo 
>>
>> The more important question is why underflows happen in the first place?
>> And this explanation is missing from the changelog.
> 
> As in current mainline/misc-next kernel, there is no (at least obvious)
> underflow.
> 
> So there is no explanation for the non-exist underflow.
> 
>>
>> As far as I can see this underflow seems to only affect the data
>> space_info, since the metadata one is only modified in
>> __reserve_metadata_bytes and due to overcommit it's generally higher
>> than what is actually being used so it seems unlikely it can underflow.
> 
> Yep, for data space info.
> 
>> IMO this is also useful information to put in the commit message.
> 
> For the full explanation, it's related to this patch:
> [PATCH v2] btrfs: Always check nocow for quota enabled case to make sure
> we won't reserve unnecessary data space
> 
> And the reason why I'm digging into bytes_may_use underflow and how
> above patch is causing problem can be found in the commit message of
> this patch:
> [PATCH RFC] btrfs: clone: Flush data before doing clone
> 
> The short summary is:
> --
> Due to the limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
> can still fall back to CoW even only (unrelated) part of the
> preallocated extent is shared.
> 
> This makes the follow case to do unnecessary CoW:
> 
>  # xfs_io -f -c "falloc 0 2M" $mnt/file
>  # xfs_io -c "pwrite 0 1M" $mnt/file
>  # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
>  # sync
> 
> The pwrite will still be CoWed, since at writeback time, the
> preallocated extent is already shared, btrfs_cross_ref_exist() will
> return 1 and make run_delalloc_nocow() fall back to cow_file_range().
> -
> 
> Combined these 2 pieces, it would be:
> If we do early nocow detection at __btrfs_buffered_write() time, and at
> delalloc time btrfs decides to fall back to CoW, then we could underflow
> data space bytes_may_use.
> 
> So it's a little hard to explain in such early detection patch.
> 
> I'm pretty happy to add extra explanation into the commit message, but
> I'm all ears for any advice on what should be put into the commit message.

So it seems this patch on its own doesn't make much sense; it needs to
be coupled with the other one in a series.

> 
> Thanks,
> Qu
> 
>>
>>> ---
>>>  fs/btrfs/extent-tree.c | 44 +++---
>>>  1 file changed, 28 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index de6f75f5547b..10b58f231350 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -51,6 +51,21 @@ enum {
>>> CHUNK_ALLOC_FORCE = 2,
>>>  };
>>>  
>>> +/* Helper function to detect various space info bytes underflow */
>>> +#define DECLARE_SPACE_INFO_UPDATE(name)
>>> \
>>> +static inline void update_##name(struct btrfs_space_info *sinfo,   \
>>> +s64 bytes) \
>>> +{  \
>>> +   if (bytes < 0 && sinfo->name < -bytes) {\
>>> +   WARN_ON(1); \
>>> +   sinfo->name = 0;\
>>> +   return; \
>>> +   }   \
>>> +   sinfo->name += bytes;   \
>>> +}
>>> +
>>> +DECLARE_SPACE_INFO_UPDATE(bytes_may_use);
>>> +
>>>  static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>>struct btrfs_delayed_ref_node *node, u64 parent,
>>>u64 root_objectid, u64 owner_objectid,
>>> @@ -4221,7 +4236,7 @@ int btrfs_alloc_data_chunk_ondemand(struct 
>>> btrfs_inode *inode, u64 bytes)
>>>   data_sinfo->flags, bytes, 1);
>>> return -ENOSPC;
>>> }
>>> -   data_sinfo->bytes_may_use += bytes;
>>> +   update_bytes_may_use(data_sinfo, bytes);
>>> trace_btrfs_space_reservation(fs_info, "space_info",
>>>   data_sinfo->flags, bytes, 1);
>>> spin_unlock(_sinfo->lock);
>>> @@ -4274,10 +4289,7 @@ void 

Re: [PATCH] btrfs: extent-tree: Detect bytes_may_use underflow earlier

2018-08-28 Thread Qu Wenruo



On 2018/8/28 下午4:48, Nikolay Borisov wrote:
> 
> 
> On 28.08.2018 09:46, Qu Wenruo wrote:
>> Although we have space_info::bytes_may_use underflow detection in
>> btrfs_free_reserved_data_space_noquota(), we have more callers who are
>> subtracting number from space_info::bytes_may_use.
>>
>> So instead of doing underflow detection for every caller, introduce a
>> new wrapper update_bytes_may_use() to replace open coded bytes_may_use
>> modifiers.
>>
>> This also introduce a macro to declare more wrappers, but currently
>> space_info::bytes_may_use is the mostly interesting one.
>>
>> Signed-off-by: Qu Wenruo 
> 
> The more important question is why underflows happen in the first place?
> And this explanation is missing from the changelog.

As of the current mainline/misc-next kernel, there is no (at least no
obvious) underflow.

So there is no explanation for a non-existent underflow.

> 
> As far as I can see this underflow seems to only affect the data
> space_info, since the metadata one is only modified in
> __reserve_metadata_bytes and due to overcommit it's generally higher
> than what is actually being used so it seems unlikely it can underflow.

Yep, for data space info.

> IMO this is also useful information to put in the commit message.

For the full explanation, it's related to this patch:
[PATCH v2] btrfs: Always check nocow for quota enabled case to make sure
we won't reserve unnecessary data space

And the reason why I'm digging into bytes_may_use underflow and how
above patch is causing problem can be found in the commit message of
this patch:
[PATCH RFC] btrfs: clone: Flush data before doing clone

The short summary is:
--
Due to the limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
can still fall back to CoW even if only an (unrelated) part of the
preallocated extent is shared.

This makes the following case do unnecessary CoW:

 # xfs_io -f -c "falloc 0 2M" $mnt/file
 # xfs_io -c "pwrite 0 1M" $mnt/file
 # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
 # sync

The pwrite will still be CoWed: since at writeback time the
preallocated extent is already shared, btrfs_cross_ref_exist() will
return 1 and make run_delalloc_nocow() fall back to cow_file_range().
-

Combining these 2 pieces: if we do early nocow detection at
__btrfs_buffered_write() time, and at delalloc time btrfs decides to
fall back to CoW, then we could underflow the data space
bytes_may_use.

So it's a little hard to explain in such an early-detection patch.

I'm happy to add extra explanation to the commit message, but I'm all
ears for any advice on what should go into it.

Thanks,
Qu

> 
>> ---
>>  fs/btrfs/extent-tree.c | 44 +++---
>>  1 file changed, 28 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index de6f75f5547b..10b58f231350 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -51,6 +51,21 @@ enum {
>>  CHUNK_ALLOC_FORCE = 2,
>>  };
>>  
>> +/* Helper function to detect various space info bytes underflow */
>> +#define DECLARE_SPACE_INFO_UPDATE(name) 
>> \
>> +static inline void update_##name(struct btrfs_space_info *sinfo,\
>> + s64 bytes) \
>> +{   \
>> +if (bytes < 0 && sinfo->name < -bytes) {\
>> +WARN_ON(1); \
>> +sinfo->name = 0;\
>> +return; \
>> +}   \
>> +sinfo->name += bytes;   \
>> +}
>> +
>> +DECLARE_SPACE_INFO_UPDATE(bytes_may_use);
>> +
>>  static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>> struct btrfs_delayed_ref_node *node, u64 parent,
>> u64 root_objectid, u64 owner_objectid,
>> @@ -4221,7 +4236,7 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode 
>> *inode, u64 bytes)
>>data_sinfo->flags, bytes, 1);
>>  return -ENOSPC;
>>  }
>> -data_sinfo->bytes_may_use += bytes;
>> +update_bytes_may_use(data_sinfo, bytes);
>>  trace_btrfs_space_reservation(fs_info, "space_info",
>>data_sinfo->flags, bytes, 1);
>>  spin_unlock(_sinfo->lock);
>> @@ -4274,10 +4289,7 @@ void btrfs_free_reserved_data_space_noquota(struct 
>> inode *inode, u64 start,
>>  
>>  data_sinfo = fs_info->data_sinfo;
>>  spin_lock(_sinfo->lock);
>> -if (WARN_ON(data_sinfo->bytes_may_use < len))
>> -data_sinfo->bytes_may_use = 0;
>> -else
>> -data_sinfo->bytes_may_use -= len;
>> +

Re: corruption_errs

2018-08-28 Thread Austin S. Hemmelgarn

On 2018-08-27 18:53, John Petrini wrote:

Hi List,

I'm seeing corruption errors when running btrfs device stats but I'm
not sure what that means exactly. I've just completed a full scrub and
it reported no errors. I'm hoping someone here can enlighten me.
Thanks!


The first thing to understand here is that the error counters reported 
by `btrfs device stats` are cumulative.  In other words, they count 
errors since the last time they were reset (which means that if you've 
never run `btrfs device stats -z` on this filesystem, then they will 
count errors since the filesystem was created).  As a result, seeing a 
non-zero value there just means that errors of that type happened at 
some point in time since they were reset.


Building on this a bit further, corruption errors are checksum 
mismatches.  Each time a block is read and its checksum does not match 
the stored checksum for it, a corruption error is recorded.  However, 
if you are using a profile which can rebuild that block (dup, raid1, 
raid10, or one of the parity profiles), the error gets corrected 
automatically by the filesystem (it will attempt to rebuild that block, 
then write out the correct block).  If that fix succeeds, there will be 
no errors there anymore, but the record of the error stays around 
(because there _was_ an error).


Given this, my guess is that you _had_ checksum mismatches somewhere, 
but they were fixed before you ran scrub.


Re: BTRFS support per-subvolume compression, isn't it?

2018-08-28 Thread Austin S. Hemmelgarn

On 2018-08-27 17:05, Eugene Bright wrote:

Greetings!

BTRFS wiki says there is no per-subvolume compression option [1].

At the same time, the following command lets me set properties per subvolume:
 btrfs property set /volume compression zstd

The corresponding get command shows distinct properties for every subvolume.
Should the wiki be updated?


The wiki should be updated, but it's not technically wrong.

What the wiki is talking about is per-subvolume mount options to control 
compression (so, mounting individual subvolumes from the same volume 
with different `compress=` or `compress-force=` mount options), which is 
not currently supported.


You are correct, though, that properties can be used to achieve a similar 
result (compressing differently for different subvolumes).


14Gb of space lost after distro upgrade on BTRFS root partition (long thread with logs)

2018-08-28 Thread Menion
Hi all
I have run a distro upgrade on my Ubuntu 16.04, which runs the PPA
kernel 4.17.2 with btrfs-progs 4.17.0.
The root filesystem is BTRFS single, created by the Ubuntu Xenial
installer (so on kernel 4.4.0) on an internal mmc, located at
/dev/mmcblk0p3.
After the upgrade I cleaned the apt cache and checked the free space;
the results were odd. Below are some checks (trimmed), followed by
more comments:

root@Menionubuntu:/home/menion# df -h
Filesystem  Size  Used Avail Use% Mounted on
...
/dev/mmcblk0p3   28G   24G  2.7G  90% /

root@Menionubuntu:/home/menion# btrfs fi usage /usr
Overall:
    Device size:          27.07GiB
    Device allocated:     25.28GiB
    Device unallocated:    1.79GiB
    Device missing:          0.00B
    Used:                 23.88GiB
    Free (estimated):      2.69GiB  (min: 2.69GiB)
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:       72.94MiB  (used: 0.00B)

Data,single: Size:24.00GiB, Used:23.10GiB
   /dev/mmcblk0p3  24.00GiB

Metadata,single: Size:1.25GiB, Used:801.97MiB
   /dev/mmcblk0p3   1.25GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/mmcblk0p3  32.00MiB

Unallocated:
   /dev/mmcblk0p3   1.79GiB

root@Menionubuntu:/home/menion# btrfs fi df /mnt
Data, single: total=24.00GiB, used=23.10GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.25GiB, used=801.92MiB
GlobalReserve, single: total=72.89MiB, used=0.00B

The different ways to check the free space are coherent, but if I
check the directories usage on root, surprise:

root@Menionubuntu:/home/menion# du -x -s -h /*
17M     /bin
189M    /boot
36K     /dead.letter
0       /dev
18M     /etc
6.1G    /home
4.0K    /initrd.img
4.0K    /initrd.img.old
791M    /lib
8.3M    /lib64
0       /media
4.0K    /mnt
0       /opt
du: cannot access '/proc/24660/task/24660/fd/3': No such file or directory
du: cannot access '/proc/24660/task/24660/fdinfo/3': No such file or directory
du: cannot access '/proc/24660/fd/3': No such file or directory
du: cannot access '/proc/24660/fdinfo/3': No such file or directory
0       /proc
2.9M    /root
2.9M    /run
17M     /sbin
4.0K    /snap
0       /srv
0       /sys
0       /tmp
6.1G    /usr
2.0G    /var
4.0K    /vmlinuz
4.0K    /vmlinuz.old
4.0K    /webmin-setup.out

The computed usage is 15GiB, which is what I expected, so there are
9GiB lost somewhere.
I ran a scrub and then a full balance with:

btrfs scrub start /
btrfs balance start /

The balance freed 100MiB of space; it was running in the background,
so I checked dmesg once "btrfs balance status" said it was completed.

dmesg of balance:

[47264.250141] BTRFS info (device mmcblk0p3): relocating block group
37154193408 flags system
[47264.592082] BTRFS info (device mmcblk0p3): relocating block group
36046897152 flags data
[47271.499809] BTRFS info (device mmcblk0p3): found 73 extents
[47272.329921] BTRFS info (device mmcblk0p3): found 60 extents
[47272.471059] BTRFS info (device mmcblk0p3): relocating block group
35778461696 flags metadata
[47280.530041] BTRFS info (device mmcblk0p3): found 3199 extents
[47280.735667] BTRFS info (device mmcblk0p3): relocating block group
34704719872 flags data
[47301.460523] BTRFS info (device mmcblk0p3): relocating block group
37221302272 flags data
[47306.038404] BTRFS info (device mmcblk0p3): found 5 extents
[47306.481371] BTRFS info (device mmcblk0p3): found 5 extents
[47306.673135] BTRFS info (device mmcblk0p3): relocating block group
37187747840 flags system
[47306.874874] BTRFS info (device mmcblk0p3): found 1 extents
[47307.073288] BTRFS info (device mmcblk0p3): relocating block group
34704719872 flags data
[47371.059074] BTRFS info (device mmcblk0p3): found 16258 extents
[47388.191208] BTRFS info (device mmcblk0p3): found 16094 extents
[47388.985462] BTRFS info (device mmcblk0p3): relocating block group
31215058944 flags metadata
[47439.164167] BTRFS info (device mmcblk0p3): found 7378 extents
[47440.163793] BTRFS info (device mmcblk0p3): relocating block group
30141317120 flags data
[47593.239048] BTRFS info (device mmcblk0p3): found 15636 extents
[47618.389357] BTRFS info (device mmcblk0p3): found 15634 extents
[47620.020122] BTRFS info (device mmcblk0p3): relocating block group
29012000768 flags data
[47637.708444] BTRFS info (device mmcblk0p3): found 1154 extents
[47639.757342] BTRFS info (device mmcblk0p3): found 1154 extents
[47640.375483] BTRFS info (device mmcblk0p3): relocating block group
27938258944 flags data
[47743.312441] BTRFS info (device mmcblk0p3): found 17009 extents
[47756.928461] BTRFS info (device mmcblk0p3): found 17005 extents
[47757.607346] BTRFS info (device mmcblk0p3): relocating block group
9416212480 flags metadata
[47825.819449] BTRFS info (device mmcblk0p3): found 11503 extents
[47826.465926] BTRFS info (device mmcblk0p3): relocating 

Re: [PATCH] btrfs: extent-tree: Detect bytes_may_use underflow earlier

2018-08-28 Thread Nikolay Borisov



On 28.08.2018 09:46, Qu Wenruo wrote:
> Although we have space_info::bytes_may_use underflow detection in
> btrfs_free_reserved_data_space_noquota(), we have more callers who are
> subtracting number from space_info::bytes_may_use.
> 
> So instead of doing underflow detection for every caller, introduce a
> new wrapper update_bytes_may_use() to replace open coded bytes_may_use
> modifiers.
> 
> This also introduces a macro to declare more wrappers, but currently
> space_info::bytes_may_use is the most interesting one.
> 
> Signed-off-by: Qu Wenruo 

The more important question is why underflows happen in the first
place, and this explanation is missing from the changelog.

As far as I can see, this underflow seems to affect only the data
space_info, since the metadata one is only modified in
__reserve_metadata_bytes and, due to overcommit, is generally higher
than what is actually being used, so it seems unlikely it can
underflow.
IMO this is also useful information to put in the commit message.

> ---
>  fs/btrfs/extent-tree.c | 44 +++---
>  1 file changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index de6f75f5547b..10b58f231350 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -51,6 +51,21 @@ enum {
>   CHUNK_ALLOC_FORCE = 2,
>  };
>  
> +/* Helper function to detect various space info bytes underflow */
> +#define DECLARE_SPACE_INFO_UPDATE(name)				\
> +static inline void update_##name(struct btrfs_space_info *sinfo, \
> +  s64 bytes) \
> +{\
> + if (bytes < 0 && sinfo->name < -bytes) {\
> + WARN_ON(1); \
> + sinfo->name = 0;\
> + return; \
> + }   \
> + sinfo->name += bytes;   \
> +}
> +
> +DECLARE_SPACE_INFO_UPDATE(bytes_may_use);
> +
>  static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  struct btrfs_delayed_ref_node *node, u64 parent,
>  u64 root_objectid, u64 owner_objectid,
> @@ -4221,7 +4236,7 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
> data_sinfo->flags, bytes, 1);
>   return -ENOSPC;
>   }
> - data_sinfo->bytes_may_use += bytes;
> + update_bytes_may_use(data_sinfo, bytes);
>   trace_btrfs_space_reservation(fs_info, "space_info",
> data_sinfo->flags, bytes, 1);
>   spin_unlock(_sinfo->lock);
> @@ -4274,10 +4289,7 @@ void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
>  
>   data_sinfo = fs_info->data_sinfo;
>   spin_lock(_sinfo->lock);
> - if (WARN_ON(data_sinfo->bytes_may_use < len))
> - data_sinfo->bytes_may_use = 0;
> - else
> - data_sinfo->bytes_may_use -= len;
> + update_bytes_may_use(data_sinfo, -len);
>   trace_btrfs_space_reservation(fs_info, "space_info",
> data_sinfo->flags, len, 0);
>   spin_unlock(_sinfo->lock);
> @@ -5074,7 +5086,7 @@ static int wait_reserve_ticket(struct btrfs_fs_info *fs_info,
>   list_del_init(>list);
>   if (ticket->bytes && ticket->bytes < orig_bytes) {
>   u64 num_bytes = orig_bytes - ticket->bytes;
> - space_info->bytes_may_use -= num_bytes;
> + update_bytes_may_use(space_info, -num_bytes);
>   trace_btrfs_space_reservation(fs_info, "space_info",
> space_info->flags, num_bytes, 0);
>   }
> @@ -5120,13 +5132,13 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
>* If not things get more complicated.
>*/
>   if (used + orig_bytes <= space_info->total_bytes) {
> - space_info->bytes_may_use += orig_bytes;
> + update_bytes_may_use(space_info, orig_bytes);
>   trace_btrfs_space_reservation(fs_info, "space_info",
> space_info->flags, orig_bytes, 1);
>   ret = 0;
>   } else if (can_overcommit(fs_info, space_info, orig_bytes, flush,
> system_chunk)) {
> - space_info->bytes_may_use += orig_bytes;
> + update_bytes_may_use(space_info, orig_bytes);
>   trace_btrfs_space_reservation(fs_info, "space_info",
> space_info->flags, orig_bytes, 1);
>   ret = 0;
> @@ -5189,7 +5201,7 @@ static 

Re: [PATCH RFC] btrfs: clone: Flush data before doing clone

2018-08-28 Thread Qu Wenruo


On 2018/8/28 1:54 PM, Qu Wenruo wrote:
> Due to a limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
> can still fall back to CoW even if only an (unrelated) part of the
> preallocated extent is shared.
> 
> This makes the following case do unnecessary CoW:
> 
>  # xfs_io -f -c "falloc 0 2M" $mnt/file
>  # xfs_io -c "pwrite 0 1M" $mnt/file
>  # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
>  # sync
> 
> The pwrite will still be CoWed, since at writeback time the
> preallocated extent is already shared: btrfs_cross_ref_exist() will
> return 1 and make run_delalloc_nocow() fall back to cow_file_range().
> 
> This is definitely an overkill workaround, but it should be the
> simplest way that avoids further screwing up the already complex
> NOCOW routine.

Err, this is not even a working workaround.

It can still lead to a bytes_may_use underflow as long as
btrfs_cross_ref_exist() can return 1 for a partly shared prealloc
extent.

So please ignore this patch.

Thanks,
Qu

> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h |  1 +
>  fs/btrfs/file.c  |  4 ++--
>  fs/btrfs/ioctl.c | 21 +
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 53af9f5253f4..ddacc41ff124 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3228,6 +3228,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans,
>  struct btrfs_inode *inode);
>  int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info);
>  void btrfs_cleanup_defrag_inodes(struct btrfs_fs_info *fs_info);
> +int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end);
>  int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync);
>  void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
>int skip_pinned);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 2be00e873e92..118bfd019c6c 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1999,7 +1999,7 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
>   return 0;
>  }
>  
> -static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
> +int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
>  {
>   int ret;
>   struct blk_plug plug;
> @@ -2056,7 +2056,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>* multi-task, and make the performance up.  See
>* btrfs_wait_ordered_range for an explanation of the ASYNC check.
>*/
> - ret = start_ordered_ops(inode, start, end);
> + ret = btrfs_start_ordered_ops(inode, start, end);
>   if (ret)
>   goto out;
>  
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 63600dc2ac4c..866979f530bc 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4266,6 +4266,27 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
>   goto out_unlock;
>   }
>  
> + /*
> +  * btrfs_cross_ref_exist() only checks at the extent level, so
> +  * we could cause an unexpected NOCOW write to be CoWed.
> +  * E.g.:
> +  * falloc 0 2M file1
> +  * pwrite 0 1M file1 (at this point it should go NOCOW)
> +  * reflink src=file1 srcoff=1M dst=file1 dstoff=4M len=1M
> +  * sync
> +  *
> +  * In the above case, because the preallocated extent is shared,
> +  * the data at 0~1M can't go NOCOW.
> +  *
> +  * So flush the whole src inode to avoid any unneeded CoW.
> +  */
> + ret = btrfs_start_ordered_ops(src, 0, -1);
> + if (ret < 0)
> + goto out_unlock;
> + ret = btrfs_wait_ordered_range(src, 0, -1);
> + if (ret < 0)
> + goto out_unlock;
> +
>   /*
>* Lock the target range too. Right after we replace the file extent
>* items in the fs tree (which now point to the cloned data), we might
> 





[PATCH] btrfs: extent-tree: Detect bytes_may_use underflow earlier

2018-08-28 Thread Qu Wenruo
Although we have space_info::bytes_may_use underflow detection in
btrfs_free_reserved_data_space_noquota(), we have more callers who are
subtracting numbers from space_info::bytes_may_use.

So instead of doing underflow detection for every caller, introduce a
new wrapper, update_bytes_may_use(), to replace the open-coded
bytes_may_use modifiers.

This also introduces a macro to declare more wrappers, but currently
space_info::bytes_may_use is the most interesting one.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c | 44 +++---
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de6f75f5547b..10b58f231350 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -51,6 +51,21 @@ enum {
CHUNK_ALLOC_FORCE = 2,
 };
 
+/* Helper function to detect various space info bytes underflow */
+#define DECLARE_SPACE_INFO_UPDATE(name)				\
+static inline void update_##name(struct btrfs_space_info *sinfo,   \
+s64 bytes) \
+{  \
+   if (bytes < 0 && sinfo->name < -bytes) {\
+   WARN_ON(1); \
+   sinfo->name = 0;\
+   return; \
+   }   \
+   sinfo->name += bytes;   \
+}
+
+DECLARE_SPACE_INFO_UPDATE(bytes_may_use);
+
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
   struct btrfs_delayed_ref_node *node, u64 parent,
   u64 root_objectid, u64 owner_objectid,
@@ -4221,7 +4236,7 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
  data_sinfo->flags, bytes, 1);
return -ENOSPC;
}
-   data_sinfo->bytes_may_use += bytes;
+   update_bytes_may_use(data_sinfo, bytes);
trace_btrfs_space_reservation(fs_info, "space_info",
  data_sinfo->flags, bytes, 1);
spin_unlock(_sinfo->lock);
@@ -4274,10 +4289,7 @@ void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 
data_sinfo = fs_info->data_sinfo;
spin_lock(_sinfo->lock);
-   if (WARN_ON(data_sinfo->bytes_may_use < len))
-   data_sinfo->bytes_may_use = 0;
-   else
-   data_sinfo->bytes_may_use -= len;
+   update_bytes_may_use(data_sinfo, -len);
trace_btrfs_space_reservation(fs_info, "space_info",
  data_sinfo->flags, len, 0);
spin_unlock(_sinfo->lock);
@@ -5074,7 +5086,7 @@ static int wait_reserve_ticket(struct btrfs_fs_info *fs_info,
list_del_init(>list);
if (ticket->bytes && ticket->bytes < orig_bytes) {
u64 num_bytes = orig_bytes - ticket->bytes;
-   space_info->bytes_may_use -= num_bytes;
+   update_bytes_may_use(space_info, -num_bytes);
trace_btrfs_space_reservation(fs_info, "space_info",
  space_info->flags, num_bytes, 0);
}
@@ -5120,13 +5132,13 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
 * If not things get more complicated.
 */
if (used + orig_bytes <= space_info->total_bytes) {
-   space_info->bytes_may_use += orig_bytes;
+   update_bytes_may_use(space_info, orig_bytes);
trace_btrfs_space_reservation(fs_info, "space_info",
  space_info->flags, orig_bytes, 1);
ret = 0;
} else if (can_overcommit(fs_info, space_info, orig_bytes, flush,
  system_chunk)) {
-   space_info->bytes_may_use += orig_bytes;
+   update_bytes_may_use(space_info, orig_bytes);
trace_btrfs_space_reservation(fs_info, "space_info",
  space_info->flags, orig_bytes, 1);
ret = 0;
@@ -5189,7 +5201,7 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
if (ticket.bytes) {
if (ticket.bytes < orig_bytes) {
u64 num_bytes = orig_bytes - ticket.bytes;
-   space_info->bytes_may_use -= num_bytes;
+   update_bytes_may_use(space_info, -num_bytes);
trace_btrfs_space_reservation(fs_info, "space_info",
  space_info->flags,
  num_bytes, 0);
@@ -5373,7 +5385,7