Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-11-04 Thread Su Yue




On 11/3/18 5:20 PM, Nikolay Borisov wrote:



On 3.11.18 at 3:34, Su Yue wrote:



On 2018/11/2 10:10 PM, Christoph Anton Mitterer wrote:

Hey Su.



Sorry for the late reply, I've been busy with other things.


Is there anything further I need to do in this matter, or can I consider it
"solved", meaning you won't need further testing on my side and will just
PR the patches from that branch? :-)



I just looked through the related code and found the bug.
The patches fix it, so no more tests are needed.
Thanks for your tests and patience. :)


In the previous output from the debug version, we can see that the @ret code
is 524296, which is DIR_ITEM_MISMATCH (1 << 3) | DIR_INDEX_MISMATCH (1 << 19).

In btrfs-progs v4.17, check_inode_extref() passes the u64 @mode as the last
argument of find_dir_item(). However, find_dir_item() is declared as:

static int find_dir_item(struct btrfs_root *root, struct btrfs_key *key,
                         struct btrfs_key *location_key, char *name,
                         u32 namelen, u8 file_type);

The type of the last parameter is u8, not u64.
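
To make the effect concrete, here is a tiny standalone sketch (illustrative
only, not the btrfs-progs code; the stub merely mimics the prototype above):

#include <stdint.h>
#include <stdio.h>

typedef uint8_t u8;
typedef uint64_t u64;

/* Stub with the same last-parameter type as find_dir_item(). */
static unsigned int callee_sees(u8 file_type)
{
        return file_type;
}

int main(void)
{
        u64 imode = 0100644;    /* e.g. S_IFREG | 0644, kept in a u64 @mode */

        /* The implicit u64 -> u8 conversion keeps only the low byte, so
         * the callee ends up comparing a mangled value and reports a
         * bogus mismatch. */
        printf("caller passes 0%llo, callee sees 0%o\n",
               (unsigned long long)imode, callee_sees(imode));
        return 0;
}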


So this would have been caught by gcc's -Wconversion... except it likely
wouldn't have been, because right now that option produces loads of false
positives. Too bad...

Yes, implicit type conversion is so common in C that enabling such compile
warnings is annoying...




So the case is that while checking files with inode_extrefs,
if (imode != file_type), then find_dir_item() thinks it found
DIR_ITEM_MISMATCH or DIR_INDEX_MISMATCH.

Thanks,
Su


Thanks,
Chris.

On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote:

Hey.


Without the last patches on 4.17:

checking extents
checking free space cache
checking fs roots
ERROR: errors found in fs roots
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
found 619543498752 bytes used, error(s) found
total csum bytes: 602382204
total tree bytes: 2534309888
total fs tree bytes: 1652097024
total extent tree bytes: 160432128
btree space waste bytes: 459291608
file data blocks allocated: 7334036647936
   referenced 730839187456


With the last patches, on 4.17:

checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)
checking root refs
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
found 619543498752 bytes used, no error found
total csum bytes: 602382204
total tree bytes: 2534309888
total fs tree bytes: 1652097024
total extent tree bytes: 160432128
btree space waste bytes: 459291608
file data blocks allocated: 7334036647936
   referenced 730839187456


Cheers,
Chris.











Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Duncan
Adam Borowski posted on Sun, 04 Nov 2018 20:55:30 +0100 as excerpted:

> On Sun, Nov 04, 2018 at 06:29:06PM +, Duncan wrote:
>> So do consider adding noatime to your mount options if you haven't done
>> so already.  AFAIK, the only /semi-common/ app that actually uses
>> atimes these days is mutt (for read-message tracking), and then not for
>> mbox, so you should be safe to at least test turning it off.
> 
> To the contrary, mutt uses atimes only for mbox.

Figures that I'd get it reversed.
 
>> And YMMV, but if you do use mutt or something else that uses atimes,
>> I'd go so far as to recommend finding an alternative, replacing either
>> btrfs (because as I said, relatime is arguably enough on a traditional
>> non-COW filesystem) or whatever it is that uses atimes, your call,
>> because IMO it really is that big a deal.
> 
> Fortunately, mutt's use could be fixed by teaching it to touch atimes
> manually.  And that's already done, for both forks (vanilla and
> neomutt).

Thanks.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Qu Wenruo


On 2018/11/5 1:00 AM, Sebastian Ochmann wrote:
> Thank you very much for the quick reply.
> 
> On 04.11.18 14:37, Qu Wenruo wrote:
>>
>>
>> On 2018/11/4 9:15 PM, Sebastian Ochmann wrote:
>>> Hello,
>>>
>>> I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which
>>> stopped working correctly. The drive is used as a backup drive with zstd
>>> compression to which I regularly rsync and make daily snapshots. After I
>>> routinely removed a bunch of snapshots (about 20), I noticed later that
>>> the machine would hang when trying to unmount the filesystem. The
>>> current state is that I'm able to mount the filesystem without errors
>>> and I can view (ls) files in the root level, but trying to view contents
>>> of directories contained therein hangs just like when trying to unmount
>>> the filesystem. I have not yet tried to run check, repair, etc. Do you
>>> have any advice what I should try next?
>>
>> Could you please run "btrfs check" on the umounted fs?
> 
> I ran btrfs check on the unmounted fs and it reported no errors.

Great, then it's definitely the free space cache causing the problem.

You could use the -o nospace_cache mount option as a workaround to avoid
the problem.

The free space cache only speeds up free extent searches; it has no other
effect on the fs (apart from this bug), so you can disable it without any
problem.
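
For reference, the workaround is just an extra mount option; a sketch with
placeholder device path and mount point:

    mount -o nospace_cache /dev/mapper/<your-device> /mnt/backup

If the on-disk cache itself is suspect, newer btrfs-progs can also drop the
v1 cache entirely with 'btrfs check --clear-space-cache v1' on the unmounted
device.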

Thanks,
Qu
> 
[snip]
>> This looks pretty like this bug which should be fixed by the following
>> patch:
>>
>> https://patchwork.kernel.org/patch/10654433/
>>
>> If previous "btrfs check" shows no error, would you please try apply
>> that patch and try again?
> 
> I tried applying the patch to the 4.18.16 kernel I was running and it
> didn't apply cleanly, so I thought maybe I'd update to 4.19.0 first
> (which is already in the Arch testing repos) and then try applying the
> patch to that.
> 
> However, as it turns out, the fs now works again just by using 4.19.0
> even though I think the patch is not even in 4.19 yet (?). I'm able to
> navigate the directories and unmounting also works fine. Even after
> downgrading to 4.18.16, the fs still works.
> 
> So, thanks again for the quick reply, and "sorry" for not being able to
> test this particular patch now, but I guess the problem has been resolved.
> 
>> Thanks,
>> Qu
>>
> 
> Thanks,
> Ochi





Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Adam Borowski
On Sun, Nov 04, 2018 at 06:29:06PM +, Duncan wrote:
> So do consider adding noatime to your mount options if you haven't done 
> so already.  AFAIK, the only /semi-common/ app that actually uses atimes 
> these days is mutt (for read-message tracking), and then not for mbox, so 
> you should be safe to at least test turning it off.

To the contrary, mutt uses atimes only for mbox.
 
> And YMMV, but if you do use mutt or something else that uses atimes, I'd 
> go so far as to recommend finding an alternative, replacing either btrfs 
> (because as I said, relatime is arguably enough on a traditional non-COW 
> filesystem) or whatever it is that uses atimes, your call, because IMO it 
> really is that big a deal.

Fortunately, mutt's use could be fixed by teaching it to touch atimes
manually.  And that's already done, for both forks (vanilla and neomutt).
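
A minimal sketch of that approach (illustrative only, not mutt's actual fix):
bump just the atime of a mailbox file and leave mtime alone via utimensat(2).

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

static int touch_atime_only(const char *path)
{
        /* atime := now, mtime left untouched */
        struct timespec times[2] = {
                { .tv_nsec = UTIME_NOW },
                { .tv_nsec = UTIME_OMIT },
        };

        return utimensat(AT_FDCWD, path, times, 0);
}

int main(int argc, char **argv)
{
        if (argc > 1 && touch_atime_only(argv[1]) != 0)
                perror("utimensat");
        return 0;
}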


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ Have you heard of the Amber Road?  For thousands of years, the
⣾⠁⢰⠒⠀⣿⡁ Romans and co valued amber, hauled through the Europe over the
⢿⡄⠘⠷⠚⠋⠀ mountains and along the Vistula, from Gdańsk.  To where it came
⠈⠳⣄ together with silk (judging by today's amber stalls).


Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Sebastian Ochmann

On 04.11.18 19:31, Duncan wrote:


Sebastian Ochmann posted on Sun, 04 Nov 2018 14:15:55 +0100 as
excerpted:


Hello,

I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive
which stopped working correctly.



Kernel 4.18.16 (Arch Linux)


I see upgrading to 4.19 seems to have solved your problem, but this is
more about something I saw in the trace that has me wondering...


[  368.267315]  touch_atime+0xc0/0xe0


Do you have any atime-related mount options set?


That's an interesting point. On some machines I have explicitly set 
"noatime", but on that particular system I did not, so it was using the 
default "relatime" option. Since I'm not using mutt or anything else 
(that I'm aware of) that relies on atimes, I will set noatime there as 
well.



FWIW, noatime is strongly recommended on btrfs.

Now I'm not a dev, just a btrfs user and list regular, and I don't know
if that function is called and just does nothing when noatime is set,
so you may well already have it set and this is "much ado about
nothing", but the chance that it's relevant, if not for you, perhaps
for others that may read it, begs for this post...

The problem with atime, access time, is that it turns most otherwise
read-only operations into read-and-write operations in order to
update the access time.  And on copy-on-write (COW) based filesystems
such as btrfs, that can be a big problem, because updating that tiny
bit of metadata will trigger a rewrite of the entire metadata block
containing it, which will trigger an update of the metadata for /that/
block in the parent metadata tier... all the way up the metadata tree,
ultimately to its root, the filesystem root and the superblocks, at the
next commit (normally every 30 seconds or less).

Not only is that a bunch of otherwise unnecessary work for a bit of
metadata barely anything actually uses, but forcing most read
operations to read-write obviously compounds the risk for all of those
would-be read-only operations when a filesystem already has problems.

Additionally, if your use-case includes regular snapshotting, with
atime on, on mostly read workloads with few writes (other than atime
updates), it may actually be the case that most of the changes in a
snapshot are actually atime updates, making recurring snapshot
updates far larger than they'd be otherwise.

Now a few years ago the kernel did change the default to relatime,
basically updating the atime for any particular file only once a day,
which does help quite a bit, and on traditional filesystems it's
arguably a reasonably sane default, but COW makes atime tracking enough
more expensive that setting noatime is still strongly recommended on
btrfs, particularly if you're doing regular snapshotting.

So do consider adding noatime to your mount options if you haven't done
so already.  AFAIK, the only /semi-common/ app that actually uses
atimes these days is mutt (for read-message tracking), and then not for
mbox, so you should be safe to at least test turning it off.

And YMMV, but if you do use mutt or something else that uses atimes,
I'd go so far as to recommend finding an alternative, replacing either
btrfs (because as I said, relatime is arguably enough on a traditional
non-COW filesystem) or whatever it is that uses atimes, your call,
because IMO it really is that big a deal.

Meanwhile, particularly after seeing that in the trace, if the 4.19
update hadn't already fixed it, I'd have suggested trying a read-only
mount, both as a test, and assuming it worked, at least allowing you to
access the data without the lockup, which would have then been related
to the write due to the atime update, not the actual read.


It would be nice to have a 1:1 image of the filesystem (or rather the 
raw block device) for more testing, but unfortunately I don't have 
another 10 TB drive lying around. :) I didn't really expect the 4.19 
upgrade to (apparently) fix the problem right away, so I also couldn't 
test the mentioned patch, but yeah... If it happens again (which I hope 
it won't), I'll try your suggestion.



Actually, a read-only mount test is always a good troubleshooting step
when the trouble is a filesystem that either won't mount normally, or
will, but then locks up when you try to access something.  It's far
less risky than a normal writable mount, and at minimum it provides you
the additional test data of whether it worked or not, plus if it does,
a chance to access the data and make sure your backups are current,
before actually trying to do any repairs.



Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Duncan
Sebastian Ochmann posted on Sun, 04 Nov 2018 14:15:55 +0100 as excerpted:

> Hello,
> 
> I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which
> stopped working correctly.

> Kernel 4.18.16 (Arch Linux)

I see upgrading to 4.19 seems to have solved your problem, but this is 
more about something I saw in the trace that has me wondering...

> [  368.267315]  touch_atime+0xc0/0xe0

Do you have any atime-related mount options set?

FWIW, noatime is strongly recommended on btrfs.

Now I'm not a dev, just a btrfs user and list regular, and I don't know 
if that function is called and just does nothing when noatime is set, so 
you may well already have it set and this is "much ado about nothing", 
but the chance that it's relevant, if not for you, perhaps for others 
that may read it, begs for this post...

The problem with atime, access time, is that it turns most otherwise read-
only operations into read-and-write operations in order to update the 
access time.  And on copy-on-write (COW) based filesystems such as btrfs, 
that can be a big problem, because updating that tiny bit of metadata 
will trigger a rewrite of the entire metadata block containing it, which 
will trigger an update of the metadata for /that/ block in the parent 
metadata tier... all the way up the metadata tree, ultimately to its 
root, the filesystem root and the superblocks, at the next commit 
(normally every 30 seconds or less).

Not only is that a bunch of otherwise unnecessary work for a bit of 
metadata barely anything actually uses, but forcing most read operations 
to read-write obviously compounds the risk for all of those would-be read-
only operations when a filesystem already has problems.

Additionally, if your use-case includes regular snapshotting, with atime 
on, on mostly read workloads with few writes (other than atime updates), 
it may actually be the case that most of the changes in a snapshot are 
actually atime updates, making recurring snapshot updates far larger 
than they'd be otherwise.

Now a few years ago the kernel did change the default to relatime, 
basically updating the atime for any particular file only once a day, 
which does help quite a bit, and on traditional filesystems it's arguably 
a reasonably sane default, but COW makes atime tracking enough more 
expensive that setting noatime is still strongly recommended on btrfs, 
particularly if you're doing regular snapshotting.

So do consider adding noatime to your mount options if you haven't done 
so already.  AFAIK, the only /semi-common/ app that actually uses atimes 
these days is mutt (for read-message tracking), and then not for mbox, so 
you should be safe to at least test turning it off.
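
For what it's worth, testing it is cheap since this is purely a mount
option; something like the following, where the mount point and UUID are
obviously placeholders:

  # flip it on a mounted filesystem, no unmount needed
  mount -o remount,noatime /mnt/backup

  # or make it permanent in /etc/fstab
  UUID=<fs-uuid>  /mnt/backup  btrfs  noatime  0  0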

And YMMV, but if you do use mutt or something else that uses atimes, I'd 
go so far as to recommend finding an alternative, replacing either btrfs 
(because as I said, relatime is arguably enough on a traditional non-COW 
filesystem) or whatever it is that uses atimes, your call, because IMO it 
really is that big a deal.

Meanwhile, particularly after seeing that in the trace, if the 4.19 
update hadn't already fixed it, I'd have suggested trying a read-only 
mount, both as a test, and assuming it worked, at least allowing you to 
access the data without the lockup, which would have then been related to 
the write due to the atime update, not the actual read.

Actually, a read-only mount test is always a good troubleshooting step 
when the trouble is a filesystem that either won't mount normally, or 
will, but then locks up when you try to access something.  It's far less 
risky than a normal writable mount, and at minimum it provides you the 
additional test data of whether it worked or not, plus if it does, a 
chance to access the data and make sure your backups are current, before 
actually trying to do any repairs.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Sebastian Ochmann

Thank you very much for the quick reply.

On 04.11.18 14:37, Qu Wenruo wrote:



On 2018/11/4 9:15 PM, Sebastian Ochmann wrote:

Hello,

I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which
stopped working correctly. The drive is used as a backup drive with zstd
compression to which I regularly rsync and make daily snapshots. After I
routinely removed a bunch of snapshots (about 20), I noticed later that
the machine would hang when trying to unmount the filesystem. The
current state is that I'm able to mount the filesystem without errors
and I can view (ls) files in the root level, but trying to view contents
of directories contained therein hangs just like when trying to unmount
the filesystem. I have not yet tried to run check, repair, etc. Do you
have any advice what I should try next?


Could you please run "btrfs check" on the umounted fs?


I ran btrfs check on the unmounted fs and it reported no errors.



A notable hardware change I did a few days before the problem is a
switch from an Intel Xeon platform to AMD Threadripper. However, I
haven't seen problems with the rest of the btrfs filesystems (in
particular, a RAID-1 consisting of three HDDs), which I also migrated to
the new platform, yet. I just want to mention it in case there are known
issues in that direction.

Kernel 4.18.16 (Arch Linux)
btrfs-progs 4.17.1

Kernel log after trying to "ls" a directory contained in the
filesystem's root directory:

[   79.279349] BTRFS info (device dm-5): use zstd compression, level 0
[   79.279351] BTRFS info (device dm-5): disk space caching is enabled
[   79.279352] BTRFS info (device dm-5): has skinny extents
[  135.202344] kauditd_printk_skb: 2 callbacks suppressed
[  135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  368.266653] INFO: task kworker/u256:1:728 blocked for more than 120
seconds.
[  368.266657]   Tainted: P   OE 4.18.16-arch1-1-ARCH #1
[  368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  368.20] kworker/u256:1  D    0   728  2 0x8080
[  368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[btrfs]
[  368.266681] Call Trace:
[  368.266687]  ? __schedule+0x29b/0x8b0
[  368.266690]  ? preempt_count_add+0x68/0xa0
[  368.266692]  schedule+0x32/0x90
[  368.266707]  btrfs_tree_read_lock+0x7d/0x110 [btrfs]
[  368.266710]  ? wait_woken+0x80/0x80
[  368.266719]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[  368.266729]  btrfs_search_slot+0xf6/0xa00 [btrfs]
[  368.266732]  ? _raw_spin_unlock+0x16/0x30
[  368.266734]  ? inode_insert5+0x105/0x1a0
[  368.266746]  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
[  368.266749]  ? kmem_cache_alloc+0x179/0x1d0
[  368.266762]  btrfs_iget+0x113/0x690 [btrfs]
[  368.266764]  ? _raw_spin_unlock+0x16/0x30
[  368.266778]  __lookup_free_space_inode+0xd8/0x150 [btrfs]
[  368.266792]  lookup_free_space_inode+0x63/0xc0 [btrfs]
[  368.266806]  load_free_space_cache+0x6e/0x190 [btrfs]
[  368.266808]  ? kmem_cache_alloc_trace+0x181/0x1d0
[  368.266817]  ? cache_block_group+0x73/0x3e0 [btrfs]
[  368.266827]  cache_block_group+0x1c1/0x3e0 [btrfs]


This thread is trying to take the tree root lock to create the free space
cache, while someone else has already locked the tree root.


[  368.266829]  ? wait_woken+0x80/0x80
[  368.266839]  find_free_extent+0x872/0x10e0 [btrfs]
[  368.266851]  btrfs_reserve_extent+0x9b/0x180 [btrfs]
[  368.266862]  btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs]
[  368.266872]  __btrfs_cow_block+0x11d/0x500 [btrfs]
[  368.266882]  btrfs_cow_block+0xdc/0x1a0 [btrfs]
[  368.266891]  btrfs_search_slot+0x282/0xa00 [btrfs]
[  368.266893]  ? _raw_spin_unlock+0x16/0x30
[  368.266903]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[  368.266913]  __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs]
[  368.266915]  ? preempt_count_add+0x68/0xa0
[  368.266926]  btrfs_run_delayed_refs+0x72/0x180 [btrfs]
[  368.266937]  delayed_ref_async_start+0x81/0x90 [btrfs]
[  368.266950]  normal_work_helper+0xbd/0x350 [btrfs]
[  368.266953]  process_one_work+0x1eb/0x3c0
[  368.266955]  worker_thread+0x2d/0x3d0
[  368.266956]  ? process_one_work+0x3c0/0x3c0
[  368.266958]  kthread+0x112/0x130
[  368.266960]  ? kthread_flush_work_fn+0x10/0x10
[  368.266961]  ret_from_fork+0x22/0x40
[  368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120
seconds.

[snip, this trace doesn't look interesting at all]

[  368.267135] INFO: task 

Re: BTRFS did it's job nicely (thanks!)

2018-11-04 Thread waxhead

Sterling Windmill wrote:

Out of curiosity, what led to you choosing RAID1 for data but RAID10
for metadata?

I've flip-flopped between these two modes myself after finding out
that BTRFS RAID10 doesn't work how I would've expected.

Wondering what made you choose your configuration.

Thanks!
Sure,


The "RAID"1 profile for data was chosen to maximize disk space 
utilization since I got a lot of mixed size devices.


The "RAID"10 profile for metadata was chosen simply because it *feels* a 
bit faster for some of my (previous) workload which was reading a lot of 
small files (which I guess was embedded in the metadata). While I never 
remembered that I got any measurable performance increase the system 
simply felt smoother (which is strange since "RAID"10 should hog more 
disks at once).


I would love to try "RAID"10 for both data and metadata, but I have to 
delete some files first (or add yet another drive).
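
(For reference, the conversion itself, once there is room, would just be a
balance with convert filters, along the lines of the following, with / being
the mounted filesystem in question:

  btrfs balance start -dconvert=raid10 -mconvert=raid10 /

Since metadata is already "RAID"10 here, only the -dconvert part would
actually change a profile.)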


Would you like to elaborate a bit more yourself about how BTRFS "RAID"10 
does not work as you expected?


As far as I know, BTRFS' version of "RAID"10 means it ensures 2 copies (1 
replica) are striped over as many disks as it can (as long as there is 
free space).


So if I am not terribly mistaken, a "RAID"10 with 20 devices will stripe 
over (20/2) x 2, and if you run out of space on 10 of the devices it will 
continue to stripe over (5/2) x 2. So your stripe width essentially varies 
with the available space... I may be terribly wrong about this (until 
someone corrects me, that is...)




Re: [PATCH 0/7] fstests: test Btrfs swapfile support

2018-11-04 Thread Eryu Guan
On Fri, Nov 02, 2018 at 02:29:35PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> This series fixes a couple of generic swapfile tests and adds some
> Btrfs-specific swapfile tests. Btrfs swapfile support is scheduled for
> 4.21 [1].
> 
> 1: https://www.spinics.net/lists/linux-btrfs/msg83454.html
> 
> Thanks!

Thanks for the fixes and new tests!

> 
> Omar Sandoval (7):
>   generic/{472,496,497}: fix $seeqres typo
>   generic/{472,496}: fix swap file creation on Btrfs

I've merged the above two patches; they're two obvious bug fixes.

>   btrfs: test swap file activation restrictions
>   btrfs: test invalid operations on a swap file
>   btrfs: test swap files on multiple devices
>   btrfs: test device add/remove/replace with an active swap file
>   btrfs: test balance and resize with an active swap file

These tests look fine to me, but it'd be really great if btrfs folks
could help review the above tests and provide Reviewed-by tags.

And perhaps we could add tests 17[56] to the 'volume' group, as they do
device operations. Similarly, the last test could be added to the
'balance' group.
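
(For reference, with the current fstests layout that just means extra group
names on the tests' lines in tests/btrfs/group; the exact group lists below
are only a guess at what the final entries might look like:

  175 auto quick swap volume
  176 auto quick swap volume
  177 auto quick swap balance
)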

Thanks,
Eryu

> 
>  tests/btrfs/173 | 55 ++
>  tests/btrfs/173.out |  5 +++
>  tests/btrfs/174 | 66 
>  tests/btrfs/174.out | 10 ++
>  tests/btrfs/175 | 73 
>  tests/btrfs/175.out |  8 +
>  tests/btrfs/176 | 82 +
>  tests/btrfs/176.out |  5 +++
>  tests/btrfs/177 | 64 +++
>  tests/btrfs/177.out |  6 
>  tests/btrfs/group   |  5 +++
>  tests/generic/472   | 16 -
>  tests/generic/496   |  8 ++---
>  tests/generic/497   |  2 +-
>  14 files changed, 391 insertions(+), 14 deletions(-)
>  create mode 100755 tests/btrfs/173
>  create mode 100644 tests/btrfs/173.out
>  create mode 100755 tests/btrfs/174
>  create mode 100644 tests/btrfs/174.out
>  create mode 100755 tests/btrfs/175
>  create mode 100644 tests/btrfs/175.out
>  create mode 100755 tests/btrfs/176
>  create mode 100644 tests/btrfs/176.out
>  create mode 100755 tests/btrfs/177
>  create mode 100644 tests/btrfs/177.out
> 
> -- 
> 2.19.1
> 


Re: BTRFS did it's job nicely (thanks!)

2018-11-04 Thread Sterling Windmill
Out of curiosity, what led to you choosing RAID1 for data but RAID10
for metadata?

I've flip-flopped between these two modes myself after finding out
that BTRFS RAID10 doesn't work how I would've expected.

Wondering what made you choose your configuration.

Thanks!

On Fri, Nov 2, 2018 at 3:55 PM waxhead  wrote:
>
> Hi,
>
> my main computer runs on a 7x SSD BTRFS as rootfs with
> data:RAID1 and metadata:RAID10.
>
> One SSD is probably about to fail, and it seems that BTRFS fixed it
> nicely (thanks everyone!)
>
> I decided to just post the ugly details in case someone just wants to
> have a look. Note that I tend to interpret the btrfs de st / output as
> if the error was NOT fixed even though it seems clear that it was, so I
> think the output is a bit misleading... just saying...
>
>
>
> -- below are the details for those curious (just for fun) ---
>
> scrub status for [YOINK!]
>  scrub started at Fri Nov  2 17:49:45 2018 and finished after
> 00:29:26
>  total bytes scrubbed: 1.15TiB with 1 errors
>  error details: csum=1
>  corrected errors: 1, uncorrectable errors: 0, unverified errors: 0
>
>   btrfs fi us -T /
> Overall:
>  Device size:   1.18TiB
>  Device allocated:  1.17TiB
>  Device unallocated:9.69GiB
>  Device missing:  0.00B
>  Used:  1.17TiB
>  Free (estimated):  6.30GiB  (min: 6.30GiB)
>  Data ratio:   2.00
>  Metadata ratio:   2.00
>  Global reserve:  512.00MiB  (used: 0.00B)
>
>   Data  Metadata  System
> Id Path  RAID1 RAID10RAID10Unallocated
> -- - - - - ---
>   6 /dev/sda1 236.28GiB 704.00MiB  32.00MiB   485.00MiB
>   7 /dev/sdb1 233.72GiB   1.03GiB  32.00MiB 2.69GiB
>   2 /dev/sdc1 110.56GiB 352.00MiB -   904.00MiB
>   8 /dev/sdd1 234.96GiB   1.03GiB  32.00MiB 1.45GiB
>   1 /dev/sde1 164.90GiB   1.03GiB  32.00MiB 1.72GiB
>   9 /dev/sdf1 109.00GiB   1.03GiB  32.00MiB   744.00MiB
> 10 /dev/sdg1 107.98GiB   1.03GiB  32.00MiB 1.74GiB
> -- - - - - ---
> Total 598.70GiB   3.09GiB  96.00MiB 9.69GiB
> Used  597.25GiB   1.57GiB 128.00KiB
>
>
>
> uname -a
> Linux main 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-10-07) x86_64
> GNU/Linux
>
> btrfs --version
> btrfs-progs v4.17
>
>
> dmesg | grep -i btrfs
> [7.801817] Btrfs loaded, crc32c=crc32c-generic
> [8.163288] BTRFS: device label btrfsroot devid 10 transid 669961
> /dev/sdg1
> [8.163433] BTRFS: device label btrfsroot devid 9 transid 669961
> /dev/sdf1
> [8.163591] BTRFS: device label btrfsroot devid 1 transid 669961
> /dev/sde1
> [8.163734] BTRFS: device label btrfsroot devid 8 transid 669961
> /dev/sdd1
> [8.163974] BTRFS: device label btrfsroot devid 2 transid 669961
> /dev/sdc1
> [8.164117] BTRFS: device label btrfsroot devid 7 transid 669961
> /dev/sdb1
> [8.164262] BTRFS: device label btrfsroot devid 6 transid 669961
> /dev/sda1
> [8.206174] BTRFS info (device sde1): disk space caching is enabled
> [8.206236] BTRFS info (device sde1): has skinny extents
> [8.348610] BTRFS info (device sde1): enabling ssd optimizations
> [8.854412] BTRFS info (device sde1): enabling free space tree
> [8.854471] BTRFS info (device sde1): using free space tree
> [   68.170580] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.185973] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.185991] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186003] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186015] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186028] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186041] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186052] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186063] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186075] BTRFS warning (device sde1): csum failed root 3760 ino
> 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.199237] 

Re: BTRFS did it's job nicely (thanks!)

2018-11-04 Thread waxhead

Duncan wrote:

waxhead posted on Fri, 02 Nov 2018 20:54:40 +0100 as excerpted:


Note that I tend to interpret the btrfs de st / output as if the error
was NOT fixed even though it seems clear that it was, so I think the output
is a bit misleading... just saying...


See the btrfs-device manpage, stats subcommand, -z|--reset option, and
device stats section:

-z|--reset
Print the stats and reset the values to zero afterwards.

DEVICE STATS
The device stats keep persistent record of several error classes related
to doing IO. The current values are printed at mount time and
updated during filesystem lifetime or from a scrub run.


So stats keeps a count of historic errors and is only reset when you
specifically reset it, *NOT* when the error is fixed.

Yes, I am perfectly aware of all that. The issue I have is that the 
manpage describes corruption errors as "A block checksum mismatched or 
corrupted metadata header was found". This does not tell me if this was 
a permanent corruption or if it was fixed. That is why I think the 
output is a bit misleadning (and I should have said that more clearly).


My point being that 'btrfs device stats /mnt' would have been a lot easier 
to read and understand if it distinguished between permanent corruption, 
i.e. unfixable errors, and fixed errors.
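
For reference, the output today is just raw cumulative counters per device,
something like the following (device name and values are illustrative):

  # btrfs device stats /mnt
  [/dev/sda1].write_io_errs    0
  [/dev/sda1].read_io_errs     0
  [/dev/sda1].flush_io_errs    0
  [/dev/sda1].corruption_errs  1
  [/dev/sda1].generation_errs  0

  # print and zero the counters once noted
  btrfs device stats -z /mnt

so a corruption_errs of 1 indeed doesn't say whether a later scrub repaired
it.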



(There's actually a recent patch, I believe in the current dev kernel
4.20/5.0, that will reset a device's stats automatically for the btrfs
replace case when it's actually a different device afterward anyway.
Apparently, it doesn't even do /that/ automatically yet.  Keep that in
mind if you replace that device.)

Oh thanks for the heads up, I was under the impression that the device 
stats were tracked by btrfs devid, but apparently they are (were) not. Good 
to know!


Re: Filesystem mounts fine but hangs on access

2018-11-04 Thread Qu Wenruo


On 2018/11/4 9:15 PM, Sebastian Ochmann wrote:
> Hello,
> 
> I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which
> stopped working correctly. The drive is used as a backup drive with zstd
> compression to which I regularly rsync and make daily snapshots. After I
> routinely removed a bunch of snapshots (about 20), I noticed later that
> the machine would hang when trying to unmount the filesystem. The
> current state is that I'm able to mount the filesystem without errors
> and I can view (ls) files in the root level, but trying to view contents
> of directories contained therein hangs just like when trying to unmount
> the filesystem. I have not yet tried to run check, repair, etc. Do you
> have any advice what I should try next?

Could you please run "btrfs check" on the umounted fs?

> 
> A notable hardware change I did a few days before the problem is a
> switch from an Intel Xeon platform to AMD Threadripper. However, I
> haven't seen problems with the rest of the btrfs filesystems (in
> particular, a RAID-1 consisting of three HDDs), which I also migrated to
> the new platform, yet. I just want to mention it in case there are known
> issues in that direction.
> 
> Kernel 4.18.16 (Arch Linux)
> btrfs-progs 4.17.1
> 
> Kernel log after trying to "ls" a directory contained in the
> filesystem's root directory:
> 
> [   79.279349] BTRFS info (device dm-5): use zstd compression, level 0
> [   79.279351] BTRFS info (device dm-5): disk space caching is enabled
> [   79.279352] BTRFS info (device dm-5): has skinny extents
> [  135.202344] kauditd_printk_skb: 2 callbacks suppressed
> [  135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [  135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [  135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [  368.266653] INFO: task kworker/u256:1:728 blocked for more than 120
> seconds.
> [  368.266657]   Tainted: P   OE 4.18.16-arch1-1-ARCH #1
> [  368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  368.20] kworker/u256:1  D    0   728  2 0x8080
> [  368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
> [btrfs]
> [  368.266681] Call Trace:
> [  368.266687]  ? __schedule+0x29b/0x8b0
> [  368.266690]  ? preempt_count_add+0x68/0xa0
> [  368.266692]  schedule+0x32/0x90
> [  368.266707]  btrfs_tree_read_lock+0x7d/0x110 [btrfs]
> [  368.266710]  ? wait_woken+0x80/0x80
> [  368.266719]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> [  368.266729]  btrfs_search_slot+0xf6/0xa00 [btrfs]
> [  368.266732]  ? _raw_spin_unlock+0x16/0x30
> [  368.266734]  ? inode_insert5+0x105/0x1a0
> [  368.266746]  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> [  368.266749]  ? kmem_cache_alloc+0x179/0x1d0
> [  368.266762]  btrfs_iget+0x113/0x690 [btrfs]
> [  368.266764]  ? _raw_spin_unlock+0x16/0x30
> [  368.266778]  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> [  368.266792]  lookup_free_space_inode+0x63/0xc0 [btrfs]
> [  368.266806]  load_free_space_cache+0x6e/0x190 [btrfs]
> [  368.266808]  ? kmem_cache_alloc_trace+0x181/0x1d0
> [  368.266817]  ? cache_block_group+0x73/0x3e0 [btrfs]
> [  368.266827]  cache_block_group+0x1c1/0x3e0 [btrfs]

This thread is trying to take the tree root lock to create the free space
cache, while someone else has already locked the tree root.

> [  368.266829]  ? wait_woken+0x80/0x80
> [  368.266839]  find_free_extent+0x872/0x10e0 [btrfs]
> [  368.266851]  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> [  368.266862]  btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs]
> [  368.266872]  __btrfs_cow_block+0x11d/0x500 [btrfs]
> [  368.266882]  btrfs_cow_block+0xdc/0x1a0 [btrfs]
> [  368.266891]  btrfs_search_slot+0x282/0xa00 [btrfs]
> [  368.266893]  ? _raw_spin_unlock+0x16/0x30
> [  368.266903]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
> [  368.266913]  __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs]
> [  368.266915]  ? preempt_count_add+0x68/0xa0
> [  368.266926]  btrfs_run_delayed_refs+0x72/0x180 [btrfs]
> [  368.266937]  delayed_ref_async_start+0x81/0x90 [btrfs]
> [  368.266950]  normal_work_helper+0xbd/0x350 [btrfs]
> [  368.266953]  process_one_work+0x1eb/0x3c0
> [  368.266955]  worker_thread+0x2d/0x3d0
> [  368.266956]  ? process_one_work+0x3c0/0x3c0
> [  368.266958]  kthread+0x112/0x130
> [  368.266960]  ? kthread_flush_work_fn+0x10/0x10
> [  368.266961]  ret_from_fork+0x22/0x40
> [  368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120
> seconds.
[snip, this trace doesn't look interesting at all]
> [  

Filesystem mounts fine but hangs on access

2018-11-04 Thread Sebastian Ochmann

Hello,

I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which 
stopped working correctly. The drive is used as a backup drive with zstd 
compression to which I regularly rsync and make daily snapshots. After I 
routinely removed a bunch of snapshots (about 20), I noticed later that 
the machine would hang when trying to unmount the filesystem. The 
current state is that I'm able to mount the filesystem without errors 
and I can view (ls) files in the root level, but trying to view contents 
of directories contained therein hangs just like when trying to unmount 
the filesystem. I have not yet tried to run check, repair, etc. Do you 
have any advice what I should try next?


A notable hardware change I did a few days before the problem is a 
switch from an Intel Xeon platform to AMD Threadripper. However, I 
haven't seen problems with the rest of the btrfs filesystems (in 
particular, a RAID-1 consisting of three HDDs), which I also migrated to 
the new platform, yet. I just want to mention it in case there are known 
issues in that direction.


Kernel 4.18.16 (Arch Linux)
btrfs-progs 4.17.1

Kernel log after trying to "ls" a directory contained in the 
filesystem's root directory:


[   79.279349] BTRFS info (device dm-5): use zstd compression, level 0
[   79.279351] BTRFS info (device dm-5): disk space caching is enabled
[   79.279352] BTRFS info (device dm-5): has skinny extents
[  135.202344] kauditd_printk_skb: 2 callbacks suppressed
[  135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  368.266653] INFO: task kworker/u256:1:728 blocked for more than 120 
seconds.

[  368.266657]   Tainted: P   OE 4.18.16-arch1-1-ARCH #1
[  368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.

[  368.20] kworker/u256:1  D0   728  2 0x8080
[  368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[  368.266681] Call Trace:
[  368.266687]  ? __schedule+0x29b/0x8b0
[  368.266690]  ? preempt_count_add+0x68/0xa0
[  368.266692]  schedule+0x32/0x90
[  368.266707]  btrfs_tree_read_lock+0x7d/0x110 [btrfs]
[  368.266710]  ? wait_woken+0x80/0x80
[  368.266719]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[  368.266729]  btrfs_search_slot+0xf6/0xa00 [btrfs]
[  368.266732]  ? _raw_spin_unlock+0x16/0x30
[  368.266734]  ? inode_insert5+0x105/0x1a0
[  368.266746]  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
[  368.266749]  ? kmem_cache_alloc+0x179/0x1d0
[  368.266762]  btrfs_iget+0x113/0x690 [btrfs]
[  368.266764]  ? _raw_spin_unlock+0x16/0x30
[  368.266778]  __lookup_free_space_inode+0xd8/0x150 [btrfs]
[  368.266792]  lookup_free_space_inode+0x63/0xc0 [btrfs]
[  368.266806]  load_free_space_cache+0x6e/0x190 [btrfs]
[  368.266808]  ? kmem_cache_alloc_trace+0x181/0x1d0
[  368.266817]  ? cache_block_group+0x73/0x3e0 [btrfs]
[  368.266827]  cache_block_group+0x1c1/0x3e0 [btrfs]
[  368.266829]  ? wait_woken+0x80/0x80
[  368.266839]  find_free_extent+0x872/0x10e0 [btrfs]
[  368.266851]  btrfs_reserve_extent+0x9b/0x180 [btrfs]
[  368.266862]  btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs]
[  368.266872]  __btrfs_cow_block+0x11d/0x500 [btrfs]
[  368.266882]  btrfs_cow_block+0xdc/0x1a0 [btrfs]
[  368.266891]  btrfs_search_slot+0x282/0xa00 [btrfs]
[  368.266893]  ? _raw_spin_unlock+0x16/0x30
[  368.266903]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[  368.266913]  __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs]
[  368.266915]  ? preempt_count_add+0x68/0xa0
[  368.266926]  btrfs_run_delayed_refs+0x72/0x180 [btrfs]
[  368.266937]  delayed_ref_async_start+0x81/0x90 [btrfs]
[  368.266950]  normal_work_helper+0xbd/0x350 [btrfs]
[  368.266953]  process_one_work+0x1eb/0x3c0
[  368.266955]  worker_thread+0x2d/0x3d0
[  368.266956]  ? process_one_work+0x3c0/0x3c0
[  368.266958]  kthread+0x112/0x130
[  368.266960]  ? kthread_flush_work_fn+0x10/0x10
[  368.266961]  ret_from_fork+0x22/0x40
[  368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120 
seconds.

[  368.266980]   Tainted: P   OE 4.18.16-arch1-1-ARCH #1
[  368.266981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.

[  368.266982] btrfs-cleaner   D0  1196  2 0x8080
[  368.266983] Call Trace:
[  368.266985]  ? __schedule+0x29b/0x8b0
[  368.266987]  schedule+0x32/0x90
[  368.266997]  cache_block_group+0x148/0x3e0 [btrfs]
[  368.266998]  ? wait_woken+0x80/0x80
[