btrfs-progs: replace: error message can be improved when other operation is running

2017-12-10 Thread Lukas Pirl
Dear all,

when trying to replace a device of a file system for which a balance is
running, btrfs-progs fails with the error message:

ERROR: ioctl(DEV_REPLACE_START) on '/mnt/xyz' returns error: 

The same is probably true when similar operations such as "add", "delete"
or "resize" are running, since those cases do not seem to be handled in
cmds-replace.c [0].

Obviously, this is not very helpful to the user (if not outright scary).
In contrast, other commands give very helpful output in similar
situations (e.g., "add/delete/… operation in progress" [1]).

Other users' confusion might also be related to this potential issue [2].

This is probably very easy to fix for someone familiar with the relevant
ioctl return values.
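
For illustration, a minimal way to run into this (paths and devices are
placeholders; the exact wording may differ between versions):

  $ btrfs balance start /mnt/xyz &   # keep an exclusive operation running
  $ btrfs replace start /dev/old /dev/new /mnt/xyz
  # -> the unhelpful DEV_REPLACE_START error quoted above, whereas e.g.
  #    "btrfs device delete" in the same situation should report the
  #    "add/delete/… operation in progress" message from [1].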

Thanks and cheers,

Lukas

GNU/Linux
4.13.0-0.bpo.1-amd64 #1 SMP Debian 4.13.13-1~bpo9+1 (2017-11-22) x86_64
btrfs-progs v4.13.3

[0]
https://github.com/kdave/btrfs-progs/blob/11c83cefb8b4a03b1835efaf603ddc95430a0c9e/cmds-replace.c#L48
[1]
https://github.com/kdave/btrfs-progs/blob/9fe889ac02b9c49b885c8999f5dd4e192697fa83/ioctl.h#L709
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866734


Re: zstd compression

2017-11-15 Thread Lukas Pirl
Hi Imran,

On 11/15/2017 09:51 AM, Imran Geriskovan wrote as excerpted:
> Any further advices?

you might be interested in the thread "Read before you deploy btrfs +
zstd"¹.

Cheers,

Lukas

¹ https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg69871.html


Re: Several questions regarding btrfs

2017-11-01 Thread Lukas Pirl
On 11/01/2017 03:05 PM, ST wrote as excerpted:
>> However, it's important to know that if your users have shell access, 
>> they can bypass qgroups.  Normal users can create subvolumes, and new 
>> subvolumes aren't added to an existing qgroup by default (and unless I'm 
>> mistaken, aren't constrained by the qgroup set on the parent subvolume), 
>> so simple shell access is enough to bypass quotas.

> I never did it before, but shouldn't it be possible to just whitelist
> commands users are allowed to use in the SSH config (and so block
> creation of subvolumes/cp --reflink)? I actually would have restricted
> users to sftp if I knew how to let them change their passwords once they
> wish to. As far as I know it is not possible with OpenSSH...

Possible only via a rather custom setup, I guess. You could
a) force users into a chroot via the sshd configuration
   (chroots need the allowed binaries plus their libs, configs etc.;
   a rough sketch of the sftp-only variant follows below), or
b) solve the problem with file permissions on all binaries
   (probably a terrible pain to set up (users, groups, …) and maintain).
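
A minimal sketch of the sftp-only variant mentioned in the quote above
(group name and chroot path are placeholders; the chroot directory must be
root-owned, and this does not solve the password-change problem), placed
in /etc/ssh/sshd_config:

  Match Group sftponly
      ChrootDirectory /srv/sftp/%u
      ForceCommand internal-sftp
      AllowTcpForwarding no
      X11Forwarding no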

Cheers,

Lukas


Re: btrfs scrub crashes OS

2017-09-26 Thread Lukas Pirl
On 09/26/2017 11:36 AM, Qu Wenruo wrote as excerpted:
> This is strange, this means that we can't find a chunk map for a 72K
> length data extent.
> 
> Either the new mapper code has some bug, or it's a big problem.
> But I think it's more possible for former case.
> 
> Would you please try to dump the chunk tree (which should be quite
> small) using the following command?
> 
> $ btrfs inspect-internal dump-tree -t chunk 

Sure, happy to provide that:
  https://static.lukas-pirl.de/dump-chunk-tree.txt
(too large for Pastebin, file will probably go away in a couple of weeks).

Cheers,

Lukas


Re: btrfs scrub crashes OS

2017-09-26 Thread Lukas Pirl
Hi Qu,

On 09/26/2017 10:51 AM, Qu Wenruo wrote as excerpted:
> This make things more weird.
> Just in case, are you executing offline scrub by "btrfs scrub start
> --offline "

Yes. I even got some output (pretty sure the last lines are missing due
to the crash):

WARNING: Offline scrub doesn't support extra options other than -r
[I gave -d as well]
Invalid mapping for 644337258496-644337332224, got
645348196352-646421938176
Couldn't map the block 644337258496
ERROR: failed to read out data at bytenr 644337258496 mirror 1
Invalid mapping for 653402148864-653402152960, got
653938130944-655011872768
Couldn't map the block 653402148864
ERROR: failed to read out data at bytenr 653402148864 mirror 1
Invalid mapping for 717315420160-717315526656, got
718362640384-719436382208
Couldn't map the block 717315420160
ERROR: failed to read out data at bytenr 717315420160 mirror 1
Invalid mapping for 875072008192-875072040960, got
875128946688-876202688512
Couldn't map the block 875072008192
ERROR: failed to read tree block 875072008192 mirror 1
ERROR: extent 875072008192 len 32768 CORRUPTED: all mirror(s)
corrupted, can't be recovered

Can I find out on which disk a mirror of a block resides?
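
(For reference, one way to look this up with the tooling used elsewhere in
this thread — only a sketch, and the device path is a placeholder: dump the
chunk tree, find the CHUNK_ITEM whose offset <= bytenr < offset + length,
and read its "stripe … devid …" lines; `btrfs fi show` then maps the devids
back to device paths.)

  $ btrfs inspect-internal dump-tree -t chunk /dev/mapper/disk1 | less
  $ btrfs fi show /mnt/data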

> If so, I think there may be some problem outside the btrfs territory.

Of course, that is a possibility…

> Offline scrub has nothing to do with btrfs kernel module, it just reads
> out on-disk data and verify checksum in *user* space.
> 
> So if offline scrub can also screw up the system, it means there is
> something wrong in the disk IO routine, not btrfs.
> 
> And scrub can trigger it because normal btrfs IO won't try to read that
> part/mirror.

…especially when considering this.

> What about trying to read all data out of your raw disk?
> If offline crashes the system, reading the disk may crash it also.
> Using dd to read each of your disk (with btrfs unmounted) may expose
> which disk caused the problem.

That is a good idea! I will go ahead.
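
A sketch of such a read test (device paths are placeholders;
status=progress needs a reasonably recent coreutils):

  $ umount /mnt/data
  $ for dev in /dev/mapper/disk1 /dev/mapper/disk2; do
      echo "reading $dev"
      dd if="$dev" of=/dev/null bs=64M status=progress
    done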

Thanks for your help so far.


Re: btrfs scrub crashes OS

2017-09-26 Thread Lukas Pirl
Dear Qu,

thanks for your reply.

On 09/25/2017 12:19 PM, Qu Wenruo wrote as excerpted:
> Even no dmesg output using tty or netconsole?

And thanks for the pointer to netconsole; I tried that one.
No success: I set netconsole up, verified it worked, started a scrub, the
machine went away after a couple of hours, and the netconsole log stayed
empty.
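
(For anyone following along, a typical netconsole setup looks roughly like
this; addresses, interface and MAC are placeholders and not necessarily the
exact setup used here:)

  # on the crashing machine:
  $ modprobe netconsole netconsole=6666@192.168.0.10/eth0,6666@192.168.0.20/aa:bb:cc:dd:ee:ff
  # on the receiving machine (flags vary between netcat variants):
  $ nc -l -u 6666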

> That's strange.
> Normally it should be kernel BUG_ON() to cause such problem.
>  
> And if the system is still responsible (either from TTY or ssh), is
> there anything strange like tons of IO or CPU usage?

I can't tell; the machine just disappears from the network. Dead. IIRC,
it was also completely unresponsive when I sat in front of it.

> Btrfs-progs v4.13 should have fixed it.
> As long as v4.13 btrfs check reports no error, its metadata should be
> good.

I can try that one, if helpful.

> You could try the out-of-tree offline scrub to do a full scrub of your
> fs unmounted, so it won't crash your system (if nothing wrong happened)
> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub

Did that, machine crashed again.

>>    MIXED_BACKREF, BIG_METADATA, EXTENDED_IREF, SKINNY_METADATA, NO_HOLES
> 
> Only NO_HOLES is not ordinary, but shouldn't cause a problem.

Would it be sensible to turn that feature off using `btrfstune` (if
possible at all)?

> Without kernel backtrace, it's tricky to locate the problem.
> So I would recommend to use netconsole (IIRC more reliable, as I use it
> on my test VM to capture the dying message) or TTY output to verify
> there is no kernel message/backtrace.

Yeah I see we are in a tricky situation here.

I will try to scrub with autodefrag and compression deactivated.

Could a full balance be of any help? At least to find out if it crashes
the machine as well?

Cheers,

Lukas

> Thanks,
> Qu
> 
>>    no quotas in use
>>    see also https://pastebin.com/4me6zDsN for more details
>> btrfs-progs v4.12
>> GNU/Linux 4.12.0-0.bpo.1-amd64 #1 SMP Debian 4.12.6-1~bpo9+1 x86_64
>>
>> The question, obviously, is how can I make this fs "scrubable" again?
>> Are the errors found by btrfsck safe to repair using btrfsck or some
>> other tool?
>>
>> Thank you so much in advance,
>>
>> Lukas



Re: Wrong device?

2017-09-26 Thread Lukas Pirl
On 09/25/2017 06:11 PM, linux-bt...@oh3mqu.pp.hyper.fi wrote as excerpted:
> After a long googling (about more complex situations) I suddenly
> noticed "device sdb" WTF???  Filesystem is mounted from /dev/md3 (sdb
> is part of that mdraid) so btrfs should not even know anything about
> that /dev/sdb.

I would be interested in explanations regarding this, too. It happened
to me as well: I was confused by /dev/sd* device paths being printed by
btrfs in the logs, even though it runs exclusively on /dev/md-*
(/dev/mapper/*) devices.

> PS. I have noticed another bug too, but I haven't tested it with
> lastest kernels after I noticed that it happens only with
> compression=lzo.  So maybe it is already fixed.  With gzip or none
> compression probem does not happens.  I have email server with about
> 0.5 TB volume. It is using Maildir so it contains huge amount of
> files.  Sometimes some files goes unreadable.  After server reset
> problematic file could be readable again (but not always)...
> 
> But weird thing is that unreadable file always seems to be
> dovecot.index.log.

I can confirm this (non-reproducible) behavior on a VPS running Debian
4.5.4-1~bpo8+1.

Lukas

-- 
+49 174 940 74 71
GPG key available via key servers





btrfs scrub crashes OS

2017-09-25 Thread Lukas Pirl
Dear all,

I experience reproducible OS crashes when scrubbing a btrfs file system.
Apart from that, the file system mounts rw and is usable without any
problems (including modifying snapshots and all that).

When the system crashes (i.e., freezes), there are no errors printed to
the system logs or via `dmesg` (had a display connected).

Recovery is only possible via power-cycling the machine.

The host experienced a lot of crashes and ATA errors due to hardware
failures in the past.
To the best of my knowledge, the hardware is stable now.

`btrfs device stats` outputs zeros for all counters.

`btrfsck --readonly --mode lowmem` outputs a bunch of
  referencer count mismatch …
and
  ERROR: data extent[… …] backref lost
see https://pastebin.com/seC4fReP for the full log.

System info:
btrfs RAID 1 (~1.5 years old), 7 SATA HDDs
  MIXED_BACKREF, BIG_METADATA, EXTENDED_IREF, SKINNY_METADATA, NO_HOLES
  no quotas in use
  see also https://pastebin.com/4me6zDsN for more details
btrfs-progs v4.12
GNU/Linux 4.12.0-0.bpo.1-amd64 #1 SMP Debian 4.12.6-1~bpo9+1 x86_64

The question, obviously, is: how can I make this fs "scrubbable" again?
Are the errors found by btrfsck safe to repair using btrfsck or some
other tool?

Thank you so much in advance,

Lukas


Still in 4.4.0: livelock in recovery (free_reloc_roots)

2016-03-02 Thread Lukas Pirl
On 11/20/2015 10:04 AM, Lukas Pirl wrote as excerpted:
> I am (still) trying to recover a RAID1 that can only be mounted
> recovery,degraded,ro.
> 
> I experienced an issue that might be interesting for you: I tried to
> mount the file system rw,recovery and the kernel ended up burning one
> core (and only one specific core, never scheduled to another one).
> 
> The watchdog printed a stack trace roughly every 20 seconds. There were
> only a few stack traces that were printed alternating (see below).
> After a few hours with the mount command still being blocked and without
> visible IO activity, the system was power-cycled.
> 
> Summary:
> 
> Call Trace:
>  [] ? free_reloc_roots+0x11/0x30 [btrfs]
>  [] ? free_reloc_roots+0x1d/0x30 [btrfs]
>  [] ? merge_reloc_roots+0x165/0x220 [btrfs]
>  [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
>  [] ? open_ctree+0x20d2/0x23b0 [btrfs]
>  [] ? btrfs_mount+0x87b/0x990 [btrfs]
>  [] ? pcpu_next_unpop+0x3f/0x50
>  [] ? mount_fs+0x36/0x170
>  [] ? vfs_kern_mount+0x68/0x110
>  [] ? btrfs_mount+0x1bb/0x990 [btrfs]
>  …
> 
> Call Trace:
>[] ? rcu_dump_cpu_stacks+0x80/0xb0
>  [] ? rcu_check_callbacks+0x421/0x6e0
>  [] ? sched_clock+0x5/0x10
>  [] ? notifier_call_chain+0x45/0x70
>  [] ? timekeeping_update+0xf1/0x150
>  [] ? tick_sched_do_timer+0x40/0x40
>  [] ? update_process_times+0x36/0x60
>  [] ? tick_sched_do_timer+0x40/0x40
>  [] ? tick_sched_handle.isra.15+0x24/0x60
>  [] ? tick_sched_do_timer+0x40/0x40
>  [] ? tick_sched_timer+0x3b/0x70
>  [] ? __hrtimer_run_queues+0xdc/0x210
>  [] ? read_tsc+0x5/0x10
>  [] ? read_tsc+0x5/0x10
>  [] ? hrtimer_interrupt+0x9a/0x190
>  [] ? smp_apic_timer_interrupt+0x39/0x50
>  [] ? apic_timer_interrupt+0x6b/0x70
>[] ? _raw_spin_lock+0x10/0x20
>  [] ? __del_reloc_root+0x2f/0x100 [btrfs]
>  [] ? __add_reloc_root+0xe0/0xe0 [btrfs]
>  [] ? free_reloc_roots+0x1d/0x30 [btrfs]
>  [] ? merge_reloc_roots+0x165/0x220 [btrfs]
>  [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
>  [] ? open_ctree+0x20d2/0x23b0 [btrfs]
>  [] ? btrfs_mount+0x87b/0x990 [btrfs]
>  [] ? pcpu_next_unpop+0x3f/0x50
>  [] ? mount_fs+0x36/0x170
>  [] ? vfs_kern_mount+0x68/0x110
>  [] ? btrfs_mount+0x1bb/0x990 [btrfs]
>  …
> 
> Call Trace:
>  [] ? __del_reloc_root+0x2f/0x100 [btrfs]
>  [] ? free_reloc_roots+0x1d/0x30 [btrfs]
>  [] ? merge_reloc_roots+0x165/0x220 [btrfs]
>  [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
>  [] ? open_ctree+0x20d2/0x23b0 [btrfs]
>  [] ? btrfs_mount+0x87b/0x990 [btrfs]
>  [] ? pcpu_next_unpop+0x3f/0x50
>  [] ? mount_fs+0x36/0x170
>  [] ? vfs_kern_mount+0x68/0x110
>  [] ? btrfs_mount+0x1bb/0x990 [btrfs]
>  …
> 
> A longer excerpt can be found here: http://pastebin.com/NPM0Ckfy
> 
> I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.
> 
> btrfs check --readonly gave no errors.
> (except the probably false positives mentioned here
> http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html)
> 
> Reading the whole file system worked also.
> 
> If you need more information to trace this back, let me know and I'll
> try to get it.
> If you have suggestions regarding the recovery, please let me know as well.


Re: Fixing recursive fault and parent transid verify failed

2015-12-06 Thread Lukas Pirl
On 12/07/2015 02:57 PM, Alistair Grant wrote as excerpted:
> Fixing recursive fault, but reboot is needed

For the record:

I saw the same message (incl. hard lockup) when doing a balance on a
single-disk btrfs.

Besides that, the fs works flawlessly (~60GB, usage: no snapshots, ~15
lxc containers, low-load databases, few mails, a couple of Web servers).

As this is a production machine, I rebooted it rather than investigating,
but the error is reproducible if that would be of great interest.

> I've ran btrfs scrub and btrfsck on the drives, with the output
> included below.  Based on what I've found on the web, I assume that a
> btrfs-zero-log is required.
> 
> * Is this the recommended path?
> * Is there a way to find out which files will be affected by the loss of
>   the transactions?

> Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

I used Debian Backports 4.2.6.

Cheers,

Lukas


Re: implications of mixed mode

2015-11-27 Thread Lukas Pirl
On 11/27/2015 04:11 PM, Duncan wrote as excerpted:
> My big hesitancy would be over that fact that very few will run or test 
> mixed-mode at TB scale filesystem level, and where they do, it's likely 
> to be in ordered to work around the current (but set to soon be 
> eliminated) metadata-only (no data) dup mode limit on single-device, 
> since in that regard mixed-mode is treated as metadata and dup mode is 
> allowed.
> 
> So you're relatively more likely to run into rarely seen scaling issues 
> and perhaps bugs that nobody else has ever run into as (relatively) 
> nobody else runs mixed-mode on multi-terabyte-scale btrfs.  If you want 
> to be the guinea pig and make it easier for others to try later on, after 
> you've flushed out the worst bugs, that's definitely one way to do it.
> =:^]

I see. This somehow aligns with Qu's answer.

> It's worth noting that rsync... seems to stress btrfs more than pretty 
> much any other common single application.  It's extremely heavy access 
> pattern just seems to trigger bugs that nothing else does, and while they 
> do tend to get fixed, it really does seem to push btrfs to the limits, 
> and there have been a /lot/ of rsync triggered btrfs bugs reported over 
> the years.

Well, IMHO btrfs /has/ to cope with rsync workloads if it wants to be an
alternative for larger storage setups, but that is another story.
I have been running btrfs (non-mixed) with rsync workloads for quite a
while now and it is doing well (except for the deadlock that existed a
while ago). Maybe my network is just slow enough not to trigger any
unfixed weird issues with the intense access patterns of rsync.
Anyways, thanks for the hint!

> Between the stresses of rsyncing half a TiB daily and the relatively 
> untested quantity that is mixed-mode btrfs at multi-terabyte scales on 
> multi-devices, there's a reasonably high chance that you /will/ be 
> working with the devs on various bugs for awhile.  If you're willing to 
> do it, great, somebody putting the filesystem thru those kinds of mixed-
> mode paces at that scale is just the sort of thing we need to get 
> coverage on that particular not yet well tested corner case, but don't 
> expect it to be particularly stable for a couple kernel cycles anyway, 
> and after that, you'll still be running a particularly rare corner-case 
> that's likely to put new code thru its paces as well, so just be aware of 
> the relatively stony path you're signing up to navigate, should you 
> choose to go that route.

Makes perfect sense. I think I sadly do not have the resources to be
that guinea pig…

> Meanwhile, assuming you're /not/ deliberately setting out to test a 
> rarely tested corner-case with stress tests known to rather too 
> frequently get the best of btrfs...
> 
> Why are you considering mixed-mode here?  At that size the ENOSPC hassles 
> of unmixed-mode btrfs on say single-digit GiB and below really should be 
> dwarfed into insignificance, particularly since btrfs since 3.17 or so 
> deletes empty chunks instead of letting them build up to the point where 
> they're a problem, so what possible reason, other than simply to test it 
> and cover that corner-case, could justify mixed-mode at that sort of 
> scale?
> 
> Unless of course, given that you didn't mention number of devices or 
> individual device size, only the 8 TB total, you have in mind a raid of 
> something like 1000 8-GB USB sticks, or the like, in which case mixed-
> mode on the individual sticks might make some sense (well, to the extent 
> that a 1000-device raid of /anything/ makes sense! =:^), given their 8-GB 
> each size.

That is not the case. I only started considering it because I wondered
why mixed mode is not generally preferred when data and metadata have
the same replication level.

Thanks Duncan!

Lukas


implications of mixed mode

2015-11-26 Thread Lukas Pirl
Dear list,

if a larger RAID file system (say disk space of 8 TB in total) is
created in mixed mode, what are the implications?

From reading the mailing list and the Wiki, I can think of the following:

+ less hassle with "false positive" ENOSPC
- data and metadata have to have the same replication level
  forever (e.g. RAID 1)
- higher fragmentation
  (does this reduce with no(dir)atime?)
  -> more work for autodefrag

Is that roughly what is to be expected? Any implications on recovery etc.?
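
(For concreteness, such a mixed-mode file system would be created roughly
like this; device paths are placeholders. mkfs.btrfs requires the data and
metadata profiles to match when --mixed is used, which is the "same
replication level forever" point above:)

  $ mkfs.btrfs --mixed -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd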

In the specific case, the file system usage is as follows:
* data spread over ~20 subvolumes
  * snapshotted with various frequencies
  * compression is used
* mostly archive storage
  * write once
  * read infrequently
* ~500GB of daily rsync'ed system backup

Thanks in advance,

Lukas


Re: 4.2.6: livelock in recovery (free_reloc_roots)?

2015-11-25 Thread Lukas Pirl
On 11/21/2015 10:01 PM, Alexander Fougner wrote as excerpted:
> This is fixed in btrfs-progs 4.3.1, that allows you to delete a
> device again by the 'missing' keyword.

Thanks, Alexander! I had found the thread reporting the bug, but not the
patch or the btrfs-tools version it was merged into.
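
(I.e., with btrfs-progs >= 4.3.1 the following should work again; the mount
point is the one from my earlier mails:)

  $ btrfs device delete missing /mnt/data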

Lukas


Re: 4.2.6: livelock in recovery (free_reloc_roots)?

2015-11-21 Thread Lukas Pirl
On 11/21/2015 08:16 PM, Duncan wrote as excerpted:
> Lukas Pirl posted on Sat, 21 Nov 2015 13:37:37 +1300 as excerpted:
> 
>> > Can "btrfs_recover_relocation" prevented from being run? I would not
>> > mind losing a few recent writes (what was a balance) but instead going
>> > rw again, so I can restart a balance.
> I'm not familiar with that thread name (I run multiple small btrfs on 
> ssds, so scrub, balance, etc, take only a few minutes at most), but if 

First, thank you, Duncan, for taking the time to write up those broad
explanations.

I am not sure whether this name also corresponds to a thread name, but it
is certainly a function that appears in all the dumped traces when trying
to 'mount -o recovery,degraded' the file system in question:

 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]

> it's the balance thread, then yes, there's a mount option that cancels a 
> running balance.  See the wiki page covering mount options.

Yes, the file system is mounted with '-o skip_balance'.
(Although the '-o recovery' might trigger relocations?!)
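
(Roughly, i.e. something along these lines; device and mount point are
placeholders:)

  $ mount -o ro,degraded,recovery,skip_balance /dev/mapper/disk1 /mnt/data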

>> > From what I have read, btrfs-zero-log would not help in this case (?) so
>> > I did not run it so far.
> Correct.  Btrfs is atomic at commit time, so doesn't need a journal in 
> the sense of older filesystems like reiserfs, jfs and ext3/4.
> …
> Otherwise, it generally does no good, and while 
> it generally does no serious harm beyond the loss of a few seconds worth 
> of fsyncs, etc, either, because the commits /are/ atomic and zeroing the 
> log simply returns the system to the state of such a commit, it's not 
> recommended as it /does/ needlessly kill the log of those last few 
> seconds of fsyncs.

So I see that it does no good but also no serious harm (generally). Since
the log relates to writes (not relocations, I assume), clearing it is
unlikely to fix the problem in btrfs_recover_relocation or
merge_reloc_roots, respectively.

Maybe a dev can help us and shine some light on the (I assume) impossible
relocation issue.

Best,

Lukas


4.2.6: livelock in recovery (free_reloc_roots)?

2015-11-20 Thread Lukas Pirl
Dear list,

I am (still) trying to recover a RAID1 that can only be mounted
recovery,degraded,ro.

I experienced an issue that might be interesting for you: I tried to
mount the file system rw,recovery and the kernel ended up burning one
core (and only one specific core, never scheduled to another one).

The watchdog printed a stack trace roughly every 20 seconds. There were
only a few stack traces that were printed alternating (see below).
After a few hours with the mount command still being blocked and without
visible IO activity, the system was power-cycled.

Summary:

Call Trace:
 [] ? free_reloc_roots+0x11/0x30 [btrfs]
 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]
 [] ? pcpu_next_unpop+0x3f/0x50
 [] ? mount_fs+0x36/0x170
 [] ? vfs_kern_mount+0x68/0x110
 [] ? btrfs_mount+0x1bb/0x990 [btrfs]
 …

Call Trace:
   [] ? rcu_dump_cpu_stacks+0x80/0xb0
 [] ? rcu_check_callbacks+0x421/0x6e0
 [] ? sched_clock+0x5/0x10
 [] ? notifier_call_chain+0x45/0x70
 [] ? timekeeping_update+0xf1/0x150
 [] ? tick_sched_do_timer+0x40/0x40
 [] ? update_process_times+0x36/0x60
 [] ? tick_sched_do_timer+0x40/0x40
 [] ? tick_sched_handle.isra.15+0x24/0x60
 [] ? tick_sched_do_timer+0x40/0x40
 [] ? tick_sched_timer+0x3b/0x70
 [] ? __hrtimer_run_queues+0xdc/0x210
 [] ? read_tsc+0x5/0x10
 [] ? read_tsc+0x5/0x10
 [] ? hrtimer_interrupt+0x9a/0x190
 [] ? smp_apic_timer_interrupt+0x39/0x50
 [] ? apic_timer_interrupt+0x6b/0x70
   [] ? _raw_spin_lock+0x10/0x20
 [] ? __del_reloc_root+0x2f/0x100 [btrfs]
 [] ? __add_reloc_root+0xe0/0xe0 [btrfs]
 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]
 [] ? pcpu_next_unpop+0x3f/0x50
 [] ? mount_fs+0x36/0x170
 [] ? vfs_kern_mount+0x68/0x110
 [] ? btrfs_mount+0x1bb/0x990 [btrfs]
 …

Call Trace:
 [] ? __del_reloc_root+0x2f/0x100 [btrfs]
 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]
 [] ? pcpu_next_unpop+0x3f/0x50
 [] ? mount_fs+0x36/0x170
 [] ? vfs_kern_mount+0x68/0x110
 [] ? btrfs_mount+0x1bb/0x990 [btrfs]
 …

A longer excerpt can be found here: http://pastebin.com/NPM0Ckfy

I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.

btrfs check --readonly gave no errors.
(except the probably false positives mentioned here
http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html)

Reading the whole file system worked also.

If you need more information to trace this back, let me know and I'll
try to get it.
If you have suggestions regarding the recovery, please let me know as well.

Best regards,

Lukas


Re: 4.2.6: livelock in recovery (free_reloc_roots)?

2015-11-20 Thread Lukas Pirl
A follow-up question:

Can "btrfs_recover_relocation" prevented from being run? I would not
mind losing a few recent writes (what was a balance) but instead going
rw again, so I can restart a balance.

From what I have read, btrfs-zero-log would not help in this case (?),
so I have not run it so far.

By the way, I can confirm the defect of 'btrfs device remove missing …'
mentioned here: http://www.spinics.net/lists/linux-btrfs/msg48383.html :

$ btrfs device delete missing /mnt/data
ERROR: missing is not a block device
$ btrfs device delete 5 /mnt/data
ERROR: 5 is not a block device

Thanks and best regards,

Lukas
On 11/20/2015 10:04 PM, Lukas Pirl wrote as excerpted:
> Dear list,
> 
> I am (still) trying to recover a RAID1 that can only be mounted
> recovery,degraded,ro.
> 
> I experienced an issue that might be interesting for you: I tried to
> mount the file system rw,recovery and the kernel ended up burning one
> core (and only one specific core, never scheduled to another one).
> 
> The watchdog printed a stack trace roughly every 20 seconds. There were
> only a few stack traces that were printed alternating (see below).
> After a few hours with the mount command still being blocked and without
> visible IO activity, the system was power-cycled.
> 
> Summary:
> 
> Call Trace:
>  [] ? free_reloc_roots+0x11/0x30 [btrfs]
>  [] ? free_reloc_roots+0x1d/0x30 [btrfs]
>  [] ? merge_reloc_roots+0x165/0x220 [btrfs]
>  [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
>  [] ? open_ctree+0x20d2/0x23b0 [btrfs]
>  [] ? btrfs_mount+0x87b/0x990 [btrfs]
>  [] ? pcpu_next_unpop+0x3f/0x50
>  [] ? mount_fs+0x36/0x170
>  [] ? vfs_kern_mount+0x68/0x110
>  [] ? btrfs_mount+0x1bb/0x990 [btrfs]
>  …
> 
> Call Trace:
>[] ? rcu_dump_cpu_stacks+0x80/0xb0
>  [] ? rcu_check_callbacks+0x421/0x6e0
>  [] ? sched_clock+0x5/0x10
>  [] ? notifier_call_chain+0x45/0x70
>  [] ? timekeeping_update+0xf1/0x150
>  [] ? tick_sched_do_timer+0x40/0x40
>  [] ? update_process_times+0x36/0x60
>  [] ? tick_sched_do_timer+0x40/0x40
>  [] ? tick_sched_handle.isra.15+0x24/0x60
>  [] ? tick_sched_do_timer+0x40/0x40
>  [] ? tick_sched_timer+0x3b/0x70
>  [] ? __hrtimer_run_queues+0xdc/0x210
>  [] ? read_tsc+0x5/0x10
>  [] ? read_tsc+0x5/0x10
>  [] ? hrtimer_interrupt+0x9a/0x190
>  [] ? smp_apic_timer_interrupt+0x39/0x50
>  [] ? apic_timer_interrupt+0x6b/0x70
>[] ? _raw_spin_lock+0x10/0x20
>  [] ? __del_reloc_root+0x2f/0x100 [btrfs]
>  [] ? __add_reloc_root+0xe0/0xe0 [btrfs]
>  [] ? free_reloc_roots+0x1d/0x30 [btrfs]
>  [] ? merge_reloc_roots+0x165/0x220 [btrfs]
>  [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
>  [] ? open_ctree+0x20d2/0x23b0 [btrfs]
>  [] ? btrfs_mount+0x87b/0x990 [btrfs]
>  [] ? pcpu_next_unpop+0x3f/0x50
>  [] ? mount_fs+0x36/0x170
>  [] ? vfs_kern_mount+0x68/0x110
>  [] ? btrfs_mount+0x1bb/0x990 [btrfs]
>  …
> 
> Call Trace:
>  [] ? __del_reloc_root+0x2f/0x100 [btrfs]
>  [] ? free_reloc_roots+0x1d/0x30 [btrfs]
>  [] ? merge_reloc_roots+0x165/0x220 [btrfs]
>  [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
>  [] ? open_ctree+0x20d2/0x23b0 [btrfs]
>  [] ? btrfs_mount+0x87b/0x990 [btrfs]
>  [] ? pcpu_next_unpop+0x3f/0x50
>  [] ? mount_fs+0x36/0x170
>  [] ? vfs_kern_mount+0x68/0x110
>  [] ? btrfs_mount+0x1bb/0x990 [btrfs]
>  …
> 
> A longer excerpt can be found here: http://pastebin.com/NPM0Ckfy
> 
> I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.
> 
> btrfs check --readonly gave no errors.
> (except the probably false positives mentioned here
> http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html)
> 
> Reading the whole file system worked also.
> 
> If you need more information to trace this back, let me know and I'll
> try to get it.
> If you have suggestions regarding the recovery, please let me know as well.
> 
> Best regards,
> 
> Lukas


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-20 Thread Lukas Pirl
On 11/21/2015 01:47 PM, Qu Wenruo wrote as excerpted:
> Hard to say, but we'd better keep an eye on this issue.
> At least, if it happens again, we should know if it's related to
> something like newer kernel or snapshots.

I can confirm the initially described behavior of "btrfs check", and
reading the data also works fine.

Versions etc.:

$ uname -a
Linux 4.2.0-0.bpo.1-amd64 #1 SMP Debian 4.2.6-1~bpo8+1 …
$ btrfs filesystem show /mnt/data
Label: none  uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
Total devices 6 FS bytes used 2.87TiB
devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/…SZ
devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/…03
devid 3 size 1.82TiB used 1.53TiB path /dev/mapper/…76
devid 4 size 1.82TiB used 1.53TiB path /dev/mapper/…78
devid 6 size 1.82TiB used 1.05TiB path /dev/mapper/…UK
*** Some devices missing

btrfs-progs v4.3

$ btrfs subvolume list /mnt/data | wc -l
62

Best,

Lukas


Re: anything wrong with `balance -dusage -musage` together?

2015-11-19 Thread Lukas Pirl
On 11/20/2015 12:59 PM, Hugo Mills wrote as excerpted:
>Nothing actively wrong with that, no. It certainly won't break
> anything. It's just rarely actually useful. The usual situation is
> that you run out of one kind of storage before the other (data vs
> metadata, that is), and you need to free up some allocation of one of
> them so it can go to the other. This is typically too much data
> allocation, and metadata has run out (so -d is more often used than
> -m).
>
>For the "usual" case of running out of metadata allocation, you
> don't actually need much space to reclaim, so -dlimit=X for small X is
> an easier approach to use.

Thanks Hugo for your quick reply.

Alright, looking at https://github.com/kdave/btrfsmaintenance just made
me consider regular balances using -*usage before one or the other kind
of space runs out (as suggested there).

Best,

Lukas


anything wrong with `balance -dusage -musage` together?

2015-11-19 Thread Lukas Pirl
Hi list,

I rarely see balance used with -dusage -musage together, esp. with
values other than zero.

The question is, is there anything wrong with running (say) `balance
-dusage=50 -musage=30` regularly?
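
That is, an invocation along these lines (mount point is a placeholder):

  $ btrfs balance start -dusage=50 -musage=30 /mnt/data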

Thanks and best regards,

Lukas


Re: corrupted RAID1: unsuccessful recovery / help needed

2015-10-29 Thread Lukas Pirl

TL;DR: thanks but recovery still preferred over recreation.

Hello Duncan and thanks for your reply!

On 10/26/2015 09:31 PM, Duncan wrote:

> FWIW... Older btrfs userspace such as your v3.17 is "OK" for normal
> runtime use, assuming you don't need any newer features, as in normal
> runtime, it's the kernel code doing the real work and userspace for the
> most part simply makes the appropriate kernel calls to do that work.
>
> But, once you get into a recovery situation like the one you're in now,
> current userspace becomes much more important, as the various things
> you'll do to attempt recovery rely far more on userspace code directly
> accessing the filesystem, and it's only the newest userspace code that
> has the latest fixes.
>
> So for a recovery situation, the newest userspace release (4.2.2 at
> present) as well as a recent kernel is recommended, and depending on the
> problem, you may at times need to run integration or apply patches on top
> of that.


I am willing to update before trying further repairs. Is e.g. "balance"
also influenced by the userspace tools, or does the kernel do the actual
work?



> General note about btrfs and btrfs raid.  Given that btrfs itself remains
> a "stabilizing, but not yet fully mature and stable filesystem", while
> btrfs raid will often let you recover from a bad device, sometimes that
> recovery is in the form of letting you mount ro, so you can access the
> data and copy it elsewhere, before blowing away the filesystem and
> starting over.


If there is one subvolume that contains all other (read only) snapshots 
and there is insufficient storage to copy them all separately:

Is there an elegant way to preserve those when moving the data across disks?


> Back to the problem at hand.  Current btrfs has a known limitation when
> operating in degraded mode.  That being, a btrfs raid may be write-
> mountable only once, degraded, after which it can only be read-only
> mounted.  This is because under certain circumstances in degraded mode,
> btrfs will fall back from its normal raid mode to single mode chunk
> allocation for new writes, and once there's single-mode chunks on the
> filesystem, btrfs mount isn't currently smart enough to check that all
> chunks are actually available on present devices, and simply jumps to the
> conclusion that there's single mode chunks on the missing device(s) as
> well, so refuses to mount writable after that in ordered to prevent
> further damage to the filesystem and preserve the ability to mount at
> least ro, to copy off what isn't damaged.
>
> There's a patch in the pipeline for this problem, that checks individual
> chunks instead of leaping to conclusions based on the presence of single-
> mode chunks on a degraded filesystem with missing devices.  If that's
> your only problem (which the backtraces might reveal but I as a non-dev
> btrfs user can't tell), the patches should let you mount writable.


Interesting, thanks for the insights.


> But that patch isn't in kernel 4.2.  You'll need at least kernel 4.3-rc,
> and possibly btrfs integration, or to cherrypick the patches onto 4.2.


Well, before digging into that, a hint that this is actually the case 
would be appreciated. :)



> Meanwhile, in keeping with the admin's rule on backups, by definition, if
> you valued the data more than the time and resources necessary for a
> backup, by definition, you have a backup available, otherwise, by
> definition, you valued the data less than the time and resources
> necessary to back it up.
>
> Therefore, no worries.  Regardless of the fate of the data, you saved
> what your actions declared of most valuable to you, either the data, or
> the hassle and resources cost of the backup you didn't do.  As such, if
> you don't have a backup (or if you do but it's outdated), the data at
> risk of loss is by definition of very limited value.
>
> That said, it appears you don't even have to worry about loss of that
> very limited value data, since mounting degraded,recovery,ro gives you
> stable access to it, and you can use the opportunity provided to copy it
> elsewhere, at least to the extent that the data we already know is of
> limited value is even worth the hassle of doing that.
>
> Which is exactly what I'd do.  Actually, I've had to resort to btrfs
> restore[1] a couple times when the filesystem wouldn't mount at all, so
> the fact that you can mount it degraded,recovery,ro, already puts you
> ahead of the game. =:^)
>
> So yeah, first thing, since you have the opportunity, unless your backups
> are sufficiently current that it's not worth the trouble, copy off the
> data while you can.
>
> Then, unless you wish to keep the filesystem around in case the devs want
> to use it to improve btrfs' recovery system, I'd just blow it away and
> start over, restoring the data from backup once you have a fresh
> filesystem to restore to.  That's the simplest and fastest way to a fully
> working system once again, and what I did here after using btrfs restore
> to recover the delta between current and my backups.


Thanks for all the elaborations. I guess there are 

corrupted RAID1: unsuccessful recovery / help needed

2015-10-26 Thread Lukas Pirl
TL;DR: RAID1 does not recover; I guess the interesting part of the stack
trace is:


  Call Trace:
  [] __del_reloc_root+0x30/0x100 [btrfs]
  [] free_reloc_roots+0x25/0x40 [btrfs]
  [] merge_reloc_roots+0x18e/0x240 [btrfs]
  [] btrfs_recover_relocation+0x374/0x420 [btrfs]
  [] open_ctree+0x1b7d/0x23e0 [btrfs]
  [] btrfs_mount+0x94e/0xa70 [btrfs]
  [] ? find_next_bit+0x15/0x20
  [] mount_fs+0x38/0x160
  …

Hello list.

I'd appreciate some help for repairing a corrupted RAID1.

Setup:
* Linux 4.2.0-12, Btrfs v3.17, `btrfs fi show`:
  uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
  Total devices 6 FS bytes used 2.87TiB
  devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/WD-WCC4J7AFLTSZ
  devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/WD-WCAU45343103
  devid 3 size   1.82TiB used   1.53TiB path /dev/mapper/WD-WCAVY6423276
  devid 4 size   1.82TiB used   1.53TiB path /dev/mapper/WD-WCAZAF872578
  devid 6 size   1.82TiB used   1.05TiB path /dev/mapper/WD-WMC4M0H3Z5UK
  *** Some devices missing
* disks are dm-crypted

What happened:
* devid 5 started to die (slowly)
* added a new disk (devid 6) and tried `btrfs device delete`
* failed with kernel crashes (my guess: due to heavy IO errors)
* removed devid 5 from /dev (deactivated in dm-crypt)
* tried `btrfs balance`
  * interrupted multiple times due to kernel crashes
(probably due to semi-corrupted file system?)
* file system did not mount anymore after a required hard-reset
* no successful recovery so far:
  if not read-only, kernel IO blocks eventually (hard-reset required)
* tried:
  * `-o degraded`
-> IO freeze, kernel log: http://pastebin.com/Rzrp7XeL
  * `-o degraded,recovery`
-> IO freeze, kernel log: http://pastebin.com/VemHfnuS
  * `-o degraded,recovery,ro`
-> file system accessible, system stable
* going rw again does not fix the problem

I have not run btrfs-zero-log so far because my oops did not look very
similar to the one in the Wiki and I did not want to risk making
recovery harder.

Thanks,

Lukas
