Re: Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Chris Murphy
On Sun, Mar 20, 2016 at 1:31 PM, Patrick Tschackert  wrote:
> My raid is done with the scrub now, this is what i get:
>
> $ cat /sys/block/md0/md/mismatch_cnt
> 311936608

I think this is an assembly problem. Read errors don't result in
mismatch counts. An md mismatch count happens when there's a mismatch
between data strip and parity strip(s). So this is a lot of
mismatches.

I think you need to take this problem to the linux-raid@ list; I don't
think anyone on this list is going to be able to help with this
portion of the problem. I'm only semi-literate with this, and you need
to find out why there are so many mismatches and confirm whether the
array is being assembled correctly.

In your writeup for the list you can include the URL for the first
post to this list. I wouldn't repeat any of the VM crashing stuff
because it's not really relevant. You'll need to include the kernel
you were using at the time of the problem, the kernel you're using for
the scrub, the version of mdadm, the metadata for each device (mdadm -E)
and for the array (mdadm -D), and smartctl -A for each device to show
bad sectors. (You could put smartctl -x for each drive into a file and
then put the file up somewhere like Dropbox or Google Drive, or
individually pastebin them if you can keep it all separate; -x is
really verbose but sometimes contains read error information.)
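
A minimal sketch of pulling all of that into one file (drive names follow
the mdstat output elsewhere in this thread; adjust to your layout):

$ { uname -a; mdadm --version; mdadm -D /dev/md0; } > raid-report.txt 2>&1
$ for d in /dev/sd[a-k]; do echo "== $d" >> raid-report.txt; mdadm -E "$d" >> raid-report.txt 2>&1; smartctl -x "$d" >> raid-report.txt; done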


The summary line is basically: this was working; after a VM crash
followed by shutdown -r now, the Btrfs filesystem won't mount. A drive
was faulty and was rebuilt onto a spare. You just did a check scrub and
have all these errors in mismatch_cnt. The question is: how do you
confirm the array is properly assembled? Because that's too many
errors, and the file system on that array will not mount. Further
complicating matters, even after the rebuild you have another drive
with read errors. Those weren't being fixed this whole time (during
the rebuild, for example), likely because of the timeout vs SCT ERC
misconfiguration; otherwise they would have been fixed.
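
For context, that misconfiguration shows up by comparing two numbers per
drive (a sketch; scterc is reported in tenths of a second, the kernel
timeout in whole seconds, and the drive's own recovery limit needs to
finish well inside the kernel timeout):

$ smartctl -l scterc /dev/sdh        # drive's error recovery limit, if supported
$ cat /sys/block/sdh/device/timeout  # kernel SCSI command timeout, 30 by default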


>
> I also attached my dmesg output to this mail. Here's an excerpt:
> [12235.372901] sd 7:0:0:0: [sdh] tag#15 FAILED Result: hostbyte=DID_OK 
> driverbyte=DRIVER_SENSE
> [12235.372906] sd 7:0:0:0: [sdh] tag#15 Sense Key : Medium Error [current] 
> [descriptor]
> [12235.372909] sd 7:0:0:0: [sdh] tag#15 Add. Sense: Unrecovered read error - 
> auto reallocate failed
> [12235.372913] sd 7:0:0:0: [sdh] tag#15 CDB: Read(16) 88 00 00 00 00 00 af b2 
> bb 48 00 00 05 40 00 00
> [12235.372916] blk_update_request: I/O error, dev sdh, sector 2947727304
> [12235.372941] ata8: EH complete
> [12266.856747] ata8.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 
> 0x0
> [12266.856753] ata8.00: irq_stat 0x4008
> [12266.856756] ata8.00: failed command: READ FPDMA QUEUED
> [12266.856762] ata8.00: cmd 60/40:d8:08:17:b5/05:00:af:00:00/40 tag 27 ncq 
> 688128 in
>  res 41/40:00:18:1b:b5/00:00:af:00:00/40 Emask 0x409 (media error) 
> [12266.856765] ata8.00: status: { DRDY ERR }
> [12266.856767] ata8.00: error: { UNC }
> [12266.858112] ata8.00: configured for UDMA/133

What do you get for
smartctl -x /dev/sdh


I see this too:
[11440.088441] ata8.00: status: { DRDY }
[11440.088443] ata8.00: failed command: READ FPDMA QUEUED
[11440.088447] ata8.00: cmd 60/40:c8:e8:bc:15/05:00:ab:00:00/40 tag 25
ncq 688128 in
 res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)

That's weird. You have several other identical model drives, so I
doubt this is some sort of NCQ incompatibility with this drive model;
no other drive is complaining like this. So I wonder if there's just
something wrong with this drive aside from the bad sectors. I can't
really tell, but it's suspicious.
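
If you wanted to rule NCQ in or out for just this one drive, a
non-destructive test (an untested sketch, not something done in this
thread) is to temporarily drop its queue depth to 1, which effectively
disables NCQ for that device, and see whether the FPDMA errors change
character:

$ cat /sys/block/sdh/device/queue_depth      # typically 31 or 32 with NCQ active
$ echo 1 > /sys/block/sdh/device/queue_depth # temporary; reverts on reboot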



> If I understand correctly, my /dev/sdh drive is having trouble.
> Could this be the problem? Should I set the drive to failed and rebuild on a 
> spare disk?

You need to really slow down and understand the problem first. Every
data loss case I've ever come across with md/mdadm raid6 was user
induced because they changed too much stuff too fast without
consulting people who know better. They got impatient. So I suggest
going to the linux-raid@ list and asking there what's going on. The
less you change the better because most of the changes md/mdadm does
are irreversible.


-- 
Chris Murphy


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Chris Murphy
On Sun, Mar 20, 2016 at 6:19 AM, Martin Steigerwald  wrote:
> On Sunday, 20 March 2016 10:18:26 CET Patrick Tschackert wrote:
>> > I think in retrospect the safe way to do these kinds of Virtual Box
>> > updates, which require kernel module updates, would have been to
>> > shutdown the VM and stop the array. *shrug*
>>
>>
>> After this, I think I'll just do away with the virtual machine on this host,
>> as the app contained in that vm can also run on the host. I tried to be
>> fancy, and it seems to needlessly complicate things.
>
> I am not completely sure and I have no exact reference anymore, but I think I
> read more than once about fs benchmarks running faster in VirtualBox than on
> the physical system, which may point to an (at least partially) incomplete fsync()
> implementation for writing into VirtualBox image files.
>
> I never found any proof of this, nor did I specifically seek to research it.
> So it may be true or not.

Sure, but that would only affect the guest's file system, the one
inside the VDI. It's the host-managed filesystem that's busted.

-- 
Chris Murphy


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Chris Murphy
On Sun, Mar 20, 2016 at 3:18 AM, Patrick Tschackert  wrote:
> Thanks for answering again!
> So, first of all I installed a newer kernel from the backports as per 
> Nicholas D Steeves' suggestion:
>
> $ apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64
>
> After rebooting:
> $ uname -a
> Linux vmhost 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) 
> x86_64 GNU/Linux
>
> But the problem with mounting the filesystem persists :(
>
>> OK I went back and read this again: host is managing the md raid5, the
>> guest is writing Btrfs to an "encrypted container" but what is that? A
>> LUKS encrypted LVM LV that's directly used by Virtual Box as a raw
>> device? It's hard to say what layer broke this. But the VM crashing is
>> in effect like a power failure, and it's an open question (for me) how
>> this setup deals with barriers. A shutdown -r now should still cleanly
>> stop the array so I wouldn't expect there to be an array problem but
>> then you also report a device failure. Bad luck.
>
> The host is managing an md raid 6 (/dev/md0), and I had an encrypted volume 
> (via cryptsetup) on top of that device.
> The host mounted the btrfs filesystem contained in that volume, and the VM 
> wrote to the filesystem as well using a virtualbox shared folder.

OK, well, to me the VM doesn't seem related offhand. Ultimately it's
only the host writing to the filesystem, even for the shared folder.
The guest VM has no direct access to do Btrfs writes; to the guest it's
something like a network shared folder.


> After this, I think I'll just do away with the virtual machine on this host, 
> as the app contained in that vm can also run on the host.
> I tried to be fancy, and it seems to needlessly complicate things.

virt-manager or gnome-boxes work better, although you lose the shared
folder; you'll have to come up with a workaround, like using NFS.
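
A rough sketch of that kind of NFS workaround (the export path matches
this setup, the guest network is made up for illustration):

# on the host, /etc/exports
/media/storage  192.168.122.0/24(rw,sync,no_subtree_check)
$ exportfs -ra

# in the guest
$ mount -t nfs 192.168.122.1:/media/storage /mnt/storage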


> $ for i in /sys/class/scsi_generic/*/device/timeout; do echo 120 > "$i"; done
> (I know this isn't persistent across reboots...)

Correct.
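
One way to make it persistent (a sketch assuming a systemd-based install;
the unit name is made up) is a small oneshot service that re-applies the
timeout at boot:

# /etc/systemd/system/scsi-timeout.service
[Unit]
Description=Raise SCSI command timeout for drives without configurable SCT ERC

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 120 | tee /sys/class/scsi_generic/*/device/timeout > /dev/null'

[Install]
WantedBy=multi-user.target

$ systemctl daemon-reload && systemctl enable scsi-timeout.service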


-- 
Chris Murphy


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Martin Steigerwald
On Sunday, 20 March 2016 10:18:26 CET Patrick Tschackert wrote:
> > I think in retrospect the safe way to do these kinds of Virtual Box
> > updates, which require kernel module updates, would have been to
> > shutdown the VM and stop the array. *shrug*
> 
>  
> After this, I think I'll just do away with the virtual machine on this host,
> as the app contained in that vm can also run on the host. I tried to be
> fancy, and it seems to needlessly complicate things.

I am not completely sure and I have no exact reference anymore, but I think I 
read more than once about fs benchmarks running faster in VirtualBox than on 
the physical system, which may point to an (at least partially) incomplete fsync() 
implementation for writing into VirtualBox image files.

I never found any proof of this, nor did I specifically seek to research it. 
So it may be true or not.

Thanks,
-- 
Martin


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Patrick Tschackert
Thanks for answering, I already upgraded to a backports kernel as mentioned 
here:
https://mail-archive.com/linux-btrfs@vger.kernel.org/msg51748.html

I now have

$ uname -a
Linux vmhost 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) 
x86_64 GNU/Linux

As I wrote here 
https://mail-archive.com/linux-btrfs@vger.kernel.org/msg51748.html
the problem still persists :(
 
Cheers,
Patrick

Sent: Sunday, 20 March 2016 at 13:11
From: "Martin Steigerwald" <mar...@lichtvoll.de>
To: "Chris Murphy" <li...@colorremedies.com>
Cc: "Patrick Tschackert" <killing-t...@gmx.de>, "Btrfs BTRFS" 
<linux-btrfs@vger.kernel.org>
Subject: Re: unable to mount btrfs partition, please help :(
On Saturday, 19 March 2016 19:34:55 CET Chris Murphy wrote:
> >>> $ uname -a
> >>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4
> >>> (2016-02-29) x86_64 GNU/Linux
> >>
> >>This is old. You should upgrade to something newer, ideally 4.5 but
> >>4.4.6 is good also, and then oldest I'd suggest is 4.1.20.
> >>
> > Shouldn't I be able to get the newest kernel by executing "apt-get update
> > && apt-get dist-upgrade"? That's what I ran just now, and it doesn't
> > install a newer kernel. Do I really have to manually upgrade to a newer
> > one?
> I'm not sure. You might do a list search for debian, as I know debian
> users are using newer kernels that they didn't build themselves.

Try a backports[1] kernel. Add backports and do

apt-cache search linux-image

I use a 4.3 backports kernel successfully on two server VMs which use BTRFS.

[1] http://backports.debian.org/

Thx,
--
Martin


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Martin Steigerwald
On Saturday, 19 March 2016 19:34:55 CET Chris Murphy wrote:
> >>> $ uname -a
> >>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4
> >>> (2016-02-29) x86_64 GNU/Linux
> >>
> >>This is old. You should upgrade to something newer, ideally 4.5 but
> >>4.4.6 is good also, and then oldest I'd suggest is 4.1.20.
> >>
> > Shouldn't I be able to get the newest kernel by executing "apt-get update
> > && apt-get dist-upgrade"? That's what I ran just now, and it doesn't
> > install a newer kernel. Do I really have to manually upgrade to a newer
> > one?
> I'm not sure. You might do a list search for debian, as I know debian
> users are using newer kernels that they didn't build themselves.

Try a backports[1] kernel. Add backports and do

apt-cache search linux-image

I use a 4.3 backports kernel successfully on two server VMs which use BTRFS.

[1] http://backports.debian.org/
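
For completeness, a minimal sketch of enabling backports on jessie (the
mirror URL is one example; any Debian mirror carries jessie-backports):

$ echo 'deb http://httpredir.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/backports.list
$ apt-get update
$ apt-cache search linux-image
$ apt-get install -t jessie-backports linux-image-amd64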

Thx,
-- 
Martin


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Patrick Tschackert
Thanks for answering again!
So, first of all I installed a newer kernel from the backports as per Nicholas 
D Steeves' suggestion:

$ apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64

After rebooting:
$ uname -a
Linux vmhost 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) 
x86_64 GNU/Linux

But the problem with mounting the filesystem persists :(

> OK I went back and read this again: host is managing the md raid5, the
> guest is writing Btrfs to an "encrypted container" but what is that? A
> LUKS encrypted LVM LV that's directly used by Virtual Box as a raw
> device? It's hard to say what layer broke this. But the VM crashing is
> in effect like a power failure, and it's an open question (for me) how
> this setup deals with barriers. A shutdown -r now should still cleanly
> stop the array so I wouldn't expect there to be an array problem but
> then you also report a device failure. Bad luck.
 
The host is managing an md raid 6 (/dev/md0), and I had an encrypted volume 
(via cryptsetup) on top of that device.
The host mounted the btrfs filesystem contained in that volume, and the VM 
wrote to the filesystem as well using a virtualbox shared folder.
The VM then crashed, and I rebooted the host with "shutdown -r now".
After the reboot, one disk of the array was no longer present, but I managed to 
rebuild/restore using a spare disk. The RAID now seems to be healthy.

> I think in retrospect the safe way to do these kinds of Virtual Box
> updates, which require kernel module updates, would have been to
> shutdown the VM and stop the array. *shrug*
 
After this, I think I'll just do away with the virtual machine on this host, as 
the app contained in that vm can also run on the host.
I tried to be fancy, and it seems to needlessly complicate things.
 
> These drives are technically not suitable for use in any kind of raid
> except linear and raid 0 (which have no redundancy so they aren't
> really raid). You'd have to dig up drive specs, assuming they're
> published, to see what the recovery times are for the drive models
> when a bad sector is encountered. But it's typical for such drives to
> exceed 30 seconds for recovery, with some drives reported to have 2+
> minute recoveries. To properly configure them, you'll have to increase
> the kernel's SCSI command timer to at least 120 to make sure there's
> sufficient time to wait for the drive to explicitly spit back a read
> error to the kernel. Otherwise, the kernel gives up after 30 seconds,
> and resets the link to the drive, and any possibility of fixing up the
> bad sector via the raid read error fixup mechanism is thwarted. It's
> really common, the linux-raid@ list has many of these kinds of threads
> with this misconfiguration as the source problem.
 
> For the first listing of drives yes. And 120 second delays might be
> too long for your use case, but that's the reality.

> You should change the command timer for the drives that do not support
> configurable SCT ERC. And then do a scrub check. And then check both
> cat /sys/block/mdX/md/mismatch_cnt, which ideally should be 0, and
> also check kernel messages for libata read errors.

So I did this:
 
$ cat /sys/block/md0/md/mismatch_cnt
0

$ for i in /sys/class/scsi_generic/*/device/timeout; do echo 120 > "$i"; done
(I know this isn't persistent across reboots...)

$ echo check > /sys/block/md0/md/sync_action

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 sda[0] sdf[12](S) sdg[11](S) sdj[9] sdh[7] sdi[6] sdk[10] sde[4] sdd[3] sdc[2] sdb[1]
      20510948416 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      [>....................]  check =  1.0% (30812476/2930135488) finish=340.6min speed=141864K/sec

unused devices: <none>

So the raid is currently doing a scrub, which will take a few hours.

> Hmm not good. See this similar thread.
> http://www.spinics.net/lists/linux-btrfs/msg51711.html

> backups in all superblocks have the same chunk_root, no alternative
> chunk root to try.

> So at the moment I think it's worth trying a newer kernel version and
> mounting normally; then mounting with -o recovery; then -o recovery,ro.

> If that doesn't work, you're best off waiting for a developer to give
> advice on the next step;  'btrfs rescue chunk-recover' seems most
> appropriate but again someone else a while back had success with
> zero-log, but it's hard to say if the two cases are really similar and
> maybe that person just got lucky. Both of those change the file system
> in irreversible ways, that's why I suggest waiting or asking on IRC.
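
Spelled out with the device names from this setup, that mount sequence
would look something like this (a sketch; check dmesg after each attempt
before moving on to the next):

$ mount /dev/mapper/storage /media/storage
$ mount -o recovery /dev/mapper/storage /media/storage
$ mount -o recovery,ro /dev/mapper/storage /media/storage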

Thanks again for taking the time to answer. I'll wait while my RAID is doing 
the scrub, maybe a dev will answer (like you said).
The friendly people on IRC couldn't help and sent me here.

Re: unable to mount btrfs partition, please help :(

2016-03-19 Thread Duncan
Patrick Tschackert posted on Sat, 19 Mar 2016 23:15:33 +0100 as excerpted:

> I'm growing increasingly desperate, can anyone help me?

No need to be desperate.  As the sysadmin's rule of backups states, 
simple form, you either have at least one level of backup, or you are by 
your (in)action defining the data not backed up as worth less than the 
time, hassle and resources necessary to do that backup.

Therefore, there are only two possibilities:

1) You have a backup.  No sweat.  You can use it if you need to, so no 
desperation needed.

2) You don't have a backup.  No sweat.  By not having a backup, your 
actions defined the data at risk as worth less than the time, hassle and 
resources necessary for that backup, so if you lose the data, you can 
still be happy, because you saved what you defined as of most importance, 
the time, resources and hassle of doing that backup.

Since you saved what you yourself defined by your own actions as of most 
value to you, either way, you have what was most valuable to you and can 
thus be happy to have the valuable stuff, even if you lost what was 
therefore much more trivial.

There are no other possibilities.  Your words might lie.  Your actions 
don't.  Either way, you saved the valuable stuff and thus have no reason 
to be desperate.


And of course, btrfs, while stabilizing, is not yet fully stable and 
mature, and while stable enough to be potentially suitable for those who 
have tested backups or are only using it with trivial data they can 
afford to lose anyway, if they don't have backups, it's certainly not to 
the level of stability of the more mature filesystems the above sysadmin's 
rule of backups was designed for.  So that rule applies even MORE 
strongly to btrfs than it does to more mature and stable filesystems.  
(FWIW, there's a more complex version of the rule that takes relative 
risk into account and covers multiple levels of backup where either the 
risk is high enough or the data valuable enough to warrant it, but the 
simple form just says if you don't have at least one backup, you are by 
that lack of backup defining the data at risk as not worth the time and 
trouble to do it.)

And there's no way that not knowing the btrfs status changes that either, 
because if you didn't know the status, it can only be because you didn't 
care enough about the reliability of the filesystem you were entrusting 
your data to, to care about researching it.  After all, both the btrfs 
wiki and the kernel btrfs option stress the need for backups if you're 
choosing btrfs, as does this list, repeatedly.  So the only way someone 
couldn't know is if they didn't care enough to /bother/ to know, which 
again defines the data stored on the filesystem as of only trivial value, 
worth so little it's not worth researching a new filesystem you plan on 
storing it on.

So there's no reason to be desperate.  It'll only stress you out and 
increase your blood pressure.  Either you considered the data valuable 
enough to have a backup, or you didn't.  There is no third option.  And 
either way, it's not worth stressing out over, because you either have 
that backup and thus don't need to stress, or you yourself defined the 
data as trivial by not having it.

> $ uname -a
> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux
> 
> $ btrfs --version
> btrfs-progs v4.4

As CMurphy says, that's an old kernel, not really supported by the 
list.   With btrfs still stabilizing, the code is still changing pretty 
fast, and old kernels are known buggy kernels.  The list focuses on the 
mainline kernel and its two primary tracks, LTS kernel series and current 
kernel series.  On the current kernel track, the last two kernels are 
best supported.  With 4.5 just out, that's 4.5 and 4.4.

On the LTS track, the two latest LTS kernel series are recommended, with 
4.4 being the latest LTS kernel, and 4.1 being the one previous to that.  
However, 3.18 was the one previous to that and has been reasonably 
stable, so while the two latest LTS series remain recommended, we're 
still trying to support 3.18 too, for those who need that far back.

But 3.16 is previous to that and is really too far back to be practically 
supported well by the list, as btrfs really is still stabilizing and our 
focus is forward, not backward.  That doesn't mean we won't try to 
support it, it simply means that when there's a problem, the first 
recommendation, as you've seen, is likely to be try a newer kernel.

Of course various distros do offer support for btrfs on older kernels and 
we recognize that.  However, our focus is on mainline, and we don't track 
what patches the various distros have backported and what patches they 
haven't, so we're not in a particularly good position to provide support 
for them, at least back further than the mainline kernels we support.  If 
you wish to use btrfs on such old kernels, then, our recommendation is to 
get that support from the distro in question, since they're the ones who 
know what they've backported.

Re: unable to mount btrfs partition, please help :(

2016-03-19 Thread Nicholas D Steeves
On 19 March 2016 at 21:34, Chris Murphy  wrote:
> On Sat, Mar 19, 2016 at 5:35 PM, Patrick Tschackert  
> wrote:
>>>> $ uname -a
>>>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4
>>>> (2016-02-29) x86_64 GNU/Linux
>>>This is old. You should upgrade to something newer, ideally 4.5 but
>>>4.4.6 is good also, and then oldest I'd suggest is 4.1.20.
>>
>> Shouldn't I be able to get the newest kernel by executing "apt-get update && 
>> apt-get dist-upgrade"?
>> That's what I ran just now, and it doesn't install a newer kernel. Do I 
>> really have to manually upgrade to a newer one?
>
> I'm not sure. You might do a list search for debian, as I know debian
> users are using newer kernels that they didn't build themselves.
>
>
>> On top of the sticky situation I'm already in, I'm not sure if I trust 
>> myself manually building a new kernel. Should I?

If you enable Debian backports, which I assume you have since you're
running the version of btrfs-progs that was backported without a
warning not to use it with old kernels...well, if backports are
enabled then you can try:

apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64

linux-4.3.x was a complete mess on my laptop (Thinkpad X220,
quite well supported), and I'm not sure if it was driver-related or
btrfs-related.  I actually started tracking linux-4.4 at rc1, it was
so bad.

If you don't want to try building your own kernel, I'd file a bug
report against linux-image-amd64 asking for a backport of linux-4.4,
which is in Stretch/testing; I'm surprised it hasn't been backported
yet...  The only issue I remember is an error message when booting, I
think because the microcode interface changed between 4.3.x and 4.4.x.
Installing microcode-related packages from backports is how I think I
worked around this.

Alternatively, if you want to build your own kernel you might be able
to install linux-image from backports, download and untar linux-4.1.x
somewhere, and then copy the config from /boot/config-4.3* to
somedir/linux-4.1.x/.config.
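
A rough sketch of that build-it-yourself route (version numbers and the
config filename are illustrative, and it assumes the usual kernel build
dependencies are already installed):

$ wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.1.20.tar.xz
$ tar xf linux-4.1.20.tar.xz && cd linux-4.1.20
$ cp /boot/config-4.3.0-0.bpo.1-amd64 .config
$ make olddefconfig              # take defaults for options that differ from the 4.3 config
$ make -j"$(nproc)" deb-pkg      # produces installable .deb packages one directory up
$ dpkg -i ../linux-image-4.1.20*.deb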

I uploaded two scripts to github that I've been using for ages to
track the upstream LTS kernel branch that Debian didn't choose.  You
can find them here:

https://github.com/sten0/lts-convenience

All those syncs and btrfs sub sync lines are there because I always
seem to run into strange issues with adding and removing snapshots.

Cheers,
Nicholas


Re: unable to mount btrfs partition, please help :(

2016-03-19 Thread Chris Murphy
On Sat, Mar 19, 2016 at 5:35 PM, Patrick Tschackert  wrote:
> Hi Chris,
>
> thank you for answering so quickly!
>
>> Try 'btrfs check' without any options first.
> $ btrfs check /dev/mapper/storage
> checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
> checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
> bytenr mismatch, want=36340960788480, have=4530277753793296986
> Couldn't read chunk tree
> Couldn't open file system
>
>> To me it seems the problem is instigated by lower layers either not
>> completing critical writes at the time of the power failure, or not
>> rebuilding correctly.
>
> There wasn't a power failure, a VM crashed whilst writing to the btrfs 
> filesys.

OK I went back and read this again: host is managing the md raid5, the
guest is writing Btrfs to an "encrypted container" but what is that? A
LUKS encrypted LVM LV that's directly used by Virtual Box as a raw
device? It's hard to say what layer broke this. But the VM crashing is
in effect like a power failure, and it's an open question (for me) how
this setup deals with barriers. A shutdown -r now should still cleanly
stop the array so I wouldn't expect there to be an array problem but
then you also report a device failure. Bad luck.

I think in retrospect the safe way to do these kinds of Virtual Box
updates, which require kernel module updates, would have been to
shutdown the VM and stop the array. *shrug*


>
>> You should check the SCT ERC setting on each drive with 'smartctl -l
>> scterc /dev/sdX' and also the kernel command timer setting with 'cat
>> /sys/block/sdX/device/timeout' also for each device. The SCT ERC value
>> must be less than the command timer. It's a common misconfiguration
>> with raid setups.
>
> $ smartctl -l scterc /dev/sda (sdb, sdc, sde, sdg)
> gives me
>
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control command not supported

These drives are technically not suitable for use in any kind of raid
except linear and raid 0 (which have no redundancy so they aren't
really raid). You'd have to dig up drive specs, assuming they're
published, to see what the recovery times are for the drive models
when a bad sector is encountered. But it's typical for such drives to
exceed 30 seconds for recovery, with some drives reported to have 2+
minute recoveries. To properly configure them, you'll have to increase
the kernel's SCSI command timer to at least 120 to make sure there's
sufficient time to wait for the drive to explicitly spit back a read
error to the kernel. Otherwise, the kernel gives up after 30 seconds,
and resets the link to the drive, and any possibility of fixing up the
bad sector via the raid read error fixup mechanism is thwarted. It's
really common, the linux-raid@ list has many of these kinds of threads
with this misconfiguration as the source problem.
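
The usual way that gets configured (a sketch, not something already done
in this thread): where SCT ERC is settable, cap the drive's recovery time
well below the kernel timeout; where it isn't, raise the kernel timeout
instead.

# drives that accept SCT ERC: limit recovery to 7.0 seconds (values are in tenths of a second)
$ smartctl -l scterc,70,70 /dev/sdf
# drives that don't: give the kernel a much longer command timeout instead
$ echo 120 > /sys/block/sda/device/timeout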




>
> while
> $ smartctl -l scterc /dev/sdf (sdh, sdi, sdj, sdk)
> gives me
>
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>Read: 70 (7.0 seconds)
>   Write: 70 (7.0 seconds)

These drives are suitable for raid out of the box.


>
> $ cat /sys/block/sdX/device/timeout
> gives me "30" for every device
>
> Does that mean my settings for the device timeouts are wrong?

For the first listing of drives yes. And 120 second delays might be
too long for your use case, but that's the reality.

You should change the command timer for the drives that do not support
configurable SCT ERC. And then do a scrub check. And then check both
cat /sys/block/mdX/md/mismatch_cnt, which ideally should be 0, and
also check kernel messages for libata read errors.
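
Put together, that check looks roughly like this (a sketch, using the md0
name from this thread):

$ echo check > /sys/block/md0/md/sync_action   # count mismatches without rewriting parity (unlike 'repair')
$ cat /proc/mdstat                             # watch progress until the check finishes
$ cat /sys/block/md0/md/mismatch_cnt           # ideally 0 after the check completes
$ dmesg | grep -iE 'ata[0-9]+|I/O error'       # look for libata resets and read errors raised during the scrub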


>
>> After that's fixed you should do a scrub, and I'm thinking it's best
>> to do only a check, which means 'echo check >
>> /sys/block/mdX/md/sync_action' rather than issuing repair which
>> assumes data strips are correct and parity strips are wrong and
>> rebuilds all parity strips.
>
> I don't quite understand, I thought a scrub could only be done on a mounted 
> filesys?

You have two scrubs. There's a Btrfs scrub. And an md scrub. I'm
referring to the latter.


> Do you really mean executing the command "echo check > 
> /sys/block/md0/md/sync_action"? At the moment it says "idle" in that file.
> Also, the btrfs filesys sits in an encrypted container, so the setup looks 
> like this:
>
> /dev/md0 (this is the Raid device)
> /dev/mapper/storage (after cryptsetup luksOpen, this is where filesys should 
> be mounted from)
> /media/storage (i always mounted the filesystem into this folder by executing 
> "mount /dev/mapper/storage /media/storage")
>
> Apologies if I didn't make that clear enough in my initial email

Ok so the host is writing Btrfs to 

Re: unable to mount btrfs partition, please help :(

2016-03-19 Thread Patrick Tschackert
Hi Chris,

thank you for answering so quickly!

> Try 'btrfs check' without any options first.
$ btrfs check /dev/mapper/storage
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
bytenr mismatch, want=36340960788480, have=4530277753793296986
Couldn't read chunk tree
Couldn't open file system

> To me it seems the problem is instigated by lower layers either not
> completing critical writes at the time of the power failure, or not
> rebuilding correctly.

There wasn't a power failure, a VM crashed whilst writing to the btrfs filesys. 
I then rebooted the whole system via "shutdown -r now", after which the 
filesystem wasn't mountable.
The rebuild/restore of the raid seemed to go just fine though.

> You should check the SCT ERC setting on each drive with 'smartctl -l
> scterc /dev/sdX' and also the kernel command timer setting with 'cat
> /sys/block/sdX/device/timeout' also for each device. The SCT ERC value
> must be less than the command timer. It's a common misconfiguration
> with raid setups.

$ smartctl -l scterc /dev/sda (sdb, sdc, sde, sdg)
gives me

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported

while
$ smartctl -l scterc /dev/sdf (sdh, sdi, sdj, sdk)
gives me

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ cat /sys/block/sdX/device/timeout
gives me "30" for every device

Does that mean my settings for the device timeouts are wrong?

> After that's fixed you should do a scrub, and I'm thinking it's best
> to do only a check, which means 'echo check >
> /sys/block/mdX/md/sync_action' rather than issuing repair which
> assumes data strips are correct and parity strips are wrong and
> rebuilds all parity strips.

I don't quite understand, I thought a scrub could only be done on a mounted 
filesys?
Do you really mean executing the command "echo check > 
/sys/block/md0/md/sync_action"? At the moment it says "idle" in that file.
Also, the btrfs filesys sits in an encrypted container, so the setup looks like 
this:

/dev/md0 (this is the Raid device)
/dev/mapper/storage (after cryptsetup luksOpen, this is where filesys should be 
mounted from)
/media/storage (i always mounted the filesystem into this folder by executing 
"mount /dev/mapper/storage /media/storage")

Apologies if I didn't make that clear enough in my initial email


>> $ uname -a
>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4
>> (2016-02-29) x86_64 GNU/Linux
>This is old. You should upgrade to something newer, ideally 4.5 but
>4.4.6 is good also, and then oldest I'd suggest is 4.1.20.

Shouldn't I be able to get the newest kernel by executing "apt-get update && 
apt-get dist-upgrade"?
That's what I ran just now, and it doesn't install a newer kernel. Do I really 
have to manually upgrade to a newer one?
On top of the sticky situation I'm already in, I'm not sure if I trust myself 
manually building a new kernel. Should I?

> What do you get for
> btrfs-find-root /dev/mdX
> btrfs-show-super -fa /dev/mdX

$ btrfs-find-root /dev/mapper/storage
Couldn't read chunk tree
Open ctree failed

$ btrfs-show-super -fa /dev/mapper/storage
superblock: bytenr=65536, device=/dev/mapper/storage
-
csum                    0xf3887f83 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    9868d803-78d1-40c3-b1ee-a4ce3363df87
label
generation              1322969
root                    24022309593088
sys_array_size          97
chunk_root_generation   1275381
root_level              2
chunk_root              36340959809536
chunk_root_level        2
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             21003208163328
bytes_used              17670843191296
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x1
                        ( MIXED_BACKREF )
csum_type               0
csum_size               4
cache_generation        1322969
uuid_tree_generation    1322969
dev_item.uuid           c1123f55-46ce-4931-8722-7387fee07608
dev_item.fsid           9868d803-78d1-40c3-b1ee-a4ce3363df87 [match]
dev_item.type           0
dev_item.total_bytes    21003208163328
dev_item.bytes_used     17886424858624
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096

Re: unable to mount btrfs partition, please help :(

2016-03-19 Thread Chris Murphy
On Sat, Mar 19, 2016 at 4:15 PM, Patrick Tschackert  wrote:

> I'm growing increasingly desperate, can anyone help me? I'm thinking
> of trying one or more of the following, but would like an informed
> opinion:
> 1) btrfs check --fix-crc
> 2) btrfs-check --init-csum-tree
> 3) btrfs rescue chunk-recover
> 4) btrfs-check --repair
> 5) btrfs rescue zero-log

None of the above. Try 'btrfs check' without any options first.

To me it seems the problem is instigated by lower layers either not
completing critical writes at the time of the power failure, or not
rebuilding correctly.

You should check the SCT ERC setting on each drive with 'smartctl -l
scterc /dev/sdX' and also the kernel command timer setting with 'cat
/sys/block/sdX/device/timeout' also for each device. The SCT ERC value
must be less than the command timer. It's a common misconfiguration
with raid setups.
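
A quick way to survey every member drive at once (a sketch; remember
scterc is reported in tenths of a second, the timeout in whole seconds):

$ for d in /dev/sd?; do echo "== $d"; smartctl -l scterc "$d"; cat /sys/block/"${d##*/}"/device/timeout; done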

After that's fixed you should do a scrub, and I'm thinking it's best
to do only a check, which means 'echo check >
/sys/block/mdX/md/sync_action' rather than issuing repair which
assumes data strips are correct and parity strips are wrong and
rebuilds all parity strips.


>
> $ uname -a
> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4
> (2016-02-29) x86_64 GNU/Linux

This is old. You should upgrade to something newer, ideally 4.5 but
4.4.6 is good also, and then oldest I'd suggest is 4.1.20.

>
> $ btrfs --version
> btrfs-progs v4.4

Good.

> $ btrfs fi show
> Label: none uuid: 9868d803-78d1-40c3-b1ee-a4ce3363df87
> Total devices 1 FS bytes used 16.07TiB
> devid 1 size 19.10TiB used 16.27TiB path /dev/mapper/storage
>
> excerpt from DMESG:
> [ 151.970916] BTRFS: device fsid 9868d803-78d1-40c3-b1ee-a4ce3363df87
> devid 1 transid 1322969 /dev/dm-0
> [ 163.105784] BTRFS info (device dm-0): disk space caching is enabled
> [ 165.304968] BTRFS: bad tree block start 4530277753793296986 36340960788480
> [ 165.305233] BTRFS: bad tree block start 4530277753793296986 36340960788480
> [ 165.305281] BTRFS: failed to read chunk tree on dm-0
> [ 165.331407] BTRFS: open_ctree failed

Yeah this isn't a good message typically. There's one surprising (to
me) case where someone had luck getting this fixed with btrfs-zero-log
which is unexpected. But I think it's very premature to make changes
to the file system until you have more information.

What do you get for
btrfs-find-root /dev/mdX
btrfs-show-super -fa /dev/mdX


-- 
Chris Murphy