Re: Failover for unattached USB device

2018-10-25 Thread Chris Murphy
On Thu, Oct 25, 2018 at 3:47 AM, Dmitry Katsubo  wrote:
>
>
> BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0,
> corrupt 0, gen 0
> BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0,
> corrupt 0, gen 0
>
> Attempts lasted for 29 minutes.

Yep, and it floods the log. It's extra fun if the journal is on the
device with errors. The more errors, the more writes and reads to the
problem drive, the more errors, the more writes, the more errors...
snowball.

But that's the state of error handling on Btrfs, which is still more
sophisticated than other file systems. It's not more sophisticated
than the kernel's md driver, though, which does have some sort of read
error rate limit after which it'll kick the drive out of the array
(faulty state) and stop complaining about it. And I think it considers
a drive faulty on a single write failure.

>
> Thanks for this information. I have a situation similar to yours, with
> the only important difference that my drives are put into a USB dock with
> independent power and cooling, like this one:
>
> https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246
>
> so I don't think I need to worry about amps. This dock is connected
> directly to a USB port on the motherboard.

It is entirely plausible that this still needs a hub, but it really
depends on the exact errors you're getting. And those need to go to
the linux-usb list; I don't know enough about it.

And it might require a bit of luck to get a reply because it's a very
busy list. My main recommendation is to be very concise: they will
want to know the hardware setup (topology), lsusb -v, lspci, and a
complete dmesg. It may seem reasonable to snip just the usb error
messages, but that almost always drives developers crazy, because
important hints can show up in kernel messages during boot, so they
will inevitably want the whole dmesg. The ideal scenario is to do a
clean boot, reproduce the problem, and then capture the dmesg; that
way it's a concise dmesg that isn't two weeks old with a bunch of
device connects and disconnects or whatever. There almost certainly
are usb kernel parameters for debugging; ideally you search the
linux-usb list archives to find out what they are (I'm not sure) so
that you already have them set for your clean boot.
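Those report-gathering steps could be sketched roughly like the following. This is just an illustrative command sequence, not an established procedure; the report directory and file names are arbitrary examples, and most of these commands need root to give full output:

```shell
# Collect the pieces linux-usb developers usually ask for.
# Run this right after a clean boot plus reproducing the error,
# so the dmesg stays short and relevant.
mkdir -p ~/usb-report
lsusb -v > ~/usb-report/lsusb.txt 2>&1   # device and bridge chipset details
lsusb -t > ~/usb-report/topology.txt     # bus topology (hub/port layout)
lspci    > ~/usb-report/lspci.txt        # host controller
dmesg    > ~/usb-report/dmesg.txt        # complete kernel log since boot
```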

There might be usb quirks for your hardware setup that apply. Or they
might suggest that it still needs a USB hub to clean things up
between controller and bridge chipset.



>
> However indeed there could be bugs both on the dock side and in the south bridge.
> Moreover I could imagine that a USB reset happens due to another USB device,
> like a wave started in one place turning into a tsunami for the whole
> USB subsystem.

If there is a hub, one of its jobs is to prevent that from
happening. And if the drive enclosure and the problem device are on
separate ports, they are effectively going through a built-in hub in
the usb host device. But yeah, you want to tell linux-usb exactly what
devices (and chipsets, which lsusb -v will show) you're using, because
they may already know about such problems.


>
>> There are pending patches for something similar that you can find in
>> the archives. I think the reason they haven't been merged yet is there
>> haven't been enough comments and feedback (?). I think Anand Jain is
>> the author of those patches so you might dig around in the archives.
>> In a way you have an ideal setup for testing them out. Just make sure
>> you have backups...
>
>
> Thanks for the reference. Should I look for this patch here:
>
> https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632=-date

Maybe, it's a lot of patches to go through. I'm using
https://lore.kernel.org/linux-btrfs which has a search field.

This is the recent email I was thinking of that might point you in the
right direction:

https://lore.kernel.org/linux-btrfs/2287c62d-6dbb-3b30-1134-d754e4294...@oracle.com/

A complicating factor is that the block layer does do some retries.
I'm also not familiar enough with the way md does retries and sets
drives as faulty, or whether that is really what Btrfs should
replicate. Some of these conversations require cooperation with other
kernel developers, I suspect (libata, SCSI, USB, SD), in order to make
sure no one is being stepped on with some big surprise.



>
> I didn't observe any errors while doing "btrfs check" on this volume after
> several such resets, because that volume is mostly used for reading and
> chance that USB reset happens during the write is very low.

If it mounts and the most recent changes are readable without errors,
the file system is probably fine. Btrfs is pretty good at detecting
and correcting for hardware related problems, in that it is fussier
than other file systems: it can detect such problems in both metadata
and data, and should be able to avoid them in the first place
due to always-on COW (as long as you 

Re: Failover for unattached USB device

2018-10-25 Thread Dmitry Katsubo

On 2018-10-24 20:05, Chris Murphy wrote:

I think about the best we can expect in the short term is that Btrfs
goes read-only before the file system becomes corrupted in a way it
can't recover with a normal mount. And I'm not certain it is in this
state of development right now for all cases. And I say the same thing
for other file systems as well.

Running Btrfs on USB devices is fine, so long as they're well behaved.
I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
because there are a lot of known bugs with USB controllers, USB bridge
chipsets, and USB hubs.

Having user definable switches for when to go read-only is, I think,
misleading to the user, and very likely will mislead the file system.
The file system needs to go read-only when it gets confused, period.
It doesn't matter what the error rate is.


In general I agree. I just wonder why it couldn't happen quicker. For
example, from the log I originally attached one can see that btrfs
made 1867 attempts to read (perhaps the same) block from both devices
in the RAID1 volume, without success:

BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0


Attempts lasted for 29 minutes.


The work around is really to do the hard work making the devices
stable. Not asking Btrfs to paper over known unstable hardware.

In my case, I started out with rare disconnects and resets with
directly attached drives. This was a couple years ago. It was a Btrfs
raid1 setup, and the drives would not go missing at the same time, but
both would just drop off from time to time. Btrfs would complain of
dropped writes, I vaguely remember it going read only. But normal
mounts worked, sometimes with scary errors but always finding a good
copy on the other drive, and doing passive fixups. Scrub would always
fix up the rest. I'm still using those same file systems on those
devices, but now they go through a dyconn USB 3.0 hub with a decently
good power supply. I originally thought the drop offs were power
related, so I explicitly looked for a USB hub that could supply at
least 2A, and this one is 12VDC @ 2500mA. A laptop drive will draw
nearly 1A on spin up, but at that point P = A * V. Laptop drives during
read/write use 1.5 W to 2.5 W @ 5VDC.

1.5-2.5 W = A * 5 V
Therefore A = 0.3-0.5A

And for 4 drives at possibly 0.5 A (although my drives are all at the
1.6 W read/write), that's 2 A @ 5 V, which is easily maintained for
the hub power supply (which by my calculation could do 6 A @ 5 V, not
accounting for any resistance).

Anyway, as it turns out I don't think it was power related, as the
Intel NUC in question probably had just enough amps per port. What
it really was, was an incompatibility between the Intel controller and
the bridge chipset in the USB-SATA cases. A USB hub is similar to an
ethernet hub in that it actually reads the USB stream and rewrites it
out. So hubs are actually pretty complicated little things, and having
a good one matters.


Thanks for this information. I have a situation similar to yours, with
the only important difference that my drives are put into a USB dock with
independent power and cooling, like this one:

https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246

so I don't think I need to worry about amps. This dock is connected
directly to a USB port on the motherboard.

However indeed there could be bugs both on the dock side and in the
south bridge. Moreover I could imagine that a USB reset happens due to
another USB device, like a wave started in one place turning into a
tsunami for the whole USB subsystem.


There are pending patches for something similar that you can find in
the archives. I think the reason they haven't been merged yet is there
haven't been enough comments and feedback (?). I think Anand Jain is
the author of those patches so you might dig around in the archives.
In a way you have an ideal setup for testing them out. Just make sure
you have backups...


Thanks for the reference. Should I look for this patch here:

https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632=-date

or was this patch only floating around on this mailing list?


'btrfs check' without the --repair flag is safe and read only, but
takes a long time because it'll read all metadata. The fastest safe
way is to mount it ro, read a directory that was recently written to,
and see if there are any kernel errors. You could recursively copy
files from a directory to /dev/null and then check kernel messages for
any errors. So long as metadata is DUP, there is a good chance a bad
copy of metadata can be automatically fixed up with a good copy. If
there's only a single copy of metadata, or both copies get corrupt,
then it's difficult. Usually recovery of data is possible, but
depending on what's damaged, repair might not be possible.


I think "btrfs check" would be too heavy. 

Re: Failover for unattached USB device

2018-10-24 Thread Chris Murphy
On Wed, Oct 24, 2018 at 9:03 AM, Dmitry Katsubo  wrote:
> On 2018-10-17 00:14, Dmitry Katsubo wrote:
>>
>> As a workaround I can monitor dmesg output but:
>>
>> 1. It would be nice if I could tell btrfs that I would like to mount
>> read-only after a certain error rate per minute is reached.
>> 2. It would be nice if btrfs could detect that both drives are not
>> available and unmount (as mount read-only won't help much) the filesystem.
>>
>> Kernel log for Linux v4.14.2 is attached.
>
>
> I wonder if somebody could further advise on a workaround. I understand that
> running a btrfs volume over USB devices is not good, but I think btrfs could
> play some role as well.

I think about the best we can expect in the short term is that Btrfs
goes read-only before the file system becomes corrupted in a way it
can't recover with a normal mount. And I'm not certain it is in this
state of development right now for all cases. And I say the same thing
for other file systems as well.

Running Btrfs on USB devices is fine, so long as they're well behaved.
I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
because there are a lot of known bugs with USB controllers, USB bridge
chipsets, and USB hubs.

Having user definable switches for when to go read-only is, I think,
misleading to the user, and very likely will mislead the file system.
The file system needs to go read-only when it gets confused, period.
It doesn't matter what the error rate is.

The work around is really to do the hard work making the devices
stable. Not asking Btrfs to paper over known unstable hardware.

In my case, I started out with rare disconnects and resets with
directly attached drives. This was a couple years ago. It was a Btrfs
raid1 setup, and the drives would not go missing at the same time, but
both would just drop off from time to time. Btrfs would complain of
dropped writes, I vaguely remember it going read only. But normal
mounts worked, sometimes with scary errors but always finding a good
copy on the other drive, and doing passive fixups. Scrub would always
fix up the rest. I'm still using those same file systems on those
devices, but now they go through a dyconn USB 3.0 hub with a decently
good power supply. I originally thought the drop offs were power
related, so I explicitly looked for a USB hub that could supply at
least 2A, and this one is 12VDC @ 2500mA. A laptop drive will draw
nearly 1A on spin up, but at that point P = A * V. Laptop drives during
read/write use 1.5 W to 2.5 W @ 5VDC.

1.5-2.5 W = A * 5 V
Therefore A = 0.3-0.5A

And for 4 drives at possibly 0.5 A (although my drives are all at the
1.6 W read/write), that's 2 A @ 5 V, which is easily maintained for
the hub power supply (which by my calculation could do 6 A @ 5 V, not
accounting for any resistance).
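The arithmetic above can be checked mechanically. This little awk sketch uses the per-drive wattage figures quoted in this thread, and assumes (as the paragraph does) an ideal conversion of the 12 V / 2.5 A hub supply down to 5 V:

```shell
# Sanity-check the power budget: I = P / V.
# 1.5-2.5 W per laptop drive at 5 VDC; hub supply is 12 V * 2.5 A = 30 W.
awk 'BEGIN {
    printf "per-drive draw at 5 V: %.1f-%.1f A\n", 1.5 / 5, 2.5 / 5
    printf "worst case for 4 drives: %.1f A @ 5 V\n", 4 * 2.5 / 5
    printf "hub supply available at 5 V: %.1f A\n", (12 * 2.5) / 5
}'
```

This reproduces the 0.3-0.5 A per-drive range, the 2 A worst case for four drives, and the roughly 6 A the hub supply could deliver at 5 V before accounting for conversion losses.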

Anyway, as it turns out I don't think it was power related, as the
Intel NUC in question probably had just enough amps per port. What
it really was, was an incompatibility between the Intel controller and
the bridge chipset in the USB-SATA cases. A USB hub is similar to an
ethernet hub in that it actually reads the USB stream and rewrites it
out. So hubs are actually pretty complicated little things, and having
a good one matters.

>
> In particular I wonder if btrfs could detect that all devices in a RAID1
> volume became inaccessible and, instead of reporting an increasing "write
> error" counter to the kernel log, simply render the volume read-only.
> "Inaccessible" could mean that the same block cannot be written back to the
> minimum number of devices in the RAID volume, so btrfs gives up.

There are pending patches for something similar that you can find in
the archives. I think the reason they haven't been merged yet is there
haven't been enough comments and feedback (?). I think Anand Jain is
the author of those patches so you might dig around in the archives.
In a way you have an ideal setup for testing them out. Just make sure
you have backups...


>
> Maybe someone can advise some sophisticated way of quick checking that
> filesystems is
> healthy?

'btrfs check' without the --repair flag is safe and read only, but
takes a long time because it'll read all metadata. The fastest safe
way is to mount it ro, read a directory that was recently written to,
and see if there are any kernel errors. You could recursively copy
files from a directory to /dev/null and then check kernel messages for
any errors. So long as metadata is DUP, there is a good chance a bad
copy of metadata can be automatically fixed up with a good copy. If
there's only a single copy of metadata, or both copies get corrupt,
then it's difficult. Usually recovery of data is possible, but
depending on what's damaged, repair might not be possible.
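That recursive read-to-/dev/null check could look something like this. The read_check helper and the paths are illustrative, not an established tool:

```shell
# Quick read-only health probe: read every file under a recently
# written directory, discard the data, then look for new kernel errors.
read_check() {
    # cat exits nonzero if any file cannot be read
    find "$1" -type f -exec cat -- {} + > /dev/null
}

# Example use against a real mount (paths are examples, commented out):
# mount -o ro /dev/sdf /mnt/backups
# read_check /mnt/backups/recent && echo "reads OK"
# dmesg | grep -i btrfs | tail
```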


-- 
Chris Murphy


Re: Failover for unattached USB device

2018-10-24 Thread Dmitry Katsubo

On 2018-10-17 00:14, Dmitry Katsubo wrote:

As a workaround I can monitor dmesg output but:

1. It would be nice if I could tell btrfs that I would like to mount read-only
after a certain error rate per minute is reached.
2. It would be nice if btrfs could detect that both drives are not available
and unmount (as mount read-only won't help much) the filesystem.

Kernel log for Linux v4.14.2 is attached.


I wonder if somebody could further advise on a workaround. I understand that
running a btrfs volume over USB devices is not good, but I think btrfs could
play some role as well.

In particular I wonder if btrfs could detect that all devices in a RAID1
volume became inaccessible and, instead of reporting an increasing "write
error" counter to the kernel log, simply render the volume read-only.
"Inaccessible" could mean that the same block cannot be written back to the
minimum number of devices in the RAID volume, so btrfs gives up.


Maybe someone can advise some sophisticated way of quickly checking that the
filesystem is healthy? Right now the only way I see is to make a tiny write
(like creating a file and instantly removing it) to make it die faster...
Checking for write IO errors in "btrfs dev stats /mnt/backups" output could
be an option, provided that the delta is computed over some period of time
and the write error counter increases for both devices in the volume (as
apparently I am not interested in one failing block which btrfs tries to
write again and again, increasing the write errors counter).
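A minimal sketch of that delta idea follows. The stats_delta helper is hypothetical, and it is shown here against canned snapshot files; in real use each snapshot would come from a timed run of `btrfs dev stats /mnt/backups`:

```shell
# Hypothetical helper: report btrfs device-stat counters that increased
# between two snapshots of `btrfs dev stats` output.
stats_delta() {
    # $1 = older snapshot file, $2 = newer snapshot file
    awk 'NR == FNR { old[$1] = $2; next }
         ($2 + 0) > (old[$1] + 0) { printf "%s +%d\n", $1, $2 - old[$1] }' "$1" "$2"
}

# Canned example data in the `btrfs dev stats` output format:
cat > /tmp/stats.old <<'EOF'
[/dev/sdf].write_io_errs 0
[/dev/sdf].read_io_errs 10
EOF
cat > /tmp/stats.new <<'EOF'
[/dev/sdf].write_io_errs 3
[/dev/sdf].read_io_errs 10
EOF
stats_delta /tmp/stats.old /tmp/stats.new
```

Only counters that grew between the two snapshots are printed, which matches the idea above of ignoring a static error count and reacting only to a rising one.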

Thanks for any feedback.

--
With best regards,
Dmitry


Failover for unattached USB device

2018-10-16 Thread Dmitry Katsubo
Dear btrfs team / community,

Sometimes it happens that the kernel resets the USB subsystem (looks like a
hardware problem). All USB devices are then detached and attached back. After
a few hours of struggle btrfs finally comes to the situation when a read-only
filesystem mount is necessary. During this time, when I try to access this
mounted filesystem (/mnt/backups), it reports success for some directories
and errors for others:

root@debian:~# ll /mnt/backups/
total 14334
drwxr-xr-x 1 adm users    116 Sep 12 00:35 .
drwxrwxr-x 1 adm users    164 Sep 19 22:44 ..
-rw-r--r-- 1 adm users  79927 Feb  7  2018 contacts.zip
drwxr-xr-x 1 adm users    254 Feb  4  2018 attic
drwxr-xr-x 1 adm users     16 Feb 23  2018 recent
...
root@debian:~# ll /mnt/backups/attic/
ls: reading directory '/mnt/backups/attic/': Input/output error
total 0
drwxr-xr-x 1 adm users 254 Feb  4  2018 .
drwxr-xr-x 1 adm users 116 Sep 12 00:35 ..

It looks like this depends on whether the content is in disk cache...

What is surprising: when I try to create a file, I succeed:

root@debian:~# touch /mnt/backups/.mounted
root@debian:~# ll /mnt/backups/.mounted
-rw-r--r-- 1 root root 0 Sep 20 16:52 /mnt/backups/.mounted
root@debian:~# rm /mnt/backups/.mounted

My btrfs volume consists of two identical drives combined into a RAID1 volume:

# btrfs filesystem df /mnt/backups
Data, RAID1: total=880.00GiB, used=878.96GiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=2.00GiB, used=1.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/backups
Label: none  uuid: a657364b-36d2-4c1f-8e5d-dc3d28166190
Total devices 2 FS bytes used 880.09GiB
devid    1 size 3.64TiB used 882.01GiB path /dev/sdf
devid    2 size 3.64TiB used 882.01GiB path /dev/sde

As a workaround I can monitor dmesg output but:

1. It would be nice if I could tell btrfs that I would like to mount read-only
after a certain error rate per minute is reached.
2. It would be nice if btrfs could detect that both drives are not available and
unmount (as mount read-only won't help much) the filesystem.

Kernel log for Linux v4.14.2 is attached.

-- 
With best regards,
Dmitry
Jun 29 18:54:56 debian kernel: [1197865.440396] usb 4-2: USB disconnect, device number 3
Jun 29 18:54:56 debian kernel: [1197865.440403] usb 4-2.2: USB disconnect, device number 5
Jun 29 18:54:56 debian kernel: [1197865.476118] usb 4-2.3: USB disconnect, device number 8
Jun 29 18:54:56 debian kernel: [1197865.549379] usb 4-2.4: USB disconnect, device number 7
...
Jun 29 18:54:58 debian kernel: [1197867.517728] usb-storage 4-2.3:1.0: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.524021] usb-storage 4-2.3:1.0: Quirks match for vid 152d pid 0567: 500
Jun 29 18:54:58 debian kernel: [1197867.603859] usb 4-2.4: new full-speed USB device number 13 using ehci-pci
Jun 29 18:54:58 debian kernel: [1197867.725595] usb-storage 4-2.4:1.2: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.728602] scsi host9: usb-storage 4-2.4:1.2
Jun 29 18:54:59 debian kernel: [1197868.528737] scsi 7:0:0:0: Direct-Access ST4000DM 004-2CV104   0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.529310] scsi 7:0:0:1: Direct-Access ST4000DM 004-2CV104   0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.530093] sd 7:0:0:0: Attached scsi generic sg5 type 0
Jun 29 18:54:59 debian kernel: [1197868.530588] sd 7:0:0:1: Attached scsi generic sg6 type 0
Jun 29 18:54:59 debian kernel: [1197868.533064] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.533619] sd 7:0:0:1: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.533626] sd 7:0:0:1: [sdh] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.534063] sd 7:0:0:1: [sdh] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.534069] sd 7:0:0:1: [sdh] Mode Sense: 67 00 10 08
Jun 29 18:54:59 debian kernel: [1197868.534422] sd 7:0:0:1: [sdh] No Caching mode page found
Jun 29 18:54:59 debian kernel: [1197868.534542] sd 7:0:0:1: [sdh] Assuming drive cache: write through
Jun 29 18:54:59 debian kernel: [1197868.535563] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.536702] sd 7:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.537454] sd 7:0:0:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.537459] sd 7:0:0:0: [sdg] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.538327] sd 7:0:0:0: [sdg] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.538331] sd 7:0:0:0: [sdg] Mode Sense: 67 00 10 08
...
Jun 29 20:22:35 debian kernel: [1203125.061068] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0