Re: Failover for unattached USB device
On Thu, Oct 25, 2018 at 3:47 AM, Dmitry Katsubo wrote:
>
> BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0,
> corrupt 0, gen 0
> BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0,
> corrupt 0, gen 0
>
> Attempts lasted for 29 minutes.

Yep, and it floods the log. It's extra fun if the journal is on the device
with errors: the more errors, the more writes and reads to the problem
drive, the more errors, the more writes, the more errors... a snowball.

But that's the state of error handling on Btrfs, which is still more
sophisticated than other file systems. It's not more sophisticated than the
kernel's md driver, though, which does have some sort of read error rate
limit; past it, md kicks the drive out of the array (faulty state) and
stops complaining about it. And I think md considers a drive faulty on a
single write failure.

> Thanks for this information. I have a situation similar to yours, with
> the only important difference that my drives are put into a USB dock with
> independent power and cooling like this one:
>
> https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246
>
> so I don't think I need to worry about amps. This dock is connected
> directly to a USB port on the motherboard.

It is entirely plausible that this still needs a hub, but it really depends
on the exact errors you're getting. And those need to go to the linux-usb
list; I don't know enough about it. It might require a bit of luck to get a
reply because it's a very busy list. My main recommendation is to be very
concise: they will want to know the hardware setup (topology), lsusb -v,
lspci, and a complete dmesg. It'll seem reasonable to snip down to just the
USB error messages, but that almost always drives developers crazy, because
important hints for problems can show up in kernel messages during boot,
so they will inevitably want the whole dmesg.
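For what it's worth, gathering that report is easy to script. A minimal
sketch (the output directory is arbitrary, and every command is guarded
and best-effort, since the tools may be missing or dmesg may need
privileges):

```shell
#!/bin/sh
# Collect the info linux-usb developers usually ask for:
# topology (lsusb -t), lsusb -v, lspci, and the complete dmesg.
report_dir=$(mktemp -d)

# Each capture is optional: skip a tool if it isn't installed and
# keep going even if it fails (e.g. dmesg without privileges).
command -v lsusb >/dev/null 2>&1 && lsusb -t > "$report_dir/lsusb-t.txt" 2>&1 || true
command -v lsusb >/dev/null 2>&1 && lsusb -v > "$report_dir/lsusb-v.txt" 2>&1 || true
command -v lspci >/dev/null 2>&1 && lspci    > "$report_dir/lspci.txt"   2>&1 || true
command -v dmesg >/dev/null 2>&1 && dmesg    > "$report_dir/dmesg.txt"   2>&1 || true

echo "report collected in $report_dir"
```

Attaching the resulting files whole, rather than snippets, avoids the
back-and-forth described above.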
The ideal scenario is to do a clean boot, reproduce the problem, and then
capture the dmesg. That way it's a concise dmesg that isn't two weeks old
with a bunch of device connects and disconnects or whatever. There almost
certainly are USB kernel parameters for debugging; ideally you search the
linux-usb list archives to find out what they are (I'm not sure) so that
you already have them set for your clean boot. There might be USB quirks
for your hardware setup that apply. Or they might suggest that it still
needs a USB hub to clean things up between controller and bridge chipset.

> However indeed there could be bugs both on the dock side and in the
> south bridge. Moreover I could imagine that a USB reset happens due to
> another USB device, like a wave started in one place turning into a
> tsunami for the whole USB subsystem.

If there is a hub, one of its jobs is to prevent that from happening. And
if the drive enclosure and problem device are on separate ports, they are
effectively going through a built-in hub in the USB host device. But yeah,
you want to tell linux-usb exactly what devices (and chipsets, which
lsusb -v will show) you're using, because they may already know about such
problems.

>> There are pending patches for something similar that you can find in
>> the archives. I think the reason they haven't been merged yet is there
>> haven't been enough comments and feedback (?). I think Anand Jain is
>> the author of those patches so you might dig around in the archives.
>> In a way you have an ideal setup for testing them out. Just make sure
>> you have backups...
>
> Thanks for the reference. Should I look for this patch here:
>
> https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632=-date

Maybe; it's a lot of patches to go through. I'm using
https://lore.kernel.org/linux-btrfs which has a search field.
This is the recent email I was thinking of that might point you in the
right direction:
https://lore.kernel.org/linux-btrfs/2287c62d-6dbb-3b30-1134-d754e4294...@oracle.com/

A complicating factor is that the block layer does do some retries. I'm
also not familiar enough with the way md does retries and sets drives as
faulty, and whether that is really what Btrfs should replicate or not.
Some of these conversations require cooperation with other kernel
developers, I suspect: libata, SCSI, USB, SD, in order to make sure no one
is stepped on by some big surprise.

> I didn't observe any errors while doing "btrfs check" on this volume
> after several such resets, because that volume is mostly used for
> reading and the chance that a USB reset happens during a write is very
> low.

If it mounts and the most recent changes are readable without errors, the
file system is probably fine. Btrfs is pretty good at detecting and
correcting for hardware related problems, in that it is fussier than other
file systems: it can detect such problems in both metadata and data, and
should be able to avoid them in the first place due to always-on COW (as
long as you
Re: Failover for unattached USB device
On 2018-10-24 20:05, Chris Murphy wrote:
> I think about the best we can expect in the short term is that Btrfs
> goes read-only before the file system becomes corrupted in a way it
> can't recover with a normal mount. And I'm not certain it is in this
> state of development right now for all cases. And I say the same thing
> for other file systems as well.
>
> Running Btrfs on USB devices is fine, so long as they're well behaved. I
> have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
> because there are a lot of known bugs with USB controllers, USB bridge
> chipsets, and USB hubs.
>
> Having user definable switches for when to go read-only is, I think,
> misleading to the user, and very likely will mislead the file system.
> The file system needs to go read-only when it gets confused, period. It
> doesn't matter what the error rate is.

In general I agree. I just wonder why it couldn't happen quicker. For
example, from the log I've originally attached one can see that btrfs
made 1867 attempts to read (perhaps the same) block from both devices in
the RAID1 volume, without success:

BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0

Attempts lasted for 29 minutes.

> The work around is really to do the hard work of making the devices
> stable, not asking Btrfs to paper over known unstable hardware.
>
> In my case, I started out with rare disconnects and resets with directly
> attached drives. This was a couple years ago. It was a Btrfs raid1
> setup, and the drives would not go missing at the same time, but both
> would just drop off from time to time. Btrfs would complain of dropped
> writes; I vaguely remember it going read-only. But normal mounts worked,
> sometimes with scary errors but always finding a good copy on the other
> drive, and doing passive fixups. Scrub would always fix up the rest.
> I'm still using those same file systems on those devices, but now they
> go through a dyconn USB 3.0 hub with a decently good power supply. I
> originally thought the drop offs were power related, so I explicitly
> looked for a USB hub that could supply at least 2 A, and this one is
> 12 VDC @ 2500 mA. A laptop drive will draw nearly 1 A on spin up, but at
> that point P = A * V. Laptop drives during read/write use 1.5 W to
> 2.5 W @ 5 VDC:
>
>   1.5-2.5 W = A * 5 V
>   Therefore A = 0.3-0.5 A
>
> And for 4 drives at possibly 0.5 A (although my drives are all at 1.6 W
> read/write), that's 2 A @ 5 V, which is easily maintained by the hub
> power supply (which by my calculation could do 6 A @ 5 V, not accounting
> for any resistance).
>
> Anyway, as it turns out I don't think it was power related, as the Intel
> NUC in question probably had just enough amps per port. What it really
> was, was an incompatibility between the Intel controller and the bridge
> chipset in the USB-SATA cases. A USB hub is similar to an ethernet hub:
> it actually reads the USB stream and rewrites it out. So hubs are
> actually pretty complicated little things, and having a good one
> matters.

Thanks for this information. I have a situation similar to yours, with
the only important difference that my drives are put into a USB dock with
independent power and cooling like this one:

https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246

so I don't think I need to worry about amps. This dock is connected
directly to a USB port on the motherboard. However indeed there could be
bugs both on the dock side and in the south bridge. Moreover I could
imagine that a USB reset happens due to another USB device, like a wave
started in one place turning into a tsunami for the whole USB subsystem.

> There are pending patches for something similar that you can find in the
> archives. I think the reason they haven't been merged yet is there
> haven't been enough comments and feedback (?).
> I think Anand Jain is the author of those patches so you might dig
> around in the archives. In a way you have an ideal setup for testing
> them out. Just make sure you have backups...

Thanks for the reference. Should I look for this patch here:

https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632=-date

or was this patch only floating around on this mailing list?

> 'btrfs check' without the --repair flag is safe and read only, but takes
> a long time because it'll read all metadata. The fastest safe way is to
> mount it ro and read a directory recently being written to and see if
> there are any kernel errors. You could recursively copy files from a
> directory to /dev/null and then check kernel messages for any errors.
>
> So long as metadata is DUP, there is a good chance a bad copy of
> metadata can be automatically fixed up with a good copy. If there's only
> a single copy of metadata, or both copies get corrupt, then it's
> difficult. Usually recovery of data is possible, but depending on what's
> damaged, repair might not be possible.

I think "btrfs check" would be too heavy.
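The read-back check quoted above is easy to script. A minimal sketch,
assuming a recently-written directory such as /mnt/backups/recent (a
made-up path for illustration; a throwaway directory is substituted so the
sketch runs anywhere):

```shell
#!/bin/sh
# Read every file under a recently-written directory and discard the
# bytes; an unreadable block surfaces as a cp error here and, on btrfs,
# as a "BTRFS error" line in the kernel log.
# NOTE: /mnt/backups/recent is an assumed example path.
dir=${1:-/mnt/backups/recent}
[ -d "$dir" ] || { dir=$(mktemp -d); echo "test data" > "$dir/sample.txt"; }

# Enumerate the files first, then read each one to /dev/null,
# counting failures (a plain 'find -exec' would hide them).
list=$(mktemp)
find "$dir" -type f > "$list"

errors=0
while IFS= read -r f; do
    cp -- "$f" /dev/null || errors=$((errors + 1))
done < "$list"

echo "unreadable files: $errors"
# After a clean pass, also check the kernel log (may need privileges):
#   dmesg | grep -i 'BTRFS error'
```

This only exercises data blocks, so it is much lighter than a full
'btrfs check', at the cost of not walking all metadata.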
Re: Failover for unattached USB device
On Wed, Oct 24, 2018 at 9:03 AM, Dmitry Katsubo wrote:
> On 2018-10-17 00:14, Dmitry Katsubo wrote:
>>
>> As a workaround I can monitor dmesg output but:
>>
>> 1. It would be nice if I could tell btrfs that I would like to mount
>> read-only after a certain error rate per minute is reached.
>> 2. It would be nice if btrfs could detect that both drives are not
>> available and unmount (as mount read-only won't help much) the
>> filesystem.
>>
>> Kernel log for Linux v4.14.2 is attached.
>
> I wonder if somebody could further advise the workaround. I understand
> that running a btrfs volume over USB devices is not good, but I think
> btrfs could play some role as well.

I think about the best we can expect in the short term is that Btrfs goes
read-only before the file system becomes corrupted in a way it can't
recover with a normal mount. And I'm not certain it is in this state of
development right now for all cases. And I say the same thing for other
file systems as well.

Running Btrfs on USB devices is fine, so long as they're well behaved. I
have such a setup with USB 3.0 devices. Perhaps I got a bit lucky, because
there are a lot of known bugs with USB controllers, USB bridge chipsets,
and USB hubs.

Having user definable switches for when to go read-only is, I think,
misleading to the user, and very likely will mislead the file system. The
file system needs to go read-only when it gets confused, period. It
doesn't matter what the error rate is. The work around is really to do the
hard work of making the devices stable, not asking Btrfs to paper over
known unstable hardware.

In my case, I started out with rare disconnects and resets with directly
attached drives. This was a couple years ago. It was a Btrfs raid1 setup,
and the drives would not go missing at the same time, but both would just
drop off from time to time. Btrfs would complain of dropped writes; I
vaguely remember it going read-only.
But normal mounts worked, sometimes with scary errors but always finding a
good copy on the other drive, and doing passive fixups. Scrub would always
fix up the rest.

I'm still using those same file systems on those devices, but now they go
through a dyconn USB 3.0 hub with a decently good power supply. I
originally thought the drop offs were power related, so I explicitly
looked for a USB hub that could supply at least 2 A, and this one is
12 VDC @ 2500 mA. A laptop drive will draw nearly 1 A on spin up, but at
that point P = A * V. Laptop drives during read/write use 1.5 W to 2.5 W
@ 5 VDC:

  1.5-2.5 W = A * 5 V
  Therefore A = 0.3-0.5 A

And for 4 drives at possibly 0.5 A (although my drives are all at 1.6 W
read/write), that's 2 A @ 5 V, which is easily maintained by the hub power
supply (which by my calculation could do 6 A @ 5 V, not accounting for any
resistance).

Anyway, as it turns out I don't think it was power related, as the Intel
NUC in question probably had just enough amps per port. What it really
was, was an incompatibility between the Intel controller and the bridge
chipset in the USB-SATA cases. A USB hub is similar to an ethernet hub: it
actually reads the USB stream and rewrites it out. So hubs are actually
pretty complicated little things, and having a good one matters.

> In particular I wonder if btrfs could detect that all devices in a RAID1
> volume became inaccessible and instead of reporting an increasing "write
> error" counter to the kernel log simply render the volume as read-only.
> "Inaccessible" could mean that the same block cannot be written back to
> the minimum number of devices in the RAID volume, so btrfs gives up.

There are pending patches for something similar that you can find in the
archives. I think the reason they haven't been merged yet is there haven't
been enough comments and feedback (?). I think Anand Jain is the author of
those patches so you might dig around in the archives.
In a way you have an ideal setup for testing them out. Just make sure you
have backups...

> Maybe someone can advise some sophisticated way of quickly checking that
> the filesystem is healthy?

'btrfs check' without the --repair flag is safe and read only, but takes a
long time because it'll read all metadata. The fastest safe way is to
mount it ro and read a directory recently being written to and see if
there are any kernel errors. You could recursively copy files from a
directory to /dev/null and then check kernel messages for any errors.

So long as metadata is DUP, there is a good chance a bad copy of metadata
can be automatically fixed up with a good copy. If there's only a single
copy of metadata, or both copies get corrupt, then it's difficult. Usually
recovery of data is possible, but depending on what's damaged, repair
might not be possible.

-- 
Chris Murphy
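Several messages in this thread come back to Btrfs flipping a volume
read-only; whether that has already happened is visible in /proc/mounts.
A minimal sketch, demonstrated against a made-up sample excerpt (the
devices and mount points are invented; point it at the real /proc/mounts
in practice):

```shell
#!/bin/sh
# Report whether a mount point is currently ro or rw by reading the
# mount options column of /proc/mounts (the kernel always prints
# ro/rw as the first option).
# Usage: is_readonly <mount-point> [mounts-file]
is_readonly() {
    awk -v mp="$1" '$2 == mp { split($4, o, ","); print o[1]; exit }' \
        "${2:-/proc/mounts}"
}

# Demo against a sample excerpt with made-up devices:
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sdf /mnt/backups btrfs ro,relatime,space_cache 0 0
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
EOF

is_readonly /mnt/backups "$sample"   # -> ro
is_readonly /            "$sample"   # -> rw
```

A cron job wrapping this check could alert as soon as the forced
remount happens, rather than waiting for an application to fail.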
Re: Failover for unattached USB device
On 2018-10-17 00:14, Dmitry Katsubo wrote:
> As a workaround I can monitor dmesg output but:
>
> 1. It would be nice if I could tell btrfs that I would like to mount
> read-only after a certain error rate per minute is reached.
> 2. It would be nice if btrfs could detect that both drives are not
> available and unmount (as mount read-only won't help much) the
> filesystem.
>
> Kernel log for Linux v4.14.2 is attached.

I wonder if somebody could further advise the workaround. I understand
that running a btrfs volume over USB devices is not good, but I think
btrfs could play some role as well.

In particular I wonder if btrfs could detect that all devices in a RAID1
volume became inaccessible and instead of reporting an increasing "write
error" counter to the kernel log simply render the volume as read-only.
"Inaccessible" could mean that the same block cannot be written back to
the minimum number of devices in the RAID volume, so btrfs gives up.

Maybe someone can advise some sophisticated way of quickly checking that
the filesystem is healthy? Right now the only way I see is to make a tiny
write (like create a file and instantly remove it) to make it die
faster... Checking for write IO errors in "btrfs dev stats /mnt/backups"
output could be an option, provided that the delta is computed over some
period of time and the write errors counter increases for both devices in
the volume (as apparently I am not interested in one failing block which
btrfs tries to write again and again, increasing the write errors
counter).

Thanks for any feedback.

-- 
With best regards,
Dmitry
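The delta idea in the message above is straightforward to script:
snapshot the 'btrfs dev stats' output, wait, snapshot again, and diff the
counters. A minimal sketch, run here against two made-up sample snapshots
in the "[/dev/X].counter value" format the tool prints:

```shell
#!/bin/sh
# Compare two snapshots of 'btrfs dev stats <mount>' output and print
# every counter that increased between them. In real use the snapshots
# would be taken some minutes apart, e.g. from a cron job:
#   btrfs dev stats /mnt/backups > "$before"; sleep 300;
#   btrfs dev stats /mnt/backups > "$after"
before=$(mktemp); after=$(mktemp)

# Made-up sample snapshots for demonstration:
cat > "$before" <<'EOF'
[/dev/sdf].write_io_errs   0
[/dev/sdf].read_io_errs    0
[/dev/sdg].write_io_errs   0
[/dev/sdg].read_io_errs    12
EOF
cat > "$after" <<'EOF'
[/dev/sdf].write_io_errs   0
[/dev/sdf].read_io_errs    3
[/dev/sdg].write_io_errs   0
[/dev/sdg].read_io_errs    12
EOF

# First pass stores old values keyed by counter name; second pass
# reports any counter whose value grew.
awk 'NR == FNR { old[$1] = $2; next }
     $2 > old[$1] { print $1, "increased by", $2 - old[$1] }' \
    "$before" "$after"
# -> [/dev/sdf].read_io_errs increased by 3
```

Requiring the increase on both devices of the raid1 (as suggested above)
would just be one more condition on the reported counter names.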
Failover for unattached USB device
Dear btrfs team / community,

Sometimes it happens that the kernel resets the USB subsystem (looks like
a hardware problem). Nevertheless all USB devices are detached and
attached back. After a few hours of struggle btrfs finally comes to the
situation when a read-only filesystem mount is necessary. During this
time, when I try to access this mounted filesystem (/mnt/backups), it
reports success for some directories and errors for others:

root@debian:~# ll /mnt/backups/
total 14334
drwxr-xr-x 1 adm users    116 Sep 12 00:35 .
drwxrwxr-x 1 adm users    164 Sep 19 22:44 ..
-rw-r--r-- 1 adm users  79927 Feb  7  2018 contacts.zip
drwxr-xr-x 1 adm users    254 Feb  4  2018 attic
drwxr-xr-x 1 adm users     16 Feb 23  2018 recent
...
root@debian:~# ll /mnt/backups/attic/
ls: reading directory '/mnt/backups/attic/': Input/output error
total 0
drwxr-xr-x 1 adm users 254 Feb  4  2018 .
drwxr-xr-x 1 adm users 116 Sep 12 00:35 ..

It looks like this depends on whether the content is in the disk cache...
What is surprising: when I try to create a file, I succeed:

root@debian:~# touch /mnt/backups/.mounted
root@debian:~# ll /mnt/backups/.mounted
-rw-r--r-- 1 root root 0 Sep 20 16:52 /mnt/backups/.mounted
root@debian:~# rm /mnt/backups/.mounted

My btrfs volume consists of two identical drives combined into a RAID1
volume:

# btrfs filesystem df /mnt/backups
Data, RAID1: total=880.00GiB, used=878.96GiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=2.00GiB, used=1.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/backups
Label: none  uuid: a657364b-36d2-4c1f-8e5d-dc3d28166190
        Total devices 2 FS bytes used 880.09GiB
        devid    1 size 3.64TiB used 882.01GiB path /dev/sdf
        devid    2 size 3.64TiB used 882.01GiB path /dev/sde

As a workaround I can monitor dmesg output but:

1. It would be nice if I could tell btrfs that I would like to mount
read-only after a certain error rate per minute is reached.
2.
It would be nice if btrfs could detect that both drives are not available
and unmount (as mount read-only won't help much) the filesystem.

Kernel log for Linux v4.14.2 is attached.

-- 
With best regards,
Dmitry

Jun 29 18:54:56 debian kernel: [1197865.440396] usb 4-2: USB disconnect, device number 3
Jun 29 18:54:56 debian kernel: [1197865.440403] usb 4-2.2: USB disconnect, device number 5
Jun 29 18:54:56 debian kernel: [1197865.476118] usb 4-2.3: USB disconnect, device number 8
Jun 29 18:54:56 debian kernel: [1197865.549379] usb 4-2.4: USB disconnect, device number 7
...
Jun 29 18:54:58 debian kernel: [1197867.517728] usb-storage 4-2.3:1.0: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.524021] usb-storage 4-2.3:1.0: Quirks match for vid 152d pid 0567: 500
Jun 29 18:54:58 debian kernel: [1197867.603859] usb 4-2.4: new full-speed USB device number 13 using ehci-pci
Jun 29 18:54:58 debian kernel: [1197867.725595] usb-storage 4-2.4:1.2: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.728602] scsi host9: usb-storage 4-2.4:1.2
Jun 29 18:54:59 debian kernel: [1197868.528737] scsi 7:0:0:0: Direct-Access ST4000DM 004-2CV104 0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.529310] scsi 7:0:0:1: Direct-Access ST4000DM 004-2CV104 0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.530093] sd 7:0:0:0: Attached scsi generic sg5 type 0
Jun 29 18:54:59 debian kernel: [1197868.530588] sd 7:0:0:1: Attached scsi generic sg6 type 0
Jun 29 18:54:59 debian kernel: [1197868.533064] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.533619] sd 7:0:0:1: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.533626] sd 7:0:0:1: [sdh] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.534063] sd 7:0:0:1: [sdh] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.534069] sd 7:0:0:1: [sdh] Mode Sense: 67 00 10 08
Jun 29 18:54:59 debian kernel: [1197868.534422] sd 7:0:0:1: [sdh] No Caching mode page found
Jun 29 18:54:59 debian kernel: [1197868.534542] sd 7:0:0:1: [sdh] Assuming drive cache: write through
Jun 29 18:54:59 debian kernel: [1197868.535563] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.536702] sd 7:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.537454] sd 7:0:0:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.537459] sd 7:0:0:0: [sdg] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.538327] sd 7:0:0:0: [sdg] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.538331] sd 7:0:0:0: [sdg] Mode Sense: 67 00 10 08
...
Jun 29 20:22:35 debian kernel: [1203125.061068] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
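The "error rate per minute" idea from the original email can at least be
approximated in userspace by counting BTRFS error lines per log minute.
A sketch, run against a made-up excerpt in the same syslog format as the
attached log (the threshold value is an arbitrary example):

```shell
#!/bin/sh
# Count "BTRFS error" lines per minute of a syslog-style kernel log and
# flag any minute that exceeds a threshold. In real use the input would
# be /var/log/kern.log or 'dmesg -T' output.
threshold=2
log=$(mktemp)
cat > "$log" <<'EOF'
Jun 29 20:22:35 debian kernel: BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 29 20:22:41 debian kernel: BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jun 29 20:22:59 debian kernel: BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jun 29 20:23:10 debian kernel: BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
EOF

# Key on "Mon DD HH:MM": fields 1-3 are the syslog timestamp, and the
# seconds are stripped so errors bucket per minute.
grep 'BTRFS error' "$log" \
  | awk '{ split($3, t, ":"); key = $1 " " $2 " " t[1] ":" t[2]; n[key]++ }
         END { for (k in n) if (n[k] > '"$threshold"') print k, "->", n[k], "errors" }'
# -> Jun 29 20:22 -> 3 errors
```

A monitor built on this could then remount the volume read-only itself
(mount -o remount,ro), which is the userspace version of what the email
asks btrfs to do internally.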