Re: btrfs send hung in pipe_wait
On Sun, Sep 9, 2018 at 2:16 PM, Stefan Loewen wrote:
> I'm not sure about the exact definition of "blocked" here, but I was also surprised that there were no blocked tasks listed, since I'm definitely unable to kill (SIGKILL) that process.
> On the other hand it wakes up hourly to transfer a few bytes.
> The problem is definitely not that I issued the sysrq too early. I think it was after about 45min of no IO.

Another thing the devs have asked for in cases where things get slow or hang, but without explicit blocked-task messages, is sysrq+t. But I'm throwing spaghetti at a wall at this point; none of it will fix the problem, and I haven't learned how to read these outputs.

> So there is some problem with this "original" subvol. Maybe I should describe how that came into existence.
> Initially I had my data on an NTFS-formatted drive. I then created a btrfs partition on my second drive and rsynced all my stuff over into the root subvol.
> Then I noticed that having all my data in the root subvol was a bad idea, and created a "data" subvol and reflinked everything into it.
> I deleted the data from the root subvol, made a snapshot of the "data" subvol, tried sending that and ran into the problem we're discussing here.

That is interesting and useful information. I see nothing invalid about it at all. However, just for future reference, it is possible to snapshot the top level (default) subvolume. By default, the top level subvolume (sometimes referred to as subvolid=5 or subvolid=0) is what is mounted if you haven't used 'btrfs sub set-default' to change it. You can snapshot that subvolume by snapshotting the mount point, e.g.

mount /dev/sda1 /mnt
btrfs sub snap /mnt /mnt/subvolume1

So now you have a read-write subvolume called "subvolume1" which contains everything that was in the top level, which you can then delete if you're trying to keep things tidy and just have subvolumes and snapshots in the top level.

Anyway, what you did is possibly relevant to the problem. But if it turns out to be the cause of the problem, it's definitely a bug.

> btrfs check in lowmem mode did not find any errors either:
>
> $ sudo btrfs check --mode=lowmem --progress /dev/sdb1
> Opening filesystem to check...
> Checking filesystem on /dev/sdb1
> UUID: cd786597-3816-40e7-bf6c-d585265ad372
> [1/7] checking root items (0:00:30 elapsed, 1047408 items checked)
> [2/7] checking extents (0:03:55 elapsed, 309170 items checked)
> cache and super generation don't match, space cache will be invalidated
> [3/7] checking free space cache (0:00:00 elapsed)
> [4/7] checking fs roots (0:04:07 elapsed, 85373 items checked)
> [5/7] checking csums (without verifying data) (0:00:00 elapsed, 253106 items checked)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 708354711552 bytes used, no error found
> total csum bytes: 689206904
> total tree bytes: 2423865344
> total fs tree bytes: 1542914048
> total extent tree bytes: 129843200
> btree space waste bytes: 299191292
> file data blocks allocated: 31709967417344
> referenced 928531877888

OK, good to know.

-- Chris Murphy
Re: btrfs send hung in pipe_wait
I don't see any blocked tasks. I wonder if you were too fast with sysrq+w? Maybe it takes a little while for the blocked task to actually develop?

I suggest also 'btrfs check --mode=lowmem', because that is a separate implementation of btrfs check that tends to catch different things than the original. It is slow, however.

-- Chris Murphy
Fwd: btrfs send hung in pipe_wait
And this one as well.

-- Forwarded message -
From: Chris Murphy
Date: Fri, Sep 7, 2018 at 23:53
Subject: Re: btrfs send hung in pipe_wait
To: Stefan Loewen
Cc: Chris Murphy

On Fri, Sep 7, 2018 at 3:19 PM, Stefan Loewen wrote:
> Now I also tested with Fedora 28 (linux 4.16) from live-usb (so baremetal). Same result.
> Thanks for the pointer towards sysrqs. sysrq-w is empty, but I sent a bunch of other sysrqs to get stacktraces etc. from the kernel. Logs are attached.

Needs a dev to take a look at this, or someone with USB or block device knowledge to see if something unrelated to btrfs is hung up. I can't parse it.

Using btrfs-progs 4.17.1, what do you get for 'btrfs check --mode=lowmem <device>'?

-- Chris Murphy
Fwd: btrfs send hung in pipe_wait
Oops. Forgot CCing the mailing list.

-- Forwarded message -
From: Stefan Loewen
Date: Fri, Sep 7, 2018 at 23:19
Subject: Re: btrfs send hung in pipe_wait
To: Chris Murphy

No, it does not only happen in VirtualBox. I already tested the following:
- Manjaro baremetal (btrfs-progs v4.17.1; linux v4.18.5 and v4.14.67)
- Ubuntu 18.04 in VirtualBox (btrfs-progs v4.15.1; linux v4.15.0-33-generic)
- ArchLinux in VirtualBox (btrfs-progs v4.17.1; linux v4.18.5-arch1-1-ARCH)

The logs I posted until now were mostly (iirc all of them) from the VM with Arch. Now I also tested with Fedora 28 (linux 4.16) from live-usb (so baremetal). Same result.

Thanks for the pointer towards sysrqs. sysrq-w is empty, but I sent a bunch of other sysrqs to get stacktraces etc. from the kernel. Logs are attached.

To recap: I copied (reflink) a 3.8G iso file from a read-only snapshot (A) into a new subvol (B), just to keep things small and manageable. There's nothing special about this file other than that it happens to be one of the files that triggers the later btrfs-send to hang. Not all files from A do this, but there are definitely multiple, and the problem only occurs when the files are reflinked.

I then create a snapshot (C) of B to be able to btrfs-send it. Then I run "btrfs send snap-C > /somewhere" (--no-data leads to the same); that process reads some MB from the source disk, writes a few bytes to /somewhere, and then just hangs without any further IO. This is where I issued some sysrqs. The output is attached. Then I tried killing the btrfs-send with ctrl-c and issued the sysrqs again. I have no idea if that changed anything, but it didn't hurt, so why not.

I'll try to minimize the dataset and maybe get a small fs-image without too much personal information that I can upload so the issue is reproducible by others.

On Fri, Sep 7, 2018 at 21:17, Chris Murphy wrote:
>
> On Fri, Sep 7, 2018 at 11:07 AM, Stefan Loewen wrote:
> > List of steps:
> > - 3.8G iso lays in read-only subvol A
> > - I create subvol B and reflink-copy the iso into it.
> > - I create a read-only snapshot C of B
> > - I "btrfs send --no-data C > /somefile"
>
> So you got that right, yes.
>
> OK, I can't reproduce it. Sending A and C completes instantly with --no-data, and completes in the same time with a full send/receive. In my case I used a 4.9G ISO.
>
> I can't think of what local difference accounts for what you're seeing. There is really nothing special about reflinks. The extent and csum data are identical to the original file's, and that's the bulk of the metadata for a given file.
>
> What I can tell you is that the developers usually want to see sysrq+w whenever there are blocked tasks.
> https://fedoraproject.org/wiki/QA/Sysrq
>
> You'll want to enable all sysrq functions. And next you'll want three ssh shells:
>
> 1. sudo journalctl -fk
> 2. sudo -i to become root, then type echo w > /proc/sysrq-trigger but do not hit return yet
> 3. sudo btrfs send... to reproduce the problem.
>
> Basically the thing is gonna hang soon after you reproduce the problem, so you want to get to shell #2 and just hit return rather than dealing with long delays typing that echo command out. And the journal command is so your local terminal captures the sysrq output, because you're gonna kill the VM instead of waiting it out. I have no idea how to read these things, but someone might pick up this thread and have some idea why these tasks are hanging.
>
> > Unfortunately I don't have any way to connect the drive to a SATA port directly, but I tried to switch out as much of the used setup as possible (all changes active at the same time):
> > - I got the original (not the clone) HDD out of the enclosure and used this adapter to connect it: https://www.amazon.de/DIGITUS-Adapterkabel-40pol-480Mbps-schwarz/dp/B007X86VZK
> > - I used a different notebook
> > - I ran the test natively on that notebook (instead of from VirtualBox. I used VirtualBox for most of the tests, as I have to force-poweroff the PC every time the btrfs-send hangs, since it is not killable)
>
> This problem only happens in VirtualBox? Or does it happen on baremetal also? And we've established it happens with two different source (send) devices, which means two different Btrfs volumes.
>
> All I can say is you need to keep changing things up, process of elimination. Rather tedious. Maybe you could try downloading a Fedora 28 ISO, make a boot stick out of it, and try to reproduce with the same drives. At least that's an easy way to isolate the OS from the equation.
>
> -- Chris Murphy

[Attachment: fedora-kernelmsgs.log.gz (application/gzip)]
Re: btrfs send hung in pipe_wait
On Fri, Sep 7, 2018 at 11:07 AM, Stefan Loewen wrote:
> List of steps:
> - 3.8G iso lays in read-only subvol A
> - I create subvol B and reflink-copy the iso into it.
> - I create a read-only snapshot C of B
> - I "btrfs send --no-data C > /somefile"

So you got that right, yes.

OK, I can't reproduce it. Sending A and C completes instantly with --no-data, and completes in the same time with a full send/receive. In my case I used a 4.9G ISO.

I can't think of what local difference accounts for what you're seeing. There is really nothing special about reflinks. The extent and csum data are identical to the original file's, and that's the bulk of the metadata for a given file.

What I can tell you is that the developers usually want to see sysrq+w whenever there are blocked tasks.
https://fedoraproject.org/wiki/QA/Sysrq

You'll want to enable all sysrq functions. And next you'll want three ssh shells:

1. sudo journalctl -fk
2. sudo -i to become root, then type echo w > /proc/sysrq-trigger but do not hit return yet
3. sudo btrfs send... to reproduce the problem.

Basically the thing is gonna hang soon after you reproduce the problem, so you want to get to shell #2 and just hit return rather than dealing with long delays typing that echo command out. And the journal command is so your local terminal captures the sysrq output, because you're gonna kill the VM instead of waiting it out. I have no idea how to read these things, but someone might pick up this thread and have some idea why these tasks are hanging.

> Unfortunately I don't have any way to connect the drive to a SATA port directly, but I tried to switch out as much of the used setup as possible (all changes active at the same time):
> - I got the original (not the clone) HDD out of the enclosure and used this adapter to connect it: https://www.amazon.de/DIGITUS-Adapterkabel-40pol-480Mbps-schwarz/dp/B007X86VZK
> - I used a different notebook
> - I ran the test natively on that notebook (instead of from VirtualBox. I used VirtualBox for most of the tests, as I have to force-poweroff the PC every time the btrfs-send hangs, since it is not killable)

This problem only happens in VirtualBox? Or does it happen on baremetal also? And we've established it happens with two different source (send) devices, which means two different Btrfs volumes.

All I can say is you need to keep changing things up, process of elimination. Rather tedious. Maybe you could try downloading a Fedora 28 ISO, make a boot stick out of it, and try to reproduce with the same drives. At least that's an easy way to isolate the OS from the equation.

-- Chris Murphy
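[Editor's note: the three-shell capture procedure described above can be sketched as follows. This is a hedged sketch, not the reporter's exact commands; /path/to/snap is a placeholder, and the echo lines must run as root.]

```shell
# Sketch of the three-shell sysrq capture procedure; /path/to/snap is a
# placeholder. Only the first line runs here; the rest are per-shell notes.
cat /proc/sys/kernel/sysrq                    # current sysrq mask (1 = all functions enabled)
# Shell 1 (capture):  journalctl -fk
# Shell 2 (as root):  echo 1 > /proc/sys/kernel/sysrq
#                     echo w > /proc/sysrq-trigger   # press return only once the send hangs
# Shell 3 (repro):    btrfs send /path/to/snap > /dev/null
```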
Re: btrfs send hung in pipe_wait
List of steps:
- 3.8G iso lays in read-only subvol A
- I create subvol B and reflink-copy the iso into it.
- I create a read-only snapshot C of B
- I "btrfs send --no-data C > /somefile"

So you got that right, yes.

Unfortunately I don't have any way to connect the drive to a SATA port directly, but I tried to switch out as much of the used setup as possible (all changes active at the same time):
- I got the original (not the clone) HDD out of the enclosure and used this adapter to connect it: https://www.amazon.de/DIGITUS-Adapterkabel-40pol-480Mbps-schwarz/dp/B007X86VZK
- I used a different notebook
- I ran the test natively on that notebook (instead of from VirtualBox. I used VirtualBox for most of the tests, as I have to force-poweroff the PC every time the btrfs-send hangs, since it is not killable)
- That notebook runs Manjaro with "Linux asus 4.14.67-1-MANJARO #1 SMP PREEMPT Fri Aug 24 16:33:26 UTC 2018 x86_64 GNU/Linux"

Same results. btrfs-send freezes.

On Fri, Sep 7, 2018 at 17:44, Chris Murphy wrote:
>
> On Fri, Sep 7, 2018 at 6:47 AM, Stefan Loewen wrote:
> > Well... It seems it's not the hardware.
> > I ran a long SMART check which ran through without errors, and the reallocation count is still 0.
>
> That only checks the drive; it's an internal test. It doesn't check anything else, including connections.
>
> Also, you do have a log with a read error and a sector LBA reported. So there is a hardware issue; it could just be transient.
>
> > So I used clonezilla (partclone.btrfs) to mirror the drive to another drive (same model).
> > Everything copied over just fine. No I/O error in dmesg.
> >
> > The new disk shows the same behavior.
>
> So now I'm suspicious of USB behavior. Like I said earlier, when I've got USB-enclosed drives connected to my NUC, regardless of file system, I routinely get hangs and USB resets. I have to connect all of my USB-enclosed drives to a good USB hub, or I have problems.
>
> > So I created another subvolume, reflinked stuff over and found that it is enough to reflink one file, create a read-only snapshot and try to btrfs-send that. It's not happening with every file, but there are definitely multiple different files. The one I tested with is a 3.8GB ISO file.
> > Even better:
> > 'btrfs send --no-data snap-one > /dev/null'
> > (snap-one containing just one iso file) hangs as well.
>
> Do you have a list of steps to make this clear? It sounds like first you copy a 3.8G ISO file to one subvolume, then reflink-copy it into another subvolume, then snapshot that 2nd subvolume, and try to send the snapshot? But I want to be clear.
>
> I've got piles of reflinked files in snapshots and they send OK, although like I said, I do sometimes get a 15-30 second hang during sends.
>
> > Still dmesg shows no IO errors, only "INFO: task btrfs-transacti:541 blocked for more than 120 seconds." with an associated call trace.
> > btrfs-send reads some MB in the beginning, writes a few bytes and then hangs without further IO.
> >
> > Copying the same file without --reflink, snapshotting and sending works without problems.
> >
> > I guess that pretty much eliminates bad sectors and points towards some problem with reflinks / btrfs metadata.
>
> That's pretty weird. I'll keep trying and see if I hit this. What happens if you downgrade to an older kernel? Either 4.14 or 4.17, or both. The send code is mainly in the kernel, whereas the receive code is mainly in the user-space tools, so for this testing you don't need to downgrade the user-space tools. If there's a bug here, I expect it's in the kernel.
>
> -- Chris Murphy
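[Editor's note: condensed, the reproduction recipe discussed above amounts to something like the following sketch. /mnt/pool, subvol-A and the iso name are placeholders, not the reporter's actual paths; it must run on a btrfs mount, and the final send is the step that hangs on the affected filesystem.]

```shell
# Hypothetical repro sketch of the steps above (names are placeholders).
cd /mnt/pool
btrfs subvolume create subvol-B
cp --reflink=always subvol-A/some.iso subvol-B/   # share extents, copy no data
btrfs subvolume snapshot -r subvol-B snap-C       # send requires a read-only snapshot
btrfs send --no-data snap-C > /dev/null           # hangs on the affected filesystem
```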
Re: btrfs send hung in pipe_wait
On Fri, Sep 7, 2018 at 6:47 AM, Stefan Loewen wrote:
> Well... It seems it's not the hardware.
> I ran a long SMART check which ran through without errors, and the reallocation count is still 0.

That only checks the drive; it's an internal test. It doesn't check anything else, including connections.

Also, you do have a log with a read error and a sector LBA reported. So there is a hardware issue; it could just be transient.

> So I used clonezilla (partclone.btrfs) to mirror the drive to another drive (same model).
> Everything copied over just fine. No I/O error in dmesg.
>
> The new disk shows the same behavior.

So now I'm suspicious of USB behavior. Like I said earlier, when I've got USB-enclosed drives connected to my NUC, regardless of file system, I routinely get hangs and USB resets. I have to connect all of my USB-enclosed drives to a good USB hub, or I have problems.

> So I created another subvolume, reflinked stuff over and found that it is enough to reflink one file, create a read-only snapshot and try to btrfs-send that. It's not happening with every file, but there are definitely multiple different files. The one I tested with is a 3.8GB ISO file.
> Even better:
> 'btrfs send --no-data snap-one > /dev/null'
> (snap-one containing just one iso file) hangs as well.

Do you have a list of steps to make this clear? It sounds like first you copy a 3.8G ISO file to one subvolume, then reflink-copy it into another subvolume, then snapshot that 2nd subvolume, and try to send the snapshot? But I want to be clear.

I've got piles of reflinked files in snapshots and they send OK, although like I said, I do sometimes get a 15-30 second hang during sends.

> Still dmesg shows no IO errors, only "INFO: task btrfs-transacti:541 blocked for more than 120 seconds." with an associated call trace.
> btrfs-send reads some MB in the beginning, writes a few bytes and then hangs without further IO.
>
> Copying the same file without --reflink, snapshotting and sending works without problems.
>
> I guess that pretty much eliminates bad sectors and points towards some problem with reflinks / btrfs metadata.

That's pretty weird. I'll keep trying and see if I hit this. What happens if you downgrade to an older kernel? Either 4.14 or 4.17, or both. The send code is mainly in the kernel, whereas the receive code is mainly in the user-space tools, so for this testing you don't need to downgrade the user-space tools. If there's a bug here, I expect it's in the kernel.

-- Chris Murphy
Re: btrfs send hung in pipe_wait
Well... It seems it's not the hardware.
I ran a long SMART check which ran through without errors, and the reallocation count is still 0.

So I used clonezilla (partclone.btrfs) to mirror the drive to another drive (same model). Everything copied over just fine. No I/O error in dmesg.

The new disk shows the same behavior.

So I created another subvolume, reflinked stuff over and found that it is enough to reflink one file, create a read-only snapshot and try to btrfs-send that. It's not happening with every file, but there are definitely multiple different files. The one I tested with is a 3.8GB ISO file.
Even better:
'btrfs send --no-data snap-one > /dev/null'
(snap-one containing just one iso file) hangs as well.

Still dmesg shows no IO errors, only "INFO: task btrfs-transacti:541 blocked for more than 120 seconds." with an associated call trace. btrfs-send reads some MB in the beginning, writes a few bytes and then hangs without further IO.

Copying the same file without --reflink, snapshotting and sending works without problems.

I guess that pretty much eliminates bad sectors and points towards some problem with reflinks / btrfs metadata.

Btw.: Thanks for taking so much time to help me find the problem here, Chris. Very much appreciated!

On Fri, Sep 7, 2018 at 05:29, Chris Murphy wrote:
>
> On Thu, Sep 6, 2018 at 2:16 PM, Stefan Loewen wrote:
>
> > Data,single: Size:695.01GiB, Used:653.69GiB
> > /dev/sdb1 695.01GiB
> > Metadata,DUP: Size:4.00GiB, Used:2.25GiB
> > /dev/sdb1 8.00GiB
> > System,DUP: Size:40.00MiB, Used:96.00KiB
> >
> > Does that mean Metadata is duplicated?
>
> Yes. Single copy for data. Duplicate for metadata+system, and there are no single chunks for metadata/system.
>
> > Ok so to summarize and see if I understood you correctly:
> > There are bad sectors on disk. Running an extended selftest (smartctl -t long) could find those and replace them with spare sectors.
>
> More likely, if it finds a persistently failing sector, it will just record the first failing sector LBA in its log and then abort. You'll see this info with 'smartctl -a' or with -x.
>
> It is possible to resume the test using the selective option and picking a 4K-aligned 512-byte LBA value after the 4K sector with the defect. Just because only one is reported in dmesg doesn't mean there isn't another one.
>
> It's unlikely the long test is going to actually fix anything; it'll just give you more ammunition for getting a likely under-warranty device replaced, because it really shouldn't have any issues at this age.
>
> > If it does not, I can try calculating the physical (4K) sector number and write to that to make the drive notice and mark the bad sector.
> > Is there a way to find out which file I will be writing to beforehand?
>
> I'm not sure how to do it easily.
>
> > Or is it easier to just write to the sector and then wait for scrub to tell me (and the sector is broken anyways)?
>
> If it's a persistent read error, then it's lost. So you might as well overwrite it. If it's data, scrub will tell you what file is corrupted (and restore can help you recover the whole file; of course it'll have a 4K hole of zeros in it). If it's metadata, Btrfs will fix up the 4K hole with duplicate metadata.
>
> Gotcha is to make certain you've got the right LBA to write to. You can use dd to test this, by reading the suspect bad sector; if you've got the right one, you'll get an I/O error in user space and dmesg will have a message like before with a sector value. Use the dd skip= flag for reading, but make *sure* you use seek= when writing *and* make sure you always use bs=4096 count=1 so that if you make a mistake you limit the damage, haha.
>
> > For the drive: Not under warranty anymore. It's an external HDD that I had lying around for years, mostly unused. Now I wanted to use it as part of my small DIY NAS.
>
> Gotcha. Well, you can read up on smartctl and smartd, set it up for regular extended tests, and keep an eye on rapidly changing values. It might give you a 50/50 chance of an early heads-up before it dies.
>
> I've got an old Hitachi/Apple laptop drive that years ago developed multiple bad sectors in different zones of the drive. They got remapped, and I haven't had a problem with that drive since. *shrug* And in fact I did get a discrete error message from the drive for one of those, and Btrfs overwrote that bad sector with a good copy (it's in a raid1 volume), so working as designed, I guess.
>
> Since you didn't get a fix-up message from Btrfs, either the whole thing just got confused with hanging tasks, or it's possible it's a data block.
>
> -- Chris Murphy
Re: btrfs send hung in pipe_wait
On Thu, Sep 6, 2018 at 2:16 PM, Stefan Loewen wrote:
> Data,single: Size:695.01GiB, Used:653.69GiB
> /dev/sdb1 695.01GiB
> Metadata,DUP: Size:4.00GiB, Used:2.25GiB
> /dev/sdb1 8.00GiB
> System,DUP: Size:40.00MiB, Used:96.00KiB
>
> Does that mean Metadata is duplicated?

Yes. Single copy for data. Duplicate for metadata+system, and there are no single chunks for metadata/system.

> Ok so to summarize and see if I understood you correctly:
> There are bad sectors on disk. Running an extended selftest (smartctl -t long) could find those and replace them with spare sectors.

More likely, if it finds a persistently failing sector, it will just record the first failing sector LBA in its log and then abort. You'll see this info with 'smartctl -a' or with -x.

It is possible to resume the test using the selective option and picking a 4K-aligned 512-byte LBA value after the 4K sector with the defect. Just because only one is reported in dmesg doesn't mean there isn't another one.

It's unlikely the long test is going to actually fix anything; it'll just give you more ammunition for getting a likely under-warranty device replaced, because it really shouldn't have any issues at this age.

> If it does not, I can try calculating the physical (4K) sector number and write to that to make the drive notice and mark the bad sector.
> Is there a way to find out which file I will be writing to beforehand?

I'm not sure how to do it easily.

> Or is it easier to just write to the sector and then wait for scrub to tell me (and the sector is broken anyways)?

If it's a persistent read error, then it's lost. So you might as well overwrite it. If it's data, scrub will tell you what file is corrupted (and restore can help you recover the whole file; of course it'll have a 4K hole of zeros in it). If it's metadata, Btrfs will fix up the 4K hole with duplicate metadata.

Gotcha is to make certain you've got the right LBA to write to. You can use dd to test this, by reading the suspect bad sector; if you've got the right one, you'll get an I/O error in user space and dmesg will have a message like before with a sector value. Use the dd skip= flag for reading, but make *sure* you use seek= when writing *and* make sure you always use bs=4096 count=1 so that if you make a mistake you limit the damage, haha.

> For the drive: Not under warranty anymore. It's an external HDD that I had lying around for years, mostly unused. Now I wanted to use it as part of my small DIY NAS.

Gotcha. Well, you can read up on smartctl and smartd, set it up for regular extended tests, and keep an eye on rapidly changing values. It might give you a 50/50 chance of an early heads-up before it dies.

I've got an old Hitachi/Apple laptop drive that years ago developed multiple bad sectors in different zones of the drive. They got remapped, and I haven't had a problem with that drive since. *shrug* And in fact I did get a discrete error message from the drive for one of those, and Btrfs overwrote that bad sector with a good copy (it's in a raid1 volume), so working as designed, I guess.

Since you didn't get a fix-up message from Btrfs, either the whole thing just got confused with hanging tasks, or it's possible it's a data block.

-- Chris Murphy
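[Editor's note: the skip=/seek= distinction above can be demonstrated safely against a throwaway scratch file standing in for /dev/sdb; the file name is generated, and nothing here touches a real disk. Never point the write form at a device unless you are ready to lose those 4096 bytes.]

```shell
# Demonstrate dd's skip= (input offset) vs seek= (output offset) on a
# throwaway 16 KiB scratch file standing in for the disk.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=4096 count=4 status=none         # 4 "4K sectors"
# seek= positions within the OUTPUT: overwrite only 4K sector number 2
dd if=/dev/urandom of="$img" bs=4096 seek=2 count=1 conv=notrunc status=none
# skip= positions within the INPUT: read back only that one 4K sector
dd if="$img" bs=4096 skip=2 count=1 status=none | wc -c       # prints 4096
rm -f "$img"
```

Note conv=notrunc in the write: without it, dd truncates the output file after the written block (irrelevant for block devices, but it makes the file demo faithful).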
Re: btrfs send hung in pipe_wait
[root@archlinux @data]# btrfs fi us /mnt/intenso_white/
Overall:
    Device size:         911.51GiB
    Device allocated:    703.09GiB
    Device unallocated:  208.43GiB
    Device missing:      0.00B
    Used:                658.19GiB
    Free (estimated):    249.75GiB (min: 145.53GiB)
    Data ratio:          1.00
    Metadata ratio:      2.00
    Global reserve:      512.00MiB (used: 0.00B)

Data,single: Size:695.01GiB, Used:653.69GiB
   /dev/sdb1  695.01GiB

Metadata,DUP: Size:4.00GiB, Used:2.25GiB
   /dev/sdb1    8.00GiB

System,DUP: Size:40.00MiB, Used:96.00KiB
   /dev/sdb1   80.00MiB

Unallocated:
   /dev/sdb1  208.43GiB

Does that mean Metadata is duplicated?

Ok, so to summarize and see if I understood you correctly:
There are bad sectors on disk. Running an extended selftest (smartctl -t long) could find those and replace them with spare sectors. If it does not, I can try calculating the physical (4K) sector number and write to that to make the drive notice and mark the bad sector. Is there a way to find out which file I will be writing to beforehand? Or is it easier to just write to the sector and then wait for scrub to tell me (and the sector is broken anyways)?

For the drive: Not under warranty anymore. It's an external HDD that I had lying around for years, mostly unused. Now I wanted to use it as part of my small DIY NAS.

On 9/6/18 9:58 PM, Chris Murphy wrote:
> On Thu, Sep 6, 2018 at 12:36 PM, Stefan Loewen wrote:
> > Output of the commands is attached.
>
> fdisk:
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
>
> smart:
> Sector Sizes: 512 bytes logical, 4096 bytes physical
>
> So clearly the case is lying about the actual physical sector size of the drive. It's very common. But it means that to fix the bad sector by writing to it, it must be a 4K write. A 512-byte write to the reported LBA will fail because it is a RMW, and the read will fail. So if you write to that sector, you'll get a read failure. Kinda confusing.
>
> So you can convert the LBA to a 4K value, and use dd to write to that "4K LBA" using bs=4096 and a count of 1, but only when you're ready to lose all 4096 bytes in that sector. If it's data, it's fine. It's the loss of one file, and scrub will find and report the path to the file so you know what was affected. If it's metadata, it could be a problem.
>
> What do you get for 'btrfs fi us <mountpoint>' for this volume? I'm wondering if DUP metadata is being used across the board with no single chunks. If so, then you can zero that sector, and Btrfs will detect the missing metadata in that chunk on scrub and fix it up from a copy. But if you only have single-copy metadata, it just depends what's on that block as to how recoverable or repairable this is.
>
> 195 Hardware_ECC_Recovered   -O-RCK  100 100 000  -  0
> 196 Reallocated_Event_Count  -O--CK  252 252 000  -  0
> 197 Current_Pending_Sector   -O--CK  252 252 000  -  0
> 198 Offline_Uncorrectable    ----CK  252 252 000  -  0
>
> Interesting, no complaints there. Unexpected.
>
>  11 Calibration_Retry_Count  -O--CK  100 100 000  -  8
> 200 Multi_Zone_Error_Rate    -O-R-K  100 100 000  -  31
>
> https://kb.acronis.com/content/9136
>
> This is a low-hour device, probably still under warranty? I'd get it swapped out. If you want more ammunition for arguing in favor of a swap-out under warranty, you could do
>
> smartctl -t long /dev/sdb
>
> That will take just under 4 hours to run (you can use the drive in the meantime, but it'll take a bit longer); and then after that
>
> smartctl -x /dev/sdb
>
> and see if it's found a bad sector or updated any of those smart values for the worse, in particular the offline values.
>
> SCT (Get) Error Recovery Control command failed
>
> OK, so not configurable; it is whatever it is and we don't know what that is. Probably one of the really long recoveries.
>
> > The broken-sector theory sounds plausible and is compatible with my new findings:
> > I suspected the problem to be in one specific directory, let's call it "broken_dir".
> > I created a new subvolume and copied broken_dir over.
> > - If I copied it with cp --reflink, made a snapshot and tried to btrfs-send that, it hung.
> > - If I rsynced broken_dir over, I could snapshot and btrfs-send without a problem.
>
> Yeah, I'm not sure what it is, maybe a data block.
>
> > But shouldn't btrfs scrub or check find such errors?
>
> Nope. Btrfs expects the drive to complete the read command, but always second-guesses the content of the read by comparing it to checksums. So if the drive just supplied corrupt data, Btrfs would detect that and discretely report it, and if there's a good copy it would self-heal. But it can't do that here, because the drive or USB bus also seems to hang in such a way that a bunch of tasks are also hung, and none of them are getting a clear pass/fail for the read. It just hangs. Arguably the device or the link should not hang. So I'm still wondering if something else is going on.
Re: btrfs send hung in pipe_wait
On Thu, Sep 6, 2018 at 12:36 PM, Stefan Loewen wrote: > Output of the commands is attached. fdisk Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes smart Sector Sizes: 512 bytes logical, 4096 bytes physical So clearly the case is lying about the actual physical sector size of the drive. It's very common. But it means to fix the bad sector by writing to it, must be a 4K write. A 512 byte write to the reported LBA, will fail because it is a RMW, and the read will fail. So if you write to that sector, you'll get a read failure. Kinda confusing. So you can convert the LBA to a 4K value, and use dd to write to that "4K LBA" using bs=4096 and a count of 1 but only when you're ready to lose all 4096 bytes in that sector. If it's data, it's fine. It's the loss of one file, and scrub will find and report path to file so you know what was affected. If it's metadata, it could be a problem. What do you get for 'btrfs fi us ' for this volume? I'm wondering if DUP metadata is being used across the board with no single chunks. If so, then you can zero that sector, and Btrfs will detect the missing metadata in that chunk on scrub, and fix it up from a copy. But if you only have single copy metadata, it just depends what's on that block as to how recoverable or repairable this is. 195 Hardware_ECC_Recovered -O-RCK 100 100 000-0 196 Reallocated_Event_Count -O--CK 252 252 000-0 197 Current_Pending_Sector -O--CK 252 252 000-0 198 Offline_Uncorrectable CK 252 252 000-0 Interesting, no complaints there. Unexpected. 11 Calibration_Retry_Count -O--CK 100 100 000-8 200 Multi_Zone_Error_Rate -O-R-K 100 100 000-31 https://kb.acronis.com/content/9136 This is a low hour device, probably still under warranty? I'd get it swapped out. 
If you want more ammunition for arguing in favor of a swap out under warranty, you could do

smartctl -t long /dev/sdb

That will take just under 4 hours to run (you can use the drive in the meantime, but the test will take a bit longer); and then after that

smartctl -x /dev/sdb

And see if it's found a bad sector or updated any of those SMART values for the worse, in particular the offline values.

SCT (Get) Error Recovery Control command failed

OK so not configurable, it is whatever it is and we don't know what that is. Probably one of the really long recoveries.

> > The broken-sector-theory sounds plausible and is compatible with my new
> > findings:
> > I suspected the problem to be in one specific directory, let's call it
> > "broken_dir".
> > I created a new subvolume and copied broken_dir over.
> > - If I copied it with cp --reflink, made a snapshot and tried to btrfs-send
> > that, it hung
> > - If I rsynced broken_dir over I could snapshot and btrfs-send without a
> > problem.

Yeah I'm not sure what it is, maybe a data block.

> > But shouldn't btrfs scrub or check find such errors?

Nope. Btrfs expects the drive to complete the read command, but always second-guesses the content of the read by comparing it to checksums. So if the drive just supplied corrupt data, Btrfs would detect that and discretely report it, and if there's a good copy it would self-heal. But it can't do that here, because the drive or USB bus also seems to hang in such a way that a bunch of tasks are also hung, and none of them are getting a clear pass/fail for the read. It just hangs.

Arguably the device or the link should not hang. So I'm still wondering if something else is going on, but this is just the most obvious first problem, and maybe it's being complicated by another problem we haven't figured out yet. Anyway, once this problem is solved, it'll become clear if there are additional problems or not.
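The before/after comparison could be scripted along these lines. This is a hedged sketch working on a saved report rather than a live device: the file name is made up here, and the sample lines just reuse the attribute values quoted earlier in the thread.

```shell
# Sketch: after the long self-test finishes, save a report with
#   smartctl -x /dev/sdX > smart-after.txt
# and filter the attributes whose raw values should stay at 0 on a
# healthy drive. The sample report below reuses values from this thread.
cat > smart-after.txt <<'EOF'
195 Hardware_ECC_Recovered  -O-RCK  100  100  000  -  0
196 Reallocated_Event_Count -O--CK  252  252  000  -  0
197 Current_Pending_Sector  -O--CK  252  252  000  -  0
198 Offline_Uncorrectable   ----CK  252  252  000  -  0
EOF
watchlist='Current_Pending_Sector|Offline_Uncorrectable|Reallocated_Event_Count'
grep -E "$watchlist" smart-after.txt
```

Any raw value in that filtered output that grows after the self-test is the swap-out ammunition.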
In my case, I often get usb reset errors when I directly connect USB 3.0 drives to my Intel NUC, but I don't ever get them when plugging the drive into a dyconn hub. So if you don't already have a hub in between the drive and the computer, it might be worth considering. Basically the hub is going to read and completely rewrite the whole stream that goes through it (in both directions). -- Chris Murphy
Re: btrfs send hung in pipe_wait
Output of the commands is attached.

The broken-sector-theory sounds plausible and is compatible with my new findings:
I suspected the problem to be in one specific directory, let's call it "broken_dir".
I created a new subvolume and copied broken_dir over.
- If I copied it with cp --reflink, made a snapshot and tried to btrfs-send that, it hung
- If I rsynced broken_dir over I could snapshot and btrfs-send without a problem.

But shouldn't btrfs scrub or check find such errors?

On 9/6/18 8:16 PM, Chris Murphy wrote:
> OK you've got a different problem.
>
> [ 186.898756] sd 2:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [ 186.898762] sd 2:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 15 26 a0 d0 00 08 00 00
> [ 186.898764] print_req_error: I/O error, dev sdb, sector 354853072
> [ 187.109641] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
> [ 187.345245] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
> [ 187.657844] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
> [ 187.851336] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
> [ 188.026882] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
> [ 188.215881] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
> [ 188.247028] sd 2:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [ 188.247041] sd 2:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 15 26 a8 d0 00 08 00 00
> [ 188.247048] print_req_error: I/O error, dev sdb, sector 354855120
>
> This is a read error for a specific sector. So your drive has media problems. And I think that's the instigating problem here, with a bunch of other tasks hung because they depend on one or more reads that never complete. But weirdly there also isn't any kind of libata reset. At least on SATA, by default we see a link reset after a command has not returned in 30 seconds. That reset would totally clear the drive's command queue, and then things either can recover or barf. But in your case, neither happens and it just sits there with hung tasks.
>
> [ 189.350360] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>
> And that's the last we really see from Btrfs. After that, it's all just hung task traces, and they're rather unsurprising to me.
>
> Drives in USB cases add a whole bunch of complicating factors for troubleshooting and repair, including often masking the actual logical and physical sector size, the min and max IO size, alignment offset, and all kinds of things. They can have all sorts of bugs.
>
> And I'm also not totally certain about the relationship between the usb reset messages and the bad sector. As far as I know the only way we can get a sector LBA expressly noted in dmesg along with the failed Read(10) command is if the drive has reported that discrete error back to libata with sense information. So I'm accepting that as a reliable error, rather than it being something like a cable. But the reset messages could possibly be something else in addition to that.
>
> Anyway, the central issue is sector 354855120 is having problems. I can't tell from the trace if it's transient or persistent. Maybe if it's transient, that would explain how you sometimes get send to start working again briefly but then it reverts to hanging.
>
> What do you get for:
>
> fdisk -l /dev/sdb
> smartctl -x /dev/sdb
> smartctl -l sct erc /dev/sdb
>
> Those are all read-only commands, nothing is written or changed.
[root@archlinux ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 931.5 GiB, 1000204140544 bytes, 1953523712 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x1cc21bd2

Device     Boot      Start        End    Sectors  Size Id Type
/dev/sdb1             2048 1933593750 1933591703  922G 83 Linux
/dev/sdb2       1933610176 1953525167   19914992  9.5G 83 Linux

[root@archlinux ~]# smartctl -x /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.5-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S2RXJ9FCB07612
LU WWN Device Id: 5 0004cf 208d24759
Firmware Version: 2AR10002
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Sep 6 18:23:57 2018 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 254
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write
Re: btrfs send hung in pipe_wait
On Thu, Sep 6, 2018 at 10:03 AM, Stefan Löwen wrote:
> I have one subvolume (rw) and 2 snapshots (ro) of it.
>
> I just tested 'btrfs send > /dev/null' and that also shows no IO
> after a while but also no significant CPU usage.
> During this I tried 'ls' on the source subvolume and it hangs as well.
> dmesg has some interesting messages I think (see attached dmesg.log)

OK you've got a different problem.

[ 186.898756] sd 2:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 186.898762] sd 2:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 15 26 a0 d0 00 08 00 00
[ 186.898764] print_req_error: I/O error, dev sdb, sector 354853072
[ 187.109641] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 187.345245] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 187.657844] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 187.851336] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 188.026882] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 188.215881] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 188.247028] sd 2:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 188.247041] sd 2:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 15 26 a8 d0 00 08 00 00
[ 188.247048] print_req_error: I/O error, dev sdb, sector 354855120

This is a read error for a specific sector. So your drive has media problems. And I think that's the instigating problem here, with a bunch of other tasks hung because they depend on one or more reads that never complete. But weirdly there also isn't any kind of libata reset. At least on SATA, by default we see a link reset after a command has not returned in 30 seconds. That reset would totally clear the drive's command queue, and then things either can recover or barf. But in your case, neither happens and it just sits there with hung tasks.
[ 189.350360] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0

And that's the last we really see from Btrfs. After that, it's all just hung task traces, and they're rather unsurprising to me.

Drives in USB cases add a whole bunch of complicating factors for troubleshooting and repair, including often masking the actual logical and physical sector size, the min and max IO size, alignment offset, and all kinds of things. They can have all sorts of bugs.

And I'm also not totally certain about the relationship between the usb reset messages and the bad sector. As far as I know the only way we can get a sector LBA expressly noted in dmesg along with the failed Read(10) command is if the drive has reported that discrete error back to libata with sense information. So I'm accepting that as a reliable error, rather than it being something like a cable. But the reset messages could possibly be something else in addition to that.

Anyway, the central issue is sector 354855120 is having problems. I can't tell from the trace if it's transient or persistent. Maybe if it's transient, that would explain how you sometimes get send to start working again briefly but then it reverts to hanging.

What do you get for:

fdisk -l /dev/sdb
smartctl -x /dev/sdb
smartctl -l sct erc /dev/sdb

Those are all read-only commands, nothing is written or changed.

-- Chris Murphy
Re: btrfs send hung in pipe_wait
I have one subvolume (rw) and 2 snapshots (ro) of it.

I just tested 'btrfs send > /dev/null' and that also shows no IO after a while but also no significant CPU usage.
During this I tried 'ls' on the source subvolume and it hangs as well.
dmesg has some interesting messages I think (see attached dmesg.log)

On 9/6/18 5:48 PM, Chris Murphy wrote:
> On Thu, Sep 6, 2018 at 9:04 AM, Stefan Loewen wrote:
>> Update:
>> It seems like btrfs-send is not completely hung. It somewhat regularly
>> wakes up every hour to transfer a few bytes. I noticed this via a
>> periodic 'ls -l' on the snapshot file. These are the last outputs
>> (uniq'ed):
>>
>> -rw--- 1 root root 1492797759 Sep 6 08:44 intenso_white.snapshot
>> -rw--- 1 root root 1493087856 Sep 6 09:44 intenso_white.snapshot
>> -rw--- 1 root root 1773825308 Sep 6 10:44 intenso_white.snapshot
>> -rw--- 1 root root 1773976853 Sep 6 11:58 intenso_white.snapshot
>> -rw--- 1 root root 1774122301 Sep 6 12:59 intenso_white.snapshot
>> -rw--- 1 root root 1774274264 Sep 6 13:58 intenso_white.snapshot
>> -rw--- 1 root root 1774435235 Sep 6 14:57 intenso_white.snapshot
>>
>> I also monitor the /proc/3022/task/*/stack files with 'tail -f' (I
>> have no idea if this is useful) but there are no changes, even during
>> the short wakeups.
>
> I have a sort of "me too" here. I definitely see btrfs send just hang for no apparent reason, but in my case it's for maybe 15-30 seconds. Not an hour. Looking at top and iotop at the same time as the LED lights on the drives, there's definitely zero activity happening. I can make things happen during this time - like I can read a file or save a file from/to any location including the send source or receive destination. It really just behaves as if the send thread is saying "OK I'm gonna nap now, back in a bit" and then it is. So what I end up with on drives with a minimum read-write of 80M/s, is a send/receive that's getting me a net of about 30M/s.
>
> I have around 100 snapshots on the source device. How many total snapshots do you have on your source? That does appear to affect performance for some things, including send/receive.

[dmesg.log attachment: boot messages from the ArchLinux VirtualBox VM (Linux 4.18.5-arch1-1-ARCH, innotek GmbH VirtualBox BIOS), truncated]
Re: btrfs send hung in pipe_wait
On Thu, Sep 6, 2018 at 9:04 AM, Stefan Loewen wrote:
> Update:
> It seems like btrfs-send is not completely hung. It somewhat regularly
> wakes up every hour to transfer a few bytes. I noticed this via a
> periodic 'ls -l' on the snapshot file. These are the last outputs
> (uniq'ed):
>
> -rw--- 1 root root 1492797759 Sep 6 08:44 intenso_white.snapshot
> -rw--- 1 root root 1493087856 Sep 6 09:44 intenso_white.snapshot
> -rw--- 1 root root 1773825308 Sep 6 10:44 intenso_white.snapshot
> -rw--- 1 root root 1773976853 Sep 6 11:58 intenso_white.snapshot
> -rw--- 1 root root 1774122301 Sep 6 12:59 intenso_white.snapshot
> -rw--- 1 root root 1774274264 Sep 6 13:58 intenso_white.snapshot
> -rw--- 1 root root 1774435235 Sep 6 14:57 intenso_white.snapshot
>
> I also monitor the /proc/3022/task/*/stack files with 'tail -f' (I
> have no idea if this is useful) but there are no changes, even during
> the short wakeups.

I have a sort of "me too" here. I definitely see btrfs send just hang for no apparent reason, but in my case it's for maybe 15-30 seconds. Not an hour. Looking at top and iotop at the same time as the LED lights on the drives, there's definitely zero activity happening. I can make things happen during this time - like I can read a file or save a file from/to any location including the send source or receive destination. It really just behaves as if the send thread is saying "OK I'm gonna nap now, back in a bit" and then it is. So what I end up with on drives with a minimum read-write of 80M/s, is a send/receive that's getting me a net of about 30M/s.

I have around 100 snapshots on the source device. How many total snapshots do you have on your source? That does appear to affect performance for some things, including send/receive.

-- Chris Murphy
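The "periodic 'ls -l', uniq'ed" progress check quoted above can be approximated with a small loop that only prints when the stream file actually grows. This is a hypothetical sketch: the file name is a placeholder, and it runs three quick passes here instead of an endless sleep loop.

```shell
# Print a timestamped size line only when the send target file grows.
# SNAP is a placeholder; in practice you'd loop forever with a sleep,
# e.g.: while true; do ...; sleep 3600; done
SNAP=intenso_white.snapshot
last=-1
for i in 1 2 3; do
    # stat fails if the file doesn't exist yet; treat that as size 0
    size=$(stat -c %s "$SNAP" 2>/dev/null || echo 0)
    if [ "$size" != "$last" ]; then
        printf '%s  %s bytes\n' "$(date +%T)" "$size"
        last=$size
    fi
done
```

Piping a plain periodic `ls -l` through `uniq` as Stefan did achieves the same effect with less typing; the loop just makes the dedup-on-change behavior explicit.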
Re: btrfs send hung in pipe_wait
Update:
It seems like btrfs-send is not completely hung. It somewhat regularly wakes up every hour to transfer a few bytes. I noticed this via a periodic 'ls -l' on the snapshot file. These are the last outputs (uniq'ed):

-rw--- 1 root root 1492797759 Sep 6 08:44 intenso_white.snapshot
-rw--- 1 root root 1493087856 Sep 6 09:44 intenso_white.snapshot
-rw--- 1 root root 1773825308 Sep 6 10:44 intenso_white.snapshot
-rw--- 1 root root 1773976853 Sep 6 11:58 intenso_white.snapshot
-rw--- 1 root root 1774122301 Sep 6 12:59 intenso_white.snapshot
-rw--- 1 root root 1774274264 Sep 6 13:58 intenso_white.snapshot
-rw--- 1 root root 1774435235 Sep 6 14:57 intenso_white.snapshot

I also monitor the /proc/3022/task/*/stack files with 'tail -f' (I have no idea if this is useful) but there are no changes, even during the short wakeups.

On Thu, Sep 6, 2018 at 11:22 AM, Stefan Löwen wrote:
>
> Hello linux-btrfs,
>
> I'm trying to clone a subvolume with 'btrfs send' but it always hangs
> for hours.
>
> I tested this on multiple systems. All showed the same result:
> - Manjaro (btrfs-progs v4.17.1; linux v4.18.5-1-MANJARO)
> - Ubuntu 18.04 in VirtualBox (btrfs-progs v4.15.1; linux v4.15.0-33-generic)
> - ArchLinux in VirtualBox (btrfs-progs v4.17.1; linux v4.18.5-arch1-1-ARCH)
> All following logs are from the ArchLinux VM.
>
> To make sure it's not the 'btrfs receive' at fault I tried sending into
> a file using the following command:
> 'strace -o btrfs-send.strace btrfs send -vvv -f intenso_white.snapshot
> /mnt/intenso_white/@data/.snapshots/test-snapshot'
>
> The 'btrfs send' process always copies around 1.2-1.4G of data, then
> stops all disk IO and fully loads one cpu core. btrfs scrub found 0
> errors. Neither did btrfsck. 'btrfs device stats' is all 0.
>
> I would be thankful for all ideas and tips.
>
> Regards
> Stefan
>
> -
>
> The btrfs-send.strace is attached. So is the dmesg.log during the hang.
> Stack traces of the hung process:
> --- /proc/3022/task/3022/stack ---
> [<0>] 0x
> --- /proc/3022/task/3023/stack ---
> [<0>] pipe_wait+0x6c/0xb0
> [<0>] splice_from_pipe_next.part.3+0x24/0xa0
> [<0>] __splice_from_pipe+0x43/0x180
> [<0>] splice_from_pipe+0x5d/0x90
> [<0>] default_file_splice_write+0x15/0x20
> [<0>] __se_sys_splice+0x31b/0x770
> [<0>] do_syscall_64+0x5b/0x170
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0x
>
> [vagrant@archlinux mnt]$ uname -a
> Linux archlinux 4.18.5-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 24 12:48:58
> UTC 2018 x86_64 GNU/Linux
>
> [vagrant@archlinux mnt]$ btrfs --version
> btrfs-progs v4.17.1
>
> [vagrant@archlinux mnt]$ sudo btrfs fi show /dev/sdb1
> Label: 'intenso_white'  uuid: 07bf61ed-7728-4151-a784-c4b840e343ed
>     Total devices 1 FS bytes used 655.82GiB
>     devid 1 size 911.51GiB used 703.09GiB path /dev/sdb1
>
> [vagrant@archlinux mnt]$ sudo btrfs fi df /mnt/intenso_white/
> Data, single: total=695.01GiB, used=653.69GiB
> System, DUP: total=40.00MiB, used=96.00KiB
> Metadata, DUP: total=4.00GiB, used=2.13GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
btrfs send hung in pipe_wait
Hello linux-btrfs,

I'm trying to clone a subvolume with 'btrfs send' but it always hangs for hours.

I tested this on multiple systems. All showed the same result:
- Manjaro (btrfs-progs v4.17.1; linux v4.18.5-1-MANJARO)
- Ubuntu 18.04 in VirtualBox (btrfs-progs v4.15.1; linux v4.15.0-33-generic)
- ArchLinux in VirtualBox (btrfs-progs v4.17.1; linux v4.18.5-arch1-1-ARCH)
All following logs are from the ArchLinux VM.

To make sure it's not the 'btrfs receive' at fault I tried sending into a file using the following command:
'strace -o btrfs-send.strace btrfs send -vvv -f intenso_white.snapshot /mnt/intenso_white/@data/.snapshots/test-snapshot'

The 'btrfs send' process always copies around 1.2-1.4G of data, then stops all disk IO and fully loads one cpu core. btrfs scrub found 0 errors. Neither did btrfsck. 'btrfs device stats' is all 0.

I would be thankful for all ideas and tips.

Regards
Stefan

-

The btrfs-send.strace is attached. So is the dmesg.log during the hang.

Stack traces of the hung process:
--- /proc/3022/task/3022/stack ---
[<0>] 0x
--- /proc/3022/task/3023/stack ---
[<0>] pipe_wait+0x6c/0xb0
[<0>] splice_from_pipe_next.part.3+0x24/0xa0
[<0>] __splice_from_pipe+0x43/0x180
[<0>] splice_from_pipe+0x5d/0x90
[<0>] default_file_splice_write+0x15/0x20
[<0>] __se_sys_splice+0x31b/0x770
[<0>] do_syscall_64+0x5b/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0x

[vagrant@archlinux mnt]$ uname -a
Linux archlinux 4.18.5-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 24 12:48:58 UTC 2018 x86_64 GNU/Linux

[vagrant@archlinux mnt]$ btrfs --version
btrfs-progs v4.17.1

[vagrant@archlinux mnt]$ sudo btrfs fi show /dev/sdb1
Label: 'intenso_white'  uuid: 07bf61ed-7728-4151-a784-c4b840e343ed
    Total devices 1 FS bytes used 655.82GiB
    devid 1 size 911.51GiB used 703.09GiB path /dev/sdb1

[vagrant@archlinux mnt]$ sudo btrfs fi df /mnt/intenso_white/
Data, single: total=695.01GiB, used=653.69GiB
System, DUP: total=40.00MiB, used=96.00KiB
Metadata, DUP: total=4.00GiB, used=2.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

[btrfs-send.strace attachment: execve of 'btrfs send -vvv -f intenso_white.snapshot ...' followed by dynamic-linker activity (ld.so.cache, libuuid, libblkid, libz, liblzo2 being mapped), truncated]