Fwd: btrfs send hung in pipe_wait

2018-09-08 Thread Stefan Loewen
And this one as well.

-- Forwarded message -
From: Chris Murphy 
Date: Fri, Sep 7, 2018 at 23:53
Subject: Re: btrfs send hung in pipe_wait
To: Stefan Loewen 
Cc: Chris Murphy 


On Fri, Sep 7, 2018 at 3:19 PM, Stefan Loewen wrote:
> Now I also tested with Fedora 28 (linux 4.16) from live-usb (so baremetal).
> Same result.
> Thanks for the pointer towards sys requests. sysrq-w is empty, but I
> sent a bunch of other sysrqs to get stacktraces etc. from the kernel.
> Logs are attached.

Needs a dev to take a look at this, or someone with usb or block
device knowledge to see if something unrelated to btrfs is hung up. I
can't parse it.

Using btrfs-progs 4.17.1, what do you get for 'btrfs check
--mode=lowmem ' ?



--
Chris Murphy


Fwd: btrfs send hung in pipe_wait

2018-09-08 Thread Stefan Loewen
Oops. Forgot to CC the mailing list.

-- Forwarded message -
From: Stefan Loewen 
Date: Fri, Sep 7, 2018 at 23:19
Subject: Re: btrfs send hung in pipe_wait
To: Chris Murphy 


No, it does not only happen in VirtualBox. I already tested the following:
- Manjaro baremetal (btrfs-progs v4.17.1; linux v4.18.5 and v4.14.67)
- Ubuntu 18.04 in VirtualBox (btrfs-progs v4.15.1; linux v4.15.0-33-generic)
- ArchLinux in VirtualBox (btrfs-progs v4.17.1; linux v4.18.5-arch1-1-ARCH)
The logs I have posted so far were mostly (iirc all of them) from the
Arch VM.

Now I also tested with Fedora 28 (linux 4.16) from live-usb (so baremetal).
Same result.
Thanks for the pointer towards sys requests. sysrq-w is empty, but I
sent a bunch of other sysrqs to get stacktraces etc. from the kernel.
Logs are attached.
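
The sysrq capture described in this thread can be sketched roughly as
follows. This is only a sketch, not the exact commands that were run:
the function name sysrq_dump and the two overridable path variables are
made up for illustration, and the key list is up to the caller (w dumps
blocked tasks, t dumps all tasks, l shows backtraces of active CPUs).

```shell
#!/bin/sh
# Hypothetical helper: enable all sysrq functions and send the given keys.
# The two paths default to the real kernel interfaces; they are variables
# only so the function can be tried against scratch files first.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

sysrq_dump() {
    # Writing 1 enables every sysrq function.
    echo 1 > "$SYSRQ_ENABLE" || return 1
    # Send each requested key; the kernel writes the result to its log.
    for key in "$@"; do
        echo "$key" > "$SYSRQ_TRIGGER" || return 1
    done
}
```

Run as root, for example "sysrq_dump w t l", while "journalctl -fk" is
following the kernel log in another shell so the output is captured
before the machine has to be powered off.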

To recap:
I copied (reflink) a 3.8G iso file from a read-only snapshot (A) into
a new subvol (B) just to keep things small and manageable.
There's nothing special about this file other than that it happens to
be one of the files that cause the later btrfs send to hang.
Not all files from A do this, but there are definitely several, and
the problem only occurs when the files are reflinked.
I then create a snapshot (C) of B to be able to btrfs-send it.
Then I run "btrfs send snap-C > /somewhere" (--no-data leads to the
same result). That process reads some MB from the source disk, writes
a few bytes to /somewhere, and then just hangs without any further IO.
This is where I issued some sysrqs. The output is attached.
Then I tried killing the btrfs-send with ctrl-c and issued the sysrqs
again. I have no idea if that changed anything, but it didn't hurt, so
why not.
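
As a rough sketch, the steps recapped above could be scripted like
this. All paths and the names run and reproduce are placeholders, not
the ones actually used, and the stream is written with "btrfs send -f"
instead of a shell redirect. By default the script only prints the
commands; set DRY_RUN=0 to execute them (needs root and a mounted btrfs
volume).

```shell
#!/bin/sh
# Hypothetical reproduction helper; prints the commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reproduce() {
    mnt="$1"   # mount point of the btrfs volume (placeholder)
    iso="$2"   # path of the iso inside read-only snapshot A, relative to $mnt
    out="$3"   # file that receives the send stream

    # New subvol B, holding a reflink copy of the iso from A.
    run btrfs subvolume create "$mnt/B"
    run cp --reflink=always "$mnt/$iso" "$mnt/B/"
    # Read-only snapshot C of B, which is what gets sent.
    run btrfs subvolume snapshot -r "$mnt/B" "$mnt/snap-C"
    # --no-data hangs the same way as a full send.
    run btrfs send --no-data -f "$out" "$mnt/snap-C"
}
```

For example: "reproduce /mnt/data A/image.iso /tmp/snap-C.stream".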

I'll try to minimize the dataset and maybe get a small fs-image
without too much personal information that I can upload so the issue
is reproducible by others.

On Fri, Sep 7, 2018 at 21:17, Chris Murphy wrote:
>
> On Fri, Sep 7, 2018 at 11:07 AM, Stefan Loewen wrote:
> > List of steps:
> > - 3.8G iso lies in read-only subvol A
> > - I create subvol B and reflink-copy the iso into it.
> > - I create a read-only snapshot C of B
> > - I "btrfs send --no-data C > /somefile"
> > So you got that right, yes.
>
> OK I can't reproduce it. Sending A and C complete instantly with
> --no-data, and complete in the same time with a full send/receive. In
> my case I used a 4.9G ISO.
>
> I can't think of what local difference accounts for what you're
> seeing. There is really nothing special about reflinks. The extent
> and csum data are identical to the original file, and that's the bulk
> of the metadata for a given file.
>
> What I can tell you is usually the developers want to see sysrq+w
> whenever there are blocked tasks.
> https://fedoraproject.org/wiki/QA/Sysrq
>
> You'll want to enable all sysrq functions. And next you'll want three
> ssh shells:
>
> 1. sudo journalctl -fk
> 2. sudo -i to become root, and then echo w > /proc/sysrq-trigger but
> do not hit return yet
> 3. sudo btrfs send... to reproduce the problem.
>
> Basically the thing is gonna hang soon after you reproduce the
> problem, so you want to get to shell #2 and just hit return rather
> than dealing with long delays typing that echo command out. And then
> the journal command is so your local terminal captures the sysrq
> output because you're gonna kill the VM instead of waiting it out. I
> have no idea how to read these things but someone might pick up this
> thread and have some idea why these tasks are hanging.
>
>
>
>
> >
> > Unfortunately I don't have any way to connect the drive to a SATA port
> > directly but I tried to switch out as much of the used setup as
> > possible (all changes active at the same time):
> > - I got the original (not the clone) HDD out of the enclosure and used
> > this adapter to connect it:
> > https://www.amazon.de/DIGITUS-Adapterkabel-40pol-480Mbps-schwarz/dp/B007X86VZK
> > - I used a different notebook
> > - I ran the test natively on that notebook (instead of from
> > VirtualBox. I used VirtualBox for most of the tests as I have to
> > force-poweroff the PC every time the btrfs-send hangs as it is not
> > killable)
>
>
> This problem only happens in VirtualBox? Or it happens on baremetal
> also? And we've established it happens with two different source
> (send) devices, which means two different Btrfs volumes.
>
> All I can say is you need to keep changing things up, process of
> elimination. Rather tedious. Maybe you could try downloading a Fedora
> 28 ISO, make a boot stick out of it, and try to reproduce with the
> same drives. At least that's an easy way to isolate the OS from the
> equation.
>
>
> --
> Chris Murphy


fedora-kernelmsgs.log.gz
Description: application/gzip