Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-29 Thread Marc Schöchlin
Hello Jason, I updated the ticket https://tracker.ceph.com/issues/40822 On 24.07.19 at 19:20, Jason Dillaman wrote: > On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin wrote: >> >> Testing with a 12.2.5 librbd/rbd-nbd is currently not that easy for me, >> because the ceph apt source does not
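
Getting that older point release back usually means listing what the configured apt source still ships and pinning the version explicitly. A minimal sketch, assuming the usual download.ceph.com repository layout; the version suffix shown is illustrative:

    # list the rbd-nbd builds the configured ceph apt source actually provides
    apt-cache madison rbd-nbd
    # if the wanted build is still published, pin it explicitly; the exact
    # version suffix (here for xenial) is an assumption
    apt-get install rbd-nbd=12.2.5-1xenial ceph-common=12.2.5-1xenial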

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-24 Thread Jason Dillaman
On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin wrote: > > Hi Jason, > > I installed kernel 4.4.0-154.181 (from Ubuntu package sources) and performed > the crash reproduction. > The problem also re-appeared with that kernel release. > > A gunzip run with 10 gunzip processes threw 1600 write and

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-24 Thread Marc Schöchlin
Hi Jason, I installed kernel 4.4.0-154.181 (from Ubuntu package sources) and performed the crash reproduction. The problem also re-appeared with that kernel release. A gunzip run with 10 gunzip processes threw 1600 write and 330 read IOPS against the cluster/the rbd_ec volume with a transfer
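
For reference, a workload like the one described (ten concurrent gunzip processes against the rbd-nbd backed filesystem) can be driven with a simple xargs fan-out; the mount point and file pattern below are placeholders:

    # decompress all *.gz files with 10 parallel gunzip processes on the
    # filesystem that sits on the rbd_ec backed nbd device (path assumed)
    find /srv/ec_volume -name '*.gz' -print0 | xargs -0 -P 10 -n 1 gunzip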

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-24 Thread Mike Christie
On 07/23/2019 12:28 AM, Marc Schöchlin wrote: >>> For testing purposes I set the timeout to unlimited ("nbd_set_ioctl /dev/nbd0 0", on an already mounted device). >>> I re-executed the problem procedure and discovered that the compression procedure does not crash at the same file, but
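
To make the sequence explicit: the timeout change is applied to the nbd device after mapping, while the filesystem can stay mounted. A rough sketch of that workflow, where the image name and the location of the nbd_set_ioctl helper are assumptions:

    # map the image; rbd-nbd prints the attached device, e.g. /dev/nbd0
    rbd-nbd map rbd_hdd/test_image
    # disable the per-command timeout on that device (0 = unlimited),
    # using the small helper tool referenced above
    ./nbd_set_ioctl /dev/nbd0 0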

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
Hi Jason, On 24.07.19 at 00:40, Jason Dillaman wrote: > >> Sure, which kernel do you prefer? > You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen > environment. Can you use a matching kernel version? That's true, the virtual machines of our Xen environments completely run on

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
Hi Jason, On 23.07.19 at 14:41, Jason Dillaman wrote: > Can you please test a consistent Ceph release w/ a known working > kernel release? It sounds like you have changed two variables, so it's > hard to know which one is broken. We need *you* to isolate what > specific Ceph or kernel release

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Jason Dillaman
On Tue, Jul 23, 2019 at 6:58 AM Marc Schöchlin wrote: > > > On 23.07.19 at 07:28, Marc Schöchlin wrote: > > > > Okay, I already experimented with high timeouts (e.g. 600 seconds). As I can > > remember, this led to a pretty unusable system if I put high amounts of I/O > > on the EC volume. > >

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
On 23.07.19 at 07:28, Marc Schöchlin wrote: > > Okay, I already experimented with high timeouts (e.g. 600 seconds). As I can > remember, this led to a pretty unusable system if I put high amounts of I/O on > the EC volume. > This system also runs a krbd volume which saturates the system with
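
Whether a long timeout merely hides that saturation can be checked while the workload runs, e.g. by watching per-device utilization and wait times; a minimal sketch:

    # extended I/O statistics (utilization, await, queue size) every 5 seconds
    # for all block devices, including the nbd and rbd devices under test
    iostat -xm 5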

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-22 Thread Marc Schöchlin
Hi Mike, On 22.07.19 at 16:48, Mike Christie wrote: > On 07/22/2019 06:00 AM, Marc Schöchlin wrote: >>> With older kernels no timeout would be set for each command by default, >>> so if you were not running that tool then you would not see the nbd >>> disconnect+io_errors+xfs issue. You would

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-22 Thread Marc Schöchlin
Hi Mike, On 22.07.19 at 17:01, Mike Christie wrote: > On 07/19/2019 02:42 AM, Marc Schöchlin wrote: >> We have ~500 heavy-load rbd-nbd devices in our Xen cluster (rbd-nbd 12.2.5, >> kernel 4.4.0+10, CentOS clone) and ~20 high-load krbd devices (kernel >> 4.15.0-45, Ubuntu 16.04) - we never

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-22 Thread Mike Christie
On 07/19/2019 02:42 AM, Marc Schöchlin wrote: > We have ~500 heavy-load rbd-nbd devices in our Xen cluster (rbd-nbd 12.2.5, > kernel 4.4.0+10, CentOS clone) and ~20 high-load krbd devices (kernel > 4.15.0-45, Ubuntu 16.04) - we never experienced problems like this. For this setup, do you have

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-22 Thread Mike Christie
On 07/22/2019 06:00 AM, Marc Schöchlin wrote: >> With older kernels no timeout would be set for each command by default, >> so if you were not running that tool then you would not see the nbd >> disconnect+io_errors+xfs issue. You would just see slow IOs. >> >> With newer kernels, like 4.15,
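
The difference between the two kernel behaviours shows up in the kernel log: on a kernel that enforces a timeout, the nbd disconnect and the following I/O errors and XFS shutdown are logged there. A quick check (the grep pattern is only an example):

    # look for nbd timeouts/disconnects and the filesystem fallout they cause
    dmesg -T | egrep -i 'nbd|i/o error|xfs'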

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-22 Thread Marc Schöchlin
Hello Mike, I added inline comments. On 19.07.19 at 22:20, Mike Christie wrote: > >> We have ~500 heavy-load rbd-nbd devices in our Xen cluster (rbd-nbd 12.2.5, >> kernel 4.4.0+10, CentOS clone) and ~20 high-load krbd devices (kernel >> 4.15.0-45, Ubuntu 16.04) - we never experienced

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-19 Thread Mike Christie
On 07/19/2019 02:42 AM, Marc Schöchlin wrote: > Hello Jason, > > On 18.07.19 at 20:10, Jason Dillaman wrote: >> On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: >>> Hello cephers, >>> >>> rbd-nbd crashes in a reproducible way here. >> I don't see a crash report in the log below. Is it

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-19 Thread Marc Schöchlin
Hello Jason, On 18.07.19 at 20:10, Jason Dillaman wrote: > On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: >> Hello cephers, >> >> rbd-nbd crashes in a reproducible way here. > I don't see a crash report in the log below. Is it really crashing or > is it shutting down? If it is crashing

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-18 Thread Jason Dillaman
On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: > > Hello cephers, > > rbd-nbd crashes in a reproducible way here. I don't see a crash report in the log below. Is it really crashing or is it shutting down? If it is crashing and it's reproducible, can you install the debuginfo packages,
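
One common way to produce the requested backtrace is to let rbd-nbd dump core and open the core with matching debug symbols; a rough sketch, where the debug package names and the core file path are assumptions that depend on the repository and the core_pattern in use:

    # allow core dumps, then re-run the reproduction until rbd-nbd dies
    ulimit -c unlimited
    # inspect the core with gdb after installing matching debug symbols
    # (e.g. ceph-common-dbg/rbd-nbd-dbg packages, if the repository has them)
    gdb /usr/bin/rbd-nbd /path/to/core
    # inside gdb, "thread apply all bt" prints a backtrace of every thread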

[ceph-users] reproducable rbd-nbd crashes

2019-07-18 Thread Marc Schöchlin
Hello cephers, rbd-nbd crashes in a reproducible way here. I created the following bug report: https://tracker.ceph.com/issues/40822 Do you also experience this problem? Do you have suggestions for in-depth debug data collection? I invoke the following command on a freshly mapped rbd and
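
One option for in-depth debug data collection (not from the thread itself, just the usual librbd client-side logging, so treat the values as assumptions) is to raise the rbd/rados debug levels for the client that maps the image and write them to a log file before reproducing the crash:

    # append verbose client logging to ceph.conf before mapping the image;
    # the debug levels and log file path are illustrative values
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
        debug rbd = 20
        debug rados = 20
        log file = /var/log/ceph/$name.$pid.log
    EOF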