Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-20 Thread Marc Schöchlin
Hello Mike and Jason, as described in my last mail, I converted the filesystem to ext4, set "sysctl vm.dirty_background_ratio=0", and put the regular workload on the filesystem (used as an NFS mount). That seems to have prevented crashes for an entire week now (before this, the nbd device crashed

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-13 Thread Marc Schöchlin
Hello Jason, On 12.09.19 at 16:56, Jason Dillaman wrote: > On Thu, Sep 12, 2019 at 3:31 AM Marc Schöchlin wrote: > > What's that, have we seen that before? ("Numerical argument out of domain") > It's the error that rbd-nbd prints when the kernel prematurely closes > the socket ... and as we

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-12 Thread Jason Dillaman
On Thu, Sep 12, 2019 at 3:31 AM Marc Schöchlin wrote: > > Hello Jason, > > yesterday I started rbd-nbd in foreground mode to see if there is any > additional information. > > root@int-nfs-001:/etc/ceph# rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id > nfs > 2019-09-11 13:07:41.444534

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-12 Thread Marc Schöchlin
Hello Jason, yesterday I started rbd-nbd in foreground mode to see if there is any additional information. root@int-nfs-001:/etc/ceph# rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id nfs 2019-09-11 13:07:41.444534 77fe1040  0 ceph version 12.2.12

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 9:46 AM Marc Schöchlin wrote: > > Hello Mike, > > as described, I applied all the settings. > > Unfortunately it also crashed with these settings :-( > > Regards > Marc > > [Tue Sep 10 12:25:56 2019] Btrfs loaded, crc32c=crc32c-intel > [Tue Sep 10 12:25:57 2019] EXT4-fs (dm-0):

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Marc Schöchlin
Hello Mike, as described, I applied all the settings. Unfortunately it also crashed with these settings :-( Regards Marc [Tue Sep 10 12:25:56 2019] Btrfs loaded, crc32c=crc32c-intel [Tue Sep 10 12:25:57 2019] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null) [Tue Sep 10

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Marc Schöchlin
Hello Mike, On 03.09.19 at 04:41, Mike Christie wrote: > On 09/02/2019 06:20 AM, Marc Schöchlin wrote: >> Hello Mike, >> >> I am having a quick look at this while on vacation because my coworker >> reports daily and continuous crashes ;-) >> Any updates here (I am aware that this is not very easy to

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-15 Thread Marc Schöchlin
Hello Mike, On 15.08.19 at 19:57, Mike Christie wrote: > >> Don't waste your time. I found a way to replicate it now. >> > > Just a quick update. > > Looks like we are trying to allocate memory in the IO path in a way that > can swing back on us, so we can end up locking up. You are probably not

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-15 Thread Mike Christie
On 08/14/2019 06:55 PM, Mike Christie wrote: > On 08/14/2019 02:09 PM, Mike Christie wrote: >> On 08/14/2019 07:35 AM, Marc Schöchlin wrote: > 3. I wonder if we are hitting a bug with PF_MEMALLOC Ilya hit with krbd. > He removed that code from the krbd. I will ping him on that. >>> >>>

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Mike Christie
On 08/14/2019 02:09 PM, Mike Christie wrote: > On 08/14/2019 07:35 AM, Marc Schöchlin wrote: 3. I wonder if we are hitting a bug with PF_MEMALLOC Ilya hit with krbd. He removed that code from the krbd. I will ping him on that. >> >> Interesting. I activated core dumps for those processes -

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Mike Christie
On 08/14/2019 07:35 AM, Marc Schöchlin wrote: >>> 3. I wonder if we are hitting a bug with PF_MEMALLOC Ilya hit with krbd. >>> He removed that code from the krbd. I will ping him on that. > > Interesting. I activated core dumps for those processes - probably we can > find something interesting

[ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Marc Schöchlin
Hello Mike, see my inline comments. On 14.08.19 at 02:09, Mike Christie wrote: >>> - >>> Previous tests crashed in a reproducible manner with "-P 1" (single io >>> gzip/gunzip) after a few minutes up to 45 minutes. >>> >>> Overview of my tests: >>> >>> - SUCCESSFUL: kernel 4.15, ceph

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-13 Thread Mike Christie
On 08/13/2019 07:04 PM, Mike Christie wrote: > On 07/31/2019 05:20 AM, Marc Schöchlin wrote: >> Hello Jason, >> >> it seems that there is something wrong in the rbd-nbd implementation. >> (added this information also at https://tracker.ceph.com/issues/40822) >> >> The problem does not seem to be

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-13 Thread Mike Christie
On 07/31/2019 05:20 AM, Marc Schöchlin wrote: > Hello Jason, > > it seems that there is something wrong in the rbd-nbd implementation. > (added this information also at https://tracker.ceph.com/issues/40822) > > The problem does not seem to be related to kernel releases, filesystem types or > the

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-13 Thread Marc Schöchlin
Hello Jason, thanks for your response. See my inline comments. On 31.07.19 at 14:43, Jason Dillaman wrote: > On Wed, Jul 31, 2019 at 6:20 AM Marc Schöchlin wrote: > > > The problem does not seem to be related to kernel releases, filesystem types or > the ceph and network setup. > Release 12.2.5

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-31 Thread Jason Dillaman
On Wed, Jul 31, 2019 at 6:20 AM Marc Schöchlin wrote: > > Hello Jason, > > it seems that there is something wrong in the rbd-nbd implementation. > (added this information also at https://tracker.ceph.com/issues/40822) > > The problem does not seem to be related to kernel releases, filesystem types

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-31 Thread Marc Schöchlin
Hello Jason, it seems that there is something wrong in the rbd-nbd implementation. (added this information also at https://tracker.ceph.com/issues/40822) The problem does not seem to be related to kernel releases, filesystem types or the ceph and network setup. Release 12.2.5 seems to work