Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2019-01-10 Thread Florian Haas
Hi Mohamad!
On 31/12/2018 19:30, Mohamad Gebai wrote:
> On 12/31/18 4:51 AM, Marcus Murwall wrote:
>> What you say does make sense though as I also get the feeling that the
>> osds are just waiting for something. Something that never happens and
>> the request finally timeout...
>
> So the OSDs a

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-31 Thread Mohamad Gebai
On 12/31/18 4:51 AM, Marcus Murwall wrote:
> What you say does make sense though as I also get the feeling that the
> osds are just waiting for something. Something that never happens and
> the request finally timeout...
So the OSDs are just completely idle? If not, try using strace and/or perf to

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-31 Thread Marcus Murwall
Hi Mohamad The network is 2x25Gbit interface bonded for the cluster network and I see no signs of congestion. Also if I benchmark against a replicated pool I can't recreate these issues. I can push a lot more data against a replicated pool and everything works just fine. If it was a network c

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-28 Thread Mohamad Gebai
Hi Marcus,
On 12/27/18 4:21 PM, Marcus Murwall wrote:
> Hey Mohamad
>
> I work with Florian on this issue.
> Just reinstalled the ceph cluster and triggered the error again.
> Looking at iostat -x 1 there is basically no activity at all against
> any of the osds.
> We get blocked ops all over the

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-26 Thread Mohamad Gebai
What is happening on the individual nodes when you reach that point (iostat -x 1 on the OSD nodes)? Also, what throughput do you get when benchmarking the replicated pool? I guess one way to start would be by looking at ongoing operations at the OSD level:
ceph daemon osd.X dump_blocked_ops
ceph
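For anyone following the suggestion above, here is a minimal sketch of how the output of `ceph daemon osd.X dump_blocked_ops` could be summarized per OSD. It is not from the thread. It assumes the command prints JSON with a top-level "ops" list whose entries carry "age" and "description" fields; check that against what your release actually prints. The function names and the sample data are made up for illustration.

```python
import json
import subprocess


def fetch_blocked_ops(osd_id):
    """Run the admin socket command on the local node and return its raw output.

    Must run on the host where osd.<osd_id> lives, with access to its admin socket.
    """
    result = subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "dump_blocked_ops"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_blocked_ops(raw_json, min_age=0.0):
    """Return (count, rows), where rows is a list of (age, description), oldest first.

    Assumption: the JSON has an "ops" list with "age" and "description" per entry.
    Missing keys fall back to defaults instead of raising.
    """
    data = json.loads(raw_json)
    rows = []
    for op in data.get("ops", []):
        age = float(op.get("age", 0.0))
        if age >= min_age:
            rows.append((age, op.get("description", "<no description>")))
    rows.sort(reverse=True)  # largest age (oldest op) first
    return len(rows), rows


# Hypothetical sample, only to exercise the parser without a cluster.
SAMPLE = json.dumps({
    "ops": [
        {"description": "example op A", "age": 31.5},
        {"description": "example op B", "age": 95.2},
    ],
    "num_blocked_ops": 2,
})

if __name__ == "__main__":
    count, rows = summarize_blocked_ops(SAMPLE, min_age=30.0)
    print(f"{count} blocked ops")
    for age, description in rows:
        print(f"{age:8.1f}s  {description}")
```

On a real OSD node, `summarize_blocked_ops(fetch_blocked_ops(3))` would replace the sample, with 3 standing in for a local OSD id. Sorting by age puts the longest-stuck requests at the top, which is usually where to start looking.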

[ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-26 Thread Florian Haas
Hi everyone, We have a Luminous cluster (12.2.10) on Ubuntu Xenial, though we have also observed the same behavior on 12.2.7 on Bionic (download.ceph.com doesn't build Luminous packages for Bionic, and 12.2.7 is the latest distro build). The primary use case for this cluster is radosgw. 6 OSD nod