Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-10-10 Thread Paul Emmerich
I've also encountered this issue on a cluster yesterday; one CPU got stuck in an infinite loop in get_obj_data::flush and it stopped serving requests. I've updated the tracker issue accordingly.

Paul
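To check whether a gateway is showing the symptom described in this thread (one or more radosgw threads pinned at 100% of a core while idle), per-thread CPU time can be sampled from /proc. The following is only a minimal sketch, assuming a Linux host; the function names, the 5 second interval and the 0.9 threshold are illustrative choices and do not come from the thread.

```python
import os
import time

def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The comm field is parenthesised and may contain spaces,
    # so parse only what follows the last ')'.
    rest = stat_line[stat_line.rfind(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])

def read_thread_ticks(pid):
    """Map each thread id of the process to its accumulated CPU ticks."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
    return ticks

def busy_threads(before, after, interval_s, ticks_per_s, threshold=0.9):
    """Return {tid: fraction of one core used} for threads at or above the threshold."""
    busy = {}
    for tid, end in after.items():
        if tid in before:
            frac = (end - before[tid]) / (ticks_per_s * interval_s)
            if frac >= threshold:
                busy[tid] = frac
    return busy

def sample(pid, interval_s=5.0):
    """Take two samples interval_s apart and report threads that look stuck."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_thread_ticks(pid)
    time.sleep(interval_s)
    after = read_thread_ticks(pid)
    return busy_threads(before, after, interval_s, ticks_per_s)
```

Calling sample() with the radosgw process id returns the thread ids that stayed near 100% of a core over the interval; those ids can then be matched against a profiler or debugger backtrace to see which code path is spinning.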

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-26 Thread Vladimir Brik
or on its own. I’m going to check with others who’re more familiar with this code path.

Begin forwarded message:

From: Vladimir Brik <vladimir.b...@icecube.wisc.edu>
Subject: Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred
Date: August 21, 2019

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-23 Thread Eric Ivancich
>> even though the machine is not being used for data transfers
>> (nothing in radosgw logs, couple of KB/s of network).
>>
>> This situation can affect any number of our rados gateways, lasts from few
>> hours to few days and stops if radosgw process is restarted or on i

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
> Are you running multisite?
No

> Do you have dynamic bucket resharding turned on?
Yes. "radosgw-admin reshard list" prints "[]"

> Are you using lifecycle?
I am not sure. How can I check? "radosgw-admin lc list" says "[]"

> And just to be clear -- sometimes all 3 of your rados gateways are
>

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread J. Eric Ivancich
On 8/21/19 10:22 AM, Mark Nelson wrote:
> Hi Vladimir,
>
> On 8/21/19 8:54 AM, Vladimir Brik wrote:
>> Hello
>>
[much elided]
> You might want to try grabbing a callgraph from perf instead of just
> running perf top or using my wallclock profiler to see if you can drill
> down and find out

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Correction: the number of threads stuck using 100% of a CPU core varies from 1 to 5 (it's not always 5).

Vlad

On 8/21/19 8:54 AM, Vladimir Brik wrote:

Hello

I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Paul Emmerich
On Wed, Aug 21, 2019 at 3:55 PM Vladimir Brik wrote:
>
> Hello
>
> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
> radosgw process on those machines starts consuming 100% of 5 CPU cores
> for days at a time, even though the machine is not being used for data
> transfers

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Mark Nelson
Hi Vladimir,

On 8/21/19 8:54 AM, Vladimir Brik wrote:

Hello

I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5 CPU cores for days at a time, even though the machine is not being used for data transfers

[ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Hello

I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5 CPU cores for days at a time, even though the machine is not being used for data transfers (nothing in radosgw logs, couple of KB/s of network). This