Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-19 Thread Florian Haas
Hi Craig,
On Fri, Sep 19, 2014 at 2:49 AM, Craig Lewis cle...@centraldesktop.com wrote:
No, removing the snapshots didn't solve my problem. I eventually traced this problem to XFS deadlocks caused by
[osd]
osd mkfs options xfs: -l size=1024m -n size=64k -i size=2048 -s size=4096

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-19 Thread Craig Lewis
Excellent find.
On Fri, Sep 19, 2014 at 7:11 AM, Florian Haas flor...@hastexo.com wrote:
Hi Craig,
On Fri, Sep 19, 2014 at 2:49 AM, Craig Lewis cle...@centraldesktop.com wrote:
No, removing the snapshots didn't solve my problem. I eventually traced this problem to XFS deadlocks caused

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-18 Thread Craig Lewis
No, removing the snapshots didn't solve my problem. I eventually traced this problem to XFS deadlocks caused by
[osd]
osd mkfs options xfs: -l size=1024m -n size=64k -i size=2048 -s size=4096
Changing to just -s size=4096, and reformatting all OSDs solved this problem. Since then, I ran into
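For reference, the change described in this message can be sketched as a ceph.conf fragment. This is only an illustration assembled from the option values quoted above; the `=` separator and the comments are assumptions about layout, not a copy of the poster's actual file. mkfs options apply only when an OSD's filesystem is created, which is consistent with the poster having to reformat all OSDs.

```ini
[osd]
# Before (assumed layout): the options the poster associated with the XFS deadlocks
# osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096

# After: only the sector size option is kept, as reported in the message
osd mkfs options xfs = -s size=4096
```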

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Florian Haas
Hi Craig, just dug this up in the list archives.
On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis cle...@centraldesktop.com wrote:
In the interest of removing variables, I removed all snapshots on all pools, then restarted all ceph daemons at the same time. This brought up osd.8 as well. So

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Dan Van Der Ster
Hi Florian,
On 17 Sep 2014, at 17:09, Florian Haas flor...@hastexo.com wrote:
Hi Craig, just dug this up in the list archives.
On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis cle...@centraldesktop.com wrote:
In the interest of removing variables, I removed all snapshots on all pools,

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Florian Haas
On Wed, Sep 17, 2014 at 5:24 PM, Dan Van Der Ster daniel.vanders...@cern.ch wrote:
Hi Florian,
On 17 Sep 2014, at 17:09, Florian Haas flor...@hastexo.com wrote:
Hi Craig, just dug this up in the list archives.
On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis cle...@centraldesktop.com wrote:

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Dan Van Der Ster
On Wed, Sep 17, 2014 at 5:24 PM, Dan Van Der Ster daniel.vanders...@cern.ch wrote:
Hi Florian,
On 17 Sep 2014, at 17:09, Florian Haas flor...@hastexo.com wrote:
Hi Craig, just dug this up in the list archives.
On Fri, Mar

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Florian Haas
On Wed, Sep 17, 2014 at 5:42 PM, Dan Van Der Ster daniel.vanders...@cern.ch wrote:
From: Florian Haas flor...@hastexo.com
Sent: Sep 17, 2014 5:33 PM
To: Dan Van Der Ster
Cc: Craig Lewis cle...@centraldesktop.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RGW hung, 2 OSDs using 100

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-03-28 Thread Craig Lewis
On 3/27/14 18:04 , Craig Lewis wrote:
I'm trying to use strace on osd.4:
strace -tt -f -ff -o ./ceph-osd.4.strace -x /usr/bin/ceph-osd --cluster=ceph -i 4 -f
So far, strace is running, and the process isn't hung. After I ran this, the cluster finally finished backfilling the last of the

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-03-27 Thread Craig Lewis
The osd.8 log shows it doing some deep scrubbing here. Perhaps that is what caused your earlier issues with CPU usage? When I first noticed the CPU usage, I checked iotop and iostat. Both said there was no disk activity, on any OSD. At 14:17:25, I ran radosgw-admin

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-03-27 Thread Craig Lewis
In the interest of removing variables, I removed all snapshots on all pools, then restarted all ceph daemons at the same time. This brought up osd.8 as well. The cluster started recovering. Now osd.4 and osd.13 are doing this. Any suggestions for how I can see what the hung OSDs are doing?