Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Also putting this on the list. On 06 Sep 2014, at 13:36, Josef Johansson jo...@oderland.se wrote: Hi, Same issues again, but I think we found the drive that causes the problems. But this is causing problems as it’s trying to recover to that OSD at the moment. So we’re left with

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: Also putting this on the list. On 06 Sep 2014, at 13:36, Josef Johansson jo...@oderland.se wrote: Hi, Same issues again, but I think we found the drive that causes the problems. But this is causing problems as

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 13:53, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: Also putting this on the list. On 06 Sep 2014, at 13:36, Josef Johansson jo...@oderland.se wrote: Hi, Same issues again, but I think we found the

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
FYI, I restarted the OSDs until I saw a server that made an impact. Until that server stopped having an impact, the number of degraded objects did not go down. After a while it was done recovering that OSD and happily started with others. I guess I will be seeing the same behaviour when

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
We managed to get through the restore, but the performance degradation is still there. Looking through the OSDs to pinpoint a source of the degradation and hoping the current load will be lowered. I’m a bit afraid of setting the weight of an OSD to 0, wouldn’t it be tough if the degradation is

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 17:27, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: We managed to get through the restore, but the performance degradation is still there. Manifesting itself how? Awfully slow IO on the VMs, and iowait,

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote: Hi, On 06 Sep 2014, at 17:27, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: We managed to get through the restore, but the performance degradation is

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 18:05, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote: Hi, Just realised that it could also be a popularity bug as well, with lots of small traffic. And seeing that it’s fast it gets popular until it

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 17:59, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote: Hi, On 06 Sep 2014, at 17:27, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: We managed

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
On 06 Sep 2014, at 19:37, Josef Johansson jo...@oderland.se wrote: Hi, Unfortunately the journal tuning did not do much. That’s odd, because I don’t see much utilisation on the OSDs themselves. Now this points to a network issue between the OSDs, right? To answer my own question. Restarted a

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
On 07 Sep 2014, at 04:47, Christian Balzer ch...@gol.com wrote: On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote: On 06 Sep 2014, at 19:37, Josef Johansson jo...@oderland.se wrote: Hi, Unfortunately the journal tuning did not do much. That’s odd, because I don’t see much

Re: [ceph-users] Huge issues with slow requests

2014-09-05 Thread David
Hi, Sorry for the lack of information yesterday, this was solved after some 30 minutes, after having reloaded/restarted all OSD daemons. Unfortunately we couldn’t pinpoint it to a single OSD or drive, all drives seemed ok, some had a bit higher latency and we tried to out / in them to see if

Re: [ceph-users] Huge issues with slow requests

2014-09-05 Thread Christian Balzer
Hello, On Fri, 5 Sep 2014 08:26:47 +0200 David wrote: Hi, Sorry for the lack of information yesterday, this was solved after some 30 minutes, after having reloaded/restarted all OSD daemons. Unfortunately we couldn’t pinpoint it to a single OSD or drive, all drives seemed ok, some had a

Re: [ceph-users] Huge issues with slow requests

2014-09-05 Thread Luis Periquito
Only time I saw such behaviour was when I was deleting a big chunk of data from the cluster: all the client activity was reduced, the op/s were almost non-existent and there were unjustified delays all over the cluster. But all the disks were somewhat busy in atop/iostat. On 5 September 2014

[ceph-users] Huge issues with slow requests

2014-09-04 Thread David
Hi, We’re running a ceph cluster with version: 0.67.7-1~bpo70+1 All of a sudden we’re having issues with the cluster (running RBD images for kvm) with slow requests on all of the OSD servers. Any idea why and how to fix it? 2014-09-04 11:56:35.868521 mon.0 [INF] pgmap v12504451: 6860 pgs:

Re: [ceph-users] Huge issues with slow requests

2014-09-04 Thread Christian Balzer
On Thu, 4 Sep 2014 12:02:13 +0200 David wrote: Hi, We’re running a ceph cluster with version: 0.67.7-1~bpo70+1 All of a sudden we’re having issues with the cluster (running RBD images for kvm) with slow requests on all of the OSD servers. Any idea why and how to fix it? You give us

Re: [ceph-users] Huge issues with slow requests

2014-09-04 Thread Martin B Nielsen
Just echoing what Christian said. Also, iirc the "currently waiting for subops on [" could also mean a problem on those as it waits for ack from them (I might remember wrong). If that is the case you might want to check in on osd 13 37 as well. With the cluster load and size you should not have