Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-27 Thread Mikaël Cluseau
Hi, On 03/18/2015 03:01 PM, Gregory Farnum wrote: I think it tended to crash rather than hang like this so I'm a bit surprised, but if this op is touching a broken file or something that could explain it. FWIW, the last time I had the issue (on a 3.10.9 kernel), btrfs was freezing, waiting

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-26 Thread Chris Murray
% osds down, CPUs still busy; will the cluster recover without help? On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray chrismurra...@gmail.com wrote: Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Chris Murray
is fixed in something later than 0.80.9? -Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: 18 March 2015 14:01 To: Chris Murray Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? On Wed, Mar 18, 2015

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray chrismurra...@gmail.com wrote: Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot. So, I remembered that ZLIB is slower, but is more 'safe for old

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-18 Thread Chris Murray
. In case I should be troubleshooting this side, is/isn't this happening to others? -Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: 16 March 2015 20:40 To: Chris Murray Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-18 Thread Gregory Farnum
On Wed, Mar 18, 2015 at 3:28 AM, Chris Murray chrismurra...@gmail.com wrote: Hi again Greg :-) No, it doesn't seem to progress past that point. I started the OSD again a couple of nights ago: 2015-03-16 21:34:46.221307 7fe4a8aa7780 10 journal op_apply_finish 13288339 open_ops 1 - 0,

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-14 Thread Chris Murray
-Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: 02 March 2015 18:05 To: Chris Murray Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? You can turn the filestore up to 20 instead of 1. ;) You might

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-03 Thread Chris Murray
Farnum [mailto:g...@gregs42.com] Sent: 02 March 2015 18:05 To: Chris Murray Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? You can turn the filestore up to 20 instead of 1. ;) You might also explore what information you can

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-02 Thread Gregory Farnum
To: Gregory Farnum Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? A little further logging: 2015-02-27 10:27:15.745585 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-28 Thread Chris Murray
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? That's interesting, it seems to be alternating between two lines, but only one thread this time? I'm guessing the 62738 is the osdmap, which is much behind where it should be? Osd.0 and osd.3

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-27 Thread Chris Murray
down, CPUs still busy; will the cluster recover without help? If you turn up debug osd = 20 or something it'll apply a good bit more disk load but give you more debugging logs about what's going on. It could be that you're in enough of a mess now that it's stuck trying to calculate past

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-27 Thread Chris Murray
-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? If you turn up debug osd = 20 or something it'll apply a good bit more disk load but give you more debugging logs about what's going on. It could be that you're in enough of a mess now that it's stuck trying

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-26 Thread Chris Murray
February 2015 12:58 To: Gregory Farnum Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? Thanks Greg After seeing some recommendations I found in another thread, my impatience got the better of me, and I've started the process again

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-26 Thread Gregory Farnum
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chris Murray Sent: 25 February 2015 12:58 To: Gregory Farnum Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? Thanks Greg After seeing some recommendations I found

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-25 Thread Chris Murray
Thanks Greg After seeing some recommendations I found in another thread, my impatience got the better of me, and I've started the process again, but there is some logic, I promise :-) I've copied the process from Michael Kidd, I believe, and it goes along the lines of: setting noup, noin,

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-24 Thread Gregory Farnum
On Mon, Feb 23, 2015 at 8:59 AM, Chris Murray chrismurra...@gmail.com wrote: ... Trying to send again after reporting bounce backs to dreamhost ... ... Trying to send one more time after seeing mails come through the list today ... Hi all, First off, I should point out that this is a 'small

[ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-23 Thread Chris Murray
... Trying to send again after reporting bounce backs to dreamhost ... ... Trying to send one more time after seeing mails come through the list today ... Hi all, First off, I should point out that this is a 'small cluster' issue and may well be due to the stretched resources. If I'm doomed to