Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Wido den Hollander
> On 3 August 2017 at 14:14, Hans van den Bogert wrote:
>
> Thanks for answering even before I asked the questions :)
>
> So bottom line, the HEALTH_ERR state is simply part of taking a (bunch of) OSDs down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds?

Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
Thanks for answering even before I asked the questions :) So bottom line, the HEALTH_ERR state is simply part of taking a (bunch of) OSDs down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds? For context, the CPUs are 2609v3, one per 4 OSDs. (I know; they're far from the fastest CPUs.) On Thu, Aug 3,

Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
What are the implications of this? Because I can see a lot of blocked requests piling up when using 'noout' and 'nodown'. That probably makes sense though. Another thing: now, when the OSDs come back online, I again see multiple periods of HEALTH_ERR state. Is that to be expected? On Thu, Aug 3,

Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Wido den Hollander
> On 3 August 2017 at 13:36, linghucongsong wrote:
>
> set the osd noout nodown

While noout is correct and might help in some situations, never set nodown unless you really need it. It will block I/O since you are taking down OSDs which aren't marked
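
As an illustration of the advice in this reply (set noout, leave nodown alone), here is a minimal sketch of the command sequence around a planned OSD node reboot. The `ceph osd set noout`, `ceph osd unset noout` and `ceph health` commands are standard Ceph CLI; the Python wrapper, the names `PRE_REBOOT`, `POST_REBOOT` and `run_steps`, and the dry-run default are assumptions made for this example and do not come from the thread. Setting noout only keeps the stopped OSDs from being marked out; it does not by itself guarantee that the cluster stays out of HEALTH_ERR during the reboot.

```python
import shlex
import subprocess

# Commands to run before and after rebooting the OSD node.
# Only 'noout' is set; 'nodown' is deliberately not touched,
# following the advice in the reply above.
PRE_REBOOT = ["ceph osd set noout"]
POST_REBOOT = ["ceph osd unset noout", "ceph health"]


def run_steps(steps, dry_run=True):
    """Print each command and execute it only when dry_run is False.

    Returns the list of commands that were processed, in order.
    (Hypothetical helper written for this sketch.)
    """
    processed = []
    for step in steps:
        print(step)
        if not dry_run:
            # check=True raises CalledProcessError if the ceph CLI fails.
            subprocess.run(shlex.split(step), check=True)
        processed.append(step)
    return processed


if __name__ == "__main__":
    run_steps(PRE_REBOOT)
    # ... reboot the OSD node here and wait for its OSDs to rejoin ...
    run_steps(POST_REBOOT)
```

With the default `dry_run=True` the script only prints the commands; pass `dry_run=False` on a host with a working `ceph` client to actually execute them.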

Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread linghucongsong
set the osd noout nodown

At 2017-08-03 18:29:47, "Hans van den Bogert" wrote:

Hi all, One thing which has bothered me since the beginning of using ceph is that a reboot of a single OSD causes a HEALTH_ERR state for the cluster for at least a couple of seconds. In

[ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
Hi all, One thing which has bothered me since the beginning of using ceph is that a reboot of a single OSD causes a HEALTH_ERR state for the cluster for at least a couple of seconds. In the case of a planned reboot of an OSD node, should I run some extra commands in order not to go into the HEALTH_ERR state?