Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-13 Thread ulembke
Hi Greg, On 2017-01-12 19:54, Gregory Farnum wrote: ... That's not what anybody intended to have happen. It's possible the simultaneous loss of a monitor and the OSDs is triggering a case that's not behaving correctly. Can you create a ticket at tracker.ceph.com with your logs and what steps

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Christian Balzer
On Thu, 12 Jan 2017 13:59:12 -0800 Samuel Just wrote: > That would work. > -Sam > Having seen similar behavior in the past, I made it a habit to manually shut down services before a reboot. This is not limited to Ceph, and these race conditions have definitely gotten worse with systemd in general.
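A minimal sketch of what "manually shut down services before a reboot" can look like on a systemd-managed Ceph node (the unit names assume the stock ceph-osd.target / ceph-mon.target units; adjust for your packaging):

    # stop the OSDs first, while a monitor quorum is still reachable,
    # so they get a chance to report "marked itself down" cleanly
    systemctl stop ceph-osd.target
    # then stop the local monitor and reboot the node
    systemctl stop ceph-mon.target
    reboot

Stopping the OSDs before the local mon avoids relying on failure detection to notice they are gone.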

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Shinobu Kinjo
Now I'm totally clear. Regards, On Fri, Jan 13, 2017 at 6:59 AM, Samuel Just wrote: > That would work. > -Sam > > On Thu, Jan 12, 2017 at 1:40 PM, Gregory Farnum wrote: >> On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just wrote: >>> Oh, this

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Samuel Just
That would work. -Sam On Thu, Jan 12, 2017 at 1:40 PM, Gregory Farnum wrote: > On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just wrote: >> Oh, this is basically working as intended. What happened is that the >> mon died before the pending map was actually

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Gregory Farnum
On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just wrote: > Oh, this is basically working as intended. What happened is that the > mon died before the pending map was actually committed. The OSD has a > timeout (5s) after which it stops trying to mark itself down and just > dies (so

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Samuel Just
Oh, this is basically working as intended. What happened is that the mon died before the pending map was actually committed. The OSD has a timeout (5s) after which it stops trying to mark itself down and just dies (so that OSDs don't hang when killed). It took a bit longer than 5s for the
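For reference, the 5s timeout Sam describes appears to be the osd_mon_shutdown_timeout option (the exact option name is my assumption; verify against your release). A sketch of checking and raising it:

    # inspect the current value via the admin socket (osd.0 is an example id)
    ceph daemon osd.0 config get osd_mon_shutdown_timeout
    # raise it at runtime for all OSDs; persist it in ceph.conf under [osd] if it helps
    ceph tell osd.* injectargs '--osd_mon_shutdown_timeout 10'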

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Udo Lembke
Hi Sam, the web frontend of an external ceph-dash was interrupted till the node was up again. The reboot took approx. 5 min, but the ceph -w output showed IO again much sooner. I will look at the output again tomorrow and create a ticket. Thanks Udo On 12.01.2017 20:02, Samuel Just wrote: >
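For the ticket, a short sketch of how the relevant output could be captured (the /var/log/ceph/ceph.log path is the usual cluster-log default on a mon host; adjust if your layout differs):

    # capture the live cluster log while reproducing the reboot
    ceph -w | tee ceph-w-reboot.log
    # afterwards, look for the shutdown messages in the cluster log on a mon host
    grep "marked itself down" /var/log/ceph/ceph.log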

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Samuel Just
How long did it take for the cluster to recover? -Sam On Thu, Jan 12, 2017 at 10:54 AM, Gregory Farnum wrote: > On Thu, Jan 12, 2017 at 2:03 AM, wrote: >> Hi all, >> I had just rebooted all 3 nodes (one after the other) of a small Proxmox-VE >> ceph-cluster.

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Gregory Farnum
On Thu, Jan 12, 2017 at 2:03 AM, wrote: > Hi all, > I had just rebooted all 3 nodes (one after the other) of a small Proxmox-VE > ceph-cluster. All nodes are mons and have two OSDs. > During the reboot of one node, ceph was stuck longer than normal, so I looked at the > "ceph -w" output
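As a general practice for rolling reboots of a small cluster like this, one usually waits for the cluster to settle before taking down the next node; a minimal check (nothing Proxmox-specific assumed):

    # after each node is back, wait until health is OK and all OSDs are up/in
    ceph -s
    ceph osd tree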

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread ulembke
Hi, On 2017-01-12 11:38, Shinobu Kinjo wrote: Sorry, I don't get your question. Generally speaking, the MON maintains maps of the cluster state: * Monitor map * OSD map * PG map * CRUSH map yes - and if an OSD says "osd.5 marked itself down", the mon can immediately update the OSD map
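A quick way to see whether the mon really recorded that event is to look at the OSD map itself (osd.5 is just the id from the example above):

    # osdmap epoch and up/in counts at a glance
    ceph osd stat
    # per-OSD up/down state as recorded in the current OSD map
    ceph osd dump | grep '^osd\.5 '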

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Shinobu Kinjo
Sorry, I don't get your question. Generally speaking, the MON maintains maps of the cluster state: * Monitor map * OSD map * PG map * CRUSH map Regards, On Thu, Jan 12, 2017 at 7:03 PM, wrote: > Hi all, > I had just rebooted all 3 nodes (one after the other) of a small
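Each of those maps can be inspected directly from the CLI; a short sketch (crushtool ships with the standard ceph packages):

    ceph mon dump                       # monitor map
    ceph osd dump                       # OSD map
    ceph pg dump                        # PG map (can be very large)
    ceph osd getcrushmap -o crush.bin   # CRUSH map (binary)
    crushtool -d crush.bin -o crush.txt # decompile to readable text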