Re: [ceph-users] Mimic cluster is offline and not healing

2018-09-28 Stefan Kooman
Quoting by morphin (morphinwith...@gmail.com):
> Good news... :)
>
> After trying everything, I decided to re-create my MONs from the OSDs,
> using this script:
> https://paste.ubuntu.com/p/rNMPdMPhT5/
>
> And it worked!!!

Congrats!

> I think that when 2 servers crashed and came back at the same time, some h

Re: [ceph-users] Mimic cluster is offline and not healing

2018-09-27 by morphin
Good news... :) After trying everything, I decided to re-create my MONs from the OSDs, using this script: https://paste.ubuntu.com/p/rNMPdMPhT5/ And it worked!!! I think that when 2 servers crashed and came back at the same time, the MONs somehow got confused and the maps were corrupted. After re-creation all the MO

Re: [ceph-users] Mimic cluster is offline and not healing

2018-09-27 by morphin
I think I may have found something. When I start an OSD, it generates high I/O (around 95%), and the other OSDs are triggered as well, all producing the same I/O load. This happens even when I set the noup flag. So all the OSDs generate high I/O whenever a single OSD starts. I think this is too much. I have

Re: [ceph-users] Mimic cluster is offline and not healing

2018-09-27 by morphin
There should be no client I/O right now; all of my VMs are down. There is only a single pool. Here is my CRUSH map: https://paste.ubuntu.com/p/Z9G5hSdqCR/ The cluster does not recover. After starting the OSDs with the specified flags, the number of OSDs that are up drops from 168 to 50 within 24 hours. Stefan Ko

Re: [ceph-users] Mimic cluster is offline and not healing

2018-09-27 Stefan Kooman
Quoting by morphin (morphinwith...@gmail.com):
> After 72 hours I believe we may have hit a bug. Any help would be greatly
> appreciated.

Is it feasible for you to stop all client IO to the Ceph cluster, at least until it stabilizes again? "ceph osd pause" would do the trick (ceph osd unpause would uns

[ceph-users] Mimic cluster is offline and not healing

2018-09-27 by morphin
Hello, I am writing about an incident that started last weekend. Something seems to be wrong with my e-mail: some of my messages did not get through, so I decided to start a new thread here and begin from the beginning. The related e-mail thread can be found here (http://lists.ceph.co