Re: [ceph-users] [URGENT] Rebuilding cluster data from remaining OSDs

2018-05-31 Thread Gregory Farnum
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
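The linked procedure rebuilds the monitor store from the cluster maps that every OSD keeps a copy of. Below is a minimal sketch of its two core commands. It assumes a single host, OSDs that have been stopped first (ceph-objectstore-tool needs exclusive access to the OSD store), and placeholder paths that are not from this thread. The helper name `rebuild_commands` is hypothetical: it only prints the commands and does not run anything. On a multi-host cluster the docs carry the same mon store directory from host to host so that it accumulates maps from every OSD. The docs also cover adding mon./client.admin caps to the keyring with ceph-authtool, and exact flags vary by release, so check the page for your version.

```python
import shlex
from typing import List


def rebuild_commands(osd_paths: List[str], mon_store: str, keyring: str) -> List[List[str]]:
    """Hypothetical helper: build (not run) the argv lists for the
    'recovery using OSDs' procedure on one host."""
    cmds = []
    for osd in osd_paths:
        # Each stopped OSD contributes the cluster maps it holds to the
        # shared mon store directory.
        cmds.append([
            "ceph-objectstore-tool",
            "--data-path", osd,
            "--op", "update-mon-db",
            "--mon-store-path", mon_store,
        ])
    # Rebuild a monitor store.db from everything collected above.
    cmds.append(["ceph-monstore-tool", mon_store, "rebuild", "--", "--keyring", keyring])
    return cmds


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    osds = ["/var/lib/ceph/osd/ceph-0", "/var/lib/ceph/osd/ceph-1"]
    for argv in rebuild_commands(osds, "/root/mon-store", "/etc/ceph/ceph.client.admin.keyring"):
        print(shlex.join(argv))
```

The resulting store.db then replaces the one in each monitor's data directory (after backing up the old one) before the mons are started again.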

On Thu, May 31, 2018 at 1:49 PM Leônidas Villeneuve wrote:

> I had a small Ceph cluster and had to take down one node. The data from
> its OSDs was migrated to the other OSDs, and that went fine.
>
> After the migration, I removed its monitor (mon.service) as described in
> the official documentation.
>
> Then everything went wrong. The other mons collapsed and stopped talking
> to the mgrs. The mgr dashboard still works but shows outdated data. The
> OSDs are still up and RBD volumes are working too, but the mons won't
> come back online.
>
> After trying everything described in the troubleshooting guide, including
> removing the old mon from the monmap, I couldn't inject the new monmap
> because of lock errors on store.db. When I finally did inject it, the mon
> refused to start. I tried the same thing on the other mons and got the
> same results. And, to my despair, the store.db ended up corrupted.
>
> I finally gave up and, after backing up the store.db, deleted the mons
> and started fresh ones. That worked, but the new mons have no OSDs or
> hosts mapped to them. All I have is an old CRUSH map.
>
> But since the OSDs are still up, is it possible to rebuild from them the
> maps and all the other data the mons need to start working again? That's
> my last resort.
>
> Put another way, I have OSD services and OSD data, but no monitor and no
> mgr, and I need to get them running again. Any tips will be appreciated.
>
> Thanks.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
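For the monmap step described in the quoted message, the documented sequence is to stop a surviving monitor, extract its monmap, remove the departed monitor with monmaptool, and inject the edited map back. A running ceph-mon keeps its store.db locked, so lock errors during injection most likely mean the daemon was still running; that is an inference, not something confirmed in this thread. The sketch below assumes systemd units named ceph-mon@<id> and a placeholder map path; the helper `remove_mon_commands` is hypothetical and only builds the commands. Backing up the monitor's data directory before injecting is advisable.

```python
import shlex
from typing import List


def remove_mon_commands(surviving_mon: str, dead_mon: str, map_path: str = "/tmp/monmap") -> List[List[str]]:
    """Hypothetical helper: build (not run) the argv lists for dropping a
    dead monitor from a surviving monitor's monmap."""
    return [
        # Stop the daemon first; a running ceph-mon holds the lock on store.db.
        ["systemctl", "stop", f"ceph-mon@{surviving_mon}"],
        # Dump the monmap stored in this monitor's store.db to a file.
        ["ceph-mon", "-i", surviving_mon, "--extract-monmap", map_path],
        # Delete the departed monitor from the extracted map.
        ["monmaptool", map_path, "--rm", dead_mon],
        # Write the edited map back while the daemon is still stopped.
        ["ceph-mon", "-i", surviving_mon, "--inject-monmap", map_path],
        ["systemctl", "start", f"ceph-mon@{surviving_mon}"],
    ]


if __name__ == "__main__":
    # Placeholder monitor ids for illustration only.
    for argv in remove_mon_commands("a", "c"):
        print(shlex.join(argv))
```

Ordering is the point of the sketch: both ceph-mon invocations happen between the stop and the start, so nothing else holds the store lock.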


[ceph-users] [URGENT] Rebuilding cluster data from remaining OSDs

2018-05-31 Thread Leônidas Villeneuve