Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2017-02-06 Thread Sean Sullivan
So the cluster has been dead and down since around 8/10/2016. I have since rebooted the cluster in order to try the new ceph-monstore-tool rebuild functionality. I built the Debian packages for the recently backported hammer tools and installed them across all of the servers:

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-13 Thread Sean Sullivan
So with a leveldb patched to skip errors, I now have a store.db from which I can extract the pg, mon, and osd maps. That said, when I try to start kh10-8 it bombs out:

root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8# ceph-mon -i

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-12 Thread Sean Sullivan
A coworker patched leveldb and we were able to export quite a bit of data from kh08's leveldb database. At this point I think I need to reconstruct a new leveldb with whatever values I can. Is it the same leveldb database across all 3 monitors? I.e., will keys exported from one work in the others?

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-12 Thread Sean Sullivan
ceph-monstore-tool? Is that the same as monmaptool? Oops, never mind, I found it in the ceph-test package. I can't seem to get it working, though :-( Neither dump monmap nor any of the other commands works. They all bomb out with the same message:

root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-11 Thread Wido den Hollander
> On 11 August 2016 at 15:17, Sean Sullivan wrote:
>
> Hello Wido,
>
> Thanks for the advice. While the data center has a/b circuits and
> redundant power, etc if a ground fault happens it travels outside and
> fails causing the whole building to fail

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-11 Thread Sean Sullivan
Hello Wido,

Thanks for the advice. While the data center has A/B circuits and redundant power, etc., if a ground fault happens it travels outside and fails, causing the whole building to fail (apparently). The monitors are all the same, each with 2x E5 CPUs, 64 GB of RAM, and 4x 300 GB 10k SAS drives in raid

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-11 Thread Tomasz Kuzemko
I'm guessing you had writeback cache enabled on the ceph-mon disk (smartctl -g wcache /dev/sdX) and the disk firmware did not respect flush semantics.

On 11.08.2016 08:33, Wido den Hollander wrote:
>
>> On 11 August 2016 at 0:10, Sean Sullivan wrote:
>>
>>
>> I

Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-11 Thread Wido den Hollander
> On 11 August 2016 at 0:10, Sean Sullivan wrote:
>
> I think it just got worse:
>
> all three monitors on my other cluster say that ceph-mon can't open
> /var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose all
> 3 monitors? I saw a post by