[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
The OSDs are up and in; the problem is with the PGs, as you can see below:
root@ceph-mon1:~# ceph -s
  cluster:
    id:     43f5d6b4-74b0-4281-92ab-940829d3ee5e
    health: HEALTH_ERR
            1/3 mons down, quorum ceph-mon1,ceph-mon3
            14/32863 objects unfound (0.043%)
            Possible
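The percentage in that health line is the unfound object count divided by the total object count. A minimal Python check, using only the two counts quoted in this thread, shows that the same 14 unfound objects read as 0.043% or 0.029% depending on how many objects the cluster held when `ceph -s` ran:

```python
def unfound_percent(unfound: int, total: int) -> float:
    """Unfound objects as a percentage of all objects, as shown by `ceph -s`."""
    if total <= 0:
        raise ValueError("total object count must be positive")
    return 100.0 * unfound / total


# Counts copied from the two `ceph -s` snapshots posted in this thread.
for unfound, total in [(14, 32863), (14, 47681)]:
    pct = unfound_percent(unfound, total)
    print(f"{unfound}/{total} objects unfound ({pct:.3f}%)")
```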

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
Could your hardware be faulty? Are you trying to redeploy the faulty monitor, or a whole new cluster? If you are trying to fix your cluster, you should focus on the OSDs. A cluster can run without major trouble on 2 monitors for a few days (if not years…). - Etienne Menguy etienne.men...@croit.io
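Two monitors are enough because Ceph monitors only need a strict majority of the monitors in the monmap to keep quorum. A small Python sketch of that majority rule for this cluster's 3 monitors:

```python
def quorum_size(num_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return num_mons // 2 + 1


def has_quorum(mons_up: int, num_mons: int) -> bool:
    """True if the monitors that are up can still form a quorum."""
    return mons_up >= quorum_size(num_mons)


for up in range(3, -1, -1):
    print(f"{up}/3 mons up -> quorum: {has_quorum(up, 3)}")
```

With 2 of 3 monitors up the cluster keeps quorum; losing a second monitor would drop it below the majority.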

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello team, below is the error I get when I try to redeploy the same cluster: TASK [ceph-mon : recursively fix ownership of monitor directory]

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
Have you tried restarting one of the OSDs that seem to block PG recovery? I don’t think increasing the PG count can help. - Etienne Menguy etienne.men...@croit.io > On 29 Oct 2021, at 11:53, Michel Niyoyita wrote: > > Hello Eugen > > The failure_domain is host level and crush rule is replicated_rule

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello Eugen, the failure_domain is host level and the CRUSH rule is replicated_rule. During troubleshooting I changed the PG count of pool 5 from 32 to 128 to see whether anything would change. The pool has the default replica count (3). Thanks for your continuous help. On Fri, Oct 29, 2021 at 11:44 AM Etienne
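For scale, here is a quick Python calculation of how many PG copies of pool 5 each OSD carries before and after the change. It assumes the 12 OSDs from the first message (4 OSD nodes × 3 OSDs) all hold data for this pool and that size is 3. Changing pg_num only changes how the pool's data is split across those OSDs; it does not create new copies of any object.

```python
def pg_copies_per_osd(pg_num: int, size: int, num_osds: int) -> float:
    """Average number of PG replicas of one pool stored on each OSD."""
    return pg_num * size / num_osds


NUM_OSDS = 4 * 3  # assumption: 4 OSD nodes x 3 OSDs each, all used by pool 5
for pg_num in (32, 128):
    per_osd = pg_copies_per_osd(pg_num, 3, NUM_OSDS)
    print(f"pg_num={pg_num}: ~{per_osd:.0f} PG copies per OSD")
```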

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
> Is a way there you can enforce mon to rejoin a quorum ? I tried to restart it > but nothing changed. I guess it is the cause If I am not mistaken. No, but with quorum_status you can check the monitor's status and whether it is trying to join the quorum. You may have to use the daemon socket interface (asok
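`ceph quorum_status` prints JSON that contains both the monmap and the list of monitors currently in quorum, so comparing the two shows which monitor is missing. This minimal Python sketch assumes the `quorum_names` and `monmap.mons[].name` fields. The sample is hand-written and trimmed, and the name ceph-mon2 for the absent monitor is only inferred from the shell prompt used later in the thread. On the absent monitor's own host, `ceph daemon mon.<id> mon_status` queries that daemon through its admin socket and reports its own state (for example probing or electing) even while it is out of quorum.

```python
import json


def mons_out_of_quorum(quorum_status: dict) -> list[str]:
    """Monitors present in the monmap but absent from quorum_names."""
    in_quorum = set(quorum_status.get("quorum_names", []))
    all_mons = [mon["name"] for mon in quorum_status["monmap"]["mons"]]
    return [name for name in all_mons if name not in in_quorum]


# Hypothetical, trimmed sample shaped like `ceph quorum_status --format json`.
sample = json.loads("""
{
  "quorum_names": ["ceph-mon1", "ceph-mon3"],
  "monmap": {"mons": [{"rank": 0, "name": "ceph-mon1"},
                      {"rank": 1, "name": "ceph-mon2"},
                      {"rank": 2, "name": "ceph-mon3"}]}
}
""")
print(mons_out_of_quorum(sample))  # ['ceph-mon2']
```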

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Dear Etienne, is there a way to force a mon to rejoin the quorum? I tried restarting it, but nothing changed. If I am not mistaken, I guess that is the cause. Below is the pg query output:
root@ceph-mon2:~# ceph pg 5.10 query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state":

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Eugen Block
Also, what does the CRUSH rule for pool 5 look like, and what is the failure domain? Quoting Etienne Menguy: With “ceph pg x.y query” you can check why it’s complaining. x.y for pg id, like 5.77 It would also be interesting to check why mon fails to rejoin quorum, it may give you hints
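With failure_domain host, replicated_rule should place each of the 3 copies of a PG on a different host; with 4 OSD hosts that leaves one host spare. This minimal Python sketch runs that check on a PG's acting set (for example from `ceph pg map <pgid>`) and an OSD-to-host map (as read from `ceph osd tree`). The OSD numbering and host names below are hypothetical.

```python
def spans_distinct_hosts(acting: list[int], osd_host: dict[int, str]) -> bool:
    """True if every OSD in the acting set sits on a different host."""
    hosts = [osd_host[osd] for osd in acting]
    return len(set(hosts)) == len(hosts)


# Hypothetical layout: osd.0-2 on osd-node1, osd.3-5 on osd-node2, and so on.
OSD_HOST = {osd: f"osd-node{osd // 3 + 1}" for osd in range(12)}

print(spans_distinct_hosts([0, 4, 8], OSD_HOST))  # True: nodes 1, 2, 3
print(spans_distinct_hosts([0, 1, 5], OSD_HOST))  # False: two copies on node 1
```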

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
With “ceph pg x.y query” you can check why it’s complaining, where x.y is the PG id, e.g. 5.77. It would also be interesting to check why the mon fails to rejoin the quorum; it may give you hints about your OSD issues. - Etienne Menguy etienne.men...@croit.io > On 29 Oct 2021, at 10:34, Michel Niyoyita
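The fields worth reading first in that output are `state` and the `recovery_state` list. When objects are unfound, a recovery step may carry a `might_have_unfound` list showing which OSDs the primary has already probed and which ones it is still waiting on. This minimal Python sketch pulls those fields out of `ceph pg <pgid> query` JSON. The field names follow the layout I am aware of and may differ between releases. The sample is hypothetical, not the output posted in this thread.

```python
import json


def summarize_pg_query(query: dict) -> dict:
    """Extract PG state, recovery steps, and OSDs not yet probed for unfound objects."""
    steps = query.get("recovery_state", [])
    pending = []
    for step in steps:
        for entry in step.get("might_have_unfound", []):
            # Anything other than "already probed" is an OSD recovery still waits on.
            if entry.get("status") != "already probed":
                pending.append((entry.get("osd"), entry.get("status")))
    return {
        "state": query.get("state"),
        "steps": [step.get("name") for step in steps],
        "pending": pending,
    }


# Hypothetical, trimmed sample shaped like `ceph pg 5.10 query` output.
sample = json.loads("""
{
  "state": "active+recovery_unfound+degraded",
  "recovery_state": [
    {"name": "Started/Primary/Active",
     "might_have_unfound": [
       {"osd": "2", "status": "already probed"},
       {"osd": "7", "status": "osd is down"}]},
    {"name": "Started"}
  ]
}
""")
print(summarize_pg_query(sample))
```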

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello Etienne, this is the ceph -s output:
root@ceph-mon1:~# ceph -s
  cluster:
    id:     43f5d6b4-74b0-4281-92ab-940829d3ee5e
    health: HEALTH_ERR
            1/3 mons down, quorum ceph-mon1,ceph-mon3
            14/47681 objects unfound (0.029%)
            1 scrub errors
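Each line under `health:` corresponds to a named health check. This minimal Python sketch groups the checks from `ceph health detail --format json` by severity. It assumes the `checks` → `severity` / `summary.message` layout. The sample, including the severity given to each check, is hand-written from the lines above, not captured output.

```python
import json


def checks_by_severity(health: dict) -> dict[str, list[str]]:
    """Group health checks as {severity: ["CODE: message", ...]}."""
    grouped: dict[str, list[str]] = {}
    for code, check in health.get("checks", {}).items():
        line = f"{code}: {check['summary']['message']}"
        grouped.setdefault(check["severity"], []).append(line)
    return grouped


# Hand-written sample modelled on the `ceph -s` lines above (severities assumed).
sample = json.loads("""
{
  "status": "HEALTH_ERR",
  "checks": {
    "MON_DOWN": {"severity": "HEALTH_WARN",
                 "summary": {"message": "1/3 mons down, quorum ceph-mon1,ceph-mon3"}},
    "OBJECT_UNFOUND": {"severity": "HEALTH_WARN",
                       "summary": {"message": "14/47681 objects unfound (0.029%)"}},
    "OSD_SCRUB_ERRORS": {"severity": "HEALTH_ERR",
                         "summary": {"message": "1 scrub errors"}}
  }
}
""")
for severity, lines in checks_by_severity(sample).items():
    print(severity)
    for line in lines:
        print("  " + line)
```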

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
Hi, please share the “ceph -s” output. - Etienne Menguy etienne.men...@croit.io > On 29 Oct 2021, at 10:03, Michel Niyoyita wrote: > > Hello team > > I am running a ceph cluster with 3 monitors and 4 OSDs nodes running 3osd > each , I deployed my ceph cluster using ansible and ubuntu 20.04 as