Re: [ceph-users] Monitor Recovery

2018-10-23 Thread Wido den Hollander
On 10/24/18 2:22 AM, John Petrini wrote: > Hi List, > > I've got a monitor that won't stay up. It comes up and joins the > cluster but crashes within a couple of minutes with no info in the > logs. At this point I'd prefer to just give up on it and assume it's > in a bad state and recover it fr

Re: [ceph-users] Monitor Recovery

2018-10-23 Thread Martin Verges
Hello John, did you try http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#preparing-your-logs ? At this point I'd prefer to just give up on it and assume it's in a bad > state and recover it from the working monitors. What's the best way to go > about this? As long as
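
If the crash leaves nothing at the default log level, the usual next step from that page is to raise the monitor's debug level before the next crash; a rough sketch, assuming the failing daemon is mon.a (placeholder id):

    ceph tell mon.a injectargs '--debug-mon 10 --debug-ms 1'
    # or persistently in ceph.conf under [mon], then restart the daemon:
    #   debug mon = 10
    #   debug ms = 1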

Re: [ceph-users] Monitor Recovery

2018-10-23 Thread Bastiaan Visser
Are you using ceph-deploy? In that case you could do: ceph-deploy mon destroy {host-name [host-name]...} and: ceph-deploy mon create {host-name [host-name]...} to recreate it. - Original Message - From: "John Petrini" To: "ceph-users" Sent: Tuesday, October 23, 2018 8:22:44 PM Subject:
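
For example, assuming the broken monitor runs on a host called mon3 (placeholder) and the commands are run from the ceph-deploy admin node:

    ceph-deploy mon destroy mon3    # removes the mon from the cluster
    ceph-deploy mon create mon3     # recreates it so it can resync from the surviving quorum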

Re: [ceph-users] Crushmap and failure domains at rack level (ideally data-center level in the future)

2018-10-23 Thread Bastiaan Visser
Something must be wrong; since you have min_size 3, the pool should go read-only once you take out the first rack. Probably even when you take out the first host. What is the output of 'ceph osd pool get min_size'? I guess it will be 2, since you did not hit a problem while taking out one
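
A quick sketch for checking, and if appropriate lowering, that value (the pool name is a placeholder; a lower min_size keeps I/O going with fewer surviving replicas, at the cost of safety):

    ceph osd pool get <poolname> min_size
    ceph osd pool set <poolname> min_size 2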

[ceph-users] Monitor Recovery

2018-10-23 Thread John Petrini
Hi List, I've got a monitor that won't stay up. It comes up and joins the cluster but crashes within a couple of minutes with no info in the logs. At this point I'd prefer to just give up on it and assume it's in a bad state and recover it from the working monitors. What's the best way to go about

[ceph-users] Crushmap and failure domains at rack level (ideally data-center level in the future)

2018-10-23 Thread Waterbly, Dan
Hello, I want to create a crushmap rule where I can lose two racks of hosts and still be able to operate. I have tried the rule below, but it only allows me to operate (rados gateway) with one rack down and two racks up. If I lose any host in the two remaining racks my rados gateway stops respo
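
For reference, a replicated rule with rack as the failure domain can also be created straight from the CLI; a minimal sketch assuming a size-3 replicated pool, three racks under the default root, and placeholder names:

    ceph osd crush rule create-replicated rack-ha default rack
    ceph osd pool set <poolname> crush_rule rack-ha
    # with one replica per rack, surviving a two-rack outage also requires
    # min_size 1, which trades durability guarantees for availability
    ceph osd pool get <poolname> min_size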

Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-23 Thread Igor Fedotov
Hi Frank, On 10/23/2018 2:56 PM, Frank Schilder wrote: Dear David and Igor, thank you very much for your help. I have one more question about chunk sizes and data granularity on bluestore and will summarize the information I got on bluestore compression at the end. 1) Compression ratio
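
For anyone following along, the chunk-size (blob size) settings Igor refers to can be read back from a running OSD; a sketch with osd.0 as a placeholder:

    ceph daemon osd.0 config show | grep -E 'bluestore_compression'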

Re: [ceph-users] [ceph-ansible]Purging cluster using ceph-ansible stable 3.1/3.2

2018-10-23 Thread Cody
Hi Mark, Thank you for pointing out the issue. The problem is solved after I added "library= ~/.ansible/plugins/modules:/usr/share/ansible/plugins/modules:/root/ceph-ansible/library" into the /root/ceph-ansible/ansible.cfg file. The "library" key wasn't there in the first place and the result of
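
For reference, that setting belongs in the [defaults] section of ansible.cfg, e.g.:

    [defaults]
    library = ~/.ansible/plugins/modules:/usr/share/ansible/plugins/modules:/root/ceph-ansible/library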

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-23 Thread Stefan Kooman
Quoting Patrick Donnelly (pdonn...@redhat.com): > Thanks for the detailed notes. It looks like the MDS is stuck > somewhere it's not even outputting any log messages. If possible, it'd > be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or, > if you're comfortable with gdb, a backtr
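
A sketch of both options, assuming a single ceph-mds on the host and debug symbols (ceph-debuginfo or the -dbg packages) installed; the output file name is a placeholder:

    # live backtrace of all threads:
    gdb --batch -p $(pidof ceph-mds) -ex 'thread apply all bt' > mds-backtrace.txt
    # or force a coredump as suggested:
    kill -QUIT $(pidof ceph-mds)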

Re: [ceph-users] [ceph-ansible]Purging cluster using ceph-ansible stable 3.1/3.2

2018-10-23 Thread Mark Johnston
On Mon, 2018-10-22 at 20:05 -0400, Cody wrote: > I tried to purge a ceph cluster using infrastructure-playbooks/purge- > cluster.yml from stable 3.1 and stable 3.2 branches, but kept getting the > following error immediately: > > ERROR! no action detected in task. This often indicates a misspelled

Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-23 Thread Frank Schilder
Dear David and Igor, thank you very much for your help. I have one more question about chunk sizes and data granularity on bluestore and will summarize the information I got on bluestore compression at the end. 1) Compression ratio: Following Igor's explanation, I tr
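
One way to read the achieved ratio back from a running OSD (osd.0 is a placeholder) is via its perf counters; comparing bluestore_compressed_original with bluestore_compressed_allocated gives the effective on-disk ratio:

    ceph daemon osd.0 perf dump | grep bluestore_compressed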

Re: [ceph-users] scrub errors

2018-10-23 Thread Sergey Malinin
There is an osd_scrub_auto_repair setting which defaults to 'false'. > On 23.10.2018, at 12:12, Dominque Roux wrote: > > Hi all, > > We lately faced several scrub errors. > All of them were more or less easily fixed with the ceph pg repair X.Y > command. > > We're using ceph version 12.2.7 an
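
A sketch of turning it on, either persistently or at runtime; note the companion option osd_scrub_auto_repair_num_errors caps how many errors it will repair automatically:

    # ceph.conf, [osd] section:
    #   osd scrub auto repair = true
    # or at runtime on all OSDs:
    ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'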

[ceph-users] scrub errors

2018-10-23 Thread Dominque Roux
Hi all, We lately faced several scrub errors. All of them were more or less easily fixed with the ceph pg repair X.Y command. We're using ceph version 12.2.7 and have SSD and HDD pools. Is there a way to prevent this kind of error in our datastore, or is there a way to automate the fix (It w
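
For the manual route, the usual sequence looks roughly like this (the pg id 2.1a is a placeholder):

    ceph health detail | grep inconsistent
    rados list-inconsistent-obj 2.1a --format=json-pretty   # shows which object/shard is damaged
    ceph pg repair 2.1a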

Re: [ceph-users] RGW stale buckets

2018-10-23 Thread Janne Johansson
When you run rgw it creates a ton of pools, so one of the other pools was holding the indexes of what buckets there are, and the actual data is what got stored in default.rgw.data (or whatever name it had), so that cleanup was not complete and this is what causes your issues, I'd say. How to move
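
As a starting point for cleaning this up, something like the following shows what rgw still thinks exists (pool and bucket names will differ per deployment):

    ceph osd pool ls | grep rgw                 # rgw pools actually present
    radosgw-admin bucket list                   # buckets still known to the metadata
    radosgw-admin bucket stats --bucket=<name>  # details for one suspect bucket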

[ceph-users] slow requests and degraded cluster, but not really ?

2018-10-23 Thread Ben Morrice
Hello all, We have an issue with our ceph cluster where 'ceph -s' shows that several requests are blocked; however, querying further with 'ceph health detail' indicates that the PGs affected are either active+clean or do not currently exist. OSD 32 appears to be working fine, and the cluster is
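
When the health output and the pg states disagree like this, the blocked ops can usually be pinned down on the OSD itself through its admin socket; a sketch using osd.32 from the report:

    ceph health detail | grep -E -i 'blocked|slow'
    ceph daemon osd.32 dump_blocked_ops
    ceph daemon osd.32 dump_ops_in_flight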