[ceph-users] Ceph benchmark tool (cbt)

2020-12-10 Thread Seena Fallah
Hi all, I want to benchmark my production cluster with cbt. I read a bit of the code and saw something strange in it: for example, it creates ceph-osd daemons by itself ( https://github.com/ceph/cbt/blob/master/cluster/ceph.py#L373) and also shuts down the whole cluster!! (
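[Context for the above: cbt's default mode provisions and tears down a cluster itself, which is exactly what you do not want on production. A sketch of a cbt YAML config is below; note this is an assumption based on cbt's documented config style, and the use_existing flag, hostnames, and pool settings shown here are illustrative, not verified against the current cbt code. Check the cbt README before pointing it at a real cluster.]

```yaml
# Hypothetical cbt config sketch: benchmark an already-running cluster
# instead of letting cbt build/destroy one. Verify key names against
# your cbt checkout before use.
cluster:
  user: 'cephuser'            # ssh user cbt uses (assumption)
  head: 'head.example.com'    # illustrative hostname
  clients: ['client1.example.com']
  use_existing: true          # intended to skip OSD creation/cluster teardown
  iterations: 1
benchmarks:
  radosbench:
    op_size: [4194304]
    time: 300
```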

[ceph-users] Re: Incomplete PG due to primary OSD crashing during EC backfill - get_hash_info: Mismatch of total_chunk_size 0

2020-12-10 Thread Byrne, Thomas (STFC,RAL,SC)
A few more things of note after more poking with the help of Dan vdS. 1) The object that the backfill is crashing on has an mtime of a few minutes before the original primary died this morning, and a 'rados get' gives an input/output error. So it looks like a new object that was possibly

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread David Orman
Hi Janek, We realize this, we referenced that issue in our initial email. We do want the metrics exposed by Ceph internally, and would prefer to work towards a fix upstream. We appreciate the suggestion for a workaround, however! Again, we're happy to provide whatever information we can that

[ceph-users] removing index for non-existent buckets

2020-12-10 Thread Christopher Durham
Hi, I am using 15.2.7 on CentOS 8.1. I have a number of old buckets that are listed with # radosgw-admin metadata list bucket.instance but are not listed with: # radosgw-admin bucket list Let's say that one of them is: 'old-bucket' and its instance is 'c100feda-5e16-48a4-b908-7be61aa877ef.123.1'
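[A small sketch of the comparison being described: given the two command outputs, find instance entries whose bucket no longer exists. This assumes the `bucket.instance` metadata keys use the `<bucket>:<instance-id>` format shown in the post; it is a diffing aid only, not an endorsement of blindly removing metadata.]

```python
def stale_instances(instance_keys, live_buckets):
    """Return bucket.instance keys whose bucket name is not a live bucket.

    instance_keys: entries from `radosgw-admin metadata list bucket.instance`,
                   assumed to look like 'old-bucket:c100feda-....123.1'.
    live_buckets:  entries from `radosgw-admin bucket list`.
    """
    live = set(live_buckets)
    # Split only on the first ':' so instance IDs containing ':' survive intact.
    return [k for k in instance_keys if k.split(":", 1)[0] not in live]
```

From there the usual cleanup is something like `radosgw-admin metadata rm bucket.instance:<key>` per stale entry, but verify each bucket really is gone (e.g. with `radosgw-admin bucket stats`) before removing anything.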

[ceph-users] Incomplete PG due to primary OSD crashing during EC backfill - get_hash_info: Mismatch of total_chunk_size 0

2020-12-10 Thread Byrne, Thomas (STFC,RAL,SC)
Hi all, Got an odd issue that I'm not sure how to solve on our Nautilus 14.2.9 EC cluster. The primary OSD of an EC 8+3 PG died this morning with a very sad disk (thousands of pending sectors). After the down out interval a new 'up' primary was assigned and the backfill started. Twenty
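[For readers unfamiliar with the notation: EC 8+3 means each object is split into 8 data chunks plus 3 coding chunks across 11 OSDs, and the data survives losing any 3 of those chunks. A quick sketch of that k+m arithmetic (just the math, not Ceph code):]

```python
def ec_profile(k, m):
    """Basic arithmetic for a k+m erasure-coded pool."""
    return {
        "chunks": k + m,                 # OSDs holding a piece of each object
        "max_failures": m,               # chunk losses the data can tolerate
        "space_overhead": (k + m) / k,   # raw-to-usable capacity ratio
    }

p = ec_profile(8, 3)  # the 8+3 profile from the thread
```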

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread Janek Bevendorff
FYI, this is the ceph-exporter we're using at the moment: https://github.com/digitalocean/ceph_exporter It's not as good, but it mostly does the job. Some of the more specific metrics are missing, but the majority are there. On 10/12/2020 19:01, Janek Bevendorff wrote: Do you have the prometheus

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread Janek Bevendorff
Do you have the prometheus module enabled? Turn that off, it's causing issues. I replaced it with another ceph exporter from GitHub and almost forgot about it. Here's the relevant issue report: https://tracker.ceph.com/issues/39264#change-179946 On 10/12/2020 16:43, Welby McRoberts wrote:

[ceph-users] mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread Welby McRoberts
Hi Folks We've noticed that in a cluster of 21 nodes (5 mgrs & 504 OSDs with 24 per node) the mgrs are, after a non-specific period of time, dropping out of the cluster. The logs only show the following: debug 2020-12-10T02:02:50.409+ 7f1005840700 0 log_channel(cluster) log [DBG] :