[ceph-users] Re: One PG keeps going inconsistent (stat mismatch)

2021-10-11 Thread Eric Petit
FWIW, I saw a similar problem on a cluster ~1 year ago and noticed that the PG affected with "stat mismatch" was the very last PG of the pool (4.1fff in my case, with pg_num = 8192). I recall thinking that it looked more like a bug than a hardware issue and, assuming your pool has 1024 PGs, you may
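
A minimal sketch of the usual inspect-and-repair cycle for a PG flagged as inconsistent, reusing the PG id 4.1fff quoted above purely as an illustration (it is not from the poster's cluster):

    ceph health detail | grep -i inconsist                     # confirm which PG is flagged
    rados list-inconsistent-obj 4.1fff --format=json-pretty    # often empty for a pure stat mismatch
    ceph pg repair 4.1fff                                      # schedule the repair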

[ceph-users] Re: RFP for arm64 test nodes

2021-10-11 Thread Dan Mick
We have some experience testing Ceph on x86 VMs; we used to do that a lot, but have moved to mostly physical hosts. I could be wrong, but I think our experience is that the cross-loading from one swamped VM to another on the same physical host can skew the load/failure recovery testing enough

[ceph-users] Multisite Pubsub - Duplicates Growing Uncontrollably

2021-10-11 Thread Alex Hussein-Kershaw
Hi Ceph-Users, I have a multisite Ceph cluster deployed on containers within 3 VMs (6 VMs total over 2 sites). Each VM has a mon, osd, mgr, mds, and two rgw containers (regular and pubsub). It was installed with ceph-ansible. One of the sites has been up for a few years, the other site has
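
A hedged sketch of the sync-status checks that are commonly the first thing to look at when a multisite/pubsub zone misbehaves; the zone name is a placeholder, not taken from the message:

    radosgw-admin sync status                               # overall metadata/data sync state
    radosgw-admin metadata sync status
    radosgw-admin data sync status --source-zone=<zone>     # per-zone detail; <zone> is hypothetical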

[ceph-users] Re: Ceph User Survey 2022 Planning

2021-10-11 Thread Mike Perez
Hi everyone, Our first meeting will be on October 18 at 17:00 UTC. If you would like to join us, please add your info to the wiki and use the bluejeans link mentioned on the page: https://tracker.ceph.com/projects/ceph/wiki/User_Survey_Working_Group If you're unable to join us, but have

[ceph-users] Re: cephadm adopt with another user than root

2021-10-11 Thread Luis Domingues
I tried your advice today, and it worked well. Thanks, Luis Domingues ‐‐‐ Original Message ‐‐‐ On Friday, October 8th, 2021 at 9:50 PM, Daniel Pivonka wrote: > I'd have to test this to make sure it works but I believe you can run 'ceph cephadm set-user ' >
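
For context, a sketch of the commands the quoted advice appears to refer to; <username> is a placeholder, not from the original message:

    ceph cephadm set-user <username>     # tell cephadm which SSH user to connect as
    ceph cephadm get-pub-key             # public key to add to that user's authorized_keys
    ceph cephadm get-ssh-config          # verify the SSH settings cephadm will use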

[ceph-users] Re: One PG keeps going inconsistent (stat mismatch)

2021-10-11 Thread Simon Ironside
Bump for any pointers here? tl;dr - I've got a single PG that keeps going inconsistent (stat mismatch). It always repairs OK but comes back every day now when it's scrubbed. If there are no suggestions I'll try upgrading to 14.2.22 and then reweighting the other OSDs (I've already done the
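
A minimal sketch for locating the OSDs behind the problem PG before reweighting; <pgid> is a placeholder for the PG in question:

    ceph pg map <pgid>       # shows the up/acting OSD set for the PG
    ceph pg <pgid> query     # peering and scrub details, including last deep-scrub stamp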

[ceph-users] Re: MDSs report damaged metadata

2021-10-11 Thread Vadim Bulst
I removed all entries with: ceph tell mds.$filesystem:0 damage rm $id so that the cluster was no longer in an error state. It didn't take much time for new entries to appear, putting it back into the error state. On 10/11/21 10:49, Vadim Bulst wrote: ceph tell mds.scfs:0 scrub start / recursive
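
A sketch of the list-then-remove loop implied by the command above, assuming jq is available; $filesystem is the fs name as in the quoted command:

    ceph tell mds.$filesystem:0 damage ls          # JSON list of current damage entries
    for id in $(ceph tell mds.$filesystem:0 damage ls | jq -r '.[].id'); do
        ceph tell mds.$filesystem:0 damage rm "$id"
    done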

[ceph-users] Re: RFP for arm64 test nodes

2021-10-11 Thread mabi
‐‐‐ Original Message ‐‐‐ On Monday, October 11th, 2021 at 10:14 AM, Stefan Kooman wrote: > If you want to go the virtualization route ... you might as well go for > > the Ampere Altra Max with 128 cores :-). I was trying to get an offer for this CPU in Europe but they say it is not

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-11 Thread Igor Fedotov
No, that's just a backtrace of the crash - I'd like to see the full OSD log from process startup till the crash instead... On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote: Hi Igor, Here is a bluestore tool fsck output: https://justpaste.it/7igrb Is this
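
One way (on a systemd-managed, non-containerised OSD) to capture the full startup-to-crash log being asked for; the OSD id and debug levels are assumptions:

    ceph config set osd.<id> debug_bluestore 20
    ceph config set osd.<id> debug_bluefs 20
    systemctl restart ceph-osd@<id>
    # the resulting log is typically /var/log/ceph/ceph-osd.<id>.log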

[ceph-users] CentOS 7 and CentOS 8 Stream dependencies for diskprediction module

2021-10-11 Thread Michal Strnad
Hi, Did anyone get the diskprediction-local plugin working on CentOS 7.9 or CentOS 8 Stream? We have the same problem under both versions of CentOS. When we enable the plugin with Ceph version 15.2.14, we get the following error: Module 'diskprediction_local' has failed: No module named
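
A hedged sketch of the usual workaround, assuming the failure is a missing Python dependency on the mgr hosts (the exact package set is an assumption, not confirmed by the message):

    ceph health detail                          # shows which module import failed
    pip3 install numpy scipy scikit-learn       # or the distribution's python3-* packages
    ceph mgr module disable diskprediction_local
    ceph mgr module enable diskprediction_local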

[ceph-users] Re: cephfs vs rbd

2021-10-11 Thread PABLO MARTINEZ
Good morning, Jorge. A very interesting test. Since I see you tinker a lot with Ceph, can you answer a question for me? How does Ceph calculate the available disk space? I don't quite understand how it is that, if you have 3 servers with 5 disks in each of them, only an effective
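
Answering the arithmetic with a hedged example: with the default replicated pool size of 3, usable capacity is roughly raw capacity divided by 3, e.g. 3 servers x 5 disks x 4 TB (assumed disk size) = 60 TB raw, so about 20 TB usable before overhead and the nearfull margin:

    ceph df                          # RAW USED vs. per-pool MAX AVAIL
    ceph osd pool get <pool> size    # replication factor behind the calculation; <pool> is a placeholder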

[ceph-users] Re: RFP for arm64 test nodes

2021-10-11 Thread Stefan Kooman
On 10/11/21 09:48, Phil Regnauld wrote: Martin Verges (martin.verges) writes: Hello Dan, why not use a bit bigger machines and run VMs for tests? We have quite good experience with that and it works like a charm. If you plan them as hypervisors, you can run a lot of tests simultaneously. Use

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-11 Thread Igor Fedotov
Hmm... so it looks like RocksDB still doesn't perform WAL cleanup during regular operation but applies it on OSD startup. Does a single OSD startup (after it's experiencing "unable to allocate") take 20 mins as well? Could you please share an OSD log containing both that long startup and
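
A sketch of how one might measure that startup and keep the relevant log lines; the OSD id and the default log path are assumptions:

    time systemctl restart ceph-osd@<id>
    grep -E 'bluefs|_allocate' /var/log/ceph/ceph-osd.<id>.log | tail -n 100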

[ceph-users] MDSs report damaged metadata

2021-10-11 Thread Vadim Bulst
Good morning everybody, I ran into a problem where inodes are not updated in the journal backlog and scrubbing plus repair is not removing old info. Info about my Ceph installation: * version Pacific 16.2.5 * 6 nodes * 48 OSDs * one active MDS * one standby-replay MDS * one standby
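
Two commands that show the state being described (the fs name scfs is taken from the related reply earlier in this digest):

    ceph fs status                   # active / standby-replay / standby MDS layout
    ceph tell mds.scfs:0 damage ls   # the damage entries that put the cluster into the error state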

[ceph-users] Re: RFP for arm64 test nodes

2021-10-11 Thread Phil Regnauld
Martin Verges (martin.verges) writes: > Hello Dan, why not use a bit bigger machines and run VMs for tests? We have quite good experience with that and it works like a charm. If you plan them as hypervisors, you can run a lot of tests simultaneously. Use the 80-core ARM, Are you

[ceph-users] Re: RFP for arm64 test nodes

2021-10-11 Thread Martin Verges
Hello Dan, why not use a bit bigger machines and run VMs for tests? We have quite good experience with that and it works like a charm. If you plan them as hypervisors, you can run a lot of tests simultaneously. Use the 80-core ARM, put 512GB or more in them and use some good NVMe like P55XX or