[ceph-users] Re: Quincy 17.2.6 - Rados gateway crash -

2023-08-16 Thread Matthias Grandl
We have also encountered this exact backtrace on 17.2.6, likewise in combination with Veeam Backups. I suspect a regression, as we had no issues before the update and all other clusters still running 17.2.5 with Veeam Backups don’t appear to be affected. -- Matthias Grandl matthias.gra...@croit.io c
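
For anyone hitting the same crash, the backtrace and daemon metadata can usually be pulled from the cluster's crash module; a minimal sketch, assuming the crash was recorded by ceph-crash:

    # list crashes known to the cluster
    ceph crash ls
    # full metadata and backtrace for a single crash
    ceph crash info <crash-id>

The <crash-id> placeholder is taken from the first column of "ceph crash ls".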

[ceph-users] Messenger v2 Connection mode config options

2023-08-16 Thread Beaman, Joshua
After reviewing: https://docs.ceph.com/en/reef/rados/configuration/msgr2/#connection-mode-configuration-options I am still confused about the difference between: ms_service_mode - a list of permitted modes for client
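
A minimal sketch of inspecting and setting these options, assuming the option names from the linked msgr2 documentation (permitted values are crc and secure):

    # what servers (osd/mds/mgr) will accept from connecting clients
    ceph config get osd ms_service_mode
    # what clients will offer when connecting to the cluster
    ceph config get client ms_client_mode
    # what cluster daemons use among themselves
    ceph config get osd ms_cluster_mode
    # example: require secure (encrypted) mode for intra-cluster traffic
    ceph config set global ms_cluster_mode secure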

[ceph-users] Re: osdspec_affinity error in the Cephadm module

2023-08-16 Thread Adam King
it looks like you've hit https://tracker.ceph.com/issues/58946 which has a candidate fix open, but nothing merged. The description on the PR with the candidate fix says "When osdspec_affinity is not set, the drive selection code will fail. This can happen when a device has multiple LVs where some o
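
To see whether an affected device really carries multiple LVs (the situation the candidate fix describes), something along these lines can help; a sketch, with the second command run on the affected host:

    # how the orchestrator currently sees the devices
    ceph orch device ls --wide
    # list ceph LVs per device on the host itself
    cephadm ceph-volume lvm list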

[ceph-users] osdspec_affinity error in the Cephadm module

2023-08-16 Thread Adam Huffman
I've been having fun today trying to invite a new disk that replaced a failing one into a cluster. One of my attempts to apply an OSD spec was clearly wrong, because I now have this error: Module 'cephadm' has failed: 'osdspec_affinity' and this was the traceback in the mgr logs: Traceback (mo
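
When a bad OSD spec is the suspected cause, exporting the specs the orchestrator currently holds and re-testing a corrected version with a dry run is the least risky path; a sketch:

    # export the OSD service specs known to the orchestrator
    ceph orch ls osd --export > osd-specs.yaml
    # after editing the file, preview the result before applying it
    ceph orch apply -i osd-specs.yaml --dry-run
    # restarting the active mgr often clears a "Module 'cephadm' has failed" state
    ceph mgr fail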

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Adam King
I've seen this before where the ceph-volume process hanging causes the whole serve loop to get stuck (we have a patch to get it to timeout properly in reef and are backporting to quincy but nothing for pacific unfortunately). That's why I was asking about the REFRESHED column in the orch ps/ orch d
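
The REFRESHED column is visible directly in the orchestrator output, and a hung ceph-volume call shows up as a long-running process on the host being inventoried; a sketch:

    # REFRESHED showing hours or days instead of minutes suggests a stuck serve loop
    ceph orch ps
    ceph orch device ls
    # on the suspect host: look for a ceph-volume process with a large elapsed time
    ps -eo pid,etime,cmd | grep [c]eph-volume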

[ceph-users] Re: [ceph v16.2.10] radosgw crash

2023-08-16 Thread Casey Bodley
thanks Louis, that looks like the same backtrace as https://tracker.ceph.com/issues/61763. that issue has been on 'Need More Info' because all of the rgw logging was disabled there. are you able to share some more log output to help us figure this out? under "--- begin dump of recent events ---",
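
To capture more context for the next occurrence, rgw debug logging can be raised temporarily; a sketch, assuming a config section beginning with client.rgw (the exact section name depends on the deployment):

    # more verbose rgw and messenger logging
    ceph config set client.rgw debug_rgw 20
    ceph config set client.rgw debug_ms 1
    # revert once enough log output has been collected
    ceph config rm client.rgw debug_rgw
    ceph config rm client.rgw debug_ms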

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Eugen Block
Great, thanks for the update! Just yesterday I wanted to cleanup a couple of test clusters and remove some old container images which seemed to still be in use although several upgrades had been processed. Those were quite old ceph-volume inventory processes, dating back to the initial clus
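
For this kind of cleanup it helps to first see which images are still referenced by running containers; a sketch assuming podman as the container runtime on the host:

    # running ceph containers and the images they use
    podman ps --format '{{.Names}} {{.Image}}'
    # all local images, then remove those no container references anymore
    podman images
    podman image prune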

[ceph-users] Re: CephFS metadata outgrow DISASTER during recovery

2023-08-16 Thread Jakub Petrzilka
Hello Again, Is your data pool EC please? Kind regards, Jakub Petrzilka, Nubium.
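
A quick way to answer the EC question; a sketch:

    # each pool is listed as "replicated" or "erasure"
    ceph osd pool ls detail
    # for an EC pool this prints the profile in use (it errors on replicated pools)
    ceph osd pool get <pool-name> erasure_code_profile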

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Robert Sander
On 8/16/23 12:10, Eugen Block wrote: I don't really have a good idea right now, but there was a thread [1] about ssh sessions that are not removed, maybe that could have such an impact? And if you crank up the debug level to 30, do you see anything else? It was something similar. There were lef

[ceph-users] Re: CephFS metadata outgrow DISASTER during recovery

2023-08-16 Thread Jakub Petrzilka
Seems it could be the same issue somewhere deep under the hood. Do you remember anything abnormal before this issue please? Any reweighting, balancer runs, osds restart - anything what can cause PG peering? However we were far from nearfull state (as visible on the screenshot). We were on abo
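
The utilization and peering state under discussion can be checked with standard commands; a sketch:

    # per-OSD and per-pool utilization, to rule out nearfull conditions
    ceph osd df tree
    ceph df detail
    # PGs that are currently stuck inactive (which includes peering)
    ceph pg dump_stuck inactive
    ceph health detail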

[ceph-users] Re: Can't join new mon - lossy channel, failing

2023-08-16 Thread Konstantin Shalygin
> On 16 Aug 2023, at 13:23, Josef Johansson wrote: > > I'm running ceph version 15.2.16 (a6b69e817d6c9e6f02d0a7ac3043ba9cdbda1bdf) > octopus (stable), that would mean I am not running the fix. > > Glad to know that an upgrade will solve the issue! I'm not 100% sure that this tracker, exactly
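
Before and after such an upgrade it is worth confirming which versions the daemons actually run; a sketch:

    # version summary across all daemon types
    ceph versions
    # per-monitor detail
    ceph tell 'mon.*' version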

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Eugen Block
I don't really have a good idea right now, but there was a thread [1] about ssh sessions that are not removed, maybe that could have such an impact? And if you crank up the debug level to 30, do you see anything else? ceph config set mgr debug_mgr 30 [1] https://lists.ceph.io/hyperkitty
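
Besides raising debug_mgr, the cephadm module's own log channel can be followed live, which is often easier to read; a sketch based on the cephadm troubleshooting docs:

    ceph config set mgr debug_mgr 30
    # send cephadm debug output to the cluster log and follow it
    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph -W cephadm --watch-debug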

[ceph-users] Re: Can't join new mon - lossy channel, failing

2023-08-16 Thread Konstantin Shalygin
Hi, > On 16 Aug 2023, at 11:30, Josef Johansson wrote: > > Let's do some serious necromancy here. > > I just had this exact problem. Turns out that after rebooting all nodes (one > at the time of course), the monitor could join perfectly. > > Why? You tell me. We did not see any traces of the

[ceph-users] cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Robert Sander
On 8/15/23 16:36, Adam King wrote: with the log to cluster level already on debug, if you do a "ceph mgr fail" what does cephadm log to the cluster before it reports sleeping? It should at least be doing something if it's responsive at all. Also, in "ceph orch ps" and "ceph orch device ls" ar
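
The checks asked about here boil down to a handful of commands; a sketch:

    # restart the active mgr and watch what cephadm logs before it reports sleeping
    ceph mgr fail
    ceph -W cephadm
    # check whether daemon and device information is actually being refreshed
    ceph orch ps --refresh
    ceph orch device ls --refresh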

[ceph-users] Re: Cephadm adoption - service reconfiguration changes container image

2023-08-16 Thread Eugen Block
That would have been my suggestion as well, set your own container image and override the default. Just one comment, the config option is "container_image" and not "container", that one fails: $ ceph config set global container my-registry:5000/ceph/ceph:16.2.9 Error EINVAL: unrecognized conf
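
With the correct option name the override looks like this; a sketch reusing the same example registry:

    # "container_image" is the recognized option, "container" is not
    ceph config set global container_image my-registry:5000/ceph/ceph:16.2.9
    # verify what is configured
    ceph config dump | grep container_image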

[ceph-users] Re: Cephadm adoption - service reconfiguration changes container image

2023-08-16 Thread Iain Stott
Thanks Adam, Will give it a try today. Cheers