[ceph-users] snaptrim number of objects

2023-08-04 Thread Angelo Höngens
Hey guys, I'm trying to figure out what's happening to my backup cluster that often grinds to a halt when cephfs automatically removes snapshots. Almost all OSDs go to 100% CPU, ceph complains about slow ops, and CephFS stops doing client i/o. I'm graphing the cumulative value of the snaptrimq_l
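
A minimal sketch for watching and throttling this, assuming the usual ceph CLI and that the per-PG field in the pg dump JSON is still called snaptrimq_len on this release (verify the field path before relying on it); the throttle values are illustrative only:

    # sum the snap trim queue length over all PGs (JSON field path assumed)
    ceph pg dump pgs -f json 2>/dev/null | jq '[.pg_stats[].snaptrimq_len] | add'
    # slow trimming down so client i/o keeps some headroom (values illustrative)
    ceph config set osd osd_snap_trim_sleep 2
    ceph config set osd osd_pg_max_concurrent_snap_trims 1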

[ceph-users] Re: [External Email] Re: Nautilus: Taking out OSDs that are 'Failure Pending' [EXT]

2023-08-04 Thread Tyler Stachecki
On Fri, Aug 4, 2023 at 11:33 AM Dave Hall wrote: > > Dave, > > Actually, my failure domain is OSD since I so far only have 9 OSD nodes but > EC 8+2. However, the drives are still functioning, except that one has > failed multiple times in the last few days, requiring a node power-cycle to > recov

[ceph-users] Re: snapshot timestamp

2023-08-04 Thread Tony Liu
Thank you Ilya for confirmation! Tony From: Ilya Dryomov Sent: August 4, 2023 04:51 AM To: Tony Liu Cc: d...@ceph.io; ceph-users@ceph.io Subject: Re: [ceph-users] snapshot timestamp On Fri, Aug 4, 2023 at 7:49 AM Tony Liu wrote: > > Hi, > > We know snaps

[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Tony Liu
Thank you Eugen and Nathan! uint64 is big enough, no concerns any more. Tony From: Nathan Fish Sent: August 4, 2023 04:19 AM To: Eugen Block Cc: ceph-users@ceph.io Subject: [ceph-users] Re: What's the max of snap ID? 2^64 byte in peta byte = 18446.744073

[ceph-users] Re: [External Email] Re: Nautilus: Taking out OSDs that are 'Failure Pending' [EXT]

2023-08-04 Thread Dave Hall
Dave, Actually, my failure domain is OSD, since so far I only have 9 OSD nodes but run EC 8+2. However, the drives are still functioning, except that one has failed multiple times in the last few days, requiring a node power-cycle to recover. I will certainly mark that one out immediately. The other

[ceph-users] Re: [EXTERNAL] cephfs mount problem - client session lacks required features - solved

2023-08-04 Thread Götz Reinicke
Hi Josh, Thanks for your feedback. We restarted the active MDS and the error/problem is gone. Best . Götz > On 04.08.2023 at 16:19, Beaman, Joshua wrote: > > We did not have any cephfs or mds involved. But since you haven’t even > started a ceph upgrade in earnest, I have
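
A note on the fix: failing over or restarting the active MDS can be done from the cluster side; a minimal sketch, assuming rank 0 is the active MDS of the affected filesystem and a non-cephadm package install:

    ceph fs status                         # identify the active MDS and its rank
    ceph mds fail 0                        # fail rank 0 so a standby takes over (rank assumed)
    # or restart the daemon on the MDS host (unit name assumed to match the hostname):
    systemctl restart ceph-mds@$(hostname -s)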

[ceph-users] Re: [EXTERNAL] cephfs mount problem - client session lacks required features

2023-08-04 Thread Beaman, Joshua
We did not have any cephfs or mds involved. But since you haven’t even started a ceph upgrade in earnest, I have to wonder about your nautilus versions. Maybe you have a mismatch there? I would definitely share the output of `ceph versions` and `ceph features`. If you’re not 14.2.22 across t
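
The two commands referenced, roughly as one would run them on any mon/admin node (exact output shape varies by release):

    ceph versions     # daemon counts per ceph build; a mix of 14.2.x versions stands out here
    ceph features     # feature bits per client/daemon group; old or odd clients show up here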

[ceph-users] Re: Nautilus: Taking out OSDs that are 'Failure Pending' [EXT]

2023-08-04 Thread Dave Holland
On Fri, Aug 04, 2023 at 09:44:57AM -0400, Dave Hall wrote: > My inclination is to mark these 3 OSDs 'OUT' before they crash completely, > but I want to confirm my understanding of Ceph's response to this. Mainly, > given my EC pools (or replicated pools for that matter), if I mark all 3 > OSD out
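
One way to sanity-check the blast radius before marking anything out is to confirm the EC profile and see which PGs touch the suspect OSDs; a sketch, where the profile, pool and OSD ids are placeholders:

    ceph osd erasure-code-profile ls                # profiles in use
    ceph osd erasure-code-profile get <profile>     # confirm k=8 m=2
    ceph osd pool get <pool> min_size               # shards a PG needs to stay active
    ceph pg ls-by-osd <osd-id> | head               # PGs that will have to move off this OSD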

[ceph-users] cephfs mount problem - client session lacks required features

2023-08-04 Thread Götz Reinicke
Hi, During the upgrade from centos7/nautilus to ubuntu 18/nautilus (still updating the MONs) I got a cephfs client that refuses, or is refused, to mount the ceph fs again. The client says: mount error 13 = Permission denied The ceph MDS log: lacks required features 0x1000 client suppo
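
When chasing this kind of refusal it usually helps to compare what the MDS requires with what the client advertises, and to look at the kernel client's own log; a rough sketch, with monitor address and client name as placeholders:

    ceph features                       # feature bits the cluster sees per client group
    ceph auth get client.<name>         # caps the client mounts with
    dmesg | grep -i ceph                # kernel-client side of "mount error 13"
    mount -t ceph <mon-ip>:6789:/ /mnt/cephfs -o name=<name>,secretfile=/etc/ceph/client.secret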

[ceph-users] Re: [EXTERNAL] Nautilus: Taking out OSDs that are 'Failure Pending'

2023-08-04 Thread Beaman, Joshua
Marking them OUT first is the way to go. As long as the osds stay UP, they can and will participate in the recovery. How many you can mark out at one time will depend on how sensitive your client i/o is to background recovery, and all of the related tunings. If you have the hours/days to spar
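
A minimal sketch of that sequence, with background recovery throttled down while the data moves (OSD id and values are illustrative only):

    ceph config set osd osd_max_backfills 1         # keep recovery gentle on client i/o
    ceph config set osd osd_recovery_max_active 1
    ceph osd out 12                                 # placeholder id; the OSD stays UP and helps recover
    ceph -w                                         # watch; mark the next one out once things settle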

[ceph-users] Re: [EXTERNAL] Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints? - Thanks

2023-08-04 Thread Götz Reinicke
Hi, thanks for all the suggestions. Right now, it is the step-by-step approach that works: going to bionic/nautilus … and from there as Josh noted. We encountered a problem which I'll post separately. Best . Götz > On 03.08.2023 at 15:44, Beaman, Joshua wrote: > > We went through this exercise, thou

[ceph-users] Nautilus: Taking out OSDs that are 'Failure Pending'

2023-08-04 Thread Dave Hall
Hello. It's been a while. I have a Nautilus cluster with 72 x 12GB HDD OSDs (BlueStore) and mostly EC 8+2 pools/PGs. It's been working great - some nodes went nearly 900 days without a reboot. As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending Failure'. New drives ar
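
For cross-checking those three drives, the usual suspects are the pending/reallocated sector counters plus Nautilus' built-in device health tracking; a sketch, with device path and device id as placeholders:

    smartctl -a /dev/sdX | grep -Ei 'pending|realloc'   # raw SMART counters for the suspect drive
    ceph device ls                                      # map devices to OSDs and hosts
    ceph device get-health-metrics <devid>              # SMART data ceph has collected, if monitoring is enabled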

[ceph-users] Re: question about OSD onode hits ratio

2023-08-04 Thread Mark Nelson
Check to see what your osd_memory_target is set to.  The default 4GB is generally a decent starting point, but if you have a large active data set you might benefit from increasing the amount of memory available to the OSDs.  They'll generally prefer giving it to the onode cache first if it's h
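
A rough sketch of checking and raising the target and reading the onode cache counters; the counter names are an assumption to verify against your build's perf dump, and the 8 GiB value is only an example:

    ceph config get osd.0 osd_memory_target            # current target for one OSD (id is a placeholder)
    ceph config set osd osd_memory_target 8589934592   # e.g. 8 GiB, sized to your RAM headroom
    ceph daemon osd.0 perf dump | grep -i onode        # run on the OSD's host; hit/miss counter names assumed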

[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Tobias Urdin
That’s a major misinterpretation of how it actually is. Sorry, I just had to state that; obviously this is not the proper mailing list to discuss it on. Best regards Tobias > On 4 Aug 2023, at 09:25, Jens Galsgaard wrote: > > You are right. > > Centos stream is alpha > Fedora is beta > RHEL

[ceph-users] Re: snapshot timestamp

2023-08-04 Thread Ilya Dryomov
On Fri, Aug 4, 2023 at 7:49 AM Tony Liu wrote: > > Hi, > > We know snapshot is on a point of time. Is this point of time tracked > internally by > some sort of sequence number, or the timestamp showed by "snap ls", or > something else? Hi Tony, The timestamp in "rbd snap ls" output is the snap
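
For completeness, that timestamp is visible straight from the CLI; a sketch with pool and image names as placeholders (column and JSON field layout may differ per release):

    rbd snap ls <pool>/<image>                  # recent releases include a TIMESTAMP column
    rbd snap ls <pool>/<image> --format json    # machine-readable form of the same listing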

[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Nathan Fish
2^64 bytes in petabytes = 18446.744073709551616 PB. Assuming that a snapshot requires storing any data at all, which it must, nobody has a Ceph cluster that could store that much snapshot metadata even for empty snapshots. On Fri, Aug 4, 2023 at 7:05 AM Eugen Block wrote: > > I'm no prog

[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Eugen Block
I'm no programmer but if I understand [1] correctly it's an unsigned 64-bit integer: int ImageCtx::snap_set(uint64_t in_snap_id) { which means the max snap_id should be 2^64 - 1 = 18446744073709551615. Not sure if you can get your cluster to reach that limit, but I also don't know what woul
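
For scale, a quick back-of-the-envelope check (plain arithmetic, not a documented limit): a uint64_t provides 2^64 distinct IDs, so even creating one snapshot every nanosecond would take about 2^64 ns ≈ 1.84 x 10^10 s ≈ 585 years to exhaust the ID space.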

[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread dobrie2
Konstantin Shalygin wrote: > Hi, > > In most cases the 'Alternative' distro like Alma or Rocky have outdated > versions > of packages, if we compared it with CentOS Stream 8 or CentOS Stream 9. For > example is a > golang package, on c8s is a 1.20 version on Alma still 1.19 > > You can try to u

[ceph-users] Re: [EXTERN] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Dietmar Rieder
I thought so too, but now I'm a bit confused. We are planning to set up a new ceph cluster and initially opted for an el9 system, which is supposed to be stable; should we rather use a stream 'trial' version? Dietmar On 8/4/23 09:04, Marc wrote: But Rocky Linux 9 is the continuation of what

[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Jens Galsgaard
You are right. Centos stream is alpha Fedora is beta RHEL is stable Alma/Rocky/Oracle are based on RHEL Venlig hilsen - Mit freundlichen Grüßen - Kind Regards, Jens Galsgaard Gitservice.dk Mob: +45 28864340 -Original message- From: Marc Sent: Friday, 4 August 2023 09.04 To:

[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Marc
But Rocky Linux 9 is the continuation of what CentOS would have been on el9. Afaik ceph is being developed on elX distributions and not the 'trial' stream versions, no? > > In most cases the 'Alternative' distro like Alma or Rocky have outdated > versions of packages, if we compared it with C