Hey guys,
I'm trying to figure out what's happening to my backup cluster that
often grinds to a halt when CephFS automatically removes snapshots.
Almost all OSDs go to 100% CPU, Ceph complains about slow ops, and
CephFS stops doing client I/O.
I'm graphing the cumulative value of the snaptrimq_l
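In case it helps, a minimal sketch of how that cumulative value can be
collected, assuming the `ceph` CLI with admin access and that the per-PG
snaptrimq_len field is present in the pg dump JSON (it is on recent releases
such as Nautilus); the JSON layout varies a bit between releases, hence the
two lookups:

#!/usr/bin/env python3
# Sketch: sum snaptrimq_len over all PGs so the total can be graphed.
import json
import subprocess
import time


def pg_stats():
    dump = json.loads(subprocess.check_output(
        ["ceph", "pg", "dump", "--format", "json"]))
    if "pg_stats" in dump:
        return dump["pg_stats"]
    return dump["pg_map"]["pg_stats"]


def total_snaptrimq_len():
    return sum(pg.get("snaptrimq_len", 0) for pg in pg_stats())


if __name__ == "__main__":
    # One timestamped sample per minute; feed this into your graphing system.
    while True:
        print(int(time.time()), total_snaptrimq_len(), flush=True)
        time.sleep(60)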
On Fri, Aug 4, 2023 at 11:33 AM Dave Hall wrote:
>
> Dave,
>
> Actually, my failure domain is OSD, since so far I only have 9 OSD nodes but
> run EC 8+2. However, the drives are still functioning, except that one has
> failed multiple times in the last few days, requiring a node power-cycle to
> recov
Thank you, Ilya, for the confirmation!
Tony
From: Ilya Dryomov
Sent: August 4, 2023 04:51 AM
To: Tony Liu
Cc: d...@ceph.io; ceph-users@ceph.io
Subject: Re: [ceph-users] snapshot timestamp
On Fri, Aug 4, 2023 at 7:49 AM Tony Liu wrote:
>
> Hi,
>
> We know snaps
Thank you, Eugen and Nathan!
uint64 is big enough, so no more concerns.
Tony
From: Nathan Fish
Sent: August 4, 2023 04:19 AM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: What's the max of snap ID?
2^64 bytes in petabytes
= 18446.744073
Dave,
Actually, my failure domain is OSD, since so far I only have 9 OSD nodes but
run EC 8+2. However, the drives are still functioning, except that one has
failed multiple times in the last few days, requiring a node power-cycle to
recover. I will certainly mark that one out immediately.
The other
Hi Josh,
Thanks for your feedback. We restarted the active MDS and the error/problem
is gone.
Best, Götz
> On 04.08.2023 at 16:19, Beaman, Joshua wrote:
>
> We did not have any CephFS or MDS involved. But since you haven't even
> started a Ceph upgrade in earnest, I have
We did not have any CephFS or MDS involved. But since you haven't even started
a Ceph upgrade in earnest, I have to wonder about your Nautilus versions.
Maybe you have a mismatch there?
I would definitely share the output of `ceph versions` and `ceph features`. If
you’re not 14.2.22 across t
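In case it's useful, a small sketch of how a mismatch could be spotted from
that output; it only assumes that `ceph versions` prints its usual JSON map of
{daemon_type: {version_string: count}, ..., "overall": {...}}:

#!/usr/bin/env python3
# Sketch: flag daemons that are not all on the same Ceph release.
import json
import subprocess

versions = json.loads(subprocess.check_output(["ceph", "versions"]))
overall = versions.get("overall", {})

if len(overall) > 1:
    print("Mixed versions detected:")
    for ver, count in sorted(overall.items()):
        print(f"  {count:4d} daemon(s) on {ver}")
else:
    for ver in overall:
        print("All daemons report:", ver)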
On Fri, Aug 04, 2023 at 09:44:57AM -0400, Dave Hall wrote:
> My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
> but I want to confirm my understanding of Ceph's response to this. Mainly,
> given my EC pools (or replicated pools for that matter), if I mark all 3
> OSD out
Hi,
During the upgrade from CentOS 7/Nautilus to Ubuntu 18/Nautilus (still updating
the MONs) I got a CephFS client that refuses, or is refused, to mount the
CephFS again.
The client says: mount error 13 = Permission denied
The ceph-mds log says: lacks required features 0x1000 client suppo
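Two quick things worth checking in a case like this (a sketch, not a
diagnosis; the client name below is a placeholder): the caps of the key the
mount uses, and the feature bits the monitors currently see from connected
clients:

#!/usr/bin/env python3
# Sketch: dump the auth caps of the mounting key and the client feature
# bits reported by the monitors. "client.cephfs" is a made-up key name.
import subprocess

CLIENT = "client.cephfs"  # hypothetical; substitute the key your mount uses

print(subprocess.check_output(["ceph", "auth", "get", CLIENT], text=True))
print(subprocess.check_output(["ceph", "features"], text=True))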
Marking them OUT first is the way to go. As long as the OSDs stay UP, they can
and will participate in the recovery. How many you can mark out at one time
will depend on how sensitive your client I/O is to background recovery, and all
of the related tunings. If you have the hours/days to spar
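To make that concrete, a rough sketch of the mark-out-and-throttle approach,
assuming made-up OSD IDs and the standard backfill/recovery settings:

#!/usr/bin/env python3
# Sketch: mark failing OSDs OUT one at a time while keeping background
# recovery throttled so client I/O stays responsive. OSD IDs are made up,
# and waiting for HEALTH_OK between drives is deliberately conservative.
import subprocess
import time

FAILING_OSDS = [11, 23, 47]  # placeholder IDs of the drives pending failure


def ceph(*args):
    return subprocess.check_output(["ceph", *args], text=True)


# Keep backfill/recovery gentle while data moves off the old drives.
ceph("config", "set", "osd", "osd_max_backfills", "1")
ceph("config", "set", "osd", "osd_recovery_max_active", "1")

for osd_id in FAILING_OSDS:
    ceph("osd", "out", str(osd_id))  # the OSD stays UP and helps with recovery
    while "HEALTH_OK" not in ceph("health"):
        time.sleep(60)  # wait for backfill to finish before the next one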
Hi, thanks for all the suggestions.
Right now it is the step-by-step approach that works: going to Bionic/Nautilus
first, and from there as Josh noted.
We encountered a problem which I'll post about separately.
Best, Götz
> On 03.08.2023 at 15:44, Beaman, Joshua wrote:
>
> We went through this exercise, thou
Hello. It's been a while. I have a Nautilus cluster with 72 x 12GB HDD
OSDs (BlueStore) and mostly EC 8+2 pools/PGs. It's been working great -
some nodes went nearly 900 days without a reboot.
As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending
Failure'. New drives ar
Check to see what your osd_memory_target is set to. The default 4GB is
generally a decent starting point, but if you have a large active data
set you might benefit from increasing the amount of memory available to
the OSDs. They'll generally prefer giving it to the onode cache first
if it's h
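For reference, a minimal sketch of inspecting and raising that target through
the centralized config database; osd.0 and the 8 GiB figure are only examples,
so pick a budget that fits your hosts' RAM:

#!/usr/bin/env python3
# Sketch: check osd_memory_target on one OSD and raise it for all OSDs.
import subprocess


def ceph(*args):
    return subprocess.check_output(["ceph", *args], text=True).strip()


print("osd.0 osd_memory_target:",
      ceph("config", "get", "osd.0", "osd_memory_target"))

# Raise the target to 8 GiB for all OSDs; per-daemon overrides still win.
ceph("config", "set", "osd", "osd_memory_target", str(8 * 1024 ** 3))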
That's a major misinterpretation of how things actually are.
Sorry, I just had to say that; obviously this is not the proper mailing list to
discuss it on.
Best regards
Tobias
> On 4 Aug 2023, at 09:25, Jens Galsgaard wrote:
>
> You are right.
>
> CentOS Stream is alpha
> Fedora is beta
> RHEL
On Fri, Aug 4, 2023 at 7:49 AM Tony Liu wrote:
>
> Hi,
>
> We know a snapshot captures a point in time. Is this point in time tracked
> internally by some sort of sequence number, by the timestamp shown by
> "snap ls", or by something else?
Hi Tony,
The timestamp in "rbd snap ls" output is the snap
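For anyone who wants to see the snap IDs and timestamps side by side, a
minimal sketch (the pool/image name is a placeholder; on recent releases
`rbd snap ls` prints a SNAPID column next to the name, size and timestamp):

#!/usr/bin/env python3
# Sketch: list an image's snapshots with their IDs and timestamps.
import subprocess

IMAGE = "rbd/myimage"  # hypothetical pool/image

print(subprocess.check_output(["rbd", "snap", "ls", IMAGE], text=True))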
2^64 bytes in petabytes
= 18446.744073709551616 PB
Assuming that a snapshot requires storing any data at all, which it
must, nobody has a Ceph cluster that could store that much snapshot
metadata even for empty snapshots.
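For anyone who wants to reproduce that figure, the arithmetic in Python:

print(2 ** 64)             # 18446744073709551616 bytes
print(2 ** 64 / 10 ** 15)  # ~18446.744 decimal petabytes (float rounding)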
On Fri, Aug 4, 2023 at 7:05 AM Eugen Block wrote:
>
> I'm no prog
I'm no programmer but if I understand [1] correctly it's an unsigned
long long:
int ImageCtx::snap_set(uint64_t in_snap_id) {
which means the max snap_id should be:
2^64 - 1 = 18446744073709551615
Not sure if you can get your cluster to reach that limit, but I also
don't know what woul
Konstantin Shalygin wrote:
> Hi,
>
> In most cases the 'alternative' distros like Alma or Rocky have outdated
> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
> For example, the golang package on c8s is at version 1.20, while on Alma it
> is still at 1.19.
>
> You can try to u
I thought so too, but now I'm a bit confused. We are planning to
set up a new Ceph cluster and initially opted for an el9 system, which is
supposed to be stable. Should we rather use a Stream 'trial' version?
Dietmar
On 8/4/23 09:04, Marc wrote:
But Rocky Linux 9 is the continuation of what
You are right.
CentOS Stream is alpha
Fedora is beta
RHEL is stable
Alma/Rocky/Oracle are based on RHEL
Kind Regards,
Jens Galsgaard
Gitservice.dk
Mob: +45 28864340
-----Original message-----
From: Marc
Sent: Friday, 4 August 2023 09:04
To:
But Rocky Linux 9 is the continuation of what CentOS would have been on el9.
AFAIK, Ceph is being developed on elX distributions and not the 'trial' Stream
versions, no?
>
> In most cases the 'Alternative' distro like Alma or Rocky have outdated
> versions of packages, if we compared it with C