[ceph-users] Re: Are we logging IRC channels?

2024-03-19 Thread Mark Nelson
A long time ago Wido used to have a bot logging IRC afaik, but I think that's been gone for some time. Mark On 3/19/24 19:36, Alvaro Soto wrote: Hi Community!!! Are we logging IRC channels? I ask this because a lot of people only use Slack, and the Slack we use doesn't have a subscription,

[ceph-users] Are we logging IRC channels?

2024-03-19 Thread Alvaro Soto
Hi Community!!! Are we logging IRC channels? I ask this because a lot of people only use Slack, and the Slack we use doesn't have a subscription, so messages are lost after 90 days (I believe). I believe it's important to keep track of the technical knowledge we see each day over IRC+Slack. Cheers!

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Anthony D'Atri
> Those files are VM disk images, and they're under constant heavy use, so yes- > there /is/ constant severe write load against this disk. Why are you using CephFS for an RBD application?

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Thorne Lawler
Alexander, Thank you, but as I said to Igor: The 5.5TB of files on this filesystem are virtual machine disks. They are under constant, heavy write load. There is no way to turn this off. On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote: Hello Thorne, Here is one more suggestion on how to

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Thorne Lawler
Igor, Those files are VM disk images, and they're under constant heavy use, so yes - there /is/ constant severe write load against this disk. Apart from writing more test files into the filesystems, there must be Ceph diagnostic tools to describe what those objects are being used for, surely?

[ceph-users] Re: RGW: Cannot write to bucket anymore

2024-03-19 Thread Robin H. Johnson
On Tue, Mar 19, 2024 at 01:19:34PM +0100, Malte Stroem wrote: > I checked the policies, lifecycle and versioning. > > Nothing. The user has FULL_CONTROL. Same settings for the user's other > buckets he can still write to. > > When setting debugging to higher numbers all I can see is something
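
For reference, raising the RGW debug level is the usual way to "set debugging to higher numbers". A minimal sketch, assuming a gateway that reads its options from the client.rgw config section (20 is very verbose, so lower the levels again afterwards):

  # raise RGW and messenger logging while reproducing the failing write
  ceph config set client.rgw debug_rgw 20
  ceph config set client.rgw debug_ms 1
  # reproduce the failed PUT, inspect the RGW log, then revert
  ceph config set client.rgw debug_rgw 1
  ceph config set client.rgw debug_ms 0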

[ceph-users] Leaked clone objects

2024-03-19 Thread Frédéric Nass
Hello, Over the last few weeks, we have observed an abnormal increase of a pool's data usage (by a factor of 2). It turns out that we are hit by this bug [1]. In short, if you happened to take pool snapshots and removed them by using the following command 'ceph osd pool rmsnap
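
For context, a minimal sketch of the pool snapshot commands under discussion (pool and snapshot names are placeholders, not taken from the original post):

  # create and later remove a pool-level snapshot
  ceph osd pool mksnap mypool mysnap
  ceph osd pool rmsnap mypool mysnap
  # list any pool snapshots that still exist
  rados -p mypool lssnap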

[ceph-users] Re: OSD does not die when disk has failures

2024-03-19 Thread Robert Sander
Hi, On 3/19/24 13:00, Igor Fedotov wrote: translating EIO to upper layers rather than crashing an OSD is a valid default behavior. One can alter this by setting bluestore_fail_eio parameter to true. What benefit lies in this behavior when in the end client IO stalls? Regards -- Robert

[ceph-users] RGW: Cannot write to bucket anymore

2024-03-19 Thread Malte Stroem
Hello, a user in our Ceph cluster is suddenly not able to write to one of his buckets. Reading works fine. All other buckets work fine. If we copy the bucket to another bucket on the same cluster, the error stays. Writing is not possible in the new bucket either.

[ceph-users] Re: OSD does not die when disk has failures

2024-03-19 Thread Igor Fedotov
Hi Daniel, translating EIO to upper layers rather than crashing an OSD is a valid default behavior. One can alter this by setting bluestore_fail_eio parameter to true. Thanks, Igor On 3/19/2024 2:50 PM, Daniel Schreiber wrote: Hi, in our cluster (17.2.6) disks fail from time to time.
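
As a sketch of the setting Igor describes (the osd.12 target is a placeholder; check the option against your release before relying on it):

  # make BlueStore fail the OSD on EIO instead of passing the error up
  ceph config set osd bluestore_fail_eio true
  # or limit it to a single OSD
  ceph config set osd.12 bluestore_fail_eio true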

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE, MDS_SLOW_METADATA_IO, and MDS_SLOW_REQUEST errors and slow osd_ops despite hardware being fine

2024-03-19 Thread Enrico Bocchi
Hello Ivan, Do you observe any spikes in the memory utilization of the MDS when the lock happens? Particularly in buffer_anon? We are observing some rdlock issues which are leading to a spinlock on the MDS but it does not seem to be related to a hanging operation on an OSD. Cheers, Enrico
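
A minimal sketch of how buffer_anon can be watched on the MDS side, run on the host carrying the active MDS (the daemon name cephfs-a is a placeholder):

  # dump the MDS memory pools, including buffer_anon
  ceph daemon mds.cephfs-a dump_mempools
  # overall cache/memory picture of the MDS
  ceph daemon mds.cephfs-a cache status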

[ceph-users] Re: Return value from cephadm host-maintenance?

2024-03-19 Thread John Mulligan
On Tuesday, March 19, 2024 7:32:47 AM EDT Daniel Brown wrote: > Possibly a naive question, and possibly seemingly trivial, but is there any > good reason to return a “1” on success for cephadm host-maintenance enter > and exit: No, I doubt that was intentional. The function is written in a way

[ceph-users] OSD does not die when disk has failures

2024-03-19 Thread Daniel Schreiber
Hi, in our cluster (17.2.6) disks fail from time to time. Block devices are HDD, DB devices are NVMe. However, the OSD process does not reliably die. That leads to blocked client IO for all requests for which the OSD with the broken disk is the primary OSD. All pools on these OSDs are EC

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE, MDS_SLOW_METADATA_IO, and MDS_SLOW_REQUEST errors and slow osd_ops despite hardware being fine

2024-03-19 Thread Ivan Clayson
Hello Gregory and Nathan, Having a look at our resource utilization, there doesn't seem to be a CPU or memory bottleneck, as there is plenty of both available for the host which has the blocked OSD as well as for the MDS's host. We've had a repeat of this problem today where the OSD logging
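
For anyone following along, a sketch of how the blocked OSD's in-flight and recent slow operations can be inspected on the OSD host (osd.N is a placeholder for the affected OSD id):

  # requests currently stuck in the OSD
  ceph daemon osd.N dump_ops_in_flight
  # recent operations that exceeded the slow-op threshold
  ceph daemon osd.N dump_historic_slow_ops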

[ceph-users] Return value from cephadm host-maintenance?

2024-03-19 Thread Daniel Brown
Possibly a naive question, and possibly seemingly trivial, but is there any good reason to return a “1” on success for cephadm host-maintenance enter and exit: ~$ sudo cephadm host-maintenance enter --fsid -XX--X Inferring config
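
A quick way to see the exit code being discussed, matching the prompt style above (fsid redacted; the result reported in this thread on success is 1):

  ~$ sudo cephadm host-maintenance enter --fsid <fsid>
  ~$ echo $?
  1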

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Alexander E. Patrakov
Hello Thorne, Here is one more suggestion on how to debug this. Right now, there is uncertainty on whether there is really a disk space leak or if something simply wrote new data during the test. If you have at least three OSDs you can reassign, please set their CRUSH device class to something
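
A sketch of the device-class reassignment being suggested (the class name 'leaktest' and the OSD ids are placeholders):

  # clear the current device class, then assign a dedicated one for the test
  ceph osd crush rm-device-class osd.10 osd.11 osd.12
  ceph osd crush set-device-class leaktest osd.10 osd.11 osd.12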

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Igor Fedotov
Hi Thorn, given the amount of files at CephFS volume I presume you don't have severe write load against it. Is that correct? If so we can assume that the numbers you're sharing are mostly refer to your experiment. At peak I can see bytes_used increase = 629,461,893,120 bytes (45978612027392 

[ceph-users] Re: Adding new OSD's - slow_ops and other issues.

2024-03-19 Thread Eugen Block
Hi Jesper, could you please provide more details about the cluster (the usual like 'ceph osd tree', 'ceph osd df', 'ceph versions')? I find it unusual to enable maintenance mode to add OSDs; is there a specific reason? And why add OSDs manually with 'ceph orch osd add', why not have a
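
For completeness, the commands whose output is being requested:

  ceph osd tree
  ceph osd df
  ceph versions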

[ceph-users] Re: mon stuck in probing

2024-03-19 Thread Eugen Block
Hi, there are several existing threads on this list, have you tried to apply those suggestions? A couple of them were: - ceph mgr fail - check time sync (NTP, chrony) - different weights for MONs - Check debug logs Regards, Eugen Quoting faicker mo: some logs here,
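
A sketch of the first two checks (chrony shown; substitute your NTP client, and run the time check on each mon host):

  # fail over the active mgr
  ceph mgr fail
  # verify clock synchronisation
  chronyc tracking
  timedatectl status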

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Eugen Block
It's your pool replication (size = 3): 3886733 (number of objects) * 3 = 11660199 Quoting Thorne Lawler: Can anyone please tell me what "COPIES" means in this context? [ceph: root@san2 /]# rados df -p cephfs.shared.data POOL_NAME USED  OBJECTS  CLONES    COPIES
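
The replication factor used in that multiplication can be confirmed directly against the pool from the quoted output:

  [ceph: root@san2 /]# ceph osd pool get cephfs.shared.data size
  size: 3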