[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-14 Thread Sascha Lucas
Hi Venky, On Wed, 14 Dec 2022, Venky Shankar wrote: On Tue, Dec 13, 2022 at 6:43 PM Sascha Lucas wrote: Just an update: "scrub / recursive,repair" does not uncover additional errors. But it also does not fix the single dirfrag error. File system scrub does not clear entries from the damage
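
A minimal sketch of the commands involved, assuming a single active MDS rank; <fsname> and <damage_id> are placeholders:

  ceph tell mds.<fsname>:0 scrub start / recursive,repair   # re-walk the tree and repair what it can
  ceph tell mds.<fsname>:0 damage ls                        # scrub alone does not clear the damage table
  ceph tell mds.<fsname>:0 damage rm <damage_id>            # remove an entry once the underlying damage is dealt with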

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Martin Buss
Hi Frank and Eugen, target_max_misplaced_ratio 1 did the trick. Now I can increase pg_num and pgp_num in steps of 128. Thanks! On 14.12.22 21:32, Frank Schilder wrote: Hi Eugen: déjà vu again? I think the way autoscaler code in the MGRs interferes with operations is
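
As a rough sketch of the workflow described (the pool name and step values are only illustrative):

  ceph config set mgr target_max_misplaced_ratio 1   # let the mgr apply the whole requested change at once
  ceph osd pool set cephfs_data pg_num 315           # e.g. 187 + 128
  ceph osd pool set cephfs_data pgp_num 187          # raise pgp_num in matching steps, never above pg_num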

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Martin Buss
after backfilling was complete, I was able to increase pg_num and pgp_num on the empty pool cfs_data in 128 increments all the way up to 2048; that was fine. This is not working for the filled pool (pg_num 187, pgp_num 59). Trying to increase that in small increments: set nobackfill, set
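
The nobackfill-based sequence presumably looks roughly like this (a sketch under that assumption, not quoted from the thread):

  ceph osd set nobackfill                             # pause backfill while the numbers are bumped
  ceph osd pool set cephfs_data pg_num <current+128>
  ceph osd pool set cephfs_data pgp_num <current+128>
  ceph osd unset nobackfill                           # let backfill catch up before the next step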

[ceph-users] rgw: "failed to read header: bad method" after PutObject failed with 404 (NoSuchBucket)

2022-12-14 Thread Stefan Reuter
Hi, When I try to upload an object to a non-existing bucket, PutObject returns a 404 Not Found with error code NoSuchBucket as expected. Trying to create the bucket afterwards however results in a 400 Bad Request error which is not expected. The rgw logs indicate "failed to read header: bad
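
A sketch of the sequence using the AWS CLI; the endpoint and bucket name are placeholders:

  aws s3api put-object --bucket not-yet-created --key test --body ./file --endpoint-url http://rgw.example.com
  # -> 404 NoSuchBucket, as expected
  aws s3api create-bucket --bucket not-yet-created --endpoint-url http://rgw.example.com
  # -> unexpectedly 400 Bad Request; rgw logs "failed to read header: bad method"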

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Frank Schilder
Hi Martin, I can't find the output of 'ceph osd df tree' or 'ceph status' anywhere. I thought you posted it, but well. Could you please post the output of these commands? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Martin Buss
Hi Frank, thanks for coming in on this, setting target_max_misplaced_ratio to 1 does not help. Regards, Martin On 14.12.22 21:32, Frank Schilder wrote: Hi Eugen: déjà vu again? I think the way autoscaler code in the MGRs interferes with operations is extremely confusing. Could this be the

[ceph-users] Re: ceph-volume inventory reports available devices as unavailable

2022-12-14 Thread Frank Schilder
Hi Eugen, thanks for that. I guess the sane insane logic could be that if "rejected_reasons": ["LVM detected", "locked"], the disk has at least 1 OSD (or something ceph-ish) already and lvm batch would do something non-trivial (report-json not empty), one should consider the disk as
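
A quick way to look at exactly those fields (a sketch; jq is only used for readability and /dev/sdX is a placeholder):

  ceph-volume inventory --format json-pretty | jq '.[] | {path, available, rejected_reasons}'
  ceph-volume lvm batch --report --format json /dev/sdX   # a non-empty report means batch would actually change something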

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Frank Schilder
Hi Eugen: déjà vu again? I think the way autoscaler code in the MGRs interferes with operations is extremely confusing. Could this be the same issue I and somebody else had a while ago? Even though autoscaler is disabled, there are parts of it in the MGR still interfering. One of the
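
A few commands that can help confirm what the autoscaler/MGR is actually doing (a sketch; <pool> is a placeholder):

  ceph osd pool autoscale-status               # what the autoscaler would do, even if the per-pool mode is off
  ceph osd pool get <pool> pg_autoscale_mode   # the per-pool setting
  ceph mgr module ls | grep pg_autoscaler      # pg_autoscaler is an always-on mgr module in recent releases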

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-14 Thread Joe Comeau
That's correct - we use the kernel target, not tcmu-runner. >>> Xiubo Li 12/13/2022 6:02 PM >>> On 14/12/2022 06:54, Joe Comeau wrote: > I am curious about what is happening with your iscsi configuration > Is this a new iscsi config or something that has just cropped up ? > > We are

[ceph-users] User + Dev Monthly Meeting happening tomorrow, December 15th!

2022-12-14 Thread Laura Flores
Hi Ceph Users, The User + Dev Monthly Meeting is coming up tomorrow, Thursday, December 15th @ 3:00pm UTC (time conversions below). See meeting details at the bottom of this email. Please add any topics you'd like to discuss to the agenda:

[ceph-users] Re: CephFS constant high write I/O to the metadata pool

2022-12-14 Thread Olli Rajala
Hi, One thing I now noticed in the mds logs is that there's a ton of entries like this: 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to [d345,d346] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591 694=484+210) 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache
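
For reference, messages at that verbosity only show up with debug_mds at 20; a sketch of how that is typically toggled:

  ceph config set mds debug_mds 20   # very verbose, only for a short debugging window
  ceph config rm mds debug_mds       # drop the override again afterwards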

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-12-14 Thread Jakub Jaszewski
Sure, I tried it in a screen session before but it did not reduce the queue. Eventually I managed to zero the queue by increasing these params: radosgw-admin gc process --include-all --debug-rgw=20 --rgw-gc-max-concurrent-io=20 --rgw-gc-max-trim-chunk=64 --rgw-gc-processor-max-time=7200 I think it was
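
A rough way to watch the queue while gc runs (a sketch; jq is only used for counting):

  radosgw-admin gc list --include-all | jq length   # number of pending gc entries
  radosgw-admin gc process --include-all            # defaults, before adding the tuning flags quoted above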

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-14 Thread Venky Shankar
Hi Sascha, On Tue, Dec 13, 2022 at 6:43 PM Sascha Lucas wrote: > > Hi, > > On Mon, 12 Dec 2022, Sascha Lucas wrote: > > > On Mon, 12 Dec 2022, Gregory Farnum wrote: > > >> Yes, we’d very much like to understand this. What versions of the server > >> and kernel client are you using? What platform

[ceph-users] Re: SLOW_OPS

2022-12-14 Thread Eugen Block
With 12 OSDs and a default of 4 GB RAM per OSD you would require at least 48 GB, usually a little more. Even if you reduced the memory target per OSD it doesn't mean they can deal with the workload. There was a thread explaining that a couple of weeks ago. Quoting Murilo Morais: Good
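
The arithmetic and the knob in question, as a sketch (the value shown is the default, not a recommendation):

  # 12 OSDs x 4 GiB osd_memory_target ~ 48 GiB, plus headroom for OS, MON/MGR daemons and page cache
  ceph config get osd osd_memory_target              # default 4294967296 (4 GiB)
  ceph config set osd osd_memory_target 4294967296   # per-OSD target, not a hard limit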

[ceph-users] Re: Recent ceph.io Performance Blog Posts

2022-12-14 Thread Stefan Kooman
On 11/21/22 10:07, Stefan Kooman wrote: On 11/8/22 21:20, Mark Nelson wrote: 2. https://ceph.io/en/news/blog/2022/qemu-kvm-tuning/ You tested network encryption impact on performance. It would be nice to see how OSD encryption

[ceph-users] SLOW_OPS

2022-12-14 Thread Murilo Morais
Good morning everyone. Guys, today my cluster had a "problem": it was showing SLOW_OPS. Restarting the OSDs that were reporting it solved everything (there were VMs stuck because of this). What I'm racking my brain over is the reason for having SLOW_OPS. In the logs I saw
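
A few commands that usually help narrow down where SLOW_OPS come from (a sketch; run the daemon commands on the host of the affected OSD, <id> is a placeholder):

  ceph health detail                            # names the OSDs currently reporting slow ops
  ceph daemon osd.<id> dump_ops_in_flight       # operations stuck right now
  ceph daemon osd.<id> dump_historic_slow_ops   # recent slow ops kept by the OSD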

[ceph-users] Re: ceph-volume inventory reports available devices as unavailable

2022-12-14 Thread Ralph Soika
Hi, I ran into the same problem. After installing ceph quincy on different servers, some were able to detect the disks, others not. My servers are hosted at hetzner.de and, as I did not find a solution, I kept trying different servers until I found ones where ceph detected the disks
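
If the disks carry leftover partitions or LVM metadata, wiping them is often what makes them show up as available; a sketch assuming a cephadm-managed cluster (destructive, double-check host and device path):

  ceph orch device ls --wide                     # shows the reject reasons per device
  ceph orch device zap <host> /dev/sdX --force   # wipes the device so it can be reused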

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Eugen Block
Then I'd suggest waiting until the backfilling is done and then reporting back if the PGs are still not created. I don't have information about the ML admin, sorry. Quoting Martin Buss: that cephfs_data has been autoscaling while filling, the mismatched numbers are a result of that

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Martin Buss
that cephfs_data has been autoscaling while filling; the mismatched numbers are a result of that autoscaling. The cluster status is WARN as there is still some old stuff backfilling on cephfs_data. The issue is the newly created pool 9 cfs_data, which is stuck at 1152 pg_num. PS: can you

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Eugen Block
I'm wondering why the cephfs_data pool has mismatching pg_num and pgp_num: pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 187 pgp_num 59 autoscale_mode off Does disabling the autoscaler leave it like that when you disable it in the middle of

[ceph-users] Re: ceph-volume inventory reports available devices as unavailable

2022-12-14 Thread Martin Buss
Hi list admins, I accidentally posted my private address, can you please delete that post? https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/JMFG73QMB3MJKHDMNPIKZHQOUUCJPJJN/ Thanks, Martin On 14.12.22 15:18, Eugen Block wrote: Hi, I haven't been dealing with ceph-volume too

[ceph-users] Re: ceph-volume inventory reports available devices as unavailable

2022-12-14 Thread Eugen Block
Hi, I haven't been dealing with ceph-volume too much lately, but I remember seeing that when I have multiple DB devices on SSD and wanted to replace only one failed drive. Although ceph-volume inventory reported the disk as unavailable the actual create command was successful. But I

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Martin Buss
Hi Eugen, thanks, sure, below: pg_num stuck at 1152 and pgp_num stuck at 1024 Regards, Martin ceph config set global mon_max_pg_per_osd 400 ceph osd pool create cfs_data 2048 2048 --pg_num_min 2048 pool 'cfs_data' created pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Eugen Block
Hi, are there already existing pools in the cluster? Can you share your 'ceph osd df tree' as well as 'ceph osd pool ls detail'? It sounds like ceph is trying to stay within the limit of mon_max_pg_per_osd (default 250). Regards, Eugen Quoting Martin Buss: Hi, on quincy, I created
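
Two quick checks that make this visible (a sketch):

  ceph config get mon mon_max_pg_per_osd   # the limit in question (default 250)
  ceph osd pool ls detail | grep pg_num    # pg_num_target shows where the mgr is still heading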

[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-14 Thread Eugen Block
There's an existing tracker issue [1] that hasn't been updated in a year. The OP reported that restarting the other MONs did resolve it; have you tried that? [1] https://tracker.ceph.com/issues/52760 Quoting Mevludin Blazevic: It's very strange. The keyring of the ceph monitor is the
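
In a cephadm-managed cluster, restarting the other MONs one at a time would look roughly like this (<hostname> is a placeholder):

  ceph orch ps | grep '^mon\.'              # list the mon daemons and their hosts
  ceph orch daemon restart mon.<hostname>   # restart one, wait for quorum, then do the next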

[ceph-users] New pool created with 2048 pg_num not executed

2022-12-14 Thread Martin Buss
Hi, on quincy, I created a new pool: ceph osd pool create cfs_data 2048 2048 (6 hosts, 71 osds, autoscaler is off). I find it kind of strange that the pool is created with pg_num 1152 and pgp_num 1024, mentioning the 2048 as the new target. I cannot manage to actually make this pool contain 2048

[ceph-users] Re: Purge OSD does not delete the OSD deamon

2022-12-14 Thread Mevludin Blazevic
Update: It was removed from the dashboard after 6 min. On 14.12.2022 at 12:11, Stefan Kooman wrote: On 12/14/22 11:40, Mevludin Blazevic wrote: Hi, the strange thing is that on 2 different hosts, an OSD daemon with the same ID is present, by doing ls on /var/lib/ceph/FSID, e.g. I am afraid

[ceph-users] Re: Purge OSD does not delete the OSD deamon

2022-12-14 Thread Stefan Kooman
On 12/14/22 11:40, Mevludin Blazevic wrote: Hi, the strange thing is that on 2 different hosts, an OSD daemon with the same ID is present, by doing ls on /var/lib/ceph/FSID, e.g. I am afraid that performing a ceph orch daemon rm will remove both osd daemons, the healthy one and the failed

[ceph-users] Re: Purge OSD does not delete the OSD deamon

2022-12-14 Thread Mevludin Blazevic
Hi, the strange thing is that on 2 different hosts, an OSD daemon with the same ID is present, by doing ls on /var/lib/ceph/FSID, e.g. I am afraid that performing a ceph orch daemon rm will remove both osd daemons, the healthy one and the failed one. On 14.12.2022 at 11:35, Mevludin
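
One way to avoid that would be to remove only the stale daemon directly on its host instead of going through the orchestrator; a sketch assuming cephadm (<host>, <id> and <fsid> are placeholders):

  ceph orch ps <host>                                       # confirm which host runs the healthy osd.<id>
  cephadm rm-daemon --name osd.<id> --fsid <fsid> --force   # run only on the host with the stale directory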

[ceph-users] Purge OSD does not delete the OSD deamon

2022-12-14 Thread Mevludin Blazevic
Hi all, while trying to perform an update from Ceph Pacific to the current patch version, errors occur due to failed OSD daemons which are still present and installed on some Ceph hosts, although I purged the corresponding OSDs using the GUI. I am using a Red Hat environment; what is the safe

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-14 Thread Stolte, Felix
We have been using tgt for five years and switched to ceph-iscsi (LIO Framework) two months ago. We observed a massive performance boost. Can't say though if the performance increase was only related to the different software or if our TGT configuration was not as good as it could have been.