[ceph-users] Re: RGW - user created bucket with name of already created bucket

2024-01-14 Thread Jayanth Reddy
Hello Ondrej,

Does renaming the bucket help? I see the command [1] takes the UID.
How about taking a maintenance window with both of the users and trying to
rename the usernames or bucket names, to see if either helps?

[1] https://www.ibm.com/docs/en/storage-ceph/7?topic=management-renaming-buckets
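
If you go the rename route, the invocation per the linked docs should look
roughly like this (bucket and user names are placeholders):

radosgw-admin bucket link --bucket=<bucket-name> --bucket-new-name=<new-name> --uid=<user-id>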

Regards,
Jayanth


From: Ondřej Kukla 
Sent: Friday, January 12, 2024 4:19:32 PM
To: Jayanth Reddy 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] RGW - user created bucket with name of already 
created bucket

Thanks Jayanth,

I’ve tried this, but unfortunately the unlink fails, as it checks against the
bucket owner ID, which is not the user I’m trying to unlink.

So I’m still stuck here with two users with the same bucket name :(

Ondrej

On 24. 12. 2023, at 17:14, Jayanth Reddy  wrote:

Hi Ondřej,
I've not tried it myself, but see if you can use the radosgw-admin bucket
unlink command [1] to achieve it. It is strange that the user was somehow able
to create a bucket with the same name. We've also got v17.2.6 and have not
encountered this so far. Maybe the RGW devs can answer this.

[1] https://docs.ceph.com/en/quincy/man/8/radosgw-admin/#commands
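
Untested, but the unlink invocation would look something like this (bucket
and user are placeholders):

radosgw-admin bucket unlink --bucket=<bucket-name> --uid=<user-id>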

Thanks,
Jayanth

On Fri, Dec 22, 2023 at 7:29 PM Ondřej Kukla <ond...@kuuk.la> wrote:
Hello,

I would like to share a quite worrying experience I’ve just found on one of my 
production clusters.

A user successfully created a bucket with the name of a bucket that already exists!

He is not the bucket owner - the original user is - but he is able to see it
when he does ListBuckets over the S3 API. (Both accounts are able to see it
now; only the original owner is able to interact with it.)

This bucket is also counted towards the new user’s usage stats.

Has anyone noticed this before? This cluster is running on Quincy - 17.2.6.

Is there a way to detach the bucket from the new owner so he doesn’t have a 
bucket that doesn’t belong to him?
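
For completeness, the actual owner and each user's bucket list can be
inspected with standard radosgw-admin commands; a sketch with placeholder
names:

radosgw-admin bucket stats --bucket=<bucket-name>   # the "owner" field shows the real owner
radosgw-admin bucket list --uid=<user-id>           # buckets attributed to a given user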

Regards,

Ondrej




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-14 Thread Anthony D'Atri

Agreed, though today either form factor limits one’s choice of manufacturers.

> There are models to fit that, but if you're also considering new drives,
> you can get further density in E1/E3.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-14 Thread Robin H. Johnson
On Fri, Jan 12, 2024 at 02:32:12PM +, Drew Weaver wrote:
> Hello,
> 
> So we were going to replace a Ceph cluster with some hardware we had
> lying around using SATA HBAs, but I was told that the only right way
> to build Ceph in 2023 is with direct-attach NVMe.
> 
> Does anyone have any recommendation for a 1U barebones server (we just
> drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are directly
> attached to the motherboard without a bridge or HBA, for Ceph
> specifically?
If you're buying new, Supermicro would be my first choice for vendor
based on experience.
https://www.supermicro.com/en/products/nvme

You said 2.5" bays, which makes me think you have existing drives.
There are models to fit that, but if you're also considering new drives,
you can get further density in E1/E3.

The only caveat is that you will absolutely want to put a better NIC in
these systems, because 2x10G is easy to saturate with a pile of NVMe.
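
As a rough back-of-the-envelope check (assuming ~3 GB/s of sequential
throughput per NVMe drive, a conservative figure for PCIe gen3 x4):

  1 drive:  3 GB/s x 8 bits/byte = 24 Gb/s   (already above a 2x10G bond)
  8 drives: 8 x 24 Gb/s          = 192 Gb/s

so 2x25G is a bare minimum and 2x100G is a more comfortable match.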

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-14 Thread Christian Wuerdig
I could be wrong, but as far as I can see you have 9 chunks, which requires
9 failure domains.
Your failure domain is set to datacenter, of which you only have 3, so that
won't work.

You need to set your failure domain to host and then create a crush rule
that chooses 3 datacenters and then 3 hosts within each DC.
Something like this should work:
step choose indep 3 type datacenter
step chooseleaf indep 3 type host
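
For reference, those two steps would slot into a complete rule roughly like
this (a sketch; the rule name and id are placeholders, modelled on the rule
discussed below):

rule ec_3dc_hdd {
    id 7
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class hdd
    step choose indep 3 type datacenter
    step chooseleaf indep 3 type host
    step emit
}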

On Fri, 12 Jan 2024 at 20:58, Torkil Svensgaard  wrote:

> We are looking to create a 3 datacenter 4+5 erasure coded pool but can't
> quite get it to work. Ceph version 17.2.7. These are the hosts (there
> will eventually be 6 hdd hosts in each datacenter):
>
> -33  886.00842  datacenter 714
>   -7  209.93135      host ceph-hdd1
> -69   69.86389      host ceph-flash1
>   -6  188.09579      host ceph-hdd2
>   -3  233.57649      host ceph-hdd3
> -12  184.54091      host ceph-hdd4
> -34  824.47168  datacenter DCN
> -73   69.86389      host ceph-flash2
>   -2  201.78067      host ceph-hdd5
> -81  288.26501      host ceph-hdd6
> -31  264.56207      host ceph-hdd7
> -36 1284.48621  datacenter TBA
> -77   69.86389      host ceph-flash3
> -21  190.83224      host ceph-hdd8
> -29  199.08838      host ceph-hdd9
> -11  193.85382      host ceph-hdd10
>   -9  237.28154      host ceph-hdd11
> -26  187.19536      host ceph-hdd12
>   -4  206.37102      host ceph-hdd13
>
> We did this:
>
> ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd
> plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default
> crush-failure-domain=datacenter crush-device-class=hdd
>
> ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd
> ceph osd pool set cephfs.hdd.data allow_ec_overwrites true
> ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn
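>
> For reference, the resulting profile and pool settings can be inspected
> with standard query commands; a sketch:
>
> ceph osd erasure-code-profile get DRCMR_k4m5_datacenter_hdd
> ceph osd pool get cephfs.hdd.data crush_rule
> ceph osd pool get cephfs.hdd.data min_size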
>
> Didn't quite work:
>
> "
> [WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg
> incomplete
>  pg 33.0 is creating+incomplete, acting
> [104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool
> cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for
> 'incomplete')
> "
>
> I then manually changed the crush rule from this:
>
> "
> rule cephfs.hdd.data {
>  id 7
>  type erasure
>  step set_chooseleaf_tries 5
>  step set_choose_tries 100
>  step take default class hdd
>  step chooseleaf indep 0 type datacenter
>  step emit
> }
> "
>
> To this:
>
> "
> rule cephfs.hdd.data {
>  id 7
>  type erasure
>  step set_chooseleaf_tries 5
>  step set_choose_tries 100
>  step take default class hdd
>  step choose indep 0 type datacenter
>  step chooseleaf indep 3 type host
>  step emit
> }
> "
>
> This was based on some testing and dialogue I had with Red Hat support
> last year when we were on RHCS, and it seemed to work. Then:
>
> ceph fs add_data_pool cephfs cephfs.hdd.data
> ceph fs subvolumegroup create cephfs hdd --pool_layout cephfs.hdd.data
>
> I started copying data to the subvolume and increased pg_num a couple of
> times:
>
> ceph osd pool set cephfs.hdd.data pg_num 256
> ceph osd pool set cephfs.hdd.data pg_num 2048
>
> But at some point it failed to activate new PGs, eventually leading to this:
>
> "
> [WARN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
>  mds.cephfs.ceph-flash1.agdajf(mds.0): 64 slow metadata IOs are
> blocked > 30 secs, oldest blocked for 25455 secs
> [WARN] MDS_TRIM: 1 MDSs behind on trimming
>  mds.cephfs.ceph-flash1.agdajf(mds.0): Behind on trimming
> (997/128) max_segments: 128, num_segments: 997
> [WARN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive
>  pg 33.6f6 is stuck inactive for 8h, current state
> activating+remapped, last acting [50,79,116,299,98,219,164,124,421]
>  pg 33.6fa is stuck inactive for 11h, current state
> activating+undersized+degraded+remapped, last acting
> [17,408,NONE,196,223,290,73,39,11]
>  pg 33.705 is stuck inactive for 11h, current state
> activating+undersized+degraded+remapped, last acting
> [33,273,71,NONE,411,96,28,7,161]
>  pg 33.721 is stuck inactive for 7h, current state
> activating+remapped, last acting [283,150,209,423,103,325,118,142,87]
>  pg 33.726 is stuck inactive for 11h, current state
> activating+undersized+degraded+remapped, last acting
> [234,NONE,416,121,54,141,277,265,19]
> [WARN] PG_DEGRADED: Degraded data redundancy: 1818/1282640036 objects
> degraded (0.000%), 3 pgs degraded, 3 pgs undersized
>  pg 33.6fa is stuck undersized for 7h, current state
> activating+undersized+degraded+remapped, last acting
> [17,408,NONE,196,223,290,73,39,11]
>  pg 33.705 is stuck undersized for 7h, current state
> activating+undersized+deg