[ceph-users] Re: Failed in ceph-osd -i ${osd_id} --mkfs -k /var/lib/ceph/osd/ceph-${osd_id}/keyring

2022-03-08 Thread huxia...@horebdata.cn
Just to report back the root cause of the above-mentioned failures in "ceph-osd -i ${osd_id} --mkfs -k /var/lib/ceph/osd/ceph-${osd_id}/keyring". It turns out the culprit was using Samsung SM883 SSD disks as DB/WAL partitions. Replacing the SM883 with Intel S4510/4520 SSDs solved the issues. It
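(For context, a minimal sketch of how a DB/WAL device is attached at OSD creation time; the device paths below are illustrative and not taken from the original report:)

    # create a BlueStore OSD with its RocksDB/WAL placed on a separate SSD/NVMe partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1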

[ceph-users] Re: "Incomplete" pg's

2022-03-08 Thread Kyriazis, George
Ok, some progress… I’m describing what I did here, hoping it will help someone who ends up in the same predicament. I used "ceph-objectstore-tool … --op mark-complete" to mark the incomplete pgs as complete on the primary OSD, and then brought the OSD up. The incomplete pg now has a
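(A minimal sketch of that last-resort procedure, with a hypothetical OSD id and PG id; the OSD must be stopped first, and marking a PG complete can discard data, so it is only for PGs that cannot recover any other way:)

    systemctl stop ceph-osd@12                      # stop the primary OSD of the incomplete PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.1a --op mark-complete              # mark the PG complete on that OSD
    systemctl start ceph-osd@12                     # bring the OSD back up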

[ceph-users] Re: RGW STS AssumeRoleWithWebIdentity Multi-Tenancy

2022-03-08 Thread Pritha Srivastava
Alternatively, if you want to restrict access to s3 resources for different groups of users, you can do so by creating a role in a tenant, creating the s3 resources, attaching tags to them, and then using ABAC/tags to allow a user to access a particular resource (bucket/object). Details
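(A rough sketch of the tagging half of that approach; the endpoint, bucket name, tag key and the s3:ResourceTag condition key are assumptions for illustration, not taken from the thread:)

    # attach a tag to an existing bucket so a role's permission policy can match on it
    # (e.g. via an s3:ResourceTag/Department condition) instead of naming buckets explicitly
    aws --endpoint-url http://rgw.example.com:8000 s3api put-bucket-tagging \
        --bucket buck01 --tagging 'TagSet=[{Key=Department,Value=Engineering}]'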

[ceph-users] Re: RGW STS AssumeRoleWithWebIdentity Multi-Tenancy

2022-03-08 Thread Pritha Srivastava
Hi Mark, On Wed, Mar 9, 2022 at 6:57 AM Mark Selby wrote: > I am not sure that what I would like to do is even possible. I was hoping > there is someone out there who could chime in on this. > > > > We use Ceph RBD and Ceph FS somewhat extensively and are starting on our > RGW journey. > > > >

[ceph-users] aws-cli with RGW and cross tenant access

2022-03-08 Thread Mark Selby
We are starting to test out Ceph RGW and have run into a small issue with the aws-cli that amazon publishes. We have a set of developers who use the aws-cli heavily and it seems that this tool does not work with Ceph RGW tenancy. Given user = test01$test01 with bucket buck01 Given user =
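(One commonly suggested workaround, sketched with made-up endpoint and profile names and not verified against the reporter's setup: address the other tenant's bucket as tenant:bucket through the lower-level s3api commands, and relax the CLI's client-side bucket-name validation if it rejects the colon:)

    aws --profile test02 --endpoint-url http://rgw.example.com:8000 \
        s3api list-objects --bucket 'test01:buck01'
    # if the CLI refuses the ':' in the bucket name during client-side validation,
    # ~/.aws/config can carry for that profile:
    #   [profile test02]
    #   parameter_validation = false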

[ceph-users] RGW STS AssumeRoleWithWebIdentity Multi-Tenancy

2022-03-08 Thread Mark Selby
I am not sure that what I would like to do is even possible. I was hoping there is someone out there who could chime in on this. We use Ceph RBD and Ceph FS somewhat extensively and are starting on our RGW journey. We have a couple of different groups that would like to be their own

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-03-08 Thread Gaël THEROND
Unexpectedly, everything disappeared and the cluster health went back to its previous state! I think I’ll never have a definitive answer ^^ I’ve also found a really nice way to get the rbd stats/iotop into our prometheus using the mgr plugin, and it’s awesome as we can now better
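(For reference, a minimal sketch of enabling per-image RBD stats in the prometheus mgr module; the pool names are placeholders:)

    ceph mgr module enable prometheus
    # export per-RBD-image I/O metrics for the listed pools
    ceph config set mgr mgr/prometheus/rbd_stats_pools "vm-pool-2tb,vm-pool-4tb"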

[ceph-users] Re: "Incomplete" pg's

2022-03-08 Thread Kyriazis, George
Thanks Eugen, Yeah, unfortunately the OSDs have been replaced with new OSDs. Currently the cluster is rebalancing. I was thinking that I would try the 'osd_find_best_info_ignore_history_les' trick after the cluster has calmed down and there is no extra traffic on the OSDs. Thing is
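(A sketch of how that trick is usually applied, with a hypothetical OSD id; it tells the primary to ignore last-epoch-started history when choosing an authoritative log, can lose recent writes, and should be reverted right after the PG peers:)

    ceph config set osd.12 osd_find_best_info_ignore_history_les true
    systemctl restart ceph-osd@12        # restart the primary of the incomplete PG
    # once the PG has peered, revert the flag
    ceph config set osd.12 osd_find_best_info_ignore_history_les false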

[ceph-users] Re: Ceph Pacific 16.2.7 dashboard doesn't work with Safari

2022-03-08 Thread Ernesto Puerta
Hi, This was already fixed in master/quincy, but the pacific backport was never completed (https://github.com/ceph/ceph/pull/45301). I just did that: https://github.com/ceph/ceph/pull/45301 (it should be there for 16.2.8). Kind Regards, Ernesto On Tue, Mar 8, 2022 at 3:55 PM Jozef Rebjak

[ceph-users] Re: Cephadm is stable or not in product?

2022-03-08 Thread David Orman
We use it without major issues, at this point. There are still flaws, but there are flaws in almost any deployment and management system, and this is not unique to cephadm. I agree with the general sentiment that you need to have some knowledge about containers, however. I don't think that's

[ceph-users] Re: 3 node CEPH PVE hyper-converged cluster serious fragmentation and performance loss in matter of days.

2022-03-08 Thread Sasa Glumac
Rados bench before deleting OSD's and recreating them + syncing, with fragmentation 0.89 > T1 - wr,4M > Total time run 60.0405 > Total writes made 9997 > Write size 4194304 > Object size 4194304 > Bandwidth (MB/sec) 666.017 > Stddev Bandwidth 24.1108 >
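(For readers wanting to reproduce these numbers, the three tests correspond to the usual rados bench invocations; the pool name and durations are placeholders:)

    rados bench -p testpool 60 write -b 4M --no-cleanup   # T1: 4M writes
    rados bench -p testpool 60 seq                        # T2: sequential reads
    rados bench -p testpool 60 rand                       # T3: random reads
    rados -p testpool cleanup                             # remove the benchmark objects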

[ceph-users] 3 node CEPH PVE hyper-converged cluster serious fragmentation and performance loss in matter of days.

2022-03-08 Thread Sasa Glumac
> Where is the rados bench before and after your problem? Rados bench before deleting OSD's and recreating them + syncing, with fragmentation 0.89 (columns: T1 = wr,4M; T2 = ro,seq,4M; T3 = ro,rand,4M) > Total time run 60.0405 / 250.486 / 600.463 > Total writes made

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Francois Legrand
Hi, The last 2 OSDs I recreated were on December 30 and February 8. I totally agree that SSD caches are a terrible SPOF. I think that's an option if you use 1 SSD/NVMe for 1 or 2 OSDs, but the cost is then very high. Using 1 SSD for 10 OSDs increases the risk for almost no gain because the SSD is

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Boris Behrens
Hi Francois, thanks for the reminder. We offline compacted all of the OSDs when we reinstalled the hosts with the new OS. But actually reinstalling them was never on my list. I could try that, and in the same go I could remove all the cache SSDs (when one SSD shares the cache for 10 OSDs this is a
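(For reference, a sketch of the two usual ways to compact an OSD's RocksDB, with a hypothetical OSD id; the offline variant requires the OSD to be stopped:)

    ceph tell osd.42 compact                                          # online compaction
    systemctl stop ceph-osd@42
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-42 compact  # offline compaction
    systemctl start ceph-osd@42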

[ceph-users] Re: Cephadm is stable or not in product?

2022-03-08 Thread Marc
> > Can't imagine there is no reason. Anyway I think there is a general > misconception that using containers would make it easier for users. > > ceph = learn linux sysadmin + learn ceph > cephadm = learn linux sysadmin + learn ceph + learn containers > Oh forgot ;) croit ceph = learn

[ceph-users] Re: Cephadm is stable or not in product?

2022-03-08 Thread Marc
> > We have an old ceph cluster, which is running fine without any problems > with cephadm and pacific (16.2.7) on Ubuntu (which was deployed without > using cephadm). > > Now, I am trying to setup one more cluster on CentOS Stream 8 with > cephadm, containers are killed or stopped for no

[ceph-users] Re: *****SPAM***** 3 node CEPH PVE hyper-converged cluster serious fragmentation and performance loss in matter of days.

2022-03-08 Thread Marc
> > VM don't do many writes and i migrated main testing VM's to 2TB pool which > in turns fragments faster. > > > Did a lot of tests and recreated pools and OSD's in many ways but in a > matter of days every time each OSD's gets severely fragmented and loses up > to 80% of write performance

[ceph-users] Re: Cephadm is stable or not in product?

2022-03-08 Thread Jay See
We have an old ceph cluster, which is running fine without any problems with cephadm and pacific (16.2.7) on Ubuntu (which was deployed without using cephadm). Now I am trying to set up one more cluster on CentOS Stream 8 with cephadm, but containers are killed or stopped for no reason. On Tue, Mar

[ceph-users] Re: Ceph Pacific 16.2.7 dashboard doesn't work with Safari

2022-03-08 Thread Ulrich Klein
Replying to myself :) It seems to be this function: replaceBraces(e) { ==> return e.replace(/(?<=\d)\s*-\s*(?=\d)/g, ".."). replace(/\(/g, "{"). replace(/\)/g, "}").

[ceph-users] 3 node CEPH PVE hyper-converged cluster serious fragmentation and performance loss in matter of days.

2022-03-08 Thread Sasa Glumac
Proxmox = 6.4-8 CEPH = 15.2.15 Nodes = 3 Network = 2x100G / node Disk = nvme Samsung PM-1733 MZWLJ3T8HBLS 4TB nvme Samsung PM-1733 MZWLJ1T9HBJR 2TB CPU = EPYC 7252 CEPH pools = 2 separate pools for each disk type, with each disk split into 2 OSDs Replica = 3 VM don't do many

[ceph-users] Ceph Pacific 16.2.7 dashboard doesn't work with Safari

2022-03-08 Thread Ulrich Klein
Hi, I just upgraded a small test cluster on Raspberries from pacific 16.2.6 to 16.2.7. The upgrade went without major problems. But now the Ceph Dashboard doesn't work anymore in Safari. It complains about main..js "Line 3 invalid regular expression: invalid group specifier name". It works

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Francois Legrand
Hi, We also had this kind of problem after upgrading to octopus. Maybe you can play with the heartbeat grace time ( https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/ ) to tell osds to wait a little longer before declaring another osd down! We also tried to fix the problem
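(For example, a sketch of raising the grace period cluster-wide; 30 seconds is an arbitrary illustrative value, the default being 20:)

    # give OSDs more time before a missed heartbeat gets a peer reported down
    ceph config set global osd_heartbeat_grace 30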

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Boris Behrens
Yes, this is something we know, and we disabled it because we ran into the problem that PGs went unavailable when two or more OSDs went offline. I am searching for the reason WHY this happens. Currently we have set the service file to Restart=always and removed the StartLimitBurst from the

[ceph-users] Re: Cephadm is stable or not in product?

2022-03-08 Thread Zakhar Kirpichenko
Hi! I run a cephadm-based 16.2.x cluster in production. It's been mostly fine, but not without quirks. Hope this helps. /Z On Tue, Mar 8, 2022 at 6:17 AM norman.kern wrote: > Dear Ceph folks, > > Anyone is using cephadm in product(Version: Pacific)? I found several bugs > on it and > I really

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Dan van der Ster
Here's the reason they exit: "7f1605dc9700 -1 osd.97 486896 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.00 seconds, shutting down". If an osd flaps (marked down, then up) 6 times in 10 minutes, it exits. (This is a safety measure.) It's normally caused by a network
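(The two knobs behind that message, shown only for illustration; raising them papers over the flapping rather than fixing its cause, which is usually the network:)

    # defaults are 5 markdowns within 600 seconds; e.g. to tolerate more flaps:
    ceph config set osd osd_max_markdown_count 10
    ceph config set osd osd_max_markdown_period 600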