[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Anthony D'Atri
Any chance you ran `rados bench` but didn’t fully clean up afterward?

> On Apr 3, 2023, at 9:25 PM, Work Ceph wrote:
>
> Hello guys!
>
> We noticed an unexpected situation. In a recently deployed Ceph cluster we
> are seeing raw usage that is a bit odd. We have the following setup:
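
For reference, a minimal way to check for and remove leftover benchmark objects, assuming the pool in question is named `rbd` (substitute your own pool name):

```
# Objects left behind by `rados bench ... --no-cleanup` carry a
# benchmark_data prefix; list a few to confirm they exist
rados -p rbd ls | grep benchmark_data | head

# Remove the leftover benchmark objects from that pool
rados -p rbd cleanup
```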

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Work Ceph
To add more information, in case that helps:

```
# ceph -s
  cluster:
    id:
    health: HEALTH_OK

  task status:

  data:
    pools:   6 pools, 161 pgs
    objects: 223 objects, 7.0 KiB
    usage:   9.3 TiB used, 364 TiB / 373 TiB avail
    pgs:     161 active+clean

# ceph df
---
```
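
A quick way to see where that raw usage actually sits, independent of pool data, is the per-OSD breakdown; both commands below are standard tooling:

```
# Per-OSD raw usage, CRUSH weight and PG count; a large USE with near-zero
# DATA points at allocation or DB/WAL overhead rather than client objects
ceph osd df tree

# Pool-level view of stored vs. raw usage
ceph df detail
```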

[ceph-users] Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Work Ceph
Hello guys! We noticed an unexpected situation. In a recently deployed Ceph cluster we are seeing raw usage that is a bit odd. We have a new cluster with 5 nodes with the following setup:

- 128 GB of RAM
- 2 Intel(R) Xeon Silver 4210R CPUs
- 1

[ceph-users] Read and write performance on distributed filesystem

2023-04-03 Thread David Cunningham
Hello, We are considering CephFS as an alternative to GlusterFS, and have some questions about performance. Is anyone able to advise us please? This would be for file systems between 100GB and 2TB in size, average file size around 5MB, and a mixture of reads and writes. I may not be using the
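
One way to get comparable read/write numbers out of both filesystems is a small fio run against a mounted directory; the mount point, sizes and read/write mix below are only placeholders to adapt to the workload described above:

```
# Mixed random read/write job against a mounted CephFS (or GlusterFS) path;
# adjust --directory, --size, --bs and --rwmixread to match the real workload
fio --name=mixed --directory=/mnt/cephfs/fio-test \
    --rw=randrw --rwmixread=70 --bs=128k --size=1G \
    --numjobs=4 --ioengine=libaio --direct=1 --group_reporting
```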

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-04-03 Thread Yuri Weinstein
Josh, the release is ready for your review and approval. Adam, can you please update the LRC upgrade to 17.2.6 RC? Thx

On Wed, Mar 29, 2023 at 3:07 PM Yuri Weinstein wrote:
> The release has been approved.
>
> And the gibba cluster upgraded.
>
> We are awaiting the LRC upgrade and then/or in

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Anthony D'Atri
Mark Nelson's space amp sheet visualizes this really well. A nuance here is that Ceph always writes a full stripe, so with a 9,6 profile, on conventional media, a minimum of 15x4KB=60KB of underlying storage will be consumed, even for a 1KB object. A 22 KB object would similarly tie up ~18KB of
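
The allocation unit behind that arithmetic can be checked directly; osd.0 below is just an example, and the HDD/SSD defaults are 4 KiB on recent releases (the on-disk value is fixed when an OSD is created):

```
# BlueStore minimum allocation size used for new HDD and SSD OSDs
ceph config get osd.0 bluestore_min_alloc_size_hdd
ceph config get osd.0 bluestore_min_alloc_size_ssd

# With a 9+6 profile every object touches 15 shards, so even a 1 KB object
# consumes at least 15 * 4 KiB of raw space
echo "$((15 * 4)) KiB"
```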

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Michel Jouvin
Hi Frank, Thanks for this detailed answer. About your point of 4+2 or similar schemes defeating the purpose of a 3-datacenter configuration, you're right in principle. In our case, the goal is to avoid any impact for replicated pools (in particular RBD for the cloud) but it may be acceptable

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Frank Schilder
Hi Michel, failure domain = datacenter doesn't work, because crush wants to put 1 shard per failure domain and you have 3 data centers and not 6. The modified crush rule you wrote should work. I believe equally well with x=0 or 2 -- but try it out before doing anything to your cluster. The
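
For reference, a sketch of such a modified rule, with a made-up name and id, choosing 3 datacenters and then 2 OSDs on distinct hosts in each (the `indep` counts are exactly the x values being discussed):

```
rule ec42_three_dc {
    id 99
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 3 type datacenter
    step chooseleaf indep 2 type host
    step emit
}
```

One way to try it out without touching the cluster, as suggested above, is `crushtool -i <compiled map> --test --rule 99 --num-rep 6 --show-mappings`.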

[ceph-users] Re: Misplaced objects greater than 100%

2023-04-03 Thread Johan Hattne
Thanks Mehmet; I took a closer look at what I sent you and the problem appears to be in the CRUSH map. At some point since anything was last rebooted, I created rack buckets and moved the OSD nodes in under them:

# ceph osd crush add-bucket rack-0 rack
# ceph osd crush add-bucket rack-1
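
For completeness, this is the kind of command sequence involved; the host name and root below are hypothetical:

```
# Create the rack buckets
ceph osd crush add-bucket rack-0 rack
ceph osd crush add-bucket rack-1 rack

# Move an OSD host under a rack, and the rack under the root; racks left
# outside the default root are not seen by rules that start at it
ceph osd crush move node-0 rack=rack-0
ceph osd crush move rack-0 root=default
```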

[ceph-users] Re: compiling Nautilus for el9

2023-04-03 Thread Marc
I am building with a CentOS 9 Stream container currently. I have been adding some rpms that were missing and not in the dependencies. Currently, with these cmake options, these binaries are not built. Does anyone have an idea what this could be?

cmake .. -DCMAKE_INSTALL_PREFIX=/usr
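
For context, a rough sketch of that kind of container build; the source path and package list are illustrative only, and the Nautilus tree will likely need additional build dependencies on el9:

```
# Start a CentOS Stream 9 build container with the Nautilus checkout mounted
podman run -it --rm -v /path/to/ceph:/ceph quay.io/centos/centos:stream9 bash

# Inside the container: a basic toolchain, then configure out of tree
dnf install -y cmake gcc-c++ make git python3-devel rpm-build
cd /ceph && mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
```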

[ceph-users] Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Michel Jouvin
Hi, We have a 3-site Ceph cluster and would like to create a 4+2 EC pool with 2 chunks per datacenter, to maximise the resilience in case of 1 datacenter being down. I have not found a way to create an EC profile with this 2-level allocation strategy. I created an EC profile with a failure
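
For reference, the straightforward profile creation looks like this (the profile name is arbitrary); with only 3 datacenters a plain `crush-failure-domain=datacenter` cannot place 6 shards, which is what leads to the modified rule discussed elsewhere in the thread:

```
# 4+2 profile; with failure domain = datacenter CRUSH wants 6 separate
# datacenters, so a 3-site cluster needs a custom rule instead
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=datacenter
ceph osd erasure-code-profile get ec42
```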

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Sridhar Seshasayee
Responses inline.

> I have a last question. Why is the bench performed using writes of 4 KiB?
> Is there any reason to choose that over another value?

Yes, the mClock scheduler considers this as a baseline in order to estimate costs for operations involving other block sizes. This is again an
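
The baseline measurement being discussed can be reproduced by hand; osd.0 is just an example:

```
# OSD bench with 4 KiB writes (~12 MB total), the same block size mClock
# uses for its capacity baseline
ceph tell osd.0 bench 12288000 4096 4194304 100

# The capacity value derived from that baseline for an HDD OSD
ceph config show osd.0 osd_mclock_max_capacity_iops_hdd
```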

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Luis Domingues
Hi, Thanks a lot for the information. I have a last question. Why is the bench performed using writes of 4 KiB? Is there any reason to choose that over another value? In my lab, I tested with various values, and I have mainly two types of disks, some Seagates and Toshibas. If I do bench with

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Sridhar Seshasayee
> Why was it done that way? I do not understand the reason for distributing
> the IOPS across different disks, when the measurement we have is for one
> disk alone. This means with default parameters we will always be far from
> reaching the OSD limit, right?

It's not on different disks. We

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Luis Domingues
Hi Sridhar, Thanks for the information.

> The above values are a result of distributing the IOPS across all the OSD
> shards as defined by the osd_op_num_shards_[hdd|ssd] option. For HDDs, this
> is set to 5 and therefore the IOPS will be distributed across the 5 shards
> (i.e. for e.g.,
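
A small sketch of that division, assuming the defaults quoted above (the option names are real, the example IOPS figure is illustrative):

```
# Number of op shards per HDD OSD (default 5) and the measured capacity
ceph config get osd.0 osd_op_num_shards_hdd
ceph config show osd.0 osd_mclock_max_capacity_iops_hdd

# e.g. a measured 315 IOPS spread over 5 shards leaves 63 IOPS per shard
echo $((315 / 5))
```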

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Sridhar Seshasayee
Hi Luis,

> I am reading some documentation about mClock and have two questions.
>
> First, about the IOPS. Are those IOPS disk IOPS or another kind of IOPS? And
> what are the assumptions for those? (Like block size, sequential or random
> reads/writes)?

This is the result of testing running

[ceph-users] Re: Eccessive occupation of small OSDs

2023-04-03 Thread Nicola Mori
Hi Christian, I understand what you say, but in my understanding a small capacity OSD should be properly weighted so that fewer PGs are allocated on it and then the bulk of the data should reside on other, bigger OSDs. In my specific case I also have more hosts (10) than shards (8), so a
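
To check whether the weighting is actually doing that, the per-OSD weight, PG count and utilisation can be compared directly; osd.12 below is only an example:

```
# WEIGHT, %USE and PGS per OSD show whether the small OSD really ends up
# with proportionally fewer PGs
ceph osd df tree

# If needed, lower the CRUSH weight of a small OSD by hand
ceph osd crush reweight osd.12 0.5
```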

[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-04-03 Thread Konstantin Shalygin
Hi,

> On 2 Apr 2023, at 23:14, Matthias Ferdinand wrote:
>
> I understand that grafana graphs are generated from prometheus metrics.
> I just wanted to know which OSD daemon-perf values feed these prometheus
> metrics (or if they are generated in some other way).

Yep, these perf metrics are
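
For reference, the corresponding per-OSD latency metrics exposed by the mgr prometheus module can be inspected directly; the mgr host below is a placeholder and 9283 is the module's default port:

```
# apply/commit latency per OSD as exported by the ceph-mgr prometheus module
curl -s http://mgr-host:9283/metrics | grep -E 'ceph_osd_(apply|commit)_latency'
```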