[ceph-users] Re: Performance improvement suggestion

2024-02-21 Thread Peter Grandi
> 1. Write object A from client. > 2. Fsync to primary device completes. > 3. Ack to client. > 4. Writes sent to replicas. [...] As mentioned in the discussion, this proposal is the opposite of the current policy, which is to wait for all replicas to be written before writes are
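
A minimal Python sketch (my own illustration, not Ceph code) contrasting the two acknowledgement orderings being compared; the OSD class and the ack callback are hypothetical stand-ins:

    # Hypothetical sketch of the two acknowledgement orderings discussed
    # above; OSD and the ack callback are stand-ins, not Ceph internals.

    class OSD:
        def __init__(self, name):
            self.name = name
        def write(self, obj):
            print(f"{self.name}: write {obj}")
        def fsync(self):
            print(f"{self.name}: fsync")

    def ack_after_primary(obj, primary, replicas, ack):
        # Proposed ordering: client is acknowledged once the primary is
        # durable; replica writes follow afterwards, so losing the primary
        # at that point can lose an already-acknowledged write.
        primary.write(obj)
        primary.fsync()
        ack()
        for r in replicas:
            r.write(obj)
            r.fsync()

    def ack_after_all_replicas(obj, primary, replicas, ack):
        # Current policy: acknowledge only after every replica has the write,
        # trading client-visible latency for durability of acknowledged data.
        for osd in (primary, *replicas):
            osd.write(obj)
            osd.fsync()
        ack()

    osds = [OSD("osd.0"), OSD("osd.1"), OSD("osd.2")]
    ack_after_all_replicas("object-A", osds[0], osds[1:], lambda: print("ack to client"))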

[ceph-users] Re: Scrubbing?

2024-01-24 Thread Peter Grandi
> [...] After a few days, I have on our OSD nodes around 90MB/s > read and 70MB/s write while 'ceph -s' have client io as > 2,5MB/s read and 50MB/s write. [...] This is one of my pet-peeves: that a storage system must have capacity (principally IOPS) to handle both a maintenance workload and a
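
For concreteness, a small Python back-of-envelope using the figures quoted above; treating node bandwidth minus client bandwidth as the maintenance share is my assumption:

    # How much of the node-level bandwidth is maintenance (scrub/recovery)
    # rather than client I/O. Numbers are the poster's; the split is assumed.
    node_read, node_write = 90.0, 70.0        # MB/s seen at the OSD nodes
    client_read, client_write = 2.5, 50.0     # MB/s reported by 'ceph -s'
    maint_read = node_read - client_read
    maint_write = node_write - client_write
    print(f"maintenance read  ~{maint_read:.1f} MB/s ({maint_read / node_read:.0%} of node reads)")
    print(f"maintenance write ~{maint_write:.1f} MB/s ({maint_write / node_write:.0%} of node writes)")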

[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-15 Thread Peter Grandi
>> So we were going to replace a Ceph cluster with some hardware we had >> laying around using SATA HBAs but I was told that the only right way >> to build Ceph in 2023 is with direct attach NVMe. My impressions are somewhat different: * Nowadays it is rather more difficult to find 2.5in SAS or

[ceph-users] Re: CephFS mirror very slow (maybe for small files?)

2023-11-13 Thread Peter Grandi
> the speed of data transfer is varying a lot over time (200KB/s > – 120MB/s). [...] The FS in question, has a lot of small files > in it and I suspect this is the cause of the variability – ie, > the transfer of many small files will be more impacted by > greater site-site latency. 200KB/s on
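
A rough Python model of why per-file round-trip latency dominates small-file transfers; the 50 ms RTT and ~1 Gb/s link below are illustrative assumptions, not measurements from the thread:

    # If each file costs at least one round trip, the effective rate is
    # roughly size / (rtt + size / link_bandwidth).
    def effective_rate(file_size_bytes, rtt_s, link_bytes_per_s):
        return file_size_bytes / (rtt_s + file_size_bytes / link_bytes_per_s)

    rtt = 0.05                 # assumed 50 ms site-to-site round trip
    link = 125_000_000         # assumed ~1 Gb/s link
    for size in (16 * 1024, 1 * 1024**2, 1 * 1024**3):   # 16 KiB, 1 MiB, 1 GiB
        print(f"{size:>12} B -> {effective_rate(size, rtt, link) / 1e6:6.2f} MB/s")

With those assumptions the per-file rate spans roughly 0.3 MB/s to 124 MB/s, the same kind of spread as the 200KB/s to 120MB/s reported above.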

[ceph-users] Re: CEPH Cluster performance review

2023-11-12 Thread Peter Grandi
>>> during scrubbing, OSD latency spikes to 300-600 ms, >> I have seen Ceph clusters spike to several seconds per IO >> operation as they were designed for the same goals. >>> resulting in sluggish performance for all VMs. Additionally, >>> some OSDs fail during the scrubbing process. >> Most

[ceph-users] Re: CEPH Cluster performance review

2023-11-12 Thread Peter Grandi
> during scrubbing, OSD latency spikes to 300-600 ms, I have seen Ceph clusters spike to several seconds per IO operation as they were designed for the same goals. > resulting in sluggish performance for all VMs. Additionally, > some OSDs fail during the scrubbing process. Most likely they time

[ceph-users] Re: How do you handle large Ceph object storage cluster?

2023-10-19 Thread Peter Grandi
> [...] (>10k OSDs, >60 PB of data). 6TB on average per OSD? Hopefully SSDs or RAID10 (or low member count, 3-5 disk) RAID5. > It is entirely dedicated to object storage with S3 interface. > Maintenance and its extension are getting more and more > problematic and time consuming. Ah the joys of a single
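
The 6TB average is just the quoted totals divided (Python, data only, ignoring replication overhead):

    data_tb = 60_000      # >60 PB of data, using 1 PB = 1000 TB
    osds = 10_000         # >10k OSDs
    print(f"~{data_tb / osds:.0f} TB of data per OSD on average")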

[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Peter Grandi
> * Ceph cluster with old nodes having 6TB HDDs > * Add new node with new 12TB HDDs Halving IOPS-per-TB? https://www.sabi.co.uk/blog/17-one.html?170610#170610 https://www.sabi.co.uk/blog/15-one.html?150329#150329 > Is it supported/recommended to pack 2 6TB HDDs handled by 2 > old OSDs into 1
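
A quick Python illustration of the IOPS-per-TB point; the ~120 random IOPS per HDD is an assumed figure, not from the thread:

    # A 7200 rpm HDD delivers roughly the same random IOPS regardless of
    # capacity, so doubling the drive size halves IOPS per TB stored.
    hdd_iops = 120
    for capacity_tb in (6, 12):
        print(f"{capacity_tb:>2} TB HDD: ~{hdd_iops / capacity_tb:.0f} IOPS per TB")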

[ceph-users] Re: Time Estimation for cephfs-data-scan scan_links

2023-10-18 Thread Peter Grandi
[...] > What is being done is a serial tree walk and copy in 3 > replicas of all objects in the CephFS metadata pool, so it > depends on both the read and write IOPS rate for the metadata > pool, but mostly on the write IOPS. [...] Wild guess: > metadata is on 10x 3.84TB SSDs without persistent
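
A hedged Python sketch of the estimate's shape; the object count and committed write IOPS below are assumptions for illustration, only the 3 replicas comes from the thread:

    # A serial walk that re-writes every metadata object in 3 replicas is
    # bounded by committed write IOPS on the metadata OSDs.
    metadata_objects = 50_000_000     # assumed inodes/dentries in the pool
    replicas = 3                      # 3-way replicated metadata
    committed_write_iops = 2_000      # assumed sustained committed writes/s
    seconds = metadata_objects * replicas / committed_write_iops
    print(f"~{seconds / 3600:.0f} hours (~{seconds / 86400:.1f} days)")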

[ceph-users] Re: Time Estimation for cephfs-data-scan scan_links

2023-10-13 Thread Peter Grandi
>> However, I've observed that the cephfs-data-scan scan_links step has >> been running for over 24 hours on 35 TB of data, which is replicated >> across 3 OSDs, resulting in more than 100 TB of raw data. What matters is the number of "inodes" (and secondarily their size), that is the number of

[ceph-users] Re: Decrepit ceph cluster performance

2023-08-14 Thread Peter Grandi
> We recently started experimenting with Proxmox Backup Server, > which is really cool, but performs enough IO to basically lock > out the VM being backed up, leading to IO timeouts, leading to > user complaints. :-( The two most common things I have had to fix over years as to storage systems I

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Peter Grandi
[...] S3 workload, that will need to delete 100M files daily [...] >> [...] average (what about peaks?) around 1,200 committed >> deletions per second (across the traditional 3 metadata >> OSDs) sustained; that may not leave a lot of time for file >> creation, writing or reading. :-) [...]
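
The arithmetic behind that figure, in Python:

    # 100M deletions spread evenly over a day, before allowing for peaks:
    deletes_per_day = 100_000_000
    per_second = deletes_per_day / 86_400
    print(f"~{per_second:.0f} deletions/s sustained")
    # With 3-way replicated metadata, each deletion must also be committed
    # on every replica OSD, so the per-device commit rate is of the same order.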

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Peter Grandi
>>> On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van >>> said: > [...] S3 workload, that will need to delete 100M file daily [...] So many people seem to think that distributed (or even local) filesystems (and in particular their metadata servers) can sustain the same workload as high volume

[ceph-users] Re: ls: cannot access '/cephfs': Stale file handle

2023-05-18 Thread Peter Grandi
>>> On Wed, 17 May 2023 16:52:28 -0500, Harry G Coin >>> said: > I have two autofs entries that mount the same cephfs file > system to two different mountpoints. Accessing the first of > the two fails with 'stale file handle'. The second works > normally. [...] Something pretty close to that

[ceph-users] Re: Deleting millions of objects

2023-05-18 Thread Peter Grandi
> [...] We have this slow and limited delete issue also. [...] That usually, apart from command list length limitations, happens because so many Ceph storage backends have too low committed IOPS (write, but not just) for mass metadata (and equivalently small data) operations, never mind for

[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-04-27 Thread Peter Grandi
On a 38 TB cluster, if you scrub 8 MB/s on 10 disks (using only numbers already divided by replication factor), you need 55 days to scrub it once. That's 8x larger than the default scrub factor [...] Also, even if I set the default scrub interval to 8x larger, my disks will still be thrashing
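
A quick check of that arithmetic in Python, interpreting the 8 MB/s as the aggregate scrub rate across the 10 disks (my reading of the quoted numbers):

    cluster_bytes = 38e12     # 38 TB, already divided by replication factor
    scrub_rate = 8e6          # 8 MB/s aggregate deep-scrub read rate
    print(f"~{cluster_bytes / scrub_rate / 86_400:.0f} days for one full deep scrub")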

[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-04-27 Thread Peter Grandi
> On a 38 TB cluster, if you scrub 8 MB/s on 10 disks (using only > numbers already divided by replication factor), you need 55 days > to scrub it once. > That's 8x larger than the default scrub factor [...] Also, even > if I set the default scrub interval to 8x larger, my disks > will still
