[ceph-users] Re: RGW Dedicated clusters vs Shared (RBD, RGW) clusters

2021-07-08 Thread Konstantin Shalygin
What will your object size be? 70T RAW is quite small; I think it is better to add hardware to your RBD cluster and run the object service there. k Sent from my iPhone > On 8 Jul 2021, at 14:17, gustavo panizzo wrote: > > Hello > > I have some experience with RBD clusters (for use with KVM/libvirt)

[ceph-users] Re: [Suspicious newsletter] Issue with Nautilus upgrade from Luminous

2021-07-08 Thread Szabo, Istvan (Agoda)
I've just done this update this week as well, but mine required jewel at least. Didn't it notify about that before? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com ---
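If the complaint was about require_min_compat_client, a minimal check-and-raise sketch (assuming all connected clients already support jewel-era features):

ceph osd dump | grep require_min_compat_client   # show what the cluster currently requires
ceph osd set-require-min-compat-client jewel     # raise it once no pre-jewel clients remain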

[ceph-users] OSD refuses to start (OOMK) due to pg split

2021-07-08 Thread Tor Martin Ølberg
After an upgrade to 15.2.13 from 15.2.4, this small home-lab cluster ran into issues with OSDs failing on all four hosts. This might be unrelated to the upgrade, but it looks like the trigger was an autoscaling event in which the RBD pool was scaled from 128 PGs to 512 PGs. Only some OSDs
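For anyone hitting the same thing, a small sketch of how to review and stop autoscaler-driven PG splits (the pool name is an example):

ceph osd pool autoscale-status               # see current PG counts and what the autoscaler targets
ceph osd pool set rbd pg_autoscale_mode off  # or 'warn' to keep the advice without the action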

[ceph-users] Re: NVME hosts added to the clusters and it made old ssd hosts flapping osds

2021-07-08 Thread Szabo, Istvan (Agoda)
Yes, you are right guys, the networking is the issue on the private network with jumbo frames: I can’t ping between hosts on the cluster network with jumbo packets (not even a simple 1462 …). Thank you guys. Istvan Szabo Senior Infrastructure Engineer ---
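A quick way to test jumbo frames end to end is a non-fragmenting ping sized for a 9000-byte MTU (the address is an example; 8972 = 9000 minus 20 bytes IP and 8 bytes ICMP header):

ping -M do -s 8972 -c 3 192.168.10.12   # -M do forbids fragmentation, so it fails if any hop's MTU is smaller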

[ceph-users] Issue with Nautilus upgrade from Luminous

2021-07-08 Thread Suresh Rama
Dear All, we have 13 Ceph clusters and we started upgrading them one by one from Luminous to Nautilus. Post-upgrade we started fixing the warning alerts and ran into issues: setting "ceph config set mon mon_crush_min_required_version firefly" yielded no results. Updated the mon config and restarted the daemon
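Two read-only checks that help when that option seems to have no effect (run the daemon command on a mon host; the mon id is an example):

ceph osd crush show-tunables                                   # shows the CRUSH tunables profile the warning is really about
ceph daemon mon.a config get mon_crush_min_required_version    # confirms the value the running mon actually uses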

[ceph-users] name alertmanager/node-exporter already in use with v16.2.5

2021-07-08 Thread Bryan Stillwell
I upgraded one of my clusters to v16.2.5 today and now I'm seeing these messages from 'ceph -W cephadm': 2021-07-08T22:01:55.356953+ mgr.excalibur.kuumco [ERR] Failed to apply alertmanager spec AlertManagerSpec({'placement': PlacementSpec(count=1), 'service_type': 'alertmanager', 'service_i

[ceph-users] Re: v16.2.5 Pacific released

2021-07-08 Thread David Galloway
Done! On 7/8/21 3:51 PM, Bryan Stillwell wrote: > There appears to be arm64 packages built for Ubuntu Bionic, but not for > Focal. Any chance Focal packages can be built as well? > > Thanks, > Bryan > >> On Jul 8, 2021, at 12:20 PM, David Galloway wrote: >> >> Caution: This email is from an e

[ceph-users] Re: v16.2.5 Pacific released

2021-07-08 Thread Bryan Stillwell
There appear to be arm64 packages built for Ubuntu Bionic, but not for Focal. Any chance Focal packages can be built as well? Thanks, Bryan > On Jul 8, 2021, at 12:20 PM, David Galloway wrote: > > Caution: This email is from an external sender. Please do not click links or > open attachment

[ceph-users] Re: v16.2.5 Pacific released

2021-07-08 Thread dgallowa
Sure, I can give it a shot now. (Will take a couple hours). Not sure why it wasn't being done in the first place.

[ceph-users] v16.2.5 Pacific released

2021-07-08 Thread David Galloway
We're happy to announce the 5th backport release in the Pacific series. We recommend that users update to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/en/news/blog/2021/v16-2-5-pacific-released Notable Changes --
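For cephadm-managed clusters the upgrade itself is a short sequence along these lines; package-based installs instead upgrade and restart daemons in the usual mon/mgr/OSD order:

ceph orch upgrade start --ceph-version 16.2.5   # begin the rolling upgrade
ceph orch upgrade status                        # watch progress
ceph versions                                   # confirm all daemons report 16.2.5 afterwards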

[ceph-users] Re: NVME hosts added to the clusters and it made old ssd hosts flapping osds

2021-07-08 Thread Josh Baergen
Have you confirmed that all OSD hosts can see each other (on both the front and back networks if you use split networks)? If there's not full connectivity, then that can lead to the issues you see here. Checking the logs on the mons can be helpful, as it will usually indicate why a given OSD is be
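A crude connectivity sweep along those lines, with example addresses for the front and back networks:

for h in 10.0.0.1 10.0.0.2 10.0.0.3 \
         10.1.0.1 10.1.0.2 10.1.0.3; do
    ping -c1 -W1 "$h" >/dev/null && echo "$h ok" || echo "$h UNREACHABLE"
done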

[ceph-users] Re: Cephfs slow, not busy, but doing high traffic in the metadata pool

2021-07-08 Thread Dan van der Ster
Indeed, that looks pretty idle to me. But if you have two active MDSs, then the load is probably caused by the MDS balancer continuously migrating subdirs back and forth between the two in an effort to balance themselves -- we've seen this several times in the past and it is why we use pinning. Each
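Pinning is done per directory with an extended attribute on a mounted CephFS (paths are examples):

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA    # pin this subtree to MDS rank 0
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB    # and this one to rank 1
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projectA   # -1 removes the pin again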

[ceph-users] RocksDB resharding does not work

2021-07-08 Thread Robert Sander
Hi, I am trying to apply the resharding to a containerized OSD (16.2.4) as described here: https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#rocksdb-sharding # ceph osd set noout # ceph orch daemon stop osd.13 # cephadm shell --name osd.13 # ceph-bluestore-tool --path /v
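For reference, the sequence in the linked docs continues roughly as below once inside the cephadm shell (OSD id and the default sharding spec are copied from that page; adjust for your OSD):

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-13 show-sharding
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-13 \
    --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" reshard
# then restart the OSD and clear noout:
ceph orch daemon start osd.13
ceph osd unset noout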

[ceph-users] NVME hosts added to the clusters and it made old ssd hosts flapping osds

2021-07-08 Thread Szabo, Istvan (Agoda)
Hi, I've added 4 NVMe hosts with 2 OSDs per NVMe to my cluster and it made all the SSD OSDs flap; I don't understand why. They are under the same root but with 2 different device classes, nvme and ssd. The pools are on the ssd class; nothing is on the nvme at the moment. The only way to bring back the ssd osds alive t

[ceph-users] RGW Dedicated clusters vs Shared (RBD, RGW) clusters

2021-07-08 Thread gustavo panizzo
Hello, I have some experience with RBD clusters (for use with KVM/libvirt) but now I'm building my first cluster to use with RGW. The RGW cluster size will be around 70T RAW; the current RBD cluster(s) are of similar (or smaller) size. I'll be deploying Octopus. Since most of the tuning is pretty di

[ceph-users] Re: Fwd: ceph upgrade from luminous to nautils

2021-07-08 Thread M Ranga Swami Reddy
Thanks Marc. That means we can upgrade from Luminous to Nautilus and later upgrade the OSDs from ceph-disk to ceph-volume. On Thu, Jul 8, 2021 at 5:45 PM Marc wrote: > I did the same upgrade from Luminous to Nautilus, and still have osd's > created with ceph-disk. I am slowly migrating to lvm an
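Note that ceph-volume can simply adopt existing ceph-disk OSDs without recreating them; actually moving them to LVM later means redeploying each OSD. A sketch of the takeover (the device is an example):

ceph-volume simple scan /dev/sdb1      # record the ceph-disk OSD's metadata under /etc/ceph/osd/
ceph-volume simple activate --all      # enable systemd units so the OSDs no longer rely on ceph-disk/udev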

[ceph-users] Wrong hostnames in "ceph mgr services" (Octopus)

2021-07-08 Thread Sebastian Knust
Hi, After upgrading from 15.2.8 to 15.2.13 with cephadm on CentOS 8 (containerised installation done by cephadm), Grafana no longer shows new data. Additionally, when accessing the Dashboard URL on a host currently not hosting the dashboard, I am redirected to a wrong hostname (as shown in c

[ceph-users] list-type=2 requests

2021-07-08 Thread Szabo, Istvan (Agoda)
Hi, Does anybody know about the list-type=2 request? GET /bucket?list-type=2&max-keys=2 Yesterday we faced the 2nd big objectstore cluster outage due to this request; one user took the whole cluster down. The normal ceph iostat read operation is below 30k; when they deployed their release it ju
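list-type=2 is just the S3 ListObjectsV2 call; it can be reproduced against RGW with the aws CLI (bucket and endpoint are examples):

aws s3api list-objects-v2 --bucket mybucket --max-keys 2 \
    --endpoint-url https://rgw.example.com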

[ceph-users] Re: Stuck MDSs behind in trimming

2021-07-08 Thread Zachary Ulissi
After some more digging, all three MDS enter state up:rejoin but don't move on from there when restarting. Also, MDS 0 (not the one with a trimming problem) consistently has mds.0.cache failed to open ino 0x101 err -116/0 mds.0.cache failed to open ino 0x102 err -116/0 in the log when restartin

[ceph-users] Re: RocksDB degradation / manual compaction vs. snaptrim operations choking Ceph to a halt

2021-07-08 Thread Igor Fedotov
Hi Christian, yeah, I came to the same idea yesterday: trigger compaction on upgrade completion. See https://github.com/ceph/ceph/pull/42218 Thanks, Igor On 7/8/2021 10:21 AM, Christian Rohmann wrote: Hey Igor, On 07/07/2021 14:59, Igor Fedotov wrote: after an upgrade from Ceph Nautilu
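Until that lands, compaction can be triggered by hand; a sketch with example OSD id and path:

ceph tell osd.\* compact                                          # online, per-OSD RocksDB compaction
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact   # offline variant, with the OSD stopped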

[ceph-users] Stuck MDSs behind in trimming

2021-07-08 Thread Zachary Ulissi
We're running a rook-ceph cluster that has gotten stuck in "1 MDSs behind on trimming". * 1 filesystem, three active MDS servers each with standby * Quite a few files (20M objects), daily snapshots. This might be a problem? * Ceph pacific 16.2.4 * `ceph health detail` doesn't provide much help (s
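One knob that often comes up in these threads (not necessarily the fix here; the warning can also point at a slow or misbehaving client) is the journal segment limit:

ceph config set mds mds_log_max_segments 256   # default is 128; raises the threshold behind the warning
ceph fs status                                 # keep an eye on whether the MDS actually catches up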

[ceph-users] Re: Cephfs slow, not busy, but doing high traffic in the metadata pool

2021-07-08 Thread Flemming Frandsen
This output seems typical for both active MDS servers: ---mds --mds_cache--- --mds_log-- -mds_mem- ---mds_server--- mds_ -objecter-- purg req rlat fwd inos caps exi imi |stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat |ses

[ceph-users] Fwd: ceph upgrade from luminous to nautils

2021-07-08 Thread M Ranga Swami Reddy
-- Forwarded message - From: M Ranga Swami Reddy Date: Thu, Jul 8, 2021 at 2:30 PM Subject: ceph upgrade from luminous to nautils To: ceph-devel Dear All, I am using Ceph Luminous with 2000+ OSDs. Planning to upgrade from Luminous to Nautilus. Currently,
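The usual high-level order, sketched (not a full runbook; always check the Nautilus release notes first):

ceph osd set noout                      # avoid rebalancing while daemons restart
# upgrade packages and restart mons, then mgrs, then OSDs host by host, then MDS/RGW
ceph versions                           # confirm every daemon reports 14.2.x
ceph osd require-osd-release nautilus   # only after all OSDs run Nautilus
ceph osd unset noout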

[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-07-08 Thread Mark Schouten
Hi, On 15-05-2021 at 22:17, Mark Schouten wrote: Ok, so that helped for one of the MDSes. Trying to deactivate another MDS, it started to release inos and dns'es, until it was almost done. When it had 50-ish left, a client started to complain and be blacklisted until I restarted the deactiva

[ceph-users] Re: list-type=2 requests

2021-07-08 Thread Konstantin Shalygin
This has been the default query in the aws-sdk for a couple of years. What is your Ceph version? k > On 8 Jul 2021, at 11:23, Szabo, Istvan (Agoda) wrote: > > Hi, > > Is there anybody know about list-type=2 request? > GET /bucket?list-type=2&max-keys=2 > > We faced yesterday the 2nd big objectstore clust

[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-07-08 Thread Christian Rohmann
On 08/07/2021 09:39, Dominik Csapak wrote: It's available at https://ceph.com/pgcalc/ just now (with cert not matching), but there apparently are people working on migrating the whole website * ceph.com redirects to https://old.ceph.com/ with matching Let's Encrypt certificate * but http

[ceph-users] Re: Cephfs slow, not busy, but doing high traffic in the metadata pool

2021-07-08 Thread Dan van der Ster
Hi, That's interesting -- yes on a lightly loaded cluster the metadata IO should be almost nil. You can debug what is happening using ceph daemonperf on the active MDS, e.g. https://pastebin.com/raw/n0iD8zXY (Use a wide terminal to show all the columns). Normally, lots of md io would indicate t
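daemonperf reads the local admin socket, so it has to run on the host holding the active MDS (the daemon name is an example):

ceph daemonperf mds.mds1   # refreshes the counters once per second, like vmstat for the MDS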

[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-07-08 Thread Dominik Csapak
On 7/8/21 09:32, Christian Rohmann wrote: Hey Dominik, On 05/07/2021 09:55, Dominik Csapak wrote: Hi, just wanted to ask if it is intentional that http://ceph.com/pgcalc/ results in a 404 error? is there any alternative url? it is still linked from the official docs. It's available at http

[ceph-users] Cephfs slow, not busy, but doing high traffic in the metadata pool

2021-07-08 Thread Flemming Frandsen
We have a nautilus cluster where any metadata write operation is very slow. We're seeing very light load from clients, as reported by dumping ops in flight, often it's zero. We're also seeing about 100 MB/s writes to the metadata pool, constantly, for weeks on end, which seems excessive, as only

[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-07-08 Thread Christian Rohmann
Hey Dominik, On 05/07/2021 09:55, Dominik Csapak wrote: Hi, just wanted to ask if it is intentional that http://ceph.com/pgcalc/ results in a 404 error? is there any alternative url? it is still linked from the official docs. It's available at https://ceph.com/pgcalc/ just now (with cert no

[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-07-08 Thread Dominik Csapak
On 7/5/21 09:55, Dominik Csapak wrote: Hi, just wanted to ask if it is intentional that http://ceph.com/pgcalc/ results in a 404 error? is there any alternative url? it is still linked from the official docs. with kind regards Dominik

[ceph-users] Re: RocksDB degradation / manual compaction vs. snaptrim operations choking Ceph to a halt

2021-07-08 Thread Christian Rohmann
Hey Igor, On 07/07/2021 14:59, Igor Fedotov wrote: after an upgrade from Ceph Nautilus to Octopus we ran into extreme performance issues leading to an unusable cluster when doing a larger snapshot delete and the cluster doing snaptrims, see i.e. https://tracker.ceph.com/issues/50511#note-13. Si