[ceph-users] Stable erasure coding CRUSH rule for multiple hosts?

2023-01-17 Thread aschmitz
Hi folks, I have a small cluster of three Ceph hosts running on Pacific. I'm trying to balance resilience and disk usage, so I've set up a k=4 m=2 pool for some bulk storage on HDD devices. With the correct placement of PGs this should allow me to take any one host offline for maintenance.
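One common way to express that constraint is a CRUSH rule that picks three hosts and places two shards on each; a minimal sketch (the rule name, id, and the hdd device class are assumptions, not taken from the thread):

    rule ec_k4m2_hdd {
        id 2
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 3 type host
        step chooseleaf indep 2 type osd
        step emit
    }

A custom rule like this is injected by editing the CRUSH map (ceph osd getcrushmap, crushtool -d, edit, crushtool -c, ceph osd setcrushmap) and then assigned to the pool with ceph osd pool set <pool> crush_rule <rule>.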

[ceph-users] Ceph Community Infrastructure Outage

2023-01-17 Thread Mike Perez
Hi everyone, From November into January, we experienced a series of outages with the Ceph Community Infrastructure and its services: - Mailing lists - https://lists.ceph.io - Sepia (testing infrastructure) - https://wiki.sepia.ceph.com -

[ceph-users] Ceph User + Dev Monthly January Meetup

2023-01-17 Thread Neha Ojha
Hi everyone, This month's Ceph User + Dev Monthly meetup is on January 19, 15:00-16:00 UTC. There are some topics in the agenda regarding RGW backports, please feel free to add other topics to https://pad.ceph.com/p/ceph-user-dev-monthly-minutes. Hope to see you there! Thanks, Neha

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-17 Thread Yuri Weinstein
OK, I will rerun failed jobs filtering rhel in. Thx! On Tue, Jan 17, 2023 at 10:43 AM Adam Kraitman wrote: > > Hey the satellite issue was fixed > > Thanks > > On Tue, Jan 17, 2023 at 7:43 PM Laura Flores wrote: >> >> This was my summary of rados failures. There was nothing new or amiss, >>

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-17 Thread Adam Kraitman
Hey the satellite issue was fixed Thanks On Tue, Jan 17, 2023 at 7:43 PM Laura Flores wrote: > This was my summary of rados failures. There was nothing new or amiss, > although it is important to note that runs were done with filtering out > rhel 8. > > I will leave it to Neha for final

[ceph-users] Re: Dashboard access to CephFS snapshots

2023-01-17 Thread Manuel Holtgrewe
Hi Robert, maybe the dashboard uses libcephfs and you hit the following bug? https://tracker.ceph.com/issues/55313 It also occurred to us in the cephfs snapshot mirror daemon. Best wishes, Manuel Robert Sander wrote on Tue, 17 Jan 2023, 17:32: > Hi, > > The dashboard has a simple CephFS

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-17 Thread Laura Flores
This was my summary of rados failures. There was nothing new or amiss, although it is important to note that runs were done with filtering out rhel 8. I will leave it to Neha for final approval. Failures: 1. https://tracker.ceph.com/issues/58258 2. https://tracker.ceph.com/issues/58146

[ceph-users] Ceph-ansible: add a new HDD to an already provisioned WAL device

2023-01-17 Thread Len Kimms
Hello all, we’ve set up a new Ceph cluster with a number of nodes which are all identically configured. There is one device vda which should act as WAL device for all other devices. Additionally, there are four other devices vdb, vdc, vdd, vde which use vda as WAL. The whole cluster was set up
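For reference, this kind of layout is usually expressed in group_vars/osds.yml; a minimal sketch in batch form, reusing the device names mentioned above (the variable values are otherwise assumptions, not the thread's actual configuration):

    # group_vars/osds.yml
    devices:
      - /dev/vdb
      - /dev/vdc
      - /dev/vdd
      - /dev/vde
    bluestore_wal_devices:
      - /dev/vda

Adding a fifth HDD to devices only helps if vda still has unallocated space for another WAL logical volume; otherwise ceph-volume cannot carve out a WAL LV for the new OSD.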

[ceph-users] Re: large omap objects in the .rgw.log pool

2023-01-17 Thread Ramin Najjarbashi
I am experiencing the same problem with 'Large omap object' in Ceph, so I wrote a script to find the large objects in the pool, count the number of omap keys in each object, and compare it with the value set for large_omap_object_key_threshold in the Ceph configuration.
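The script itself is not included in the message, but the approach it describes can be sketched with plain rados commands (the pool name and the exact config option below are assumptions, not taken from the post):

    #!/bin/bash
    # Count omap keys per object and flag the ones above the scrub threshold
    pool=".rgw.log"
    threshold=$(ceph config get osd osd_deep_scrub_large_omap_object_key_threshold)
    rados -p "$pool" ls | while read -r obj; do
        keys=$(rados -p "$pool" listomapkeys "$obj" | wc -l)
        if [ "$keys" -ge "$threshold" ]; then
            echo "$obj: $keys omap keys"
        fi
    done

Note that objects in non-default namespaces are skipped unless rados ls is told about them (--all or -N <namespace>).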

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-17 Thread Yuri Weinstein
Ilya, krbd had a bad link pasted (ran without -d ubuntu -D 20.04). This is a better run: https://pulpito.ceph.com/yuriw-2023-01-15_16:16:11-krbd-pacific_16.2.11_RC6.6-testing-default-smithi/ On Tue, Jan 17, 2023 at 9:11 AM Ilya Dryomov wrote: > > On Tue, Jan 17, 2023 at 4:46 PM Yuri Weinstein

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-17 Thread Ilya Dryomov
On Tue, Jan 17, 2023 at 4:46 PM Yuri Weinstein wrote: > > Please see the test results on the rebased RC 6.6 in this comment: > > https://tracker.ceph.com/issues/58257#note-2 > > We're still having infrastructure issues making testing difficult. > Therefore all reruns were done excluding the rhel

[ceph-users] Dashboard access to CephFS snapshots

2023-01-17 Thread Robert Sander
Hi, The dashboard has a simple CephFS browser where we can set quota and snapshots for the directories. When a directory has the "other" permission bits unset, i.e. only access for user and group, the dashboard displays an error: Failed to execute CephFS opendir failed at /path/to/dir/.snap:

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-17 Thread Yuri Weinstein
Please see the test results on the rebased RC 6.6 in this comment: https://tracker.ceph.com/issues/58257#note-2 We're still having infrastructure issues making testing difficult. Therefore all reruns were done excluding the rhel 8 distro ('--filter-out rhel_8') Also, the upgrades failed and

[ceph-users] Re: opnesuse rpm repos

2023-01-17 Thread Eugen Block
Hi, the last RPMs I'm aware of for openSUSE are for Pacific, and we get them from software.opensuse.org. We used Pacific for the switch to cephadm and containers in our openSUSE based ceph cluster. I don't see any opensuse directory underneath download.ceph.com, were they actually

[ceph-users] Re: bidirectional rbd-mirroring

2023-01-17 Thread Eugen Block
Hi, maybe you need to remove the peer first before readding it with a different config? At least that's how I interpret the code [1]. I haven't tried it myself though, so be careful and maybe test it first in a test environment. [1]
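For reference, removing and re-adding a pool peer is done with the rbd CLI; a sketch, where pool, user, and remote cluster names are placeholders:

    # show current peers and their UUIDs
    rbd mirror pool info <pool>

    # remove the old peer, then add it back with the desired settings
    rbd mirror pool peer remove <pool> <peer-uuid>
    rbd mirror pool peer add <pool> client.<user>@<remote-cluster>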

[ceph-users] Re: Mysterious HDD-Space Eating Issue

2023-01-17 Thread Janne Johansson
> Well, that's the thing: there are a whole bunch of ceph-guest-XX.log > files in /var/log/ceph/; most of them are empty, a handful are up to 250 > Kb in size, and this one () keeps on growing - and we're not sure where > they're coming from (i.e. there's nothing that we can see in the conf files.

[ceph-users] Re: ._handle_peer_banner peer [v2:***,v1:***] is using msgr V1 protocol

2023-01-17 Thread Eugen Block
That's a good question and I don't have an answer for you, unfortunately. I just tried to reproduce it by disabling ms_bind_msgr2 for one OSD, but didn't find such log entries. Hopefully someone else has some more insights. Quoting Frank Schilder: Hi Eugen, I have found these threads

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-17 Thread Thomas Widhalm
Another new thing that just happened: One of the MDS just crashed out of nowhere. /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/mds/journal.cc: In function

[ceph-users] Re: ._handle_peer_banner peer [v2:***,v1:***] is using msgr V1 protocol

2023-01-17 Thread Frank Schilder
Hi Eugen, I have found these threads and am not entirely convinced that they apply to our situation. Most importantly, the IP addresses cannot be clients, because they are in the replication network, which clients don't have access to. The formulation of the message sounds a lot like a PG peer

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-17 Thread Thomas Widhalm
Hi again, Another thing I found: Out of pure desperation, I started MDS on all nodes. I had them configured in the past, so I was hoping they could help with bringing in missing data even though they had been down for quite a while now. I didn't see any changes in the logs but the CPU on the hosts

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-17 Thread Thomas Widhalm
Hi, Thanks again. :-) Ok, that seems like an error to me. I never configured an extra rank for MDS. Maybe that's where my knowledge failed me, but I guess the MDS is waiting for something that was never there. Yes, there are two filesystems. Due to "budget restrictions" (it's my personal system at

[ceph-users] Re: Filesystem is degraded, offline, mds daemon damaged

2023-01-17 Thread Eugen Block
Hi, your Ceph filesystem is damaged (or do you have multiple?). Check the MDS logs to see why it is failing, and share that information here. Also please share 'ceph fs status'. Regards, Eugen Quoting bpur...@kfpl.ca: I am really hoping you can help. THANKS in advance. I have
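A quick sketch of the commands that gather the information being asked for here (the MDS daemon name is a placeholder and depends on how the cluster was deployed):

    ceph fs status
    ceph health detail
    # MDS log of the failing daemon, e.g. with cephadm:
    cephadm logs --name mds.<fs_name>.<host>.<id>
    # or on a package-based install:
    journalctl -u ceph-mds@<id>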

[ceph-users] Re: ._handle_peer_banner peer [v2:***,v1:***] is using msgr V1 protocol

2023-01-17 Thread Eugen Block
Hi, this sounds familiar, I believe this means that (some) of your clients are using msgr_v1. I assume that your MONs are configured to use both v1 and v2? I read somewhere that kernel clients might not support v2, but I'm not really sure. But there are a couple of threads discussing
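Generic checks that show which messenger protocols are actually in play (not specific to this cluster):

    # monitor address vectors; entries like [v2:...:3300,v1:...:6789]
    # mean v1 is still enabled alongside v2
    ceph mon dump

    # whether daemons are allowed to bind the v2 messenger
    ceph config get mon ms_bind_msgr2
    ceph config get osd ms_bind_msgr2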

[ceph-users] Re: ceph orch cannot refresh

2023-01-17 Thread Eugen Block
Hi, have you tried a mgr failover? 'ceph mgr fail' should do the trick, because restarting a mgr daemon won't fail it over. You should be able to see hints in the active mgr logs about what is failing, e.g. cephadm logs --name mgr.. Quoting Nicola Mori: Dear Ceph users, after a host
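A sketch of the commands involved ('ceph mgr stat' is added here for context; the mgr daemon name is a placeholder):

    ceph mgr stat                        # shows the currently active mgr
    ceph mgr fail                        # hand over to a standby
    cephadm logs --name mgr.<host>.<id>  # inspect the (new) active mgr's log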

[ceph-users] Re: Mysterious HDD-Space Eating Issue

2023-01-17 Thread duluxoz
Hi Eneko, Well, that's the thing: there are a whole bunch of ceph-guest-XX.log files in /var/log/ceph/; most of them are empty, a handful are up to 250 Kb in size, and this one () keeps on growing - and we're not sure where they're coming from (i.e. there's nothing that we can see in the conf
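A generic way to find out which process keeps such a log open for writing, independent of the Ceph configuration (the file name below is a placeholder):

    lsof /var/log/ceph/ceph-guest-NN.log
    # or, if lsof is not available:
    fuser -v /var/log/ceph/ceph-guest-NN.log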

[ceph-users] Re: Mysterious HDD-Space Eating Issue

2023-01-17 Thread Eneko Lacunza
Hi, On 17/1/23 at 8:12, duluxoz wrote: Thanks to Eneko Lacunza, E Taka, and Anthony D'Atri for replying - all that advice was really helpful. So, we finally tracked down our "disk eating monster" (sort of). We've got a "runaway" ceph-guest-NN that is filling up its log file

[ceph-users] Re: PG_BACKFILL_FULL

2023-01-17 Thread Stefan Kooman
On 1/17/23 08:39, Iztok Gregori wrote: Thanks for your response and advice. On 16/01/23 15:17, Boris Behrens wrote: Hmm.. I ran into a similar issue. IMHO there are two ways to work around the problem until the new disk is in place: 1. change the backfill full threshold (I use these commands:
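The specific commands are cut off in the preview; raising the backfill threshold is typically done along these lines (the value is illustrative only, the default backfillfull ratio being 0.90):

    ceph osd set-backfillfull-ratio 0.91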