[ceph-users] [solved] Re: OSD repeatedly marked down

2021-12-01 Thread Jan Kasprzak
Jan Kasprzak wrote: [...] : So I don't think my problem is OOM. It might be communication, : but I tried to tcpdump and look for example for ICMP port unreachable : messages, but nothing interesting there. D'oh. Wrong prefix length of public_network in ceph.conf, copied from the old
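
For reference, the setting in question lives in the [global] section of ceph.conf. A minimal sketch of a corrected entry, with hypothetical addresses:

    # /etc/ceph/ceph.conf (addresses are examples only -- adjust to your subnet)
    [global]
        # wrong: prefix length copied from the old network
        # public_network = 192.168.0.0/16
        # corrected prefix length matching the actual public subnet:
        public_network = 192.168.1.0/24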

[ceph-users] Re: ceph-mgr constantly dying

2021-12-01 Thread Konstantin Shalygin
Hi, The fix was backported to 14.2.10. I suggest upgrading your clusters to 14.2.22 k Sent from my iPhone > On 1 Dec 2021, at 19:56, Malte Stroem wrote: > > We have two clusters. Both use the same ceph version 14.2.8. Each cluster > hosts three ceph-mgrs. > > Only one and always the same
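
For anyone checking their own clusters before such an upgrade, a quick sketch of the usual commands (package-based install assumed for the restart step):

    # version reported by every running daemon, grouped by type
    ceph versions
    # which mgr is active and which are standbys
    ceph mgr stat
    # after upgrading the packages on a mgr host
    systemctl restart ceph-mgr.target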

[ceph-users] Re: Is it normal for a orch osd rm drain to take so long?

2021-12-01 Thread David Orman
What's "ceph osd df" show? On Wed, Dec 1, 2021 at 2:20 PM Zach Heise (SSCC) wrote: > I wanted to swap out on existing OSD, preserve the number, and then remove > the HDD that had it (osd.14 in this case) and give the ID of 14 to a new > SSD that would be taking its place in the same node. First

[ceph-users] Re: 16.2.7 pacific QE validation status, RC1 available for testing

2021-12-01 Thread Neha Ojha
Hi Luis, On Wed, Dec 1, 2021 at 8:19 AM Luis Domingues wrote: > > We upgraded a test cluster (3 controllers + 6 OSD nodes with HDDs and SSDs > for RocksDB) from the latest Nautilus to this 16.2.7 RC1. > > The upgrade went well without issues. We repaired the OSDs and none crashed. That's good to know!
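
Assuming "repaired" refers to the usual offline BlueStore repair, the per-OSD step looks roughly like this (a sketch for a package-based, non-containerized layout; OSD ID and path are examples):

    systemctl stop ceph-osd@12
    # offline consistency check and repair of the OSD's BlueStore metadata
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12
    systemctl start ceph-osd@12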

[ceph-users] Re: OSD repeatedly marked down

2021-12-01 Thread Jan Kasprzak
Sebastian, Sebastian Knust wrote: : On 01.12.21 17:31, Jan Kasprzak wrote: : >In "ceph -s", the "2 osds down" : >message disappears, and the number of degraded objects steadily decreases. : >However, after some time the number of degraded objects starts going up : >and down again, and
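
A minimal sketch of how to catch the moment the OSDs flap again, instead of polling "ceph -s":

    # stream cluster log events live; down/up transitions appear here
    ceph -w
    # list only the OSDs that are currently down, with their hosts
    ceph osd tree down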

[ceph-users] Re: Cephalocon 2022 is official!

2021-12-01 Thread Mike Perez
Hi everyone, We're near the deadline of December 10th for the Cephalocon CFP. So don't miss your chance to speak at this event either in-person or virtually. https://ceph.io/en/community/events/2022/cephalocon-portland/ If you're interested in sponsoring Cephalocon, the sponsorship prospectus

[ceph-users] Is it normal for a orch osd rm drain to take so long?

2021-12-01 Thread Zach Heise (SSCC)
I wanted to swap out an existing OSD, preserve the number, and then remove the HDD that had it (osd.14 in this case) and give the ID of 14 to a new SSD that would be taking its place in the same node. First time ever doing this, so not sure what to expect. I followed
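
For the archives: the orchestrator workflow that keeps the OSD ID is the --replace flag. A rough sketch, assuming a cephadm-managed cluster and hypothetical host and device names:

    # drain and destroy osd.14, but keep its ID reserved for the replacement
    ceph orch osd rm 14 --replace
    # watch the drain progress
    ceph orch osd rm status
    # after swapping the HDD for the SSD, adding the new device reuses ID 14
    ceph orch daemon add osd ceph05:/dev/sdX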

[ceph-users] Re: OSD repeatedly marked down

2021-12-01 Thread Dan van der Ster
Hi, You should check the central ceph.log to understand why the osd is getting marked down to begin with. Is it a connectivity issue from peers to that OSD? It looks like you have osd logging disabled -- revert to defaults while you troubleshoot this. -- dan On Wed, Dec 1, 2021 at 5:31 PM Jan
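
For reference, the central cluster log lives on the monitor hosts; something along these lines usually shows why the monitors marked the OSD down (paths per a default package install):

    # on a monitor host: down-markings and their reasons are logged centrally
    grep -i 'marked.*down\|wrongly marked' /var/log/ceph/ceph.log
    # or ask the monitors for the recent cluster log entries
    ceph log last 100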

[ceph-users] ceph-mgr constantly dying

2021-12-01 Thread Malte Stroem
Hello, one of our mgrs is constantly dying. Everything worked fine for a long time but now it happens every two weeks or so. We have two clusters. Both use the same ceph version 14.2.8. Each cluster hosts three ceph-mgrs. Only one and always the same ceph-mgr is dying on the same machine
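
A common first step for a repeatedly crashing mgr is to raise its log level and read the last lines before the next crash. A sketch, assuming a package-based install and adjusting the hostname:

    # temporarily raise mgr verbosity
    ceph config set mgr debug_mgr 20
    # after the next crash, inspect the unit and the mgr log on the affected host
    journalctl -u ceph-mgr@$(hostname -s) --since "1 hour ago"
    less /var/log/ceph/ceph-mgr.$(hostname -s).log
    # revert once done
    ceph config rm mgr debug_mgr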

[ceph-users] Re: OSD repeatedly marked down

2021-12-01 Thread Sebastian Knust
Hi Jan, On 01.12.21 17:31, Jan Kasprzak wrote: In "ceph -s", the "2 osds down" message disappears, and the number of degraded objects steadily decreases. However, after some time the number of degraded objects starts going up and down again, and osds appear to be down (and then up again).

[ceph-users] OSD repeatedly marked down

2021-12-01 Thread Jan Kasprzak
Hello, I am trying to upgrade my Ceph cluster (v15.2.15) from CentOS 7 to CentOS 8 stream. I upgraded monitors (a month or so ago), and now I want to upgrade OSDs: for now I upgraded one host with two OSDs: I kept the partitions where OSD data live (I have separate db on NVMe partition
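
In case it helps others doing a similar reinstall with OSD data preserved: assuming the OSDs were deployed with ceph-volume lvm and the keyrings are back in place, they can usually be reactivated roughly like this:

    # show what ceph-volume knows about the local OSDs and their db devices
    ceph-volume lvm list
    # recreate the systemd units and tmpfs mounts for all local OSDs
    ceph-volume lvm activate --all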

[ceph-users] Re: Rocksdb: Corruption: missing start of fragmented record(1)

2021-12-01 Thread Frank Schilder
Hi Dan, I can try to find the thread and the link again. I should mention that my inbox is a mess and the search function in the Outlook 365 app is, well, don't mention the war. Is there a "list by thread" option on lists.ceph.io? I can go through threads for 2 years, but not all messages.

[ceph-users] Re: Ceph unresponsive on manager restart

2021-12-01 Thread Janne Johansson
On Wed, 1 Dec 2021 at 15:45, Roman Steinhart wrote: > Hi Janne, > That's not a typo :D , I really mean manager. That thing happens when I > restart the active ceph manager daemon or when the active manager switches > on its own. My bad, I thought since you later mentioned monitor elections

[ceph-users] Re: Ceph unresponsive on manager restart

2021-12-01 Thread Janne Johansson
On Wed, 1 Dec 2021 at 13:43, Roman Steinhart wrote: > Hi all, > We're currently troubleshooting our Ceph cluster. > It appears that every time the active manager switches or restarts, the > whole cluster becomes slow/unresponsive for a short period of time. > Every time that happens we also see a

[ceph-users] Re: bluefs_allocator bitmap or hybrid

2021-12-01 Thread Igor Fedotov
Hi Samuel, On 12/1/2021 11:54 AM, huxia...@horebdata.cn wrote: Dear Cephers, We are running tons of Ceph clusters on Luminous with bluefs_allocator being bitmap, and when looking at Nautilus, 14.1.22, bluefs_allocator is now defaulting to hybrid. I am then wondering the following: 1)

[ceph-users] [RGW] Too much index objects and OMAP keys on them

2021-12-01 Thread Gilles Mocellin
Hello, We see "large omap objects" warnings on the RGW bucket index pool. The objects' OMAP keys refer to objects in one identified big bucket. Context: We use S3 storage for an application, with ~1.5 M objects. The production cluster is "replicated" with rclone cron jobs on another
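
For context, the usual way to confirm that the warning comes from an under-sharded bucket index, and to fix it, is resharding. A sketch with a hypothetical bucket name and shard count:

    # which pool/objects triggered the warning
    ceph health detail | grep -i 'large omap'
    # per-bucket object counts versus index shards; flags buckets over the limit
    radosgw-admin bucket limit check
    # reshard the offending bucket
    radosgw-admin bucket reshard --bucket=big-bucket --num-shards=101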

[ceph-users] Re: Rocksdb: Corruption: missing start of fragmented record(1)

2021-12-01 Thread Dan van der Ster
Hi Frank, I'd be interested to read that paper, if you can find it again. I don't understand why the volatile cache + fsync might be dangerous due to a buggy firmware, yet we should trust that a firmware respects FUA when the volatile cache is disabled. In
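
For reference, the volatile write cache under discussion can be checked and disabled roughly as follows (device names are hypothetical; whether the setting survives a power cycle depends on the drive):

    # SATA: show / disable the volatile write cache
    hdparm -W /dev/sdX
    hdparm -W 0 /dev/sdX
    # SAS/SCSI equivalent
    sdparm --get=WCE /dev/sdX
    sdparm --clear=WCE /dev/sdX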

[ceph-users] Re: 16.2.7 pacific QE validation status, RC1 available for testing

2021-12-01 Thread Venky Shankar
On Mon, Nov 29, 2021 at 10:53 PM Yuri Weinstein wrote: > fs - Venky, Patrick fs approved - failures are known and have trackers. -- Cheers, Venky ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to

[ceph-users] bluefs_allocator bitmap or hybrid

2021-12-01 Thread huxia...@horebdata.cn
Dear Cephers, We are running tons of Ceph clusters on Luminous with bluefs_allocator being bitmap, and when looking at Nautilus, 14.1.22, bluefs_allocator is now defaulting to hybrid. I am then wondering the following: 1) what will be the advantage of using hybrid instead of bitmap (which
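
For completeness, the allocator in use can be inspected on a running OSD and pinned via the central config database (Nautilus onwards); a minimal sketch (osd.0 is just an example, and the setting only takes effect after an OSD restart):

    # on the OSD's host: what the running daemon is using right now
    ceph daemon osd.0 config get bluefs_allocator
    # pin it explicitly for all OSDs
    ceph config set osd bluefs_allocator bitmap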