[ceph-users] Re: What's happening with ceph-users?

2023-01-10 Thread Neha Ojha
Re-adding the dev list and adding the user list because others might benefit from this information. Thanks, Neha On Tue, Jan 10, 2023 at 10:21 AM Wyll Ingersoll < wyllys.ingers...@keepertech.com> wrote: > Also, it was only my ceph-users account that was lost, dev account was > still active. >

[ceph-users] Re: Serious cluster issue - Incomplete PGs

2023-01-10 Thread Deep Dish
Eugen, I never insinuated that my circumstances resulted from buggy software, and I acknowledged our operational missteps. Let's please leave it there. Ceph remains a technology I like and will continue to use. Our operational understanding has evolved greatly as a result of the current circumstances.

[ceph-users] Re: 2 pgs backfill_toofull but plenty of space

2023-01-10 Thread Fox, Kevin M
What else is going on? (ceph -s). If there is a lot of data being shuffled around, it may just be because it's waiting for some other actions to complete first. Thanks, Kevin From: Torkil Svensgaard Sent: Tuesday, January 10, 2023 2:36 AM To:

[ceph-users] adding OSD to orchestrated system, ignoring osd service spec.

2023-01-10 Thread Wyll Ingersoll
When adding a new OSD to a ceph orchestrated system (16.2.9) on a storage node that has a specification profile that dictates which devices to use as the db_devices (SSDs), the newly added OSDs seem to be ignoring the db_devices (there are several available) and putting the data and db/wal on
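For reference, a cephadm OSD service spec of the kind described, which directs data onto HDDs and the RocksDB/WAL onto SSDs, typically looks like the sketch below. The `service_id` and `host_pattern` values here are placeholders, not taken from the report:

```yaml
service_type: osd
service_id: osd_spec_hdd_with_ssd_db   # placeholder name
placement:
  host_pattern: 'storage-*'            # placeholder host filter
spec:
  data_devices:
    rotational: 1    # HDDs hold the data
  db_devices:
    rotational: 0    # SSDs hold the RocksDB/WAL
```

Whether a manually added OSD honors `db_devices` depends on how it was created; an OSD added outside the spec (e.g. via `ceph orch daemon add osd`) is not matched against the drive group filters.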

[ceph-users] Removing OSDs - draining but never completes.

2023-01-10 Thread Wyll Ingersoll
Running ceph-pacific 16.2.9 using ceph orchestrator. We made a mistake adding a disk to the cluster and immediately issued a command to remove it using "ceph orch osd rm ### --replace --force". This OSD had no data on it at the time and was removed after just a few minutes. "ceph orch osd rm
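For context, the removal described and the commands to watch or cancel it look roughly like this in Pacific's orchestrator (the OSD id `42` is a placeholder; these commands require a live cluster):

```shell
# Schedule removal, keeping the OSD id free for a replacement disk.
ceph orch osd rm 42 --replace --force

# Watch the drain; an entry stuck in "draining" matches the symptom described.
ceph orch osd rm status

# Cancel a scheduled removal that never completes.
ceph orch osd rm stop 42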

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Anthony D'Atri
Could this be a temporal coincidence? E.g. each host got a different model drive in slot 19 via an incremental expansion. > On Jan 10, 2023, at 05:27, Frank Schilder wrote: > > Following up on my previous post, we have identical OSD hosts. The very > strange observation now is, that all

[ceph-users] Re: docs.ceph.com -- Do you use the header navigation bar? (RESPONSES REQUESTED)

2023-01-10 Thread John Zachary Dover
Everyone, I have been able to move the text using the "scroll-top-margin" parameter in custom.css. This means that the top bar no longer gets in the way (which is likely why John was unable to replicate the issue). Here is the pull request that addresses this issue:

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Serkan Çoban
Slot 19 is inside the chassis? Do you check the chassis temperature? I sometimes see a higher failure rate for HDDs inside the chassis than for those at the front of the chassis; in our case it was related to the temperature difference. On Tue, Jan 10, 2023 at 1:28 PM Frank Schilder wrote: > > Following up on my previous post,

[ceph-users] Re: rbd-mirror stops replaying journal on primary cluster

2023-01-10 Thread Josef Johansson
Hi, Actually, the test case was even more simple than that. A misaligned discard (discard_granularity_bytes=4096, offset=0, length=4096+512) made the journal stop replaying entries. This is now well covered in tests and example e2e-tests. The workaround is quite easy, set
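The misaligned-discard case from the report can be illustrated with a minimal sketch. The helper name is hypothetical; the granularity, offset, and length values come straight from the message (`discard_granularity_bytes=4096`, `offset=0`, `length=4096+512`):

```python
def is_aligned_discard(offset: int, length: int, granularity: int) -> bool:
    """Return True if a discard request starts and ends on a
    granularity boundary (both offset and length are multiples)."""
    return offset % granularity == 0 and length % granularity == 0

# The aligned case replays fine; the 512-byte tail makes it misaligned,
# which is the shape of request that stalled the journal replay.
print(is_aligned_discard(0, 4096, 4096))        # aligned
print(is_aligned_discard(0, 4096 + 512, 4096))  # misaligned
```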

[ceph-users] 2 pgs backfill_toofull but plenty of space

2023-01-10 Thread Torkil Svensgaard
Hi Ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable) Looking at this: " Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull " " [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't
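A PG goes `backfill_toofull` when the projected usage of a backfill target would cross the backfillfull ratio (`osd_backfillfull_ratio`, 0.90 by default), even if the cluster as a whole has plenty of space. A rough model of that check, with hypothetical names and made-up sizes:

```python
def would_backfill_overfill(used_bytes: int, total_bytes: int,
                            incoming_pg_bytes: int,
                            backfillfull_ratio: float = 0.90) -> bool:
    """Rough model of the backfill_toofull check: would the target OSD
    cross the backfillfull ratio after receiving the PG's data?"""
    projected = (used_bytes + incoming_pg_bytes) / total_bytes
    return projected >= backfillfull_ratio

TiB = 1024 ** 4
# An 8 TiB OSD already at 85% cannot accept a 0.5 TiB PG: 7.3/8 ≈ 0.91 >= 0.90.
print(would_backfill_overfill(int(6.8 * TiB), 8 * TiB, int(0.5 * TiB)))
# The same PG onto a half-full OSD is fine.
print(would_backfill_overfill(4 * TiB, 8 * TiB, int(0.5 * TiB)))
```

This is why two stray PGs can be `backfill_toofull` while "plenty of space" exists elsewhere: only the specific target OSDs matter, and rebalancing or reweighting those OSDs usually clears the warning.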

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Frank Schilder
Following up on my previous post, we have identical OSD hosts. The very strange observation now is, that all outlier OSDs are in exactly the same disk slot on these hosts. We have 5 problematic OSDs and they are all in slot 19 on 5 different hosts. This is an extremely strange and unlikely

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Frank Schilder
Hi Dongdong and Igor, thanks for pointing to this issue. I guess if it's a memory leak issue (well, a cache pool trim issue), checking for some indicator and an OSD restart should be a work-around? Dongdong promised a work-around but talks only about a patch (fix). Looking at the tracker items,

[ceph-users] Octopus RGW large omaps in usage

2023-01-10 Thread Boris Behrens
Hi, I am currently trying to figure out how to resolve the "large objects found in pool 'rgw.usage'" error. In the past I trimmed the usage log, but now I am at the point that I need to trim it down to two weeks. I checked the number of omap keys and the distribution is quite off: # for OBJECT
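The large-omap warning fires when a single object's omap key count exceeds `osd_deep_scrub_large_omap_object_key_threshold` (200,000 by default in recent releases). Given per-object key counts, e.g. collected with `rados listomapkeys`, a minimal sketch that flags the offenders (function name and the counts below are made up):

```python
def find_large_omap_objects(key_counts, threshold=200_000):
    """Return object names whose omap key count exceeds the threshold,
    largest first -- candidates to shrink via 'radosgw-admin usage trim'."""
    return sorted((name for name, n in key_counts.items() if n > threshold),
                  key=key_counts.get, reverse=True)

# Hypothetical per-object counts for the usage-log pool.
counts = {"usage.17": 1_450_000, "usage.22": 180_000, "usage.5": 310_000}
print(find_large_omap_objects(counts))  # usage.17 and usage.5 exceed the threshold
```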

[ceph-users] Re: Serious cluster issue - Incomplete PGs

2023-01-10 Thread Eugen Block
Hi, > Backups will be challenging. I honestly didn't anticipate this kind of failure with ceph to be possible, we've been using it for several years now and were encouraged by orchestrator and performance improvements in the 17 code branch. That's exactly what a backup is for, to be prepared