[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
r reconnected. This is in our experience very unusual behaviour. Was there a change or are we looking at a potential bug here? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 06 October 2022 17:03 T

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
in reason for our observations. Thanks for your help, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 06 October 2022 14:39 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] OSD crashes

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
ope the stuck bstore_kv_sync thread does not lead to rocksdb corruption. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 06 October 2022 14:26 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-us

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
= Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 06 October 2022 13:45:17 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octopus From your response to Stefan I

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
properly on upgraded but unconverted OSDs? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: 06 October 2022 13:27 To: ceph-users@ceph.io; Frank Schilder Subject: Re: [ceph-users] OSD crashes during

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
again and service is back, I can add the setting osd_compact_on_start=true and start rebooting servers. Right now I need to prevent the ship from sinking. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor
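
A minimal sketch of how the compaction setting mentioned above can be applied once the cluster is stable again (scope and timing are a judgment call; osd.0 is a placeholder):

    # have every OSD compact its RocksDB on startup
    ceph config set osd osd_compact_on_start true
    # or trigger a one-off compaction on a single running OSD
    ceph tell osd.0 compact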

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
ate a way to make the OSDs more crash tolerant until I have full redundancy again. Is there a setting that increases the OPS timeout or is there a way to restrict the load to tolerable levels? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum
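
A hedged sketch of the kind of OSD timeout knobs this question is about (the option names are standard OSD settings; the values are purely illustrative, not a recommendation):

    # give individual op threads more time before they count as stuck
    ceph config set osd osd_op_thread_timeout 60
    # raise the threshold at which a stuck op thread makes the OSD abort
    ceph config set osd osd_op_thread_suicide_timeout 600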

[ceph-users] OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
will spell doom. I'm also afraid that peering load is one of the factors and am very reluctant to reboot hosts to clear D-state processes. I really don't want to play this whack-a-mole game. Thanks for your help and best regards, ===== Frank Schilder AIT Risø Campus

[ceph-users] Re: weird performance issue on ceph

2022-09-26 Thread Frank Schilder
in contrast to peak load (which most cheap drives are optimized for and therefore less suitable for a constant-load system like ceph)? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mark Nelson Sent: 26 September 20

[ceph-users] Re: cephfs: num_stray growing without bounds (octopus)

2022-08-09 Thread Frank Schilder
eem to be unable to find these tools. They are not installed in quay.io/ceph/ceph:v15.2.16 and I seem to be unable to figure out which repo/package provides these tools. Can you help me out here? Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum

[ceph-users] Re: Stretch cluster questions

2022-05-17 Thread Frank Schilder
eople use these instead of something crazy like REP 6(4). Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Gregory Farnum Sent: 17 May 2022 00:56:42 To: Frank Schilder Cc: Eneko Lacunza; ceph-users Subject: Re: [cep

[ceph-users] MDS stuck in stopping state

2021-12-13 Thread Frank Schilder
itiated" }, { "time": "2021-12-13 13:31:50.634857", "event": "header_read" }, { "time": "2021-12-13

[ceph-users] Re: Rocksdb: Corruption: missing start of fragmented record(1)

2021-12-01 Thread Frank Schilder
el datacenter hardware, they need to simulate performance with cheap components. I have never seen an enterprise SAS drive with write cache enabled. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der S
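
For reference, a minimal sketch of how drive write caches are usually inspected and disabled (device paths are placeholders; whether this helps depends on the drive and firmware):

    # SATA: query, then disable the volatile write cache
    hdparm -W /dev/sdX
    hdparm -W 0 /dev/sdX
    # SAS: query, then clear the WCE (write cache enable) mode page bit
    sdparm --get=WCE /dev/sdX
    sdparm --clear=WCE /dev/sdX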

[ceph-users] Re: ceph-osd iodepth for high-performance SSD OSDs

2021-10-26 Thread Frank Schilder
per active bstore_kv_sync thread (meaning: per OSD daemon), which more or less matches the aggregated performance I see. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: 26 October

[ceph-users] Re: ceph-osd iodepth for high-performance SSD OSDs

2021-10-26 Thread Frank Schilder
of ceph's latency on aggregated performance. For PCIe NVMe drives I would expect the bstore_kv_sync thread to be CPU bound. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Szabo, Istvan (Agoda) Sent: 26 October

[ceph-users] Re: ceph-osd iodepth for high-performance SSD OSDs

2021-10-26 Thread Frank Schilder
? Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 26 October 2021 09:41:44 To: ceph-users Subject: [ceph-users] ceph-osd iodepth for high-performance SSD OSDs Hi all, we deployed a pool with high

[ceph-users] ceph-osd iodepth for high-performance SSD OSDs

2021-10-26 Thread Frank Schilder
f? Could I set, for example ceph config set osd/class:rbd_perf osd_op_num_threads_per_shard 4 to increase concurrency on this particular device class only? Is it possible to increase the number of shards at run-time? Thanks for your help! Best regards, = Frank Schilder AIT Risø C
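
A minimal sketch of the device-class override asked about, plus how the effective value could be checked afterwards (the class name rbd_perf is taken from the question; osd.0 is a placeholder and must belong to that class):

    ceph config set osd/class:rbd_perf osd_op_num_threads_per_shard 4
    # check what a daemon of that class actually reports
    ceph config show osd.0 | grep osd_op_num_threads_per_shard
    # or on the OSD host, via the admin socket
    ceph daemon osd.0 config get osd_op_num_threads_per_shard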

[ceph-users] ceph full-object read crc != expected on xxx:head

2021-10-12 Thread Frank Schilder
references I could find the error message contains the PG. The above doesn't. There is no additional information in the OSD log of 335. The above read error did not create a health warn/error state. Is this error automatically fixed? Best regards, = Frank Schilder AIT Risø Campus

[ceph-users] Re: Why set osd flag to noout during upgrade ?

2021-09-22 Thread Frank Schilder
. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Etienne Menguy Sent: 22 September 2021 12:17:39 To: ceph-users Subject: [ceph-users] Re: Why set osd flag to noout during upgrade ? Hello, From my experience


[ceph-users] Re: ceph fs service outage: currently failed to authpin, subtree is being exported

2021-09-19 Thread Frank Schilder
a good Sunday, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 19 September 2021 14:01 To: Frank Schilder Cc: ceph-users Subject: Re: [ceph-users] ceph fs service outage: currently failed to authpin, subtree

[ceph-users] Re: ceph fs service outage: currently failed to authpin, subtree is being exported

2021-09-19 Thread Frank Schilder
regarding the 2 issues above. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: 19 September 2021 10:11:27 To: ceph-users Subject: [ceph-users] ceph fs service outage: currently failed to authpin

[ceph-users] ceph fs service outage: currently failed to authpin, subtree is being exported

2021-09-19 Thread Frank Schilder
mimic (stable) Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Health check failed: 1 pools ful

2021-09-16 Thread Frank Schilder
the data pool definition in the system data store). This leads to an insane temporary usage in the meta data pool. Wanted to report this bug a long time ago. Thanks to everyone who replied. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Health check failed: 1 pools ful

2021-09-15 Thread Frank Schilder
are running snapshot rotation on RBD images. Could this have anything to do with it? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 13 September 2021 12:20 To: ceph-users Subject: [ceph-users] Health

[ceph-users] Re: ceph fs re-export with or without NFS async option

2021-09-13 Thread Frank Schilder
from nfsd to the ceph fs client? Or any other ideas what might cause the slow IO/high io wait? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Jeff Layton Sent: 09 September 2021 12:30:30 To: Frank Schilder; ceph

[ceph-users] Health check failed: 1 pools ful

2021-09-13 Thread Frank Schilder
on or if this could be a problem? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-13 Thread Frank Schilder
Hi Dan and Patrick, I created a tracker item for the snapshot issue: https://tracker.ceph.com/issues/52581 Patrick, could you please take a quick look at it? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Data loss on appends, prod outage

2021-09-08 Thread Frank Schilder
Hi Nathan, thanks for the update. This seems to be a different and worse instance than the CentOS 7 case. We are using CentOS 8 Stream for a few clients. I will check if they are affected. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Data loss on appends, prod outage

2021-09-08 Thread Frank Schilder
Can you make the devs aware of the regression? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nathan Fish Sent: 08 September 2021 19:33 To: ceph-users Subject: [ceph-users] Re: Data loss on appends, prod outage

[ceph-users] ceph fs re-export with or without NFS async option

2021-09-08 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Data loss on appends, prod outage

2021-09-07 Thread Frank Schilder
ph (although a lot more pronounced compared with other distributed file systems), so a useful lesson to learn in any case I would say. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nathan Fish Sent: 07 Septembe

[ceph-users] Re: Data loss on appends, prod outage

2021-09-07 Thread Frank Schilder
client, because the coordination of meta data updates and write locks between clients is unreasonably expensive. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nathan Fish Sent: 07 September 2021 21:17:05

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Frank Schilder
bug that occurs by changing directory layouts while snapshots are present on a system? The 11 extra snapshots seem to cause severe performance issues. I would be most grateful for any advice on how to get rid of them. The corresponding fs snapshots were deleted at least a week ago. Many thanks an

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Frank Schilder
window and try it out without load on it. Or upgrade first :) > ... I've cc'd Patrick. Thanks a lot! It would be really good if we could resolve the mystery of extra snapshots in pool con-fs2-data2. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109

[ceph-users] Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

2021-09-07 Thread Frank Schilder
present, N snapshots means ~N times slower. I'm testing this on kernel version 5.9.9-1.el7.elrepo.x86_64. It is even worse on older kernels. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Marc Roos Sent: 16

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Frank Schilder
degradation. Maybe you could point one of the ceph fs devs to this problem? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 06 September 2021 11:33 To: Frank Schilder Cc: ceph-users

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-06 Thread Frank Schilder
snapshots don't disappear and/or what might have happened to our MDS daemons performance wise. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 31 August 2021 16:23:15 To: Dan van der

[ceph-users] Re: nautilus cluster down by loss of 2 mons

2021-08-31 Thread Frank Schilder
/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds -- check your version!). Hope you can do without taking the cluster down. Good luck and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Marc Sent

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-31 Thread Frank Schilder
, the first 4 are probably purged from the snapshots. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 31 August 2021 15:44:41 To: Frank Schilder Cc: Patrick Donnelly; ceph-users

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-08-31 Thread Frank Schilder
that. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 31 August 2021 15:26:17 To: Frank Schilder Cc: ceph-users Subject: Re: [ceph-users] Re: MDS daemons stuck in resolve, please help

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-31 Thread Frank Schilder
Apparently, the meta data performance is now so high that a single client can crash an MDS daemon and even take the MDS cluster with it. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 30 August 2021

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-08-31 Thread Frank Schilder
mds_recall_max_decay_threshold and lower mds_recall_max_decay_rate increase speed of caps recall? What increments would be safe to use? For example, is it really a good idea to go from 16384 to the new default 131072 in one go? Thanks for any advice and best regards, = Frank Schilder AIT Risø Campus
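
A hedged sketch of stepping the recall threshold up gradually instead of jumping to the new default in one go (the intermediate values are illustrative only; watch MDS health between steps):

    ceph config get mds mds_recall_max_decay_threshold
    ceph config set mds mds_recall_max_decay_threshold 32768
    ceph config set mds mds_recall_max_decay_threshold 65536
    ceph config set mds mds_recall_max_decay_threshold 131072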

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-08-30 Thread Frank Schilder
st regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: 30 August 2021 21:12:53 To: ceph-users Subject: [ceph-users] MDS daemons stuck in resolve, please help Hi all, our MDS cluster got degraded after

[ceph-users] MDS daemons stuck in resolve, please help

2021-08-30 Thread Frank Schilder
| +-+ | ceph-12 | | ceph-08 | | ceph-23 | | ceph-11 | +-+ I tried to set max_mds to 1 to no avail. How can I get the MDS daemons back up? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14
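
For reference, a minimal sketch of the max_mds change attempted here (the file system name con-fs2 is a placeholder):

    ceph fs set con-fs2 max_mds 1
    ceph fs get con-fs2 | grep max_mds
    ceph fs status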

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-30 Thread Frank Schilder
which is preferred. It would be great if you could help me find the original path so I can identify the user and advise him/her on how to organise his/her files. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-26 Thread Frank Schilder
f. One of the advantages of ceph is its unlimited inode capacity and it seems to cope with the usage pattern reasonably well - modulo the problems I seem to observe here. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: D

[ceph-users] Re: How to slow down PG recovery when a failed OSD node come back?

2021-08-26 Thread Frank Schilder
might want to look through the logs if there are other problems, for example, with peering taking very long or other OSDs being marked as down temporarily (the classic "a monitor marked me down but I'm still running"). Could be network or CPU bottlenecks. Best regards, =

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Frank Schilder
ase the MDS beacon timeout to get out of an MDS restart loop (it also had oversized cache by the time I discovered the problem). The dirfrag was reported as a slow op warning. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Frank Schilder
: 12.98b194d0 (12.50) Key count: 657212 Size (bytes): 308889640 They are all in the fs meta-data pool. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 25 August 2021 13:57:44 To: Frank Schilder Cc
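
A hedged sketch of how the key count of such an omap object can be double-checked by hand (pool and object names are placeholders; the warning in the OSD/cluster log normally names both):

    # locate the warning in the OSD logs
    grep "Large omap object found" /var/log/ceph/ceph-osd.*.log
    # count the omap keys of the reported object directly
    rados -p <metadata-pool> listomapkeys <object-name> | wc -l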

[ceph-users] LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Frank Schilder
If I can't do anything anyway, why the warning? If there is a warning, I would assume that one can do something proper to prevent large omap objects from being born by an MDS. What is it? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 __

[ceph-users] Re: PGs stuck after replacing OSDs

2021-08-17 Thread Frank Schilder
Maybe an instance of https://tracker.ceph.com/issues/46847 ? Next time you see this problem, you can try the new "repeer" command on affected PGs. The "ceph pg x.y query" as mentioned by Etienne will provide a clue if it's due to this bug. Best regards, ===== Fra
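
A minimal sketch of the two commands referred to here (the PG id 12.50 is a placeholder):

    ceph pg 12.50 query        # peering/recovery state should show whether the bug applies
    ceph pg repeer 12.50       # force a re-peer (available from Nautilus on)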

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Frank Schilder
(not the documentation)? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Steven Pine Sent: 12 August 2021 17:45:48 To: Peter Lieven Cc: Frank Schilder; Nico Schottelius; Ceph Users Subject: Re: [ceph-users] Re: Very slow I/O during

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Frank Schilder
unreadable. Then you can deploy new OSDs on the same hosts with these IDs and data will move back with minimal movement between other OSDs again. The manual deployment commands accept OSD IDs as an optional argument for this reason. Best regards, ===== Frank Schilder AIT Risø Campu

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-11 Thread Frank Schilder
usually don't notice anything. I'm running mimic 13.2.10 though. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nico Schottelius Sent: 11 August 2021 10:08:34 To: Ceph Users Subject: [ceph-users] Very slow I/O during

[ceph-users] Re: Fwd: Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-07-22 Thread Frank Schilder
e success. It helps with failing transceivers/ports. Typically, the failing link is suppressed before users start creating support tickets. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Andrew Walker-Brown Sent

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-25 Thread Frank Schilder
Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Marc Sent: 25 June 2021 10:21:16 To: Frank Schilder; Patrick Donnelly Cc: ceph-users@ceph.io Subject: RE: [ceph-users] Re: ceph fs mv does copy, not move Adding to this. I can remember that I was sur

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-24 Thread Frank Schilder
ers don't have influence on. I don't see this becoming a relevant alternative to a parallel file system any time soon. Sorry. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: 24 June 2021 20:01

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-24 Thread Frank Schilder
xpire before changes on one client become visible on another (unless direct_io is used for all IO) is perfectly acceptable for us given the potential performance gain due to simpler client-MDS communication. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-22 Thread Frank Schilder
I don't think so. It is exactly the same location in all tests and it is reproducible. Why would a move be a copy on some MDSs/OSDs but not others? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Marc Sent: 22

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-22 Thread Frank Schilder
sys 0m0.001s = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 22 June 2021 11:15:06 To: ceph-users@ceph.io Subject: [ceph-users] ceph fs mv does copy, not move Dear all, some time ago I reported

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-22 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 22 June 2021 11:15:06 To: ceph-users@ceph.io Subject: [ceph-users] ceph fs mv does copy, not move Dear all, some time ago I reported that the kernel client resorts to a copy

[ceph-users] ceph fs mv does copy, not move

2021-06-22 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-13 Thread Frank Schilder
. The test was done under production load. Looks like the OSD crash I observed was caused by special and hopefully rare circumstances. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 06 May 2021

[ceph-users] Re: Which EC-code for 6 servers?

2021-05-11 Thread Frank Schilder
hosts. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Szabo, Istvan (Agoda) Sent: 10 May 2021 10:35:34 To: ceph-users Subject: [ceph-users] Which EC-code for 6 servers? Hi, Thinking to have 2:2 so I can tolerate
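
For context, a hedged sketch of defining an EC profile on six hosts (profile name and k/m choice are illustrative; the thread weighs 2+2 against wider profiles):

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd erasure-code-profile get ec42
    ceph osd pool create ec-pool 128 128 erasure ec42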

[ceph-users] Re: Host crash undetected by ceph health check

2021-05-10 Thread Frank Schilder
"time": "2021-05-10 14:54:06.211732", "event": "no_reply: send routed request" }, { "time": "2021-05-10 14:55:34.257996", "ev

[ceph-users] Re: Performance compare between CEPH multi replica and EC

2021-05-09 Thread Frank Schilder
and then someone might be able to give details about a specific set up. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: zp_8483 Sent: 08 May 2021 10:45:17 To: ceph-users@ceph.io Subject: [ceph-users] Performance compare between

[ceph-users] Re: Building ceph clusters with 8TB SSD drives?

2021-05-07 Thread Frank Schilder
benefit is no moving parts. The higher shock resistance can be a big plus. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Matt Larson Sent: 07 May 2021 22:10:40 To: ceph-users Subject: [ceph-users] Building ceph

[ceph-users] Host crash undetected by ceph health check

2021-05-07 Thread Frank Schilder
). For debugging this problem, can anyone provide me with a pointer when this could be the result of a misconfiguration? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Frank Schilder
Hi Andrew, thanks, that is reassuring. To be sure, I plan to do a few power out tests with this server. Never had any issues with that so far, it's the first time I see a corrupted OSD. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Frank Schilder
er outages? Any recommendations? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-05-04 Thread Frank Schilder
Hi Chris and Wissem, finally found the time: https://tracker.ceph.com/issues/50638 Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Chris Dunlop Sent: 16 March 2021 03:56:50 To: Frank Schilder Cc: ceph-users

[ceph-users] Re: OSD slow ops warning not clearing after OSD down

2021-05-04 Thread Frank Schilder
I created a ticket: https://tracker.ceph.com/issues/50637 Hope a purge will do the trick. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 03 May 2021 15:21:38 To: Dan van der Ster; Vladimir
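
A hedged sketch of the purge being considered (OSD id 580 is a placeholder; purge removes the OSD from the CRUSH map together with its auth key and id):

    ceph osd safe-to-destroy osd.580
    ceph osd purge 580 --yes-i-really-mean-it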

[ceph-users] Spam from Chip Cox

2021-05-03 Thread Frank Schilder
Does anyone else receive unsolicited replies from sender "Chip Cox " to e-mails posted on this list? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To u

[ceph-users] Re: [ Ceph MDS MON Config Variables ] Failover Delay issue

2021-05-03 Thread Frank Schilder
to administrate ceph, it does not provide a point of comparison with a production system and will have heavily degraded performance. Ceph requires a minimum size that is not exactly small before it starts working well. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: [ Ceph MDS MON Config Variables ] Failover Delay issue

2021-05-03 Thread Frank Schilder
es (1) or (2) are you referring to? How many FS clients do you have? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick Donnelly Sent: 03 May 2021 17:19:37 To: Lokendra Rathour Cc: Ceph Development; dev; cep

[ceph-users] Re: OSD slow ops warning not clearing after OSD down

2021-05-03 Thread Frank Schilder
Hi Dan, just restarted all MONs, no change though :( Thanks for looking at this. I will wait until tomorrow. My plan is to get the disk up again with the same OSD ID and would expect that this will eventually allow the message to be cleared. Best regards, = Frank Schilder AIT

[ceph-users] Re: OSD slow ops warning not clearing after OSD down

2021-05-03 Thread Frank Schilder
of this message and am wondering if purging the OSD would help. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Vladimir Sigunov Sent: 03 May 2021 13:45:19 To: ceph-users@ceph.io; Frank Schilder Subject: Re: OSD slow ops

[ceph-users] OSD slow ops warning not clearing after OSD down

2021-05-03 Thread Frank Schilder
| ceph-22 |0 |0 |0 | 0 |0 | 0 | autoout,exists | It is probably a warning dating back to just before the fail. How can I clear it? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: ceph Nautilus lost two disk over night everything hangs

2021-03-30 Thread Frank Schilder
Ahh, right. I saw it fixed here https://tracker.ceph.com/issues/18749 a long time ago, but it seems the back-port never happened. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Josh Baergen Sent: 30 March 2021

[ceph-users] Re: ceph Nautilus lost two disk over night everything hangs

2021-03-30 Thread Frank Schilder
ank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 30 March 2021 14:53:18 To: Rainer Krienke; Eugen Block; ceph-users@ceph.io Subject: Re: [ceph-users] Re: ceph Nautilus lost two disk over night everything hangs Dear Rainer,

[ceph-users] Re: ceph Nautilus lost two disk over night everything hangs

2021-03-30 Thread Frank Schilder
check if this option is present and set to true? If it is not working as intended, a tracker ticket might be in order. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Rainer Krienke Sent: 30 March 2021 13:05:56

[ceph-users] Re: Cluster suspends when Add Mon or stop and start after a while.

2021-03-29 Thread Frank Schilder
Please use the correct list: ceph-users@ceph.io Probably the same problem I had. Try reducing mon_sync_max_payload_size to 4096 and start a new MON. It should take just a few seconds to boot up. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: LVM vs. direct disk acess

2021-03-25 Thread Frank Schilder
egards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nico Schottelius Sent: 25 March 2021 13:29:10 To: Frank Schilder Cc: Marc; Nico Schottelius; ceph-users@ceph.io Subject: Re: [ceph-users] Re: LVM vs. direct disk acess Frank Schilder

[ceph-users] Re: PG export import

2021-03-18 Thread Frank Schilder
version etc. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Szabo, Istvan (Agoda) Sent: 18 March 2021 10:48:05 To: Ceph Users Subject: [ceph-users] PG export import Hi, I’ve tried to save some pg from a dead

[ceph-users] Re: osd_max_backfills = 1 for one OSD

2021-03-16 Thread Frank Schilder
ceph config rm ... = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dave Hall Sent: 16 March 2021 16:41:47 To: ceph-users Subject: [ceph-users] osd_max_backfills = 1 for one OSD Hello, I've been trying to get an OSD ready
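
A minimal sketch of a per-OSD override and its later removal, as described above (osd.12 is a placeholder):

    ceph config set osd.12 osd_max_backfills 1
    ceph config get osd.12 osd_max_backfills
    # remove the override again once backfill has finished
    ceph config rm osd.12 osd_max_backfills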

[ceph-users] Re: Inactive pg, how to make it active / or delete

2021-03-16 Thread Frank Schilder
The PG says blocked_by at least 2 of your down-OSDs. When you look at the history (past_intervals), it needs to backfill from the down OSDs (down_osds_we_would_probe). Since it's more than 1, it can't proceed. You need to get the OSDs up. Best regards, = Frank Schilder AIT Risø
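
A minimal sketch of where to look in the PG query output (the PG id is a placeholder; field names follow the standard query format):

    ceph pg 2.1a query | grep -A5 down_osds_we_would_probe
    ceph pg 2.1a query | grep blocked_by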

[ceph-users] Re: MDS pinning: ceph.dir.pin: No such attribute

2021-03-15 Thread Frank Schilder
ile from source? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick Donnelly Sent: 15 March 2021 18:43:36 To: Jeff Layton Cc: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS pinnin

[ceph-users] MDS pinning: ceph.dir.pin: No such attribute

2021-03-15 Thread Frank Schilder
When I try this, I get the following: # setfattr -n ceph.dir.pin -v 1 . # getfattr -n ceph.dir.pin . .: ceph.dir.pin: No such attribute Is this expected? How can I check that pinning is in effect as intended? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109
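
A hedged sketch of setting a pin and verifying it on the MDS rather than through the client xattr, which some clients do not report back (path and MDS name are placeholders):

    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir
    # on the active MDS host: dump the subtree map and look for the pinned path
    ceph daemon mds.<name> get subtrees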

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-13 Thread Frank Schilder
Sorry if anyone gets this twice. It didn't make it to the list. -- Frank From: Frank Schilder Sent: 12 March 2021 13:48 To: Chris Dunlop Cc: ceph-users@ceph.io; Wissem MIMOUNA Subject: Re: [ceph-users] OSD id 241 != my id 248: conversion from "ceph

[ceph-users] Re: Removing secondary data pool from mds

2021-03-13 Thread Frank Schilder
luster is not health_ok. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 12 March 2021 22:29:48 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Removing secondary data pool from m

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-12 Thread Frank Schilder
te.py:190-206 would look something like this uuid_map = { 'journal': osd_metadata.get('journal', {}).get('uuid'), 'block': osd_metadata.get('block', {}).get('uuid'), 'block.db': osd_metadata.get('block.db', {}).get('uuid'), 'block.wal': osd_metadata.

[ceph-users] Can FS snapshots cause factor 3 performance loss?

2021-03-11 Thread Frank Schilder
2021-03-07_000611+0100_daily 2021-03-10_000612+0100_daily 2021-03-04_000611+0100_daily 2021-03-08_000611+0100_daily 2021-03-11_000612+0100_daily 2021-03-05_000611+0100_daily 2021-03-08_000911+0100_weekly Many thanks for any pointers and best regards, = Frank Schilder AIT

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-11 Thread Frank Schilder
ceph-volume simple scan" will produce a corrupted meta data file. This will also happen if you move a converted OSD to another host and try to scan+start it. The change of the symbolic link to an unstable device path is a critical bug and I don't even understand why it happens in the first place. Th

[ceph-users] Re: Monitor leveldb growing without bound v14.2.16

2021-03-03 Thread Frank Schilder
Slow mon sync can be caused by too large mon_sync_max_payload_size. The default is usually way too high. I had sync problems until I set mon_sync_max_payload_size = 4096 Since then mon sync is not an issue any more. Best regards, = Frank Schilder AIT Risø Campus Bygning 109
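
A minimal sketch of applying the setting described here (the value is taken from the message; whether it goes into the config database or ceph.conf is a deployment choice):

    ceph config set mon mon_sync_max_payload_size 4096
    # or in ceph.conf on the MON hosts:
    # [mon]
    # mon_sync_max_payload_size = 4096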

[ceph-users] OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-02 Thread Frank Schilder
rsion" device names are unstable. I have now a cluster that I cannot reboot servers on due to this problem. OSDs randomly re-assigned devices will refuse to start with: 2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248 Please help me with getting out of this mess. Thanks and

[ceph-users] Re: reboot breaks OSDs converted from ceph-disk to ceph-volume simple

2021-03-02 Thread Frank Schilder
, "kv_backend": "rocksdb", "magic": "ceph osd volume v026", "mkfs_done": "yes", "none": "", "ready": "ready", "require_osd_release": "", "type":

[ceph-users] reboot breaks OSDs converted from ceph-disk to ceph-volume simple

2021-03-02 Thread Frank Schilder
8 Oct 16 2019 kv_backend -rw-r--r--. 1 ceph ceph 21 Oct 16 2019 magic -rw-r--r--. 1 ceph disk 4 Oct 16 2019 mkfs_done -rw-r--r--. 1 ceph ceph 0 Nov 23 14:58 none -rw-r--r--. 1 ceph disk 6 Oct 16 2019 ready -rw-r--r--. 1 ceph disk 2 Jan 31 2020 require_osd_release -rw-r--r--. 1 c

[ceph-users] Re: Slow cluster / misplaced objects - Ceph 15.2.9

2021-02-26 Thread Frank Schilder
protection, it is called "link dampening". Our switches are Dell OS9, s4048 and z9100. Might be worth checking if your switches support something like that as well. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: D

[ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy!

2021-02-25 Thread Frank Schilder
above https://docs.ceph.com/en/latest/install/manual-deployment/#bluestore with the additional simplification of using the "lvm batch" sub-command. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Burkhard L
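
A hedged sketch of the simplified deployment via the lvm batch sub-command mentioned above (device names are placeholders; --report only prints the plan without applying it):

    ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd
    ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd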

[ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy!

2021-02-25 Thread Frank Schilder
-i Make sure no daemon is running at the time. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: duluxoz Sent: 25 February 2021 03:53:10 To: ceph-users@ceph.io Subject: [ceph-users] Re: Newbie Requesting Help - Ple
