[ceph-users] Re: Snap trimming best practice

2023-01-11 Thread Frank Schilder
. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Szabo, Istvan (Agoda) Sent: 11 January 2023 09:06:51 To: Ceph Users Subject: [ceph-users] Snap trimming best practice Hi, Wonder have you ever faced issue
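[The knobs usually discussed for throttling snap trimming are the OSD sleep and concurrency options; a rough sketch only, option names per the OSD config reference, values are just examples:]
    ceph config set osd osd_snap_trim_sleep_hdd 2             # pause (seconds) between trim ops on HDD OSDs
    ceph config set osd osd_pg_max_concurrent_snap_trims 1    # limit parallel snap trims per PG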

[ceph-users] Re: OSD crash on Onode::put

2023-01-11 Thread Frank Schilder
https://www.spinics.net/lists/ceph-users/msg73231.html Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dongdong Tao Sent: 11 January 2023 04:30:14 To: Frank Schilder Cc: Igor Fedotov; ceph-users@ceph.io; cobanser..

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Frank Schilder
-incidence. Are there any specific conditions for this problem to be present or amplified that could have to do with hardware? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Frank Schilder
consider restarting an OSD? What values of the above variables are critical and what are tolerable? Of course a proper fix would be better, but I doubt that everyone is willing to apply a patch. Therefore, some guidance on how to mitigate this problem to acceptable levels might be useful. I'm thin
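[A hedged way to watch the onode counters the thread refers to, via the admin socket; the OSD id is a placeholder and the restart is only the mitigation discussed above, not a fix:]
    # on the OSD host: inspect bluestore mempool counters (look at bluestore_cache_onode)
    ceph daemon osd.<ID> dump_mempools
    # if the counters look degenerate, restart the OSD (systemd-managed, non-cephadm host assumed)
    systemctl restart ceph-osd@<ID>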

[ceph-users] Re: Erasing Disk to the initial state

2023-01-09 Thread Frank Schilder
You need to stop all daemons, remove the mon stores and wipe the OSDs with ceph-volume. Find out which OSDs were running on which host (ceph-volume inventory DEVICE) and use ceph-volume lvm zap --destroy --osd-id ID on these hosts. Best regards, = Frank Schilder AIT Risø
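[A minimal sketch of that procedure; device path and OSD id are placeholders, and the zap destroys data:]
    systemctl stop ceph.target                     # stop all ceph daemons on the host
    ceph-volume inventory /dev/sdX                 # find out which OSD id lives on the device
    ceph-volume lvm zap --destroy --osd-id <ID>    # wipe the LVs/partitions belonging to that OSD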

[ceph-users] Re: docs.ceph.com -- Do you use the header navigation bar? (RESPONSES REQUESTED)

2023-01-09 Thread Frank Schilder
a user. What exactly is "header navigation" expected to do if it contains nothing else? Unless I'm looking at the wrong thing (I can't see the attached image), this header can be removed. The "edit on github" link can be added to the end of a page. Best regards,

[ceph-users] Re: increasing number of (deep) scrubs

2023-01-09 Thread Frank Schilder
a way beyond bumping osd_max_scrubs to increase the number of scheduled and executed deep scrubs. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 05 January 2023 15:36 To: Frank Schilder Cc
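[For reference, the options usually involved when trying to raise scrub throughput; a sketch with example values only (Octopus-era option names):]
    ceph config set osd osd_max_scrubs 2                  # concurrent (deep) scrubs per OSD
    ceph config set osd osd_scrub_load_threshold 5        # allow scrubs under higher host load
    ceph config set osd osd_deep_scrub_interval 1209600   # seconds between deep scrubs (2 weeks)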

[ceph-users] Re: mon scrub error (scrub mismatch)

2023-01-09 Thread Frank Schilder
. It would also be nice to have a command like "ceph mon repair" or "ceph mon resync" instead of having to do a complete manual daemon rebuild. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan va

[ceph-users] Re: mon scrub error (scrub mismatch)

2023-01-03 Thread Frank Schilder
be an explanation. Regards, Eugen Zitat von Frank Schilder : > Hi all, > > we have these messages in our logs daily: > > 1/3/23 12:20:00 PM[INF]overall HEALTH_OK > 1/3/23 12:19:46 PM[ERR] mon.2 ScrubResult(keys > {auth=77,config=2,health=11,logm=10} crc > {auth=688385498,

[ceph-users] mon scrub error (scrub mismatch)

2023-01-03 Thread Frank Schilder
, google wasn't of too much help. Is this scrub error something to worry about? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email

[ceph-users] increasing number of (deep) scrubs

2023-01-03 Thread Frank Schilder
would have the desired effect? Are there other parameters to look at that allow gradual changes in the number of scrubs going on? Thanks a lot for your help! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list

[ceph-users] Re: libceph: osdXXX up/down all the time

2022-12-21 Thread Frank Schilder
Hi Eugen, thanks! I think this explains our observation. Thanks and merry Christmas! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 21 December 2022 14:03:06 To: ceph-users@ceph.io Subject: [ceph-users] Re

[ceph-users] libceph: osdXXX up/down all the time

2022-12-21 Thread Frank Schilder
suspicious and we wonder if it has anything to do with the ceph client/fs. The cluster has been healthy the whole time. Best regards and thanks for pointers! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list

[ceph-users] Is there a bug in backfill scheduling?

2022-12-17 Thread Frank Schilder
578 active+clean 6339 active+remapped+backfill_wait 142 active+remapped+backfilling 6 active+clean+snaptrim io: client: 32 MiB/s rd, 247 MiB/s wr, 1.10k op/s rd, 1.57k op/s wr recovery: 4.2 GiB/s, 1.56k objects/s ===== Frank Sc

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Frank Schilder
Hi Martin, I can't find the output of ceph osd df tree ceph status anywhere. I thought you posted it, but well. Could you please post the output of these commands? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: ceph-volume inventory reports available devices as unavailable

2022-12-14 Thread Frank Schilder
consider the disk as "available". Shame that the deployment tools are so inconsistent. It would be much easier to repair things if there was an easy way to query what is possible, how much space on a drive could be used and for what, etc. Best regards, = Frank Schil

[ceph-users] Re: New pool created with 2048 pg_num not executed

2022-12-14 Thread Frank Schilder
splitting will stop if recovery IO is going on (some objects are degraded). Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Martin Buss Sent: 14 December 2022 19:32 To: ceph-users@ceph.io Subject: [ceph-users] Re: New

[ceph-users] Re: Increase the recovery throughput

2022-12-12 Thread Frank Schilder
t versions. So, back to Eugen's answer: go through this list and try solutions of earlier cases. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Monish Selvaraj Sent: 12 December 2022 11:32:26 To: Eugen Bloc

[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-12-06 Thread Frank Schilder
the outliers only. What I would not recommend is to go all balanced and 95% OSD utilisation. You will see serious performance loss after some OSDs reached 80% and if you lose an OSD or host you will have to combat the fallout of deleted upmaps. Best regards, = Frank Schilder AIT Risø

[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-12-05 Thread Frank Schilder
the pro of being fairly stable under OSD failures/additions at the expense of a few % less capacity. Maybe someone else can help here? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Matt Larson Sent: 04 December

[ceph-users] Re: proxmox hyperconverged pg calculations in ceph pacific, pve 7.2

2022-12-02 Thread Frank Schilder
to this confusion. Both dual-uses are legacy and very hard to clean up in the docs. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Rainer Krienke Sent: 02 December 2022 12:44:26 To: ceph-users@ceph.io Sub

[ceph-users] Re: PGs stuck down

2022-11-30 Thread Frank Schilder
or more physically separated possibilities for network routing that will never go down simultaneously. If just the network link between OSDs on both sides goes down, access will be down. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: MDS stuck ops

2022-11-30 Thread Frank Schilder
patience and explanations! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Venky Shankar Sent: 30 November 2022 07:45 To: Frank Schilder Cc: Reed Dier; ceph-users; Patrick Donnelly Subject: Re: [ceph-users] Re: MDS stuck ops Hi Fran

[ceph-users] Re: Implications of pglog_hardlimit

2022-11-29 Thread Frank Schilder
-ram-growth/ Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Gregory Farnum Sent: 29 November 2022 22:25:54 To: Joshua Timmer Cc: ceph-users@ceph.io Subject: [ceph-users] Re: Implications of pglog_hardlimit On Tue, Nov

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-29 Thread Frank Schilder
subtree partitioning policies". OK, I will try this out, I can restore manual pins without problems. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick Donnelly Sent: 29 November 2022 18:08:56

[ceph-users] Re: MDS stuck ops

2022-11-29 Thread Frank Schilder
? Thanks a lot for your time again! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Venky Shankar Sent: 29 November 2022 15:54:12 To: Frank Schilder Cc: Reed Dier; ceph-users Subject: Re: [ceph-users] Re: MDS stuck ops Hi Frank, On Tue, Nov

[ceph-users] Re: MDS stuck ops

2022-11-29 Thread Frank Schilder
Does the implementation not match the documentation? Thanks for any insight! Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Venky Shankar Sent: 29 November 2022 10:09:21 To: Frank Schilder Cc: Reed Dier; ceph-user

[ceph-users] Re: PGs stuck down

2022-11-29 Thread Frank Schilder
the DCs? Without stretch mode you need 3 DCs and a geo-replicated 3(2) pool. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Wolfpaw - Dale Corse Sent: 29 November 2022 07:20:20 To: 'ceph-users' Subject: [ceph-user

[ceph-users] Re: MDS stuck ops

2022-11-29 Thread Frank Schilder
g by hand and it solved all sorts of problems. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Frank Schilder
. If I fail an MDS, it's only 1/8th of users noticing (except maybe rank 0). The fail-over is usually fast enough that I don't get complaints. We have ca. 1700 kernel clients, it takes a few minutes for the new MDS to become active. Best regards, = Frank Schilder AIT Risø Campus

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Frank Schilder
boosted. We are also on octopus. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Reed Dier Sent: 28 November 2022 19:14:55 To: Venky Shankar Cc: ceph-users Subject: [ceph-users] Re: MDS stuck ops Hi Venk

[ceph-users] Re: ceph-volume lvm zap destroyes up+in OSD

2022-11-28 Thread Frank Schilder
Thanks, also for finding the related tracker issue! It looks like a fix has already been approved. Hope it shows up in the next release. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 28

[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-11-27 Thread Frank Schilder
when adding a new host. That's a stable situation from an operations point of view. Hope that helps. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Matt Larson Sent: 26 November 2022 21:07:41 To: ceph-users

[ceph-users] Re: ceph-volume lvm zap destroyes up+in OSD

2022-11-23 Thread Frank Schilder
t, I think it is worth a ticket. Since I can't test on versions higher than octopus yet, could you then open the ticket? Thanks! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 23 November 2022 09:27:22 To: ceph-use

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-23 Thread Frank Schilder
-level sub-dir distributed over 10K sub-trees, which really didn't help performance at all. If anyone has the dynamic balancer in action, intentionally or not, it might be worth trying to pin everything up to a depth of 2-3 in the FS tree. Best regards, ===== Frank Schilder AIT R

[ceph-users] ceph-volume lvm zap destroyes up+in OSD

2022-11-22 Thread Frank Schilder
there is an unofficial recovery procedure somewhere). I would prefer that ceph-volume lvm zap employs the same strict sanity checks as other ceph-commands to avoid accidents. In my case it was a typo, one wrong letter. Best regards, ===== Frank Schilder AIT Ri

[ceph-users] Re: backfilling kills rbd performance

2022-11-20 Thread Frank Schilder
to wpq or look at high-client IO profiles. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: martin.kon...@konsec.com on behalf of Konold, Martin Sent: 19 November 2022 18:06:54 To: ceph-users@ceph.io Subject: [ceph

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Frank Schilder
of the documentation can I trust? If it is implemented, I would like to get it working - if this is possible at all. Would you still take a look at the data? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Frank Schilder
cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 1070 } } I will collect the other output you ask for and send it to you privately. Unless you state otherwise, I will attach a gz-file to an e-mail. Thanks for your help! Best regards, = Frank Schilder AIT Risø Campus Bygni

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Frank Schilder
gh, I also need to know how the correct output should look like. I would be grateful if you could provide this additional information. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Patrick Donnelly Sent:

[ceph-users] Re: LVM osds loose connection to disk

2022-11-18 Thread Frank Schilder
. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 18 November 2022 10:43:12 To: Frank Schilder Cc: Igor Fedotov; ceph-users@ceph.io Subject: Re: [ceph-users] Re: LVM osds loose

[ceph-users] Re: LVM osds loose connection to disk

2022-11-17 Thread Frank Schilder
possible to reproduce a realistic ceph-osd IO pattern for testing. Is there any tool available for this? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 14 November 2022 13:03:58 To: Igor Fedotov

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-17 Thread Frank Schilder
be deeper in the directory tree, which in turn should be pinned to a rank and not move. That's why I would really like to know what directories are moved around. Thanks and best regards! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From

[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-17 Thread Frank Schilder
dropcaches on client nodes after job completion, so there is potential for reloading data)? Thanks a lot! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick Donnelly Sent: 16 November 2022 22:50:22 To: Frank Schilder Cc: ceph

[ceph-users] MDS internal op exportdir despite ephemeral pinning

2022-11-16 Thread Frank Schilder
misunderstanding the warning? What is happening here and why are these ops there? Does this point to a config problem? Thanks for any explanations! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph

[ceph-users] Re: OSDs down after reweight

2022-11-15 Thread Frank Schilder
, then effective-weight = crush-weight * reweight, but it is clearly not implemented this way. Please take a look at the specific re-mapping examples on a test cluster I posted with effective-weights=0.5*1 and 1*0.5. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109
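[For reference, the two weights under discussion and where to see them; a sketch with placeholder ids and example values:]
    ceph osd crush reweight osd.<ID> 0.5   # change the CRUSH weight in the map
    ceph osd reweight osd.<ID> 0.5         # set the 0..1 override reweight applied on top of it
    ceph osd df tree                       # the WEIGHT and REWEIGHT columns show both values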

[ceph-users] Re: OSDs down after reweight

2022-11-15 Thread Frank Schilder
when using reweight. And this should not happen, it smells like a really bad bug. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Etienne Menguy Sent: 15 November 2022 10:45:19 To: Frank Schilder; ceph-users@ceph.io S

[ceph-users] Re: OSDs down after reweight

2022-11-15 Thread Frank Schilder
to the documentation, I would expect identical mappings in all 3 cases. Can someone help me out here? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 15 November 2022 10:09:10 To: ceph-users@ceph.io Subject

[ceph-users] OSDs down after reweight

2022-11-15 Thread Frank Schilder
mappings change if the relative weight of all OSDs to each other stays the same (the probabilities of picking an OSD are unchanged over all OSDs)? Thanks for any hints. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: LVM osds loose connection to disk

2022-11-14 Thread Frank Schilder
for how many log-entries are created per second with these settings for tuning log_max_recent? Thanks for your help! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 11 November 2022 10:25:17 To: Igor Fedotov

[ceph-users] Re: LVM osds loose connection to disk

2022-11-11 Thread Frank Schilder
2 are doing? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 10 November 2022 15:48:23 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: LVM osds loose connection to disk Hi

[ceph-users] Re: ceph df reporting incorrect used space after pg reduction

2022-11-11 Thread Frank Schilder
196 MiB 0 25 TiB I do not even believe that stored is correct everywhere, the numbers are very different in the other form of report. This is really irritating. I think you should file a bug report. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: LVM osds loose connection to disk

2022-11-10 Thread Frank Schilder
t, I would like to avoid hunting ghosts. Many thanks and best regards! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: 10 October 2022 23:33:32 To: Igor Fedotov; ceph-users@ceph.io Subject: [ceph-users] Re: LVM

[ceph-users] Re: How to force PG merging in one step?

2022-11-10 Thread Frank Schilder
Hi Eugen, I created https://tracker.ceph.com/issues/58002 Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 03 November 2022 11:41 To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph

[ceph-users] Large strange flip in storage accounting

2022-11-09 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: TOO_MANY_PGS after upgrade from Nautilus to Octupus

2022-11-08 Thread Frank Schilder
to increase the PG count on some pools. Apart from that, you should always use the full PG capacity that your cluster can afford, it will not only speed up so many things, it will also improve resiliency and all-to-all recovery. Best regards, = Frank Schilder AIT Risø Campus Bygning

[ceph-users] Re: How to manuall take down an osd

2022-11-07 Thread Frank Schilder
Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Rainer Krienke Sent: 07 November 2022 09:20:44 To: ceph-users@ceph.io Subject: [ceph-users] How to manuall take down an osd Hi, today morning I had osd.77 in my ceph nautil

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Yes, it will. The PG never had the last copy, which needs to be build for the first time. Just wait for it to finish. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nicola Mori Sent: 03 November 2022 13:37:30

[ceph-users] Re: Strange 50K slow ops incident

2022-11-03 Thread Frank Schilder
Hi Szabo, its a switch-local network shared with an HPC cluster with spine-leaf topology. The storage nodes sit on leafs and the leafs all connect to the same spine. Everything with duplicated hardware and LACP bonding. Best regards, = Frank Schilder AIT Risø Campus Bygning 109

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Ah, no. Just set it to 250 as well. I think choose_total_tries is the overall max, using set_choose_tries higher than choose_total_tries has no effect. In my case, the bad mapping was already resolved with both=51, but your case looks a bit more serious. Best regards, = Frank
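[A sketch of how such tunables are typically changed by editing the CRUSH map; the value 250 follows the thread, rule id and replica count are placeholders, and testing before injecting is advisable:]
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # in crush.txt: set "tunable choose_total_tries 250" and, inside the rule,
    # add "step set_choose_tries 250" before the choose/chooseleaf steps
    crushtool -c crush.txt -o crush.new
    crushtool -i crush.new --test --show-bad-mappings --rule <ID> --num-rep <N>   # sanity check
    ceph osd setcrushmap -i crush.new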

[ceph-users] Strange 50K slow ops incident

2022-11-03 Thread Frank Schilder
suspect that it was internal communication going bonkers. Since the impact is quite high it would be nice to have a pointer as to what might have happened. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mail

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
/issues/57348 contains examples of what the output looks like. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nicola Mori Sent: 03 November 2022 10:57:03 To: ceph-users Subject: [ceph-users] Re: Missing OSD in up set

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
The default for choose_total_tries was 50 in my case and way too small. It will get better once you have more host buckets to choose OSDs from. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nicola Mori Sent: 03

[ceph-users] Re: Missing OSD in up set

2022-11-02 Thread Frank Schilder
Hi Nicola, might be https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon or https://tracker.ceph.com/issues/57348. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From

[ceph-users] Re: How to force PG merging in one step?

2022-11-02 Thread Frank Schilder
influence the average much. I was always wondering how users ended up with more than 1000 PGs per OSD by accident during recovery. It now makes more sense. If there is no per-OSD warning, this can easily happen. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Frank Schilder
much as possible out of the "does this really work in all corner cases" equation and rather rely on "I did this 100 times in the past without a problem" situations. That users may have to repeat a task is not a problem. Damaging the file system itself, on the other hand, is. Thanks an

[ceph-users] ceph status does not report IO any more

2022-10-25 Thread Frank Schilder
18399 pgs objects: 1.41G objects, 2.5 PiB usage: 3.2 PiB used, 8.3 PiB / 12 PiB avail pgs: 18378 active+clean 20 active+clean+scrubbing+deep 1 active+clean+scrubbing Any idea what the problem could be? Thanks and best regards. ===== Fran

[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Frank Schilder
number of parameters to ensure that the remaining sub-cluster continues to operate as normal as possible, for example, handles OSD fails in the usual way despite 90% of OSDs being down already. Thanks for your input and best regards, ===== Frank Schilder AIT Risø Campus

[ceph-users] Re: Getting started with cephfs-top, how to install

2022-10-20 Thread Frank Schilder
who can point me to an installation procedure? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Zach Heise (SSCC) Sent: 19 October 2022 21:25:14 To: Neeraj Pratap Singh; ceph-users@ceph.io Subject: [ceph

[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-19 Thread Frank Schilder
too long. I need a fast (unclean yet recoverable) procedure. Maybe data in flight gets lost, but the FS itself must come up healthy again. Any hints on how to do this? Also for the MON store log size problem? Thanks and best regards, ===== Frank Schilder AIT Ri

[ceph-users] Temporary shutdown of subcluster and cephfs

2022-10-19 Thread Frank Schilder
Thanks for any hints/corrections/confirmations! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Slow OSD heartbeats message

2022-10-18 Thread Frank Schilder
. It would be great if this message could be improved in this way. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le

[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Frank Schilder
A disk may be failing without smartctl or other tools showing anything. Does it have remapped sectors? I would just throw the disk out and get a new one. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michel
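[A quick, hedged way to check for remapped or pending sectors with smartmontools; the device path is a placeholder and attribute names vary per vendor:]
    smartctl -H /dev/sdX                                              # overall health verdict
    smartctl -a /dev/sdX | egrep -i 'reallocat|pending|uncorrect'     # remapped/pending sector counters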

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-10-15 Thread Frank Schilder
"OSD crashes during upgrade mimic->octopus"). The 300G OSDs on our test cluster worked fine. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Tyler Stachecki Sent: 27 September 2022 02:00 To: Marc Cc: F

[ceph-users] Re: strange OSD status when rebooting one server

2022-10-15 Thread Frank Schilder
fewer objects misplaced after replacement. It's more work, but also faster recovery. If you continue to replace hosts and give them new host names, you should remove the old ones. At some point these buckets might interfere with mappings in unexpected ways. Best regards, ===== Frank Schil

[ceph-users] Re: strange OSD status when rebooting one server

2022-10-14 Thread Frank Schilder
are they there in the first place? Are you planning to add hosts or are these replaced ones? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Matthew Darwin Sent: 14 October 2022 18:57:37 To: c...@elchaka.de; ceph-users@ceph.io Subject

[ceph-users] Re: pg repair doesn't start

2022-10-13 Thread Frank Schilder
active+clean+scrubbing 1 active+clean+scrubbing+deep+inconsistent+repair io: client: 444 MiB/s rd, 446 MiB/s wr, 2.19k op/s rd, 2.34k op/s wr recovery: 0 B/s, 223 objects/s Yay! Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] pg repair doesn't start

2022-10-13 Thread Frank Schilder
RUB_ERRORS) 2022-10-11T19:26:24.246215+0200 mon.ceph-01 (mon.0) 633487 : cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED) Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 __
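[For reference, the usual repair workflow; the PG id is a placeholder. Repairs are scheduled like deep scrubs, so they can wait for a free scrub slot on the involved OSDs:]
    rados list-inconsistent-obj <pgid> --format=json-pretty   # inspect what is inconsistent
    ceph pg repair <pgid>                                     # queue the repair
    ceph config set osd osd_max_scrubs 2                      # temporarily, so the repair gets a slot sooner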

[ceph-users] Re: MDS Performance and PG/PGP value

2022-10-13 Thread Frank Schilder
but, well, it might locate something. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: 13 October 2022 13:56:45 To: Yoann Moulin; Patrick Donnelly Cc: ceph-users@ceph.io Subject: [ceph-users] Re: MDS

[ceph-users] Re: How to force PG merging in one step?

2022-10-12 Thread Frank Schilder
ng disables this warning. Recovery is the operation where exceeding a PG limit without knowing will hurt most. Thanks for the heads up. Probably need to watch my * a bit more with certain things. Best regards, ===== Frank Schilder AIT Risø Ca

[ceph-users] Re: Iinfinite backfill loop + number of pgp groups stuck at wrong value

2022-10-12 Thread Frank Schilder
problems with it. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Josh Baergen Sent: 07 October 2022 17:16:49 To: Nicola Mori Cc: ceph-users Subject: [ceph-users] Re: Iinfinite backfill loop + number of pgp groups

[ceph-users] Re: crush hierarchy backwards and upmaps ...

2022-10-12 Thread Frank Schilder
such activity any more. The issue tracker seems to have turned into a black hole. Do you know what the reason might be? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 11 October 2022 19

[ceph-users] Re: Invalid crush class

2022-10-12 Thread Frank Schilder
https://tracker.ceph.com/issues/45253 = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 08 October 2022 16:40:37 To: ceph-users@ceph.io Subject: [ceph-users] Invalid crush class In 15.2.7, how can I remove

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-12 Thread Frank Schilder
re pool. I'm done with getting the cluster up again and these disks are now almost empty. The problem seems to be that 100G OSDs are just a bit too small for octopus. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 Fro

[ceph-users] Re: How to force PG merging in one step?

2022-10-11 Thread Frank Schilder
it of disk life time. If I really need to reduce the impact of recovery IO I can set recovery_sleep. My personal opinion to the user group. Thanks for your help and have a nice evening! Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 _
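[The throttle mentioned above, as a sketch with example values only:]
    ceph config set osd osd_recovery_sleep_hdd 0.1    # seconds between recovery ops on HDD OSDs
    ceph config set osd osd_recovery_sleep_ssd 0.01   # and on flash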

[ceph-users] Re: LVM osds loose connection to disk

2022-10-10 Thread Frank Schilder
Hope this makes some sense when interpreting the logs. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Igor Fedotov Sent: 09 October 2022 22:07:16 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] LVM osds loose

[ceph-users] How to force PG merging in one step?

2022-10-09 Thread Frank Schilder
, we disabled autoscaler on all pools and also globally. Still, it interferes with admin commands in an unsolicited way. I would like the PG merge happen on the fly as the data moves to the new OSDs. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: How to check which directory has ephemeral pinning set?

2022-10-09 Thread Frank Schilder
/home/x/y/z. Thanks and good Sunday. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Milind Changire Sent: 09 October 2022 09:24:20 To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] How to check which directory has

[ceph-users] LVM osds loose connection to disk

2022-10-08 Thread Frank Schilder
recognised as down. Any hints on what to check if it happens again are also welcome. Many thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] How to check which directory has ephemeral pinning set?

2022-10-08 Thread Frank Schilder
What is the right way to confirm its working? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
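[A hedged sketch using the CephFS vxattrs and the MDS admin socket; mount point and MDS name are placeholders:]
    setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home   # enable distributed ephemeral pinning
    getfattr -n ceph.dir.pin.distributed /mnt/cephfs/home        # confirm the attribute is set
    ceph daemon mds.<name> get subtrees | less                   # check which ranks the subtrees are pinned to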

[ceph-users] Re: recurring stat mismatch on PG

2022-10-08 Thread Frank Schilder
will do a deep-scrub and report back. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 08 October 2022 11:18:37 To: Frank Schilder Cc: Ceph Users Subject: Re: [ceph-users] recurring stat

[ceph-users] Re: recurring stat mismatch on PG

2022-10-08 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 08 October 2022 11:03:05 To: Frank Schilder Cc: Ceph Users Subject: Re: [ceph-users] recurring stat mismatch on PG Hi, Is that 15.2.17? It reminds me of this bug - https

[ceph-users] recurring stat mismatch on PG

2022-10-08 Thread Frank Schilder
log_channel(cluster) log [ERR] : 19.1fff deep-scrub 1 errors This exact same mismatch was found before and I executed a pg-repair that fixed it. Now its back. Does anyone have an idea why this might be happening and how to deal with it? Thanks! = Frank Schilder AIT Risø Campus Bygning

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-07 Thread Frank Schilder
-bluestore-tool/. I guess I will stick with the tested command "repair". Nothing I found mentions what exactly is executed on start-up with bluestore_fsck_quick_fix_on_mount = true. Thanks for your quick answer! Best regards, ===== Frank Schilder AIT Risø Campus Bygning 10
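[For reference, the two conversion paths discussed; the OSD id is a placeholder and the daemon must be stopped for the off-line variant:]
    # off-line omap conversion of a stopped OSD
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<ID>
    # or let the conversion run at OSD start-up
    ceph config set osd bluestore_fsck_quick_fix_on_mount true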

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-07 Thread Frank Schilder
An off-line conversion is not mentioned. I know it has been posted before, but I seem unable to find it on this list. If someone could send me the command, I would be most grateful. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
Hi Igor, sorry for the extra e-mail. I forgot to ask: I'm interested in a tool to de-fragment the OSD. It doesn't look like the fsck command does that. Is there any such tool? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
help at this late hour! Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 07 October 2022 00:37:34 To: Frank Schilder; ceph-users@ceph.io Cc: Stefan Kooman Subject: Re: [ceph-users] OSD crashes during

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
d to lose more. The rebuild simply takes too long in the current situation. Thanks for your help and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 06 October 2022 17:03:53 To: Frank Schilder;

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
This includes FS maintenance, shut down and startup. Ceph fs clients should not crash on "ceph fs set XYZ down true", they should freeze. Etc. Its just the omap conversion that was postponed to post-upgrade as explained in [1], nothing else. Best regards, = Frank Schilder AI
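[The commands referred to above, for completeness; the file system name is a placeholder:]
    ceph fs set <fsname> down true    # flush journals and take all MDS ranks down; clients should freeze
    ceph status                       # wait until the ranks have stopped
    ceph fs set <fsname> down false   # bring the file system back up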

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Frank Schilder
worked great ... until the unconverted OSDs started crashing. Things are stable now since about an hour. I really hope nothing more crashes. Recovery will likely take more than 24 hours. A long way to go in such a fragile situation. Best regards, ===== Frank Schilder AIT Risø Campus Bygn
