[ceph-users] Re: CephFS: convert directory into subvolume

2023-08-24 Thread Eugen Block
) in the default subvolume group. So, in your case the actual path to the subvolume would be /mnt/volumes/_nogroup/subvol2/ On Tue, Aug 22, 2023 at 4:50 PM Eugen Block wrote: Hi, while writing a response to [1] I tried to convert an existing directory within a single cephfs into a subvolume
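For reference, the reported path can be cross-checked via the subvolume interface; a minimal sketch, assuming a filesystem named cephfs and the subvolume subvol2 in the default group:
  $ ceph fs subvolume ls cephfs                # list subvolumes in the default (_nogroup) group
  $ ceph fs subvolume getpath cephfs subvol2   # print the subvolume's path inside the filesystem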

[ceph-users] Re: Patch change for CephFS subvolume

2023-08-23 Thread Eugen Block
will recognize it (no extended attr needed); if you use a subvolumegroup name different from "_nogroup", you must provide it in all subvolume commands [--group_name <group_name>]. Regards, Anh Phan On Wed, Aug 23, 2023 at 6:51 PM Eugen Block wrote: Hi, I started a new thread [2] to not hijack yours.

[ceph-users] Re: Patch change for CephFS subvolume

2023-08-23 Thread Eugen Block
to a subvolume, but it also didn't appear in the list of set subvolumes. Perhaps it's no longer supported? Michal On 8/22/23 12:56, Eugen Block wrote: Hi, I don't know if there's a way to change the path (I assume not except creating a new path and copy the data), but you could set up

[ceph-users] Re: Client failing to respond to capability release

2023-08-23 Thread Eugen Block
debugging revealed that something was off. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Wednesday, August 23, 2023 8:55 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: Client failing

[ceph-users] Re: Client failing to respond to capability release

2023-08-23 Thread Eugen Block
Hi, pointing you to your own thread [1] ;-) [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/HFILR5NMUCEZH7TJSGSACPI4P23XTULI/ Zitat von Frank Schilder : Hi all, I have this warning the whole day already (octopus latest cluster): HEALTH_WARN 4 clients failing to

[ceph-users] CephFS: convert directory into subvolume

2023-08-22 Thread Eugen Block
Hi, while writing a response to [1] I tried to convert an existing directory within a single cephfs into a subvolume. According to [2] that should be possible, I'm just wondering how to confirm that it actually worked. Because setting the xattr works fine, the directory just doesn't show

[ceph-users] Re: Patch change for CephFS subvolume

2023-08-22 Thread Eugen Block
Hi, I don't know if there's a way to change the path (I assume not except creating a new path and copy the data), but you could set up a directory the "old school" way (mount the root filesystem, create your subdirectory tree) and then convert the directory into a subvolume by setting
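The conversion itself is a single extended attribute on the directory; a minimal sketch, assuming the filesystem root is mounted at /mnt/cephfs and the directory sits under /mnt/cephfs/volumes/_nogroup/:
  $ setfattr -n ceph.dir.subvolume -v 1 /mnt/cephfs/volumes/_nogroup/subvol2   # mark the directory as a subvolume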

[ceph-users] Re: [quincy] Migrating ceph cluster to new network, bind OSDs to multple public_nework

2023-08-22 Thread Eugen Block
the compute nodes. On Tue, 22 Aug 2023 at 09:17, Eugen Block wrote: You'll need to update the mon_host line as well. Not sure if it makes sense to have both old and new network in there, but I'd try on one host first and see if it works. Zitat von Boris Behrens : > We're work

[ceph-users] Re: Global recovery event but HEALTH_OK

2023-08-22 Thread Eugen Block
Hi, can you add 'ceph -s' output? Has the recovery finished and if not, do you see progress? Has the upgrade finished? You could try a 'ceph mgr fail'. Zitat von Alfredo Daniel Rezinovsky : I had a lot of movement in my cluster. Broken node, replacement, rebalancing. Now I'm stuck in

[ceph-users] Re: [quincy] Migrating ceph cluster to new network, bind OSDs to multple public_nework

2023-08-22 Thread Eugen Block
basic osd_mclock_max_capacity_iops_ssd 15333.697366 On Mon, 21 Aug 2023 at 14:20, Eugen Block wrote: Hi, > I don't have those configs. The cluster is not maintained via cephadm / > orchestrator. I just assumed that with Quincy it already would be managed by cephadm. S

[ceph-users] Re: [quincy] Migrating ceph cluster to new network, bind OSDs to multple public_nework

2023-08-21 Thread Eugen Block
Eugen Block : Hi, there have been a couple of threads wrt network change, simply restarting OSDs is not sufficient. I still haven't had to do it myself, but did you 'ceph orch reconfig osd' after adding the second public network, then restart them? I'm not sure if the orchestrator works

[ceph-users] Re: [quincy] Migrating ceph cluster to new network, bind OSDs to multple public_nework

2023-08-21 Thread Eugen Block
Hi, there have been a couple of threads wrt network change, simply restarting OSDs is not sufficient. I still haven't had to do it myself, but did you 'ceph orch reconfig osd' after adding the second public network, then restart them? I'm not sure if the orchestrator works as expected
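For context, a rough sequence under cephadm could look like this (networks and OSD id are placeholders):
  $ ceph config set global public_network "192.168.1.0/24,10.10.10.0/24"   # keep old and new network during migration
  $ ceph orch reconfig osd                                                  # regenerate the OSDs' config files
  $ ceph orch daemon restart osd.0                                          # repeat per daemon, or restart the whole service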

[ceph-users] Re: EC pool degrades when adding device-class to crush rule

2023-08-21 Thread Eugen Block
Hi, I tried to find an older thread that explained this quite well, maybe my google foo left me... Anyway, the docs [1] explain the "degraded" state of a PG: When a client writes an object to the primary OSD, the primary OSD is responsible for writing the replicas to the replica OSDs.

[ceph-users] Re: OSD delete vs destroy vs purge

2023-08-21 Thread Eugen Block
Yeah, that's basically it, also taking into account Anthony's response, of course. Zitat von Nicola Mori : Thanks Eugen for the explanation. To summarize what I understood: - delete from GUI simply does a drain+destroy; - destroy will preserve the OSD id so that it will be used by the next

[ceph-users] Re: Degraded FS on 18.2.0 - two monitors per host????

2023-08-18 Thread Eugen Block
Hi, your subject is "...two monitors per host" but I guess you're asking for MDS daemons per host. ;-) What's the output of 'ceph orch ls mds --export'? You're using 3 active MDS daemons, maybe you set "count_per_host: 2" to have enough standby daemons? I don't think an upgrade would
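For illustration, a spec with two MDS daemons per host might look like this (service id and hosts are placeholders), applied with 'ceph orch apply -i mds.yaml':
  service_type: mds
  service_id: cephfs
  placement:
    count_per_host: 2
    hosts:
    - host1
    - host2
    - host3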

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Eugen Block
! Eugen Zitat von Robert Sander : On 8/16/23 12:10, Eugen Block wrote: I don't really have a good idea right now, but there was a thread [1] about ssh sessions that are not removed, maybe that could have such an impact? And if you crank up the debug level to 30, do you see anything else

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Eugen Block
I don't really have a good idea right now, but there was a thread [1] about ssh sessions that are not removed, maybe that could have such an impact? And if you crank up the debug level to 30, do you see anything else? ceph config set mgr debug_mgr 30 [1]

[ceph-users] Re: Cephadm adoption - service reconfiguration changes container image

2023-08-16 Thread Eugen Block
That would have been my suggestion as well, set your own container image and override the default. Just one comment, the config option is "container_image" and not "container", that one fails: $ ceph config set global container my-registry:5000/ceph/ceph:16.2.9 Error EINVAL: unrecognized
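The working variant of the command from above would then be:
  $ ceph config set global container_image my-registry:5000/ceph/ceph:16.2.9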

[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Eugen Block
Hi, literally minutes before your email popped up in my inbox I had announced that I would upgrade our cluster from 16.2.10 to 16.2.13 tomorrow. Now I'm hesitating. ;-) I guess I would start looking on the nodes where it failed to upgrade OSDs and check out the cephadm.log as well as

[ceph-users] Re: radosgw-admin command hangs out ,many hours

2023-08-14 Thread Eugen Block
Hi, after you deployed the RGW service, have all the pools been created (automatically)? Can you share the output of: ceph -s ceph osd pool ls overlays: unrecognized mount option "volatile" or missing value I don't think that's the issue here. Zitat von nguyenvand...@baoviet.com.vn:

[ceph-users] Re: Puzzle re 'ceph: mds0 session blocklisted"

2023-08-12 Thread Eugen Block
Hi, just a thought: Maybe that message is just telling you that the previous session has been blocklisted during the client reboot. MDS clients are frequently requested to free up their caps etc., if they don't do that within the defined interval (don't know by heart) the client session

[ceph-users] Re: Lots of space allocated in completely empty OSDs

2023-08-12 Thread Eugen Block
Hi, I can't seem to find the threads I was looking for, this has been discussed before. Anyway, IIRC it could be a MGR issue which fails to update the stats. Maybe a MGR failover clears things up? If that doesn't help I would try a compaction on one OSD and see if the stats are corrected
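A minimal sketch of both suggestions (the OSD id is a placeholder):
  $ ceph mgr fail              # force a MGR failover so a standby takes over and refreshes the stats
  $ ceph tell osd.0 compact    # trigger an online compaction on one OSD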

[ceph-users] Re: CEPHADM_STRAY_DAEMON

2023-08-12 Thread Eugen Block
Hi, after you added the labels to the MONs, did the orchestrator (re)deploy MONs on the dedicated MON hosts? Are there now 5 MONs running? If the orchestrator didn't clean that up by itself (it can take up to 15 minutes, I believe) you can help it by removing a daemon manually [1]:
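Removing a leftover daemon manually would look roughly like this (the daemon name is a placeholder):
  $ ceph orch ps | grep mon                  # check which MON daemons the orchestrator knows about
  $ ceph orch daemon rm mon.oldhost --force  # remove the stray one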

[ceph-users] Re: Can ceph-volume manage the LVs optionally used for DB / WAL at all?

2023-08-11 Thread Eugen Block
Hi, if you deploy OSDs from scratch you don't have to create LVs manually, that is handled entirely by ceph-volume (for example on cephadm based clusters you only provide a drivegroup definition). I'm not sure if automating db/wal migration has been considered, it might be (too)

[ceph-users] Re: how to set load balance on multi active mds?

2023-08-10 Thread Eugen Block
subtree pinning. So we want to know if there is any config we can tune for the dynamic subtree pinning. Thanks again! Thanks, xz On 9 Aug 2023, at 17:40, Eugen Block wrote: Hi, you could benefit from directory pinning [1] or dynamic subtree pinning [2]. We had great results with manual pinning in an older

[ceph-users] Re: OSD delete vs destroy vs purge

2023-08-09 Thread Eugen Block
Hi, I'll try to summarize as far as I understand the process, please correct me if I'm wrong. - delete: drain and then delete (optionally keep OSD ID) - destroy: mark as destroyed (to re-use OSD ID) - purge: remove everything I would call the "delete" option in the dashboard as a "safe

[ceph-users] Re: how to set load balance on multi active mds?

2023-08-09 Thread Eugen Block
Hi, you could benefit from directory pinning [1] or dynamic subtree pinning [2]. We had great results with manual pinning in an older Nautilus cluster, didn't have a chance to test the dynamic subtree pinning yet though. It's difficult to tell in advance which option would suit best your
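For context, both variants are set via extended attributes on a directory (paths and rank are examples):
  $ setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/dir1               # pin this tree to MDS rank 0
  $ setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home   # ephemeral distributed pinning of the subdirectories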

[ceph-users] Re: 64k buckets for 1 user

2023-08-07 Thread Eugen Block
Hi, just last week there was a thread [1] about a large omap warning for a single user with 400k buckets. There's no resharding for that (but with 64k you would stay under the default 200k threshold), so that's the downside, I guess. I can't tell what other impacts that may have.

[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Eugen Block
I'm no programmer but if I understand [1] correctly it's an unsigned long long: int ImageCtx::snap_set(uint64_t in_snap_id) { which means the max snap_id should be this: 2^64 - 1 = 18446744073709551615 Not sure if you can get your cluster to reach that limit, but I also don't know what

[ceph-users] Re: unbalanced OSDs

2023-08-03 Thread Eugen Block
Turn off the autoscaler and increase pg_num to 512 or so (power of 2). The recommendation is to have between 100 and 150 PGs per OSD (incl. replicas). And then let the balancer handle the rest. What is the current balancer status (ceph balancer status)? Zitat von Spiros Papageorgiou : Hi
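A minimal sketch of those steps (the pool name is a placeholder):
  $ ceph osd pool set <pool> pg_autoscale_mode off   # turn off the autoscaler for this pool
  $ ceph osd pool set <pool> pg_num 512              # increase pg_num to a power of 2
  $ ceph balancer status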

[ceph-users] Re: ceph-volume lvm migrate error

2023-08-03 Thread Eugen Block
Check out the ownership of the newly created DB device, according to your output it belongs to the root user. In the osd.log you probably should see something related to "permission denied". If you change it to ceph:ceph the OSD might start properly. Zitat von Roland Giesler : Ouch, I

[ceph-users] Re: mgr services frequently crash on nodes 2,3,4

2023-08-03 Thread Eugen Block
Can you query those config options yourself? storage01:~ # ceph config get mgr mgr/dashboard/standby_behaviour storage01:~ # ceph config get mgr mgr/dashboard/AUDIT_API_ENABLED I'm not sure if those are responsible for the crash though. Zitat von "Adiga, Anantha" : Hi, Mgr service crash

[ceph-users] Re: Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

2023-08-02 Thread Eugen Block
It's all covered in the docs [1], one of the points I already mentioned (require-osd-release), you should have bluestore OSDs and converted them to ceph-volume before you can adopt them with cephadm (if you deployed your cluster pre-nautilus). [1]

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Eugen Block
: spec: data_devices: paths: - /dev/sdh - /dev/sdi - /dev/sdj - /dev/sdk - /dev/sdl db_devices: paths: - /dev/sdf filter_logic: AND objectstore: bluestore From: Eugen Block Sent: Wednesday, 2 August 2023 08:13

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Eugen Block
Do you really need device paths in your configuration? You could use other criteria like disk sizes, vendors, rotational flag etc. If you really want device paths you'll probably need to ensure they're persistent across reboots via udev rules. Zitat von Kilian Ries : Hi, it seems that

[ceph-users] Re: 1 Large omap object found

2023-08-02 Thread Eugen Block
c.  But, I think one of our guys mentioned that the cleanup might not be getting rid of buckets, only the files in them.  So, I may have to get our dev guys to revisit this and see if we can clean up a crapload of empty buckets. On Tue, 2023-08-01 at 08:37 +0000, Eugen Block wrote: Th

[ceph-users] Re: Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

2023-08-02 Thread Eugen Block
Hi, from Ceph perspective it's supported to upgrade from N to P, you can safely skip O. We have done that on several clusters without any issues. You just need to make sure that your upgrade to N was complete. Just a few days ago someone tried to upgrade from O to Q with

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Eugen Block
active+clean 13h 8092'56868 8093:4813791 [26,30,13]p26 [26,30,13]p26 2023-07-31T17:50:40.349450+ 2023-07-31T17:50:40.349450+ 311 periodic scrub scheduled @ 2023-08-02T04:39:41.913504+ On Tue, 2023-08-01 at 06:14 +, Eugen Block wrote: Yeah, regarding data

[ceph-users] Re: MDS nodes blocklisted

2023-08-01 Thread Eugen Block
You could add (debug) logs for starters ;-) There was a thread [1] describing something quite similar, pointing to this bug report [2]. In recent versions it's supposed to be fixed although I don't see the tracker or PR number in the release notes of both pacific and quincy. Can you verify

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Eugen Block
hdd 7.27739 1.0 7.3 TiB 1.1 TiB 1.1 TiB 1.1 GiB 8.4 GiB 6.2 TiB 14.99 0.94 19 up TOTAL 291 TiB 47 TiB 46 TiB 51 GiB 359 GiB 244 TiB 16.02 MIN/MAX VAR: 0.52/1.77 STDDEV: 4.56 On Mon, 2023-07-31 at 09:22 +, Eugen Block wrote: Hi, can you

[ceph-users] Re: 1 Large omap object found

2023-07-31 Thread Eugen Block
Hi, can you share some more details like 'ceph df' and 'ceph osd df'? I don't have too much advice yet, but to see all entries in your meta pool you need add the --all flag because those objects are stored in namespaces: rados -p default.rgw.meta ls --all That pool contains user and

[ceph-users] Re: MON sync time depends on outage duration

2023-07-28 Thread Eugen Block
omments. Thanks, Eugen Zitat von Josh Baergen : Out of curiosity, what is your require_osd_release set to? (ceph osd dump | grep require_osd_release) Josh On Tue, Jul 11, 2023 at 5:11 AM Eugen Block wrote: I'm not so sure anymore if that could really help here. The dump-keys output from

[ceph-users] Re: OSD stuck on booting state after upgrade (v15.2.17 -> v17.2.6)

2023-07-27 Thread Eugen Block
Can you paste 'ceph versions' output please? You state that you upgraded from octopus --> quincy but your require-osd-release is nautilus. Did you change that to octopus after the previous upgrade? It's not supported to skip more than one version (N --> P, O --> Q, but not N --> Q). Maybe it
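To check and, if necessary, correct this (the release name is an example):
  $ ceph versions
  $ ceph osd dump | grep require_osd_release
  $ ceph osd require-osd-release octopus   # only once all OSDs actually run octopus or newer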

[ceph-users] Re: Ceph 17.2.6 alert-manager receives error 500 from inactive MGR

2023-07-27 Thread Eugen Block
I think I see something similar on a Pacific cluster, the alertmanager doesn't seem to be aware of a mgr failover. One of the active alerts is CephMgrPrometheusModuleInactive stating: The mgr/prometheus module at storage04.fqdn:9283 is unreachable. ... Which is true because the active mgr

[ceph-users] Re: inactive PGs looking for a non existent OSD

2023-07-27 Thread Eugen Block
Hi, what exactly is your question? You seem to have made progress in bringing OSDs back up and reducing inactive PGs. What is unexpected to me is that one host failure would cause inactive PGs. Can you share more details about your osd tree and crush rules of the affected inactive PGs?

[ceph-users] Re: RGWs offline after upgrade to Nautilus

2023-07-26 Thread Eugen Block
Hi, apparently, my previous suggestions don't apply here (full OSDs or max_pgs_per_osd limit). Did you also check the rgw client keyrings? Did you also upgrade the operating system? Maybe some apparmor stuff? Can you set debug to 30 to see if there're more to see? Anything in the mon or

[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-26 Thread Eugen Block
I can provide some more details, these were the recovery steps taken so far, they started from here (I don't know the whole/exact story though): 70/868386704 objects unfound (0.000%) Reduced data availability: 8 pgs inactive, 8 pgs incomplete Possible data damage: 1 pg recovery_unfound

[ceph-users] Re: OSD tries (and fails) to scrub the same PGs over and over

2023-07-22 Thread Eugen Block
the cluster status? Is there recovery or backfilling going on? No. Everything is good except this PG is not getting scrubbed. Vlad On 7/21/23 01:41, Eugen Block wrote: Hi, what's the cluster status? Is there recovery or backfilling going on? Zitat von Vladimir Brik : I have a PG that hasn't

[ceph-users] Re: OSD tries (and fails) to scrub the same PGs over and over

2023-07-21 Thread Eugen Block
Hi, what's the cluster status? Is there recovery or backfilling going on? Zitat von Vladimir Brik : I have a PG that hasn't been scrubbed in over a month and not deep-scrubbed in over two months. I tried forcing with `ceph pg (deep-)scrub` but with no success. Looking at the logs of that

[ceph-users] Re: RGWs offline after upgrade to Nautilus

2023-07-21 Thread Eugen Block
Hi, a couple of threads with similar error messages all lead back to some sort of pool or osd issue. What is your current cluster status (ceph -s)? Do you have some full OSDs? Those can cause this initialization timeout as well as hit the max_pg_per_osd limit. So a few more cluster

[ceph-users] Re: replacing all disks in a stretch mode ceph cluster

2023-07-19 Thread Eugen Block
Hi, during cluster upgrades from L to N or later one had to rebuild OSDs which were originally deployed by ceph-disk switching to ceph-volume. We've done this on multiple clusters and redeployed one node by one. We did not drain the nodes beforehand because the EC resiliency configuration

[ceph-users] Re: MON sync time depends on outage duration

2023-07-12 Thread Eugen Block
? How would that work in a real cluster with multiple MONs? If I stop the first, clean up the mon db, then start it again, wouldn't it sync the keys from its peers? Not sure how that would work... Zitat von Eugen Block : It was installed with Octopus and hasn't been upgraded yet

[ceph-users] Re: MON sync time depends on outage duration

2023-07-11 Thread Eugen Block
It was installed with Octopus and hasn't been upgraded yet: "require_osd_release": "octopus", Zitat von Josh Baergen : Out of curiosity, what is your require_osd_release set to? (ceph osd dump | grep require_osd_release) Josh On Tue, Jul 11, 2023 at 5:11 A

[ceph-users] Re: MON sync time depends on outage duration

2023-07-11 Thread Eugen Block
ect yet... Zitat von Dan van der Ster : Oh yes, sounds like purging the rbd trash will be the real fix here! Good luck! __ Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com On Mon, Jul 10, 2023 at 6:10 AM Eugen Block wrote:

[ceph-users] Re: RGW dynamic resharding blocks write ops

2023-07-11 Thread Eugen Block
resharding operation on bucket index detected, blocking Zitat von Eugen Block : We had a quite small window yesterday to debug, I found the error messages but we didn't collect the logs yet, I will ask them to do that on Monday. I *think* the error was something like this: reshardin

[ceph-users] Re: MON sync time depends on outage duration

2023-07-10 Thread Eugen Block
of snapshot tombstones (rbd mirroring snapshots in the trash namespace), maybe that will reduce the osd_snap keys in the mon db, which then should reduce the startup time. We'll see... Zitat von Eugen Block : Thanks, Dan! Yes that sounds familiar from the luminous and mimic days

[ceph-users] Re: CEPH orch made osd without WAL

2023-07-10 Thread Eugen Block
dedicated WAL device, but I have only /dev/nvme0n1, so I cannot write a correct YAML file... On Mon, Jul 10, 2023 at 09:12:29 CEST, Eugen Block wrote: Yes, because you did *not* specify a dedicated WAL device. This is also reflected in the OSD metadata: $ ceph osd metadata 6 | grep dedicated

[ceph-users] Re: CEPH orch made osd without WAL

2023-07-10 Thread Eugen Block
osdspec affinity osd_spec_default type block vdo 0 devices /dev/sdi (part of listing...) Sincerely Jan Marek On Mon, Jul 10, 2023 at 08:10:58 CEST, Eugen Block wrote: Hi, if you don't specify a different devi

[ceph-users] Re: CEPH orch made osd without WAL

2023-07-10 Thread Eugen Block
Hi, if you don't specify a different device for WAL it will be automatically colocated on the same device as the DB. So you're good with this configuration. Regards, Eugen Zitat von Jan Marek : Hello, I've tried to add to CEPH cluster OSD node with a 12 rotational disks and 1 NVMe. My

[ceph-users] Re: RGW dynamic resharding blocks write ops

2023-07-07 Thread Eugen Block
which error code was returned to the client there? it should be a retryable error, and many http clients have retry logic to prevent these errors from reaching the application On Fri, Jul 7, 2023 at 6:35 AM Eugen Block wrote: Hi *, last week I successfully upgraded a customer cluster from

[ceph-users] Re: RGW dynamic resharding blocks write ops

2023-07-07 Thread Eugen Block
e Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- On 2023. Jul 7., at 17:49, Eugen Block wrote: Email received from the internet. If in doubt, don't click any link nor ope

[ceph-users] Re: RGW dynamic resharding blocks write ops

2023-07-07 Thread Eugen Block
On 2023. Jul 7., at 17:35, Eugen Block wrote: Email received from the internet. If in doubt, don't click any link nor open any attachment ! Hi *, last week I successfully upgraded a customer cluster from Nautilus to Pacific, no real issues, their main use

[ceph-users] RGW dynamic resharding blocks write ops

2023-07-07 Thread Eugen Block
Hi *, last week I successfully upgraded a customer cluster from Nautilus to Pacific, no real issues, their main use is RGW. A couple of hours after most of the OSDs were upgraded (the RGWs were not yet) their application software reported an error, it couldn't write to a bucket. This

[ceph-users] Re: MON sync time depends on outage duration

2023-07-07 Thread Eugen Block
and reduce tens of percent of total size. This may be just another SST file creation, 1 GB by default, if I remember it right. Did you look at Grafana for this HDD's utilization, IOPS? k Sent from my iPhone On 7 Jul 2023, at 10:54, Eugen Block wrote: Can you share some more details what

[ceph-users] Re: MON sync time depends on outage duration

2023-07-07 Thread Eugen Block
to the payload size or keys option, but a timing option. Zitat von Eugen Block : Thanks, Dan! Yes that sounds familiar from the luminous and mimic days. The workaround for zillions of snapshot keys at that time was to use: ceph config set mon mon_sync_max_payload_size 4096 I actually did search

[ceph-users] Re: MON sync time depends on outage duration

2023-07-07 Thread Eugen Block
ble to understand what is taking so long, and tune mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly. Good luck! Dan __ Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com On Thu, Jul 6, 2023 at 1:47 PM Eugen
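The tuning mentioned here comes down to two MON options; values are illustrative only:
  $ ceph config set mon mon_sync_max_payload_size 4096   # bytes per sync message
  $ ceph config set mon mon_sync_max_payload_keys 2000   # keys per sync message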

[ceph-users] MON sync time depends on outage duration

2023-07-06 Thread Eugen Block
Hi *, I'm investigating an interesting issue on two customer clusters (used for mirroring) I've not solved yet, but today we finally made some progress. Maybe someone has an idea where to look next, I'd appreciate any hints or comments. These are two (latest) Octopus clusters, main usage

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-30 Thread Eugen Block
. Cheers, Michel On 19/06/2023 at 14:09, Eugen Block wrote: Hi, I have a real hardware cluster for testing available now. I'm not sure whether I'm completely misunderstanding how it's supposed to work or if it's a bug in the LRC plugin. This cluster has 18 HDD nodes available across 3 rooms

[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-24 Thread Eugen Block
on were up to date. I do not know why the osd config files did not get refreshed however I guess something went wrong draining the nodes we removed from the cluster. Best regards, Malte Am 21.06.23 um 22:11 schrieb Eugen Block: I still can’t really grasp what might have happened here

[ceph-users] Re: users caps change unexpected

2023-06-23 Thread Eugen Block
Hi, without knowing the details I just assume that it’s just „translated“, the syntax you set is the older way of setting rbd caps, since a couple of years it’s sufficient to use „profile rbd“. Do you notice client access issues (which I would not expect) or are you just curious about the
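For comparison, granting the newer profile-based rbd caps would look like this (client name and pool are placeholders):
  $ ceph auth caps client.myclient mon 'profile rbd' osd 'profile rbd pool=rbd'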

[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-06-22 Thread Eugen Block
Hi, have you tried restarting the primary OSD (currently 343)? It looks like this PG is part of an EC pool, are there enough hosts available, assuming your failure-domain is host? I assume that ceph isn't able to recreate the shard on a different OSD. You could share your osd tree and

[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-21 Thread Eugen Block
uck undersized for 13h, current state undersized+remapped+peered, last acting [236] pg 10.c is stuck undersized for 13h, current state active+undersized+remapped, last acting [237,236] Best, Malte Am 21.06.23 um 10:31 schrieb Eugen Block: Hi, Yes, we drained the nodes. It needed two we

[ceph-users] Re: How does a "ceph orch restart SERVICE" affect availability?

2023-06-21 Thread Eugen Block
Hi, Will that try to be smart and just restart a few at a time to keep things up and available. Or will it just trigger a restart everywhere simultaneously. basically, that's what happens for example during an upgrade if services are restarted. It's designed to be a rolling upgrade

[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-21 Thread Eugen Block
ash[2323668]: debug 2023-06-21T08:11:04.174+ 7fabef5a1200 0 monclient(hunting): authenticate timed out after 300 Same messages on all OSDs. We still have some nodes running and did not restart those OSDs. Best, Malte Am 21.06.23 um 09:50 schrieb Eugen Block: Hi, can you share more deta

[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-21 Thread Eugen Block
Hi, can you share more details what exactly you did? How did you remove the nodes? Hopefully, you waited for the draining to finish? But if the remaining OSDs wait for removed OSDs it sounds like the draining was not finished. Zitat von Malte Stroem : Hello, we removed some nodes from

[ceph-users] Re: OpenStack (cinder) volumes retyping on Ceph back-end

2023-06-20 Thread Eugen Block
You should report this in the openstack-discuss mailing list or create a bug report on launchpad. If you want I can do that as well. I will do some more testing to have more details. Thanks, Eugen Zitat von Eugen Block : Hi, I don't quite understand the issue yet, maybe you can clarify. If

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
this (very high volume) list... Or maybe somebody could pass the email thread to one of them? Help would be really appreciated. Cheers, Michel On 19/06/2023 at 14:09, Eugen Block wrote: Hi, I have a real hardware cluster for testing available now. I'm not sure whether I'm completely misunderstanding how

[ceph-users] Re: Grafana service fails to start due to bad directory name after Quincy upgrade

2023-06-19 Thread Eugen Block
Hi, so grafana is starting successfully now? What did you change? Regarding the container images, yes there are defaults in cephadm which can be overridden with ceph config. Can you share this output? ceph config dump | grep container_image I tend to always use a specific image as

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
that help me to understand the problem I remain interested. I propose to keep this thread for that. Zitat, I shared my crush map in the email you answered if the attachment was not suppressed by mailman. Cheers, Michel Sent from my mobile On 18 May 2023 11:19:35, Eugen Block wrote: H

[ceph-users] Re: OpenStack (cinder) volumes retyping on Ceph back-end

2023-06-19 Thread Eugen Block
Hi, I don't quite understand the issue yet, maybe you can clarify. If I perform a "change volume type" from OpenStack on volumes attached to the VMs the system successfully migrates the volume from the source pool to the destination pool and at the end of the process the volume is visible

[ceph-users] Re: same OSD in multiple CRUSH hierarchies

2023-06-19 Thread Eugen Block
Hi, I don't think this is going to work. Each OSD belongs to a specific host and you can't have multiple buckets (e.g. bucket type "host") with the same name in the crush tree. But if I understand your requirement correctly, there should be no need to do it this way. If you structure your

[ceph-users] Re: OSD stuck down

2023-06-13 Thread Eugen Block
Hi, did you check the MON logs? They should contain some information about the reason why the OSD is marked down and out. You could also just try to mark it in yourself, does it change anything? $ ceph osd in 34 I would also take another look into the OSD logs: cephadm logs --name osd.34

[ceph-users] Re: Operations: cannot update immutable features

2023-06-12 Thread Eugen Block
Hi, can you check for snapshots in the trash namespace? # rbd snap ls --all <pool>/<image> Instead of removing the feature try to remove the snapshot from trash (if there are any). Zitat von Adam Boyhan : I have a small cluster on Pacific with roughly 600 RBD images. Out of those 600 images I

[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Eugen Block
Sure: https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling Zitat von Louis Koo : ok, I will try it. Could you show me the archive doc? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to

[ceph-users] Re: 16.2.13: ERROR:ceph-crash:directory /var/lib/ceph/crash/posted does not exist; please create

2023-06-08 Thread Eugen Block
Hi, I wonder if a redeploy of the crash service would fix that, did you try that? Zitat von Zakhar Kirpichenko : I've opened a bug report https://tracker.ceph.com/issues/61589, which unfortunately received no attention. I fixed the issue by manually setting directory ownership for

[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-06-08 Thread Eugen Block
daemons? I can definitely try. However, I tried to lower the max number of mds. Unfortunately, one of the MDSs seem to be stuck in "stopping" state for more than 12 hours now. Best, Emmanuel On Wed, May 24, 2023 at 4:34 PM Eugen Block wrote: Hi, using standby-replay daemons is somethi

[ceph-users] Re: Updating the Grafana SSL certificate in Quincy

2023-06-08 Thread Eugen Block
Hi, can you paste the following output? # ceph config-key list | grep grafana Do you have a mgr/cephadm/grafana_key set? I would check the contents of crt and key and see if they match. A workaround to test the certificate and key pair would be to use a per-host config [1]. Maybe it's

[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-06 Thread Eugen Block
I suspect the target_max_misplaced_ratio (default 0.05). You could try setting it to 1 and see if it helps. This has been discussed multiple times on this list, check out the archives for more details. Zitat von Louis Koo : Thanks for your responses, I want to know why it spend much time to
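The setting in question; a value of 1 effectively removes the throttle (illustrative):
  $ ceph config get mgr target_max_misplaced_ratio   # default 0.05
  $ ceph config set mgr target_max_misplaced_ratio 1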

[ceph-users] Re: Quincy release -Swift integration with Keystone

2023-06-06 Thread Eugen Block
Hi, it's not really useful to create multiple threads for the same question. I wrote up some examples [1] which worked for me to integrate keystone and radosgw. From the debug logs below, it appears that radosgw is still trying to authenticate with Swift instead of Keystone. Any pointers

[ceph-users] Re: PGs incomplete - Data loss

2023-06-01 Thread Eugen Block
Hi, the short answer is yes, but without knowing anything about the cluster or what happened exactly it's a wild guess. In general, you can use the ceph-objectstore-tool [1] to export a PG (one replica or chunk) from an OSD and import it to a different OSD. I have to add, I never had to do
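A rough outline of the export/import described in [1] (OSD ids, PG id and file name are placeholders; the involved OSDs must be stopped first):
  $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 3.1a --op export --file /tmp/pg.3.1a.export
  $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 --op import --file /tmp/pg.3.1a.export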

[ceph-users] Re: fail delete "daemon(s) not managed by cephadm"

2023-05-28 Thread Eugen Block
Try on the mentioned host if there is a daemon with: cephadm ls | grep apcepfpspsp0111 If there is one you can remove it with cephadm rm-daemon … Sometimes a MGR failover clears up that message: ceph mgr fail Zitat von farhad kh : hi everyone i have a warning ` 1 stray daemon(s) not

[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-05-24 Thread Eugen Block
Hi, using standby-replay daemons is something to test as it can have a negative impact, it really depends on the actual workload. We stopped using standby-replay in all clusters we (help) maintain, in one specific case with many active MDSs and a high load the failover time decreased and
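Toggling standby-replay is a per-filesystem flag (the filesystem name is a placeholder):
  $ ceph fs set cephfs allow_standby_replay false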

[ceph-users] Re: mgr memory usage constantly increasing

2023-05-23 Thread Eugen Block
Hi, there was a thread [1] just a few weeks ago. Which mgr modules are enabled in your case? Also the mgr caps seem to be relevant here. [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/BKP6EVZZHJMYG54ZW64YABYV6RLPZNQO/ Zitat von Tobias Hachmer : Hello list, we have

[ceph-users] Re: Grafana service fails to start due to bad directory name after Quincy upgrade

2023-05-23 Thread Eugen Block
Hi, there was a change introduced [1] for cephadm to use dashes for container names instead of dots. That still seems to be an issue somehow, in your case cephadm is complaining about the missing directory:

[ceph-users] Re: Ceph OSDs suddenly use public network for heardbeat_check

2023-05-19 Thread Eugen Block
Hi, OSDs don't just communicate with each other but especially with MONs, too. They also check the OSD status (for example OSDs are marked out after 10 minutes if the MONs haven't heard from the OSDs during the mon_osd_down_out_interval), so your /etc/hosts should definitely contain the

[ceph-users] Re: Unable to change port range for OSDs in Pacific

2023-05-18 Thread Eugen Block
Hi, the config options you mention should work, but not in the ceph.conf. You should set it via ‚ceph config set …‘ and then restart the daemons (ceph orch daemon restart osd). Zitat von Renata Callado Borges : Dear all, How are you? I have a Pacific 3 nodes cluster, and the machines
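A minimal sketch of that approach (port values are examples):
  $ ceph config set osd ms_bind_port_min 6800
  $ ceph config set osd ms_bind_port_max 7300
  $ ceph orch restart osd   # restart the OSD daemons so they pick up the new range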

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-18 Thread Eugen Block
inline). If somebody on the list has some clue on the LRC plugin, I'm still interested in understanding what I'm doing wrong! Cheers, Michel On 04/05/2023 at 15:07, Eugen Block wrote: > Hi, > > I don't think you've shared your osd tree yet, could you do that? > Apparently nobody else but us

[ceph-users] Re: new install or change default registry to private registry

2023-05-17 Thread Eugen Block
Hi, I would recommend to add the --image option to the bootstrap command so it will only try to pull it from the local registry. If you also provide the --skip-monitoring-stack option it will ignore Prometheus etc for the initial bootstrap. After your cluster has been deployed you can set
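A sketch of such a bootstrap invocation (registry, image tag and MON IP are placeholders):
  $ cephadm --image my-registry:5000/ceph/ceph:v16.2.13 bootstrap --mon-ip 10.0.0.1 --skip-monitoring-stack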

[ceph-users] Re: rbd mirror snapshot trash

2023-05-16 Thread Eugen Block
channel I also got a response, there's a theory that trash snapshops appeared during mon reelection and vanished after upgrading to quincy. I'll recommend to delete the trash snapshots manually, then maybe increase the snaptrim config. Zitat von Stefan Kooman : On 5/16/23 09:47, Eugen

[ceph-users] rbd mirror snapshot trash

2023-05-16 Thread Eugen Block
Good morning, I would be grateful if anybody could shed some light on this, I can't reproduce it in my lab clusters so I was hoping for the community. A customer has 2 clusters with rbd mirroring (snapshots) enabled, it seems to work fine, they have regular checks and the images on the
