[ceph-users] Re: Snapshot automation/scheduling for rbd?
Thanks. I think the only issue with doing snapshots via CloudStack is potentially having to pause an instance for an extended period of time. I haven’t tested this yet, but based on the docs I think KVM has to be paused regardless. What about added volumes? Does an instance have to pause if you’re only snapshotting added volumes and not the root disk?

A couple of questions: if I snapshot an rbd image from the Ceph side, does that require an instance pause, and is there a graceful way, perhaps through the API, to do the full mapping of instance volumes -> Ceph block image names? I’d like to understand which block images belong to which CloudStack instance; I never understood how to properly trace a volume from instance to Ceph image. Thanks!

> On Saturday, Feb 03, 2024 at 10:47 AM, Jayanth Reddy mailto:jayanthreddy5...@gmail.com)> wrote:
> Hi,
> For CloudStack with RBD, you should be able to control the snapshot placement using the global setting "snapshot.backup.to.secondary". Setting this to false makes snapshots be placed directly on Ceph instead of secondary storage. See if you can perform recurring snapshots. I know that there are limitations with KVM and disk snapshots but good to give it a try.
>
> Thanks
>
> Get Outlook for Android (https://aka.ms/AAb9ysg)
> From: Jeremy Hansen
> Sent: Saturday, February 3, 2024 11:39:19 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Snapshot automation/scheduling for rbd?
>
> Am I just off base here or missing something obvious?
>
> Thanks
>
> > On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen (mailto:jer...@skidrow.la)> wrote:
> > Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed it in the documentation but it looked like scheduling snapshots wasn’t a feature for block images. I’m still running Pacific. We’re trying to devise a sufficient backup plan for CloudStack and other things residing in Ceph.
> >
> > Thanks.
> > -jeremy

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
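On the mapping question: in RBD-backed CloudStack deployments the volume's `path` field returned by the API is typically the RBD image name in the primary-storage pool. The pool name and the CloudMonkey/jq invocation below are assumptions for illustration, not a documented CloudStack recipe; a ceph-side `rbd snap create` is crash-consistent and does not pause the guest:

```shell
#!/bin/sh
# Assumptions: pool named "cloudstack", CloudMonkey (cmk) configured,
# and jq installed. In RBD-backed CloudStack, a volume's "path" field
# is usually the RBD image name in the primary storage pool.
POOL=cloudstack

# Print instance name, volume name, and RBD image name per volume
# (hypothetical jq filter over the listVolumes response):
cmk list volumes listall=true |
    jq -r '.volume[] | "\(.vmname)\t\(.name)\t\(.path)"'

# Snapshot one of those images from the Ceph side. This is taken by
# the cluster, not the hypervisor, so the instance keeps running:
rbd snap create "$POOL/<volume-path>@manual-$(date +%Y%m%d)"
rbd snap ls "$POOL/<volume-path>"
```

Replace `<volume-path>` with a path value from the listing; for application-level consistency you would still want to freeze the guest filesystem first (e.g. via the qemu guest agent).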
[ceph-users] Re: Snapshot automation/scheduling for rbd?
Can you share your script? Thanks!

> On Saturday, Feb 03, 2024 at 10:35 AM, Marc (mailto:m...@f1-outsourcing.eu)> wrote:
> I have a script that checks on each node which vm's are active, and then the script makes a snapshot of their rbd's. It first issues a command to the vm to freeze the fs, if the vm supports it.
>
> > Am I just off base here or missing something obvious?
> >
> > Thanks
> >
> > > On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen <mailto:jer...@skidrow.la> wrote:
> > > Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed it in the documentation but it looked like scheduling snapshots wasn’t a feature for block images. I’m still running Pacific. We’re trying to devise a sufficient backup plan for Cloudstack and other things residing in Ceph.
> > >
> > > Thanks.
> > > -jeremy
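Not Marc's actual script, but a minimal sketch of the approach he describes. It assumes the qemu-guest-agent is running in the guests and that each libvirt domain maps to an RBD image of the same name (that naming is an assumption; adapt the mapping, e.g. by parsing `virsh domblklist`):

```shell
#!/bin/sh
# Per-node: freeze each running VM's filesystems (if the guest agent
# responds), snapshot its RBD image, then thaw. Illustrative only.
POOL=rbd
SNAP="auto-$(date +%Y%m%d-%H%M)"

for dom in $(virsh list --name); do
    # Freeze is best-effort: guests without qemu-guest-agent keep
    # running and simply get a crash-consistent snapshot instead.
    if virsh domfsfreeze "$dom" >/dev/null 2>&1; then
        frozen=1
    else
        frozen=0
    fi

    # Assumption: image named after the domain, in $POOL.
    rbd snap create "$POOL/$dom@$SNAP"

    [ "$frozen" -eq 1 ] && virsh domfsthaw "$dom"
done
```

Run from cron on each hypervisor; keeping the freeze window to just the `rbd snap create` call keeps the pause per guest very short.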
[ceph-users] Re: Snapshot automation/scheduling for rbd?
Am I just off base here or missing something obvious?

Thanks

> On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen (mailto:jer...@skidrow.la)> wrote:
> Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed it in the documentation but it looked like scheduling snapshots wasn’t a feature for block images. I’m still running Pacific. We’re trying to devise a sufficient backup plan for Cloudstack and other things residing in Ceph.
>
> Thanks.
> -jeremy
[ceph-users] Snapshot automation/scheduling for rbd?
Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed it in the documentation but it looked like scheduling snapshots wasn’t a feature for block images. I’m still running Pacific. We’re trying to devise a sufficient backup plan for Cloudstack and other things residing in Ceph.

Thanks.
-jeremy
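As far as I know, Pacific has no built-in scheduler for plain RBD snapshots (the mgr `rbd_support` module only schedules mirror-snapshots), so a cron job is the usual workaround. A minimal sketch, assuming a host with a client keyring that can write to the pool and jq installed; pool name and retention are placeholders:

```shell
#!/bin/sh
# Date-stamped snapshot of every image in a pool; run nightly from
# cron. Keeps the last $KEEP "auto-" snapshots per image.
POOL=${1:-rbd}
KEEP=7
SNAP="auto-$(date +%Y%m%d)"

for img in $(rbd -p "$POOL" ls); do
    rbd snap create "$POOL/$img@$SNAP"

    # Prune: date-stamped names sort chronologically, so everything
    # before the last $KEEP entries is old enough to remove.
    rbd snap ls "$POOL/$img" --format json |
        jq -r '.[].name' | grep '^auto-' | sort | head -n -"$KEEP" |
        while read -r old; do
            rbd snap rm "$POOL/$img@$old"
        done
done
```

These are crash-consistent snapshots; combine with a guest filesystem freeze (as described elsewhere in this thread) if you need application consistency.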
[ceph-users] Upgrading from 16.2.11?
I’d like to upgrade from 16.2.11 to the latest version. Is it possible to do this in one jump, or do I need to go from 16.2.11 -> 16.2.14 -> 17.1.0 -> 17.2.7 -> 18.1.0 -> 18.2.1? I’m using cephadm.

Thanks
-jeremy
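For what it's worth, the Reef release notes state that upgrades are supported from Pacific or Quincy, i.e. you can skip one major release, so the full chain of intermediate versions above shouldn't be needed. A hedged sketch; verify the supported paths in the release notes for your actual target version first:

```shell
# 1. Step to the latest Pacific point release first (recommended):
ceph orch upgrade start --ceph-version 16.2.14
ceph orch upgrade status        # poll until the upgrade completes

# 2. Then jump directly to Reef (Pacific -> Reef is documented as a
#    supported path; going via Quincy 17.2.x also works):
ceph orch upgrade start --ceph-version 18.2.1
ceph -s                         # progress shows in the status output
```

Release-candidate versions like 17.1.0 and 18.1.0 are not upgrade targets for a production cluster; only the stable x.2.z releases are.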
[ceph-users] Ceph as rootfs?
Is it possible to use Ceph as the root filesystem for a PXE-booted host?

Thanks
[ceph-users] Re: Stray host/daemon
Found my previous post regarding this issue. Fixed by restarting the mgr daemons.

-jeremy

> On Friday, Dec 01, 2023 at 3:04 AM, Me (mailto:jer...@skidrow.la)> wrote:
> I think I ran into this before but I forget the fix:
>
> HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> [WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by cephadm
> stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']
>
> Pacific 16.2.11
>
> How do I clear this?
>
> Thanks
> -jeremy
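For anyone landing here later: a stray-host/daemon warning that refers to an already-removed daemon is often just stale cephadm state cached in the active mgr, and failing over the mgr, the fix described above, rebuilds it:

```shell
# Fail over to a standby mgr; cephadm re-inventories daemons after
# the new active mgr starts.
ceph mgr fail

# Confirm the warning cleared and see what cephadm now tracks:
ceph health detail
ceph orch ps --refresh
```

If the warning persists after the failover, the daemon really does exist somewhere outside cephadm's control and needs to be found and removed on the host itself.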
[ceph-users] Stray host/daemon
I think I ran into this before but I forget the fix:

HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by cephadm
stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']

Pacific 16.2.11

How do I clear this?

Thanks
-jeremy
[ceph-users] Re: Removed host still active, sort of?
Got around this issue by restarting the mgr daemons. -jeremy > On Saturday, Jun 10, 2023 at 11:26 PM, Me (mailto:jer...@skidrow.la)> wrote: > I see this in the web interface in Hosts and under cn03’s devices tab > > SAMSUNG_HD502HI_S1VFJ9ASB08190 > Unknown > n/a > sdg > mon.cn04 > > > 1 total > > > > > Which doesn’t make sense. There is no daemons running on this host and I > noticed the daemon lists looks like its one that should be on another node. > There is already a mon.cn04 running on the cn04 node. > > -jeremy > > > > > On Saturday, Jun 10, 2023 at 11:10 PM, Me > (mailto:jer...@skidrow.la)> wrote: > > I also see this error in the logs: > > > > 6/10/23 11:09:01 PM[ERR]host cn03.ceph does not exist Traceback (most > > recent call last): File "/usr/share/ceph/mgr/orchestrator/_interface.py", > > line 125, in wrapper return OrchResult(f(*args, **kwargs)) File > > "/usr/share/ceph/mgr/cephadm/module.py", line 1625, in remove_host > > self.inventory.rm_host(host) File > > "/usr/share/ceph/mgr/cephadm/inventory.py", line 108, in rm_host > > self.assert_host(host) File "/usr/share/ceph/mgr/cephadm/inventory.py", > > line 93, in assert_host raise OrchestratorError('host %s does not exist' % > > host) orchestrator._interface.OrchestratorError: host cn03.ceph does not > > exist > > > > > > > > > On Saturday, Jun 10, 2023 at 10:41 PM, Me > > (mailto:jer...@skidrow.la)> wrote: > > > I’m going through the process of transitioning to new hardware. Pacific > > > 16.2.11. > > > > > > I drained the host, all daemons were removed. 
Did the ceph orch host rm
> > >
> > > [ceph: root@cn01 /]# ceph orch host rm cn03.ceph
> > > Error EINVAL: host cn03.ceph does not exist
> > >
> > > Yet I see it here:
> > >
> > > ceph osd crush tree |grep cn03
> > > -10 0 host cn03
> > >
> > > Web interface says:
> > >
> > > ceph health
> > > HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> > >
> > > ceph orch host rm cn03.ceph --force
> > > Error EINVAL: host cn03.ceph does not exist
> > >
> > > No daemons are running. Something has a bad state.
> > >
> > > What can I do to clear this up? The previous host went without a problem and when all services were drained and I did the remove, it just completely disappeared as expected.
> > >
> > > Thanks
> > > -jeremy
[ceph-users] Re: Removed host still active, sort of?
I see this in the web interface in Hosts, under cn03’s devices tab:

SAMSUNG_HD502HI_S1VFJ9ASB08190 Unknown n/a sdg mon.cn04
1 total

Which doesn’t make sense. There are no daemons running on this host, and the daemon list looks like one that should be on another node. There is already a mon.cn04 running on the cn04 node.

-jeremy

> On Saturday, Jun 10, 2023 at 11:10 PM, Me (mailto:jer...@skidrow.la)> wrote:
> I also see this error in the logs:
>
> 6/10/23 11:09:01 PM[ERR]host cn03.ceph does not exist Traceback (most recent call last): File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper return OrchResult(f(*args, **kwargs)) File "/usr/share/ceph/mgr/cephadm/module.py", line 1625, in remove_host self.inventory.rm_host(host) File "/usr/share/ceph/mgr/cephadm/inventory.py", line 108, in rm_host self.assert_host(host) File "/usr/share/ceph/mgr/cephadm/inventory.py", line 93, in assert_host raise OrchestratorError('host %s does not exist' % host) orchestrator._interface.OrchestratorError: host cn03.ceph does not exist
>
> > On Saturday, Jun 10, 2023 at 10:41 PM, Me (mailto:jer...@skidrow.la)> wrote:
> > I’m going through the process of transitioning to new hardware. Pacific 16.2.11.
> >
> > I drained the host, all daemons were removed. Did the ceph orch host rm
> >
> > [ceph: root@cn01 /]# ceph orch host rm cn03.ceph
> > Error EINVAL: host cn03.ceph does not exist
> >
> > Yet I see it here:
> >
> > ceph osd crush tree |grep cn03
> > -10 0 host cn03
> >
> > Web interface says:
> >
> > ceph health
> > HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> >
> > ceph orch host rm cn03.ceph --force
> > Error EINVAL: host cn03.ceph does not exist
> >
> > No daemons are running. Something has a bad state.
> >
> > What can I do to clear this up?
> > The previous host went without a problem and when all services were drained and I did the remove, it just completely disappeared as expected.
> >
> > Thanks
> > -jeremy
[ceph-users] Re: Removed host still active, sort of?
I also see this error in the logs:

6/10/23 11:09:01 PM[ERR]host cn03.ceph does not exist Traceback (most recent call last): File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper return OrchResult(f(*args, **kwargs)) File "/usr/share/ceph/mgr/cephadm/module.py", line 1625, in remove_host self.inventory.rm_host(host) File "/usr/share/ceph/mgr/cephadm/inventory.py", line 108, in rm_host self.assert_host(host) File "/usr/share/ceph/mgr/cephadm/inventory.py", line 93, in assert_host raise OrchestratorError('host %s does not exist' % host) orchestrator._interface.OrchestratorError: host cn03.ceph does not exist

> On Saturday, Jun 10, 2023 at 10:41 PM, Me (mailto:jer...@skidrow.la)> wrote:
> I’m going through the process of transitioning to new hardware. Pacific 16.2.11.
>
> I drained the host, all daemons were removed. Did the ceph orch host rm
>
> [ceph: root@cn01 /]# ceph orch host rm cn03.ceph
> Error EINVAL: host cn03.ceph does not exist
>
> Yet I see it here:
>
> ceph osd crush tree |grep cn03
> -10 0 host cn03
>
> Web interface says:
>
> ceph health
> HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
>
> ceph orch host rm cn03.ceph --force
> Error EINVAL: host cn03.ceph does not exist
>
> No daemons are running. Something has a bad state.
>
> What can I do to clear this up? The previous host went without a problem and when all services were drained and I did the remove, it just completely disappeared as expected.
>
> Thanks
> -jeremy
[ceph-users] Removed host still active, sort of?
I’m going through the process of transitioning to new hardware. Pacific 16.2.11.

I drained the host, all daemons were removed. Did the ceph orch host rm:

[ceph: root@cn01 /]# ceph orch host rm cn03.ceph
Error EINVAL: host cn03.ceph does not exist

Yet I see it here:

ceph osd crush tree |grep cn03
-10 0 host cn03

Web interface says:

ceph health
HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm

ceph orch host rm cn03.ceph --force
Error EINVAL: host cn03.ceph does not exist

No daemons are running. Something has a bad state.

What can I do to clear this up? The previous host went without a problem and when all services were drained and I did the remove, it just completely disappeared as expected.

Thanks
-jeremy
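Two loose ends in the state above can be cleaned up separately: the leftover CRUSH bucket and cephadm's stale host record. A sketch; the `--offline` flag exists in recent Pacific releases, so check `ceph orch host rm -h` on your version first:

```shell
# Remove the now-empty host bucket from the CRUSH map:
ceph osd crush rm cn03

# cephadm may know the host under a slightly different name than the
# one you are removing -- check its exact spelling on record:
ceph orch host ls

# Then remove it, forcing past the reachability check if needed:
ceph orch host rm cn03.ceph --offline --force
```

If cephadm still reports the stray host afterwards, restarting the active mgr (`ceph mgr fail`) clears the cached inventory, which is what resolved this elsewhere in the archive.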
[ceph-users] Re: Ceph drain not removing daemons
Figured out how to cleanly relocate daemons via the interface. All is good.

-jeremy

> On Friday, Jun 09, 2023 at 2:04 PM, Me (mailto:jer...@skidrow.la)> wrote:
> I’m doing a drain on a host using cephadm, Pacific, 16.2.11.
>
> ceph orch host drain
>
> removed all the OSDs, but these daemons remain:
>
> grafana.cn06 cn06.ceph.la1 *:3000 stopped 5m ago 18M - -
> mds.btc.cn06.euxhdu cn06.ceph.la1 running (2d) 5m ago 17M 29.4M - 16.2.11 de4b0b384ad4 017f7ef441ff
> mgr.cn06.rpkpwg cn06.ceph.la1 *:8443,9283 running (2d) 5m ago 10M 223M - 16.2.11 de4b0b384ad4 f1b89b453ef3
>
> I manually stopped grafana.
>
> I expected these daemons to be removed as well. Is there an extra step I need to do here so I can remove the host cleanly?
>
> Thanks!
> -jeremy
[ceph-users] Ceph drain not removing daemons
I’m doing a drain on a host using cephadm, Pacific, 16.2.11.

ceph orch host drain

removed all the OSDs, but these daemons remain:

grafana.cn06 cn06.ceph.la1 *:3000 stopped 5m ago 18M - -
mds.btc.cn06.euxhdu cn06.ceph.la1 running (2d) 5m ago 17M 29.4M - 16.2.11 de4b0b384ad4 017f7ef441ff
mgr.cn06.rpkpwg cn06.ceph.la1 *:8443,9283 running (2d) 5m ago 10M 223M - 16.2.11 de4b0b384ad4 f1b89b453ef3

I manually stopped grafana.

I expected these daemons to be removed as well. Is there an extra step I need to do here so I can remove the host cleanly?

Thanks!
-jeremy
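One way this can be handled: mgr, mds, and monitoring daemons are placed by their service specs rather than by the host's OSDs, so after updating those specs (or placement labels) to exclude the host, any stragglers can be removed by daemon name. A sketch using the names from the listing above; removing an mgr fails over to a standby, so make sure one exists:

```shell
# Remove the leftover daemons individually (names from 'ceph orch ps'):
ceph orch daemon rm grafana.cn06 --force
ceph orch daemon rm mds.btc.cn06.euxhdu --force
ceph orch daemon rm mgr.cn06.rpkpwg --force   # standby mgr takes over

# Then the drain/removal should complete:
ceph orch host drain cn06.ceph.la1
ceph orch host rm cn06.ceph.la1
```

If the daemons come back, the service spec still targets the host; adjust the spec's placement (count, labels, or explicit hosts) before removing them again.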
[ceph-users] unable to calc client keyring client.admin placement PlacementSpec(label='_admin'): Cannot place : No matching hosts for label _admin
3/3/23 2:13:53 AM[WRN] unable to calc client keyring client.admin placement PlacementSpec(label='_admin'): Cannot place : No matching hosts for label _admin

I keep seeing this warning in the logs. I’m not really sure what action to take to resolve this issue.

Thanks
-jeremy
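That warning means cephadm wants to maintain the admin keyring and ceph.conf on hosts labelled `_admin`, but no host currently carries the label. Adding it to at least one host (typically the bootstrap node) should silence it; the hostname below is a placeholder:

```shell
# Label a host so cephadm can place the client.admin keyring there:
ceph orch host label add cn01 _admin   # substitute your actual hostname

# Verify the label is now present:
ceph orch host ls
```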
[ceph-users] Re: Upgrade not doing anything...
I’m not exactly sure what I did, but it’s going through now. I did a ceph orch upgrade check --ceph-version 16.2.7 my current version…. and I did a pause and resume. Now daemons are upgrading to 16.2.11. -jeremy > On Monday, Feb 27, 2023 at 11:07 PM, Me (mailto:jer...@skidrow.la)> wrote: > [ceph: root@cn01 /]# ceph -W cephadm, > cluster: > id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d > health: HEALTH_OK > > services: > mon: 5 daemons, quorum cn05,cn02,cn03,cn04,cn01 (age 111m) > mgr: cn06.rpkpwg(active, since 7h), standbys: cn02.arszct, cn03.elmwhu > mds: 2/2 daemons up, 2 standby > osd: 35 osds: 35 up (since 111m), 35 in (since 5h) > > data: > volumes: 2/2 healthy > pools: 8 pools, 545 pgs > objects: 8.13M objects, 7.7 TiB > usage: 31 TiB used, 95 TiB / 126 TiB avail > pgs: 545 active+clean > > io: > client: 4.1 MiB/s rd, 885 KiB/s wr, 128 op/s rd, 14 op/s wr > > progress: > Upgrade to quay.io/ceph/ceph:v16.2.11 (0s) > [] > > Cluster is healthy. > > Is there an easy way to see if anything was upgraded through the orchestrator? > > -jeremy > > > > > On Monday, Feb 27, 2023 at 10:58 PM, Curt > (mailto:light...@gmail.com)> wrote: > > Did any of your cluster get partial upgrade? What about ceph -W cephadm, > > does that return anything or just hang, also what about ceph health detail? > > You can always try ceph orch upgrade pause and then orch upgrade resume, > > might kick something loose, so to speak. > > On Tue, Feb 28, 2023, 10:39 Jeremy Hansen > (mailto:jer...@skidrow.la)> wrote: > > > { > > > "target_image": "quay.io/ceph/ceph:v16.2.11 > > > (http://quay.io/ceph/ceph:v16.2.11)", > > > "in_progress": true, > > > "services_complete": [], > > > "progress": "", > > > "message": "" > > > } > > > > > > Hasn’t changed in the past two hours. > > > > > > -jeremy > > > > > > > > > > > > > On Monday, Feb 27, 2023 at 10:22 PM, Curt > > > (mailto:light...@gmail.com)> wrote: > > > > What does Ceph orch upgrade status return? 
> > > > On Tue, Feb 28, 2023, 10:16 Jeremy Hansen (mailto:jer...@skidrow.la)> wrote:
> > > > > I’m trying to upgrade from 16.2.7 to 16.2.11. Reading the documentation, I cut and paste the orchestrator command to begin the upgrade, but I mistakenly pasted directly from the docs and it initiated an “upgrade” to 16.2.6. I stopped the upgrade per the docs and reissued the command specifying 16.2.11 but now I see no progress in ceph -s. Cluster is healthy but it feels like the upgrade process is just paused for some reason.
> > > > >
> > > > > Thanks!
> > > > > -jeremy
[ceph-users] Re: Upgrade not doing anything...
[ceph: root@cn01 /]# ceph -W cephadm, cluster: id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d health: HEALTH_OK services: mon: 5 daemons, quorum cn05,cn02,cn03,cn04,cn01 (age 111m) mgr: cn06.rpkpwg(active, since 7h), standbys: cn02.arszct, cn03.elmwhu mds: 2/2 daemons up, 2 standby osd: 35 osds: 35 up (since 111m), 35 in (since 5h) data: volumes: 2/2 healthy pools: 8 pools, 545 pgs objects: 8.13M objects, 7.7 TiB usage: 31 TiB used, 95 TiB / 126 TiB avail pgs: 545 active+clean io: client: 4.1 MiB/s rd, 885 KiB/s wr, 128 op/s rd, 14 op/s wr progress: Upgrade to quay.io/ceph/ceph:v16.2.11 (0s) [] Cluster is healthy. Is there an easy way to see if anything was upgraded through the orchestrator? -jeremy > On Monday, Feb 27, 2023 at 10:58 PM, Curt (mailto:light...@gmail.com)> wrote: > Did any of your cluster get partial upgrade? What about ceph -W cephadm, does > that return anything or just hang, also what about ceph health detail? You > can always try ceph orch upgrade pause and then orch upgrade resume, might > kick something loose, so to speak. > On Tue, Feb 28, 2023, 10:39 Jeremy Hansen (mailto:jer...@skidrow.la)> wrote: > > { > > "target_image": "quay.io/ceph/ceph:v16.2.11 > > (http://quay.io/ceph/ceph:v16.2.11)", > > "in_progress": true, > > "services_complete": [], > > "progress": "", > > "message": "" > > } > > > > Hasn’t changed in the past two hours. > > > > -jeremy > > > > > > > > > On Monday, Feb 27, 2023 at 10:22 PM, Curt > > (mailto:light...@gmail.com)> wrote: > > > What does Ceph orch upgrade status return? > > > On Tue, Feb 28, 2023, 10:16 Jeremy Hansen > > (mailto:jer...@skidrow.la)> wrote: > > > > I’m trying to upgrade from 16.2.7 to 16.2.11. Reading the > > > > documentation, I cut and paste the orchestrator command to begin the > > > > upgrade, but I mistakenly pasted directly from the docs and it > > > > initiated an “upgrade” to 16.2.6. 
I stopped the upgrade per the docs and reissued the command specifying 16.2.11 but now I see no progress in ceph -s. Cluster is healthy but it feels like the upgrade process is just paused for some reason.
> > > >
> > > > Thanks!
> > > > -jeremy
[ceph-users] Re: Upgrade not doing anything...
{
  "target_image": "quay.io/ceph/ceph:v16.2.11",
  "in_progress": true,
  "services_complete": [],
  "progress": "",
  "message": ""
}

Hasn’t changed in the past two hours.

-jeremy

> On Monday, Feb 27, 2023 at 10:22 PM, Curt (mailto:light...@gmail.com)> wrote:
> What does Ceph orch upgrade status return?
> On Tue, Feb 28, 2023, 10:16 Jeremy Hansen (mailto:jer...@skidrow.la)> wrote:
> > I’m trying to upgrade from 16.2.7 to 16.2.11. Reading the documentation, I cut and paste the orchestrator command to begin the upgrade, but I mistakenly pasted directly from the docs and it initiated an “upgrade” to 16.2.6. I stopped the upgrade per the docs and reissued the command specifying 16.2.11 but now I see no progress in ceph -s. Cluster is healthy but it feels like the upgrade process is just paused for some reason.
> >
> > Thanks!
> > -jeremy
[ceph-users] Upgrade not doing anything...
I’m trying to upgrade from 16.2.7 to 16.2.11. Reading the documentation, I cut and pasted the orchestrator command to begin the upgrade, but I mistakenly pasted directly from the docs and it initiated an “upgrade” to 16.2.6. I stopped the upgrade per the docs and reissued the command specifying 16.2.11, but now I see no progress in ceph -s. The cluster is healthy, but it feels like the upgrade process is just paused for some reason.

Thanks!
-jeremy
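A few standard cephadm commands that help when an upgrade appears stalled, shown with the versions from this thread (the pause/resume sequence is what eventually kicked it loose later in the archive):

```shell
ceph orch upgrade status    # target image, progress, any error message
ceph -W cephadm             # stream the orchestrator's log live
ceph health detail          # surfaces any UPGRADE_* health checks

# If it is wedged, stop and restart cleanly at the intended version:
ceph orch upgrade stop
ceph orch upgrade start --ceph-version 16.2.11

# Or nudge it without restarting:
ceph orch upgrade pause
ceph orch upgrade resume
```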
[ceph-users] 1 stray daemon(s) not managed by cephadm
How do I track down which daemon is the stray one?

Thanks
-jeremy
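`ceph health detail` names the stray daemon directly; beyond that, comparing what the cluster sees against what cephadm manages pinpoints it. A sketch:

```shell
# The health check lists the stray daemon and its host explicitly:
ceph health detail | grep -A2 CEPHADM_STRAY

# Daemons the cluster itself knows about, grouped per host:
ceph node ls

# Daemons cephadm is actually managing:
ceph orch ps

# Anything in 'ceph node ls' that is missing from 'ceph orch ps'
# is the stray daemon.
```

Common culprits are a daemon started manually on a host, or stale mgr state; in the latter case `ceph mgr fail` clears the warning.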
[ceph-users] Two osd's assigned to one device
I have a situation (not sure how it happened) where Ceph believes I have two OSDs assigned to a single device. I tried to delete osd.2 and osd.3, but it just hangs. I'm also trying to zap sdc, which claims it does not have an OSD, but I'm unable to zap it. Any suggestions?

/dev/sdb HDD TOSHIBA MG04SCA40EE 3.6 TiB osd.2 osd.3
/dev/sdc SSD SAMSUNG MZILT3T8HBLS/007 3.5 TiB
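A possible cleanup path, assuming osd.2 and osd.3 are safe to destroy (their data has already rebalanced elsewhere); the purge and zap commands are standard, but double-check the OSD IDs against `ceph osd tree` before running anything destructive:

```shell
# See where the two OSDs actually live and whether they are up:
ceph osd tree | grep -E 'osd\.(2|3)'

# If 'ceph orch osd rm' hangs, take them out and purge directly.
# Purge removes the OSD from the CRUSH map, auth, and the OSD map:
ceph osd out 2 3
ceph osd purge 2 --yes-i-really-mean-it
ceph osd purge 3 --yes-i-really-mean-it

# Then wipe the device so the orchestrator can redeploy onto it
# (substitute the actual hostname):
ceph orch device zap <hostname> /dev/sdb --force
```

The zap refusing to run on sdc can also be a stale mgr inventory; refreshing it (`ceph orch device ls --refresh`, or `ceph mgr fail`) is worth trying before anything else.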
[ceph-users] Re: Issues after a shutdown
I use Ubiquiti equipment, mainly because I'm not a network admin... I rebooted the 10G switches and now everything is working and recovering. I hate when there's not a definitive answer but that's kind of the deal when you use Ubiquiti stuff. Thank you Sean and Frank. Frank, you were right. It made no sense because from a very basic point of view the network seemed fine, but Sean's ping revealed that it clearly wasn't. Thank you! -jeremy On Mon, Jul 25, 2022 at 3:08 PM Sean Redmond wrote: > Yea, assuming you can ping with a lower MTU, check the MTU on your > switching. > > On Mon, 25 Jul 2022, 23:05 Jeremy Hansen, > wrote: > >> That results in packet loss: >> >> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14 >> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data. >> ^C >> --- 192.168.30.14 ping statistics --- >> 3 packets transmitted, 0 received, 100% packet loss, time 2062ms >> >> That's very weird... but this gives me something to figure out. Hmmm. >> Thank you. >> >> On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond >> wrote: >> >>> Looks good, just confirm it with a large ping with don't fragment flag >>> set between each host. 
>>> >>> ping -M do -s 8972 [destination IP] >>> >>> >>> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, >>> wrote: >>> >>>> MTU is the same across all hosts: >>>> >>>> - cn01.ceph.la1.clx.corp- >>>> enp2s0: flags=4163 mtu 9000 >>>> inet 192.168.30.11 netmask 255.255.255.0 broadcast >>>> 192.168.30.255 >>>> inet6 fe80::3e8c:f8ff:feed:728d prefixlen 64 scopeid >>>> 0x20 >>>> ether 3c:8c:f8:ed:72:8d txqueuelen 1000 (Ethernet) >>>> RX packets 3163785 bytes 213625 (1.9 GiB) >>>> RX errors 0 dropped 0 overruns 0 frame 0 >>>> TX packets 6890933 bytes 40233267272 (37.4 GiB) >>>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >>>> >>>> - cn02.ceph.la1.clx.corp- >>>> enp2s0: flags=4163 mtu 9000 >>>> inet 192.168.30.12 netmask 255.255.255.0 broadcast >>>> 192.168.30.255 >>>> inet6 fe80::3e8c:f8ff:feed:ff0c prefixlen 64 scopeid >>>> 0x20 >>>> ether 3c:8c:f8:ed:ff:0c txqueuelen 1000 (Ethernet) >>>> RX packets 3976256 bytes 2761764486 (2.5 GiB) >>>> RX errors 0 dropped 0 overruns 0 frame 0 >>>> TX packets 9270324 bytes 56984933585 (53.0 GiB) >>>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >>>> >>>> - cn03.ceph.la1.clx.corp- >>>> enp2s0: flags=4163 mtu 9000 >>>> inet 192.168.30.13 netmask 255.255.255.0 broadcast >>>> 192.168.30.255 >>>> inet6 fe80::3e8c:f8ff:feed:feba prefixlen 64 scopeid >>>> 0x20 >>>> ether 3c:8c:f8:ed:fe:ba txqueuelen 1000 (Ethernet) >>>> RX packets 13081847 bytes 93614795356 (87.1 GiB) >>>> RX errors 0 dropped 0 overruns 0 frame 0 >>>> TX packets 4001854 bytes 2536322435 (2.3 GiB) >>>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >>>> >>>> - cn04.ceph.la1.clx.corp- >>>> enp2s0: flags=4163 mtu 9000 >>>> inet 192.168.30.14 netmask 255.255.255.0 broadcast >>>> 192.168.30.255 >>>> inet6 fe80::3e8c:f8ff:feed:6f89 prefixlen 64 scopeid >>>> 0x20 >>>> ether 3c:8c:f8:ed:6f:89 txqueuelen 1000 (Ethernet) >>>> RX packets 60018 bytes 5622542 (5.3 MiB) >>>> RX errors 0 dropped 0 overruns 0 frame 0 >>>> TX packets 59889 bytes 17463794 (16.6 MiB) 
>>>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >>>> >>>> - cn05.ceph.la1.clx.corp- >>>> enp2s0: flags=4163 mtu 9000 >>>> inet 192.168.30.15 netmask 255.255.255.0 broadcast >>>> 192.168.30.255 >>>> inet6 fe80::3e8c:f8ff:feed:7245 prefixlen 64 scopeid >>>> 0x20 >>>> ether 3c:8c:f8:ed:72:45 txqueuelen 1000 (Ethernet) >>>> RX packets 69163 bytes 8085511 (7.7 MiB) >>>> RX errors 0 dropped 0 overruns 0 frame 0 >>>> TX packets 73539 bytes 17069869 (16.2
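The 8972 in Sean's ping is not arbitrary: it is the 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A small sketch for checking path MTU to every host in one pass (the host IPs are the ones from this thread; adjust for your network):

```shell
#!/bin/sh
# ICMP payload that exactly fills an MTU-sized frame:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
mtu_payload() {
    echo $(( $1 - 28 ))
}

# Ping each host with don't-fragment set; a drop means some hop
# (often a switch) is not passing jumbo frames end to end.
check_path_mtu() {
    payload=$(mtu_payload "$1"); shift
    for ip in "$@"; do
        if ping -M do -c 3 -s "$payload" "$ip" >/dev/null 2>&1; then
            echo "OK   $ip carries ${payload}-byte payloads"
        else
            echo "FAIL $ip drops ${payload}-byte payloads (check switch MTU)"
        fi
    done
}

mtu_payload 9000   # prints 8972, matching the ping used in the thread
# check_path_mtu 9000 192.168.30.11 192.168.30.12 192.168.30.13
```

Interface MTU looking right on every host (as in the ifconfig output above) is not enough; the don't-fragment ping is what proves the switches in between forward jumbo frames.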
[ceph-users] Re: Issues after a shutdown
That results in packet loss: [root@cn01 ~]# ping -M do -s 8972 192.168.30.14 PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data. ^C --- 192.168.30.14 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2062ms That's very weird... but this gives me something to figure out. Hmmm. Thank you. On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond wrote: > Looks good, just confirm it with a large ping with don't fragment flag set > between each host. > > ping -M do -s 8972 [destination IP] > > > On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, > wrote: > >> MTU is the same across all hosts: >> >> - cn01.ceph.la1.clx.corp- >> enp2s0: flags=4163 mtu 9000 >> inet 192.168.30.11 netmask 255.255.255.0 broadcast >> 192.168.30.255 >> inet6 fe80::3e8c:f8ff:feed:728d prefixlen 64 scopeid 0x20 >> ether 3c:8c:f8:ed:72:8d txqueuelen 1000 (Ethernet) >> RX packets 3163785 bytes 213625 (1.9 GiB) >> RX errors 0 dropped 0 overruns 0 frame 0 >> TX packets 6890933 bytes 40233267272 (37.4 GiB) >> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >> >> - cn02.ceph.la1.clx.corp- >> enp2s0: flags=4163 mtu 9000 >> inet 192.168.30.12 netmask 255.255.255.0 broadcast >> 192.168.30.255 >> inet6 fe80::3e8c:f8ff:feed:ff0c prefixlen 64 scopeid 0x20 >> ether 3c:8c:f8:ed:ff:0c txqueuelen 1000 (Ethernet) >> RX packets 3976256 bytes 2761764486 (2.5 GiB) >> RX errors 0 dropped 0 overruns 0 frame 0 >> TX packets 9270324 bytes 56984933585 (53.0 GiB) >> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >> >> - cn03.ceph.la1.clx.corp- >> enp2s0: flags=4163 mtu 9000 >> inet 192.168.30.13 netmask 255.255.255.0 broadcast >> 192.168.30.255 >> inet6 fe80::3e8c:f8ff:feed:feba prefixlen 64 scopeid 0x20 >> ether 3c:8c:f8:ed:fe:ba txqueuelen 1000 (Ethernet) >> RX packets 13081847 bytes 93614795356 (87.1 GiB) >> RX errors 0 dropped 0 overruns 0 frame 0 >> TX packets 4001854 bytes 2536322435 (2.3 GiB) >> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >> >> - cn04.ceph.la1.clx.corp- 
>> enp2s0: flags=4163 mtu 9000 >> inet 192.168.30.14 netmask 255.255.255.0 broadcast >> 192.168.30.255 >> inet6 fe80::3e8c:f8ff:feed:6f89 prefixlen 64 scopeid 0x20 >> ether 3c:8c:f8:ed:6f:89 txqueuelen 1000 (Ethernet) >> RX packets 60018 bytes 5622542 (5.3 MiB) >> RX errors 0 dropped 0 overruns 0 frame 0 >> TX packets 59889 bytes 17463794 (16.6 MiB) >> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >> >> - cn05.ceph.la1.clx.corp- >> enp2s0: flags=4163 mtu 9000 >> inet 192.168.30.15 netmask 255.255.255.0 broadcast >> 192.168.30.255 >> inet6 fe80::3e8c:f8ff:feed:7245 prefixlen 64 scopeid 0x20 >> ether 3c:8c:f8:ed:72:45 txqueuelen 1000 (Ethernet) >> RX packets 69163 bytes 8085511 (7.7 MiB) >> RX errors 0 dropped 0 overruns 0 frame 0 >> TX packets 73539 bytes 17069869 (16.2 MiB) >> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >> >> - cn06.ceph.la1.clx.corp- >> enp2s0: flags=4163 mtu 9000 >> inet 192.168.30.16 netmask 255.255.255.0 broadcast >> 192.168.30.255 >> inet6 fe80::3e8c:f8ff:feed:feab prefixlen 64 scopeid 0x20 >> ether 3c:8c:f8:ed:fe:ab txqueuelen 1000 (Ethernet) >> RX packets 23570 bytes 2251531 (2.1 MiB) >> RX errors 0 dropped 0 overruns 0 frame 0 >> TX packets 22268 bytes 16186794 (15.4 MiB) >> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >> >> 10G. >> >> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond >> wrote: >> >>> Is the MTU in n the new rack set correctly? >>> >>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, >>> wrote: >>> >>>> I transitioned some servers to a new rack and now I'm having major >>>> issues >>>> with Ceph upon bringing things back up. >>>> >>>> I believe the issue may be related to the ceph nodes coming back up with >>>> different IPs before VLANs were set. That's just a guess because I >>>> can't >>>> think of any other reason this would happen. &
[ceph-users] Re: Issues after a shutdown
Does ceph do any kind of io fencing if it notices an anomaly? Do I need to do something to re-enable these hosts if they get marked as bad? On Mon, Jul 25, 2022 at 2:56 PM Jeremy Hansen wrote: > MTU is the same across all hosts: > > - cn01.ceph.la1.clx.corp- > enp2s0: flags=4163 mtu 9000 > inet 192.168.30.11 netmask 255.255.255.0 broadcast 192.168.30.255 > inet6 fe80::3e8c:f8ff:feed:728d prefixlen 64 scopeid 0x20 > ether 3c:8c:f8:ed:72:8d txqueuelen 1000 (Ethernet) > RX packets 3163785 bytes 213625 (1.9 GiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 6890933 bytes 40233267272 (37.4 GiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > - cn02.ceph.la1.clx.corp- > enp2s0: flags=4163 mtu 9000 > inet 192.168.30.12 netmask 255.255.255.0 broadcast 192.168.30.255 > inet6 fe80::3e8c:f8ff:feed:ff0c prefixlen 64 scopeid 0x20 > ether 3c:8c:f8:ed:ff:0c txqueuelen 1000 (Ethernet) > RX packets 3976256 bytes 2761764486 (2.5 GiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 9270324 bytes 56984933585 (53.0 GiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > - cn03.ceph.la1.clx.corp- > enp2s0: flags=4163 mtu 9000 > inet 192.168.30.13 netmask 255.255.255.0 broadcast 192.168.30.255 > inet6 fe80::3e8c:f8ff:feed:feba prefixlen 64 scopeid 0x20 > ether 3c:8c:f8:ed:fe:ba txqueuelen 1000 (Ethernet) > RX packets 13081847 bytes 93614795356 (87.1 GiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 4001854 bytes 2536322435 (2.3 GiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > - cn04.ceph.la1.clx.corp- > enp2s0: flags=4163 mtu 9000 > inet 192.168.30.14 netmask 255.255.255.0 broadcast 192.168.30.255 > inet6 fe80::3e8c:f8ff:feed:6f89 prefixlen 64 scopeid 0x20 > ether 3c:8c:f8:ed:6f:89 txqueuelen 1000 (Ethernet) > RX packets 60018 bytes 5622542 (5.3 MiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 59889 bytes 17463794 (16.6 MiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > - 
cn05.ceph.la1.clx.corp- > enp2s0: flags=4163 mtu 9000 > inet 192.168.30.15 netmask 255.255.255.0 broadcast 192.168.30.255 > inet6 fe80::3e8c:f8ff:feed:7245 prefixlen 64 scopeid 0x20 > ether 3c:8c:f8:ed:72:45 txqueuelen 1000 (Ethernet) > RX packets 69163 bytes 8085511 (7.7 MiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 73539 bytes 17069869 (16.2 MiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > - cn06.ceph.la1.clx.corp- > enp2s0: flags=4163 mtu 9000 > inet 192.168.30.16 netmask 255.255.255.0 broadcast 192.168.30.255 > inet6 fe80::3e8c:f8ff:feed:feab prefixlen 64 scopeid 0x20 > ether 3c:8c:f8:ed:fe:ab txqueuelen 1000 (Ethernet) > RX packets 23570 bytes 2251531 (2.1 MiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 22268 bytes 16186794 (15.4 MiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > 10G. > > On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond > wrote: > >> Is the MTU in n the new rack set correctly? >> >> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, >> wrote: >> >>> I transitioned some servers to a new rack and now I'm having major issues >>> with Ceph upon bringing things back up. >>> >>> I believe the issue may be related to the ceph nodes coming back up with >>> different IPs before VLANs were set. That's just a guess because I can't >>> think of any other reason this would happen. >>> >>> Current state: >>> >>> Every 2.0s: ceph -s >>>cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022 >>> >>> cluster: >>> id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d >>> health: HEALTH_WARN >>> 1 filesystem is degraded >>> 2 MDSs report slow metadata IOs >>> 2/5 mons down, quorum cn02,cn03,cn01 >>> 9 osds down >>> 3 hosts (17 osds) down >>> Reduced data availability: 97 pgs inactive, 9 pgs down >>> Degraded data redundancy: 13860144/30824413 objects degraded >>> (44.965%), 411 pgs degraded, 482 pgs undersized >>> >>> servi
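Eyeballing the per-host ifconfig output above for a mismatched MTU is error-prone. A small sketch to automate the comparison (hostnames, interface name `enp2s0`, and root ssh access are assumptions taken from this thread; adjust for your environment):

```shell
# Hosts and NIC name as they appear in the thread (assumptions).
HOSTS="cn01 cn02 cn03 cn04 cn05 cn06"

# Emit "<host> <mtu>" pairs; not invoked here because it needs ssh access.
collect_mtus() {
    for h in $HOSTS; do
        printf '%s %s\n' "$h" "$(ssh "root@$h" cat /sys/class/net/enp2s0/mtu)"
    done
}

# Succeeds only if every MTU value on stdin is identical.
mtu_consistent() {
    [ "$(awk '{print $2}' | sort -u | wc -l)" -eq 1 ]
}

# Usage: collect_mtus | mtu_consistent && echo "MTU consistent across hosts"
```

Note that a mismatch can also hide in the switch fabric: the host MTUs can all read 9000 while a switch port in the new rack still drops jumbo frames, so a `ping -M do -s 8972` between racks is worth trying as well.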
[ceph-users] Re: Issues after a shutdown
MTU is the same across all hosts:

- cn01.ceph.la1.clx.corp -
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
        RX packets 3163785  bytes 213625 (1.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6890933  bytes 40233267272 (37.4 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

- cn02.ceph.la1.clx.corp -
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
        RX packets 3976256  bytes 2761764486 (2.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9270324  bytes 56984933585 (53.0 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

- cn03.ceph.la1.clx.corp -
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
        RX packets 13081847  bytes 93614795356 (87.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4001854  bytes 2536322435 (2.3 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

- cn04.ceph.la1.clx.corp -
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
        RX packets 60018  bytes 5622542 (5.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 59889  bytes 17463794 (16.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

- cn05.ceph.la1.clx.corp -
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
        RX packets 69163  bytes 8085511 (7.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 73539  bytes 17069869 (16.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

- cn06.ceph.la1.clx.corp -
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
        RX packets 23570  bytes 2251531 (2.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22268  bytes 16186794 (15.4 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

10G.

On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond wrote:

> Is the MTU in the new rack set correctly?
>
> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, wrote:
>
>> I transitioned some servers to a new rack and now I'm having major issues
>> with Ceph upon bringing things back up.
>>
>> I believe the issue may be related to the ceph nodes coming back up with
>> different IPs before VLANs were set. That's just a guess because I can't
>> think of any other reason this would happen.
>>
>> Current state:
>>
>> Every 2.0s: ceph -s    cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>
>> cluster:
>>   id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>   health: HEALTH_WARN
>>           1 filesystem is degraded
>>           2 MDSs report slow metadata IOs
>>           2/5 mons down, quorum cn02,cn03,cn01
>>           9 osds down
>>           3 hosts (17 osds) down
>>           Reduced data availability: 97 pgs inactive, 9 pgs down
>>           Degraded data redundancy: 13860144/30824413 objects degraded
>>           (44.965%), 411 pgs degraded, 482 pgs undersized
>>
>> services:
>>   mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, cn04
>>   mgr: cn02.arszct(active, since 5m)
>>   mds: 2/2 daemons up, 2 standby
>>   osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>
>> data:
>>   volumes: 1/2 healthy, 1 recovering
>>   pools:   8 pools, 545 pgs
>>   objects: 7.71M objects, 6.7 TiB
>>   usage:   15 TiB used, 39 TiB / 54 TiB avail
>>   pgs:     0.367% pgs unknown
>>            17.431% pgs not active
>>            13860144/30824413 objects degraded (44.965%)
>>
[ceph-users] Re: [Warning Possible spam] Re: Issues after a shutdown
ate active+undersized+remapped, last acting [9,6] pg 9.8f is stuck undersized for 62m, current state active+undersized+remapped, last acting [19,26,17] pg 9.90 is stuck undersized for 62m, current state active+undersized+remapped, last acting [35,26] pg 9.91 is stuck undersized for 62m, current state active+undersized+degraded, last acting [17,5] pg 9.92 is stuck undersized for 62m, current state active+undersized+degraded, last acting [21,26] pg 9.93 is stuck undersized for 62m, current state active+undersized+remapped, last acting [19,26,5] pg 9.94 is stuck undersized for 62m, current state active+undersized+degraded, last acting [21,11] pg 9.95 is stuck undersized for 61m, current state active+undersized+degraded, last acting [8,19] pg 9.96 is stuck undersized for 62m, current state active+undersized+degraded, last acting [17,6] pg 9.97 is stuck undersized for 61m, current state active+undersized+degraded, last acting [8,9,16] pg 9.98 is stuck undersized for 62m, current state active+undersized+degraded, last acting [6,21] pg 9.99 is stuck undersized for 61m, current state active+undersized+degraded, last acting [10,9] pg 9.9a is stuck undersized for 61m, current state active+undersized+remapped, last acting [4,16,10] pg 9.9b is stuck undersized for 61m, current state active+undersized+degraded, last acting [12,4,11] pg 9.9c is stuck undersized for 61m, current state active+undersized+degraded, last acting [9,16] pg 9.9d is stuck undersized for 62m, current state active+undersized+degraded, last acting [26,35] pg 9.9f is stuck undersized for 61m, current state active+undersized+degraded, last acting [9,17,26] pg 12.70 is stuck undersized for 62m, current state active+undersized+degraded, last acting [21,35] pg 12.71 is active+undersized+degraded, acting [6,12] pg 12.72 is stuck undersized for 61m, current state active+undersized+degraded, last acting [10,14,4] pg 12.73 is stuck undersized for 62m, current state active+undersized+remapped, last acting [5,17,11] 
pg 12.78 is stuck undersized for 61m, current state active+undersized+degraded, last acting [5,8,35]
pg 12.79 is stuck undersized for 61m, current state active+undersized+degraded, last acting [4,17]
pg 12.7a is stuck undersized for 62m, current state active+undersized+degraded, last acting [10,21]
pg 12.7b is stuck undersized for 62m, current state active+undersized+remapped, last acting [17,21,11]
pg 12.7c is stuck undersized for 62m, current state active+undersized+degraded, last acting [32,21,16]
pg 12.7d is stuck undersized for 61m, current state active+undersized+degraded, last acting [35,6,9]
pg 12.7e is stuck undersized for 61m, current state active+undersized+degraded, last acting [26,4]
pg 12.7f is stuck undersized for 61m, current state active+undersized+degraded, last acting [9,14]

It's no longer giving me the ssh key issues, but that hasn't done anything to improve my situation. When the machines came up with a different IP, did this somehow throw off some kind of ssh known_hosts file or pub key exchange? It's very strange that a momentary bad IP could wreak so much havoc.

Thank you
-jeremy

On Mon, Jul 25, 2022 at 1:44 PM Frank Schilder wrote:

> I don't use ceph-adm and I also don't know how you got the "some more
> info". However, I did notice that it contains instructions, starting at
> "Please make sure that the host is reachable ...". How about starting to
> follow those?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Jeremy Hansen
> Sent: 25 July 2022 22:32:32
> To: ceph-users@ceph.io
> Subject: [Warning Possible spam] [ceph-users] Re: Issues after a shutdown
>
> Here's some more info:
>
> HEALTH_WARN 2 failed cephadm daemon(s); 3 hosts fail cephadm check; 2
> filesystems are degraded; 1 MDSs report slow metadata IOs; 2/5 mons down,
> quorum cn02,cn03,cn01; 10 osds down; 3 hosts (17 osds) down; Reduced data
> availability: 13 pgs inactive, 9 pgs down; Degraded data redundancy:
> 8515690/30862245 objects degraded (27.593%), 326 pgs degraded, 447 pgs
> undersized
> [WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
>     daemon osd.3 on cn01.ceph is in error state
>     daemon osd.2 on cn01.ceph is in error state
> [WRN] CEPHADM_HOST_CHECK_FAILED: 3 hosts fail cephadm check
>     host cn04.ceph (192.168.30.14) failed check: Failed to connect to
>     cn04.ceph (192.168.30.14).
> Please make sure that the host is reachable and accepts connections using
> the cephadm SSH key
>
> To add the cephadm SSH key to the host:
>
> ceph cephadm get-pub-key > ~/ceph.pub
> ssh-copy-id -f -i ~/ceph.pub root@192.168.30.14
>
> To check that the hos
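The instructions quoted in the health output above amount to re-seeding the cephadm SSH key on each unreachable host. A sketch that loops them together (host IPs are the ones from this thread and root ssh is assumed; `ceph cephadm check-host` re-runs the reachability check per host):

```shell
# Re-distribute the cephadm public key to each unreachable host and re-check.
# Not invoked here; it needs a live cluster and ssh access.
redistribute_cephadm_key() {
    ceph cephadm get-pub-key > ~/ceph.pub
    for h in "$@"; do
        ssh-copy-id -f -i ~/ceph.pub "root@$h"   # seed the orchestrator's key
        ceph cephadm check-host "$h"             # confirm cephadm can now connect
    done
}

# Usage: redistribute_cephadm_key 192.168.30.14 192.168.30.15 192.168.30.16
```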
[ceph-users] Re: Issues after a shutdown
s stuck undersized for 34m, current state active+undersized+remapped, last acting [32,35,4] pg 9.78 is stuck undersized for 35m, current state active+undersized+degraded, last acting [14,10] pg 9.79 is stuck undersized for 35m, current state active+undersized+degraded, last acting [21,32] pg 9.7b is stuck undersized for 34m, current state active+undersized+degraded, last acting [8,12,5] pg 9.7c is stuck undersized for 34m, current state active+undersized+degraded, last acting [4,35,10] pg 9.7d is stuck undersized for 35m, current state active+undersized+degraded, last acting [5,19,10] pg 9.7e is stuck undersized for 35m, current state active+undersized+remapped, last acting [21,10,17] pg 9.80 is stuck undersized for 34m, current state active+undersized+degraded, last acting [8,4,17] pg 9.81 is stuck undersized for 35m, current state active+undersized+degraded, last acting [14,26] pg 9.82 is stuck undersized for 35m, current state active+undersized+degraded, last acting [26,16] pg 9.83 is stuck undersized for 34m, current state active+undersized+degraded, last acting [8,4] pg 9.84 is stuck undersized for 34m, current state active+undersized+degraded, last acting [4,35,6] pg 9.85 is stuck undersized for 35m, current state active+undersized+degraded, last acting [32,12,9] pg 9.86 is stuck undersized for 34m, current state active+undersized+degraded, last acting [35,5,8] pg 9.87 is stuck undersized for 35m, current state active+undersized+degraded, last acting [9,12] pg 9.88 is stuck undersized for 35m, current state active+undersized+remapped, last acting [19,32,35] pg 9.89 is stuck undersized for 34m, current state active+undersized+degraded, last acting [10,14,4] pg 9.8a is stuck undersized for 35m, current state active+undersized+degraded, last acting [21,19] pg 9.8b is stuck undersized for 34m, current state active+undersized+degraded, last acting [8,35] pg 9.8c is stuck undersized for 31m, current state active+undersized+remapped, last acting [10,19,5] pg 9.8d is 
stuck undersized for 35m, current state active+undersized+remapped, last acting [9,6] pg 9.8f is stuck undersized for 35m, current state active+undersized+remapped, last acting [19,26,17] pg 9.90 is stuck undersized for 35m, current state active+undersized+remapped, last acting [35,26] pg 9.91 is stuck undersized for 35m, current state active+undersized+degraded, last acting [17,5] pg 9.92 is stuck undersized for 35m, current state active+undersized+degraded, last acting [21,26] pg 9.93 is stuck undersized for 35m, current state active+undersized+remapped, last acting [19,26,5] pg 9.94 is stuck undersized for 35m, current state active+undersized+degraded, last acting [21,11] pg 9.95 is stuck undersized for 34m, current state active+undersized+degraded, last acting [8,19] pg 9.96 is stuck undersized for 35m, current state active+undersized+degraded, last acting [17,6] pg 9.97 is stuck undersized for 34m, current state active+undersized+degraded, last acting [8,9,16] pg 9.98 is stuck undersized for 35m, current state active+undersized+degraded, last acting [6,21] pg 9.99 is stuck undersized for 35m, current state active+undersized+degraded, last acting [10,9] pg 9.9a is stuck undersized for 34m, current state active+undersized+remapped, last acting [4,16,10] pg 9.9b is stuck undersized for 34m, current state active+undersized+degraded, last acting [12,4,11] pg 9.9c is stuck undersized for 35m, current state active+undersized+degraded, last acting [9,16] pg 9.9d is stuck undersized for 35m, current state active+undersized+degraded, last acting [26,35] pg 9.9f is stuck undersized for 35m, current state active+undersized+degraded, last acting [9,17,26] pg 12.70 is stuck undersized for 35m, current state active+undersized+degraded, last acting [21,35] pg 12.71 is active+undersized+degraded, acting [6,12] pg 12.72 is stuck undersized for 34m, current state active+undersized+degraded, last acting [10,14,4] pg 12.73 is stuck undersized for 35m, current state 
active+undersized+remapped, last acting [5,17,11] pg 12.78 is stuck undersized for 34m, current state active+undersized+degraded, last acting [5,8,35] pg 12.79 is stuck undersized for 34m, current state active+undersized+degraded, last acting [4,17] pg 12.7a is stuck undersized for 35m, current state active+undersized+degraded, last acting [10,21] pg 12.7b is stuck undersized for 35m, current state active+undersized+remapped, last acting [17,21,11] pg 12.7c is stuck undersized for 35m, current state active+undersized+degraded, last acting [32,21,16] pg 12.7d is stuck undersized for 35m, current state active+undersized+degraded, last acting [35,6,9] pg 12.7e is stuck undersized for 34m, current state active+undersized+degraded, last acting [26,4] pg 12.7f is stuck undersized for 35m, current state active+undersiz
[ceph-users] Re: Issues after a shutdown
thru 31627 down_at 31208 last_clean_interval [30974,31195) [v2: 192.168.30.12:6832/3860067997,v1:192.168.30.12:6833/3860067997] [v2: 192.168.30.12:6834/3860067997,v1:192.168.30.12:6835/3860067997] exists,up 9200a57e-2845-43ff-9787-8f1f3158fe90
osd.33 down in weight 1 up_from 30354 up_thru 30688 down_at 30693 last_clean_interval [25521,30350) [v2: 192.168.30.16:6842/2342555666,v1:192.168.30.16:6843/2342555666] [v2: 192.168.30.16:6844/2364555666,v1:192.168.30.16:6845/2364555666] exists 20c55d85-cf9a-4133-a189-7fdad2318f58
osd.34 down in weight 1 up_from 30390 up_thru 30688 down_at 30691 last_clean_interval [25516,30314) [v2: 192.168.30.16:6808/2282629870,v1:192.168.30.16:6811/2282629870] [v2: 192.168.30.16:6812/2282629870,v1:192.168.30.16:6814/2282629870] exists 77e0ef8f-c047-4f84-afb2-a8ad054e562f
osd.35 up in weight 1 up_from 31204 up_thru 31657 down_at 31203 last_clean_interval [30958,31195) [v2: 192.168.30.13:6842/1919357520,v1:192.168.30.13:6843/1919357520] [v2: 192.168.30.13:6844/1919357520,v1:192.168.30.13:6845/1919357520] exists,up 2d2de0cb-6d41-4957-a473-2bbe9ce227bf
osd.36 down in weight 1 up_from 29494 up_thru 30560 down_at 30688 last_clean_interval [25491,29492) [v2: 192.168.30.15:6816/2153321591,v1:192.168.30.15:6817/2153321591] [v2: 192.168.30.15:6842/2158321591,v1:192.168.30.15:6843/2158321591] exists 26114668-68b2-458b-89c2-cbad5507ab75

> > On Jul 25, 2022, at 3:29 AM, Jeremy Hansen <farnsworth.mcfad...@gmail.com> wrote:
> >
> > I transitioned some servers to a new rack and now I'm having major issues
> > with Ceph upon bringing things back up.
> >
> > I believe the issue may be related to the ceph nodes coming back up with
> > different IPs before VLANs were set. That's just a guess because I can't
> > think of any other reason this would happen.
> > > > Current state: > > > > Every 2.0s: ceph -s > > cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022 > > > > cluster: > >id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d > >health: HEALTH_WARN > >1 filesystem is degraded > >2 MDSs report slow metadata IOs > >2/5 mons down, quorum cn02,cn03,cn01 > >9 osds down > >3 hosts (17 osds) down > >Reduced data availability: 97 pgs inactive, 9 pgs down > >Degraded data redundancy: 13860144/30824413 objects degraded > > (44.965%), 411 pgs degraded, 482 pgs undersized > > > > services: > >mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, > > cn04 > >mgr: cn02.arszct(active, since 5m) > >mds: 2/2 daemons up, 2 standby > >osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs > > > > data: > >volumes: 1/2 healthy, 1 recovering > >pools: 8 pools, 545 pgs > >objects: 7.71M objects, 6.7 TiB > >usage: 15 TiB used, 39 TiB / 54 TiB avail > >pgs: 0.367% pgs unknown > > 17.431% pgs not active > > 13860144/30824413 objects degraded (44.965%) > > 1137693/30824413 objects misplaced (3.691%) > > 280 active+undersized+degraded > > 67 undersized+degraded+remapped+backfilling+peered > > 57 active+undersized+remapped > > 45 active+clean+remapped > > 44 active+undersized+degraded+remapped+backfilling > > 18 undersized+degraded+peered > > 10 active+undersized > > 9 down > > 7 active+clean > > 3 active+undersized+remapped+backfilling > > 2 active+undersized+degraded+remapped+backfill_wait > > 2 unknown > > 1 undersized+peered > > > > io: > >client: 170 B/s rd, 0 op/s rd, 0 op/s wr > >recovery: 168 MiB/s, 158 keys/s, 166 objects/s > > > > I have to disable and re-enable the dashboard just to use it. It seems > to > > get bogged down after a few moments. 
> > > > The three servers that were moved to the new rack Ceph has marked as > > "Down", but if I do a cephadm host-check, they all seem to pass: > > > > ceph > > - cn01.ceph.- > > podman (/usr/bin/podman) version 4.0.2 is present > > systemctl is present > > lvcreate is present > > Unit chronyd.service is enabled and running > > Host looks OK > > - cn02.ceph.- > > podman (/usr/bin/podman) version 4.0.2 is present > > systemctl is present > > lvcreate is present > > Unit chronyd.service is enabled a
[ceph-users] Issues after a shutdown
I transitioned some servers to a new rack and now I'm having major issues with Ceph upon bringing things back up.

I believe the issue may be related to the ceph nodes coming back up with different IPs before VLANs were set. That's just a guess because I can't think of any other reason this would happen.

Current state:

Every 2.0s: ceph -s    cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022

  cluster:
    id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
    health: HEALTH_WARN
            1 filesystem is degraded
            2 MDSs report slow metadata IOs
            2/5 mons down, quorum cn02,cn03,cn01
            9 osds down
            3 hosts (17 osds) down
            Reduced data availability: 97 pgs inactive, 9 pgs down
            Degraded data redundancy: 13860144/30824413 objects degraded
            (44.965%), 411 pgs degraded, 482 pgs undersized

  services:
    mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, cn04
    mgr: cn02.arszct(active, since 5m)
    mds: 2/2 daemons up, 2 standby
    osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs

  data:
    volumes: 1/2 healthy, 1 recovering
    pools:   8 pools, 545 pgs
    objects: 7.71M objects, 6.7 TiB
    usage:   15 TiB used, 39 TiB / 54 TiB avail
    pgs:     0.367% pgs unknown
             17.431% pgs not active
             13860144/30824413 objects degraded (44.965%)
             1137693/30824413 objects misplaced (3.691%)
             280 active+undersized+degraded
             67  undersized+degraded+remapped+backfilling+peered
             57  active+undersized+remapped
             45  active+clean+remapped
             44  active+undersized+degraded+remapped+backfilling
             18  undersized+degraded+peered
             10  active+undersized
             9   down
             7   active+clean
             3   active+undersized+remapped+backfilling
             2   active+undersized+degraded+remapped+backfill_wait
             2   unknown
             1   undersized+peered

  io:
    client:   170 B/s rd, 0 op/s rd, 0 op/s wr
    recovery: 168 MiB/s, 158 keys/s, 166 objects/s

I have to disable and re-enable the dashboard just to use it. It seems to get bogged down after a few moments.
Ceph has marked the three servers that were moved to the new rack as "Down", but if I do a cephadm host-check, they all seem to pass:

ceph
- cn01.ceph -
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn02.ceph -
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn03.ceph -
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn04.ceph -
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn05.ceph -
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn06.ceph -
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK

It seems to be recovering with what it has left, but a large number of OSDs are down. When trying to restart one of the downed OSDs, I see a huge dump.
Jul 25 03:19:38 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:19:38.532+ 7fce14a6c080 0 osd.34 30689 done with init, starting boot process
Jul 25 03:19:38 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:19:38.532+ 7fce14a6c080 1 osd.34 30689 start_boot
Jul 25 03:20:10 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:20:10.655+ 7fcdfd12d700 1 osd.34 30689 start_boot
Jul 25 03:20:41 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:20:41.159+ 7fcdfd12d700 1 osd.34 30689 start_boot
Jul 25 03:21:11 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:21:11.662+ 7fcdfd12d700 1 osd.34 30689 start_boot

At this point it just keeps printing start_boot, but the dashboard has it marked as "in" but "down". On these three hosts that moved, there were a bunch marked as "out" and "down", and some with "in" but "down".

Not sure where to go next. I'm going to let the recovery continue and hope that my 4x replication on these pools saves me. Not sure where to go from here. Any help is very much appreciated. This Ceph cluster holds all of our Cloudstack images... it would be terrible to lose this data.
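An OSD that loops on `start_boot` forever is usually one that cannot reach the monitors or its heartbeat peers on the addresses the cluster expects, which fits the bad-IPs-before-VLANs theory above. A hedged diagnostic sketch (OSD id and commands are illustrative; run from a node with admin keyring access):

```shell
# Compare the addresses the cluster has on record for a booting-but-down OSD
# against where the daemon actually lives now. Not invoked here; it needs a
# live cluster.
osd_boot_checks() {
    local id="$1"
    ceph osd dump | grep "^osd.${id} "          # v1/v2 addresses in the osdmap
    ceph config get "osd.${id}" public_network  # networks the OSD is told to bind to
    ceph config get "osd.${id}" cluster_network
}

# Usage: osd_boot_checks 34
```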
[ceph-users] Network issues with a CephFS client mount via a Cloudstack instance
I’m going to post this to the Cloudstack list as well.

Attempting to rsync a large file to the Ceph volume, the instance becomes unresponsive at the network level. It eventually returns, but it will continually drop offline as the file copies. Dmesg shows this on the Cloudstack host machine:

[ 7144.888744] e1000e :00:19.0 eno1: Detected Hardware Unit Hang:
  TDH <80>
  TDT
  next_to_use
  next_to_clean <7f>
  buffer_info[next_to_clean]:
  time_stamp <100686d46>
  next_to_watch <80>
  jiffies <100687140>
  next_to_watch.status <0>
  MAC Status <80083>
  PHY Status <796d>
  PHY 1000BASE-T Status <3800>
  PHY Extended Status <3000>
  PCI Status <10>
[ 7146.872563] e1000e :00:19.0 eno1: Detected Hardware Unit Hang:
  TDH <80>
  TDT
  next_to_use
  next_to_clean <7f>
  buffer_info[next_to_clean]:
  time_stamp <100686d46>
  next_to_watch <80>
  jiffies <100687900>
  next_to_watch.status <0>
  MAC Status <80083>
  PHY Status <796d>
  PHY 1000BASE-T Status <3800>
  PHY Extended Status <3000>
  PCI Status <10>
[ 7148.856703] e1000e :00:19.0 eno1: Detected Hardware Unit Hang:
  TDH <80>
  TDT
  next_to_use
  next_to_clean <7f>
  buffer_info[next_to_clean]:
  time_stamp <100686d46>
  next_to_watch <80>
  jiffies <1006880c0>
  next_to_watch.status <0>
  MAC Status <80083>
  PHY Status <796d>
  PHY 1000BASE-T Status <3800>
  PHY Extended Status <3000>
  PCI Status <10>
[ 7150.199756] e1000e :00:19.0 eno1: Reset adapter unexpectedly

The host machine:

System Information
  Manufacturer: Dell Inc.
  Product Name: OptiPlex 990

Running CentOS 8.4. I also see the same error on another host of a different hw type:

  Manufacturer: Hewlett-Packard
  Product Name: HP Compaq 8200 Elite SFF PC

but both are using e1000 drivers. I upgraded the kernel to 5.13.x and I thought this fixed the issue, but now I see the error again. Migrating the instance to a bigger server class machine (also e1000e, an old Rackable system) where I have a bigger pipe via bonding, I don't seem to have the issue.

Just curious if this could be a known bug with e1000e and if there is any kind of workaround.
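One workaround often suggested for e1000e "Detected Hardware Unit Hang" resets is to disable segmentation and receive offloads so the NIC stops stalling on large transmits under sustained load. This is an assumption to test, not a confirmed fix for this hardware:

```shell
# Commonly suggested e1000e mitigation (an assumption, not a guaranteed fix):
# turn off TSO/GSO/GRO on the flaky interface, then show the resulting state.
# The setting lasts until reboot; persist it via your distro's network config
# if it helps. Interface name taken from the dmesg output above.
e1000e_offloads_off() {
    local ifc="${1:-eno1}"
    ethtool -K "$ifc" tso off gso off gro off
    ethtool -k "$ifc" | grep -E 'segmentation-offload|generic-receive-offload'
}

# Usage (as root): e1000e_offloads_off eno1
```

The cost is higher CPU use per packet, which is usually an acceptable trade on a desktop-class host like the ones described above.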
Thanks
-jeremy

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Only 2/5 mon services running
It looks like the second mon server was down from my reboot. Restarted and everything is functional again, but I still can't figure out why 3 of the 5 mon servers are down and won't start. If they were all functioning, I probably wouldn't have noticed the cluster being down.

Thanks
-jeremy

> On Jun 7, 2021, at 7:53 PM, Jeremy Hansen wrote:
>
> Signed PGP part
>
> In an attempt to troubleshoot why only 2/5 mon services were running, I
> believe I've broken something:
>
> [ceph: root@cn01 /]# ceph orch ls
> NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> alertmanager                      1/1      81s ago    9d   count:1
> crash                             6/6      7m ago     9d   *
> grafana                           1/1      80s ago    9d   count:1
> mds.testfs                        2/2      81s ago    9d   cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
> mgr                               2/2      81s ago    9d   count:2
> mon                               2/5      81s ago    9d   count:5
> node-exporter                     6/6      7m ago     9d   *
> osd.all-available-devices         20/26    7m ago     9d   *
> osd.unmanaged                     7/7      7m ago     -
> prometheus                        2/2      80s ago    9d   count:2
>
> I tried to stop and start the mon service, but now the cluster is pretty much
> unresponsive, I'm assuming because I stopped mon:
>
> [ceph: root@cn01 /]# ceph orch stop mon
> Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
> Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
> Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
> Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
> Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
> [ceph: root@cn01 /]# ceph orch start mon
>
> ^CCluster connection aborted
>
> Now even after a reboot of the cluster, it's unresponsive. How do I get mon
> started again?
>
> I'm going through Ceph and breaking things left and right, so I apologize for
> all the questions. I learn best from breaking things and figuring out how to
> resolve the issues.
>
> Thank you
> -jeremy
[ceph-users] Only 2/5 mon services running
In an attempt to troubleshoot why only 2/5 mon services were running, I believe I've broken something:

[ceph: root@cn01 /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                      1/1      81s ago    9d   count:1
crash                             6/6      7m ago     9d   *
grafana                           1/1      80s ago    9d   count:1
mds.testfs                        2/2      81s ago    9d   cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
mgr                               2/2      81s ago    9d   count:2
mon                               2/5      81s ago    9d   count:5
node-exporter                     6/6      7m ago     9d   *
osd.all-available-devices         20/26    7m ago     9d   *
osd.unmanaged                     7/7      7m ago     -
prometheus                        2/2      80s ago    9d   count:2

I tried to stop and start the mon service, but now the cluster is pretty much unresponsive, I'm assuming because I stopped mon:

[ceph: root@cn01 /]# ceph orch stop mon
Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
[ceph: root@cn01 /]# ceph orch start mon

^CCluster connection aborted

Now even after a reboot of the cluster, it's unresponsive. How do I get mon started again?

I'm going through Ceph and breaking things left and right, so I apologize for all the questions. I learn best from breaking things and figuring out how to resolve the issues.

Thank you
-jeremy
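When `ceph orch` itself hangs because the mons are stopped, the orchestrator can no longer help: it needs a mon quorum to accept commands. The mon containers can still be started directly with systemd on each mon host, since cephadm deploys them as `ceph-<fsid>@mon.<shortname>.service` units. A sketch using the fsid that appears elsewhere in this archive (verify unit names with `systemctl list-units 'ceph-*'` on each host before relying on this):

```shell
# fsid of the cluster in this thread; substitute your own.
FSID="bfa2ad58-c049-11eb-9098-3c8cf8ed728d"

# Build the cephadm systemd unit name for a mon; cephadm registers mons
# by short hostname, so strip the domain.
mon_unit() {
    echo "ceph-${FSID}@mon.${1%%.*}.service"
}

# Not invoked here: start the mon on a given host over ssh (root assumed).
start_mon() {
    ssh "root@$1" systemctl start "$(mon_unit "$1")"
}

# Usage: start_mon cn01.ceph.la1.clx.corp
```

Once enough mons are up to form quorum, `ceph orch` commands become usable again.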
[ceph-users] Re: Global Recovery Event
This seems to have recovered on its own.

Thank you
-jeremy

> On Jun 7, 2021, at 5:44 PM, Neha Ojha wrote:
>
> On Mon, Jun 7, 2021 at 5:24 PM Jeremy Hansen <jer...@skidrow.la> wrote:
>>
>> I'm seeing this in my health status:
>>
>> progress:
>>   Global Recovery Event (13h)
>>     [] (remaining: 5w)
>>
>> I'm not sure how this was initiated but this is a cluster with almost zero
>> objects. Is there a way to halt this process? Why would it estimate 5
>> weeks to recover a cluster with almost zero data?
>
> You could be running into https://tracker.ceph.com/issues/49988. You
> can try to run "ceph progress clear" and see if that helps or just
> turn the progress module off and turn it back on.
>
> - Neha
>
>> [ceph: root@cn01 /]# ceph -s -w
>>   cluster:
>>     id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 2 daemons, quorum cn02,cn05 (age 13h)
>>     mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: cn02.arszct
>>     mds: 1/1 daemons up, 1 standby
>>     osd: 27 osds: 27 up (since 13h), 27 in (since 16h)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   3 pools, 65 pgs
>>     objects: 22.09k objects, 86 GiB
>>     usage:   261 GiB used, 98 TiB / 98 TiB avail
>>     pgs:     65 active+clean
>>
>>   progress:
>>     Global Recovery Event (13h)
>>       [] (remaining: 5w)
>>
>> Thanks
>> -jeremy
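The workaround quoted in the reply above (for the stale progress event tracked in issue 49988) can be sketched as a single step; this is just the suggestion from the thread wrapped up, not a general fix for all progress-bar oddities:

```shell
# Clear a stale "Global Recovery Event"; if clearing is not enough,
# bounce the mgr progress module as suggested in the thread.
# Not invoked here; it needs a live cluster.
reset_progress() {
    ceph progress clear || {
        ceph mgr module disable progress
        ceph mgr module enable progress
    }
}

# Usage: reset_progress
```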
[ceph-users] Global Recovery Event
I’m seeing this in my health status:

  progress:
    Global Recovery Event (13h)
      [] (remaining: 5w)

I’m not sure how this was initiated but this is a cluster with almost zero objects. Is there a way to halt this process? Why would it estimate 5 weeks to recover a cluster with almost zero data?

[ceph: root@cn01 /]# ceph -s -w
  cluster:
    id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum cn02,cn05 (age 13h)
    mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: cn02.arszct
    mds: 1/1 daemons up, 1 standby
    osd: 27 osds: 27 up (since 13h), 27 in (since 16h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 22.09k objects, 86 GiB
    usage:   261 GiB used, 98 TiB / 98 TiB avail
    pgs:     65 active+clean

  progress:
    Global Recovery Event (13h)
      [] (remaining: 5w)

Thanks
-jeremy
[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
cephadm rm-daemon --name osd.29 on the node with the stale daemon did the trick.
-jeremy

> On Jun 7, 2021, at 2:24 AM, Jeremy Hansen wrote:
>
> Signed PGP part
> So I found the failed daemon:
>
> [root@cn05 ~]# systemctl | grep 29
> ● ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.29.service
>       loaded failed failed    Ceph osd.29 for bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>
> But I’ve already replaced this OSD, so this is perhaps left over from a
> previous osd.29 on this host. How would I go about removing this cleanly and,
> more importantly, in a way that Ceph is aware of the change, therefore
> clearing the warning.
>
> Thanks
> -jeremy
>
>> On Jun 7, 2021, at 1:54 AM, Jeremy Hansen wrote:
>>
>> Signed PGP part
>> Thank you. So I see this:
>>
>> 2021-06-07T08:41:24.133493+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:44:37.650022+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:47:07.039405+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:51:00.094847+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)…
>>
>> Yet…
>>
>> ceph osd ls
>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 17 18 20 22 23 24 26 27 31 33 34
>>
>> So how would I approach fixing this?
>>
>>> On Jun 7, 2021, at 1:10 AM, 赵贺东 wrote:
>>>
>>> Hello Jeremy Hansen,
>>>
>>> try:
>>> ceph log last cephadm
>>>
>>> or see the files below
>>> /var/log/ceph/cephadm.log
>>>
>>>> On Jun 7, 2021, at 15:49, Jeremy Hansen wrote:
>>>>
>>>> What’s the proper way to track down where this error is coming from? Thanks.
>>>>
>>>> 6/7/21 12:40:00 AM [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>>>> 6/7/21 12:40:00 AM [WRN] Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
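For anyone hitting the same leftover-daemon state, the cleanup above as a sketch (run on the host carrying the stale systemd unit; the fsid shown is this cluster's — substitute your own from "ceph fsid"):

```shell
# Show what cephadm thinks is deployed on this host; the stale
# daemon appears with state "error"
cephadm ls

# Remove the stale daemon record and its systemd unit
cephadm rm-daemon --name osd.29 --fsid bfa2ad58-c049-11eb-9098-3c8cf8ed728d

# Back on an admin node, the CEPHADM_FAILED_DAEMON warning should clear shortly
ceph health detail
```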
[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
So I found the failed daemon:

[root@cn05 ~]# systemctl | grep 29
● ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.29.service
      loaded failed failed    Ceph osd.29 for bfa2ad58-c049-11eb-9098-3c8cf8ed728d

But I’ve already replaced this OSD, so this is perhaps left over from a previous
osd.29 on this host. How would I go about removing this cleanly and, more
importantly, in a way that Ceph is aware of the change, therefore clearing the
warning.

Thanks
-jeremy

> On Jun 7, 2021, at 1:54 AM, Jeremy Hansen wrote:
>
> Signed PGP part
> Thank you. So I see this:
>
> 2021-06-07T08:41:24.133493+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
> 2021-06-07T08:44:37.650022+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
> 2021-06-07T08:47:07.039405+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
> 2021-06-07T08:51:00.094847+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)…
>
> Yet…
>
> ceph osd ls
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 17 18 20 22 23 24 26 27 31 33 34
>
> So how would I approach fixing this?
>
>> On Jun 7, 2021, at 1:10 AM, 赵贺东 wrote:
>>
>> Hello Jeremy Hansen,
>>
>> try:
>> ceph log last cephadm
>>
>> or see the files below
>> /var/log/ceph/cephadm.log
>>
>>> On Jun 7, 2021, at 15:49, Jeremy Hansen wrote:
>>>
>>> What’s the proper way to track down where this error is coming from? Thanks.
>>>
>>> 6/7/21 12:40:00 AM [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>>> 6/7/21 12:40:00 AM [WRN] Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
Thank you. So I see this:

2021-06-07T08:41:24.133493+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
2021-06-07T08:44:37.650022+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
2021-06-07T08:47:07.039405+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
2021-06-07T08:51:00.094847+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)…

Yet…

ceph osd ls
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 17 18 20 22 23 24 26 27 31 33 34

So how would I approach fixing this?

> On Jun 7, 2021, at 1:10 AM, 赵贺东 wrote:
>
> Hello Jeremy Hansen,
>
> try:
> ceph log last cephadm
>
> or see the files below
> /var/log/ceph/cephadm.log
>
>> On Jun 7, 2021, at 15:49, Jeremy Hansen wrote:
>>
>> What’s the proper way to track down where this error is coming from? Thanks.
>>
>> 6/7/21 12:40:00 AM [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>> 6/7/21 12:40:00 AM [WRN] Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
[ceph-users] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
What’s the proper way to track down where this error is coming from? Thanks.

6/7/21 12:40:00 AM [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
6/7/21 12:40:00 AM [WRN] Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
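A sketch of the usual triage for this warning, pulled together from the replies in this thread (all standard commands):

```shell
# Which daemon is failing, and on which host?
ceph health detail
ceph orch ps            # failed daemons show up with status "error"

# Recent cephadm activity as logged by the mgr
ceph log last cephadm

# On the affected host, cephadm's local log
tail -n 100 /var/log/ceph/cephadm.log
```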
[ceph-users] HEALTH_WARN Reduced data availability: 33 pgs inactive
I’m trying to understand this situation:

ceph health detail
HEALTH_WARN Reduced data availability: 33 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 33 pgs inactive
    pg 1.0 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.0 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.2 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.3 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.4 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.5 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.6 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.7 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.8 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.9 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.a is stuck inactive for 20h, current state unknown, last acting []
    pg 2.b is stuck inactive for 20h, current state unknown, last acting []
    pg 2.c is stuck inactive for 20h, current state unknown, last acting []
    pg 2.d is stuck inactive for 20h, current state unknown, last acting []
    pg 2.e is stuck inactive for 20h, current state unknown, last acting []
    pg 2.f is stuck inactive for 20h, current state unknown, last acting []
    pg 2.10 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.11 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.12 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.13 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.14 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.15 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.16 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.17 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.18 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.19 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1a is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1b is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1c is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1d is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1e is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1f is stuck inactive for 20h, current state unknown, last acting []

[ceph: root@cn01 /]# date
Sat May 29 01:28:37 UTC 2021

[ceph: root@cn01 /]# ceph pg dump_stuck inactive
PG_STAT  STATE    UP  UP_PRIMARY  ACTING  ACTING_PRIMARY
2.1f     unknown  []          -1      []              -1
2.1e     unknown  []          -1      []              -1
2.1d     unknown  []          -1      []              -1
2.1c     unknown  []          -1      []              -1
2.1b     unknown  []          -1      []              -1
2.1a     unknown  []          -1      []              -1
2.19     unknown  []          -1      []              -1
2.18     unknown  []          -1      []              -1
2.17     unknown  []          -1      []              -1
2.16     unknown  []          -1      []              -1
2.15     unknown  []          -1      []              -1
2.14     unknown  []          -1      []              -1
2.13     unknown  []          -1      []              -1
2.12     unknown  []          -1      []              -1
2.11     unknown  []          -1      []              -1
2.10     unknown  []          -1      []              -1
2.f      unknown  []          -1      []              -1
2.9      unknown  []          -1      []              -1
2.b      unknown  []          -1      []              -1
2.c      unknown  []          -1      []              -1
2.e      unknown  []          -1      []              -1
2.a      unknown  []          -1      []              -1
2.d      unknown  []          -1      []              -1
2.8      unknown  []          -1      []              -1
2.7      unknown  []          -1      []              -1
2.6      unknown  []          -1      []              -1
2.5      unknown  []          -1      []              -1
2.0      unknown  []          -1      []              -1
1.0      unknown  []          -1      []              -1
2.3      unknown  []          -1      []              -1
2.1      unknown  []          -1      []              -1
2.2      unknown  []          -1      []              -1
2.4      unknown  []          -1      []              -1
ok

[ceph: root@cn01 /]# ceph pg 2.4 query
Couldn't parse JSON : Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1310, in
    retval = main()
  File "/usr/bin/ceph", line 1230, in main
    si
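PGs stuck "unknown" with an empty acting set mean no OSD is currently reporting them, which usually points at CRUSH being unable to place the PG (or the OSDs not being up/registered). A sketch of how I'd narrow it down, using only standard commands:

```shell
# Can CRUSH map the PG to any OSDs at all?
ceph pg map 2.4

# Do the pool's size and crush_rule make sense for the topology?
ceph osd pool ls detail
ceph osd crush rule dump

# Are the OSDs actually up and placed in the CRUSH tree?
ceph osd tree
```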
[ceph-users] Re: Remapping OSDs under a PG
:15.122042+
PG    OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED  UP             ACTING         SCRUB_STAMP                  DEEP_SCRUB_STAMP
2.10  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [10,17,7]p10   [10,17,7]p10   2021-05-28T00:46:38.770867+  2021-05-28T00:46:15.122042+
2.11  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [34,20,1]p34   [34,20,1]p34   2021-05-28T00:46:39.572906+  2021-05-28T00:46:15.122042+
2.12  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:56  [5,24,26]p5    [5,24,26]p5    2021-05-28T00:46:38.802818+  2021-05-28T00:46:15.122042+
2.13  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [21,35,3]p21   [21,35,3]p21   2021-05-28T00:46:39.517117+  2021-05-28T00:46:15.122042+
2.14  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [18,9,7]p18    [18,9,7]p18    2021-05-28T00:46:38.078800+  2021-05-28T00:46:15.122042+
2.15  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [7,14,34]p7    [7,14,34]p7    2021-05-28T00:46:38.748425+  2021-05-28T00:46:15.122042+
2.16  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [0,23,7]p0     [0,23,7]p0     2021-05-28T00:46:42.000503+  2021-05-28T00:46:15.122042+
2.17  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [21,5,11]p21   [21,5,11]p21   2021-05-28T00:46:46.515686+  2021-05-28T00:46:15.122042+
2.18  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [18,9,33]p18   [18,9,33]p18   2021-05-28T00:46:40.104875+  2021-05-28T00:46:15.122042+
2.19  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [13,23,4]p13   [13,23,4]p13   2021-05-28T00:46:38.739980+  2021-05-28T00:46:35.469823+
2.1a  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [3,23,28]p3    [3,23,28]p3    2021-05-28T00:46:41.549389+  2021-05-28T00:46:15.122042+
2.1b  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:56  [5,28,23]p5    [5,28,23]p5    2021-05-28T00:46:40.824368+  2021-05-28T00:46:15.122042+
2.1c  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [33,29,31]p33  [33,29,31]p33  2021-05-28T00:46:38.106675+  2021-05-28T00:46:15.122042+
2.1d  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [10,33,28]p10  [10,33,28]p10  2021-05-28T00:46:39.785338+  2021-05-28T00:46:15.122042+
2.1e  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [3,21,13]p3    [3,21,13]p3    2021-05-28T00:46:40.584803+  2021-05-28T00:46:40.584803+
2.1f  0  0  0  0  0  0  0  0  active+clean  21h  0'0  254:42  [22,7,34]p22   [22,7,34]p22   2021-05-28T00:46:38.061932+  2021-05-28T00:46:15.122042+

PG 1.0, which has all the objects, is still using osd.28, which is an SSD drive.

ceph pg ls-by-pool device_health_metrics
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED  UP             ACTING         SCRUB_STAMP                  DEEP_SCRUB_STAMP
1.0  41  0  0  0  0  0  0  71  active+clean  22h  205'71  253:484  [28,33,10]p28  [28,33,10]p28  2021-05-27T14:44:37.466384+  2021-05-26T04:23:11.758060+

Also, I attempted to add my “crush location”, and I believe I’m missing something
fundamental. It claims no change, but that doesn’t make sense, because I haven’t
previously specified this information:

ceph osd crush set osd.24 3.63869 root=default datacenter=la1 rack=rack1 host=cn06 room=room1 row=6
set item id 24 name 'osd.24' weight 3.63869 at location {datacenter=la1,host=cn06,rack=rack1,room=room1,root=default,row=6}: no change

My end goal is to create a crush map that is aware of two separate racks with
independent UPS power, to increase our availability in the event of power going
out on one of our racks.

Thank you
-jeremy

> On May 28, 2021, at 5:01 AM, Jeremy Hansen wrote:
>
> I’m continuing to read and it’s becoming more clear.
>
> The CRUSH map seems pretty amazing!
>
> -jeremy
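For the rack-aware end goal, a sketch of the usual sequence (bucket names "rack1"/"rack2" are illustrative; also note that "ceph osd crush set" appears to report "no change" when the OSD is already at an equivalent weight and location, so restructuring is normally done by moving buckets with "crush move" instead):

```shell
# Create rack buckets under the default root
ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack1 root=default
ceph osd crush move rack2 root=default

# Move whole hosts (their OSDs follow) under the racks
ceph osd crush move cn06 rack=rack1

# Replicated rule with rack as the failure domain
ceph osd crush rule create-replicated replicated_racks default rack

# Point a pool at the new rule
ceph osd pool set device_health_metrics crush_rule replicated_racks
```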
[ceph-users] Re: Remapping OSDs under a PG
I’m continuing to read and it’s becoming more clear.

The CRUSH map seems pretty amazing!

-jeremy

> On May 28, 2021, at 1:10 AM, Jeremy Hansen wrote:
>
> Thank you both for your response. So this leads me to the next question:
>
> ceph osd crush rule create-replicated <name> <root> <type> [<class>]
>
> What is <root> and <type> in this case?
>
> It also looks like this is responsible for things like “rack awareness” type
> attributes, which is something I’d like to utilize:
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> This is something I will eventually take advantage of as well.
>
> Thank you!
> -jeremy
>
>> On May 28, 2021, at 12:03 AM, Janne Johansson wrote:
>>
>> Create a crush rule that only chooses non-ssd drives, then
>>   ceph osd pool set <pool> crush_rule YourNewRuleName
>> and it will move over to the non-ssd OSDs.
>>
>> Den fre 28 maj 2021 kl 02:18 skrev Jeremy Hansen:
>>>
>>> I’m very new to Ceph, so if this question makes no sense, I apologize.
>>> Continuing to study, but I thought an answer to this question would help me
>>> understand Ceph a bit more.
>>>
>>> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
>>> for Ceph metrics. It looks like one of my SSD OSDs was allocated for the
>>> PG. I’d like to understand how to remap this PG so it’s not using the SSD
>>> OSDs.
>>>
>>> ceph pg map 1.0
>>> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>>>
>>> OSD 28 is the SSD.
>>>
>>> Is this possible? Does this make any sense? I’d like to reserve the SSDs
>>> for their own pool.
>>>
>>> Thank you!
>>> -jeremy
>>
>> --
>> May the most significant bit of your life be positive.
[ceph-users] Re: Remapping OSDs under a PG
Thank you both for your response. So this leads me to the next question:

ceph osd crush rule create-replicated <name> <root> <type> [<class>]

What is <root> and <type> in this case?

It also looks like this is responsible for things like “rack awareness” type
attributes, which is something I’d like to utilize:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

This is something I will eventually take advantage of as well.

Thank you!
-jeremy

> On May 28, 2021, at 12:03 AM, Janne Johansson wrote:
>
> Create a crush rule that only chooses non-ssd drives, then
>   ceph osd pool set <pool> crush_rule YourNewRuleName
> and it will move over to the non-ssd OSDs.
>
> Den fre 28 maj 2021 kl 02:18 skrev Jeremy Hansen:
>>
>> I’m very new to Ceph, so if this question makes no sense, I apologize.
>> Continuing to study, but I thought an answer to this question would help me
>> understand Ceph a bit more.
>>
>> Using cephadm, I set up a cluster. Cephadm automatically creates a pool for
>> Ceph metrics. It looks like one of my SSD OSDs was allocated for the PG.
>> I’d like to understand how to remap this PG so it’s not using the SSD OSDs.
>>
>> ceph pg map 1.0
>> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>>
>> OSD 28 is the SSD.
>>
>> Is this possible? Does this make any sense? I’d like to reserve the SSDs
>> for their own pool.
>>
>> Thank you!
>> -jeremy
>
> --
> May the most significant bit of your life be positive.
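In the create-replicated signature, <root> is the CRUSH subtree to draw OSDs from (usually "default") and <type> is the failure-domain bucket type (host, rack, ...). A sketch; the rule name "only_hdd" is illustrative, and the default rule in a cephadm cluster is typically named "replicated_rule":

```shell
# List and inspect the stock rule for reference
ceph osd crush rule ls
ceph osd crush rule dump replicated_rule

# New rule: draw from root "default", spread across hosts,
# restricted to the "hdd" device class (the class argument is optional)
ceph osd crush rule create-replicated only_hdd default host hdd
```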
[ceph-users] Remapping OSDs under a PG
I’m very new to Ceph, so if this question makes no sense, I apologize.
Continuing to study, but I thought an answer to this question would help me
understand Ceph a bit more.

Using cephadm, I set up a cluster. Cephadm automatically creates a pool for
Ceph metrics. It looks like one of my SSD OSDs was allocated for the PG. I’d
like to understand how to remap this PG so it’s not using the SSD OSDs.

ceph pg map 1.0
osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]

OSD 28 is the SSD.

Is this possible? Does this make any sense? I’d like to reserve the SSDs for
their own pool.

Thank you!
-jeremy
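The approach the replies in this thread converge on, sketched end-to-end (the rule name is illustrative; "device_health_metrics" is the cephadm-created metrics pool discussed above):

```shell
# Device class per OSD shows in the CLASS column (hdd/ssd)
ceph osd tree

# Rule restricted to hdd-class OSDs, failure domain = host
ceph osd crush rule create-replicated replicated_hdd default host hdd

# Repoint the metrics pool; pg 1.0 should remap away from the SSD (osd.28)
ceph osd pool set device_health_metrics crush_rule replicated_hdd
ceph pg map 1.0
```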