Re: [ceph-users] Need to replace OSD. How do I find physical disk
On the host with the OSD, run: ceph-volume lvm list

From: "☣Adam"
To: ceph-users@lists.ceph.com
Date: 07/18/2019 03:25 PM
Subject: [EXTERNAL] Re: [ceph-users] Need to replace OSD. How do I find physical disk
Sent by: "ceph-users"

The block device can be found in /var/lib/ceph/osd/ceph-$ID/block

# ls -l /var/lib/ceph/osd/ceph-9/block

In my case it links to /dev/sdbvg/sdb, which makes it pretty obvious which drive this is, but the volume group and logical volume could be named anything. To see what physical disk(s) make up this volume group, use lsblk (as Reed suggested):

# lsblk

If that drive needs to be located in a computer with many drives, smartctl can be used to pull the make, model, and serial number:

# smartctl -i /dev/sdb

I was not aware of ceph-volume, or `ceph-disk list` (which is apparently now deprecated in favor of ceph-volume), so thank you to all in this thread for teaching me about alternative (arguably more proper) ways of doing this. :-)

On 7/18/19 12:58 PM, Pelletier, Robert wrote:
> How do I find the physical disk in a Ceph Luminous cluster in order to
> replace it? osd.9 is down in my cluster, which resides on the ceph-osd1 host.
>
> If I run lsblk -io KNAME,TYPE,SIZE,MODEL,SERIAL I can get the serial
> numbers of all the physical disks, for example:
>
> sdb disk 1.8T ST2000DM001-1CH1 Z1E5VLRG
>
> But how do I find out which OSD is mapped to sdb, and so on?
> When I run df -h I get this:
>
> [root@ceph-osd1 ~]# df -h
> Filesystem                   Size  Used Avail Use% Mounted on
> /dev/mapper/ceph--osd1-root   19G  1.9G   17G  10% /
> devtmpfs                      48G     0   48G   0% /dev
> tmpfs                         48G     0   48G   0% /dev/shm
> tmpfs                         48G  9.3M   48G   1% /run
> tmpfs                         48G     0   48G   0% /sys/fs/cgroup
> /dev/sda3                    947M  232M  716M  25% /boot
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-2
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-5
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-0
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-8
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-7
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-33
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-10
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-1
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-38
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-4
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-6
> tmpfs                        9.5G     0  9.5G   0% /run/user/0
>
> *Robert Pelletier, IT and Security Specialist*
> Eastern Maine Community College
> (207) 974-4782 | 354 Hogan Rd., Bangor, ME 04401

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
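Putting the suggestions in this thread together, a rough end-to-end sketch for tracing an OSD back to a physical drive — the OSD id 9 and /dev/sdb below are only examples from this thread, adjust to your host:

# ceph-volume lvm list                     (per-OSD block LV and the "devices" entry underneath it)
# ls -l /var/lib/ceph/osd/ceph-9/block     (the block symlink points at that logical volume)
# lsblk -o NAME,TYPE,SIZE,MODEL,SERIAL     (maps the LV back to its parent disk, with model and serial)
# smartctl -i /dev/sdb                     (confirm make, model and serial before pulling the drive)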
Re: [ceph-users] Client admin socket for RBD
Sasha,

Sorry, I don't get it. The documentation for the command states that in order to see the config DB for all, do: "ceph config dump". To see what's in the config DB for a particular daemon, do: "ceph config get <daemon>". To see what's set for a particular daemon (be it from the config db, override, conf file, etc.), do: "ceph config show <daemon>". I don't see anywhere that the command you mentioned is valid: "ceph config get client.admin"

Here is output from a monitor node on bare metal:

root@hostmonitor1:~# ceph config dump
WHO              MASK LEVEL    OPTION                    VALUE RO
mon.hostmonitor1      advanced mon_osd_down_out_interval 30
mon.hostmonitor1      advanced mon_osd_min_in_ratio      0.10
mgr                   unknown  mgr/balancer/active       1     *
mgr                   unknown  mgr/balancer/mode         upmap *
osd.*                 advanced debug_ms                  20/20
osd.*                 advanced osd_max_backfills         2

root@hostmonitor1:~# ceph config get mon.hostmonitor1
WHO              MASK LEVEL    OPTION                    VALUE RO
mon.hostmonitor1      advanced mon_osd_down_out_interval 30
mon.hostmonitor1      advanced mon_osd_min_in_ratio      0.10

root@hostmonitor1:~# ceph config get client.admin
WHO MASK LEVEL OPTION VALUE RO   <- blank

What am I missing from what you're suggesting?

Thank you for clarifying,

Tarek Zegar
Senior SDS Engineer
Email tze...@us.ibm.com
Mobile 630.974.7172

From: Sasha Litvak
To: Tarek Zegar, ceph-users@lists.ceph.com
Date: 06/25/2019 10:38 AM
Subject: [EXTERNAL] Re: Re: Re: [ceph-users] Client admin socket for RBD

Tarek,

Of course you are correct about the client nodes. I executed this command inside of the container that runs a mon, but it can also be done on the bare-metal node that runs a mon. You are essentially querying the mon configuration database.

On Tue, Jun 25, 2019 at 8:53 AM Tarek Zegar wrote:

"config get" on a client.admin? There is no daemon for client.admin, I get nothing. Can you please explain?

Tarek Zegar
Senior SDS Engineer
Email tze...@us.ibm.com
Mobile 630.974.7172

From: Sasha Litvak
To: Tarek Zegar
Date: 06/24/2019 07:48 PM
Subject: [EXTERNAL] Re: Re: [ceph-users] Client admin socket for RBD

ceph config get client.admin

On Mon, Jun 24, 2019, 1:10 PM Tarek Zegar wrote:

Alex,

Sorry, real quick: what did you type to get that last bit of info?

Tarek Zegar
Senior SDS Engineer
Email tze...@us.ibm.com
Mobile 630.974.7172

From: Alex Litvak
To: ceph-users@lists.ceph.com
Cc: ceph-users <public-ceph-users-idqoxfivofjgjs9i8mt...@plane.gmane.org>
Date: 06/24/2019 01:07 PM
Subject: [EXTERNAL] Re: [ceph-users] Client admin socket for RBD
Sent by: "ceph-users"

Jason,

Here you go:

WHO    MASK LEVEL    OPTION             VALUE                         RO
client      advanced admin_socket       /var/run/ceph/$name.$pid.asok *
global      advanced cluster_network    10.0.42.0/23                  *
global      advanced debug_asok         0/0
global      advanced debug_auth         0/0
global      advanced debug_bdev         0/0
global      advanced debug_bluefs       0/0
global      advanced debug_bluestore    0/0
global      advanced debug_buffer       0/0
global      advanced debug_civetweb     0/0
global      advanced debug_client       0/0
global      advanced debug_compressor   0/0
global      advanced debug_context      0/0
global      advanced debug_crush        0/0
global      advanced debug_crypto       0/0
global      advanced debug_dpdk         0/0
global      advanced debug_eventtrace   0/0
global      advanced debug_filer        0/0
global      advanced debug_filestore    0/0
global      advanced debug_finisher     0/0
global      advanced debug_fuse         0/0
global      advanced debug_heartbeatmap 0/0
global      advanced debug_javaclient   0/0
global      advanced debug_journal      0/0
global      advanced debug_journaler    0/0
global      advanced debug_kinetic      0/0
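For reference, the distinction Tarek is drawing maps to three different commands (osd.0 below is just an example daemon name; "config show" only works against a daemon that is actually running and reporting to the mgr):

# ceph config dump          (everything stored centrally in the mon config database)
# ceph config get osd.0     (the config-database entries that would apply to that daemon)
# ceph config show osd.0    (what the running daemon is actually using: config db + ceph.conf + overrides)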
Re: [ceph-users] Enable buffered write for bluestore
http://docs.ceph.com/docs/master/rbd/rbd-config-ref/

From: Trilok Agarwal
To: ceph-users@lists.ceph.com
Date: 06/12/2019 07:31 PM
Subject: [EXTERNAL] [ceph-users] Enable buffered write for bluestore
Sent by: "ceph-users"

Hi,
How can we enable bluestore_default_buffered_write using the ceph-conf utility?
Any pointers would be appreciated.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
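In case it helps, a minimal sketch of actually setting that option — assuming a Mimic-or-newer cluster where the centralized config database is in use; on older releases the equivalent is an [osd] entry in ceph.conf followed by an OSD restart. The option changes how BlueStore caches writes, so test the performance impact before rolling it out widely:

# ceph config set osd bluestore_default_buffered_write true
# ceph config dump | grep buffered_write     (verify the entry landed in the config db)

ceph.conf equivalent:
[osd]
bluestore_default_buffered_write = true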
Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size
Hi Huang,

So you are suggesting that even though osd.4 in this case has weight 0, it's still getting new data written to it? I find that counter to what weight 0 means.

Thanks
Tarek

From: huang jun
To: Tarek Zegar
Cc: Paul Emmerich, Ceph Users
Date: 06/08/2019 05:27 AM
Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

I think the written data will also go to osd.4 in this case, because your osd.4 is not down, so Ceph doesn't think the PG has any OSD down, and it will replicate the data to all OSDs in the acting/backfill set.

Tarek Zegar wrote on Fri, Jun 7, 2019 at 10:37 PM:

Paul / All

I'm not sure what warning you are referring to, I'm on Nautilus. The point I'm getting at is: if you weight out all OSDs on a host in a cluster of 3 OSD hosts with 3 OSDs each (crush rule = host), then write to the cluster, it *should* imo not just say remapped but undersized / degraded. See below, 1 out of the 3 OSD hosts has ALL of its OSDs marked out and weight = 0. When you write (say using FIO), the PGs *only* have 2 OSDs in them (UP set), which is pool min size. I don't understand why it's not saying undersized/degraded, this seems like a bug. Who cares that the acting set has the 3 original OSDs in it, the actual data is only on 2 OSDs, which is a degraded state.

root@hostadmin:~# ceph -s
  cluster:
    id: 33d41932-9df2-40ba-8e16-8dedaa4b3ef6
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
  services:
    mon: 1 daemons, quorum hostmonitor1 (age 29m)
    mgr: hostmonitor1(active, since 31m)
    osd: 9 osds: 9 up, 6 in; 100 remapped pgs
  data:
    pools:   1 pools, 100 pgs
    objects: 520 objects, 2.0 GiB
    usage:   15 GiB used, 75 GiB / 90 GiB avail
    pgs:     520/1560 objects misplaced (33.333%)
             100 active+clean+remapped

root@hostadmin:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-3       0.02939     host hostosd1
 0   hdd 0.00980         osd.0         up  1.0  1.0
 3   hdd 0.00980         osd.3         up  1.0  1.0
 6   hdd 0.00980         osd.6         up  1.0  1.0
-5       0.02939     host hostosd2
 1   hdd 0.00980         osd.1         up        0  1.0
 4   hdd 0.00980         osd.4         up        0  1.0
 7   hdd 0.00980         osd.7         up        0  1.0
-7       0.02939     host hostosd3
 2   hdd 0.00980         osd.2         up  1.0  1.0
 5   hdd 0.00980         osd.5         up  1.0  1.0
 8   hdd 0.00980         osd.8         up  1.0  1.0

root@hostadmin:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 0   hdd 0.00980  1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48 1.03  34     up
 3   hdd 0.00980  1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48 1.03  36     up
 6   hdd 0.00980  1.0 10 GiB 1.6 GiB 593 MiB  4 KiB 1024 MiB 8.4 GiB 15.80 0.93  30     up
 1   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0   0     up
 4   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0   0     up
 7   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0 100     up
 2   hdd 0.00980  1.0 10 GiB 1.5 GiB 525 MiB  8 KiB 1024 MiB 8.5 GiB 15.13 0.89  20     up
 5   hdd 0.00980  1.0 10 GiB 1.9 GiB 941 MiB  4 KiB 1024 MiB 8.1 GiB 19.20 1.13  43     up
 8   hdd 0.00980  1.0 10 GiB 1.6 GiB 657 MiB  8 KiB 1024 MiB 8.4 GiB 16.42 0.97  37     up
                    TOTAL 90 GiB  15 GiB 6.2 GiB 61 KiB  9.0 GiB  75 GiB 16.92
MIN/MAX VAR: 0.89/1.13 STDDEV: 1.32

Tarek Zegar
Senior SDS Engineer
Email tze...@us.ibm.com
Mobile 630.974.7172

From: Paul Emmerich
To: Tarek Zegar
Cc: Ceph Users
Date: 06/07/2019 05:25 AM
Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

remapped no longer triggers a health warning in nautilus. Your data is still there, it's just on the wrong OSD if that OSD is still up and running.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Jun 6, 2019 at 10:48 PM Tarek Zegar wrote:

For testing purposes I set a bunch of OSDs to 0 weight, this correctly forces Ceph to not use said OSDs. I took enough out such that the UP set only had pool min size # of OSDs (i.e. 2 OSDs). Two questions:
1. Why doesn't the acting set eventually match the UP set and simply point to [6,5] only?
2. Why are none of the PGs marked as undersized and degraded? The data is only hosted on 2 OSDs rather than pool size (3), I would expect an undersized warning and degraded for PGs with data.

Example PG: PG 1.4d active+clean+remapped UP= [6,5] Acting = [6,5,4]

OSD Tree:
ID CLASS WEIGHT TYPE NAME STATUS
Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size
Paul / All

I'm not sure what warning you are referring to, I'm on Nautilus. The point I'm getting at is: if you weight out all OSDs on a host in a cluster of 3 OSD hosts with 3 OSDs each (crush rule = host), then write to the cluster, it *should* imo not just say remapped but undersized / degraded. See below, 1 out of the 3 OSD hosts has ALL of its OSDs marked out and weight = 0. When you write (say using FIO), the PGs *only* have 2 OSDs in them (UP set), which is pool min size. I don't understand why it's not saying undersized/degraded, this seems like a bug. Who cares that the acting set has the 3 original OSDs in it, the actual data is only on 2 OSDs, which is a degraded state.

root@hostadmin:~# ceph -s
  cluster:
    id: 33d41932-9df2-40ba-8e16-8dedaa4b3ef6
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
  services:
    mon: 1 daemons, quorum hostmonitor1 (age 29m)
    mgr: hostmonitor1(active, since 31m)
    osd: 9 osds: 9 up, 6 in; 100 remapped pgs
  data:
    pools:   1 pools, 100 pgs
    objects: 520 objects, 2.0 GiB
    usage:   15 GiB used, 75 GiB / 90 GiB avail
    pgs:     520/1560 objects misplaced (33.333%)
             100 active+clean+remapped

root@hostadmin:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-3       0.02939     host hostosd1
 0   hdd 0.00980         osd.0         up  1.0  1.0
 3   hdd 0.00980         osd.3         up  1.0  1.0
 6   hdd 0.00980         osd.6         up  1.0  1.0
-5       0.02939     host hostosd2
 1   hdd 0.00980         osd.1         up        0  1.0
 4   hdd 0.00980         osd.4         up        0  1.0
 7   hdd 0.00980         osd.7         up        0  1.0
-7       0.02939     host hostosd3
 2   hdd 0.00980         osd.2         up  1.0  1.0
 5   hdd 0.00980         osd.5         up  1.0  1.0
 8   hdd 0.00980         osd.8         up  1.0  1.0

root@hostadmin:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 0   hdd 0.00980  1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48 1.03  34     up
 3   hdd 0.00980  1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48 1.03  36     up
 6   hdd 0.00980  1.0 10 GiB 1.6 GiB 593 MiB  4 KiB 1024 MiB 8.4 GiB 15.80 0.93  30     up
 1   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0   0     up
 4   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0   0     up
 7   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0 100     up
 2   hdd 0.00980  1.0 10 GiB 1.5 GiB 525 MiB  8 KiB 1024 MiB 8.5 GiB 15.13 0.89  20     up
 5   hdd 0.00980  1.0 10 GiB 1.9 GiB 941 MiB  4 KiB 1024 MiB 8.1 GiB 19.20 1.13  43     up
 8   hdd 0.00980  1.0 10 GiB 1.6 GiB 657 MiB  8 KiB 1024 MiB 8.4 GiB 16.42 0.97  37     up
                    TOTAL 90 GiB  15 GiB 6.2 GiB 61 KiB  9.0 GiB  75 GiB 16.92
MIN/MAX VAR: 0.89/1.13 STDDEV: 1.32

Tarek Zegar
Senior SDS Engineer
Email tze...@us.ibm.com
Mobile 630.974.7172

From: Paul Emmerich
To: Tarek Zegar
Cc: Ceph Users
Date: 06/07/2019 05:25 AM
Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

remapped no longer triggers a health warning in nautilus. Your data is still there, it's just on the wrong OSD if that OSD is still up and running.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Jun 6, 2019 at 10:48 PM Tarek Zegar wrote:

For testing purposes I set a bunch of OSDs to 0 weight, this correctly forces Ceph to not use said OSDs. I took enough out such that the UP set only had pool min size # of OSDs (i.e. 2 OSDs). Two questions:
1. Why doesn't the acting set eventually match the UP set and simply point to [6,5] only?
2. Why are none of the PGs marked as undersized and degraded? The data is only hosted on 2 OSDs rather than pool size (3), I would expect an undersized warning and degraded for PGs with data.

Example PG: PG 1.4d active+clean+remapped UP= [6,5] Acting = [6,5,4]

OSD Tree:
ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-3       0.02939     host hostosd1
 0   hdd 0.00980         osd.0         up  1.0  1.0
 3   hdd 0.00980         osd.3         up  1.0  1.0
 6   hdd 0.00980         osd.6         up  1.0  1.0
-5       0.02939     host hostosd2
 1   hdd 0.00980         osd.1         up        0  1.0
 4   hdd 0.00980         osd.4         up        0  1.0
 7   hdd 0.00980         osd.7         up        0  1.0
-7       0.02939     host hostosd3
 2   hdd 0.00980         osd.2         up  1.0  1.0
 5   hdd 0.00980         osd.5         up  1.0  1.0
 8   hdd 0.00980         osd.8         up        0  1.0

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com
[ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size
For testing purposes I set a bunch of OSDs to 0 weight, this correctly forces Ceph to not use said OSDs. I took enough out such that the UP set only had pool min size # of OSDs (i.e. 2 OSDs). Two questions:
1. Why doesn't the acting set eventually match the UP set and simply point to [6,5] only?
2. Why are none of the PGs marked as undersized and degraded? The data is only hosted on 2 OSDs rather than pool size (3), I would expect an undersized warning and degraded for PGs with data.

Example PG: PG 1.4d active+clean+remapped UP= [6,5] Acting = [6,5,4]

OSD Tree:
ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-3       0.02939     host hostosd1
 0   hdd 0.00980         osd.0         up  1.0  1.0
 3   hdd 0.00980         osd.3         up  1.0  1.0
 6   hdd 0.00980         osd.6         up  1.0  1.0
-5       0.02939     host hostosd2
 1   hdd 0.00980         osd.1         up        0  1.0
 4   hdd 0.00980         osd.4         up        0  1.0
 7   hdd 0.00980         osd.7         up        0  1.0
-7       0.02939     host hostosd3
 2   hdd 0.00980         osd.2         up  1.0  1.0
 5   hdd 0.00980         osd.5         up  1.0  1.0
 8   hdd 0.00980         osd.8         up        0  1.0

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
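A quick way to see the up-vs-acting split this thread is about (1.4d is just the example PG from above; "up" is where CRUSH wants the data in the current map, "acting" is the set of OSDs actually serving it right now, which is why the old OSD lingers in acting until backfill completes):

# ceph pg map 1.4d                          (prints the up set and acting set for one PG)
# ceph pg dump pgs_brief | grep remapped    (lists every PG whose up set differs from its acting set)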
Re: [ceph-users] Fix scrub error in bluestore.
Look here: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent

A read error is typically a disk issue. The doc is not clear on how to resolve that.

From: Alfredo Rezinovsky
To: Ceph Users
Date: 06/06/2019 10:58 AM
Subject: [EXTERNAL] [ceph-users] Fix scrub error in bluestore.
Sent by: "ceph-users"

https://ceph.com/geen-categorie/ceph-manually-repair-object/ is a little outdated. After stopping the OSD and flushing the journal, I don't have any clue on how to move the object (easy in filestore). I have this in my OSD log:

2019-06-05 10:46:41.418 7f47d0502700 -1 log_channel(cluster) log [ERR] : 10.c5 shard 2 soid 10:a39e2c78:::183f81f.0001:head : candidate had a read error

How can I fix it?

--
Alfrenovsky

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
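For what it's worth, on a replicated BlueStore pool the usual first step for a "candidate had a read error" inconsistency is to let Ceph rewrite the bad copy rather than moving objects by hand; roughly (the PG id comes from the log line above, and do check the disk first, since a read error often means the drive is failing):

# ceph pg repair 10.c5
# ceph health detail     (the scrub-error / inconsistent flags should clear once the repair scrub completes)
# smartctl -a /dev/sdX   (sdX = the disk behind shard 2's OSD; plan a replacement if it shows pending or reallocated sectors)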
Re: [ceph-users] Balancer: uneven OSDs
Hi Oliver,

Here is the output of the active mgr log after I toggled the balancer off / on; I grep'd for "balancer" only, as the full log was far too verbose (see below). When I look at ceph osd df I see it optimized :) I would like to understand two things, however: why is "prepared 0/10 changes" zero if it actually did something, and what could I have looked for in the log, before I toggled it, that basically said "hey, the balancer isn't going to work because I still think min-client-compat-level < luminous"?

Thanks for helping me get this working!

root@hostmonitor1:/var/log/ceph# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL   %USE  VAR  PGS
 1   hdd 0.00980        0    0 B     0 B     0 B     0    0   0
 3   hdd 0.00980  1.0 10 GiB 5.3 GiB 4.7 GiB 53.25 0.97 150
 6   hdd 0.00980  1.0 10 GiB 5.6 GiB 4.4 GiB 56.07 1.03 150
 0   hdd 0.00980        0    0 B     0 B     0 B     0    0   0
 5   hdd 0.00980  1.0 10 GiB 5.7 GiB 4.3 GiB 56.97 1.04 151
 7   hdd 0.00980  1.0 10 GiB 5.2 GiB 4.8 GiB 52.35 0.96 149
 2   hdd 0.00980        0    0 B     0 B     0 B     0    0   0
 4   hdd 0.00980  1.0 10 GiB 5.5 GiB 4.5 GiB 55.25 1.01 150
 8   hdd 0.00980  1.0 10 GiB 5.4 GiB 4.6 GiB 54.07 0.99 150
                    TOTAL 70 GiB  34 GiB  36 GiB 54.66
MIN/MAX VAR: 0.96/1.04 STDDEV: 1.60

2019-05-29 17:06:49.324 7f40ce42a700 0 log_channel(audit) log [DBG] : from='client.11262 192.168.0.12:0/4104979884' entity='client.admin' cmd= [{"prefix": "balancer off", "target": ["mgr", ""]}]: dispatch
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer status'
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer mode'
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer on'
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer off'
2019-05-29 17:06:49.324 7f40cec2b700 1 mgr[balancer] Handling command: '{'prefix': 'balancer off', 'target': ['mgr', '']}'
2019-05-29 17:06:49.388 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/max_misplaced:.50
2019-05-29 17:06:49.388 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/mode:upmap
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/active
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/begin_time
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/end_time
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
2019-05-29 17:06:54.279 7f40ce42a700 4 mgr.server handle_command prefix=balancer on
2019-05-29 17:06:54.279 7f40ce42a700 0 log_channel(audit) log [DBG] : from='client.11268 192.168.0.12:0/1339099349' entity='client.admin' cmd= [{"prefix": "balancer on", "target": ["mgr", ""]}]: dispatch
2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer status'
2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer mode'
2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer on'
2019-05-29 17:06:54.279 7f40cec2b700 1 mgr[balancer] Handling command: '{'prefix': 'balancer on', 'target': ['mgr', '']}'
2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/active:1
2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/max_misplaced:.50
2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/mode:upmap
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/active
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/begin_time
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/end_time
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] Optimize plan auto_2019-05-29_17:06:54
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/mode
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/max_misplaced
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] Mode upmap, max misplaced 0.50
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] do_upmap
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_iterations
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_deviation
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] pools ['rbd']
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] prepared 0/10 changes

From: Oliver Freyermuth
To: Tarek Zegar
Cc: ceph-users@lists.ceph.com
Date: 05/29/2019 11
Re: [ceph-users] Balancer: uneven OSDs
Hi Oliver,

Thank you for the response. I did ensure that the min-compat-client level is indeed Luminous (see below). I have no kernel-mapped RBD clients, and ceph versions reports Mimic. Also below is the output of ceph balancer status. One thing to note: I did enable the balancer after I had already filled the cluster, not from the onset. I had hoped that wouldn't matter, though your comment "if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles" leads me to believe that it will *not* work this way. Please confirm, and let me know what message to look for in /var/log/ceph. Thank you!

root@hostadmin:~# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

root@hostadmin:~# ceph features
{
    "mon": [
        { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 3 }
    ],
    "osd": [
        { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 7 }
    ],
    "client": [
        { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 1 }
    ],
    "mgr": [
        { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 3 }
    ]
}

From: Oliver Freyermuth
To: ceph-users@lists.ceph.com
Date: 05/29/2019 11:13 AM
Subject: [EXTERNAL] Re: [ceph-users] Balancer: uneven OSDs
Sent by: "ceph-users"

Hi Tarek,

what's the output of "ceph balancer status"?
In case you are using "upmap" mode, you must make sure to have a min-client-compat-level of at least Luminous:
http://docs.ceph.com/docs/mimic/rados/operations/upmap/
Of course, please be aware that your clients must be recent enough (especially for kernel clients).
Sadly, if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles, but no error on the terminal when activating the balancer or any other kind of erroneous / health condition.

Cheers,
Oliver

On 29.05.19 at 17:52, Tarek Zegar wrote:
> Can anyone help with this? Why can't I optimize this cluster, the pg counts and data distribution is way off.
>
> I enabled the balancer plugin and even tried to manually invoke it but it won't allow any changes. Looking at ceph osd df, it's not even at all. Thoughts?
>
> root@hostadmin:~# ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE   USE      AVAIL    %USE  VAR  PGS
>  1   hdd 0.00980        0    0 B      0 B      0 B     0    0   0
>  3   hdd 0.00980  1.0 10 GiB  8.3 GiB  1.7 GiB 82.83 1.14 156
>  6   hdd 0.00980  1.0 10 GiB  8.4 GiB  1.6 GiB 83.77 1.15 144
>  0   hdd 0.00980        0    0 B      0 B      0 B     0    0   0
>  5   hdd 0.00980  1.0 10 GiB  9.0 GiB 1021 MiB 90.03 1.23 159
>  7   hdd 0.00980  1.0 10 GiB  7.7 GiB  2.3 GiB 76.57 1.05 141
>  2   hdd 0.00980  1.0 10 GiB  5.5 GiB  4.5 GiB 55.42 0.76  90
>  4   hdd 0.00980  1.0 10 GiB  5.9 GiB  4.1 GiB 58.78 0.81  99
>  8   hdd 0.00980  1.0 10 GiB  6.3 GiB  3.7 GiB 63.12 0.87 111
>                     TOTAL 90 GiB   53 GiB   37 GiB 72.93
> MIN/MAX VAR: 0.76/1.23 STDDEV: 12.67
>
> root@hostadmin:~# osdmaptool om --upmap out.txt --upmap-pool rbd
> osdmaptool: osdmap file 'om'
> writing upmap command output to: out.txt
> checking for upmap cleanups
> upmap, max-count 100, max deviation 0.01  <--- really? It's not even close to 1% across the drives
> limiting to pools rbd (1)
> no upmaps proposed
>
> ceph balancer optimize myplan
> Error EALREADY: Unable to find further optimization, or distribution is already perfect
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
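For anyone following along, the check and the fix being discussed boil down to a few commands (a sketch; only raise the compat level if you are certain no pre-Luminous clients, including old kernel RBD/CephFS mounts, ever connect to the cluster):

# ceph osd dump | grep min_compat_client          (shows the currently required client level)
# ceph osd set-require-min-compat-client luminous
# ceph balancer mode upmap
# ceph balancer on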
[ceph-users] Balancer: uneven OSDs
Can anyone help with this? Why can't I optimize this cluster, the pg counts and data distribution is way off. __ I enabled the balancer plugin and even tried to manually invoke it but it won't allow any changes. Looking at ceph osd df, it's not even at all. Thoughts? root@hostadmin:~# ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL%USE VAR PGS 1 hdd 0.0098000 B 0 B 0 B 00 0 3 hdd 0.00980 1.0 10 GiB 8.3 GiB 1.7 GiB 82.83 1.14 156 6 hdd 0.00980 1.0 10 GiB 8.4 GiB 1.6 GiB 83.77 1.15 144 0 hdd 0.0098000 B 0 B 0 B 00 0 5 hdd 0.00980 1.0 10 GiB 9.0 GiB 1021 MiB 90.03 1.23 159 7 hdd 0.00980 1.0 10 GiB 7.7 GiB 2.3 GiB 76.57 1.05 141 2 hdd 0.00980 1.0 10 GiB 5.5 GiB 4.5 GiB 55.42 0.76 90 4 hdd 0.00980 1.0 10 GiB 5.9 GiB 4.1 GiB 58.78 0.81 99 8 hdd 0.00980 1.0 10 GiB 6.3 GiB 3.7 GiB 63.12 0.87 111 TOTAL 90 GiB 53 GiB 37 GiB 72.93 MIN/MAX VAR: 0.76/1.23 STDDEV: 12.67 root@hostadmin:~# osdmaptool om --upmap out.txt --upmap-pool rbd osdmaptool: osdmap file 'om' writing upmap command output to: out.txt checking for upmap cleanups upmap, max-count 100, max deviation 0.01 <---really? It's not even close to 1% across the drives limiting to pools rbd (1) no upmaps proposed ceph balancer optimize myplan Error EALREADY: Unable to find further optimization,or distribution is already perfect ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Balancer: uneven OSDs
I enabled the balancer plugin and even tried to manually invoke it but it won't allow any changes. Looking at ceph osd df, it's not even at all. Thoughts? root@hostadmin:~# ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL%USE VAR PGS 1 hdd 0.0098000 B 0 B 0 B 00 0 3 hdd 0.00980 1.0 10 GiB 8.3 GiB 1.7 GiB 82.83 1.14 156 6 hdd 0.00980 1.0 10 GiB 8.4 GiB 1.6 GiB 83.77 1.15 144 0 hdd 0.0098000 B 0 B 0 B 00 0 5 hdd 0.00980 1.0 10 GiB 9.0 GiB 1021 MiB 90.03 1.23 159 7 hdd 0.00980 1.0 10 GiB 7.7 GiB 2.3 GiB 76.57 1.05 141 2 hdd 0.00980 1.0 10 GiB 5.5 GiB 4.5 GiB 55.42 0.76 90 4 hdd 0.00980 1.0 10 GiB 5.9 GiB 4.1 GiB 58.78 0.81 99 8 hdd 0.00980 1.0 10 GiB 6.3 GiB 3.7 GiB 63.12 0.87 111 TOTAL 90 GiB 53 GiB 37 GiB 72.93 MIN/MAX VAR: 0.76/1.23 STDDEV: 12.67 root@hostadmin:~# osdmaptool om --upmap out.txt --upmap-pool rbd osdmaptool: osdmap file 'om' writing upmap command output to: out.txt checking for upmap cleanups upmap, max-count 100, max deviation 0.01 <---really? It's not even close to 1% across the drives limiting to pools rbd (1) no upmaps proposed ceph balancer optimize myplan Error EALREADY: Unable to find further optimization,or distribution is already perfect ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] PG stuck in Unknown after removing OSD - Help?
Set 3 osd to "out", all were on the same host and should not impact the pool because it's 3x replication and CRUSH is one osd per host. However, now we have one PG stuck UKNOWN. Not sure why this is the case, I do have background writes going on at the time of OSD out. Thoughts? ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.08817 root default -5 0.02939 host hostosd1 3 hdd 0.00980 osd.3 up 1.0 1.0 4 hdd 0.00980 osd.4 up 1.0 1.0 5 hdd 0.00980 osd.5 up 1.0 1.0 -7 0.02939 host hostosd2 0 hdd 0.00980 osd.0 up 1.0 1.0 6 hdd 0.00980 osd.6 up 1.0 1.0 8 hdd 0.00980 osd.8 up 1.0 1.0 -3 0.02939 host hostosd3 1 hdd 0.00980 osd.1 up0 1.0 2 hdd 0.00980 osd.2 up0 1.0 7 hdd 0.00980 osd.7 up0 1.0 ceph health detail PG_AVAILABILITY Reduced data availability: 1 pg inactive pg 1.e2 is stuck inactive for 1885.728547, current state unknown, last acting [4,0] ceph pg 1.e2 query { "state": "unknown", "snap_trimq": "[]", "snap_trimq_len": 0, "epoch": 132, "up": [ 4, 0 ], "acting": [ 4, 0 ], "info": { "pgid": "1.e2", "last_update": "34'3072", "last_complete": "34'3072", "log_tail": "0'0", "last_user_version": 3072, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": [], "history": { "epoch_created": 29, "epoch_pool_created": 29, "last_epoch_started": 30, "last_interval_started": 29, "last_epoch_clean": 30, "last_interval_clean": 29, "last_epoch_split": 0, "last_epoch_marked_full": 0, "same_up_since": 70, "same_interval_since": 70, "same_primary_since": 70, "last_scrub": "0'0", "last_scrub_stamp": "2019-05-20 21:15:42.448125", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2019-05-20 21:15:42.448125", "last_clean_scrub_stamp": "2019-05-20 21:15:42.448125" }, "stats": { "version": "34'3072", "reported_seq": "3131", "reported_epoch": "132", "state": "unknown", "last_fresh": "2019-05-20 22:52:07.898135", "last_change": "2019-05-20 22:50:46.711730", "last_active": "2019-05-20 22:50:26.109185", "last_peered": "2019-05-20 22:02:01.008787", "last_clean": "2019-05-20 22:02:01.008787", "last_became_active": "2019-05-20 21:15:43.662550", "last_became_peered": "2019-05-20 21:15:43.662550", "last_unstale": "2019-05-20 22:52:07.898135", "last_undegraded": "2019-05-20 22:52:07.898135", "last_fullsized": "2019-05-20 22:52:07.898135", "mapping_epoch": 70, "log_start": "0'0", "ondisk_log_start": "0'0", "created": 29, "last_epoch_clean": 30, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "0'0", "last_scrub_stamp": "2019-05-20 21:15:42.448125", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2019-05-20 21:15:42.448125", "last_clean_scrub_stamp": "2019-05-20 21:15:42.448125", "log_size": 3072, "ondisk_log_size": 3072, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 12582912, "num_objects": 3, "num_object_clones": 0, "num_object_copies": 9, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 0, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 3, "num_whiteouts": 0, "num_read": 0, "num_read_kb": 0, "num_write": 3072, "num_write_kb": 12288, "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0, "num_objects_recovered": 0, "num_bytes_recovered": 0, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0,
Re: [ceph-users] Lost OSD from PCIe error, recovered, HOW to restore OSD process
FYI for anyone interested, below is how to recover from someone removing an NVMe drive (the first two steps show how mine were removed and brought back). Steps 3-6 are to get the drive's LVM volume back AND get the OSD daemon running for the drive:

1. echo 1 > /sys/block/nvme0n1/device/device/remove
2. echo 1 > /sys/bus/pci/rescan
3. vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841
4. ceph auth add osd.122 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-122/keyring
5. ceph-volume lvm activate --all
6. You should see the drive somewhere in the ceph tree; move it to the right host (see the sketch at the end of this thread)

Tarek

From: "Tarek Zegar"
To: Alfredo Deza
Cc: ceph-users
Date: 05/15/2019 10:32 AM
Subject: [EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered, HOW to restore OSD process
Sent by: "ceph-users"

TLDR; I activated the drive successfully but the daemon won't start, looks like it's complaining about mon config, idk why (there is a valid ceph.conf on the host). Thoughts? I feel like it's close. Thank you

I executed the command: ceph-volume lvm activate --all

It found the drive and activated it:
--> Activating OSD ID 122 FSID a151bea5-d123-45d9-9b08-963a511c042a
--> ceph-volume lvm activate successful for osd ID: 122

However, systemd would not start the OSD process 122:

May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15 14:16:13.862 71970700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15 14:16:13.862 7116f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: failed to fetch mon config (--no-mon-config to skip)
May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Main process exited, code=exited, status=1/FAILURE
May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'.
May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Service hold-off time over, scheduling restart.
May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Scheduled restart job, restart counter is at 3.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Automatic restarting of the unit ceph-osd@122.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Stopped Ceph object storage daemon osd.122.
-- Subject: Unit ceph-osd@122.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit ceph-osd@122.service has finished shutting down.
May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Start request repeated too quickly.
May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'.
May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Failed to start Ceph object storage daemon osd.122

From: Alfredo Deza
To: Bob R
Cc: Tarek Zegar, ceph-users
Date: 05/15/2019 08:27 AM
Subject: [EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered, HOW to restore OSD process

On Tue, May 14, 2019 at 7:24 PM Bob R wrote:
>
> Does 'ceph-volume lvm list' show it? If so you can try to activate it with 'ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4'

Good suggestion. If `ceph-volume lvm list` can see it, it can probably activate it again. You can activate it with the OSD ID + OSD FSID, or do:

ceph-volume lvm activate --all

You didn't say if the OSD wasn't coming up after trying to start it (the systemd unit should still be there for ID 122), or if you tried rebooting and that OSD didn't come up. The systemd unit is tied to both the ID and FSID of the OSD, so it shouldn't matter if the underlying device changed since ceph-volume ensures it is the right one every time it activates.

>
> Bob
>
> On Tue, May 14, 2019 at 7:35 AM Tarek Zegar wrote:
>>
>> Someone nuked an OSD that had 1-replica PGs. They accidentally did echo 1 > /sys/block/nvme0n1/device/device/remove
>> We got it back doing an echo 1 > /sys/bus/pci/rescan
>> However, it re-enumerated as a different drive number (guess we didn't have udev rules)
>> They restored the LVM volume (vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841)
>>
>> lsblk
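For step 6 above, a hedged sketch of putting the re-activated OSD back under its proper host bucket — the weight (1.81898, roughly right for a ~2 TB device) and the host name are placeholders, so take the real weight from a sibling OSD in `ceph osd tree`:

# ceph osd crush create-or-move osd.122 1.81898 root=default host=<osd-host>
# ceph osd tree     (confirm osd.122 now sits under the intended host and shows up/in)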
Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process
TLDR; I activated the drive successfully but the daemon won't start, looks like it's complaining about mon config, idk why (there is a valid ceph.conf on the host). Thoughts? I feel like it's close. Thank you I executed the command: ceph-volume lvm activate --all It found the drive and activated it: --> Activating OSD ID 122 FSID a151bea5-d123-45d9-9b08-963a511c042a --> ceph-volume lvm activate successful for osd ID: 122 However, systemd would not start the OSD process 122: May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15 14:16:13.862 71970700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15 14:16:13.862 7116f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: failed to fetch mon config (--no-mon-config to skip) May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Main process exited, code=exited, status=1/FAILURE May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Service hold-off time over, scheduling restart. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Scheduled restart job, restart counter is at 3. -- Subject: Automatic restarting of a unit has been scheduled -- Defined-By: systemd -- Support: http://www.ubuntu.com/support -- -- Automatic restarting of the unit ceph-osd@122.service has been scheduled, as the result for -- the configured Restart= setting for the unit. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Stopped Ceph object storage daemon osd.122. -- Subject: Unit ceph-osd@122.service has finished shutting down -- Defined-By: systemd -- Support: http://www.ubuntu.com/support -- -- Unit ceph-osd@122.service has finished shutting down. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Start request repeated too quickly. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Failed to start Ceph object storage daemon osd.122 From: Alfredo Deza To: Bob R Cc: Tarek Zegar , ceph-users Date: 05/15/2019 08:27 AM Subject:[EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process On Tue, May 14, 2019 at 7:24 PM Bob R wrote: > > Does 'ceph-volume lvm list' show it? If so you can try to activate it with 'ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4' Good suggestion. If `ceph-volume lvm list` can see it, it can probably activate it again. You can activate it with the OSD ID + OSD FSID, or do: ceph-volume lvm activate --all You didn't say if the OSD wasn't coming up after trying to start it (the systemd unit should still be there for ID 122), or if you tried rebooting and that OSD didn't come up. The systemd unit is tied to both the ID and FSID of the OSD, so it shouldn't matter if the underlying device changed since ceph-volume ensures it is the right one every time it activates. > > Bob > > On Tue, May 14, 2019 at 7:35 AM Tarek Zegar wrote: >> >> Someone nuked and OSD that had 1 replica PGs. 
They accidentally did echo 1 > /sys/block/nvme0n1/device/device/remove
>> We got it back doing a echo 1 > /sys/bus/pci/rescan
>> However, it reenumerated as a different drive number (guess we didn't have udev rules)
>> They restored the LVM volume (vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841)
>>
>> lsblk
>> nvme0n2                                                                                             259:9  0 1.8T 0 disk
>> ceph--8c81b2a3--6c8e--4cae--a3c0--e2d91f82d841-osd--data--74b01ec2--124d--427d--9812--e437f90261d4 253:1  0 1.8T 0 lvm
>>
>> We are stuck here. How do we attach an OSD daemon to the drive? It was OSD.122 previously
>>
>> Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
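The "handle_auth_bad_method ... but i only support [2]" plus "failed to fetch mon config" combination earlier in this thread generally means the OSD cannot authenticate to the mons at startup, so a cheap first check is whether the key on disk matches what the cluster expects (a sketch, using osd.122 as in this thread):

# ceph auth get osd.122                      (the key the mons have for this OSD)
# cat /var/lib/ceph/osd/ceph-122/keyring     (the key the daemon will present; the two must match)
# grep mon_host /etc/ceph/ceph.conf          (the daemon also needs reachable mon addresses to fetch its config)

If the keys differ, getting them back in sync (for example copying the key from `ceph auth get` into the on-disk keyring) usually lets the systemd unit start.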
Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]
https://github.com/ceph/ceph-ansible/issues/3961 <--- created ticket Thanks Tarek From: Matthew Vernon To: Tarek Zegar , solarflo...@gmail.com Cc: ceph-users@lists.ceph.com Date: 05/14/2019 04:41 AM Subject:[EXTERNAL] Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT] On 14/05/2019 00:36, Tarek Zegar wrote: > It's not just mimic to nautilus > I confirmed with luminous to mimic > > They are checking for clean pgs with flags set, they should unset flags, > then check. Set flags again, move on to next osd I think I'm inclined to agree that "norebalance" is likely to get in the way when upgrading a cluster - our rolling upgrade playbook omits it. OTOH, you might want to raise this on the ceph-ansible list ( ceph-ansi...@lists.ceph.com ) and/or as a github issue - I don't think the ceph-ansible maintainers routinely watch this list. HTH, Matthew -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process
Someone nuked an OSD that had 1-replica PGs. They accidentally did echo 1 > /sys/block/nvme0n1/device/device/remove
We got it back doing an echo 1 > /sys/bus/pci/rescan
However, it re-enumerated as a different drive number (guess we didn't have udev rules).
They restored the LVM volume (vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841)

lsblk
nvme0n2                                                                                             259:9  0 1.8T 0 disk
ceph--8c81b2a3--6c8e--4cae--a3c0--e2d91f82d841-osd--data--74b01ec2--124d--427d--9812--e437f90261d4 253:1  0 1.8T 0 lvm

We are stuck here. How do we attach an OSD daemon to the drive? It was OSD.122 previously.

Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO
It's not just mimic to nautilus; I confirmed it with luminous to mimic as well.

They are checking for clean PGs with the flags set; they should unset the flags, then check, set the flags again, and move on to the next OSD.

----- Original message -----
From: solarflow99
To: Tarek Zegar
Cc: Ceph Users
Subject: [EXTERNAL] Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO
Date: Mon, May 13, 2019 6:36 PM

Are you sure you can really use 3.2 for nautilus?

On Fri, May 10, 2019 at 7:23 AM Tarek Zegar <tze...@us.ibm.com> wrote:

Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible file sets the flag "norebalance". When there is *no* I/O to the cluster, the upgrade works fine. When upgrading with IO running in the background, some PGs become `active+undersized+remapped+backfilling`. The norebalance flag prevents them from backfilling / recovering and the upgrade fails. I'm uncertain why those OSDs are "backfilling" instead of "recovering", but I guess it doesn't matter, norebalance halts the process.

Setting ceph tell osd.* injectargs '--osd_max_backfills=2' made no difference.

https://github.com/ceph/ceph-ansible/commit/08d94324545b3c4e0f6a1caf6224f37d1c2b36db <-- did anyone other than the author verify this?

Tarek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
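The manual workaround being described, if an upgrade gets stuck in that state, is roughly this (a hedged sketch, run between hosts while watching cluster health):

# ceph -s                       (look for PGs stuck in backfilling / backfill_wait)
# ceph osd unset norebalance    (let the remapped PGs backfill until everything is active+clean)
# ceph osd set norebalance      (re-set the flag the playbook expects before it moves to the next host)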
[ceph-users] Ceph MGR CRASH : balancer module
Hello, My manager keeps dying, the last meta log is below. What is causing this? I do have two roots in the osd tree with shared hosts(see below), I can't imagine that is causing balancer to fail? meta log: { "crash_id": "2019-05-11_19:09:17.999875Z_aa7afa7c-bc7e-43ec-b32a-821bd47bd68b", "timestamp": "2019-05-11 19:09:17.999875Z", "process_name": "ceph-mgr", "entity_name": "mgr.pok1-qz1-sr1-rk023-s08", "ceph_version": "14.2.0", "utsname_hostname": "pok1-qz1-sr1-rk023-s08", "utsname_sysname": "Linux", "utsname_release": "4.15.0-1014-ibm-gt", "utsname_version": "#16-Ubuntu SMP Tue Dec 11 11:19:10 UTC 2018", "utsname_machine": "x86_64", "os_name": "Ubuntu", "os_id": "ubuntu", "os_version_id": "18.04", "os_version": "18.04.1 LTS (Bionic Beaver)", "assert_condition": "osd_weight.count(i.first)", "assert_func": "int OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set&, OSDMap::Incremental*)", "assert_file": "/build/ceph-14.2.0/src/osd/OSDMap.cc", "assert_line": 4743, "assert_thread_name": "balancer", "assert_msg": "/build/ceph-14.2.0/src/osd/OSDMap.cc: In function 'int OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set&, OSDMap::Incremental*)' thread 7fffd6572700 time 2019-05-11 19:09:17.998114 \n/build/ceph-14.2.0/src/osd/OSDMap.cc: 4743: FAILED ceph_assert (osd_weight.count(i.first))\n", "backtrace": [ "(()+0x12890) [0x7fffee586890]", "(gsignal()+0xc7) [0x7fffed67ee97]", "(abort()+0x141) [0x7fffed680801]", "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7fffef1eb7d3]", "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fffef1eb95d]", "(OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set, std::allocator > const&, OSDMap::Incremental*)+0x274b) [0x7fffef61bb3b]", "(()+0x1d52b6) [0x557292b6]", "(PyEval_EvalFrameEx()+0x8010) [0x7fffeeab21d0]", "(PyEval_EvalCodeEx()+0x7d8) [0x7fffeebe2278]", "(PyEval_EvalFrameEx()+0x5bf6) [0x7fffeeaafdb6]", "(PyEval_EvalFrameEx()+0x8b5b) [0x7fffeeab2d1b]", "(PyEval_EvalFrameEx()+0x8b5b) [0x7fffeeab2d1b]", "(PyEval_EvalCodeEx()+0x7d8) [0x7fffeebe2278]", "(()+0x1645f9) [0x7fffeeb675f9]", "(PyObject_Call()+0x43) [0x7fffeea57333]", "(()+0x1abd1c) [0x7fffeebaed1c]", "(PyObject_Call()+0x43) [0x7fffeea57333]", "(PyObject_CallMethod()+0xc8) [0x7fffeeb7bc78]", "(PyModuleRunner::serve()+0x62) [0x55725f32]", "(PyModuleRunner::PyModuleRunnerThread::entry()+0x1cf) [0x557265df]", "(()+0x76db) [0x7fffee57b6db]", "(clone()+0x3f) [0x7fffed76188f]" ] } OSD TREE: ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF -2954.58200 root tzrootthreenodes -2518.19400 host pok1-qz1-sr1-rk001-s20 0 ssd 1.81898 osd.0 up 1.0 1.0 122 ssd 1.81898 osd.122 up 1.0 1.0 135 ssd 1.81898 osd.135 up 1.0 1.0 149 ssd 1.81898 osd.149 up 1.0 1.0 162 ssd 1.81898 osd.162 up 1.0 1.0 175 ssd 1.81898 osd.175 up 1.0 1.0 188 ssd 1.81898 osd.188 up 1.0 1.0 200 ssd 1.81898 osd.200 up 1.0 1.0 213 ssd 1.81898 osd.213 up 1.0 1.0 225 ssd 1.81898 osd.225 up 1.0 1.0 -518.19400 host pok1-qz1-sr1-rk002-s05 112 ssd 1.81898 osd.112 up 1.0 1.0 120 ssd 1.81898 osd.120 up 1.0 1.0 132 ssd 1.81898 osd.132 up 1.0 1.0 144 ssd 1.81898 osd.144 up 1.0 1.0 156 ssd 1.81898 osd.156 up 1.0 1.0 168 ssd 1.81898 osd.168 up 1.0 1.0 180 ssd 1.81898 osd.180 up 1.0 1.0 192 ssd 1.81898 osd.192 up 1.0 1.0 204 ssd 1.81898 osd.204 up 1.0 1.0 216 ssd 1.81898 osd.216 up 1.0 1.0 -1118.19400 host pok1-qz1-sr1-rk002-s16 115 ssd 1.81898 osd.115 up 1.0 1.0 127 ssd 1.81898 osd.127 up 1.0 1.0 139 ssd 1.81898 osd.139 up 1.0 1.0 151 ssd 1.81898 osd.151 up 1.0 
1.0 163 ssd 1.81898 osd.163 up 1.0 1.0 174 ssd 1.81898
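Until the underlying cause is fixed, a hedged way to keep the mgr from crash-looping is simply to stop the balancer and keep the stored crash reports for later inspection; the assert fires inside OSDMap::calc_pg_upmaps when it meets an OSD id it has no weight entry for, which is consistent with a map mixing two roots that share hosts and/or out / zero-weight OSDs:

# ceph balancer off
# ceph crash ls     (the crash module in 14.2 lists the reports like the one shown above)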
[ceph-users] Rolling upgrade fails with flag norebalance with background IO
Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible file sets the flag "norebalance". When there is *no* I/O to the cluster, the upgrade works fine. When upgrading with IO running in the background, some PGs become `active+undersized+remapped+backfilling`. The norebalance flag prevents them from backfilling / recovering and the upgrade fails. I'm uncertain why those OSDs are "backfilling" instead of "recovering", but I guess it doesn't matter, norebalance halts the process.

Setting ceph tell osd.* injectargs '--osd_max_backfills=2' made no difference.

https://github.com/ceph/ceph-ansible/commit/08d94324545b3c4e0f6a1caf6224f37d1c2b36db <-- did anyone other than the author verify this?

Tarek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] PG in UP set but not Acting? Backfill halted
Hello,

I've been working on Ceph for only a few weeks and have a small cluster in VMs. I did a ceph-ansible rolling_update to Nautilus from Mimic and some of my PGs were stuck in 'active+undersized+remapped+backfilling' with no progress. All OSDs were up and in (see ceph tree below). The PGs only had 2 OSDs in the acting set, yet 3 in the UP set. I don't understand how the acting set can have two and the up set can have 3; if anything, wouldn't the up set be a subset of acting? Anyway, I noticed that the ansible rolling_update set the following flags: 'noout' AND 'norebalance'. PG query showed the backfill target as OSD 0 (which was missing from the acting set) and "waiting_on_backfill" was blank, so I'm very confused. It wants to backfill OSD 0 and it's not blocked per the empty waiting_on_backfill set, so what's holding it up? Why is OSD 0 not in the acting set? (What is the clear definition of acting vs up?)

ceph osd tree
ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-5       0.02939     host hostosd1
 3   hdd 0.00980         osd.3         up  1.0  1.0
 4   hdd 0.00980         osd.4         up  1.0  1.0
 5   hdd 0.00980         osd.5         up  1.0  1.0
-7       0.02939     host hostosd2
 0   hdd 0.00980         osd.0         up  1.0  1.0
 6   hdd 0.00980         osd.6         up  1.0  1.0
 8   hdd 0.00980         osd.8         up  1.0  1.0
-3       0.02939     host hostosd3
 1   hdd 0.00980         osd.1         up  1.0  1.0
 2   hdd 0.00980         osd.2         up  1.0  1.0
 7   hdd 0.00980         osd.7         up  1.0  1.0

PG Info
1.35 3 00 0 0 8388623 0 0 3045 3045 active+undersized+remapped+backfilling 2019-05-09 16:18:02.513033 50'107145 50:108127 [5,6,0] 5 [5,6]

PG Query
    "state": "active+undersized+remapped+backfilling",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 50,
    "up": [ 5, 6, 0 ],
    "acting": [ 5, 6 ],
    "backfill_targets": [ "0" ],
    "acting_recovery_backfill": [ "0", "5", "6" ]
    ...
    "waiting_on_backfill": [],
    "last_backfill_started": "MAX",
    "backfill_info": {
        "begin": "MAX",
        "end": "MAX",
        "objects": []
    },
    "peer_backfill_info": [
        "0",
        {
            "begin": "MAX",
            "end": "MAX",
            "objects": []
        }
    ],
    "backfills_in_flight": [],
    "recovering": [],

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
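Two things worth checking in this situation (a sketch, not a definitive diagnosis): "up" is where CRUSH wants the PG to live under the current map, while "acting" is the set actually serving I/O right now, so up is not a subset of acting while a backfill to a new member is still pending. And since rolling_update leaves norebalance set, the pending backfill to osd.0 may simply not start until that flag is cleared:

# ceph osd dump | grep flags                         (confirm noout,norebalance are still set)
# ceph osd unset norebalance                         (backfill to the new up-set member should begin shortly after)
# ceph pg 1.35 query | grep -A2 backfill_targets     (re-check progress once backfill starts)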