Re: [ceph-users] How to repair active+clean+inconsistent?
So, I've issued the deep-scrub command (and the repair command) and nothing seems to happen. Unrelated to this issue, I have to take down some OSDs to prepare a host for RMA. One of them happens to be in the replication group for this PG, so a scrub happened indirectly. I now have this from "ceph -s":

    cluster 374aed9e-5fc1-47e1-8d29-4416f7425e76
     health HEALTH_ERR
            1 pgs inconsistent
            18446 scrub errors
     monmap e1: 3 mons at {mgmt01=10.0.1.1:6789/0,mgmt02=10.1.1.1:6789/0,mgmt03=10.2.1.1:6789/0}
            election epoch 252, quorum 0,1,2 mgmt01,mgmt02,mgmt03
      fsmap e346: 1/1/1 up {0=mgmt01=up:active}, 2 up:standby
     osdmap e40248: 120 osds: 119 up, 119 in
            flags sortbitwise,require_jewel_osds
      pgmap v22025963: 3136 pgs, 18 pools, 18975 GB data, 214 Mobjects
            59473 GB used, 287 TB / 345 TB avail
                3120 active+clean
                  15 active+clean+scrubbing+deep
                   1 active+clean+inconsistent

That's a lot of scrub errors:

HEALTH_ERR 1 pgs inconsistent; 18446 scrub errors
pg 1.65 is active+clean+inconsistent, acting [62,67,33]
18446 scrub errors

Now, "rados list-inconsistent-obj 1.65" returns a *very* long JSON output. Here's a very small snippet; the errors look the same across all of them:

{
  "object": {
    "name": "10ea8bb.0045",
    "nspace": "",
    "locator": "",
    "snap": "head",
    "version": 59538
  },
  "errors": ["attr_name_mismatch"],
  "union_shard_errors": ["oi_attr_missing"],
  "selected_object_info": "1:a70dc1cc:::10ea8bb.0045:head(2897'59538 client.4895965.0:462007 dirty|data_digest|omap_digest s 4194304 uv 59538 dd f437a612 od alloc_hint [0 0])",
  "shards": [
    {
      "osd": 33,
      "errors": [],
      "size": 4194304,
      "omap_digest": "0x",
      "data_digest": "0xf437a612",
      "attrs": [
        {"name": "_", "value": "EAgNAQAABAM1AA...", "Base64": true},
        {"name": "snapset", "value": "AgIZAQ...", "Base64": true}
      ]
    },
    {
      "osd": 62,
      "errors": [],
      "size": 4194304,
      "omap_digest": "0x",
      "data_digest": "0xf437a612",
      "attrs": [
        {"name": "_", "value": "EAgNAQAABAM1AA...", "Base64": true},
        {"name": "snapset", "value": "AgIZAQ...", "Base64": true}
      ]
    },
    {
      "osd": 67,
      "errors": ["oi_attr_missing"],
      "size": 4194304,
      "omap_digest": "0x",
      "data_digest": "0xf437a612",
      "attrs": []
    }
  ]
}

Clearly, on osd.67, the "attrs" array is empty. The question is: how do I fix this?

Many thanks in advance,

-kc

K.C. Wong
kcw...@verseon.com

> On Nov 11, 2018, at 10:58 PM, Brad Hubbard wrote:
>
> On Mon, Nov 12, 2018 at 4:21 PM Ashley Merrick <singap...@amerrick.co.uk> wrote:
>>
>> Your need to run "ceph pg deep-scrub 1.65" first
>
> Right, thanks Ashley. That's what the "Note that you may have to do a
> deep scrub to populate the output." part of my answer meant but
> perhaps I needed to go further?
>
> The system has a record of a scrub error on a previous scan but
> subsequent activity in the cluster has invalidated the specifics. You
> need to run another scrub to get the specific information for this pg
> at this point in time (the information does not remain valid
> indefinitely and therefore may need to be renewed depending on
> circumstances).
>
>> On Mon, Nov 12, 2018 at 2:20 PM K.C. Wong wrote:
>>>
>>> Hi Brad,
>>>
>>> I got the following:
>>>
>>> [root@mgmt01 ~]# ceph health detail
>>> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> pg 1.65 is active+clean+inconsistent, acting [62,67,47]
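A quick way to sanity-check output that large is to summarize it instead of reading it object by object. The snippet below is only a sketch (it assumes the Jewel-era wrapper key `inconsistents`, that jq is installed, and that the OSDs are filestore with default paths); it is not a repair procedure, just a way to confirm that every entry really is the same missing object-info xattr on osd.67:

    # Summarize each inconsistent object: its name, the union of shard errors,
    # and which OSDs actually report an error.
    rados list-inconsistent-obj 1.65 --format=json-pretty | \
      jq '[.inconsistents[] | {name: .object.name,
                               union_errors: .union_shard_errors,
                               bad_osds: [.shards[] | select((.errors | length) > 0) | .osd]}]'

    # On the host carrying osd.67, spot-check one affected object on disk.
    # Filestore keeps the object-info attribute as an xattr named "user.ceph._";
    # a healthy replica shows it, the broken one should not. The path is a
    # placeholder; locate the object file under the PG directory first.
    getfattr -d -m - /var/lib/ceph/osd/ceph-67/current/1.65_head/<path-to-object-file>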
Re: [ceph-users] How to repair active+clean+inconsistent?
Thanks, Ashley.

Should I expect the deep-scrubbing to start immediately?

[root@mgmt01 ~]# ceph pg deep-scrub 1.65
instructing pg 1.65 on osd.62 to deep-scrub
[root@mgmt01 ~]# ceph pg ls deep_scrub
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
16.75  430657 0 0 0 0 30754735820 3007 3007 active+clean+scrubbing+deep 2018-11-11 11:05:11.572325 39934'549067 39934:1311893 [4,64,35] 4 [4,64,35] 4 28743'539264 2018-11-07 02:17:53.293336 28743'539264 2018-11-03 14:39:44.837702
16.86  430617 0 0 0 0 30316842298 3048 3048 active+clean+scrubbing+deep 2018-11-11 15:56:30.148527 39934'548012 39934:1038058 [18,2,62] 18 [18,2,62] 18 26347'529815 2018-10-28 01:06:55.526624 26347'529815 2018-10-28 01:06:55.526624
16.eb  432196 0 0 0 0 30612459543 3071 3071 active+clean+scrubbing+deep 2018-11-11 11:02:46.993022 39934'550340 39934:3662047 [56,44,42] 56 [56,44,42] 56 28507'540255 2018-11-02 03:28:28.013949 28507'540255 2018-11-02 03:28:28.013949
16.f3  431399 0 0 0 0 30672009253 3067 3067 active+clean+scrubbing+deep 2018-11-11 17:40:55.732162 39934'549240 39934:2212192 [69,82,6] 69 [69,82,6] 69 28743'539336 2018-11-02 17:22:05.745972 28743'539336 2018-11-02 17:22:05.745972
16.f7  430885 0 0 0 0 30796505272 3100 3100 active+clean+scrubbing+deep 2018-11-11 22:50:05.231599 39934'548910 39934:683169 [59,63,119] 59 [59,63,119] 59 28743'539167 2018-11-03 07:24:43.776341 26347'530830 2018-10-28 04:44:12.276982
16.14c 430565 0 0 0 0 31177011073 3042 3042 active+clean+scrubbing+deep 2018-11-11 20:11:31.107313 39934'550564 39934:1545200 [41,12,70] 41 [41,12,70] 41 28743'540758 2018-11-03 23:04:49.155741 28743'540758 2018-11-03 23:04:49.155741
16.156 430356 0 0 0 0 31021738479 3006 3006 active+clean+scrubbing+deep 2018-11-11 20:44:14.019537 39934'549241 39934:2958053 [83,47,1] 83 [83,47,1] 83 28743'539462 2018-11-04 14:46:56.890822 28743'539462 2018-11-04 14:46:56.890822
16.19f 431613 0 0 0 0 30746145827 3063 3063 active+clean+scrubbing+deep 2018-11-11 19:06:40.693002 39934'549429 39934:1189872 [14,54,37] 14 [14,54,37] 14 28743'539660 2018-11-04 18:25:13.225962 26347'531345 2018-10-28 20:08:45.286421
16.1b1 431225 0 0 0 0 30988996529 3048 3048 active+clean+scrubbing+deep 2018-11-11 20:12:35.367935 39934'549604 39934:778127 [34,106,11] 34 [34,106,11] 34 26347'531560 2018-10-27 16:49:46.944748 26347'531560 2018-10-27 16:49:46.944748
16.1e2 431724 0 0 0 0 30247732969 3070 3070 active+clean+scrubbing+deep 2018-11-11 20:55:17.591646 39934'550105 39934:1428341 [103,48,3] 103 [103,48,3] 103 28743'540270 2018-11-06 03:36:30.531106 28507'539840 2018-11-02 01:08:23.268409
16.1f3 430604 0 0 0 0 30633545866 3039 3039 active+clean+scrubbing+deep 2018-11-11 20:15:28.557464 39934'548804 39934:1354817 [66,102,33] 66 [66,102,33] 66 28743'538896 2018-11-04 04:59:33.118414 28743'538896 2018-11-04 04:59:33.118414
[root@mgmt01 ~]# ceph pg ls inconsistent
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
1.65   12806  0 0 0 0 30010463024 3008 3008 active+clean+inconsistent 2018-11-10 00:16:43.965966 39934'184512 39934:388820 [62,67,47] 62 [62,67,47] 62 28743'183853 2018-11-04 01:31:27.042458 28743'183853 2018-11-04 01:31:27.042458

It's similar to when I issued "ceph pg repair 1.65": it said it was instructing osd.62 to repair 1.65, and then nothing seems to happen.

-kc

K.C. Wong
kcw...@verseon.com
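One thing worth checking when a requested deep-scrub appears to do nothing: the primary has a limited number of scrub slots (osd_max_scrubs, default 1), and with 15 PGs already in active+clean+scrubbing+deep the request may simply be queued behind them. A minimal sketch for inspecting and temporarily loosening that limit; osd.62 and the value 2 are just examples:

    # On the host carrying osd.62 (needs access to the OSD's admin socket)
    ceph daemon osd.62 config get osd_max_scrubs

    # Temporarily allow one more concurrent scrub on that OSD, then re-issue the request
    ceph tell osd.62 injectargs '--osd-max-scrubs 2'
    ceph pg deep-scrub 1.65

    # Confirm the PG actually entered deep scrub
    ceph pg ls deep_scrub | grep '^1\.65'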
Re: [ceph-users] How to repair active+clean+inconsistent?
Hi Brad,

I got the following:

[root@mgmt01 ~]# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 1.65 is active+clean+inconsistent, acting [62,67,47]
1 scrub errors
[root@mgmt01 ~]# rados list-inconsistent-obj 1.65
No scrub information available for pg 1.65
error 2: (2) No such file or directory
[root@mgmt01 ~]# rados list-inconsistent-snapset 1.65
No scrub information available for pg 1.65
error 2: (2) No such file or directory

Rather odd output, I'd say; not that I understand what it means. I also tried "rados list-inconsistent-pg":

[root@mgmt01 ~]# rados lspools
rbd
cephfs_data
cephfs_metadata
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
ctrl-p
prod
corp
camp
dev
default.rgw.users.uid
default.rgw.users.keys
default.rgw.buckets.index
default.rgw.buckets.data
default.rgw.buckets.non-ec
[root@mgmt01 ~]# for i in $(rados lspools); do rados list-inconsistent-pg $i; done
[]
["1.65"]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

So, that'd put the inconsistency in the cephfs_data pool.

Thank you for your help,

-kc

K.C. Wong
kcw...@verseon.com

> On Nov 11, 2018, at 5:43 PM, Brad Hubbard wrote:
>
> What does "rados list-inconsistent-obj <pgid>" say?
>
> Note that you may have to do a deep scrub to populate the output.
>
> On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong wrote:
>>
>> Hi folks,
>>
>> I would appreciate any pointer as to how I can resolve a
>> PG stuck in "active+clean+inconsistent" state. This has
>> resulted in HEALTH_ERR status for the last 5 days with no
>> end in sight. The state got triggered when one of the drives
>> in the PG returned an I/O error. I've since replaced the
>> failed drive.
>>
>> I'm running Jewel (out of centos-release-ceph-jewel) on
>> CentOS 7. I've tried "ceph pg repair <pgid>" and it didn't seem
>> to do anything. I've tried even more drastic measures such as
>> comparing all the files (using filestore) under that PG_head
>> on all 3 copies and then nuking the outlier. Nothing worked.
>>
>> Many thanks,
>>
>> -kc
>>
>> K.C. Wong
>> kcw...@verseon.com
>
> --
> Cheers,
> Brad
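Since list-inconsistent-pg prints one bare JSON list per pool, matching a hit back to its pool by position is easy to get wrong; a slightly different loop that labels each line (same commands, just with the pool name echoed first) may be easier to read:

    for p in $(rados lspools); do
      printf '%s: ' "$p"
      rados list-inconsistent-pg "$p"
    done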
[ceph-users] How to repair active+clean+inconsistent?
Hi folks,

I would appreciate any pointer as to how I can resolve a PG stuck in "active+clean+inconsistent" state. This has resulted in HEALTH_ERR status for the last 5 days with no end in sight. The state got triggered when one of the drives in the PG returned an I/O error. I've since replaced the failed drive.

I'm running Jewel (out of centos-release-ceph-jewel) on CentOS 7. I've tried "ceph pg repair <pgid>" and it didn't seem to do anything. I've tried even more drastic measures, such as comparing all the files (using filestore) under that PG_head on all 3 copies and then nuking the outlier. Nothing worked.

Many thanks,

-kc

K.C. Wong
kcw...@verseon.com
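For reference, the "compare the copies by hand" approach described above can be scripted against filestore's on-disk layout. This is only a sketch under assumptions (default /var/lib/ceph/osd/ceph-<id> paths, filestore OSDs, and PG 1.65 used as the example id); note it compares file contents only and will not catch metadata problems such as a missing xattr:

    # Run on each host holding a replica of the PG, substituting that host's OSD id,
    # then diff the resulting lists across the three hosts.
    PG=1.65
    OSD=62   # placeholder: the local OSD id on this host
    find /var/lib/ceph/osd/ceph-$OSD/current/${PG}_head -type f -print0 \
      | xargs -0 md5sum \
      | sed "s|/var/lib/ceph/osd/ceph-$OSD/current/||" \
      | sort -k2 > /tmp/pg-$PG-osd-$OSD.md5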
[ceph-users] HEALTH_ERR resulted from a bad sector
Our Ceph cluster entered the HEALTH_ERR state last week. We're running Infernalis, and this was the first time I've seen it in that state; even when OSD instances dropped off, we've only seen HEALTH_WARN. The output of `ceph status` looked like this:

[root@r01u02-b ~]# ceph status
    cluster ed62b3b9-be4a-4ce2-8cd3-34854aa8d6c2
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e1: 3 mons at {r01u01-a=192.168.111.11:6789/0,r01u02-b=192.168.111.16:6789/0,r01u03-c=192.168.111.21:6789/0}
            election epoch 900, quorum 0,1,2 r01u01-a,r01u02-b,r01u03-c
     mdsmap e744: 1/1/1 up {0=r01u01-a=up:active}, 2 up:standby
     osdmap e533858: 48 osds: 48 up, 48 in
            flags sortbitwise
      pgmap v47571404: 3456 pgs, 14 pools, 16470 GB data, 18207 kobjects
            33056 GB used, 56324 GB / 89381 GB avail
                3444 active+clean
                   8 active+clean+scrubbing+deep
                   3 active+clean+scrubbing
                   1 active+clean+inconsistent
  client io 1535 kB/s wr, 23 op/s

I tracked down the inconsistent PG and found that one of the pair of OSDs had kernel log messages like these:

[1773723.509386] sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1773723.509390] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
[1773723.509394] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
[1773723.509398] sd 5:0:0:0: [sdb] CDB: Read(10) 28 00 01 4c 1b a0 00 00 08 00
[1773723.509401] blk_update_request: I/O error, dev sdb, sector 21765025

Replacing the disk on that OSD server eventually fixed the problem, but it took a long time to get out of the error state:

[root@r01u01-a ~]# ceph status
    cluster ed62b3b9-be4a-4ce2-8cd3-34854aa8d6c2
     health HEALTH_ERR
            61 pgs backfill
            2 pgs backfilling
            1 pgs inconsistent
            1 pgs repair
            63 pgs stuck unclean
            recovery 5/37908099 objects degraded (0.000%)
            recovery 1244055/37908099 objects misplaced (3.282%)
            1 scrub errors
     monmap e1: 3 mons at {r01u01-a=192.168.111.11:6789/0,r01u02-b=192.168.111.16:6789/0,r01u03-c=192.168.111.21:6789/0}
            election epoch 920, quorum 0,1,2 r01u01-a,r01u02-b,r01u03-c
     mdsmap e759: 1/1/1 up {0=r01u02-b=up:active}, 2 up:standby
     osdmap e534536: 48 osds: 48 up, 48 in; 63 remapped pgs
            flags sortbitwise
      pgmap v47590337: 3456 pgs, 14 pools, 16466 GB data, 18205 kobjects
            33085 GB used, 56295 GB / 89381 GB avail
            5/37908099 objects degraded (0.000%)
            1244055/37908099 objects misplaced (3.282%)
                3385 active+clean
                  61 active+remapped+wait_backfill
                   6 active+clean+scrubbing+deep
                   2 active+remapped+backfilling
                   1 active+clean+scrubbing+deep+inconsistent+repair
                   1 active+clean+scrubbing
  client io 2720 kB/s wr, 16 op/s

Here's what I'm curious about:

* How did a bad sector result in more damage to the Ceph cluster than a few downed OSD servers?
* Is this issue addressed in later releases? I'm in the middle of setting up a Jewel instance.
* What can be done to avoid the HEALTH_ERR state in similar failure scenarios? Increasing the default pool size from 2 to 3?

Many thanks for any input/insight you may have.

-kc

K.C. Wong
kcw...@verseon.com
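On the last question: part of the usual argument for three replicas is that a two-copy inconsistency gives the cluster no majority to repair from automatically, so a single bad sector surfaces as HEALTH_ERR until someone intervenes. Raising the replica count is a per-pool setting; a hedged sketch, with the pool name "rbd" only as an example and assuming the extra capacity exists:

    ceph osd dump | grep 'replicated size'   # current size/min_size for each pool
    ceph osd pool set rbd size 3             # keep three copies from now on
    ceph osd pool set rbd min_size 2         # keep serving I/O with one copy down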
Re: [ceph-users] Help with systemd
Thank you for the suggestion and gist. I'll give that a try.

-kc

> On Aug 22, 2016, at 11:53 AM, Jeffrey Ollie <j...@ocjtech.us> wrote:
>
> I put the systemd service files that I use to map a RBD and mount the
> filesystem before starting up PostgreSQL into the following gist. It's
> probably not perfect, but it seems to work for me. Personally, I like
> using a native service to accomplish this rather than using fstab and
> the generator.
>
> https://gist.github.com/jcollie/60f8b278d1ac5eadb4794db1f4c0e87d
>
> On Mon, Aug 22, 2016 at 1:16 PM, K.C. Wong <kcw...@verseon.com> wrote:
>> Folks,
>>
>> I have some services that depend on RBD images getting
>> mounted prior to service start-up. I am having a really
>> hard time getting out of systemd dependency hell.
>>
>> * I created a run-once systemd service that basically does
>>   the rbd map operation, and set it to start after network.target,
>>   network-online.target, and ceph.target (probably overkill)
>> * I added 'x-systemd.requires=' to the
>>   mount-point in /etc/fstab
>>
>> And when the system reboots, it complains about an ordering
>> cycle and sometimes ends up in rescue mode. Because the
>> filesystem is 'xfs', I believe systemd-fstab-generator
>> classifies the mount-point as 'local-fs'. Is there a way
>> to force a 'remote-fs' reclassification? Or is there some
>> other way to get out of this ordering nightmare... Old
>> school 'S' and 'K' numbers are *so* simple; I'd trade
>> consistency for speed any day.
>>
>> Thanks for any suggestion or insight.
>>
>> -kc
>>
>> BTW, I disabled NetworkManager which, I know, kind of breaks
>> network-online.target.
>>
>> K.C. Wong
>> kcw...@verseon.com
>
> --
> Jeff Ollie
> The majestik møøse is one of the mäni interesting furry animals in Sweden.

K.C. Wong
kcw...@verseon.com
[ceph-users] Help with systemd
Folks,

I have some services that depend on RBD images getting mounted prior to service start-up. I am having a really hard time getting out of systemd dependency hell.

* I created a run-once systemd service that basically does the rbd map operation, and set it to start after network.target, network-online.target, and ceph.target (probably overkill)
* I added 'x-systemd.requires=' to the mount-point in /etc/fstab

And when the system reboots, it complains about an ordering cycle and sometimes ends up in rescue mode. Because the filesystem is 'xfs', I believe systemd-fstab-generator classifies the mount-point as 'local-fs'. Is there a way to force a 'remote-fs' reclassification? Or is there some other way to get out of this ordering nightmare... Old school 'S' and 'K' numbers are *so* simple; I'd trade consistency for speed any day.

Thanks for any suggestion or insight.

-kc

BTW, I disabled NetworkManager which, I know, kind of breaks network-online.target.

K.C. Wong
kcw...@verseon.com
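One way to get the 'remote-fs' classification is the `_netdev` mount option: systemd-fstab-generator orders such mounts after the network rather than under local-fs.target. Combined with a small oneshot unit that performs the map, the whole chain can be expressed without a cycle. The sketch below uses made-up names (rbd-myvol.service, pool "rbd", image "myvol", mount point /srv/myvol); it illustrates the approach and is not a drop-in config:

    # /etc/systemd/system/rbd-myvol.service (hypothetical)
    [Unit]
    Description=Map RBD image rbd/myvol
    After=network-online.target ceph.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/bin/rbd map rbd/myvol
    ExecStop=/usr/bin/rbd unmap /dev/rbd/rbd/myvol

    [Install]
    WantedBy=multi-user.target

    # Matching /etc/fstab line (udev creates the /dev/rbd/<pool>/<image> symlink):
    # /dev/rbd/rbd/myvol  /srv/myvol  xfs  defaults,_netdev,x-systemd.requires=rbd-myvol.service  0 0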
Re: [ceph-users] [Troubleshooting] I have a watcher I can't get rid of...
Thank you, Jason.

While I can't find the culprit for the watcher (the watcher never expired, and survived a reboot; udev, maybe?), blacklisting the host did allow me to remove the device.

Much appreciated,

-kc

> On Aug 4, 2016, at 4:50 AM, Jason Dillaman <jdill...@redhat.com> wrote:
>
> If the client is no longer running the watch should expire within 30
> seconds. If you are still experiencing this issue, you can blacklist
> the mystery client via "ceph osd blacklist add".
>
> On Wed, Aug 3, 2016 at 6:06 PM, K.C. Wong <kcw...@verseon.com> wrote:
>> I'm having a hard time removing an RBD that I no longer need.
>>
>> # rbd rm <pool>/<image>
>> 2016-08-03 15:00:01.085784 7ff9dfc997c0 -1 librbd: image has watchers - not removing
>> Removing image: 0% complete...failed.
>> rbd: error: image still has watchers
>> This means the image is still open or the client using it crashed. Try again
>> after closing/unmapping it or waiting 30s for the crashed client to timeout.
>>
>> So, I use `rbd status` to identify the watcher:
>>
>> # rbd status <pool>/<image>
>> Watchers:
>>    watcher=<client-ip>:0/705293879 client.1076985 cookie=1
>>
>> I log onto that host, and did
>>
>> # rbd showmapped
>>
>> which returns nothing
>>
>> I don't use snapshot and I don't use cloning, so, there shouldn't
>> be any image sharing. I ended up rebooting that host and the
>> watcher is still around, and my problem persist: I can't remove
>> the RBD.
>>
>> At this point, I'm all out of ideas on how to troubleshoot this
>> problem. I'm running infernalis:
>>
>> # ceph --version
>> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
>>
>> in my set up, on CentOS 7.2 hosts
>>
>> # uname -r
>> 3.10.0-327.22.2.el7.x86_64
>>
>> I appreciate any assistance,
>>
>> -kc
>>
>> K.C. Wong
>> kcw...@verseon.com
>
> --
> Jason

K.C. Wong
kcw...@verseon.com
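For anyone hitting the same thing later, the working sequence looks roughly like this; the client address is whatever "rbd status" reported for the stale watcher (the address below is made up):

    # Blacklist the stale watcher reported by "rbd status <pool>/<image>"
    ceph osd blacklist add 10.0.0.5:0/705293879

    # The removal should now go through once the watch is gone
    rbd rm <pool>/<image>

    # Housekeeping afterwards
    ceph osd blacklist ls
    ceph osd blacklist rm 10.0.0.5:0/705293879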
[ceph-users] [Troubleshooting] I have a watcher I can't get rid of...
I'm having a hard time removing an RBD that I no longer need.

# rbd rm <pool>/<image>
2016-08-03 15:00:01.085784 7ff9dfc997c0 -1 librbd: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again
after closing/unmapping it or waiting 30s for the crashed client to timeout.

So, I use `rbd status` to identify the watcher:

# rbd status <pool>/<image>
Watchers:
   watcher=<client-ip>:0/705293879 client.1076985 cookie=1

I log onto that host, and did

# rbd showmapped

which returns nothing.

I don't use snapshots and I don't use cloning, so there shouldn't be any image sharing. I ended up rebooting that host and the watcher is still around, and my problem persists: I can't remove the RBD.

At this point, I'm all out of ideas on how to troubleshoot this problem. I'm running infernalis:

# ceph --version
ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)

in my set-up, on CentOS 7.2 hosts:

# uname -r
3.10.0-327.22.2.el7.x86_64

I appreciate any assistance,

-kc

K.C. Wong
kcw...@verseon.com
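Another way to inspect the watch, independent of "rbd status", is to ask the image's header object directly. A sketch, assuming a format-2 image and placeholder names:

    # The image id is the suffix of block_name_prefix in "rbd info"
    rbd info <pool>/<image> | grep block_name_prefix
    #   block_name_prefix: rbd_data.1076985deadbeef   (example value)

    # The header object is rbd_header.<id>; list its watchers straight from RADOS
    rados -p <pool> listwatchers rbd_header.1076985deadbeef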
Re: [ceph-users] How to avoid kernel conflicts
The systems on which the `rbd map` hang occurred are definitely not under memory stress, and I don't believe they are doing a lot of disk I/O either. Here's the basic set-up:

* all nodes in the "data-plane" are identical
* they each host an OSD instance, sharing one of the drives
* I'm running Docker containers using an RBD volume plugin and Docker Compose
* when the hang happens, the most visible behavior is that `docker ps` hangs
* then I run `systemctl status` and see an `rbd map` process spawned by the RBD volume plugin
* I then tried an `strace -f -p <pid>` on it, and that process promptly exits (with RC 0) and the hang resolves itself

I'll try to capture the strace output the next time I run into it and share it with the mailing list.

Thanks, Ilya.

-kc

> On May 9, 2016, at 2:21 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
>
> On Mon, May 9, 2016 at 12:19 AM, K.C. Wong <kcw...@verseon.com> wrote:
>>
>>> As the tip said, you should not use rbd via kernel module on an OSD host
>>>
>>> However, using it with userspace code (librbd etc, as in kvm) is fine
>>>
>>> Generally, you should not have both:
>>> - "server" in userspace
>>> - "client" in kernelspace
>>
>> If `librbd` would help avoid this problem, then switching to `rbd-fuse`
>> should do the trick, right?
>>
>> The reason for my line of question is that I've seen occasional freezes
>> of `rbd map` that were resolved by a 'slight tap' by way of an strace.
>> There is definitely great attractiveness to not having specialized nodes
>> and making every one the same as the next one on the rack.
>
> The problem with placing the kernel client on the OSD node is the
> potential deadlock under heavy I/O when memory becomes scarce. It's
> not recommended, but people are doing it - if you don't stress your
> system too much, it'll never happen.
>
> "rbd map" freeze is definitely not related to the above. Did the actual
> command hang? Could you describe what you saw in more detail and how
> did strace help? It could be that you ran into
>
>    http://tracker.ceph.com/issues/14737
>
> Thanks,
>
>    Ilya

K.C. Wong
kcw...@verseon.com
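In case it helps the next report, a possible recipe for capturing that state when the hang recurs (the PID is whatever "systemctl status" shows for the stuck rbd map; the debugfs path is the usual libceph location):

    # Attach to the stuck process and log timestamps plus per-syscall durations
    strace -f -tt -T -o /tmp/rbd-map.strace -p <pid-of-rbd-map>

    # Kernel-side view at the same moment
    dmesg | tail -n 100
    cat /sys/kernel/debug/ceph/*/osdc   # in-flight OSD requests (needs debugfs mounted)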
Re: [ceph-users] How to avoid kernel conflicts
> As the tip said, you should not use rbd via kernel module on an OSD host
>
> However, using it with userspace code (librbd etc, as in kvm) is fine
>
> Generally, you should not have both:
> - "server" in userspace
> - "client" in kernelspace

If `librbd` would help avoid this problem, then switching to `rbd-fuse` should do the trick, right?

The reason for my line of question is that I've seen occasional freezes of `rbd map` that were resolved by a 'slight tap' by way of an strace. There is definitely great attractiveness to not having specialized nodes and making every one the same as the next one on the rack.

Thanks,

-kc

K.C. Wong
kcw...@verseon.com
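If rbd-fuse is worth an experiment, the invocation is small; a sketch with a placeholder mount point and the default pool name:

    # Expose every image in pool "rbd" as a file under the mount point (userspace, via librbd)
    mkdir -p /mnt/rbd-fuse
    rbd-fuse -p rbd /mnt/rbd-fuse
    ls /mnt/rbd-fuse        # one file per image
    # unmount with: fusermount -u /mnt/rbd-fuse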
[ceph-users] How to avoid kernel conflicts
Hi,

I saw this tip in the troubleshooting section:

  DO NOT mount kernel clients directly on the same node as your Ceph Storage
  Cluster, because kernel conflicts can arise. However, you can mount kernel
  clients within virtual machines (VMs) on a single node.

Does this mean having a converged deployment is a bad idea? Do I really need dedicated storage nodes?

By converged, I mean every node hosting an OSD. At the same time, workload on the node may mount RBD volumes or access CephFS. Do I have to isolate the OSD daemon in its own VM?

Any advice would be appreciated.

-kc

K.C. Wong
kcw...@verseon.com