Re: [ceph-users] Random OSDs respawning continuously
Hi all,

When I stop a respawning OSD on an OSD node, another OSD on the same node starts respawning. When an OSD starts respawning, it writes entries like the following to its log:

slow request 31.129671 seconds old, received at 2015-02-13 19:09:32.180496: osd_op(*osd.551*.95229:11 191 10005c4.0033 [copy-get max 8388608] 13.f4ccd256 RETRY=50 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg

osd.551 is part of the cache tier, and every respawning OSD logs similar entries naming different cache-tier OSDs. If I restart all of the OSDs on the cache-tier node, the respawning stops and the cluster becomes active+clean. But as soon as I write data to the cluster again, a random OSD starts respawning. Can anyone help me solve this issue?

2015-02-13 19:10:02.309848 7f53eef54700 0 log_channel(default) log [WRN] : 11 slow requests, 11 included below; oldest blocked for 30.132629 secs
2015-02-13 19:10:02.309854 7f53eef54700 0 log_channel(default) log [WRN] : slow request 30.132629 seconds old, received at 2015-02-13 19:09:32.177075: osd_op(osd.551.95229:63 10002ae. [copy-from ver 7622] 13.7273b256 RETRY=130 snapc 1=[] ondisk+retry+write+ignore_overlay+enforce_snapc+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309858 7f53eef54700 0 log_channel(default) log [WRN] : slow request 30.131608 seconds old, received at 2015-02-13 19:09:32.178096: osd_op(osd.551.95229:41 5 10003a0.0006 [copy-get max 8388608] 13.aefb256 RETRY=118 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309861 7f53eef54700 0 log_channel(default) log [WRN] : slow request 30.130994 seconds old, received at 2015-02-13 19:09:32.178710: osd_op(osd.551.95229:26 83 100029d.003b [copy-get max 8388608] 13.a2be1256 RETRY=115 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309864 7f53eef54700 0 log_channel(default) log [WRN] : slow request 30.130426 seconds old, received at 2015-02-13 19:09:32.179278: osd_op(osd.551.95229:39 39 10004e9.0032 [copy-get max 8388608] 13.6a25b256 RETRY=105 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309868 7f53eef54700 0 log_channel(default) log [WRN] : slow request 30.129697 seconds old, received at 2015-02-13 19:09:32.180007: osd_op(osd.551.95229:97 49 1000553.007e [copy-get max 8388608] 13.c8645256 RETRY=59 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310284 7f53eef54700 0 log_channel(default) log [WRN] : 11 slow requests, 6 included below; oldest blocked for 31.133092 secs
2015-02-13 19:10:03.310305 7f53eef54700 0 log_channel(default) log [WRN] : slow request 31.129671 seconds old, received at 2015-02-13 19:09:32.180496: osd_op(osd.551.95229:11 191 10005c4.0033 [copy-get max 8388608] 13.f4ccd256 RETRY=50 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310308 7f53eef54700 0 log_channel(default) log [WRN] : slow request 31.128616 seconds old, received at 2015-02-13 19:09:32.181551: osd_op(osd.551.95229:12 903 10002e4.00d6 [copy-get max 8388608] 13.f56a3256 RETRY=41 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310322 7f53eef54700 0 log_channel(default) log [WRN] : slow request 31.127807 seconds old, received at 2015-02-13 19:09:32.182360: osd_op(osd.551.95229:14 165 1000480.0110 [copy-get max 8388608] 13.fd8c1256 RETRY=32 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310327 7f53eef54700 0 log_channel(default) log [WRN] : slow request 31.127320 seconds old, received at 2015-02-13 19:09:32.182847: osd_op(osd.551.95229:15 013 100047f.0133 [copy-get max 8388608] 13.b7b05256 RETRY=27 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310331 7f53eef54700 0 log_channel(default) log [WRN] : slow request 31.126935 seconds old, received at 2015-02-13 19:09:32.183232: osd_op(osd.551.95229:15 767 100066d.001e [copy-get max 8388608] 13.3b017256 RETRY=25 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:04.310685 7f53eef54700 0 log_channel(default) log [WRN] : 11 slow requests, 1 included below; oldest blocked for 32.133566 secs
2015-02-13 19:10:04.310705 7f53eef54700 0 log_channel(default) log [WRN] : slow request 32.126584 seconds old, received at 2015-02-13 19:09:32.184057: osd_op(osd.551.95229:16 293 1000601.0029 [copy-get max 8388608]
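With this many slow-request entries it is easier to summarize them than to read one by one. A minimal sketch of pulling out the RETRY counts (the `extract_retries` helper name is mine, and the log path is just an example):

```shell
# Summarize RETRY=<n> fields from slow-request lines, worst offenders first.
# extract_retries is a hypothetical helper; pass it any ceph-osd log file.
extract_retries() {
    grep -oE 'RETRY=[0-9]+' "$1" | sort -t= -k2,2 -rn
}

# Usage (example path):
#   extract_retries /var/log/ceph/ceph-osd.551.log | head
```

A steadily growing maximum RETRY value, as in the log above (RETRY=130 and climbing), usually means the same ops are being re-queued without ever completing.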
Re: [ceph-users] Random OSDs respawning continuously
It's not entirely clear, but it looks like all of these ops are just your caching-pool OSDs trying to promote objects, and your backing-pool OSDs aren't fast enough to satisfy all the IO demanded of them. You may be overloading the system.
-Greg

On Fri, Feb 13, 2015 at 6:06 AM Mohamed Pakkeer <mdfakk...@gmail.com> wrote:
> [...]
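If the cache tier really is promoting and flushing faster than the backing EC pool can absorb, one common mitigation is to bound the cache pool explicitly so the tiering agent flushes earlier and in smaller bursts. A hedged sketch, assuming a cache pool named `cachepool` (the name, byte cap, and ratios are placeholders; size them for your hardware):

```shell
# Cap the cache pool so the tiering agent acts before the pool is saturated.
ceph osd pool set cachepool target_max_bytes 500000000000   # example: ~500 GB cap

# Start flushing dirty objects at 40% of the cap; start evicting at 80%.
ceph osd pool set cachepool cache_target_dirty_ratio 0.4
ceph osd pool set cachepool cache_target_full_ratio 0.8
```

These are cluster-side settings, so they only take effect against a running cluster; they reduce the burstiness of promotion/flush traffic rather than eliminating it.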
[ceph-users] Random OSDs respawning continuously
Hi all,

Cluster: 540 OSDs, cache tier and EC pool, ceph version 0.87

cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
 health HEALTH_WARN 10 pgs peering; 21 pgs stale; 2 pgs stuck inactive; 2 pgs stuck unclean; 287 requests are blocked > 32 sec; recovery 24/6707031 objects degraded (0.000%); too few pgs per osd (13 < min 20); 1/552 in osds are down; clock skew detected on mon.master02, mon.master03
 monmap e3: 3 mons at {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0}, election epoch 4, quorum 0,1,2 master01,master02,master03
 mdsmap e17: 1/1/1 up {0=master01=up:active}
 osdmap e57805: 552 osds: 551 up, 552 in
 pgmap v278604: 7264 pgs, 3 pools, 2027 GB data, 547 kobjects
       3811 GB used, 1958 TB / 1962 TB avail
       24/6707031 objects degraded (0.000%)
              7 stale+peering
              3 peering
           7240 active+clean
             13 stale
              1 stale+active

We have mounted CephFS using the ceph-fuse client. Suddenly some of the OSDs started respawning continuously, and the cluster health is still unstable. How do we stop the respawning OSDs?
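Two of the warnings in that status are worth fixing independently of the respawning: "too few pgs per osd (13 < min 20)" and the monitor clock skew. A hedged sketch (the pool name and pg_num target are placeholders; raising pg_num is irreversible and triggers data movement, so plan it deliberately):

```shell
# Raise the placement-group count on the data pool
# (irreversible; causes backfill traffic while PGs split).
ceph osd pool set <poolname> pg_num 4096
ceph osd pool set <poolname> pgp_num 4096

# Clear the clock-skew warning by syncing time on mon.master02 / mon.master03,
# e.g. by ensuring ntpd/chrony is running on each monitor host.
```

These commands act on a live cluster, so they are shown here only as the shape of the fix.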
2015-02-12 18:41:51.562337 7f8371373900 0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 3911
2015-02-12 18:41:51.564781 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:51.564792 7f8371373900 1 filestore(/var/lib/ceph/osd/ceph-538) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:41:51.655623 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:51.655639 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:51.663864 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:51.663910 7f8371373900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:51.994021 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:41:52.788178 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.848430 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.922806 7f8371373900 1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:41:52.948320 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:52.981122 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:52.981137 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:52.989395 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:52.989440 7f8371373900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:53.149095 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-12 18:41:53.154258 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.217404 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.467512 7f8371373900 0 cls cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-12 18:41:53.563846 7f8371373900 0 osd.538 54486 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-12 18:41:53.563865 7f8371373900 0 osd.538 54486 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-12 18:41:53.563869 7f8371373900 0 osd.538 54486 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-12 18:41:53.563888 7f8371373900 0 osd.538 54486 load_pgs
2015-02-12 18:41:55.430730 7f8371373900 0 osd.538 54486 load_pgs opened 137 pgs
2015-02-12 18:41:55.432854 7f8371373900 -1 osd.538 54486 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is but only the following values are allowed: idle, be or rt
2015-02-12 18:41:55.442748 7f835dfc8700 0 osd.538 54486 ignoring osdmap until we have initialized
2015-02-12 18:41:55.456802 7f835dfc8700 0 osd.538 54486 ignoring osdmap until we have
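One concrete error stands out in this startup log: the `set_disk_tp_priority(22) Invalid argument` line, which says osd_disk_thread_ioprio_class is set to an empty or invalid value when only `idle`, `be`, or `rt` are accepted. A sketch of the ceph.conf fragment that would clear it, assuming you want the disk thread deprioritized (alternatively, remove both options to use the defaults):

```
[osd]
; must be one of: idle, be, rt (the log shows it currently empty/invalid)
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
```

This particular error is unlikely to be the cause of the respawning, but it is worth fixing so it stops appearing on every OSD start.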