Re: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0
> ; /var/lib/ceph/tmp/mnt.IOnlxY/journal
> 2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal for osd.10 fsid ----
> 2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
> 2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring /var/lib/ceph/tmp/mnt.IOnlxY/keyring
> 2015-12-18 16:09:50.698753 7fb5db130940 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
> 2015-12-18 16:09:50.745427 7fb5db130940 0 filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
> 2015-12-18 16:09:50.745978 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2015-12-18 16:09:50.745987 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2015-12-18 16:09:50.746012 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice is supported
> 2015-12-18 16:09:50.746517 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2015-12-18 16:09:50.746616 7fb5db130940 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is supported and your kernel >= 3.5
> 2015-12-18 16:09:50.748775 7fb5db130940 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
> 2015-12-18 16:09:50.749005 7fb5db130940 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2015-12-18 16:09:50.749256 7fb5db130940 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2015-12-18 16:09:50.749632 7fb5db130940 1 filestore(/var/lib/ceph/osd/ceph-10) upgrade
> 2015-12-18 16:09:50.783188 7fb5db130940 0 cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
> 2015-12-18 16:09:50.851735 7fb5db130940 0 cls/hello/cls_hello.cc:305: loading cls_hello
> 2015-12-18 16:09:50.851807 7fb5db130940 0 osd.10 0 crush map has features 33816576, adjusting msgr requires for clients
> 2015-12-18 16:09:50.851818 7fb5db130940 0 osd.10 0 crush map has features 33816576 was 8705, adjusting msgr requires for mons
> 2015-12-18 16:09:50.851821 7fb5db130940 0 osd.10 0 crush map has features 33816576, adjusting msgr requires for osds
> 2015-12-18 16:09:50.851965 7fb5db130940 0 osd.10 0 load_pgs
> 2015-12-18 16:09:50.851988 7fb5db130940 0 osd.10 0 load_pgs opened 0 pgs
> 2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors {default=true}
> 2015-12-18 16:09:50.870133 7fb5c7f39700 0 osd.10 0 ignoring osdmap until we have initialized
> 2015-12-18 16:09:50.870409 7fb5db130940 0 osd.10 0 done with init, starting boot process
> 2015-12-18 16:09:50.873357 7fb5c7f39700 0 osd.10 25804 crush map has features 104186773504, adjusting msgr requires for clients
> 2015-12-18 16:09:50.873368 7fb5c7f39700 0 osd.10 25804 crush map has features 379064680448 was 33825281, adjusting msgr requires for mons
> 2015-12-18 16:09:50.873374 7fb5c7f39700 0 osd.10 25804 crush map has features 379064680448, adjusting msgr requires for osds
> 2015-12-18 16:09:50.873377 7fb5c7f39700 0 osd.10 25804 check_osdmap_features enabling on-disk ERASURE CODES compat feature
> 2015-12-18 16:09:50.876187 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
> 2015-12-18 16:09:50.879534 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
> 2015-12-18 16:09:50.950405 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
> 2015-12-18 16:09:50.983355 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
>
> I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily kernel (4.2.0-21.25~14.04.1).
>
> Are you running a mixed cluster right now too? For example, this is my cluster right now:
>
> root@b1:~# ceph tell osd.* version | grep version | uniq -c
> osd.10: Error ENXIO: problem getting command descriptions from osd.10
> osd.10: problem getting command descriptions from osd.10
>     11 "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
>     15 "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"
>
> Bryan
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bob R <b...@drinksbeer.org>
> Date: Wednesday, December 16, 2015 at 11:45 AM
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0
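As an aside, Bryan's mixed-version tally (`ceph tell osd.* version | grep version | uniq -c`) can be reproduced offline from a saved transcript. This is a sketch only: the transcript below is a trimmed stand-in for real `ceph tell` output, and `/tmp/osd_versions.txt` is an assumed scratch path, not something from the thread. A `sort` is inserted before `uniq -c` because `uniq` only merges adjacent duplicate lines.

```shell
# Save a stand-in transcript of `ceph tell osd.* version` output.
cat > /tmp/osd_versions.txt <<'EOF'
    "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
    "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"
    "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
EOF

# Tally distinct versions, most common first. `sort` before `uniq -c`
# makes the count correct even when versions are interleaved by OSD id.
grep '"version"' /tmp/osd_versions.txt | sort | uniq -c | sort -rn
```

With Bryan's real output the same pipeline would show the 9.2.0/0.94.5 split at a glance; a single version line in the tally is what a fully upgraded cluster should report.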
We've been operating a cluster relatively incident-free since 0.86. On Monday I did a yum update on one node, ceph00, and after rebooting we're seeing every OSD stuck in the 'booting' state. I've tried removing all of the OSDs and recreating them with ceph-deploy (ceph-disk required modification to use partx -a rather than partprobe), but we see the same status. I'm not sure how to troubleshoot this further. Our OSDs on this host are now running as the ceph user, which may be related to the issue, as the other three hosts are running as root (although I followed the steps listed to upgrade from hammer to infernalis and did chown -R ceph:ceph /var/lib/ceph on each node).

[root@ceph00 ceph]# lsb_release -idrc
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

[root@ceph00 ceph]# ceph --version
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

[root@ceph00 ceph]# ceph daemon osd.0 status
{
    "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
    "osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
    "whoami": 0,
    "state": "booting",
    "oldest_map": 25243,
    "newest_map": 26610,
    "num_pgs": 0
}

[root@ceph00 ceph]# ceph daemon osd.3 status
{
    "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
    "osd_fsid": "8b1acd8a-645d-4dc2-8c1d-6dbb1715265f",
    "whoami": 3,
    "state": "booting",
    "oldest_map": 25243,
    "newest_map": 26612,
    "num_pgs": 0
}

[root@ceph00 ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-23   1.43999 root ssd
-19   0           host ceph00_ssd
-20   0.48000     host ceph01_ssd
 40   0.48000         osd.40            up      1.0              1.0
-21   0.48000     host ceph02_ssd
 43   0.48000         osd.43            up      1.0              1.0
-22   0.48000     host ceph03_ssd
 41   0.48000         osd.41            up      1.0              1.0
 -1 120.0     root default
-17  80.0         room b1
-14  40.0             host ceph01
  1   4.0                 osd.1         up      1.0              1.0
  4   4.0                 osd.4         up      1.0              1.0
 18   4.0                 osd.18        up      1.0              1.0
 19   4.0                 osd.19        up      1.0              1.0
 20   4.0                 osd.20        up      1.0              1.0
 21   4.0                 osd.21        up      1.0              1.0
 22   4.0                 osd.22        up      1.0              1.0
 23   4.0                 osd.23        up      1.0              1.0
 24   4.0                 osd.24        up      1.0              1.0
 25   4.0                 osd.25        up      1.0              1.0
-16  40.0             host ceph03
 30   4.0                 osd.30        up      1.0              1.0
 31   4.0                 osd.31        up      1.0              1.0
 32   4.0                 osd.32        up      1.0              1.0
 33   4.0                 osd.33        up      1.0              1.0
 34   4.0                 osd.34        up      1.0              1.0
 35   4.0                 osd.35        up      1.0              1.0
 36   4.0                 osd.36        up      1.0              1.0
 37   4.0                 osd.37        up      1.0              1.0
 38   4.0                 osd.38        up      1.0              1.0
 39   4.0                 osd.39        up      1.0              1.0
-18  40.0         room b2
-13   0               host ceph00
-15  40.0             host ceph02
  2   4.0                 osd.2         up      1.0              1.0
  5   4.0                 osd.5         up      1.0              1.0
 14   4.0                 osd.14        up      1.0              1.0
 15   4.0                 osd.15        up      1.0              1.0
 16   4.0                 osd.16        up      1.0              1.0
 17   4.0                 osd.17        up      1.0              1.0
 26   4.0                 osd.26        up      1.0              1.0
 27   4.0                 osd.27        up      1.0              1.0
 28   4.0                 osd.28        up      1.0              1.0
 29   4.0                 osd.29        up      1.0              1.0
  0   0       osd.0                   down      0                1.0
  3   0       osd.3                   down      0                1.0
  6   0       osd.6                   down      0                1.0
  7   0       osd.7                   down      0                1.0
  8   0       osd.8                   down      0                1.0
  9   0       osd.9                   down      0                1.0
 10   0       osd.10                  down      0                1.0
 11   0       osd.11                  down      0                1.0
 12   0       osd.12                  down      0                1.0
 13   0       osd.13                  down      0                1.0

Any assistance is greatly appreciated.

Bob
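Since the thread turns on whether the hammer-to-infernalis chown step actually covered everything the daemons now need to read, a quick ownership audit before starting the OSDs can help. This is a sketch under assumptions: a scratch directory stands in for the real /var/lib/ceph, and the current user stands in for the ceph user, so it can run anywhere without root; only the `find ! -user` idiom is the point.

```shell
# Sketch: report anything under an OSD data tree NOT owned by the user
# the daemons will run as (ceph on infernalis). Empty output from the
# `find ! -user` line means the recursive chown covered the whole tree.
OSD_ROOT=$(mktemp -d)       # stand-in for /var/lib/ceph
EXPECTED_USER=$(id -un)     # stand-in for "ceph"

# Fake a minimal OSD layout in the scratch tree.
mkdir -p "$OSD_ROOT/osd/ceph-0/current"
touch "$OSD_ROOT/osd/ceph-0/keyring"

# Any path printed here has the wrong owner and would trip the daemon.
find "$OSD_ROOT" ! -user "$EXPECTED_USER" -print
echo "checked $(find "$OSD_ROOT" | wc -l) entries under $OSD_ROOT"
```

Run against the real tree this would be `find /var/lib/ceph ! -user ceph -print` (as root); journal symlinks pointing at raw partitions also need the partition device itself to be ceph-owned, which a file-tree walk like this does not cover.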