Re: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0

2015-12-21 Thread Bob R

Re: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0

2015-12-18 Thread Bob R
; /var/lib/ceph/tmp/mnt.IOnlxY/journal
> 2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store
> /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal
> for osd.10 fsid ----
> 2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file:
> /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open
> /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
> 2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring
> /var/lib/ceph/tmp/mnt.IOnlxY/keyring
> 2015-12-18 16:09:50.698753 7fb5db130940  0 ceph version 9.2.0
> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
> 2015-12-18 16:09:50.745427 7fb5db130940  0
> filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
> 2015-12-18 16:09:50.745978 7fb5db130940  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option

Re: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0

2015-12-18 Thread Stillwell, Bryan
2015-12-18 16:09:50.698753 7fb5db130940  0 ceph version 9.2.0
(bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
2015-12-18 16:09:50.745427 7fb5db130940  0
filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
2015-12-18 16:09:50.745978 7fb5db130940  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:50.745987 7fb5db130940  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: 
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-18 16:09:50.746012 7fb5db130940  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice is 
supported
2015-12-18 16:09:50.746517 7fb5db130940  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)
2015-12-18 16:09:50.746616 7fb5db130940  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is 
supported and your kernel >= 3.5
2015-12-18 16:09:50.748775 7fb5db130940  0 filestore(/var/lib/ceph/osd/ceph-10) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-18 16:09:50.749005 7fb5db130940  1 journal _open 
/var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-12-18 16:09:50.749256 7fb5db130940  1 journal _open 
/var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-12-18 16:09:50.749632 7fb5db130940  1 filestore(/var/lib/ceph/osd/ceph-10) 
upgrade
2015-12-18 16:09:50.783188 7fb5db130940  0  cls/cephfs/cls_cephfs.cc:136: 
loading cephfs_size_scan
2015-12-18 16:09:50.851735 7fb5db130940  0  cls/hello/cls_hello.cc:305: 
loading cls_hello
2015-12-18 16:09:50.851807 7fb5db130940  0 osd.10 0 crush map has features 
33816576, adjusting msgr requires for clients
2015-12-18 16:09:50.851818 7fb5db130940  0 osd.10 0 crush map has features 
33816576 was 8705, adjusting msgr requires for mons
2015-12-18 16:09:50.851821 7fb5db130940  0 osd.10 0 crush map has features 
33816576, adjusting msgr requires for osds
2015-12-18 16:09:50.851965 7fb5db130940  0 osd.10 0 load_pgs
2015-12-18 16:09:50.851988 7fb5db130940  0 osd.10 0 load_pgs opened 0 pgs
2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors 
{default=true}
2015-12-18 16:09:50.870133 7fb5c7f39700  0 osd.10 0 ignoring osdmap until we 
have initialized
2015-12-18 16:09:50.870409 7fb5db130940  0 osd.10 0 done with init, starting 
boot process
2015-12-18 16:09:50.873357 7fb5c7f39700  0 osd.10 25804 crush map has features 
104186773504, adjusting msgr requires for clients
2015-12-18 16:09:50.873368 7fb5c7f39700  0 osd.10 25804 crush map has features 
379064680448 was 33825281, adjusting msgr requires for mons
2015-12-18 16:09:50.873374 7fb5c7f39700  0 osd.10 25804 crush map has features 
379064680448, adjusting msgr requires for osds
2015-12-18 16:09:50.873377 7fb5c7f39700  0 osd.10 25804 check_osdmap_features 
enabling on-disk ERASURE CODES compat feature
2015-12-18 16:09:50.876187 7fb5c7f39700  0 log_channel(cluster) log [WRN] : 
failed to encode map e25805 with expected crc
2015-12-18 16:09:50.879534 7fb5c7f39700  0 log_channel(cluster) log [WRN] : 
failed to encode map e25805 with expected crc
2015-12-18 16:09:50.950405 7fb5c7f39700  0 log_channel(cluster) log [WRN] : 
failed to encode map e25905 with expected crc
2015-12-18 16:09:50.983355 7fb5c7f39700  0 log_channel(cluster) log [WRN] : 
failed to encode map e25905 with expected crc
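The "failed to encode map ... with expected crc" warnings are typically what you see in a mixed-version cluster: the local daemon's encoding of the osdmap doesn't byte-match the CRC computed by a differently-versioned peer, so the OSD falls back to fetching full maps. If you want to see which epochs were affected, here's a quick sketch (the helper name `crc_mismatch_epochs` is mine; the sample lines are joined copies of the log lines above):

```python
import re

# Sample lines copied (joined) from the OSD log above.
LOG = """\
2015-12-18 16:09:50.876187 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
2015-12-18 16:09:50.950405 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
"""

def crc_mismatch_epochs(log_text):
    """Return the sorted set of osdmap epochs that failed CRC re-encoding."""
    return sorted({int(m.group(1))
                   for m in re.finditer(r"failed to encode map e(\d+) with expected crc",
                                        log_text)})

print(crc_mismatch_epochs(LOG))  # [25805, 25905]
```

On its own this warning is usually harmless noise during a rolling upgrade, but it is a strong hint that not all daemons are on the same version yet.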

I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily kernel 
(4.2.0-21.25~14.04.1).

Are you running a mixed cluster right now too? For example, this is what my
cluster looks like at the moment:

root@b1:~# ceph tell osd.* version | grep version | uniq -c
osd.10: Error ENXIO: problem getting command descriptions from osd.10
osd.10: problem getting command descriptions from osd.10
 11 "version": "ceph version 9.2.0 
(bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
 15 "version": "ceph version 0.94.5 
(9764da52395923e0b32908d83a9f7304401fee43)"
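The grep/uniq one-liner above gets the counts; if you want it machine-readable, here's an illustrative Python sketch. The RAW sample is a rough reconstruction of what `ceph tell osd.* version` prints (abridged), and `version_census` is just a name I made up:

```python
import re
from collections import Counter

# Abridged, reconstructed output of `ceph tell osd.* version`.
RAW = '''\
osd.10: Error ENXIO: problem getting command descriptions from osd.10
osd.0: { "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"}
osd.1: { "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"}
osd.2: { "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"}
'''

def version_census(raw):
    """Count ceph versions per OSD and collect OSDs that did not answer."""
    versions = Counter(re.findall(r'"ceph version ([^\s"]+)', raw))
    unreachable = sorted(set(re.findall(r'^(osd\.\d+): Error', raw, re.M)))
    return versions, unreachable

counts, dead = version_census(RAW)
print(counts)  # Counter({'0.94.5': 2, '9.2.0': 1})
print(dead)    # ['osd.10']
```

The unreachable list matters here: an OSD that errors out of `ceph tell` is exactly the kind that also shows up stuck or down in the tree.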

Bryan

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bob R
<b...@drinksbeer.org>
Date: Wednesday, December 16, 2015 at 11:45 AM
To: ceph-users <ceph-users@lists.ceph.com>
Subject: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph 
infernalis 9.2.0

[ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0

2015-12-16 Thread Bob R
We've been operating a cluster relatively incident-free since 0.86. On
Monday I did a yum update on one node, ceph00, and after rebooting we're
seeing every OSD stuck in the 'booting' state. I've tried removing all of
the OSDs and recreating them with ceph-deploy (ceph-disk required a
modification to use partx -a rather than partprobe), but we see the same
status. I'm not sure how to troubleshoot this further. Our OSDs on this
host are now running as the ceph user, which may be related to the issue,
as the other three hosts are running as root (although I followed the steps
listed to upgrade from hammer to infernalis and did chown -R ceph:ceph
/var/lib/ceph on each node).
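One thing worth double-checking when only one host runs as the ceph user: a recursive chown of /var/lib/ceph does not change the ownership of journal partitions that are symlinks into /dev, so those can still be root-owned afterwards. Purely as an illustration (`check_owners` and the sample pairs are hypothetical; on a real node the (path, owner) pairs would come from os.stat() and the pwd module):

```python
# Illustrative only: infernalis expects OSD data to be owned by the ceph
# user when the daemon runs as ceph rather than root.
def check_owners(entries, expected="ceph"):
    """Return paths whose owner does not match the user the daemon runs as."""
    return [path for path, owner in entries if owner != expected]

# Hypothetical survey of one OSD's files; the journal symlink target is the
# classic thing `chown -R` misses.
sample = [
    ("/var/lib/ceph/osd/ceph-0", "ceph"),
    ("/var/lib/ceph/osd/ceph-0/journal", "root"),
]
print(check_owners(sample))  # ['/var/lib/ceph/osd/ceph-0/journal']
```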

[root@ceph00 ceph]# lsb_release -idrc
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

[root@ceph00 ceph]# ceph --version
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

[root@ceph00 ceph]# ceph daemon osd.0 status
{
"cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
"osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
"whoami": 0,
"state": "booting",
"oldest_map": 25243,
"newest_map": 26610,
"num_pgs": 0
}

[root@ceph00 ceph]# ceph daemon osd.3 status
{
"cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
"osd_fsid": "8b1acd8a-645d-4dc2-8c1d-6dbb1715265f",
"whoami": 3,
"state": "booting",
"oldest_map": 25243,
"newest_map": 26612,
"num_pgs": 0
}
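A state of "booting" with num_pgs 0 generally means the daemon finished initializing and is waiting for the monitors to mark it up, so the interesting question is why the mons never do. For eyeballing a pile of these status dumps, a small sketch (`summarize` is my name for it; STATUS is the osd.0 output copied from above):

```python
import json

# Status output copied from `ceph daemon osd.0 status` above.
STATUS = '''{
    "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
    "osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
    "whoami": 0,
    "state": "booting",
    "oldest_map": 25243,
    "newest_map": 26610,
    "num_pgs": 0
}'''

def summarize(status_json):
    """One-line summary: id, state, and how many map epochs the OSD holds."""
    s = json.loads(status_json)
    span = s["newest_map"] - s["oldest_map"]
    return "osd.%d %s (%d map epochs, %d pgs)" % (
        s["whoami"], s["state"], span, s["num_pgs"])

print(summarize(STATUS))  # osd.0 booting (1367 map epochs, 0 pgs)
```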

[root@ceph00 ceph]# ceph osd tree
ID  WEIGHTTYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
-23   1.43999 root ssd
-19 0 host ceph00_ssd
-20   0.48000 host ceph01_ssd
 40   0.48000 osd.40   up  1.0  1.0
-21   0.48000 host ceph02_ssd
 43   0.48000 osd.43   up  1.0  1.0
-22   0.48000 host ceph03_ssd
 41   0.48000 osd.41   up  1.0  1.0
 -1 120.0 root default
-17  80.0 room b1
-14  40.0 host ceph01
  1   4.0 osd.1up  1.0  1.0
  4   4.0 osd.4up  1.0  1.0
 18   4.0 osd.18   up  1.0  1.0
 19   4.0 osd.19   up  1.0  1.0
 20   4.0 osd.20   up  1.0  1.0
 21   4.0 osd.21   up  1.0  1.0
 22   4.0 osd.22   up  1.0  1.0
 23   4.0 osd.23   up  1.0  1.0
 24   4.0 osd.24   up  1.0  1.0
 25   4.0 osd.25   up  1.0  1.0
-16  40.0 host ceph03
 30   4.0 osd.30   up  1.0  1.0
 31   4.0 osd.31   up  1.0  1.0
 32   4.0 osd.32   up  1.0  1.0
 33   4.0 osd.33   up  1.0  1.0
 34   4.0 osd.34   up  1.0  1.0
 35   4.0 osd.35   up  1.0  1.0
 36   4.0 osd.36   up  1.0  1.0
 37   4.0 osd.37   up  1.0  1.0
 38   4.0 osd.38   up  1.0  1.0
 39   4.0 osd.39   up  1.0  1.0
-18  40.0 room b2
-13 0 host ceph00
-15  40.0 host ceph02
  2   4.0 osd.2up  1.0  1.0
  5   4.0 osd.5up  1.0  1.0
 14   4.0 osd.14   up  1.0  1.0
 15   4.0 osd.15   up  1.0  1.0
 16   4.0 osd.16   up  1.0  1.0
 17   4.0 osd.17   up  1.0  1.0
 26   4.0 osd.26   up  1.0  1.0
 27   4.0 osd.27   up  1.0  1.0
 28   4.0 osd.28   up  1.0  1.0
 29   4.0 osd.29   up  1.0  1.0
  0 0 osd.0  down0  1.0
  3 0 osd.3  down0  1.0
  6 0 osd.6  down0  1.0
  7 0 osd.7  down0  1.0
  8 0 osd.8  down0  1.0
  9 0 osd.9  down0  1.0
 10 0 osd.10 down0  1.0
 11 0 osd.11 down0  1.0
 12 0 osd.12 down0  1.0
 13 0 osd.13 down0  1.0

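If you just want the down set out of a big tree, something like this works (`down_osds` is a made-up helper; TREE holds a few lines pasted from the output above):

```python
import re

# A few lines pasted from the `ceph osd tree` output above.
TREE = """\
  29   4.0 osd.29   up  1.0  1.0
   0 0 osd.0  down0  1.0
   3 0 osd.3  down0  1.0
"""

def down_osds(tree_text):
    """Names of OSDs the tree reports as down."""
    return re.findall(r"(osd\.\d+)\s*down", tree_text)

print(down_osds(TREE))  # ['osd.0', 'osd.3']
```

Run over the full tree above, this would list exactly the ten ceph00 OSDs (0, 3, 6-13) that are down with weight 0.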

Any assistance is greatly appreciated.

Bob
___
ceph-users mailing list