[ceph-users] can't start OSD
Hello, I'm having trouble restarting some down OSDs. My cluster runs Debian Stretch (with backported kernel 4.13.0) on the Luminous release (12.2.0). An admin changed the fsid and restarted the OSDs of one machine. I don't know whether that caused all of this, but my cluster is in HEALTH_ERR and some pgs are down or inactive. The correct config is back now, but some OSDs (on other machines too) still can't start. Here is the health detail:

HEALTH_ERR 2282635/254779209 objects misplaced (0.896%); Reduced data availability: 3 pgs inactive, 1 pg down; Degraded data redundancy: 2837613/254779209 objects degraded (1.114%), 93 pgs unclean, 70 pgs degraded, 64 pgs undersized; 4017 stuck requests are blocked > 4096 sec
OBJECT_MISPLACED 2282635/254779209 objects misplaced (0.896%)
PG_AVAILABILITY Reduced data availability: 3 pgs inactive, 1 pg down
    pg 14.12a is down, acting [28,13,19]
    pg 14.15d is stuck inactive for 5344.345563, current state unknown, last acting []
    pg 14.1d7 is stuck inactive for 4306.284248, current state undersized+degraded+remapped+backfilling+peered, last acting [13]
PG_DEGRADED Degraded data redundancy: 2837613/254779209 objects degraded (1.114%), 93 pgs unclean, 70 pgs degraded, 64 pgs undersized
    pg 10.3 is stuck unclean for 5483.175862, current state active+remapped+backfill_wait, last acting [35,44,30]
    pg 10.1f is active+recovery_wait+degraded, acting [56,8,52]
    pg 14.0 is stuck undersized for 6003.911469, current state active+undersized+degraded+remapped+backfilling, last acting [13,42]
    pg 14.21 is stuck undersized for 437.855288, current state active+undersized+degraded+remapped+backfilling, last acting [40,59]
    pg 14.2b is stuck unclean for 123.787607, current state active+remapped+backfill_wait, last acting [62,30,24]
    pg 14.4a is stuck undersized for 723.893114, current state active+undersized+degraded+remapped+backfill_wait, last acting [43,22]
    pg 14.56 is stuck unclean for 123.821351, current state active+remapped+backfill_wait, last acting [56,43,63]
    pg 14.1fe is stuck undersized for 123.800787, current state active+undersized+degraded+remapped+backfill_wait, last acting [63,8]
    pg 14.20a is stuck unclean for 24341.489625, current state active+remapped+backfill_wait, last acting [20,28,37]
    pg 14.20b is stuck unclean for 24351.403819, current state active+remapped+backfill_wait, last acting [60,6,57]
    pg 14.21d is stuck unclean for 24345.292525, current state active+remapped+backfill_wait, last acting [59,62,10]
    pg 14.226 is stuck undersized for 363.681151, current state active+undersized+degraded+remapped+backfilling, last acting [44,19]
    pg 14.22c is stuck unclean for 123.793121, current state active+remapped+backfill_wait, last acting [16,40,9]
    pg 14.236 is stuck undersized for 163.374339, current state active+undersized+degraded+remapped+backfill_wait, last acting [61,6]
    pg 14.240 is stuck undersized for 437.857887, current state active+undersized+degraded+remapped+backfilling, last acting [57,27]
    pg 14.24d is stuck undersized for 115.191726, current state active+undersized+degraded+remapped+backfilling, last acting [19,27]
    pg 14.268 is stuck undersized for 7932.097742, current state active+undersized+degraded+remapped+backfilling, last acting [12,58]
    pg 14.27d is stuck unclean for 7935.169818, current state active+remapped+backfilling, last acting [12,47,8]
    pg 14.290 is stuck undersized for 437.855071, current state active+undersized+degraded+remapped+backfilling, last acting [29,3]
    pg 14.2aa is stuck undersized for 114.181416, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,46]
    pg 14.2ac is stuck undersized for 123.821179, current state active+undersized+degraded+remapped+backfill_wait, last acting [47,18]
    pg 14.2b9 is stuck undersized for 3704.234924, current state active+undersized+degraded+remapped+backfilling, last acting [13,38]
    pg 14.2c4 is stuck undersized for 123.824405, current state active+undersized+degraded+remapped+backfill_wait, last acting [15,36]
    pg 14.2c5 is stuck undersized for 161.266102, current state active+undersized+degraded+remapped+backfill_wait, last acting [63,44]
    pg 14.2e0 is stuck undersized for 438.862093, current state active+undersized+degraded+remapped+backfilling, last acting [9,21]
    pg 14.2eb is stuck undersized for 437.860653, current state active+undersized+degraded+remapped+backfilling, last acting [8,34]
    pg 14.2f8 is stuck undersized for 163.373209, current state active+undersized+degraded+remapped+backfill_wait, last acting [61,28]
    pg 14.305 is stuck undersized for 723.892233, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,40]
    pg 14.320 is stuck unclean for 123.788128, current state active+remapped+backfill_wait, last acting [62,6,5]
    pg 14.322 is stuck undersized for 437.856055, current
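For triage on a report like this, the usual first steps are to find which OSDs are down and ask one of the affected pgs why it is stuck. A minimal sketch using standard ceph CLI commands (the pg id 14.12a comes from the report above; the state filter on ceph osd tree is available on Luminous):

# ceph osd tree down            (down OSDs and where they sit in the CRUSH tree)
# ceph pg 14.12a query          (peering state; down_osds_we_would_probe names what blocks the pg)
# ceph pg dump_stuck inactive   (every pg stuck inactive)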
Re: [ceph-users] Can't Start OSD
, current state stale+active+clean, last acting [1]
pg 7.5d9 is stuck stale for 5954121.994181, current state stale+active+clean, last acting [1]
pg 7.395 is stuck stale for 5954121.993989, current state stale+active+clean, last acting [1]
pg 7.38e is stuck stale for 5954121.993988, current state stale+active+clean, last acting [1]
pg 7.13a is stuck stale for 5954121.993766, current state stale+active+clean, last acting [1]
pg 7.683 is stuck stale for 5954121.994255, current state stale+active+clean, last acting [1]
pg 7.439 is stuck stale for 5954121.994079, current state stale+active+clean, last acting [1]

It’s osd id=1 that’s problematic, but shouldn’t I have a replica of the data somewhere else?

Thanks!
~Noah

On Mar 22, 2015, at 2:04 PM, Somnath Roy somnath@sandisk.com wrote:

Are you seeing any errors related to the disk (where the OSD is mounted) in dmesg? It could be leveldb corruption or a Ceph bug. Unfortunately, there is not enough logging in that portion of the code base to reveal exactly why we are not getting the 'infos' object back from leveldb :-(

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah Mehl
Sent: Sunday, March 22, 2015 10:11 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Can't Start OSD

In production for over a year, and no upgrades.

Thanks!
~Noah

On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:

Noah,
Is this a fresh installation or after an upgrade? It seems related to the omap (leveldb) stuff.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah Mehl
Sent: Sunday, March 22, 2015 9:34 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Can't Start OSD

I have an OSD that’s failing to start. I can’t make heads or tails of the error (pasted below).

Thanks!
~Noah

[...]
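On the replica question: ceph pg map reports the current up and acting OSD sets per pg, and dump_stuck lists everything stuck in a given state. A minimal sketch with standard ceph CLI commands (pg ids taken from the stale listing above):

# ceph pg map 7.5d9         (where the pg maps now; stale means the primary stopped reporting)
# ceph pg dump_stuck stale  (all pgs stuck in the stale state)

A stale pg whose last acting set is just [1] may indicate those pgs were only ever served by osd.1 (e.g. a pool with size 1), in which case there would be no second replica to fall back on.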
Re: [ceph-users] Can't Start OSD
Noah,
Is this a fresh installation or after an upgrade? It seems related to the omap (leveldb) stuff.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah Mehl
Sent: Sunday, March 22, 2015 9:34 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Can't Start OSD

I have an OSD that’s failing to start. I can’t make heads or tails of the error (pasted below).

Thanks!
~Noah

[...]
Re: [ceph-users] Can't Start OSD
In production for over a year, and no upgrades.

Thanks!
~Noah

On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:

Noah,
Is this a fresh installation or after an upgrade? It seems related to the omap (leveldb) stuff.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah Mehl
Sent: Sunday, March 22, 2015 9:34 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Can't Start OSD

I have an OSD that’s failing to start. I can’t make heads or tails of the error (pasted below).

Thanks!
~Noah

[...]
[ceph-users] Can't Start OSD
I have an OSD that’s failing to start. I can’t make heads or tails of the error (pasted below).

Thanks!
~Noah

2015-03-22 16:32:39.265116 7f4da7fa0780 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 13483
2015-03-22 16:32:39.269499 7f4da7fa0780 1 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
2015-03-22 16:32:39.269509 7f4da7fa0780 1 filestore(/var/lib/ceph/osd/ceph-1) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-03-22 16:32:39.450031 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and appears to work
2015-03-22 16:32:39.450069 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-22 16:32:39.450743 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
2015-03-22 16:32:39.499753 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-22 16:32:39.500078 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps
2015-03-22 16:32:40.765736 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2015-03-22 16:32:40.777156 7f4da7fa0780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-03-22 16:32:40.777278 7f4da7fa0780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-03-22 16:32:40.778223 7f4da7fa0780 1 journal close /var/lib/ceph/osd/ceph-1/journal
2015-03-22 16:32:41.066655 7f4da7fa0780 1 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
2015-03-22 16:32:41.150578 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and appears to work
2015-03-22 16:32:41.150624 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-22 16:32:41.151359 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
2015-03-22 16:32:41.225302 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-22 16:32:41.225498 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps
2015-03-22 16:32:42.375558 7f4da7fa0780 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2015-03-22 16:32:42.382958 7f4da7fa0780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-03-22 16:32:42.383187 7f4da7fa0780 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 16:32:43.075101
osd/PG.cc: 2270: FAILED assert(values.size() == 1)
 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -75> 2015-03-22 16:32:39.259280 7f4da7fa0780 5 asok(0x1aec1c0) register_command perfcounters_dump hook 0x1ae4010
   -74> 2015-03-22 16:32:39.259373 7f4da7fa0780 5 asok(0x1aec1c0) register_command 1 hook 0x1ae4010
   -73> 2015-03-22 16:32:39.259393 7f4da7fa0780 5 asok(0x1aec1c0) register_command perf dump hook 0x1ae4010
   -72> 2015-03-22 16:32:39.259429 7f4da7fa0780 5 asok(0x1aec1c0) register_command perfcounters_schema hook 0x1ae4010
   -71> 2015-03-22 16:32:39.259445 7f4da7fa0780 5 asok(0x1aec1c0) register_command 2 hook 0x1ae4010
   -70> 2015-03-22 16:32:39.259453 7f4da7fa0780 5 asok(0x1aec1c0) register_command perf schema hook 0x1ae4010
   -69> 2015-03-22 16:32:39.259467 7f4da7fa0780 5 asok(0x1aec1c0) register_command config show hook 0x1ae4010
   -68> 2015-03-22 16:32:39.259481 7f4da7fa0780 5 asok(0x1aec1c0) register_command config set hook 0x1ae4010
   -67> 2015-03-22 16:32:39.259495 7f4da7fa0780 5 asok(0x1aec1c0) register_command config get hook 0x1ae4010
   -66> 2015-03-22 16:32:39.259505 7f4da7fa0780 5 asok(0x1aec1c0) register_command log flush hook 0x1ae4010
   -65> 2015-03-22 16:32:39.259519 7f4da7fa0780 5 asok(0x1aec1c0) register_command log dump hook 0x1ae4010
   -64> 2015-03-22 16:32:39.259536 7f4da7fa0780 5 asok(0x1aec1c0) register_command log reopen hook 0x1ae4010
   -63> 2015-03-22 16:32:39.265116
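For context, the assert fires in PG::peek_map_epoch() when reading the pg's 'infos' entry back from the omap (leveldb) store returns something other than exactly one value, which is why the replies in this thread point at leveldb. On releases that ship ceph-kvstore-tool, the store can be inspected offline; a sketch, assuming the default FileStore layout and a stopped OSD (the tool may not exist on a 0.67.x install):

# ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-1/current/omap list | head   (read-only dump of the first omap keys)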
Re: [ceph-users] Can't Start OSD
Are you seeing any errors related to the disk (where the OSD is mounted) in dmesg? It could be leveldb corruption or a Ceph bug. Unfortunately, there is not enough logging in that portion of the code base to reveal exactly why we are not getting the 'infos' object back from leveldb :-(

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah Mehl
Sent: Sunday, March 22, 2015 10:11 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Can't Start OSD

In production for over a year, and no upgrades.

Thanks!
~Noah

On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:

Noah,
Is this a fresh installation or after an upgrade? It seems related to the omap (leveldb) stuff.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah Mehl
Sent: Sunday, March 22, 2015 9:34 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Can't Start OSD

I have an OSD that’s failing to start. I can’t make heads or tails of the error (pasted below).

Thanks!
~Noah

[...]
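Checking for the disk errors Somnath asks about might look like the following (the device name sdb is hypothetical; substitute whatever disk backs /var/lib/ceph/osd/ceph-1):

# dmesg | grep -iE 'xfs|i/o error|sdb'   (kernel filesystem and block-layer errors)
# smartctl -a /dev/sdb                   (drive health, if smartmontools is installed)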
Re: [ceph-users] Can't start osd- one osd alway be down.
Hi Craig Lewis,
My pool has 300 TB of data; I can't recreate it as a new pool and copy the data across, since that would take a very long time. I upgraded Ceph to Giant (0.86), but I still get the error :(( I think my problem is the misplaced objects (0.320%).

# ceph pg 23.96 query
num_objects_missing_on_primary: 0,
num_objects_degraded: 0,
num_objects_misplaced: 79,

cluster xx-x-x-x
 health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs undersized; recovery 308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%); 1/130 in osds are down; flags noout,nodeep-scrub
 pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
 206 TB used, 245 TB / 452 TB avail
 308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%)
 14708 active+clean
 38 active+remapped
 225 active+undersized+degraded
 client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s

Checking the ceph log:

2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718 pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715 n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1 lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter Started/ReplicaActive/RepNotRecovering

It then logs many failures on many objects (e.g. c03fe096/rbd_data.5348922ae8944a.306b, ...):

2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793, time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96 103718 [PushOp(c03fe096/rbd_data.5348922ae8944a.306b/head//24, version: 103622'283374, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.306b/head//24@103622'283374, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)), PushOp(4120f096/rbd_data.7a63d32ae8944a.0083/head//24, version: 103679'295624, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0083/head//24@103679'295624, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

Thanks!
--
Tuan
HaNoi-VietNam

On 2014-10-28 01:35, Craig Lewis wrote:

My experience is that once you hit this bug, those PGs are gone. I tried marking the primary OSD OUT, which caused the problem to move to the new primary OSD. Luckily for me, my affected PGs were using replication state in the secondary cluster. I ended up deleting the whole pool and recreating it.

Which pools are 7 and 23? It's possible that it's something that's easy to replace.

On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Craig, thanks for replying. When I started that osd, the ceph -w log warned that pgs 7.9d8, 23.596, 23.9c6, and 23.63c can't recover, as in the pasted log. Those pgs are in active+degraded state.

# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When osd.21 starts, pg 7.9d8 and the three remaining pgs change to state active+recovering.) osd.21 is still down after the following logs:
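A small sketch for pulling just those recovery counters out of a pg query without reading the whole dump (assumes jq is installed; the JSON path matches the query output shown above but can shift between releases):

# ceph pg 23.96 query | jq '.info.stats.stat_sum | {num_objects_misplaced, num_objects_degraded, num_objects_missing_on_primary}'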
Re: [ceph-users] Can't start osd- one osd alway be down.
My experience is that once you hit this bug, those PGs are gone. I tried marking the primary OSD OUT, which caused the problem to move to the new primary OSD. Luckily for me, my affected PGs were using replication state in the secondary cluster. I ended up deleting the whole pool and recreating it.

Which pools are 7 and 23? It's possible that it's something that's easy to replace.

On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Craig, thanks for replying. When I started that osd, the ceph -w log warned that pgs 7.9d8, 23.596, 23.9c6, and 23.63c can't recover, as in the pasted log. Those pgs are in active+degraded state.

# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When osd.21 starts, pg 7.9d8 and the three remaining pgs change to state active+recovering.) osd.21 is still down after the following logs:
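Mapping the pool ids 7 and 23 back to names only takes the standard listing commands:

# ceph osd lspools   (id and name of every pool)
# ceph df            (per-pool object counts and usage, to judge whether recreating is cheap)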
Re: [ceph-users] Can't start osd- one osd alway be down.
I'm sending some related logs (osd.21 is not able to start):

-8705> 2014-10-25 14:41:04.345727 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

-1637> 2014-10-25 14:41:14.326580 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

-437> 2014-10-25 14:41:15.042174 7f12ba42e700 5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering

Thanks!

On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying. When I started that osd, the ceph -w log warned that pgs 7.9d8, 23.596, 23.9c6, and 23.63c can't recover, as in the pasted log. Those pgs are in active+degraded state.

# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When osd.21 starts, pg 7.9d8 and the three remaining pgs change to state active+recovering.) osd.21 is still down after the following logs:

[...]
Re: [ceph-users] Can't start osd- one osd alway be down.
My Ceph hung, and osd.21 logged: "osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and apparently lost". After I restarted all ceph data nodes, I can't start osd.21, and there are many logs about pg 6.9d8 such as:

-440> 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects are broken. What must I do, please?

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

[...]
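For pgs with unfound objects, the documented last resort is mark_unfound_lost, to be used only once it is clear that no surviving OSD still holds the objects; a sketch (destructive, so only after recovery options are exhausted):

# ceph health detail | grep unfound        (which pgs still report unfound objects)
# ceph pg 6.9d8 list_missing               (the unfound objects and which OSDs were already probed)
# ceph pg 6.9d8 mark_unfound_lost revert   (roll back to prior versions; writes since then are lost)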
Re: [ceph-users] Can't start osd- one osd alway be down.
# ceph pg 6.9d8 query
...
peer_info: [
  { peer: 49,
    pgid: 6.9d8,
    last_update: 102889'7801917,
    last_complete: 102889'7801917,
    log_tail: 102377'7792649,
    last_user_version: 7801879,
    last_backfill: MAX,
    purged_snaps: [1~7,9~44b,455~1f8,64f~63,6b3~3a,6ee~12f,81f~10,830~8,839~69b,ed7~7,edf~4,ee4~6f5,15da~f9,16d4~1f,16f5~7,16fd~4,1705~5e,1764~7,1771~78,17eb~12,1800~2,1803~d,1812~3,181a~1,181c~a,1827~3b,1863~1,1865~1,1867~1,186b~e,187a~3,1881~1,1884~7,188c~1,188f~3,1894~5,189f~2,18ab~1,18c6~1,1922~13,193d~1,1940~1,194a~1,1968~5,1975~1,1979~4,197e~4,1984~1,1987~11,199c~1,19a0~1,19a3~9,19ad~3,19b2~1,19b6~27,19de~8],
    history: {
      epoch_created: 164,
      last_epoch_started: 102888,
      last_epoch_clean: 102888,
      last_epoch_split: 0,
      parent_split_bits: 0,
      last_scrub: 91654'7460936,
      last_scrub_stamp: 2014-10-10 10:36:25.433016,
      last_deep_scrub: 81667'5815892,
      last_deep_scrub_stamp: 2014-08-29 09:44:14.012219,
      last_clean_scrub_stamp: 2014-10-10 10:36:25.433016,
    log_size: 9229,
    ondisk_log_size: 9229,
    stats_invalid: 1,
    stat_sum: {
      num_bytes: 17870536192,
      num_objects: 4327,
      num_object_clones: 29,
      num_object_copies: 12981,
      num_objects_missing_on_primary: 4,
      num_objects_degraded: 4,
      num_objects_unfound: 0,
      num_objects_dirty: 1092,
      num_whiteouts: 0,
      num_read: 4820626,
      num_read_kb: 59073045,
      num_write: 12748709,
      num_write_kb: 181630845,
      num_scrub_errors: 0,
      num_shallow_scrub_errors: 0,
      num_deep_scrub_errors: 0,
      num_objects_recovered: 135847,
      num_bytes_recovered: 562255538176,
      num_keys_recovered: 0,
      num_objects_omap: 0,
      num_objects_hit_set_archive: 0},

On 10/25/2014 07:40 PM, Ta Ba Tuan wrote:

[...]
Re: [ceph-users] Can't start OSD - one OSD always down.
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring. It doesn't work around or repair bad snapshots created on older versions of Ceph. Were any of the snapshots you're removing created on older versions of Ceph? If they were all created on Firefly, then you should open a new tracker issue, and try to get some help on IRC or the developers mailing list.

On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:
Dear everyone,
I can't start osd.21 (log file attached). Some pgs can't be repaired. I'm using replica 3 for my data pool. It seems some objects in those pgs are corrupted. I tried to delete the data related to those objects, but osd.21 still won't start. I also removed osd.21, but then other OSDs went down and wouldn't start (e.g. osd.86). Please help me debug this! Thanks!
-- Tuan Ha Noi - VietNam

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
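If it's unclear which release each snapshot was created under, it is at least easy to confirm what every daemon is running now; a small sketch using standard CLI calls (the wildcard form may need a reasonably recent client, so per-OSD "ceph tell osd.21 version" is the safe fallback):

    # Report the running version of each OSD, and of the local client
    ceph tell osd.* version
    ceph -v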
Re: [ceph-users] Can't start OSD - one OSD always down.
Hi Craig,

Thanks for replying. When I started that osd, the log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6 and 23.63c can't recover, as in the pasted log below. Those pgs are in active+degraded state.

#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When osd.21 starts, pg 7.9d8 and the three remaining pgs change to state active+recovering.) osd.21 still goes down after the following logs:

2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

Thanks!
-- Tuan HaNoi-VietNam
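Slow requests like these can be inspected live through the OSD's admin socket; a small sketch, assuming the default socket path for osd.21 (both commands have existed since the Dumpling era):

    # Show the requests currently blocked inside osd.21
    ceph --admin-daemon /var/run/ceph/ceph-osd.21.asok dump_ops_in_flight

    # Show a history of the slowest recent operations
    ceph --admin-daemon /var/run/ceph/ceph-osd.21.asok dump_historic_ops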
[ceph-users] Can't start OSD - one OSD always down.
Dear everyone,

I can't start osd.21 (log file attached). Some pgs can't be repaired. I'm using replica 3 for my data pool. It seems some objects in those pgs are corrupted. I tried to delete the data related to those objects, but osd.21 still won't start. I also removed osd.21, but then other OSDs went down and wouldn't start (e.g. osd.86). Please help me debug this! Thanks!

-- Tuan Ha Noi - VietNam

2014-10-24 11:10:53.036094 7f86c6fcb780 0 xfsfilestorebackend(/var/lib/ceph/osd/cloud-21) detect_feature: extsize is disabled by conf
2014-10-24 11:10:53.181392 7f86c6fcb780 0 filestore(/var/lib/ceph/osd/cloud-21) mount: WRITEAHEAD journal mode explicitly enabled in conf
2014-10-24 11:10:53.191499 7f86c6fcb780 1 journal _open /dev/sdi1 fd 24: 11998855168 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-24 11:11:03.794632 7f86c6fcb780 1 journal _open /dev/sdi1 fd 24: 11998855168 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-24 11:11:03.845410 7f86c6fcb780 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2014-10-24 11:11:04.174302 7f86c6fcb780 0 osd.21 101773 crush map has features 1107558400, adjusting msgr requires for clients
2014-10-24 11:11:04.174360 7f86c6fcb780 0 osd.21 101773 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2014-10-24 11:11:04.174373 7f86c6fcb780 0 osd.21 101773 crush map has features 1107558400, adjusting msgr requires for osds
2014-10-24 11:11:04.174402 7f86c6fcb780 0 osd.21 101773 load_pgs
2014-10-24 11:11:22.986057 7f86c6fcb780 0 osd.21 101773 load_pgs opened 281 pgs
2014-10-24 11:11:23.039971 7f86b6d2e700 0 osd.21 101773 ignoring osdmap until we have initialized
2014-10-24 11:11:23.040818 7f86b6d2e700 0 osd.21 101773 ignoring osdmap until we have initialized
2014-10-24 11:11:23.276236 7f86c6fcb780 0 osd.21 101773 done with init, starting boot process
2014-10-24 11:12:44.346474 7f865ca3c700 0 -- 192.168.1.2:6840/28594 >> 172.30.1.81:0/4234900213 pipe(0x23c15000 sd=66 :6840 s=0 pgs=0 cs=0 l=0 c=0x246f96e0).accept peer addr is really 172.30.1.81:0/4234900213 (socket is 172.30.1.81:47697/0)
2014-10-24 11:15:27.767594 7f86a2505700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f86a2505700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-osd() [0x9c830a]
 2: (()+0xfcb0) [0x7f86c6009cb0]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x8079e5]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x44c) [0x82215c]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x867390]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x84d70b]
 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x84d8de]
 8: (ReplicatedPG::snap_trimmer()+0x588) [0x7cc118]
 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x675f14]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa9a366]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa9c380]
 12: (()+0x7e9a) [0x7f86c6001e9a]
 13: (clone()+0x6d) [0x7f86c4efb31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000 2014-10-24 11:15:20.324218 7f869c4f9700 1 -- 192.168.1.2:0/28594 --> 192.168.1.3:6853/4658 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0xe955e00 con 0x20af2b00
-9999 2014-10-24 11:15:20.324268 7f869c4f9700 1 -- 192.168.1.2:0/28594 --> 192.168.1.3:6862/22372 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0x22e2de00 con 0x20af2840
-9998 2014-10-24 11:15:20.324313 7f869c4f9700 1 -- 192.168.1.2:0/28594 --> 192.168.1.3:6863/22372 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0x22e2d700 con 0x20af26e0
-9713 2014-10-24 11:15:20.365153 7f86ae51d700 5 -- op tracker -- , seq: 18573, time: 2014-10-24 11:15:20.365153, event: done, request: osd_op(client.7869019.0:6944380 rbd_data.451e822ae8944a.0128 [set-alloc-hint object_size 4194304 write_size 4194304,write 479232~4096] 6.b4cc39f6 snapc 18ee=[18ee] ack+ondisk+write e101783) v4
-9712 2014-10-24 11:15:20.365266 7f86ae51d700 5 -- op tracker -- , seq: 18576, time: 2014-10-24 11:15:20.365266, event: done, request: osd_sub_op_reply(client.7869019.0:6944380 6.9f6 b4cc39f6/rbd_data.451e822ae8944a.0128/head//6 [] ondisk, result = 0) v2
-9711 2014-10-24
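When an OSD segfaults in the snap trimmer like this, the usual triage is to keep the cluster from rebalancing while you capture a verbose log for the bug report; a sketch using standard commands (the debug levels shown are common choices, not required values):

    # Keep the cluster from marking OSDs out while you debug
    ceph osd set noout

    # In ceph.conf, under [osd.21], raise logging before restarting:
    #   debug osd = 20
    #   debug filestore = 20
    #   debug ms = 1
    sudo /etc/init.d/ceph start osd.21

    # Re-enable normal behaviour once the log is captured
    ceph osd unset noout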
Re: [ceph-users] Can't start OSD
Try marking these OSDs in:

ceph osd in osd.12 osd.13 osd.14 osd.15

Then restart the osd services.

- Karan Singh -

On 08 Aug 2014, at 00:55, O'Reilly, Dan daniel.orei...@dish.com wrote:

# id    weight  type name              up/down reweight
-1      7.2     root default
-2      1.8         host tm1cldosdl01
0       0.45            osd.0          up      1
1       0.45            osd.1          up      1
2       0.45            osd.2          up      1
3       0.45            osd.3          up      1
-3      1.8         host tm1cldosdl02
4       0.45            osd.4          up      1
5       0.45            osd.5          up      1
6       0.45            osd.6          up      1
7       0.45            osd.7          up      1
-4      1.8         host tm1cldosdl03
8       0.45            osd.8          up      1
9       0.45            osd.9          up      1
10      0.45            osd.10         up      1
11      0.45            osd.11         up      1
-5      1.8         host tm1cldosdl04
12      0.45            osd.12         down    0
13      0.45            osd.13         down    0
14      0.45            osd.14         down    0
15      0.45            osd.15         down    0

[ceph@tm1cldosdl04 ~]$ sudo /etc/init.d/ceph start osd.12
/etc/init.d/ceph: osd.12 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

What am I missing? Specifically, what would need to be in ceph.conf or /var/lib/ceph?

Dan O'Reilly
UNIX Systems Administration
9601 S. Meridian Blvd.
Englewood, CO 80112
720-514-6293

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
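If the CLI on your release only accepts one id at a time, a short shell loop does the same job; a sketch assuming the sysvinit script used elsewhere in this thread:

    # Mark each down OSD back in, then try to start it
    for i in 12 13 14 15; do
        ceph osd in osd.$i
        sudo /etc/init.d/ceph start osd.$i
    done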
Re: [ceph-users] Can't start OSD
Nope. Nothing works. This is VERY frustrating. What happened:
- I rebooted the box, simulating a system failure.
- When the system came back up, ceph wasn't started and the osd volumes weren't mounted.
- I did a "service ceph start osd" and the ceph processes didn't start.
- I did a "ceph-deploy activate" on the devices, so they're mounted. "service ceph start" still doesn't start anything.

Right now:

# service ceph restart
=== osd.18 ===
=== osd.18 ===
Stopping Ceph osd.18 on tm1cldosdl04...done
=== osd.18 ===
create-or-move updated item name 'osd.18' weight 0.45 at location {host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.18 on tm1cldosdl04...
starting osd.18 at :/0 osd_data /var/lib/ceph/osd/ceph-18 /var/lib/ceph/osd/ceph-18/journal
=== osd.17 ===
=== osd.17 ===
Stopping Ceph osd.17 on tm1cldosdl04...done
=== osd.17 ===
create-or-move updated item name 'osd.17' weight 0.45 at location {host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.17 on tm1cldosdl04...
starting osd.17 at :/0 osd_data /var/lib/ceph/osd/ceph-17 /var/lib/ceph/osd/ceph-17/journal
=== osd.19 ===
=== osd.19 ===
Stopping Ceph osd.19 on tm1cldosdl04...done
=== osd.19 ===
create-or-move updated item name 'osd.19' weight 0.45 at location {host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.19 on tm1cldosdl04...
starting osd.19 at :/0 osd_data /var/lib/ceph/osd/ceph-19 /var/lib/ceph/osd/ceph-19/journal
=== osd.16 ===
=== osd.16 ===
Stopping Ceph osd.16 on tm1cldosdl04...done
=== osd.16 ===
create-or-move updated item name 'osd.16' weight 0.45 at location {host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.16 on tm1cldosdl04...
starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 /var/lib/ceph/osd/ceph-16/journal

# ps -eaf|grep ceph
root      7528  6124  0 07:32 pts/0    00:00:00 grep ceph

# ceph osd tree
# id    weight  type name              up/down reweight
-1      9       root default
-2      1.8         host tm1cldosdl01
0       0.45            osd.0          up      1
1       0.45            osd.1          up      1
2       0.45            osd.2          up      1
3       0.45            osd.3          up      1
-3      1.8         host tm1cldosdl02
4       0.45            osd.4          up      1
5       0.45            osd.5          up      1
6       0.45            osd.6          up      1
7       0.45            osd.7          up      1
-4      1.8         host tm1cldosdl03
8       0.45            osd.8          up      1
9       0.45            osd.9          up      1
10      0.45            osd.10         up      1
11      0.45            osd.11         up      1
-5      3.6         host tm1cldosdl04
12      0.45            osd.12         DNE
13      0.45            osd.13         DNE
14      0.45            osd.14         DNE
15      0.45            osd.15         DNE
16      0.45            osd.16         down    0
17      0.45            osd.17         down    0
18      0.45            osd.18         down    0
19      0.45            osd.19         down    0

I'm missing something here. I don't know if it's a config issue or what. But the docs haven't helped me.
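The "osd.12 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )" error means the sysvinit script found neither an [osd.12] section in ceph.conf nor a sysvinit marker file in the OSD's data directory; ceph-deploy-prepared OSDs normally register with the init system in one of those two ways. A hedged sketch of the usual fixes, assuming the data directories mount under /var/lib/ceph/osd/:

    # Let ceph-disk mount and start every prepared OSD it can find
    # (available on Firefly-era releases)
    sudo ceph-disk activate-all

    # Or, per OSD: mark it sysvinit-managed, then start it
    sudo touch /var/lib/ceph/osd/ceph-12/sysvinit
    sudo /etc/init.d/ceph start osd.12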
Re: [ceph-users] Can't start OSD
How about the logs? Is there anything in them?

ls /var/log/ceph/

German Anders

--- Original message ---
Subject: Re: [ceph-users] Can't start OSD
From: O'Reilly, Dan daniel.orei...@dish.com
To: Karan Singh karan.si...@csc.fi
Cc: ceph-users@lists.ceph.com
Date: Friday, 08/08/2014 10:53
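A quick way to follow that suggestion; a small sketch assuming default log paths and one of the OSD ids on this host:

    # List the daemon logs, then inspect the most recent startup attempt
    ls -l /var/log/ceph/
    tail -n 100 /var/log/ceph/ceph-osd.16.log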
Re: [ceph-users] Can't start OSD
I’m afraid I don’t know exactly how to interpret this, but after a reboot:

2014-08-08 08:48:44.616005 7f0c3b1447a0 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 2978
2014-08-08 08:48:44.635680 7f0c3b1447a0 0 filestore(/var/lib/ceph/osd/ceph-10) mount detected xfs (libxfs)
2014-08-08 08:48:44.635730 7f0c3b1447a0 1 filestore(/var/lib/ceph/osd/ceph-10) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-08-08 08:48:44.681911 7f0c3b1447a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is supported and appears to work
2014-08-08 08:48:44.681959 7f0c3b1447a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-08-08 08:48:44.748483 7f0c3b1447a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syscall(SYS_syncfs, fd) fully supported
2014-08-08 08:48:44.748605 7f0c3b1447a0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_feature: extsize is supported
2014-08-08 08:48:44.889826 7f0c3b1447a0 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-08-08 08:48:45.064198 7f0c3b1447a0 -1 filestore(/var/lib/ceph/osd/ceph-10) mount failed to open journal /var/lib/ceph/osd/ceph-10/journal: (2) No such file or directory
2014-08-08 08:48:45.074220 7f0c3b1447a0 -1 ** ERROR: error converting store /var/lib/ceph/osd/ceph-10: (2) No such file or directory
2014-08-08 08:49:19.957725 7f2c40c1a7a0 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 4707
2014-08-08 08:49:19.973896 7f2c40c1a7a0 0 filestore(/var/lib/ceph/osd/ceph-10) mount detected xfs (libxfs)
2014-08-08 08:49:19.973931 7f2c40c1a7a0 1 filestore(/var/lib/ceph/osd/ceph-10) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-08-08 08:49:20.016413 7f2c40c1a7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is supported and appears to work
2014-08-08 08:49:20.016444 7f2c40c1a7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-08-08 08:49:20.083052 7f2c40c1a7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syscall(SYS_syncfs, fd) fully supported
2014-08-08 08:49:20.083179 7f2c40c1a7a0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_feature: extsize is supported
2014-08-08 08:49:20.134213 7f2c40c1a7a0 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-08-08 08:49:20.136710 7f2c40c1a7a0 -1 filestore(/var/lib/ceph/osd/ceph-10) mount failed to open journal /var/lib/ceph/osd/ceph-10/journal: (2) No such file or directory
2014-08-08 08:49:20.146797 7f2c40c1a7a0 -1 ** ERROR: error converting store /var/lib/ceph/osd/ceph-10: (2) No such file or directory
Re: [ceph-users] Can't start OSD
Hi,

Can you run:

ls -lah /var/lib/ceph/osd/ceph-10/journal

It’s saying it can’t find the journal.

Regards,
Matt
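A journal path that disappears after a reboot is usually a dangling symlink pointing at a device name that changed. The usual repair is to re-point the link at a stable device name, recreating the journal only if it is truly gone; a hedged sketch, assuming osd.10 and with <journal-partuuid> standing in for the journal partition's by-partuuid name:

    # See where the journal link points and whether the target exists
    ls -lah /var/lib/ceph/osd/ceph-10/journal

    # Re-point it at a stable device name (placeholder partuuid)
    sudo ln -sf /dev/disk/by-partuuid/<journal-partuuid> /var/lib/ceph/osd/ceph-10/journal

    # Only if the old journal is unrecoverable: create a fresh one
    # (any unflushed writes in the old journal are lost)
    sudo ceph-osd -i 10 --mkjournal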
[ceph-users] Can't start OSD
# id    weight  type name              up/down reweight
-1      7.2     root default
-2      1.8         host tm1cldosdl01
0       0.45            osd.0          up      1
1       0.45            osd.1          up      1
2       0.45            osd.2          up      1
3       0.45            osd.3          up      1
-3      1.8         host tm1cldosdl02
4       0.45            osd.4          up      1
5       0.45            osd.5          up      1
6       0.45            osd.6          up      1
7       0.45            osd.7          up      1
-4      1.8         host tm1cldosdl03
8       0.45            osd.8          up      1
9       0.45            osd.9          up      1
10      0.45            osd.10         up      1
11      0.45            osd.11         up      1
-5      1.8         host tm1cldosdl04
12      0.45            osd.12         down    0
13      0.45            osd.13         down    0
14      0.45            osd.14         down    0
15      0.45            osd.15         down    0

[ceph@tm1cldosdl04 ~]$ sudo /etc/init.d/ceph start osd.12
/etc/init.d/ceph: osd.12 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

What am I missing? Specifically, what would need to be in ceph.conf or /var/lib/ceph?

Dan O'Reilly
UNIX Systems Administration
9601 S. Meridian Blvd.
Englewood, CO 80112
720-514-6293

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
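To answer the question directly: with the sysvinit script, an OSD is "defined" either by a marker file in its data directory or by an explicit section in ceph.conf. A minimal sketch of the ceph.conf form, with the host name taken from the tree above:

    [osd.12]
        host = tm1cldosdl04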