Hello, I have probably restarted the OSD too many times, or marked it in/out too many times, and now I get this:
root@proxmox-zotac:~# /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, ceph::bufferlist*)' thread 7f7fd358e880 time 2016-03-09 00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)
 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xc03c46]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
 3: (OSD::load_pgs()+0xa20) [0x6a9170]
 4: (OSD::init()+0xc84) [0x6ac204]
 5: (main()+0x2839) [0x632459]
 6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
 7: /usr/bin/ceph-osd() [0x64c087]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
 in thread 7f7fd358e880
 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: /usr/bin/ceph-osd() [0xb04503]
 2: (()+0xf8d0) [0x7f7fd24268d0]
 3: (gsignal()+0x37) [0x7f7fd08c7067]
 4: (abort()+0x148) [0x7f7fd08c8448]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
 6: (()+0x5ebb6) [0x7f7fd11b2bb6]
 7: (()+0x5ec01) [0x7f7fd11b2c01]
 8: (()+0x5ee19) [0x7f7fd11b2e19]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x247) [0xc03e17]
 10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
 11: (OSD::load_pgs()+0xa20) [0x6a9170]
 12: (OSD::init()+0xc84) [0x6ac204]
 13: (main()+0x2839) [0x632459]
 14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
 15: /usr/bin/ceph-osd() [0x64c087]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-03-02 9:38 GMT+01:00 Mario Giammarco <mgiamma...@gmail.com>:

> Here it is:
>
>     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>      health HEALTH_WARN
>             4 pgs incomplete
>             4 pgs stuck inactive
>             4 pgs stuck unclean
>             1 requests are blocked > 32 sec
>      monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>             election epoch 840, quorum 0,1,2 0,1,2
>      osdmap e2405: 3 osds: 3 up, 3 in
>       pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>             1090 GB used, 4481 GB / 5571 GB avail
>                  284 active+clean
>                    4 incomplete
>   client io 4008 B/s rd, 446 kB/s wr, 23 op/s
>
>
> 2016-03-02 9:31 GMT+01:00 Shinobu Kinjo <ski...@redhat.com>:
>
>> Is "ceph -s" still showing you the same output?
>>
>> >     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>> >      health HEALTH_WARN
>> >             4 pgs incomplete
>> >             4 pgs stuck inactive
>> >             4 pgs stuck unclean
>> >      monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>> >             election epoch 832, quorum 0,1,2 0,1,2
>> >      osdmap e2400: 3 osds: 3 up, 3 in
>> >       pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>> >             1090 GB used, 4481 GB / 5571 GB avail
>> >                  284 active+clean
>> >                    4 incomplete
>>
>> Cheers,
>> S
>>
>> ----- Original Message -----
>> From: "Mario Giammarco" <mgiamma...@gmail.com>
>> To: "Lionel Bouton" <lionel-subscript...@bouton.name>
>> Cc: "Shinobu Kinjo" <ski...@redhat.com>, ceph-users@lists.ceph.com
>> Sent: Wednesday, March 2, 2016 4:27:15 PM
>> Subject: Re: [ceph-users] Help: pool not responding
>>
>> I tried setting min_size=1, but unfortunately nothing has changed.
>> Thanks for the idea.
>>
>> 2016-02-29 22:56 GMT+01:00 Lionel Bouton <lionel-subscript...@bouton.name>:
>>
>> > On 29/02/2016 22:50, Shinobu Kinjo wrote:
>> >
>> > the fact that they are optimized for benchmarks and certainly not
>> > Ceph OSD usage patterns (with or without internal journal).
>> >
>> > Are you assuming that SSHD is causing the issue?
>> > If you could elaborate on this more, it would be helpful.
>> >
>> >
>> > Probably not (unless they prove extremely unreliable with Ceph
>> > OSD usage patterns, which would surprise me).
>> >
>> > For incomplete PGs, the documentation seems clear enough about what
>> > should be done:
>> > http://docs.ceph.com/docs/master/rados/operations/pg-states/
>> >
>> > The relevant text:
>> >
>> > *Incomplete* Ceph detects that a placement group is missing information
>> > about writes that may have occurred, or does not have any healthy copies.
>> > If you see this state, try to start any failed OSDs that may contain the
>> > needed information or temporarily adjust min_size to allow recovery.
>> >
>> > We don't have the full history, but the most probable cause of these
>> > incomplete PGs is that min_size is set to 2 or 3 and at some point the 4
>> > incomplete PGs had fewer replicas than the min_size value. So if setting
>> > min_size to 2 isn't enough, setting it to 1 should unfreeze them.
>> >
>> > Lionel
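Lionel's advice, following the pg-states documentation, boils down to a short sequence: see which PGs are stuck, note the pool's current min_size, lower it temporarily, and put the original value back once the PGs have recovered. A minimal shell sketch of that sequence follows; it is an illustration, not a tested recovery procedure. The `ceph pg dump_stuck`, `ceph osd pool get` and `ceph osd pool set` subcommands are standard ceph CLI commands; the function names, the pool name used below, and the `CEPH` override variable (set `CEPH=echo` to print the commands instead of running them) are hypothetical, introduced only for this sketch.

```shell
#!/bin/sh
# Sketch: temporarily lower a pool's min_size so that incomplete PGs get a
# chance to peer, then put the original value back.
# Assumption: CEPH is a hypothetical override; CEPH=echo prints the commands
# instead of running them (dry run). It is left unquoted below on purpose so
# that a value such as "ceph --cluster ceph" splits into separate words.
CEPH="${CEPH:-ceph}"

# lower_min_size POOL [NEW_MIN]   (NEW_MIN defaults to 1)
lower_min_size() {
    pool="$1"
    new_min="${2:-1}"
    # List PGs stuck inactive; in the "ceph -s" output in this thread the
    # incomplete PGs are also counted as stuck inactive.
    $CEPH pg dump_stuck inactive
    # Print the current value so it can be restored later.
    $CEPH osd pool get "$pool" min_size
    $CEPH osd pool set "$pool" min_size "$new_min"
}

# restore_min_size POOL OLD_MIN
restore_min_size() {
    pool="$1"
    old_min="$2"
    $CEPH osd pool set "$pool" min_size "$old_min"
}
```

Usage, with a hypothetical pool named `rbd`: source the file, run `lower_min_size rbd 1`, wait for the PGs to go active+clean, then run `restore_min_size rbd 2` (or whatever value `ceph osd pool get` reported). Restoring matters because with min_size 1 the pool keeps serving I/O with only a single copy. As Mario reports above, lowering min_size alone did not clear his incomplete PGs, so this is at best one step alongside bringing the failed OSDs back.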
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com