Re: [ceph-users] [Help: pool not responding] Now osd crash

Mario Giammarco Tue, 08 Mar 2016 15:11:49 -0800

Hello,
I have probably restarted the OSD too many times, or marked OSDs in/out too
many times, and now I get this:


root@proxmox-zotac:~# /usr/bin/ceph-osd -i 1 --pid-file
/var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1
/var/lib/ceph/osd/ceph-1/journal
osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t,
epoch_t*, ceph::bufferlist*)' thread 7f7fd358e880 time 2016-03-09
00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
4: (OSD::init()+0xc84) [0x6ac204]
5: (main()+0x2839) [0x632459]
6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
7: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
2016-03-09 00:08:09.196669 7f7fd358e880 -1 osd/PG.cc: In function 'static
int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, ceph::bufferlist*)'
thread 7f7fd358e880 time 2016-03-09 00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
4: (OSD::init()+0xc84) [0x6ac204]
5: (main()+0x2839) [0x632459]
6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
7: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

    0> 2016-03-09 00:08:09.196669 7f7fd358e880 -1 osd/PG.cc: In function
'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f7fd358e880 time 2016-03-09 00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
4: (OSD::init()+0xc84) [0x6ac204]
5: (main()+0x2839) [0x632459]
6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
7: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7f7fd358e880
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xb04503]
2: (()+0xf8d0) [0x7f7fd24268d0]
3: (gsignal()+0x37) [0x7f7fd08c7067]
4: (abort()+0x148) [0x7f7fd08c8448]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
6: (()+0x5ebb6) [0x7f7fd11b2bb6]
7: (()+0x5ec01) [0x7f7fd11b2c01]
8: (()+0x5ee19) [0x7f7fd11b2e19]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x247) [0xc03e17]
10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
2016-03-09 00:08:09.203630 7f7fd358e880 -1 *** Caught signal (Aborted) **
in thread 7f7fd358e880

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xb04503]
2: (()+0xf8d0) [0x7f7fd24268d0]
3: (gsignal()+0x37) [0x7f7fd08c7067]
4: (abort()+0x148) [0x7f7fd08c8448]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
6: (()+0x5ebb6) [0x7f7fd11b2bb6]
7: (()+0x5ec01) [0x7f7fd11b2c01]
8: (()+0x5ee19) [0x7f7fd11b2e19]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x247) [0xc03e17]
10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

    0> 2016-03-09 00:08:09.203630 7f7fd358e880 -1 *** Caught signal
(Aborted) **
in thread 7f7fd358e880

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xb04503]
2: (()+0xf8d0) [0x7f7fd24268d0]
3: (gsignal()+0x37) [0x7f7fd08c7067]
4: (abort()+0x148) [0x7f7fd08c8448]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
6: (()+0x5ebb6) [0x7f7fd11b2bb6]
7: (()+0x5ec01) [0x7f7fd11b2c01]
8: (()+0x5ee19) [0x7f7fd11b2e19]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x247) [0xc03e17]
10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
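
The output above repeats the same assertion and backtrace several times (once on stderr, once in the log, once in the recent-events dump). As a rough aid for reading it, here is a minimal Python sketch that collapses such a paste into the unique assertion and the unique backtraces. The function name `parse_crash` and the `SAMPLE` excerpt are illustrative assumptions, not part of Ceph; the regular expressions are written only against the line shapes visible in this message, including frames that the mail client wrapped onto a second line.

```python
import re

# Line shapes taken from the paste above; other Ceph versions may differ.
FRAME_START = re.compile(r"^\d+: ")
FRAME = re.compile(r"^(\d+): (.+) \[(0x[0-9a-f]+)\]$")
ASSERT = re.compile(r"^(\S+): (\d+): FAILED assert\((.+)\)$")

# Hypothetical excerpt: the first frames of the log above, pasted twice.
SAMPLE = """
osd/PG.cc: 2868: FAILED assert(r > 0)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
osd/PG.cc: 2868: FAILED assert(r > 0)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
"""


def parse_crash(text):
    """Return (assertions, traces) with duplicates removed.

    assertions: unique (file, line, expression) tuples in first-seen order.
    traces: unique backtraces, each a list of (symbol, address) tuples.
    """
    assertions, traces, pending = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if pending is not None:
            # Second half of a frame that was wrapped by the mail client.
            line, pending = pending + " " + line, None
        elif FRAME_START.match(line) and not line.endswith("]"):
            # First half of a wrapped frame: wait for the next line.
            pending = line
            continue
        match = ASSERT.match(line)
        if match:
            found = (match.group(1), int(match.group(2)), match.group(3))
            if found not in assertions:
                assertions.append(found)
            continue
        match = FRAME.match(line)
        if match:
            if match.group(1) == "1":
                # Frame number 1 starts a new backtrace.
                traces.append([])
            if traces:
                traces[-1].append((match.group(2), match.group(3)))
    unique = []
    for trace in traces:
        if trace not in unique:
            unique.append(trace)
    return assertions, unique
```

Calling `parse_crash(SAMPLE)` on the excerpt yields one assertion (`osd/PG.cc`, line 2868, `r > 0`) and one three-frame backtrace ending in `OSD::load_pgs()`. This only tidies the paste; it does not explain why `peek_map_epoch` failed.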


2016-03-02 9:38 GMT+01:00 Mario Giammarco <mgiamma...@gmail.com>:

> Here it is:
>
>  cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>      health HEALTH_WARN
>             4 pgs incomplete
>             4 pgs stuck inactive
>             4 pgs stuck unclean
>             1 requests are blocked > 32 sec
>      monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>             election epoch 840, quorum 0,1,2 0,1,2
>      osdmap e2405: 3 osds: 3 up, 3 in
>       pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>             1090 GB used, 4481 GB / 5571 GB avail
>                  284 active+clean
>                    4 incomplete
>   client io 4008 B/s rd, 446 kB/s wr, 23 op/s
>
>
> 2016-03-02 9:31 GMT+01:00 Shinobu Kinjo <ski...@redhat.com>:
>
>> Is "ceph -s" still showing you same output?
>>
>> >     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>> >      health HEALTH_WARN
>> >             4 pgs incomplete
>> >             4 pgs stuck inactive
>> >             4 pgs stuck unclean
>> >      monmap e8: 3 mons at
>> > {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>> >             election epoch 832, quorum 0,1,2 0,1,2
>> >      osdmap e2400: 3 osds: 3 up, 3 in
>> >       pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>> >             1090 GB used, 4481 GB / 5571 GB avail
>> >                  284 active+clean
>> >                    4 incomplete
>>
>> Cheers,
>> S
>>
>> ----- Original Message -----
>> From: "Mario Giammarco" <mgiamma...@gmail.com>
>> To: "Lionel Bouton" <lionel-subscript...@bouton.name>
>> Cc: "Shinobu Kinjo" <ski...@redhat.com>, ceph-users@lists.ceph.com
>> Sent: Wednesday, March 2, 2016 4:27:15 PM
>> Subject: Re: [ceph-users] Help: pool not responding
>>
>> I tried setting min_size=1, but unfortunately nothing changed.
>> Thanks for the idea.
>>
>> 2016-02-29 22:56 GMT+01:00 Lionel Bouton <lionel-subscript...@bouton.name>:
>>
>> > On 29/02/2016 22:50, Shinobu Kinjo wrote:
>> >
>> > the fact that they are optimized for benchmarks and certainly not
>> > Ceph OSD usage patterns (with or without internal journal).
>> >
>> > Are you assuming that SSHD is causing the issue?
>> > If you could elaborate on this more, it would be helpful.
>> >
>> >
>> > Probably not (unless they prove extremely unreliable under Ceph
>> > OSD usage patterns, which would surprise me).
>> >
>> > For incomplete PGs, the documentation seems good enough on what should
>> > be done:
>> > http://docs.ceph.com/docs/master/rados/operations/pg-states/
>> >
>> > The relevant text:
>> >
>> > *Incomplete* Ceph detects that a placement group is missing information
>> > about writes that may have occurred, or does not have any healthy copies.
>> > If you see this state, try to start any failed OSDs that may contain the
>> > needed information or temporarily adjust min_size to allow recovery.
>> >
>> > We don't have the full history, but the most probable cause of these
>> > incomplete PGs is that min_size is set to 2 or 3 and at some point the 4
>> > incomplete PGs had fewer replicas than the min_size value. So if
>> > setting min_size to 2 isn't enough, setting it to 1 should unfreeze them.
>> >
>> > Lionel
>> >
>>
>
>
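
For reference, the min_size suggestion quoted above could be scripted roughly as follows. This is only a sketch under stated assumptions: the pool name `rbd` is a placeholder (the thread never names the affected pool), the helper names are made up for illustration, and the parser assumes that `ceph osd pool get <pool> min_size` prints a line of the form `min_size: 2`. As noted earlier in the thread, lowering min_size to 1 did not unfreeze the PGs in this case.

```python
import subprocess


def min_size_get_cmd(pool):
    """argv for reading a pool's min_size."""
    return ["ceph", "osd", "pool", "get", pool, "min_size"]


def min_size_set_cmd(pool, value):
    """argv for setting a pool's min_size."""
    return ["ceph", "osd", "pool", "set", pool, "min_size", str(value)]


def parse_min_size(output):
    """Parse output assumed to look like 'min_size: 2'."""
    key, _, value = output.strip().partition(":")
    if key.strip() != "min_size":
        raise ValueError("unexpected output: %r" % output)
    return int(value)


def lower_min_size(pool, target=1, run=subprocess.check_output):
    """Lower min_size to target if it is higher; return the previous value.

    `run` is injectable so the logic can be exercised without a cluster.
    The caller should restore the returned value once recovery completes.
    """
    previous = parse_min_size(run(min_size_get_cmd(pool)).decode())
    if previous > target:
        run(min_size_set_cmd(pool, target))
    return previous
```

A call such as `previous = lower_min_size("rbd")` would lower the setting and return the old value, which can later be restored by running the command built by `min_size_set_cmd("rbd", previous)`.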

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
