Bug#840383: ceph: Mon crash on startup

2016-12-19 Thread Gaudenz Steinlin
Control: severity -1 normal
Control: tags -1 unreproducible

Hi

Thanks for your bug report, and sorry that it took so long to get back to
you. The ceph maintenance team is currently understaffed and struggling
to keep up with the work.

Hans Grobler writes:

> Package: ceph
> Version: 0.80.11-1.1
> Severity: grave
> Justification: renders package unusable
>
> Dear Maintainer,
>
> After an upgrade from 0.80.11-1, starting the Ceph monitor results in
> the crash shown below. The crash is consistently reproducible, so it is
> not possible to start the monitor with 0.80.11-1.1. The OSDs, however,
> start without problems with 0.80.11-1.1.
>
> Attempting to create a new monitor results in a similar crash on startup.
> Reverting to 0.80.11-1 allows the Ceph monitor to start as normal
> (with the pre-upgrade leveldb).

I tried to reproduce this in a clean environment but failed, so this
certainly does not affect all users. I also uploaded a new upstream
version (10.2.5) a few days ago, which is currently waiting in the NEW
queue. It would be nice if you could test this version once it is
available in unstable. If you are eager to test, I can also provide you
with the debs.

To reproduce this I would need at least the commands you used to get
the traces below, and perhaps also your /var/lib/ceph/mon/XXX directory
so that I have exactly the same monitor database. I suspect your
database may have become corrupted somehow.
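Before sending the monitor directory, a quick structural check of its LevelDB store can rule out the most obvious kind of damage. The following is a minimal Python sketch, assuming the store lives in a `store.db` subdirectory of the monitor directory and follows LevelDB's usual on-disk layout, where a `CURRENT` file names the active `MANIFEST-*` file. The function name `check_leveldb_dir` is hypothetical, and passing this check says nothing about checksums or table contents.

```python
import os


def check_leveldb_dir(store_dir):
    """Return a list of layout problems found in a LevelDB directory.

    Only the file layout is checked: CURRENT must exist and name a
    MANIFEST-* file, and that MANIFEST must exist and be non-empty.
    Checksums and table contents are NOT verified.
    """
    current_path = os.path.join(store_dir, "CURRENT")
    if not os.path.isfile(current_path):
        return ["missing CURRENT file"]
    # CURRENT normally holds a single line such as "MANIFEST-000005".
    with open(current_path, "r", encoding="ascii", errors="replace") as f:
        manifest_name = f.read().strip()
    if not manifest_name.startswith("MANIFEST-"):
        return ["CURRENT does not name a MANIFEST file: %r" % manifest_name]
    manifest_path = os.path.join(store_dir, manifest_name)
    if not os.path.isfile(manifest_path):
        return ["CURRENT points at missing %s" % manifest_name]
    if os.path.getsize(manifest_path) == 0:
        return ["%s is empty" % manifest_name]
    return []
```

For example, `check_leveldb_dir("/var/lib/ceph/mon/XXX/store.db")`, with XXX standing for the monitor directory name as above, returns an empty list when the layout looks plausible and a short list of problems otherwise.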

Thanks again for testing. I would really appreciate it if you could
test the new upstream package.

Gaudenz

2016-10-11 Thread Hans Grobler
Package: ceph
Version: 0.80.11-1.1
Severity: grave
Justification: renders package unusable

Dear Maintainer,

After an upgrade from 0.80.11-1, starting the Ceph monitor results in
the crash shown below. The crash is consistently reproducible, so it is
not possible to start the monitor with 0.80.11-1.1. The OSDs, however,
start without problems with 0.80.11-1.1.

Attempting to create a new monitor results in a similar crash on startup.
Reverting to 0.80.11-1 allows the Ceph monitor to start as normal
(with the pre-upgrade leveldb).


2016-10-11 06:35:54.781406 7f0d7abe57c0 -1 *** Caught signal (Aborted) **
in thread 7f0d7abe57c0

ceph version 0.80.11 (8424145d49264624a3b0a204aedb127835161070)
1: (()+0x4c7812) [0x55a47ae33812]
2: (()+0x11100) [0x7f0d7a300100]
3: (gsignal()+0xcf) [0x7f0d78929fdf]
4: (abort()+0x16a) [0x7f0d7892b40a]
5: (()+0x23275) [0x7f0d7a78c275]
6: (()+0x170af) [0x7f0d7a7800af]
7: (operator delete[](void*)+0x25d) [0x7f0d7a7a361d]
8: (LevelDBStore::do_open(std::ostream&, bool)+0x69c) [0x55a47addf0ac]
9: (main()+0xbc0) [0x55a47aabc120]
10: (__libc_start_main()+0xf1) [0x7f0d789172b1]
11: (_start()+0x2a) [0x55a47aacb98a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
  -12> 2016-10-11 06:35:54.776111 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command perfcounters_dump hook 0x55a47ca96020
  -11> 2016-10-11 06:35:54.776183 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command 1 hook 0x55a47ca96020
  -10> 2016-10-11 06:35:54.776212 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command perf dump hook 0x55a47ca96020
   -9> 2016-10-11 06:35:54.776218 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command perfcounters_schema hook 0x55a47ca96020
   -8> 2016-10-11 06:35:54.776224 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command 2 hook 0x55a47ca96020
   -7> 2016-10-11 06:35:54.776228 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command perf schema hook 0x55a47ca96020
   -6> 2016-10-11 06:35:54.776233 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command config show hook 0x55a47ca96020
   -5> 2016-10-11 06:35:54.776238 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command config set hook 0x55a47ca96020
   -4> 2016-10-11 06:35:54.776243 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command config get hook 0x55a47ca96020
   -3> 2016-10-11 06:35:54.776248 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command log flush hook 0x55a47ca96020
   -2> 2016-10-11 06:35:54.776261 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command log dump hook 0x55a47ca96020
   -1> 2016-10-11 06:35:54.776269 7f0d7abe57c0  5 asok(0x55a47cac2160) register_command log reopen hook 0x55a47ca96020
    0> 2016-10-11 06:35:54.781406 7f0d7abe57c0 -1 *** Caught signal (Aborted) **
in thread 7f0d7abe57c0

ceph version 0.80.11 (8424145d49264624a3b0a204aedb127835161070)
1: (()+0x4c7812) [0x55a47ae33812]
2: (()+0x11100) [0x7f0d7a300100]
3: (gsignal()+0xcf) [0x7f0d78929fdf]
4: (abort()+0x16a) [0x7f0d7892b40a]
5: (()+0x23275) [0x7f0d7a78c275]
6: (()+0x170af) [0x7f0d7a7800af]
7: (operator delete[](void*)+0x25d) [0x7f0d7a7a361d]
8: (LevelDBStore::do_open(std::ostream&, bool)+0x69c) [0x55a47addf0ac]
9: (main()+0xbc0) [0x55a47aabc120]
10: (__libc_start_main()+0xf1) [0x7f0d789172b1]
11: (_start()+0x2a) [0x55a47aacb98a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
  0/ 5 none
  0/ 1 lockdep
  0/ 1 context
  1/ 1 crush
  1/ 5 mds
  1/ 5 mds_balancer
  1/ 5 mds_locker
  1/ 5 mds_log
  1/ 5 mds_log_expire
  1/ 5 mds_migrator
  0/ 1 buffer
  0/ 1 timer
  0/ 1 filer
  0/ 1 striper
  0/ 1 objecter
  0/ 5 rados
  0/ 5 rbd
  0/ 5 journaler
  0/ 5 objectcacher
  0/ 5 client
  0/ 5 osd
  0/ 5 optracker
  0/ 5 objclass
  1/ 3 filestore
  1/ 3 keyvaluestore
  1/ 3 journal
  0/ 5 ms
  1/ 5 mon
  0/10 monc
  1/ 5 paxos
  0/ 5 tp
  1/ 5 auth
  1/ 5 crypto
  1/ 1 finisher
  1/ 5 heartbeatmap
  1/ 5 perfcounter
  1/ 5 rgw
  1/10 civetweb
  1/ 5 javaclient
  1/ 5 asok
  1/ 1 throttle
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
 max_recent 1
 max_new 1000
--- end dump of recent events ---
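The NOTE in the log says the raw addresses need the executable to be interpreted. As a minimal sketch of the first step, the Python below parses the backtrace frames and, assuming glibc's convention that a symbol-less frame `(()+0xOFF) [0xADDR]` reports OFF relative to the start of the mapped object, infers each object's load base as ADDR - OFF. With the frames from this report, frame 1 places the `ceph-mon` image at 0x55a47a96c000, and frames 5 and 6 share a single base, so they come from the same shared object. Subtracting the image base from frame 8's address gives the offset of the crash site inside the binary. The helper names are hypothetical, and treating frame 1 as part of the same image as frame 8 is an assumption based only on the neighbouring addresses.

```python
import re

# Backtrace frames copied verbatim from the report above.
BACKTRACE = """\
1: (()+0x4c7812) [0x55a47ae33812]
2: (()+0x11100) [0x7f0d7a300100]
3: (gsignal()+0xcf) [0x7f0d78929fdf]
4: (abort()+0x16a) [0x7f0d7892b40a]
5: (()+0x23275) [0x7f0d7a78c275]
6: (()+0x170af) [0x7f0d7a7800af]
7: (operator delete[](void*)+0x25d) [0x7f0d7a7a361d]
8: (LevelDBStore::do_open(std::ostream&, bool)+0x69c) [0x55a47addf0ac]
9: (main()+0xbc0) [0x55a47aabc120]
10: (__libc_start_main()+0xf1) [0x7f0d789172b1]
11: (_start()+0x2a) [0x55a47aacb98a]
"""

# "N: (SYMBOL+0xOFF) [0xADDR]"; SYMBOL is "()" when no name was resolved.
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]\s*$")


def parse_frames(text):
    """Return (index, symbol, offset, address) tuples for each frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            idx, sym, off, addr = m.groups()
            frames.append((int(idx), sym, int(off, 16), int(addr, 16)))
    return frames


def module_bases(frames):
    """Map each inferred load base to the symbol-less frames that imply it.

    Assumption: for "(()+0xOFF) [0xADDR]" the offset is relative to the
    start of the mapped object, so ADDR - OFF is that object's base.
    """
    bases = {}
    for idx, sym, off, addr in frames:
        if sym == "()":
            bases.setdefault(addr - off, []).append(idx)
    return bases


frames = parse_frames(BACKTRACE)
for base, idxs in sorted(module_bases(frames).items()):
    print(hex(base), idxs)
# 0x55a47a96c000 [1]
# 0x7f0d7a2ef000 [2]
# 0x7f0d7a769000 [5, 6]

# Assumes frame 1 lies in the same ceph-mon image as frame 8.
binary_base = frames[0][3] - frames[0][2]
crash_site = frames[7][3]  # frame 8, LevelDBStore::do_open
print(hex(crash_site - binary_base))  # 0x4730ac
```

Assuming `ceph-mon` is installed as `/usr/bin/ceph-mon`, is position-independent with its first segment linked at address 0 (which the page-aligned base suggests but does not prove), and matching debug symbols are available, an offset like this is what one would pass to `addr2line -C -f -e /usr/bin/ceph-mon 0x4730ac`, or look up in the `objdump -rdS` output, to find the source line.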


-- System Information:
Debian Release: stretch/sid
 APT prefers testing
 APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.7.0-1-amd64 (SMP w/24 CPU cores)
Locale: LANG=en_ZA.UTF-8, LC_CTYPE=en_ZA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)