Re: OSD dies after seconds
I upgraded to ceph 0.56-3 but the problem persists... the OSD starts, but after a second it finishes:

2013-02-14 12:18:34.504391 7fae613ea760 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-0/journal'
2013-02-14 12:18:34.504400 7fae613ea760 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 17: 1048576 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-02-14 12:18:34.504458 7fae613ea760 10 journal journal_start
2013-02-14 12:18:34.504506 7fae5d3c6700 10 journal write_thread_entry start
2013-02-14 12:18:34.504515 7fae5d3c6700 20 journal write_thread_entry going to sleep
2013-02-14 12:18:34.504706 7fae5cbc5700 10 journal write_finish_thread_entry enter
2013-02-14 12:18:34.504716 7fae5cbc5700 20 journal write_finish_thread_entry sleeping
2013-02-14 12:18:34.504893 7fae567fc700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry start
2013-02-14 12:18:34.504903 7fae567fc700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry sleeping
2013-02-14 12:18:34.505013 7fae613ea760 5 filestore(/var/lib/ceph/osd/ceph-0) umount /var/lib/ceph/osd/ceph-0
2013-02-14 12:18:34.505036 7fae567fc700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry awoke
2013-02-14 12:18:34.505044 7fae567fc700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry finish
2013-02-14 12:18:34.505113 7fae5dbc7700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry force_sync set
2013-02-14 12:18:34.505129 7fae5dbc7700 10 journal commit_start max_applied_seq 2, open_ops 0
2013-02-14 12:18:34.505136 7fae5dbc7700 10 journal commit_start blocked, all open_ops have completed
2013-02-14 12:18:34.505138 7fae5dbc7700 10 journal commit_start nothing to do
2013-02-14 12:18:34.505141 7fae5dbc7700 10 journal commit_start
2013-02-14 12:18:34.505506 7fae613ea760 10 journal journal_stop
2013-02-14 12:18:34.505698 7fae613ea760 1 journal close /var/lib/ceph/osd/ceph-0/journal
2013-02-14 12:18:34.505787 7fae5d3c6700 20 journal write_thread_entry woke up
2013-02-14 12:18:34.505796 7fae5d3c6700 10 journal write_thread_entry finish
2013-02-14 12:18:34.505845 7fae5cbc5700 10 journal write_finish_thread_entry exit

On Wed, Feb 13, 2013 at 6:28 PM, Jesus Cuenca jcue...@cnb.csic.es wrote:
> Thanks for the fast answer. No, it does not segfault:
>
> gdb --args /usr/local/bin/ceph-osd -i 0
> ...
> (gdb) run
> Starting program: /usr/local/bin/ceph-osd -i 0
> [Thread debugging using libthread_db enabled]
> [New Thread 0x75fce700 (LWP 8920)]
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
> [Thread 0x75fce700 (LWP 8920) exited]
> Program exited normally.
>
> On Wed, Feb 13, 2013 at 6:21 PM, Sage Weil s...@inktank.com wrote:
>> On Wed, 13 Feb 2013, Jesus Cuenca wrote:
>>> Hi, I'm setting up a small ceph 0.56.2 cluster on 3 64-bit Debian 6
>>> servers with kernel 3.7.2.
>>
>> This might be http://tracker.ceph.com/issues/3595, which is a problem with
>> google perftools (which we use by default) and the version in squeeze,
>> which is buggy. This doesn't seem to affect all squeeze users. Does it
>> segfault?
>>
>> sage
>>
>>> My problem is that the OSDs die. First I try to start them with the
>>> init script:
>>>
>>> /etc/init.d/ceph start osd.0
>>> ...
>>> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
>>>
>>> ps -ef | grep ceph
>>> (No ceph-osd process)
>>>
>>> I then run with debugging:
>>>
>>> ceph-osd -i 0 --debug_ms 20 --debug_osd 20 --debug_filestore 20 --debug_journal 20 -d
>>> [...]
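Issue 3595 points at the perftools allocator (tcmalloc) shipped in squeeze. Not part of the thread, but a quick way to confirm whether a locally built ceph-osd is linked against it is to inspect the binary's shared-library dependencies; the path below is the one from the gdb session above:

```shell
# Hypothetical check, not from the thread: list ceph-osd's shared-library
# dependencies and keep any tcmalloc entries (the allocator implicated in
# http://tracker.ceph.com/issues/3595).
BIN=/usr/local/bin/ceph-osd
if [ -x "$BIN" ]; then
    ldd "$BIN" | grep -i tcmalloc || echo "not linked against tcmalloc"
else
    echo "ceph-osd not found at $BIN"
fi
```

If the binary does pull in the squeeze libtcmalloc, rebuilding without perftools support, or upgrading the library, would be the natural next experiment.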
OSD dies after seconds
Hi, I'm setting up a small ceph 0.56.2 cluster on 3 64-bit Debian 6 servers with kernel 3.7.2. My problem is that the OSDs die. First I try to start them with the init script:

/etc/init.d/ceph start osd.0
...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal

ps -ef | grep ceph
(No ceph-osd process)

I then run with debugging:

ceph-osd -i 0 --debug_ms 20 --debug_osd 20 --debug_filestore 20 --debug_journal 20 -d
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.351830 7fe98cd8a760 10 -- :/0 rank.bind :/0
2013-02-13 18:04:40.351895 7fe98cd8a760 10 accepter.accepter.bind
2013-02-13 18:04:40.351910 7fe98cd8a760 10 accepter.accepter.bind bound on random port 0.0.0.0:6800/0
2013-02-13 18:04:40.351919 7fe98cd8a760 10 accepter.accepter.bind bound to 0.0.0.0:6800/0
2013-02-13 18:04:40.351930 7fe98cd8a760 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6800/8438 need_addr=1
2013-02-13 18:04:40.351935 7fe98cd8a760 10 -- :/0 rank.bind :/0
2013-02-13 18:04:40.351938 7fe98cd8a760 10 accepter.accepter.bind
2013-02-13 18:04:40.351943 7fe98cd8a760 10 accepter.accepter.bind bound on random port 0.0.0.0:6801/0
2013-02-13 18:04:40.351946 7fe98cd8a760 10 accepter.accepter.bind bound to 0.0.0.0:6801/0
2013-02-13 18:04:40.351952 7fe98cd8a760 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6801/8438 need_addr=1
2013-02-13 18:04:40.351959 7fe98cd8a760 10 -- :/0 rank.bind :/0
2013-02-13 18:04:40.351961 7fe98cd8a760 10 accepter.accepter.bind
2013-02-13 18:04:40.351966 7fe98cd8a760 10 accepter.accepter.bind bound on random port 0.0.0.0:6802/0
2013-02-13 18:04:40.351969 7fe98cd8a760 10 accepter.accepter.bind bound to 0.0.0.0:6802/0
2013-02-13 18:04:40.351975 7fe98cd8a760 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6802/8438 need_addr=1
2013-02-13 18:04:40.352636 7fe98cd8a760 5 filestore(/var/lib/ceph/osd/ceph-0) basedir /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.352664 7fe98cd8a760 10 filestore(/var/lib/ceph/osd/ceph-0) mount fsid is 0ab92be4-3b42-47bc-bd88-b0e11da5b450
2013-02-13 18:04:40.426222 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is supported and appears to work
2013-02-13 18:04:40.426234 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-02-13 18:04:40.426567 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount did NOT detect btrfs
2013-02-13 18:04:40.426575 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount syncfs(2) syscall not supported
2013-02-13 18:04:40.426630 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount no syncfs(2), must use sync(2).
2013-02-13 18:04:40.426631 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-02-13 18:04:40.426701 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount found snaps
2013-02-13 18:04:40.426719 7fe98cd8a760 5 filestore(/var/lib/ceph/osd/ceph-0) mount op_seq is 2
2013-02-13 18:04:40.515151 7fe98cd8a760 20 filestore (init)dbobjectmap: seq is 1
2013-02-13 18:04:40.515217 7fe98cd8a760 10 filestore(/var/lib/ceph/osd/ceph-0) open_journal at /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.515243 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-02-13 18:04:40.515252 7fe98cd8a760 10 filestore(/var/lib/ceph/osd/ceph-0) list_collections
2013-02-13 18:04:40.515352 7fe98cd8a760 10 journal journal_replay fs op_seq 2
2013-02-13 18:04:40.515359 7fe98cd8a760 2 journal open /var/lib/ceph/osd/ceph-0/journal fsid 0ab92be4-3b42-47bc-bd88-b0e11da5b450 fs_op_seq 2
2013-02-13 18:04:40.515373 7fe98cd8a760 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-0/journal'
2013-02-13 18:04:40.515385 7fe98cd8a760 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 17: 1048576 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-02-13 18:04:40.515393 7fe98cd8a760 10 journal read_header
2013-02-13 18:04:40.515409 7fe98cd8a760 10 journal header: block_size 4096 alignment 4096 max_size 1048576
2013-02-13 18:04:40.515411 7fe98cd8a760 10 journal header: start 4096
2013-02-13 18:04:40.515412 7fe98cd8a760 10 journal write_pos 4096
2013-02-13 18:04:40.515415 7fe98cd8a760 10 journal open header.fsid = 0ab92be4-3b42-47bc-bd88-b0e11da5b450
2013-02-13 18:04:40.515434 7fe98cd8a760 2 journal read_entry 4096 : seq 2 424 bytes
2013-02-13 18:04:40.515439 7fe98cd8a760 2 journal read_entry 8192 : bad header magic, end of journal
2013-02-13 18:04:40.515443 7fe98cd8a760 10 journal open reached end of journal.
2013-02-13 18:04:40.515446 7fe98cd8a760 2 journal read_entry 8192 : bad header magic, end of journal
2013-02-13 18:04:40.515447 7fe98cd8a760 3 journal journal_replay: end of journal, done.
2013-02-13 18:04:40.515444 7fe989567700 20
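Each debug line in the dumps above has the same shape: timestamp, thread id, debug level, then the subsystem and message. When sifting output from `--debug_journal 20` and friends, filtering on the level field (column 4) quickly separates the important low-level entries from the chatter. A small awk sketch, using sample lines taken from the log above:

```shell
# Filter a ceph debug dump down to entries at level 5 or below.
# Field 4 of each line is the debug level controlled by the --debug_* options.
awk '$4 ~ /^[0-9]+$/ && $4 + 0 <= 5' <<'EOF'
2013-02-13 18:04:40.515359 7fe98cd8a760 2 journal open /var/lib/ceph/osd/ceph-0/journal fs_op_seq 2
2013-02-13 18:04:40.515373 7fe98cd8a760 10 journal _open journal is not a block device
2013-02-13 18:04:40.515447 7fe98cd8a760 3 journal journal_replay: end of journal, done.
EOF
# keeps the level-2 and level-3 lines, drops the level-10 one
```

The numeric guard on `$4` skips any wrapped or truncated lines whose fourth field is not a level, so the filter is safe to run over a raw paste like the ones in this thread.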