RE: ceph-mon terminated with status 28
Brad --

The issue is in tracker now:

http://tracker.ceph.com/issues/14088

-- Tom

> -----Original Message-----
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, December 14, 2015 3:47 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: ceph-mon terminated with status 28
>
> > Thanks, Brad. That was the problem.
>
> Np.
>
> > Is there a reason why we don't log more descriptive info for this kind
> > of failure?
>
> I guess it may not have been anticipated that init would swallow these
> types of errors early in the process and just report the return code.
>
> If you wouldn't mind opening a tracker for "Fatal errors at start-up are
> not logged", or something similar, I can take a look at getting some
> meaningful log entries reported during these early failures.
>
> Let me know the tracker number.
>
> Cheers,
> Brad
RE: ceph-mon terminated with status 28
Thanks, Brad. That was the problem.

Is there a reason why we don't log more descriptive info for this kind of
failure?

-- Tom

> -----Original Message-----
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Sunday, December 13, 2015 4:19 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: ceph-mon terminated with status 28
>
> It looks like it's complaining about lack of space?
>
> #define ENOSPC 28 /* No space left on device */
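The "status 28" that init prints is simply the value ceph-mon passed to exit(). A minimal sketch of turning such a status back into readable text, assuming the daemon exits with an errno value as in the exit(ENOSPC) case discussed in this thread; describe_exit_status is a hypothetical helper and not part of Ceph:

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Ceph): describe the exit status that
// init reports, ASSUMING the process exited with an errno value, as in
// exit(ENOSPC) or exit(-err). Other exit codes would be mislabelled.
std::string describe_exit_status(int status) {
  if (status == 0) {
    return "exited normally";
  }
  // std::strerror returns the C library's message for an errno value.
  return "status " + std::to_string(status) + ": " + std::strerror(status);
}
```

On a typical Linux system, passing 28 to this helper gives the "No space left on device" text, which is the hint that was missing from the dmesg output.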
Re: ceph-mon terminated with status 28
----- Original Message -----
> From: "Tom Deneau" <tom.den...@amd.com>
> To: "Brad Hubbard" <bhubb...@redhat.com>
> Cc: ceph-devel@vger.kernel.org
> Sent: Tuesday, 15 December, 2015 3:21:27 AM
> Subject: RE: ceph-mon terminated with status 28
>
> Thanks, Brad. That was the problem.

Np.

>
> Is there a reason why we don't log more descriptive info for this kind of
> failure?

I guess it may not have been anticipated that init would swallow these types of
errors early in the process and just report the return code.

If you wouldn't mind opening a tracker for "Fatal errors at start-up are not
logged", or something similar, I can take a look at getting some meaningful log
entries reported during these early failures.

Let me know the tracker number.

Cheers,
Brad
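One way the "fatal errors at start-up are not logged" gap could be narrowed is to send the message to syslog as well as to the error stream, so it survives even when init discards stderr. This is only a sketch under that assumption and not the change made in Ceph; report_fatal_startup_error is a hypothetical name:

```cpp
#include <syslog.h>

#include <cerrno>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not Ceph code): report a fatal start-up error on the
// given stream and through syslog(3), then return the code the caller
// should pass to exit().
int report_fatal_startup_error(std::ostream& err, const std::string& what,
                               int errnum) {
  std::ostringstream msg;
  msg << "fatal start-up error: " << what << " (" << std::strerror(errnum)
      << ")";
  err << msg.str() << std::endl;
  // "%s" keeps the message from being interpreted as a format string.
  syslog(LOG_CRIT, "%s", msg.str().c_str());
  return errnum;
}
```

A caller would write something like exit(report_fatal_startup_error(std::cerr, "mon data filesystem is critically low on space", ENOSPC)), keeping the exit status unchanged while leaving a trace in the system log.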
Re: ceph-mon terminated with status 28
----- Original Message -----
> From: "Tom Deneau" <tom.den...@amd.com>
> To: ceph-devel@vger.kernel.org
> Sent: Sunday, 13 December, 2015 11:49:16 PM
> Subject: ceph-mon terminated with status 28
>
> I am trying to understand the following failure:
>
> A small cluster was running fine, and then was left unused for a while.
> When I went to try to use it again, the mon socket wasn't there and I
> could see that ceph-mon was not running. I saw the lines below at the
> end of dmesg output.
> When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> I got the same set of errors newly appended to dmesg output.
>
> I don't see anything more descriptive in /var/log/ceph/ceph-mon.log,
> just the recording of new mon processes starting.
>
> In this particular small cluster, the mon process was running on the
> same node with 7 osd processes. sudo initctl list shows that the osd
> procs are still up, although logging the fact that they can't
> communicate with the mon socket.
>
> Is there someplace else I should look for more details as to why mon
> is down and can't be restarted?
>
> -- Tom Deneau
>
> dmesg output:
> --
> init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
> init: ceph-mon (ceph/monhost) main process ended, respawning
> init: ceph-create-keys main process (16227) killed by TERM signal
> init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
> init: ceph-mon (ceph/monhost) main process ended, respawning
> init: ceph-create-keys main process (16548) killed by TERM signal
> init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
> init: ceph-mon (ceph/monhost) main process ended, respawning
> init: ceph-create-keys main process (16558) killed by TERM signal
> init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
> init: ceph-mon (ceph/monhost) respawning too fast, stopped
> init: ceph-create-keys main process (16568) killed by TERM signal

It looks like it's complaining about lack of space?

src/ceph_mon.cc:

    int main(int argc, const char **argv)
    {
    8<
      {
        // check fs stats. don't start if it's critically close to full.
        ceph_data_stats_t stats;
        int err = get_fs_stats(stats, g_conf->mon_data.c_str());
        if (err < 0) {
          cerr << "error checking monitor data's fs stats: " << cpp_strerror(err)
               << std::endl;
          exit(-err);
        }
        if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
          cerr << "error: monitor data filesystem reached concerning levels of"
               << " available storage space (available: "
               << stats.avail_percent << "% " << prettybyte_t(stats.byte_avail)
               << ")\nyou may adjust 'mon data avail crit' to a lower value"
               << " to make this go away (default: " << g_conf->mon_data_avail_crit
               << "%)\n" << std::endl;
          exit(ENOSPC);
        }

#define ENOSPC 28 /* No space left on device */

Try starting ceph-mon from the command line and see if you get the above
message.

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
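The check quoted from src/ceph_mon.cc compares the percentage of free space on the monitor's data filesystem against a critical threshold. A rough stand-alone sketch of that kind of check follows, assuming POSIX statvfs() as the source of the numbers. FsStats, read_fs_stats and startup_space_check are hypothetical names for illustration; this is not Ceph's get_fs_stats implementation.

```cpp
#include <sys/statvfs.h>

#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for the fields the quoted code reads.
struct FsStats {
  std::uint64_t byte_total;
  std::uint64_t byte_avail;
  int avail_percent;
};

// Fill `out` for the filesystem holding `path`.
// Returns 0 on success or a negative errno value, like the quoted code expects.
int read_fs_stats(const char* path, FsStats& out) {
  struct statvfs vfs;
  if (statvfs(path, &vfs) != 0) {
    return -errno;
  }
  out.byte_total = static_cast<std::uint64_t>(vfs.f_blocks) * vfs.f_frsize;
  // f_bavail counts blocks available to unprivileged users.
  out.byte_avail = static_cast<std::uint64_t>(vfs.f_bavail) * vfs.f_frsize;
  out.avail_percent =
      out.byte_total == 0
          ? 0
          : static_cast<int>(100.0 * out.byte_avail / out.byte_total);
  return 0;
}

// Returns the exit code a daemon would use: ENOSPC when free space is at or
// below the critical percentage (same comparison as the quoted check), else 0.
int startup_space_check(const FsStats& stats, int crit_percent) {
  return stats.avail_percent <= crit_percent ? ENOSPC : 0;
}
```

With a threshold of 5, for example, a filesystem reporting 4% available would make startup_space_check return ENOSPC, which is the status 28 seen in the dmesg output.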