RE: ceph-mon terminated with status 28

2015-12-15 Thread Deneau, Tom
Brad --

The issue is in the tracker now:
http://tracker.ceph.com/issues/14088

-- Tom

> -----Original Message-----
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, December 14, 2015 3:47 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: ceph-mon terminated with status 28
> 
> ----- Original Message -----
> > From: "Tom Deneau" <tom.den...@amd.com>
> > To: "Brad Hubbard" <bhubb...@redhat.com>
> > Cc: ceph-devel@vger.kernel.org
> > Sent: Tuesday, 15 December, 2015 3:21:27 AM
> > Subject: RE: ceph-mon terminated with status 28
> >
> > Thanks, Brad.  That was the problem.
> 
> Np.
> 
> >
> > Is there a reason why we don't log more descriptive info for this kind of
> > failure?
> 
> I guess it may not have been anticipated that init would swallow these types
> of errors early in the process and just report the return code.
> 
> If you wouldn't mind opening a tracker for "Fatal errors at start-up are not
> logged", or something similar, I can take a look at getting some meaningful
> log entries reported during these early failures.
> 
> Let me know the tracker number.
> 
> Cheers,
> Brad

RE: ceph-mon terminated with status 28

2015-12-14 Thread Deneau, Tom
Thanks, Brad.  That was the problem.

Is there a reason why we don't log more descriptive info for this kind of 
failure?

-- Tom

Re: ceph-mon terminated with status 28

2015-12-14 Thread Brad Hubbard
----- Original Message -----
> From: "Tom Deneau" <tom.den...@amd.com>
> To: "Brad Hubbard" <bhubb...@redhat.com>
> Cc: ceph-devel@vger.kernel.org
> Sent: Tuesday, 15 December, 2015 3:21:27 AM
> Subject: RE: ceph-mon terminated with status 28
> 
> Thanks, Brad.  That was the problem.

Np.

> 
> Is there a reason why we don't log more descriptive info for this kind of
> failure?

I guess it may not have been anticipated that init would swallow these types of
errors early in the process and just report the return code.
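To make the point about init concrete: a supervisor that forks a daemon and waits for it gets back only the exit status, not anything the daemon wrote to stderr. The C++ sketch below (POSIX only; `run_and_get_exit_status`, `failing_daemon` and `explain` are hypothetical names, not Ceph or upstart code) forks a child that prints a diagnostic and exits with `ENOSPC`, then shows the bare number the parent observes and turns it back into text with `strerror`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child running `child_main`, wait for it, and
// return only what a supervisor such as init gets back: the exit status.
// Returns -1 if fork/wait fails or the child did not exit normally.
int run_and_get_exit_status(int (*child_main)()) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        _exit(child_main());  // child: exit with the function's return code
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) return -1;
    // This number is what appears in dmesg as "terminated with status N".
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}

// Hypothetical stand-in for a daemon failing a start-up check the way the
// quoted ceph-mon code does: print a diagnostic, then exit with an errno.
int failing_daemon() {
    // The supervisor never sees this text, only the exit code below.
    std::fputs("error: start-up check failed (illustrative message)\n", stderr);
    return ENOSPC;
}

// Turn the bare status back into the errno description.
std::string explain(int status) { return std::strerror(status); }
```

On Linux the parent sees 28, and `explain(28)` yields "No space left on device", the same translation that connects the dmesg lines to the free-space check.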

If you wouldn't mind opening a tracker for "Fatal errors at start-up are not
logged", or something similar, I can take a look at getting some meaningful log
entries reported during these early failures.
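One possible shape for such a change, sketched under stated assumptions: the helper name and the idea of passing the log as a separate stream are hypothetical and are not Ceph's logging API. The message still goes to the console as today, but is also written (and flushed) to a persistent log, and the exit code is returned so the caller can still pass it to `exit()`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not Ceph's API): report a fatal start-up error to
// the console as before, and also to a persistent log stream, so the reason
// survives even when init keeps only the exit code. Returns `exit_code` so
// the caller can write: exit(report_fatal_startup_error(...));
int report_fatal_startup_error(std::ostream& console, std::ostream& log_out,
                               const std::string& msg, int exit_code) {
    console << msg << '\n';
    log_out << "fatal start-up error (exit " << exit_code << "): " << msg
            << std::endl;  // flush: the process is about to exit
    return exit_code;
}
```

In a real daemon `console` would be `std::cerr` and `log_out` a file stream or the daemon's own log; which log is actually usable that early in start-up is an open question for the tracker.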

Let me know the tracker number.

Cheers,
Brad

Re: ceph-mon terminated with status 28

2015-12-13 Thread Brad Hubbard
----- Original Message -----
> From: "Tom Deneau" 
> To: ceph-devel@vger.kernel.org
> Sent: Sunday, 13 December, 2015 11:49:16 PM
> Subject: ceph-mon terminated with status 28
> 
> I am trying to understand the following failure:
> 
> A small cluster was running fine, and then was left unused for a while.
> When I went to try to use it again, the mon socket wasn't there and I could
> see that ceph-mon was not running.  I saw the lines below at the end of dmesg
> output.
> When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> I got the same set of errors newly appended to dmesg output.
> 
> I don't see anything more descriptive in /var/log/ceph/ceph-mon.log, just
> the recording of new mon processes starting.
> 
> In this particular small cluster, the mon process was running on the same
> node with 7 osd processes.  sudo initctl list shows that the osd procs are
> still up, although logging the fact that they can't communicate with the mon
> socket.
> 
> Is there someplace else I should look for more details as to why mon is down
> and can't be restarted?
> 
> -- Tom Deneau
> 
> dmesg output:
> --
>  init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
>  init: ceph-mon (ceph/monhost) main process ended, respawning
>  init: ceph-create-keys main process (16227) killed by TERM signal
>  init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
>  init: ceph-mon (ceph/monhost) main process ended, respawning
>  init: ceph-create-keys main process (16548) killed by TERM signal
>  init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
>  init: ceph-mon (ceph/monhost) main process ended, respawning
>  init: ceph-create-keys main process (16558) killed by TERM signal
>  init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
>  init: ceph-mon (ceph/monhost) respawning too fast, stopped
>  init: ceph-create-keys main process (16568) killed by TERM signal

It looks like it's complaining about lack of space?

src/ceph_mon.cc:

int main(int argc, const char **argv)
{
8<
  {
    // check fs stats. don't start if it's critically close to full.
    ceph_data_stats_t stats;
    int err = get_fs_stats(stats, g_conf->mon_data.c_str());
    if (err < 0) {
      cerr << "error checking monitor data's fs stats: " << cpp_strerror(err)
           << std::endl;
      exit(-err);
    }
    if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
      cerr << "error: monitor data filesystem reached concerning levels of"
           << " available storage space (available: "
           << stats.avail_percent << "% " << prettybyte_t(stats.byte_avail)
           << ")\nyou may adjust 'mon data avail crit' to a lower value"
           << " to make this go away (default: " << g_conf->mon_data_avail_crit
           << "%)\n" << std::endl;
      exit(ENOSPC);
    }

#define ENOSPC  28  /* No space left on device */

Try starting ceph-mon from the command line and see if you get the above 
message.
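For reproducing the decision outside Ceph, here is a minimal POSIX sketch of the same logic as the quoted check. It assumes the available percentage is computed from `statvfs()` as available blocks over total blocks; `DataStats`, `get_data_stats` and `check_data_fs` are hypothetical stand-ins for `ceph_data_stats_t`, `get_fs_stats` and the inline check, and the real Ceph code may compute these figures differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <sys/statvfs.h>

// Hypothetical stand-in for ceph_data_stats_t.
struct DataStats {
    std::uint64_t byte_total = 0;
    std::uint64_t byte_avail = 0;
    int avail_percent = 0;
};

// Hypothetical stand-in for get_fs_stats(): fill `stats` for the filesystem
// holding `path`. Returns 0 on success or -errno, matching the `err < 0`
// test in the quoted code.
int get_data_stats(const std::string& path, DataStats& stats) {
    struct statvfs st;
    if (statvfs(path.c_str(), &st) != 0) return -errno;
    stats.byte_total = static_cast<std::uint64_t>(st.f_blocks) * st.f_frsize;
    // f_bavail counts blocks available to unprivileged users.
    stats.byte_avail = static_cast<std::uint64_t>(st.f_bavail) * st.f_frsize;
    stats.avail_percent = stats.byte_total
        ? static_cast<int>(stats.byte_avail * 100 / stats.byte_total)
        : 0;
    return 0;
}

// Same decision as the quoted check: returns the code a daemon would pass
// to exit(), i.e. 0 to keep starting, -err if the stats call failed, or
// ENOSPC when the available share is at or below `crit_percent`.
int check_data_fs(const std::string& path, int crit_percent) {
    DataStats stats;
    int err = get_data_stats(path, stats);
    if (err < 0) {
        std::cerr << "error checking data fs stats: " << std::strerror(-err)
                  << '\n';
        return -err;
    }
    if (stats.avail_percent <= crit_percent) {
        std::cerr << "error: only " << stats.avail_percent << "% ("
                  << stats.byte_avail << " bytes) available on " << path
                  << '\n';
        return ENOSPC;
    }
    return 0;
}
```

Calling `check_data_fs(dir, 5)` on the filesystem holding the monitor's data directory returns `ENOSPC` once 5% or less of it is available; the `5` is only an illustrative threshold standing in for `mon_data_avail_crit`.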

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html