We use Icinga, but it simply makes sure the service is alive.

Gilad Berman
HPC Architect
Lenovo EMEA

[Phone]+972-52-2554262
[Email]gber...@lenovo.com<mailto:gber...@lenovo.com>



Lenovo.com <http://www.lenovo.com/>
Twitter<http://twitter.com/lenovo> | Facebook<http://www.facebook.com/lenovo> | 
Instagram<https://instagram.com/lenovo> | Blogs<http://blog.lenovo.com/> | 
Forums<http://forums.lenovo.com/>


From: Xiao Peng Wang [mailto:w...@cn.ibm.com]
Sent: Wednesday, June 7, 2017 5:45 PM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: [xcat-user] Re: xcatd service failed (while actually still up)

Someone from the xCAT team will try to reproduce this. I am curious how you 
monitor the xCAT service.

________________________________
On June 7, 2017, at 10:29:20 PM, gber...@lenovo.com<mailto:gber...@lenovo.com> wrote:

From: gber...@lenovo.com<mailto:gber...@lenovo.com>
To: xcat-user@lists.sourceforge.net<mailto:xcat-user@lists.sourceforge.net>
Cc:
Date: June 7, 2017, 10:29:20 PM
Subject: Re: [xcat-user] xcatd service failed (while actually still up)
Probably yes. But the service has been OK since then (we are monitoring the 
xCAT service status).

From: Xiao Peng Wang [mailto:w...@cn.ibm.com]
Sent: Wednesday, June 7, 2017 3:53 PM
To: xcat-user@lists.sourceforge.net<mailto:xcat-user@lists.sourceforge.net>
Cc: xcat-user@lists.sourceforge.net<mailto:xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] xcatd service failed (while actually still up)


It looks like the start of xcatd failed to finish:
  Process: 7630 ExecStart=/etc/init.d/xcatd start (code=killed, signal=TERM)

Did you restart xcatd 16h ago?
  Active: failed (Result: timeout) since Tue 2017-06-06 19:12:36 CEST; 16h ago

Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)

Manager for HPC SW Dev: xCAT, ESSL, SMI, Test
IBM China Systems Laboratory (CSL)

Tel: 86-10-82453455<tel:86-10-82453455>
Email: w...@cn.ibm.com<mailto:w...@cn.ibm.com>


----- Original message -----
From: Gilad Berman <gber...@lenovo.com<mailto:gber...@lenovo.com>>
To: xCAT Users Mailing list 
<xcat-user@lists.sourceforge.net<mailto:xcat-user@lists.sourceforge.net>>
Cc:
Subject: [xcat-user] xcatd service failed (while actually still up)
Date: Wed, Jun 7, 2017 5:35 PM


Hello,



We are seeing some strange behavior with the xCAT service.

systemctl status xcatd returns the following output:



s08:~ # systemctl status xcatd
● xcatd.service - LSB: xcatd
   Loaded: loaded (/etc/init.d/xcatd; bad; vendor preset: disabled)
   Active: failed (Result: timeout) since Tue 2017-06-06 19:12:36 CEST; 16h ago
     Docs: man:systemd-sysv-generator(8)
  Process: 30751 ExecStop=/etc/init.d/xcatd stop (code=exited, status=0/SUCCESS)
  Process: 7630 ExecStart=/etc/init.d/xcatd start (code=killed, signal=TERM)
    Tasks: 8 (limit: 512)
   CGroup: /system.slice/xcatd.service
           ├─23978 /usr/sbin/in.tftpd -v -l -s /tftpboot -m /etc/tftpmapfile4xcat.conf
           ├─28343 xcatd: SSL listener
           ├─28344 xcatd: DB Access
           ├─28345 xcatd: UDP listener
           ├─28346 xcatd: install monitor
           ├─28347 xcatd: Discovery worker
           ├─28348 xcatd: Command log writer
           └─28727 xcatd: DB Access





Meanwhile, xcatd is still running and operational (tabdump and every other 
function seem to work).
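
The mismatch described above (systemd reporting "failed" while xcatd worker processes are still listed in the service's CGroup) can be detected from the status text itself. The following is only a rough sketch, not part of xCAT or Icinga: the function name summarize_status and the SAMPLE text are made up for illustration, and the parsing assumes the output layout shown in this message.

```python
import re

def summarize_status(text):
    """Pull the Active state, Result, and xcatd PIDs out of `systemctl status` text."""
    # Matches e.g. "Active: failed (Result: timeout) since ..."
    active = re.search(
        r"^[ \t]*Active:[ \t]+(\S+)(?:[ \t]+\(Result:[ \t]+([^)]+)\))?",
        text,
        re.MULTILINE,
    )
    state = active.group(1) if active else None
    result = active.group(2) if active else None
    # CGroup tree lines look like "├─28343 xcatd: SSL listener"
    procs = re.findall(r"[├└]─(\d+)[ \t]+(xcatd: .+?)[ \t]*$", text, re.MULTILINE)
    return {
        "state": state,
        "result": result,
        "xcatd_pids": [int(pid) for pid, _title in procs],
        # True when systemd says "failed" but xcatd processes are still listed
        "mismatch": state == "failed" and len(procs) > 0,
    }

# Shortened copy of the output quoted in this message (illustrative only)
SAMPLE = """\
● xcatd.service - LSB: xcatd
   Loaded: loaded (/etc/init.d/xcatd; bad; vendor preset: disabled)
   Active: failed (Result: timeout) since Tue 2017-06-06 19:12:36 CEST; 16h ago
   CGroup: /system.slice/xcatd.service
           ├─28343 xcatd: SSL listener
           ├─28344 xcatd: DB Access
           └─28727 xcatd: DB Access
"""

if __name__ == "__main__":
    print(summarize_status(SAMPLE))
```

A monitoring check could feed the real command output into such a function and raise a warning when "mismatch" is true, instead of only testing whether the service is alive.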



Doing a systemctl restart xcatd immediately works and seems to fix the issue.



Any idea what could make the service think it failed? Is there anything we 
should look at?



THX!!

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net<mailto:xCAT-user@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/xcat-user
