Re: [Gluster-devel] Gluster health/status

2010-02-24 Thread Alexey Filin
2010/2/23 Harald Stürzebecher hara...@cs.tu-berlin.de

 2010/2/22 Samuel Hassine samuel.hass...@gmail.com:
  I'm also looking for a way to monitor gluster nodes.
 
  Any solutions?
 
  On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
  Hello!
 
 
 
  I'm looking for a way to determine the health of a GlusterFS
  cluster. Is there any way to tell whether any of the nodes has failed?
  It is possible to grep the log files for "remotexx: disconnected",
  but that is not suitable for monitoring. There should be a simple way
  to query the cluster against the .vol file, see whether any node/brick
  failed to attach, and trigger an alarm. Is there anything like
  gluster --reporthealth?

 Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
 is possible might be an indicator for working/failing - at least for
 setups that use TCP. I don't know if anything like that is possible
 for Infiniband-only setups.

IPoIB (IP over Infiniband)?


 IIRC, Nagios can check if a port is open on a remote machine. That
 won't find something like disk/filesystem problems on the server, but
 it could report crashed GlusterFS server processes and machines that
 are not working at all.

Nagios can also run checks remotely, on the monitored host itself:

http://www.logix.cz/michal/devel/nagios/
http://blogs.techrepublic.com.com/opensource/?p=321

So it can check the real status of glusterfsd, or whatever else we want, on
the remote host.
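
As a rough illustration of such a remote check, here is a minimal Python
sketch that could run on the storage host itself. It is only an assumption of
how one might do it, not an existing GlusterFS or Nagios tool: the function
names find_processes and check_daemon are made up for this example, and it
relies on a Linux /proc that exposes a comm file per process. The return
codes follow the Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name equals `name`.

    Reads <proc_root>/<pid>/comm, which Linux provides for each process.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def check_daemon(name="glusterfsd", proc_root="/proc"):
    """Return (exit_code, message) using Nagios plugin exit codes."""
    pids = find_processes(name, proc_root)
    if pids:
        pid_list = ", ".join(str(p) for p in pids)
        return 0, "OK - %s running (pid %s)" % (name, pid_list)
    return 2, "CRITICAL - no %s process found" % name
```

A wrapper script would print the message and exit with the returned code.
Note that this only shows that the process exists, not that it is healthy.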


 I know that this simple method won't provide a positive status (=it
 works) which would be preferable, but at least it can provide a
 negative status (=_something_ failed on _that_ machine) in some cases.

The glusterfsd port can be taken over by another process, so checking for an
open port is an indirect and unreliable way to check status.



@gluster.org:
 IIRC, some time ago someone requested a syslog feature to debug
 problems with GlusterFS as root filesystem for a diskless cluster -
 is there any news on that?
 Having the clients report problems to a central logging server might
 be useful for monitoring.

Monitoring glusterfs daemons from the client side is unreliable, because
monitoring errors can be caused by faults on the client side (I assume the
Nagios server host(s) to be reliable).

I insist on remote checks because:
  1) glusterfsd should abort when a non-recoverable error happens; in that
case a remote check of its real status is the most reliable check.
  2) if glusterfsd or any FS-related service continues to work in a
non-healthy state after a non-recoverable error, it can lead to
damage and irreversible loss of data. Non-recoverable errors should be
investigated and fixed only by a system administrator with a complete set of
system tools at hand.

Regards,

Alexey.



 Regards,

 Harald


 ___
 Gluster-devel mailing list
 Gluster-devel@nongnu.org
 http://lists.nongnu.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Samuel Hassine
I'm also looking for a way to monitor gluster nodes. 

Any solutions?

On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
 Hello!
 
 
 
 I'm looking for a way to determine the health of a GlusterFS
 cluster. Is there any way to tell whether any of the nodes has failed?
 It is possible to grep the log files for "remotexx: disconnected",
 but that is not suitable for monitoring. There should be a simple way
 to query the cluster against the .vol file, see whether any node/brick
 failed to attach, and trigger an alarm. Is there anything like
 gluster --reporthealth?
 
 
 
 Regards,
 Anton
 


Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Alexey Filin
I think systems like Nagios are good enough for general monitoring.
GlusterFS itself is only one part of the storage stack, so a monitoring
system should also monitor e.g. disk health and RAID status.

Advanced monitoring (requiring knowledge of GlusterFS internals) could be
useful e.g. to identify bottlenecks. AFAIK the GlusterFS developers welcome
contributions, so somebody could pioneer this.

Regards,

Alexey.


Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Harald Stürzebecher
2010/2/22 Samuel Hassine samuel.hass...@gmail.com:
 I'm also looking for a way to monitor gluster nodes.

 Any solutions?

 On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
 Hello!



 I'm looking for a way to determine the health of a GlusterFS
 cluster. Is there any way to tell whether any of the nodes has failed?
 It is possible to grep the log files for "remotexx: disconnected",
 but that is not suitable for monitoring. There should be a simple way
 to query the cluster against the .vol file, see whether any node/brick
 failed to attach, and trigger an alarm. Is there anything like
 gluster --reporthealth?

Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
is possible might be an indicator for working/failing - at least for
setups that use TCP. I don't know if anything like that is possible
for Infiniband-only setups.

IIRC, Nagios can check if a port is open on a remote machine. That
won't find something like disk/filesystem problems on the server, but
it could report crashed GlusterFS server processes and machines that
are not working at all.

I know that this simple method won't provide a positive status (=it
works) which would be preferable, but at least it can provide a
negative status (=_something_ failed on _that_ machine) in some cases.
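
A minimal Python sketch of that idea, combined with Anton's wish to query
the cluster against the .vol file, might look like the following. It is a
sketch under assumptions, not a tested monitoring plugin: it assumes the
volume file names its servers with "option remote-host" lines, that the
servers listen on TCP port 6996 (only recalled from memory above, so it may
differ on a given setup), and the function names are invented for this
example. The result is mapped to Nagios plugin exit codes (0 = OK,
2 = CRITICAL, 3 = UNKNOWN).

```python
import re
import socket

# Matches lines such as "option remote-host 192.168.0.1" in a volume file.
REMOTE_HOST = re.compile(r"^\s*option\s+remote-host\s+(\S+)", re.MULTILINE)


def remote_hosts(volfile_text):
    """Return the distinct remote-host values, in order of appearance."""
    seen = []
    for host in REMOTE_HOST.findall(volfile_text):
        if host not in seen:
            seen.append(host)
    return seen


def port_open(host, port=6996, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or the name did not resolve


def check_cluster(volfile_text, port=6996, timeout=3.0):
    """Return (exit_code, message) using Nagios plugin exit codes."""
    hosts = remote_hosts(volfile_text)
    if not hosts:
        return 3, "UNKNOWN - no remote-host entries found"
    down = [h for h in hosts if not port_open(h, port, timeout)]
    if down:
        return 2, "CRITICAL - cannot connect to: " + ", ".join(down)
    return 0, "OK - %d server(s) accept connections on port %d" % (
        len(hosts), port)
```

As said above, an OK result only means that something accepted the
connection; a CRITICAL result is the more meaningful one.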

@gluster.org:
IIRC, some time ago someone requested a syslog feature to debug
problems with GlusterFS as root filesystem for a diskless cluster -
is there any news on that?
Having the clients report problems to a central logging server might
be useful for monitoring.
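
Until something like that exists, a client-side workaround could be sketched
in Python roughly as follows. This is purely an illustration and not a
GlusterFS feature: it assumes the client log contains lines with the word
"disconnected" (as Anton described), that a syslog server accepts UDP
messages on the given host and port, and the names syslog_logger and
report_disconnects are made up here.

```python
import logging
import logging.handlers


def syslog_logger(host, port=514, name="glusterfs-monitor"):
    """Return a logger that forwards warnings to a remote syslog server (UDP)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # do not also print through the root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.handlers.SysLogHandler(address=(host, port))
        logger.addHandler(handler)
    return logger


def report_disconnects(log_lines, logger):
    """Forward every line mentioning a disconnect; return how many were sent."""
    count = 0
    for line in log_lines:
        if "disconnected" in line:
            logger.warning("glusterfs: %s", line.strip())
            count += 1
    return count
```

The caller would pass in the new lines of the client log file, for example
from a cron job that remembers the last read position.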


Regards,

Harald




Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Raghavendra G
Hi all,

Here is some work related to health monitoring: glfs-health.sh is a shell
script that checks the health of glusterfs.
http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=glfs-health.sh;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

Documentation can be found at
http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=README;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

We welcome improvements and discussions on this.

regards,
2010/2/23 Harald Stürzebecher hara...@cs.tu-berlin.de

 2010/2/22 Samuel Hassine samuel.hass...@gmail.com:
  I'm also looking for a way to monitor gluster nodes.
 
  Any solutions?
 
  On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
  Hello!
 
 
 
  I'm looking for a way to determine the health of a GlusterFS
  cluster. Is there any way to tell whether any of the nodes has failed?
  It is possible to grep the log files for "remotexx: disconnected",
  but that is not suitable for monitoring. There should be a simple way
  to query the cluster against the .vol file, see whether any node/brick
  failed to attach, and trigger an alarm. Is there anything like
  gluster --reporthealth?

 Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
 is possible might be an indicator for working/failing - at least for
 setups that use TCP. I don't know if anything like that is possible
 for Infiniband-only setups.

 IIRC, Nagios can check if a port is open on a remote machine. That
 won't find something like disk/filesystem problems on the server, but
 it could report crashed GlusterFS server processes and machines that
 are not working at all.

 I know that this simple method won't provide a positive status (=it
 works) which would be preferable, but at least it can provide a
 negative status (=_something_ failed on _that_ machine) in some cases.

 @gluster.org:
 IIRC, some time ago someone requested a syslog feature to debug
 problems with GlusterFS as root filesystem for a diskless cluster -
 is there any news on that?
 Having the clients report problems to a central logging server might
 be useful for monitoring.


 Regards,

 Harald






-- 
Raghavendra G