Re: [Gluster-devel] Gluster health/status

2010-02-24 Thread Alexey Filin
2010/2/23 Harald Stürzebecher 

> 2010/2/22 Samuel Hassine :
> > I'm also looking for a way to monitor gluster nodes.
> >
> > Any solutions?
> >
> > On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> >> Hello!
> >>
> >>
> >>
> >> I'm looking for a way to determine the health of a Gluster
> >> cluster. Is there any way to determine if any of the nodes has failed?
> >> In the log files it is possible to grep for "remotexx:
> >> disconnected", but that is not suitable for monitoring. There should be
> >> a simple way to query the cluster against the .vol file and
> >> see if any node/brick failed to attach, and so trigger an alarm. Is
> >> there anything like "gluster --reporthealth"?
>
> Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
> is possible might be an indicator for working/failing - at least for
> setups that use TCP. I don't know if anything like that is possible
> for Infiniband-only setups.
>
IPoIB (IP over Infiniband)?
>
>
> IIRC, Nagios can check if a port is open on a remote machine. That
> won't find something like disk/filesystem problems on the server, but
> it could report crashed GlusterFS server processes and machines that
> are not working at all.
>
Nagios can run checks remotely:

http://www.logix.cz/michal/devel/nagios/
http://blogs.techrepublic.com.com/opensource/?p=321

so it can check the real status of glusterfsd, or anything else we want, on
the remote host.
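
As a rough illustration of such a remote check, here is a minimal Python
sketch of a Nagios-style plugin that reports whether a glusterfsd process
exists on the host it runs on. It assumes pgrep is available and that the
process is named exactly "glusterfsd"; the function names are made up for
this example. It would have to be executed on the storage host itself, e.g.
through NRPE or check_by_ssh, and it only proves that the process exists,
not that the bricks are healthy.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(pids, name="glusterfsd"):
    """Map a list of PIDs for `name` to an (exit_code, status_line) pair."""
    if pids:
        return OK, f"OK - {name} running (pids: {', '.join(pids)})"
    return CRITICAL, f"CRITICAL - no {name} process found"


def find_pids(name="glusterfsd"):
    """Return PIDs of processes named exactly `name` (assumes pgrep exists)."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return result.stdout.split()


def main():
    """Print one status line and return the exit code Nagios expects."""
    try:
        code, line = evaluate(find_pids())
    except OSError as exc:  # e.g. pgrep is not installed
        code, line = UNKNOWN, f"UNKNOWN - could not run pgrep: {exc}"
    print(line)
    return code
```

A real plugin script would end with sys.exit(main()) so that Nagios sees the
exit code.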

>
> I know that this simple method won't provide a positive status (=it
> works) which would be preferable, but at least it can provide a
> negative status (=_something_ failed on _that_ machine) in some cases.

The glusterfsd port can be taken over by another process, so checking for an
open port is an indirect and unreliable way to check status.

>
> @gluster.org:
> IIRC, some time ago someone requested a syslog feature to debug
> problems with GlusterFS as root filesystem for a diskless cluster -
> is there any news on that?
> Having the clients report problems to a central logging server might
> be useful for monitoring.
>
Monitoring the glusterfs daemons from the client side is unreliable, because
monitoring errors can be caused by faults on the client itself (I assume the
Nagios server host(s) to be reliable).

I insist on remote checks because:
  1) glusterfsd should abort if a non-recoverable error occurs; in that case a
remote check of the real process status is the most reliable check.
  2) if glusterfsd or any other FS-related service keeps running in an
unhealthy state after a non-recoverable error, it can lead to damage and
irreversible loss of data. Non-recoverable errors should be investigated and
fixed only by a system administrator with a complete set of system tools at
hand.

Regards,

Alexey.

>
>
> Regards,
>
> Harald
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Raghavendra G
Hi all,

Here is some work related to health monitoring: glfs-health.sh is a shell
script that checks the health of glusterfs.
http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=glfs-health.sh;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

Documentation can be found at
http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=README;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

We welcome improvements and discussions on this.

regards,
-- 
Raghavendra G


Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Harald Stürzebecher
2010/2/22 Samuel Hassine :
> I'm also looking for a way to monitor gluster nodes.
>
> Any solutions?
>
> On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
>> Hello!
>>
>>
>>
>> I'm looking for a way to determine the health of a Gluster
>> cluster. Is there any way to determine if any of the nodes has failed?
>> In the log files it is possible to grep for "remotexx:
>> disconnected", but that is not suitable for monitoring. There should be
>> a simple way to query the cluster against the .vol file and
>> see if any node/brick failed to attach, and so trigger an alarm. Is
>> there anything like "gluster --reporthealth"?
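
For reference, the log-grep approach mentioned above could be sketched in
Python roughly as follows. The only thing known from this thread is the
"remotexx: disconnected" fragment, so the regular expression and the sample
lines are assumptions for illustration, not the real glusterfs log format.

```python
import re

# Assumed pattern: a subvolume name followed by ": disconnected", as in the
# "remotexx: disconnected" message quoted above. Real log lines may differ.
DISCONNECTED = re.compile(r"(\S+): disconnected")


def disconnected_subvolumes(lines):
    """Return the names reported as disconnected, in order of first appearance."""
    seen = []
    for line in lines:
        match = DISCONNECTED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Made-up sample lines, for illustration only.
sample = [
    "E remote01: disconnected",
    "N remote02: connected",
    "E remote03: disconnected",
    "E remote01: disconnected",
]
print(disconnected_subvolumes(sample))  # prints ['remote01', 'remote03']
```

The sketch also shows why this is a poor basis for monitoring: it reports
past events and cannot tell whether a subvolume has reconnected since.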

Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
is possible might be an indicator for working/failing - at least for
setups that use TCP. I don't know if anything like that is possible
for Infiniband-only setups.

IIRC, Nagios can check if a port is open on a remote machine. That
won't find something like disk/filesystem problems on the server, but
it could report crashed GlusterFS server processes and machines that
are not working at all.
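
A minimal Python sketch of such a port check, for anyone who wants to try it
outside Nagios (Nagios itself has the check_tcp plugin for this). The default
port 6996 is the one recalled above and is not verified here; the function
name is made up for this example.

```python
import socket


def port_is_open(host, port=6996, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False
```

A True result only means that something accepts connections on that port; a
False result means the server process or the whole machine is unreachable
from the monitoring host.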

I know that this simple method won't provide a positive status (=it
works) which would be preferable, but at least it can provide a
negative status (=_something_ failed on _that_ machine) in some cases.

@gluster.org:
IIRC, some time ago someone requested a syslog feature to debug
problems with GlusterFS as root filesystem for a diskless cluster -
is there any news on that?
Having the clients report problems to a central logging server might
be useful for monitoring.
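
Until something like that exists in GlusterFS itself, a wrapper script on the
client could forward its own findings to a central syslog server. A minimal
Python sketch, assuming a syslog server that accepts UDP on port 514; the
function and logger names are made up for this example and this is not a
GlusterFS feature.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="glusterfs-monitor"):
    """Return a logger that forwards records to a syslog server over UDP.

    Call this once per process; each call attaches another handler.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would look like make_remote_logger("loghost").error("mount point not
responding"), where "loghost" stands for the central logging server.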


Regards,

Harald




Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Alexey Filin
I think systems like Nagios are good enough for general monitoring.
Glusterfs itself is only one part of the storage, so a monitoring system
should also monitor e.g. disk health and RAID status.

Advanced monitoring (requiring knowledge of glusterfs internals) could be
useful, e.g. to identify bottlenecks. AFAIK the glusterfs developers welcome
contributions, so somebody could pioneer this.

Regards,

Alexey.


Re: [Gluster-devel] Gluster health/status

2010-02-22 Thread Samuel Hassine
I'm also looking for a way to monitor gluster nodes.

Any solutions?

On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> Hello!
> 
> 
> 
> I'm looking for a way to determine the health of a Gluster
> cluster. Is there any way to determine if any of the nodes has failed?
> In the log files it is possible to grep for "remotexx:
> disconnected", but that is not suitable for monitoring. There should be
> a simple way to query the cluster against the .vol file and
> see if any node/brick failed to attach, and so trigger an alarm. Is
> there anything like "gluster --reporthealth"?
> 
> 
> 
> Regards,
> Anton