Re: [Gluster-devel] Gluster health/status
2010/2/23 Harald Stürzebecher:

> 2010/2/22 Samuel Hassine:
> > I'm also looking for a way to monitor gluster nodes.
> >
> > Any solutions?
> >
> > On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> >> Hello!
> >>
> >> I'm looking for the way to determine the health of the GLUSTER
> >> cluster. Is there any way to determine if any of the nodes failed? In
> >> the log files it is possible to grep that there is "remotexx:
> >> disconnected" - but it is not suitable for monitoring. There should be
> >> the simple way to just query the cluster against the .vol file and
> >> see, if any node/brick failed to attach and so trigger the alarm. Is
> >> there anything like "gluster --reporthealth"?
>
> Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
> is possible might be an indicator for working/failing - at least for
> setups that use TCP. I don't know if anything like that is possible
> for Infiniband-only setups.

IPoIB (IP over Infiniband)?

> IIRC, Nagios can check if a port is open on a remote machine. That
> won't find something like disk/filesystem problems on the server, but
> it could report crashed GlusterFS server processes and machines that
> are not working at all.

Nagios can run checks remotely:

http://www.logix.cz/michal/devel/nagios/
http://blogs.techrepublic.com.com/opensource/?p=321

so it can check the real status of glusterfsd, or whatever else we want, on the remote host.

> I know that this simple method won't provide a positive status (=it
> works) which would be preferable, but at least it can provide a
> negative status (=_something_ failed on _that_ machine) in some cases.

The glusterfsd port can be stolen by another process, so checking for an open port is an indirect and unreliable way to check status.

> @gluster.org:
> IIRC, some time ago someone requested a syslog feature to debug
> problems with GlusterFS as root filesystem for a diskless cluster -
> are there any news on that?
> Having the clients report problems to a central logging server might
> be useful for monitoring.

Monitoring glusterfs daemons from the client side is unreliable, because monitoring errors can be caused by faults on the client side. (I assume the Nagios server host(s) to be reliable.)

I insist on remote checks because:

1) glusterfsd should abort if a non-recoverable error happens; in that case a remote check of the real status is the most reliable check;
2) if glusterfsd or any FS-related service continues to work in an unhealthy state after a non-recoverable error, it can lead to damage and irreversible loss of data.

Non-recoverable errors should be investigated and fixed only by a system administrator with a complete set of system tools at hand.

Regards,
Alexey.

___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel
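The remote check argued for above (run on the server host itself, e.g. through a Nagios remote-execution agent) could look roughly like the following. This is only a sketch under assumptions: it is Linux-only, it assumes the daemon's process name appears as "glusterfsd" in /proc/<pid>/comm, and the function names find_pids and check_process are made up for illustration. The return codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import os

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def find_pids(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited while we were scanning, or has no comm file.
            continue
    return sorted(pids)


def check_process(name="glusterfsd", proc_root="/proc"):
    """Return (exit_code, message) in the shape a Nagios plugin reports."""
    pids = find_pids(name, proc_root)
    if pids:
        pid_list = ", ".join(str(pid) for pid in pids)
        return OK, "OK - %s running (pid %s)" % (name, pid_list)
    return CRITICAL, "CRITICAL - no %s process found" % name
```

To use it as a plugin, a small wrapper would print the message and pass the code to sys.exit(). Note that this only confirms the process exists; it says nothing about whether the bricks behind it are healthy.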
Re: [Gluster-devel] Gluster health/status
Hi all,

Here is some work related to health monitoring. glfs-health.sh is a shell script to check the health of glusterfs:

http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=glfs-health.sh;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

Documentation can be found at:

http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=README;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

We welcome improvements and discussion on this.

regards,
--
Raghavendra G
Re: [Gluster-devel] Gluster health/status
2010/2/22 Samuel Hassine:

> I'm also looking for a way to monitor gluster nodes.
>
> Any solutions?
>
> On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
>> Hello!
>>
>> I'm looking for the way to determine the health of the GLUSTER
>> cluster. Is there any way to determine if any of the nodes failed? In
>> the log files it is possible to grep that there is "remotexx:
>> disconnected" - but it is not suitable for monitoring. There should be
>> the simple way to just query the cluster against the .vol file and
>> see, if any node/brick failed to attach and so trigger the alarm. Is
>> there anything like "gluster --reporthealth"?

Checking if a connection to the GlusterFS TCP server port (6996 IIRC) is possible might be an indicator for working/failing - at least for setups that use TCP. I don't know if anything like that is possible for Infiniband-only setups.

IIRC, Nagios can check if a port is open on a remote machine. That won't find something like disk/filesystem problems on the server, but it could report crashed GlusterFS server processes and machines that are not working at all.

I know that this simple method won't provide a positive status (=it works) which would be preferable, but at least it can provide a negative status (=_something_ failed on _that_ machine) in some cases.

@gluster.org:
IIRC, some time ago someone requested a syslog feature to debug problems with GlusterFS as root filesystem for a diskless cluster - are there any news on that?
Having the clients report problems to a central logging server might be useful for monitoring.

Regards,

Harald
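The port check suggested above can be sketched in a few lines of Python. This is an illustration only: the default port 6996 is the one recalled ("IIRC") in the message and should be verified against the actual volume configuration, and the function name check_tcp_port is hypothetical. As the message says, success only shows that something accepts connections on that port, so it is useful mainly as a negative indicator.

```python
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_tcp_port(host, port=6996, timeout=5.0):
    """Try to open a TCP connection; return (exit_code, message)."""
    try:
        # create_connection resolves the host and connects, honouring timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, "OK - %s:%d accepts TCP connections" % (host, port)
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return CRITICAL, "CRITICAL - cannot connect to %s:%d (%s)" % (host, port, exc)
```

Calling check_tcp_port with each server host name from the volume file and alerting on any CRITICAL result would give the negative status described above.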
Re: [Gluster-devel] Gluster health/status
I think systems like Nagios are good enough for general monitoring. GlusterFS itself is only one part of the storage stack, so a monitoring system should also monitor e.g. disk health and RAID status.

Advanced monitoring (requiring knowledge of glusterfs internals) could be useful, e.g. to identify bottlenecks. AFAIK the glusterfs developers welcome contributions, so somebody could be a pioneer on this question.

Regards,
Alexey.
Re: [Gluster-devel] Gluster health/status
I'm also looking for a way to monitor gluster nodes.

Any solutions?

On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> Hello!
>
> I'm looking for the way to determine the health of the GLUSTER
> cluster. Is there any way to determine if any of the nodes failed? In
> the log files it is possible to grep that there is "remotexx:
> disconnected" - but it is not suitable for monitoring. There should be
> the simple way to just query the cluster against the .vol file and
> see, if any node/brick failed to attach and so trigger the alarm. Is
> there anything like "gluster --reporthealth"?
>
> Regards,
> Anton
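For reference, the log-grep workaround Anton mentions could be automated along these lines. This is a rough sketch: the only thing taken from the thread is that the log contains text of the form "remotexx: disconnected"; the rest of the log line format is unknown here, and the function name disconnected_subvolumes is invented for the example.

```python
import re

# Matches "<name>: disconnected" anywhere in a line; <name> is the
# whitespace-free token directly before the colon (an assumption).
DISCONNECT_RE = re.compile(r"(\S+): disconnected")


def disconnected_subvolumes(lines):
    """Return the names that logged 'disconnected', in first-seen order."""
    seen = []
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Passing an open log file to disconnected_subvolumes yields the list of names to alert on. The sketch does not track later reconnects, which is one reason log scraping is, as Anton says, not really suitable for monitoring.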