Only one source file contains original code: HbaseThriftTesterMain.java.
I've checked in a new version of that file with an Apache grant/license.
If it were to be included in HBase, I think the best approach would be
to modify the Thrift server so it can listen on an optional HTTP port
for health checks. That way a non-Thrift client such as Nagios or a
load balancer could still query the Thrift server for general health.
-Daniel
On 9/9/10 5:39 PM, Ryan Rawson wrote:
This looks very cool.
If you were able to offer an ASF grant we could include this in hbase.
-ryan
On Thu, Sep 9, 2010 at 5:36 PM, Daniel Einspanjer
<[email protected]> wrote:
Cross posting my recent blog entry...
As documented in THRIFT-601, sending random data to Thrift can cause it to
leak memory.
At Mozilla, we use a web load balancer to distribute traffic to our Thrift
machines, and the default liveness check it uses is a simple TCP connect. We
also had Nagios performing TCP connect checks on these nodes for general
alerting.
All these connects were causing the Thrift servers to start generating OOM
errors, sometimes as quickly as a few days after being started.
I wrote a test utility that performs a legitimate Thrift API call (it
actually tries to get the schema of the .META. table) and returns a success
if it can execute the call.
The utility can either run from the command line, or it can use the
lightweight HTTP server class that is part of the Sun JRE 6 to listen for
requests to /thrift/health and report back the status.
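
For anyone curious what the listen mode amounts to, here is a minimal sketch of an HTTP health endpoint built on the JRE's com.sun.net.httpserver classes. It is not the code from HbaseThriftTesterMain.java. The class name HealthEndpointSketch, the BooleanSupplier hook standing in for the real Thrift call, the 200/503 status codes, and the OK/FAIL bodies are all assumptions made for illustration, and the lambda syntax assumes Java 8 or later rather than JRE 6.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

public class HealthEndpointSketch {

    /**
     * Starts an HTTP server that answers requests to /thrift/health.
     * thriftCheck is a hypothetical hook: in a real tool it would perform
     * a legitimate Thrift API call and return true on success.
     */
    public static HttpServer start(int port, BooleanSupplier thriftCheck) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/thrift/health", exchange -> {
            boolean healthy;
            try {
                // The check runs on every request, so the answer is never stale.
                healthy = thriftCheck.getAsBoolean();
            } catch (RuntimeException e) {
                // Any error raised by the check is reported as unhealthy.
                healthy = false;
            }
            byte[] body = (healthy ? "OK\n" : "FAIL\n").getBytes(StandardCharsets.UTF_8);
            // Status codes are an assumption: 200 when healthy, 503 otherwise.
            exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A load balancer or Nagios would then issue a plain HTTP GET against /thrift/health instead of a bare TCP connect to the Thrift port. Calling HealthEndpointSketch.start(8080, check) keeps the server running until stop() is called on the returned HttpServer.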
$ java -jar HbaseThriftTester.jar
Missing required option: [-check Immediately checks the following host:port
combinations and returns a summary message with an exit value of the number
of failures., -listen Run as an HTTP daemon listening on port. Checks the
hosts every time /thrift/health URL is requested.]
usage: HbaseThriftTester [-timeout <ms>] <mode> <host:port>...
 -check               Immediately checks the following host:port
                      combinations and returns a summary message with an
                      exit value of the number of failures.
 -listen <port>       Run as an HTTP daemon listening on port. Checks the
                      hosts every time /thrift/health URL is requested.
 -timeout <seconds>   Number of seconds to wait for Thrift call to
                      complete
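
The -check behaviour described in the usage text (an exit value equal to the number of failures) can be sketched roughly as follows. This is an illustration only, not the utility's actual code. The class name CheckModeSketch, the countFailures method, and the BiPredicate probe standing in for the real Thrift call are hypothetical, and counting a malformed host:port argument as a failure is an assumption.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class CheckModeSketch {

    /**
     * Counts how many host:port targets fail the supplied probe.
     * The probe is a hypothetical stand-in for a real Thrift API call;
     * it receives the host and the parsed port and returns true on success.
     */
    public static int countFailures(List<String> targets, BiPredicate<String, Integer> probe) {
        int failures = 0;
        for (String target : targets) {
            int colon = target.lastIndexOf(':');
            boolean ok = false;
            if (colon > 0 && colon < target.length() - 1) {
                try {
                    int port = Integer.parseInt(target.substring(colon + 1));
                    ok = probe.test(target.substring(0, colon), port);
                } catch (RuntimeException e) {
                    // A non-numeric port or an error thrown by the probe
                    // is counted as a failed check (assumption).
                    ok = false;
                }
            }
            if (!ok) {
                failures++;
            }
        }
        return failures;
    }
}
```

A main method would pass its host:port arguments to countFailures and hand the result to System.exit, so a calling script sees 0 only when every target passed.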
The app is bundled up using one-jar, so it is simple and easy to call from
within a Nagios script or some such. Maybe it will be useful to someone
else. Just pull down the project, then build with ant.
https://svn.mozilla.org/metrics/hadoop/hbase/HbaseThriftTester/
-Daniel