Hi fellow HBase users, I hope you don't mind if I make a quick announcement somewhat related to HBase, as OpenTSDB was finally released today!
OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

Thanks to HBase's scalability, OpenTSDB allows you to collect many thousands of metrics from thousands of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample data and can easily store billions of data points. As a matter of fact, at StumbleUpon we use it to keep track of hundreds of thousands of time series, and we collect over 100 million data points per day in our main production cluster.

If you use HBase then presumably you have a bunch of machines and you need to monitor them somehow. OpenTSDB can help you store fine-grained data about your machines and your applications. At StumbleUpon, we also use OpenTSDB to keep track of HBase metrics (from GC activity to request rate or region count per region server, latency distributions across the board for all RPCs sent from our site to HBase, that kinda stuff). This allows us to understand the performance of our clusters, think about capacity planning and other things like that.

I think OpenTSDB shows that it is possible to build reliable, scalable, distributed applications on top of HBase. When HBase replication and coprocessors are "production ready", HBase is really going to provide a platform with a lot of the key features that will enable engineers to easily write "Google scale" applications.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
