Hello,
I have an HBase cluster chock-full of data and would like to run canned reports
(i.e., reports known ahead of time), but also ad-hoc reports against that data.
Are there any open-source or commercial tools one can use?
Here's what I *think* I know so far, but please correct me wherever I'm wrong, so
I don't spread false info:
* Use HBase-Hive Integration
Pluses:
- lots of tools to query Hive are available
Minuses:
- data duplication
- Hive's copy of data is always behind
- I heard the integration is fairly alpha (e.g., you can't copy deltas to Hive; you have to copy all the data every time you want to update your Hive store)
* Use Pig
https://issues.apache.org/jira/browse/PIG-970
https://issues.apache.org/jira/browse/PIG-1205
Pluses:
- runs directly against HBase, no need to copy data
Minuses:
- PigLatin learning curve: in my case, the people wanting ad-hoc reports are not techies
- No pretty front-end with syntax highlighting or visual querying, or one that accepts SQL and translates it to PigLatin
* Use PigPen
Pluses:
- Visual == easy
Minuses:
- Looks abandoned, judging by http://search-hadoop.com/m/Noacz1MECC7 and https://issues.apache.org/jira/browse/PIG-366
* Use Toad for Cloud
Pluses:
- accepts SQL, runs, and returns data
- runs directly against HBase, no need to copy data
Minuses:
- some people reported it crashes
- it allows the person querying the data to also modify the data, which is
bad in my environment
* Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax: they all seem to be able to get the data out of Hive, but not out of HBase. More info below:
* Pentaho
  * http://www.pentaho.com/products/hadoop/ - looks like it supports only Hive
  * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
  * http://search-hadoop.com/?q=pentaho&src=moz-search
* Datameer
  * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks like it supports only Hive
  * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks like one can add support for HBase by writing a plugin?
* Karmasphere Analyst
  * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html - Hive only
Is any of the above incorrect?
Did I miss a tool, free or non-free, that I could use to run ad-hoc reports
against data in HBase?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/