Hi J-D,

Yes, I'm interested in HBase-Hive integration. Thanks for the pointer to the external tables. I was aware of that at some point, but for some reason started thinking that data copying is necessary.
Are there any gotchas or serious limitations around this integration?

Thanks,
Otis

----- Original Message ----
> From: Jean-Daniel Cryans <[email protected]>
> To: [email protected]
> Sent: Fri, February 25, 2011 4:17:09 PM
> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
>
> We use the HBase+Hive integration here for ad-hoc queries, I don't
> understand the data duplication you're talking about... when you
> create an external table you can directly query your existing tables.
> We run with the latest patch posted in HIVE-1634 since we have a lot
> of binary values and I made a very very hacky patch to be able to use
> our binary composite row keys.
>
> I'll be happy to give you more details if you want to try going down that
> road.
>
> J-D
>
> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
> <[email protected]> wrote:
> > Hello,
> >
> > I have an HBase cluster chock-full of data and would like to run canned
> > reports (i.e., reports known ahead of time), but also ad-hoc reports
> > against that data.
> > Are there any open-source or commercial tools one can use?
> >
> > Here's what I *think* I know so far, but please correct me wherever I'm
> > wrong, so I don't spread false info:
> >
> > * Use HBase-Hive Integration
> >   Pluses:
> >   - lots of tools to query Hive are available
> >   Minuses:
> >   - data duplication
> >   - Hive's copy of data is always behind
> >   - I heard the integration is fairly alpha (e.g. you can't copy deltas to
> >     Hive, you have to copy all data every time you want to update your
> >     Hive store)
> >
> > * Use Pig
> >   https://issues.apache.org/jira/browse/PIG-970
> >   https://issues.apache.org/jira/browse/PIG-1205
> >   Pluses:
> >   - runs directly against HBase, no need to copy data
> >   Minuses:
> >   - PigLatin learning curve - in my case people wanting ad-hoc reports
> >     are not techies
> >   - No pretty front-end with syntax highlighting or visual querying or that
> >     accepts SQL and translates it to PigLatin
> >
> > * Use PigPen
> >   Pluses:
> >   - Visual == easy
> >   Minuses:
> >   - Looks abandoned judging by http://search-hadoop.com/m/Noacz1MECC7 and
> >     https://issues.apache.org/jira/browse/PIG-366
> >
> > * Use Toad for Cloud
> >   Pluses:
> >   - accepts SQL, runs, and returns data
> >   - runs directly against HBase, no need to copy data
> >   Minuses:
> >   - some people reported it crashes
> >   - it allows the person querying the data to also modify the data, which
> >     is bad in my environment
> >
> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem to be
> >   able to get the data out of Hive, but not out of HBase. More info below:
> >
> > * Pentaho
> >   * http://www.pentaho.com/products/hadoop/ - looks like it supports only
> >     Hive
> >   * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
> >   * http://search-hadoop.com/?q=pentaho&src=moz-search
> >
> > * Datameer
> >   * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks
> >     like it supports only Hive
> >   * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks
> >     like one can add support for HBase by writing a plugin?
> >
> > * Karmasphere Analyst
> >   * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
> >     - Hive only
> >
> > Is any of the above incorrect?
> > Did I miss a tool, free or non-free, that I could use to run ad-hoc reports
> > against data in HBase?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> > Hadoop ecosystem search :: http://search-hadoop.com/
