HIVE-1634 will be a serious limitation if anything you store is in binary and you don't want to patch Hive. There are also some skew issues I have yet to investigate that may be due to the HBase integration... or not. Apart from that, our internal users are pretty happy.

J-D

On Fri, Feb 25, 2011 at 1:54 PM, Otis Gospodnetic <[email protected]> wrote:
> Hi J-D,
>
> Yes, I'm interested in HBase-Hive integration.
> Thanks for the pointer to the external tables. I was aware of that at some
> point, but for some reason started thinking that data copying is necessary.
>
> Are there any gotchas or serious limitations around this integration?
>
> Thanks,
> Otis
>
> ----- Original Message ----
>> From: Jean-Daniel Cryans <[email protected]>
>> To: [email protected]
>> Sent: Fri, February 25, 2011 4:17:09 PM
>> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
>>
>> We use the HBase+Hive integration here for ad-hoc queries, I don't
>> understand the data duplication you're talking about... when you
>> create an external table you can directly query your existing tables.
>> We run with the latest patch posted in HIVE-1634 since we have a lot
>> of binary values and I made a very very hacky patch to be able to use
>> our binary composite row keys.
>>
>> I'll be happy to give you more details if you want to try going down that
>> road.
>>
>> J-D
>>
>> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > I have an HBase cluster chock-full of data and would like to run canned
>> > reports (i.e., reports known ahead of time), but also ad-hoc reports
>> > against that data.
>> > Are there any open-source or commercial tools one can use?
>> >
>> > Here's what I *think* I know so far, but please correct me wherever I'm
>> > wrong, so I don't spread false info:
>> >
>> > * Use HBase-Hive Integration
>> >   Pluses:
>> >   - lots of tools to query Hive are available
>> >   Minuses:
>> >   - data duplication
>> >   - Hive's copy of data is always behind
>> >   - I heard the integration is fairly alpha (e.g. you can't copy deltas
>> >     to Hive, you have to copy all data every time you want to update
>> >     your Hive store)
>> >
>> > * Use Pig
>> >   https://issues.apache.org/jira/browse/PIG-970
>> >   https://issues.apache.org/jira/browse/PIG-1205
>> >   Pluses:
>> >   - runs directly against HBase, no need to copy data
>> >   Minuses:
>> >   - PigLatin learning curve - in my case people wanting ad-hoc reports
>> >     are not techies
>> >   - No pretty front-end with syntax highlighting or visual querying or
>> >     that accepts SQL and translates it to PigLatin
>> >
>> > * Use PigPen
>> >   Pluses:
>> >   - Visual == easy
>> >   Minuses:
>> >   - Looks abandoned judging by http://search-hadoop.com/m/Noacz1MECC7 and
>> >     https://issues.apache.org/jira/browse/PIG-366
>> >
>> > * Use Toad for Cloud
>> >   Pluses:
>> >   - accepts SQL, runs, and returns data
>> >   - runs directly against HBase, no need to copy data
>> >   Minuses:
>> >   - some people reported it crashes
>> >   - it allows the person querying the data to also modify the data,
>> >     which is bad in my environment
>> >
>> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem
>> >   to be able to get the data out of Hive, but not out of HBase. More
>> >   info below:
>> >
>> > * Pentaho
>> >   * http://www.pentaho.com/products/hadoop/ - looks like it supports
>> >     only Hive
>> >   * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
>> >   * http://search-hadoop.com/?q=pentaho&src=moz-search
>> >
>> > * Datameer
>> >   * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms -
>> >     looks like it supports only Hive
>> >   * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK -
>> >     looks like one can add support for HBase by writing a plugin?
>> >
>> > * Karmasphere Analyst
>> >   * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
>> >     - Hive only
>> >
>> > Is any of the above incorrect?
>> > Did I miss a tool, free or non-free, that I could use to run ad-hoc
>> > reports against data in HBase?
>> >
>> > Thanks,
>> > Otis
>> > ----
>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> > Hadoop ecosystem search :: http://search-hadoop.com/
