HIVE-1634 will be a serious limitation if anything you store is in
binary and you don't want to patch Hive. There are also some skew issues
I have yet to investigate that may or may not be due to the HBase
integration. Apart from that, our internal users are pretty happy.
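
To make the external-table point from the quoted message concrete, here is
a rough sketch of the kind of DDL involved, wrapped in a small Python helper
so it is easy to adapt. The table, column, and column-family names
(`pageviews`, `d:url`, `d:hits`) are made up for illustration; only the
storage handler class and the two property names come from the Hive HBase
integration itself.

```python
# Sketch only: builds the HiveQL DDL for an external table that points at
# an existing HBase table. All table/column names used below are hypothetical.

def hbase_external_table_ddl(hive_table, hbase_table, columns):
    """Return a CREATE EXTERNAL TABLE statement.

    columns: list of (hive_column, hive_type, hbase_mapping) tuples, where
    hbase_mapping is ':key' for the row key or 'family:qualifier'.
    """
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    mapping = ",".join(hbase_mapping for _, _, hbase_mapping in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}');"
    )


ddl = hbase_external_table_ddl(
    hive_table="pageviews_hive",   # hypothetical Hive-side name
    hbase_table="pageviews",       # hypothetical existing HBase table
    columns=[
        ("rowkey", "string", ":key"),
        ("url", "string", "d:url"),
        ("hits", "string", "d:hits"),
    ],
)
print(ddl)
```

Because the table is EXTERNAL, Hive reads the existing HBase table in place
and nothing is copied. Every column is declared as a string here on purpose:
values stored as raw bytes are where the HIVE-1634 limitation comes in.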

J-D

On Fri, Feb 25, 2011 at 1:54 PM, Otis Gospodnetic
<[email protected]> wrote:
> Hi J-D,
>
> Yes, I'm interested in HBase-Hive integration.
> Thanks for the pointer to the external tables.  I was aware of that at some
> point, but for some reason started thinking that data copying is necessary.
>
> Are there any gotchas or serious limitations around this integration?
>
> Thanks,
> Otis
>
> ----- Original Message ----
>> From: Jean-Daniel Cryans <[email protected]>
>> To: [email protected]
>> Sent: Fri, February 25, 2011 4:17:09 PM
>> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
>>
>> We use the HBase+Hive integration here for ad-hoc queries. I don't
>> understand the data duplication you're talking about... when you
>> create an external table you can directly query your existing tables.
>> We run with the latest patch posted in HIVE-1634 since we have a lot
>> of binary values, and I made a very, very hacky patch to be able to use
>> our binary composite row keys.
>>
>> I'll be happy to give you more details if you want to try going down that
>> road.
>>
>> J-D
>>
>> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > I have an HBase cluster chock-full of data and would like to run canned
>> > reports (i.e., reports known ahead of time), but also ad-hoc reports
>> > against that data. Are there any open-source or commercial tools one
>> > can use?
>> >
>> > Here's what I *think* I know so far, but please correct me wherever I'm
>> > wrong, so I don't spread false info:
>> >
>> > * Use HBase-Hive Integration
>> >   Pluses:
>> >    - lots of tools to query Hive are available
>> >   Minuses:
>> >    - data duplication
>> >    - Hive's copy of data is always behind
>> >    - I heard the integration is fairly alpha (e.g., you can't copy deltas
>> >      to Hive; you have to copy all data every time you want to update
>> >      your Hive store)
>> >
>> > * Use Pig
>> >   https://issues.apache.org/jira/browse/PIG-970
>> >   https://issues.apache.org/jira/browse/PIG-1205
>> >   Pluses:
>> >    - runs directly against HBase, no need to copy data
>> >   Minuses:
>> >    - PigLatin learning curve - in my case people wanting ad-hoc reports
>> >      are not techies
>> >    - No pretty front-end with syntax highlighting or visual querying, or
>> >      one that accepts SQL and translates it to PigLatin
>> >
>> > * Use PigPen
>> >   Pluses:
>> >    - Visual == easy
>> >   Minuses:
>> >    - Looks abandoned, judging by http://search-hadoop.com/m/Noacz1MECC7 and
>> >      https://issues.apache.org/jira/browse/PIG-366
>> >
>> > * Use Toad for Cloud
>> >   Pluses:
>> >    - accepts SQL, runs, and returns data
>> >    - runs directly against HBase, no need to copy data
>> >   Minuses:
>> >    - some people reported it crashes
>> >    - it allows the person querying the data to also modify the data,
>> >      which is bad in my environment
>> >
>> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem to
>> >   be able to get the data out of Hive, but not out of HBase. More info
>> >   below:
>> >
>> > * Pentaho
>> >    * http://www.pentaho.com/products/hadoop/ - looks like it supports only
>> >      Hive
>> >    * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
>> >    * http://search-hadoop.com/?q=pentaho&src=moz-search
>> >
>> > * Datameer
>> >    * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks
>> >      like it supports only Hive
>> >    * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks
>> >      like one can add support for HBase by writing a plugin?
>> >
>> > * Karmasphere Analyst
>> >    * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
>> >      - Hive only
>> >
>> >
>> > Is any of the above incorrect?
>> > Did I miss a tool, free or non-free, that I could use to run ad-hoc
>> > reports against data in HBase?
>> >
>> > Thanks,
>> > Otis
>> > ----
>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> > Hadoop ecosystem search :: http://search-hadoop.com/
>> >
>> >
>>
>
>
