Re: Ad-hoc reports against HBase - any way? any tools?

Jean-Daniel Cryans Fri, 25 Feb 2011 14:50:27 -0800

You may want to use a scheduler and assign a lower
priority/capacity/whatnot to hive queries.
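
A rough sketch of one way to do that from a wrapper script, assuming a Hadoop 0.20-era setup where the configured scheduler honors mapred.job.queue.name and mapred.job.priority. The queue name "adhoc", the table name "hbase_events" and the helper hive_command are made up for illustration; the queue would have to be defined in the scheduler's own configuration first.

```python
import shlex

# Hypothetical queue name; it must already exist in the scheduler config.
ADHOC_QUEUE = "adhoc"

# Job priorities understood by classic (MRv1) Hadoop.
PRIORITIES = {"VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"}


def hive_command(query, queue=ADHOC_QUEUE, priority="LOW"):
    """Build a Hive CLI argv that tags the resulting MR jobs
    with a queue and a job priority."""
    if priority not in PRIORITIES:
        raise ValueError("unknown job priority: %s" % priority)
    return [
        "hive",
        "--hiveconf", "mapred.job.queue.name=%s" % queue,
        "--hiveconf", "mapred.job.priority=%s" % priority,
        "-e", query,
    ]


if __name__ == "__main__":
    # Only prints the command line; pass the list to subprocess to run it.
    cmd = hive_command("SELECT COUNT(*) FROM hbase_events")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Whether the priority or the queue has more effect depends on which scheduler the JobTracker runs, so treat both settings as knobs to experiment with rather than a guarantee.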


The cluster we're querying against does mainly data ingestion and that
data is used for reporting, so adding hive in the mix is "normal" for
our use case.

J-D

On Fri, Feb 25, 2011 at 2:39 PM, Otis Gospodnetic
<[email protected]> wrote:
> Ah, I have another question:
>
> When you have HBase-Hive integration in place, how do you control how queries
> that come in via Hive affect the HBase cluster?
> Consider a HBase cluster whose primary task is to ingest data, process it with
> MR jobs, and store it back in some table(s).  That's what the cluster does
> today.
> Now if we add HBase-Hive to the mix and people start writing HQL and that runs
> MR jobs against data in HBase, this will affect the performance of those data
> ingestion jobs.
>
> How do you deal with that?
> Are there ways to maybe split the cluster in such a way that HQL-triggered MR
> jobs run only on some set of nodes, while MR jobs that are part of ingestion
> process run on a disjoint set of nodes?  Yet, have Hive's nodes see the new
> data that continuously gets ingested.
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> ----- Original Message ----
>> From: Jean-Daniel Cryans <[email protected]>
>> To: [email protected]
>> Sent: Fri, February 25, 2011 5:04:31 PM
>> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
>>
>> HIVE-1634 will be a serious limitation if anything you store is in
>> binary and you don't want to patch hive. There are also some skew issues
>> I have yet to investigate that may be due to the hbase integration...
>> or not. Apart from that, our internal users are pretty happy.
>>
>> J-D
>>
>> On Fri, Feb 25, 2011 at 1:54 PM, Otis Gospodnetic
>> <[email protected]> wrote:
>> > Hi J-D,
>> >
>> > Yes, I'm interested in HBase-Hive integration.
>> > Thanks for the pointer to the external tables.  I was aware of that at some
>> > point, but for some reason started thinking that data copying is necessary.
>> >
>> > Are there any gotchas or serious limitations around this integration?
>> >
>> > Thanks,
>> > Otis
>> >
>> >
>> >
>> >
>> >
>> > ----- Original Message ----
>> >> From: Jean-Daniel Cryans <[email protected]>
>> >> To: [email protected]
>> >> Sent: Fri, February 25, 2011 4:17:09 PM
>> >> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
>> >>
>> >> We use the HBase+Hive integration here for ad-hoc queries, I don't
>> >> understand the data duplication you're talking about... when you
>> >> create an external table you can directly query your existing tables.
>> >> We run with the latest patch posted in HIVE-1634 since we have a lot
>> >> of binary values and I made a very very hacky patch to be able to use
>> >> our binary composite row keys.
>> >>
>> >> I'll be happy to give you more details if you want to try going down
>> >> that road.
>> >>
>> >> J-D
>> >>
>> >> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
>> >> <[email protected]> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a HBase cluster chock-full of data and would like to run canned
>> >> > reports (i.e., reports known ahead of time), but also ad-hoc reports
>> >> > against that data.
>> >> > Are there any open-source or commercial tools one can use?
>> >> >
>> >> > Here's what I *think* I know so far, but please correct me wherever I'm
>> >> > wrong, so I don't spread false info:
>> >> >
>> >> > * Use HBase-Hive Integration
>> >> >   Pluses:
>> >> >    - lots of tools to query Hive are available
>> >> >   Minuses:
>> >> >    - data duplication
>> >> >    - Hive's copy of data is always behind
>> >> >    - I heard the integration is fairly alpha (e.g. you can't copy deltas
>> >> >      to Hive, you have to copy all data every time you want to update
>> >> >      your Hive store)
>> >> >
>> >> > * Use Pig
>> >> >   https://issues.apache.org/jira/browse/PIG-970
>> >> >   https://issues.apache.org/jira/browse/PIG-1205
>> >> >   Pluses:
>> >> >    - runs directly against HBase, no need to copy data
>> >> >   Minuses:
>> >> >    - PigLatin learning curve - in my case people wanting ad-hoc reports
>> >> >      are not techies
>> >> >    - No pretty front-end with syntax highlighting or visual querying or
>> >> >      that accepts SQL and translates it to PigLatin
>> >> >
>> >> > * Use PigPen
>> >> >   Pluses:
>> >> >    - Visual == easy
>> >> >   Minuses:
>> >> >    - Looks abandoned judging by http://search-hadoop.com/m/Noacz1MECC7
>> >> >      and https://issues.apache.org/jira/browse/PIG-366
>> >> >
>> >> > * Use Toad for Cloud
>> >> >   Pluses:
>> >> >    - accepts SQL, runs, and returns data
>> >> >    - runs directly against HBase, no need to copy data
>> >> >   Minuses:
>> >> >    - some people reported it crashes
>> >> >    - it allows the person querying the data to also modify the data,
>> >> >      which is bad in my environment
>> >> >
>> >> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem to
>> >> >   be able to get the data out of Hive, but not out of HBase.  More info
>> >> >   below:
>> >> >
>> >> > * Pentaho
>> >> >    * http://www.pentaho.com/products/hadoop/ - looks like it supports
>> >> >      only Hive
>> >> >    * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
>> >> >    * http://search-hadoop.com/?q=pentaho&src=moz-search
>> >> >
>> >> > * Datameer
>> >> >    * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms -
>> >> >      looks like it supports only Hive
>> >> >    * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK -
>> >> >      looks like one can add support for HBase by writing a plugin?
>> >> >
>> >> > * Karmasphere Analyst
>> >> >    * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
>> >> >      - Hive only
>> >> >
>> >> >
>> >> > Is any of the above incorrect?
>> >> > Did I miss a tool, free or non-free, that I could use to run ad-hoc
>> >> > reports against data in HBase?
>> >> >
>> >> > Thanks,
>> >> >   Otis
>> >> > ----
>> >> > Sematext :: http://sematext.com/ :: Solr -  Lucene - Hadoop -  HBase
>> >> > Hadoop ecosystem search :: http://search-hadoop.com/
>> >> >
>> >>  >
>> >>
>> >
>> >
>>
>
