RE: Ad-hoc reports against HBase - any way? any tools?

Peter Haidinyak Fri, 25 Feb 2011 14:50:20 -0800

Sorry to jump in here, but does HBase use Map/Reduce under the covers? I was
under the impression that HBase used Hadoop's DFS (HDFS) but not Map/Reduce.


Thanks

-Pete

-----Original Message-----
From: Otis Gospodnetic [mailto:[email protected]] 
Sent: Friday, February 25, 2011 2:39 PM
To: [email protected]
Subject: Re: Ad-hoc reports against HBase - any way? any tools?

Ah, I have another question:

When you have HBase-Hive integration in place, how do you control how queries 
that come in via Hive affect the HBase cluster?
Consider an HBase cluster whose primary task is to ingest data, process it with
MR jobs, and store it back in some table(s).  That's what the cluster does
today.
Now if we add HBase-Hive to the mix and people start writing HQL that runs
MR jobs against data in HBase, this will affect the performance of those data
ingestion jobs.

How do you deal with that?
Are there ways to maybe split the cluster in such a way that HQL-triggered MR
jobs run only on some set of nodes, while MR jobs that are part of the ingestion
process run on a disjoint set of nodes, yet still have Hive's nodes see the new
data that continuously gets ingested?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Jean-Daniel Cryans <[email protected]>
> To: [email protected]
> Sent: Fri, February 25, 2011 5:04:31 PM
> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
> 
> HIVE-1634 will be a serious limitation if anything you store is in
> binary and you don't want to patch Hive. There are also some skew issues
> I have yet to investigate that may be due to the HBase integration...
> or not. Apart from that, our internal users are pretty happy.
> 
> J-D
> 
> On Fri, Feb 25, 2011 at 1:54 PM, Otis Gospodnetic
> <[email protected]> wrote:
> > Hi J-D,
> >
> > Yes, I'm interested in HBase-Hive integration.
> > Thanks for the pointer to the external tables.  I was aware of that at some
> > point, but for some reason started thinking that data copying was necessary.
> >
> > Are there any gotchas or serious limitations around this integration?
> >
> > Thanks,
> >  Otis
> >
> >
> >
> >
> >
> > ----- Original Message ----
> >> From: Jean-Daniel Cryans <[email protected]>
> >> To: [email protected]
> >> Sent: Fri, February 25, 2011 4:17:09 PM
> >> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
> >>
> >> We use the HBase+Hive integration here for ad-hoc queries. I don't
> >> understand the data duplication you're talking about... when you
> >> create an external table you can directly query your existing tables.
> >> We run with the latest patch posted in HIVE-1634 since we have a lot
> >> of binary values, and I made a very very hacky patch to be able to use
> >> our binary composite row keys.
> >>
> >> I'll be happy to give you more details if you want to try going down that
> >> road.
> >>
> >> J-D
> >>
> >> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
> >> <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I have an HBase cluster chock-full of data and would like to run canned
> >> > reports (i.e., reports known ahead of time), but also ad-hoc reports
> >> > against that data.
> >> > Are there any open-source or commercial tools one can use?
> >> >
> >> > Here's what I *think* I know so far, but please correct me wherever I'm
> >> > wrong, so I don't spread false info:
> >> >
> >> > * Use HBase-Hive Integration
> >> >   Pluses:
> >> >    - lots of tools to query Hive are available
> >> >   Minuses:
> >> >    - data duplication
> >> >    - Hive's copy of data is always behind
> >> >    - I heard the integration is fairly alpha (e.g. you can't copy deltas
> >> >      to Hive, you have to copy all data every time you want to update
> >> >      your Hive store)
> >> >
> >> > * Use Pig
> >> >   https://issues.apache.org/jira/browse/PIG-970
> >> >   https://issues.apache.org/jira/browse/PIG-1205
> >> >   Pluses:
> >> >    - runs directly against HBase, no need to copy data
> >> >   Minuses:
> >> >    - PigLatin learning curve - in my case people wanting ad-hoc reports
> >> >      are not techies
> >> >    - no pretty front-end with syntax highlighting or visual querying, or
> >> >      that accepts SQL and translates it to PigLatin
> >> > * Use PigPen
> >> >   Pluses:
> >> >    - Visual == easy
> >> >   Minuses:
> >> >    - looks abandoned, judging by http://search-hadoop.com/m/Noacz1MECC7
> >> >      and https://issues.apache.org/jira/browse/PIG-366
> >> >
> >> > * Use Toad for Cloud
> >> >   Pluses:
> >> >    - accepts SQL, runs, and returns data
> >> >    - runs directly against HBase, no need to copy data
> >> >   Minuses:
> >> >    - some people reported it crashes
> >> >    - it allows the person querying the data to also modify the data,
> >> >      which is bad in my environment
> >> >
> >> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem to
> >> >   be able to get the data out of Hive, but not out of HBase.  More info
> >> >   below:
> >> >
> >> > * Pentaho
> >> >    * http://www.pentaho.com/products/hadoop/ - looks like it supports
> >> >      only Hive
> >> >    * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
> >> >    * http://search-hadoop.com/?q=pentaho&src=moz-search
> >> >
> >> > * Datameer
> >> >    * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks
> >> >      like it supports only Hive
> >> >    * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks
> >> >      like one can add support for HBase by writing a plugin?
> >> >
> >> > * Karmasphere Analyst
> >> >    * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
> >> >      - Hive only
> >> >
> >> > Is any of the above incorrect?
> >> > Did I miss a tool, free or non-free, that I could use to run ad-hoc
> >> > reports against data in HBase?
> >> >
> >> > Thanks,
> >> >   Otis
> >> > ----
> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> >> > Hadoop ecosystem search :: http://search-hadoop.com/
> >> >
> >>  >
> >>
> >
> >
>
