Otis,

There is early work on HBase replication under way.  That might be useful for
firewalling the load in the way that you are looking for.

On Fri, Feb 25, 2011 at 2:39 PM, Otis Gospodnetic <
[email protected]> wrote:

> Ah, I have another question:
>
> When you have HBase-Hive integration in place, how do you control how
> queries that come in via Hive affect the HBase cluster?
> Consider an HBase cluster whose primary task is to ingest data, process it
> with MR jobs, and store it back in some table(s).  That's what the cluster
> does today.
> Now if we add HBase-Hive to the mix and people start writing HQL that runs
> MR jobs against data in HBase, this will affect the performance of those
> data ingestion jobs.
>
> How do you deal with that?
> Are there ways to maybe split the cluster in such a way that HQL-triggered
> MR jobs run only on some set of nodes, while MR jobs that are part of the
> ingestion process run on a disjoint set of nodes, yet still have Hive's
> nodes see the new data that continuously gets ingested?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> ----- Original Message ----
> > From: Jean-Daniel Cryans <[email protected]>
> > To: [email protected]
> > Sent: Fri, February 25, 2011 5:04:31 PM
> > Subject: Re: Ad-hoc reports against HBase - any way? any tools?
> >
> > HIVE-1634 will be a serious limitation if anything you store is in
> > binary and you don't want to patch Hive. There are also some skew issues
> > I have yet to investigate that may be due to the HBase integration...
> > or not. Apart from that, our internal users are pretty happy.
> >
> > J-D
> >
> > On Fri, Feb 25, 2011 at 1:54 PM, Otis Gospodnetic
> > <[email protected]> wrote:
> > > Hi J-D,
> > >
> > > Yes, I'm interested in HBase-Hive integration.
> > > Thanks for the pointer to the external tables.  I was aware of that at
> > > some point, but for some reason started thinking that data copying is
> > > necessary.
> > >
> > > Are there any gotchas or serious limitations around this integration?
> > >
> > > Thanks,
> > > Otis
> > >
> > >
> > >
> > >
> > >
> > > ----- Original Message  ----
> > >> From: Jean-Daniel Cryans <[email protected]>
> > >> To: [email protected]
> > >> Sent:  Fri, February 25, 2011 4:17:09 PM
> > >> Subject: Re: Ad-hoc reports  against HBase - any way? any tools?
> > >>
> > >> We use the HBase+Hive integration here for ad-hoc queries. I don't
> > >> understand the data duplication you're talking about... when you
> > >> create an external table you can directly query your existing tables.
> > >> We run with the latest patch posted in HIVE-1634 since we have a lot
> > >> of binary values, and I made a very very hacky patch to be able to use
> > >> our binary composite row keys.
> > >>
> > >> I'll be happy to give you more details if you want to try going down
> > >> that road.
> > >>
> > >> J-D
> > >>
> > >> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
> > >> <[email protected]> wrote:
> > >> > Hello,
> > >> >
> > >> > I have an HBase cluster chock-full of data and would like to run canned
> > >> > reports (i.e., reports known ahead of time), but also ad-hoc reports
> > >> > against that data.
> > >> > Are there any open-source or commercial tools one can use?
> > >> >
> > >> > Here's what I *think* I know so far, but please correct me wherever I'm
> > >> > wrong, so I don't spread false info:
> > >> >
> > >> > * Use HBase-Hive Integration
> > >> >   Pluses:
> > >> >    - lots of tools to query Hive are available
> > >> >   Minuses:
> > >> >    - data duplication
> > >> >    - Hive's copy of data is always behind
> > >> >    - I heard the integration is fairly alpha (e.g. you can't copy
> > >> >      deltas to Hive, you have to copy all data every time you want
> > >> >      to update your Hive store)
> > >>  >
> > >> > * Use Pig
> > >> >   https://issues.apache.org/jira/browse/PIG-970
> > >> >   https://issues.apache.org/jira/browse/PIG-1205
> > >> >   Pluses:
> > >> >    - runs directly against HBase, no need to copy data
> > >> >   Minuses:
> > >> >    - PigLatin learning curve - in my case people wanting ad-hoc
> > >> >      reports are not techies
> > >> >    - No pretty front-end with syntax highlighting or visual querying
> > >> >      or that accepts SQL and translates it to PigLatin
> > >> >
> > >> > * Use PigPen
> > >> >   Pluses:
> > >> >    - Visual == easy
> > >> >   Minuses:
> > >> >    - Looks abandoned, judging by
> > >> >      http://search-hadoop.com/m/Noacz1MECC7 and
> > >> >      https://issues.apache.org/jira/browse/PIG-366
> > >> >
> > >> > * Use Toad for Cloud
> > >> >   Pluses:
> > >> >    - accepts SQL, runs, and returns data
> > >> >    - runs directly against HBase, no need to copy data
> > >> >   Minuses:
> > >> >    - some people reported it crashes
> > >> >    - it allows the person querying the data to also modify the
> > >> >      data, which is bad in my environment
> > >> >
> > >> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem
> > >> >   to be able to get the data out of Hive, but not out of HBase.  More
> > >> >   info below:
> > >> >
> > >> > * Pentaho
> > >> >    * http://www.pentaho.com/products/hadoop/ - looks like it supports
> > >> >      only Hive
> > >> >    * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
> > >> >    * http://search-hadoop.com/?q=pentaho&src=moz-search
> > >>  >
> > >> > * Datameer
> > >> >    * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms -
> > >> >      looks like it supports only Hive
> > >> >    * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK -
> > >> >      looks like one can add support for HBase by writing a plugin?
> > >> >
> > >> > * Karmasphere Analyst
> > >> >    * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
> > >> >      - Hive only
> > >> >
> > >> >
> > >> > Is any of the above incorrect?
> > >> > Did I miss a tool, free or non-free, that I could use to run ad-hoc
> > >> > reports against data in HBase?
> > >> >
> > >> > Thanks,
> > >> > Otis
> > >> > ----
> > >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> > >> > Hadoop ecosystem search :: http://search-hadoop.com/
> > >> >
> > >>  >
> > >>
> > >
> > >
> >
>
