Otis,

There is early work on HBase replication under way. That might be useful for firewalling the load in the way you are looking for.

On Fri, Feb 25, 2011 at 2:39 PM, Otis Gospodnetic <[email protected]> wrote:

> Ah, I have another question:
>
> When you have HBase-Hive integration in place, how do you control how
> queries that come in via Hive affect the HBase cluster?
> Consider an HBase cluster whose primary task is to ingest data, process it
> with MR jobs, and store it back in some table(s). That's what the cluster
> does today.
> Now if we add HBase-Hive to the mix and people start writing HQL and that
> runs MR jobs against data in HBase, this will affect the performance of
> those data ingestion jobs.
>
> How do you deal with that?
> Are there ways to maybe split the cluster in such a way that HQL-triggered
> MR jobs run only on some set of nodes, while MR jobs that are part of the
> ingestion process run on a disjoint set of nodes? Yet have Hive's nodes
> see the new data that continuously gets ingested.
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
> ----- Original Message ----
> > From: Jean-Daniel Cryans <[email protected]>
> > To: [email protected]
> > Sent: Fri, February 25, 2011 5:04:31 PM
> > Subject: Re: Ad-hoc reports against HBase - any way? any tools?
> >
> > HIVE-1634 will be a serious limitation if anything you store is in
> > binary and you don't want to patch Hive. There are also some skew issues
> > I have yet to investigate that may be due to the HBase integration...
> > or not. Apart from that, our internal users are pretty happy.
> >
> > J-D
> >
> > On Fri, Feb 25, 2011 at 1:54 PM, Otis Gospodnetic
> > <[email protected]> wrote:
> > > Hi J-D,
> > >
> > > Yes, I'm interested in HBase-Hive integration.
> > > Thanks for the pointer to the external tables. I was aware of that at
> > > some point, but for some reason started thinking that data copying is
> > > necessary.
> > >
> > > Are there any gotchas or serious limitations around this integration?
> > >
> > > Thanks,
> > > Otis
> > >
> > >
> > > ----- Original Message ----
> > >> From: Jean-Daniel Cryans <[email protected]>
> > >> To: [email protected]
> > >> Sent: Fri, February 25, 2011 4:17:09 PM
> > >> Subject: Re: Ad-hoc reports against HBase - any way? any tools?
> > >>
> > >> We use the HBase+Hive integration here for ad-hoc queries, I don't
> > >> understand the data duplication you're talking about... when you
> > >> create an external table you can directly query your existing tables.
> > >> We run with the latest patch posted in HIVE-1634 since we have a lot
> > >> of binary values and I made a very very hacky patch to be able to use
> > >> our binary composite row keys.
> > >>
> > >> I'll be happy to give you more details if you want to try going down
> > >> that road.
> > >>
> > >> J-D
> > >>
> > >> On Fri, Feb 25, 2011 at 1:02 PM, Otis Gospodnetic
> > >> <[email protected]> wrote:
> > >> > Hello,
> > >> >
> > >> > I have a HBase cluster chock-full of data and would like to run
> > >> > canned reports (i.e., reports known ahead of time), but also ad-hoc
> > >> > reports against that data.
> > >> > Are there any open-source or commercial tools one can use?
> > >> >
> > >> > Here's what I *think* I know so far, but please correct me wherever
> > >> > I'm wrong, so I don't spread false info:
> > >> >
> > >> > * Use HBase-Hive Integration
> > >> >   Pluses:
> > >> >   - lots of tools to query Hive are available
> > >> >   Minuses:
> > >> >   - data duplication
> > >> >   - Hive's copy of data is always behind
> > >> >   - I heard the integration is fairly alpha (e.g. you can't copy
> > >> >     deltas to Hive, you have to copy all data every time you want to
> > >> >     update your Hive store)
> > >> >
> > >> > * Use Pig
> > >> >   https://issues.apache.org/jira/browse/PIG-970
> > >> >   https://issues.apache.org/jira/browse/PIG-1205
> > >> >   Pluses:
> > >> >   - runs directly against HBase, no need to copy data
> > >> >   Minuses:
> > >> >   - PigLatin learning curve - in my case people wanting ad-hoc
> > >> >     reports are not techies
> > >> >   - No pretty front-end with syntax highlighting or visual querying
> > >> >     or that accepts SQL and translates it to PigLatin
> > >> >
> > >> > * Use PigPen
> > >> >   Pluses:
> > >> >   - Visual == easy
> > >> >   Minuses:
> > >> >   - Looks abandoned judging by
> > >> >     http://search-hadoop.com/m/Noacz1MECC7 and
> > >> >     https://issues.apache.org/jira/browse/PIG-366
> > >> >
> > >> > * Use Toad for Cloud
> > >> >   Pluses:
> > >> >   - accepts SQL, runs, and returns data
> > >> >   - runs directly against HBase, no need to copy data
> > >> >   Minuses:
> > >> >   - some people reported it crashes
> > >> >   - it allows the person querying the data to also modify the data,
> > >> >     which is bad in my environment
> > >> >
> > >> > * Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all
> > >> >   seem to be able to get the data out of Hive, but not out of HBase.
> > >> >   More info below:
> > >> >
> > >> > * Pentaho
> > >> >   * http://www.pentaho.com/products/hadoop/ - looks like it supports
> > >> >     only Hive
> > >> >   * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
> > >> >   * http://search-hadoop.com/?q=pentaho&src=moz-search
> > >> >
> > >> > * Datameer
> > >> >   * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms -
> > >> >     looks like it supports only Hive
> > >> >   * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK -
> > >> >     looks like one can add support for HBase by writing a plugin?
> > >> >
> > >> > * Karmasphere Analyst
> > >> >   * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html
> > >> >     - Hive only
> > >> >
> > >> >
> > >> > Is any of the above incorrect?
> > >> > Did I miss a tool, free or non-free, that I could use to run ad-hoc
> > >> > reports against data in HBase?
> > >> >
> > >> > Thanks,
> > >> > Otis
> > >> > ----
> > >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> > >> > Hadoop ecosystem search :: http://search-hadoop.com/
