Re: operational overhead for HBase

2011-08-17 Thread Friso van Vollenhoven
I worked with a cluster of about that size. Once everything is spinning, it requires little attention in my experience. Just have sensible checks (Nagios or the like) on things like disks filling up, especially on the namenode, and have an alert on swap usage (that's usually the beginning of
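
A disk-fill check of the kind Friso describes can be sketched as a small shell function (illustrative only, not an actual Nagios plugin; the 85% threshold is an arbitrary example). It reads `df -P` output on stdin and warns on any filesystem above the threshold:

```shell
#!/bin/sh
# Illustrative disk-fill check in the spirit of the Nagios-style checks
# mentioned above (not a real Nagios plugin). Reads `df -P` style output
# on stdin and prints a WARN line for every filesystem whose usage exceeds
# the threshold given as $1 (in percent).
check_usage() {
  awk -v t="$1" 'NR > 1 {
    use = $5; sub(/%/, "", use)          # strip the trailing % from Capacity
    if (use + 0 > t) print "WARN " $6 " at " use "%"
  }'
}

# Typical use from cron or a monitoring wrapper:
#   df -P | check_usage 85
```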

TestMasterFailover fails occasionally

2011-08-17 Thread Gaojinchao
It seems to be a bug: the root region in RIT can't be moved. In the failover process it enforces root on-line but does not clean the zk node, so the test will wait forever. void processFailover() throws KeeperException, IOException, InterruptedException { // we enforce on-line root. HServerInfo hsi =

Re: Accessing a separate HBase cluster

2011-08-17 Thread Doug Meil
To be on the safe side, you probably want to double-check this. http://hbase.apache.org/book.html#client_dependencies On 8/17/11 3:00 AM, Hari Sreekumar hsreeku...@clickable.com wrote: Hi, I want to separate my application machines from the HBase cluster. So far, we have always run the

Hbase query

2011-08-17 Thread Stuti Awasthi
Hi all, I have read that HBase queries depend on the row key only. I was trying to map my RDBMS tables to HBase and was thinking of a way to handle the following.
RDBMS Schema: Table Sales
Columns:
* Sales_Id (PK)
* User_Id (FK)
* Product_ID (FK)
* Name
*

Re: Accessing a separate HBase cluster

2011-08-17 Thread Hari Sreekumar
ha.. silly mistake, the hbase-site.xml file was using aliases, and I had the alias pointing to a different machine in /etc/hosts on this machine! my bad.. Thanks Doug.. On Wed, Aug 17, 2011 at 7:09 PM, Doug Meil doug.m...@explorysmedical.comwrote: To be on the safe side, you probably want to

RE: Hbase query

2011-08-17 Thread Buttler, David
One way to do it would be to drop the sales_id and use a composite key of user_id/product_id (assuming that a user may buy a given product only once). Then you could do a simple get(xyz/123) to get the full row. If you wanted to get the emails of people who bought product 123, then a row key of
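
The composite-key idea above can be sketched in plain Java (the "xyz"/"123" values and the '/' separator come from the example in the thread; a real HBase client would pass the resulting bytes to a Get or Scan):

```java
import java.nio.charset.StandardCharsets;

public class CompositeKeyExample {
    // Build the "userId/productId" row key suggested above. With keys laid
    // out this way, all rows for one user sort together lexicographically.
    static byte[] rowKey(String userId, String productId) {
        return (userId + "/" + productId).getBytes(StandardCharsets.UTF_8);
    }

    // Start/stop keys for scanning every purchase of one user: '0' is the
    // byte immediately after '/', so the half-open range ["xyz/", "xyz0")
    // covers exactly the keys with prefix "xyz/".
    static byte[] scanStart(String userId) {
        return (userId + "/").getBytes(StandardCharsets.UTF_8);
    }

    static byte[] scanStop(String userId) {
        return (userId + "0").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints xyz/123
        System.out.println(new String(rowKey("xyz", "123"), StandardCharsets.UTF_8));
    }
}
```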

Versioning

2011-08-17 Thread Mark
I'm trying to fully understand all the possibilities of what HBase has to offer, but I can't determine a valid use case for multiple versions. Can someone please explain some real-life use cases for this? Also, at what point are there too many versions? For example, to store all the queries a user

Re: Hbase query

2011-08-17 Thread Christopher Tarnas
Hi Stuti, There are several approaches depending on your exact situation, but most involve secondary indexes. You should read the HBase book, specifically the chapter on secondary indexes: http://hbase.apache.org/book.html#secondary.indexes -chris On Wed, Aug 17, 2011 at 10:18 AM, Stuti

Re: Versioning

2011-08-17 Thread Doug Meil
Versioning can be used to see the previous state of a record. Some people need this feature, others don't. One thing that may be worth a review is this... http://hbase.apache.org/book.html#keysize ... and specifically the fact about all the values being freighted with timestamp (aka version)
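
Doug's point can be illustrated with a toy model of a versioned cell (a sketch of the concept only, not HBase's actual storage code; `maxVersions` mirrors the column-family setting of the same name):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class VersionedCell {
    // Newest timestamp first, mirroring the order HBase returns versions in.
    private final NavigableMap<Long, String> versions =
        new TreeMap<>(java.util.Comparator.reverseOrder());
    private final int maxVersions;

    public VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Drop the oldest versions past the limit, as a major compaction would.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // A plain get returns the newest version.
    public String get() { return versions.firstEntry().getValue(); }

    // A timestamped get returns the newest version at or before the timestamp.
    // (With the reversed comparator, ceilingEntry finds exactly that.)
    public String getAsOf(long timestamp) {
        java.util.Map.Entry<Long, String> e = versions.ceilingEntry(timestamp);
        return e == null ? null : e.getValue();
    }
}
```

Every entry in the map carries its own timestamp, which is the "freighting" cost the keysize chapter warns about: the per-value overhead is paid whether or not you ever read old versions.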

Re: mini-hbase configuration for tests

2011-08-17 Thread Garrett Wu
Thanks for the suggestions. I tweaked jobclient.completion.poll.interval and hbase.regionserver.msginterval, but that didn't seem to do much. I'll just not delete the tables, which is fine since they're all in a mini hbase anyway. On Mon, Aug 15, 2011 at 5:37 PM, Bill Graham
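
For reference, the two properties mentioned are normally set in hbase-site.xml and mapred-site.xml respectively; the values below are illustrative low settings for tests, not recommendations:

```xml
<!-- hbase-site.xml: how often region servers report in (ms) -->
<property>
  <name>hbase.regionserver.msginterval</name>
  <value>100</value>
</property>

<!-- mapred-site.xml: how often the JobClient polls for job completion (ms) -->
<property>
  <name>jobclient.completion.poll.interval</name>
  <value>50</value>
</property>
```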

RE: version mismatch exception

2011-08-17 Thread Geoff Hendrey
Hi St.Ack, Keying off of what you said: Did you update the info:regioninfo cell so it has a new HRegionInfo with the same start and end row? You know this makes a new region, rather than extend the range of the previous region? (So the old region will be in the filesystem still with the old data).

Re: operational overhead for HBase

2011-08-17 Thread Jean-Daniel Cryans
I'm obviously not in a good position to answer since I've been a committer since 2008, but my experience if you can somehow relate is the following: At StumbleUpon we have 2 committers on staff (including me, oh and we're looking to hire a third one if anyone is interested). We've been using

RE: GZ better than LZO?

2011-08-17 Thread Sandy Pratt
I also switched from LZO to GZ a while back. I didn't do any micro-benchmarks, but I did note that the overall time of some MR jobs on our small cluster (~2B records at the time IIRC) went down slightly after the change. The primary reason I switched was not due to performance, however, but

RE: version mismatch exception

2011-08-17 Thread Rohit Nigam
Hi St.Ack, The regions in the file system are good; all I am looking to do is change the end key of that region in the .META. table so that the chaining problem goes away. The way I am planning to do it is to get the HRegionInfo object for that existing region key from the .META. table, create a new

About puppet and fabric (WAS: operational overhead for HBase)

2011-08-17 Thread Alex Holmes
Hi, On thread operational overhead for HBase, J-D gave out some interesting insights into automated deployments: - Have tools to automate cluster maintenance, such as doing rolling upgrades. We use Puppet and Fabric[2]. I'm currently evaluating the use of Puppet for Hadoop/HBase automated

Re: About puppet and fabric (WAS: operational overhead for HBase)

2011-08-17 Thread Ryan Rawson
I think my assessment would be that everyone has their pre-chosen toolset and goes with it. You can make any of them work (with enough effort). Personally, we are using Chef. They are building service orchestration, which few toolsets support. On Aug 17, 2011 1:42 PM, Alex Holmes

Re: About puppet and fabric (WAS: operational overhead for HBase)

2011-08-17 Thread Jean-Daniel Cryans
I'm currently evaluating the use of Puppet for Hadoop/HBase automated deploys and Fabric looks a lot simpler and more descriptive. I'm curious how well Fabric would work in its own right, without Puppet, for automated installs? I'll let my puppet masters answer that. Apologies if this isn't

Re: GZ better than LZO?

2011-08-17 Thread BlueDavy Lin
We tested gz also, but when we use gz it seems to cause an out-of-memory problem. It may be because gz does not use Deflater/Inflater correctly (does not call the end method explicitly). 2011/8/18 Sandy Pratt prat...@adobe.com: I also switched from LZO to GZ a while back. I didn't do any micro-benchmarks, but
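
The leak BlueDavy describes comes from native zlib buffers that are only released by Deflater.end()/Inflater.end(); relying on finalization lets off-heap memory pile up. A minimal correct-usage sketch with java.util.zip (not HBase's own compression code):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class GzCleanup {
    // Compress with an explicit end() in finally, so the native zlib memory
    // is freed promptly instead of waiting for the finalizer.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // omitting this is the leak described above
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // same rule applies on the decompression side
        }
    }
}
```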

Where are .META. and ROOT tables data

2011-08-17 Thread vamshi krishna
Hi all, I want to know the actual location of the .META. and ROOT data after we install HBase on our machine. I mean, can I directly see that data in a particular file or something like that, as opposed to querying them for specific data? And the most important doubt is, which version of

Re: About puppet and fabric (WAS: operational overhead for HBase)

2011-08-17 Thread Dave Barr
On Wed, Aug 17, 2011 at 4:45 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I'm currently evaluating the use of Puppet for Hadoop/HBase automated deploys and Fabric looks a lot simpler and more descriptive. I'm curious how well Fabric would work in its own right without Puppet for automate

Re: Where are .META. and ROOT tables data

2011-08-17 Thread lars hofhansl
Hi Vamshi, at this point HBase needs a version of Hadoop that has not had a stable release yet. Check out http://hbase.apache.org/book/notsoquick.html for more details. We are using the CDH3 distribution and it works very well so far. We also have successfully used custom builds of HBase