I worked with a cluster of about that size. Once everything is spinning, it
requires little attention in my experience. Just have sensible checks (Nagios
or the like) on things like disks filling up, especially on the namenode, and have
an alert on swap usage (that's usually the beginning of
This seems to be a bug: the root region in RIT can't be moved.
In the failover process, the master enforces root online but does not clean up the ZK node, so the
test will wait forever.
void processFailover() throws KeeperException, IOException,
InterruptedException {
// we enforce on-line root.
HServerInfo hsi =
To be on the safe side, you probably want to double-check this.
http://hbase.apache.org/book.html#client_dependencies
On 8/17/11 3:00 AM, Hari Sreekumar hsreeku...@clickable.com wrote:
Hi,
I want to separate my application machines from the HBase cluster. So far,
we have always run the
Hi all
I have read that HBase queries depend on the row key only. I was trying to
map my RDBMS tables to HBase and was thinking through a way out for the following:
RDBMS Schema :
Table Sales
Column:
* Sales_Id (PK)
* User_Id (FK)
* Product_ID (FK)
* Name
*
ha.. silly mistake, the hbase-site.xml file was using aliases, and I had the
alias pointing to a different machine in /etc/hosts on this machine! my
bad..
Thanks Doug..
On Wed, Aug 17, 2011 at 7:09 PM, Doug Meil doug.m...@explorysmedical.com wrote:
To be on the safe side, you probably want to
One way to do it would be to drop the sales_id and use a composite key of
user_id/product_id (assuming that a user may only buy a given product once). Then you
could do a simple get(xyz/123) to get the full row. If you wanted to get the
email of people who bought product 123, then a row key of
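The composite-key idea above can be sketched without the HBase client library; here is a minimal helper (the names are mine, not from the thread) that builds and splits a user_id/product_id row key, which is what the suggested get(xyz/123) point lookup relies on:

```java
// Sketch of the composite row key scheme described above:
// rowkey = user_id + "/" + product_id, so a point lookup is
// get(compositeKey("xyz", "123")) and a prefix scan on "xyz/"
// returns everything user xyz bought. Helper names are illustrative.
class CompositeKey {
    private static final char SEP = '/';

    // Build "userId/productId"; assumes neither id contains '/'.
    static String compositeKey(String userId, String productId) {
        return userId + SEP + productId;
    }

    // Split a composite key back into its two parts.
    static String[] splitKey(String rowKey) {
        int i = rowKey.indexOf(SEP);
        return new String[] { rowKey.substring(0, i), rowKey.substring(i + 1) };
    }

    public static void main(String[] args) {
        String key = compositeKey("xyz", "123");
        System.out.println(key); // xyz/123
        String[] parts = splitKey(key);
        System.out.println(parts[0] + " " + parts[1]); // xyz 123
    }
}
```

The separator only works if the ids can never contain it; fixed-width or length-prefixed encodings are the usual alternative when that can't be guaranteed.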
I'm trying to fully understand all the possibilities of what HBase has
to offer, but I can't determine a valid use case for multiple versions. Can
someone please explain some real life use cases for this?
Also, at what point are there too many versions? For example, to store
all the queries a user
Hi Stuti,
There are several approaches depending on your exact situation, but most
involve secondary indexes. You should read the HBase book, specifically the
chapter on secondary indexes:
http://hbase.apache.org/book.html#secondary.indexes
-chris
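The simplest of the approaches that chapter covers is the "dual write" pattern. A toy in-memory model (the class and table names are mine, and real HBase gives you no atomicity across the two writes, which is the main caveat):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative in-memory model of a dual-write secondary index:
// every put to the data table also writes a row to an index table
// whose row key is the indexed value.
class DualWriteIndex {
    // "data table": salesId -> productId
    private final Map<String, String> sales = new HashMap<>();
    // "index table": productId -> sorted set of salesIds
    private final Map<String, Set<String>> byProduct = new HashMap<>();

    // One logical write becomes two physical writes (not atomic in
    // real HBase -- the writes can diverge on failure).
    void put(String salesId, String productId) {
        sales.put(salesId, productId);
        byProduct.computeIfAbsent(productId, k -> new TreeSet<>()).add(salesId);
    }

    Set<String> salesForProduct(String productId) {
        return byProduct.getOrDefault(productId, Collections.emptySet());
    }

    public static void main(String[] args) {
        DualWriteIndex idx = new DualWriteIndex();
        idx.put("s1", "123");
        idx.put("s2", "123");
        System.out.println(idx.salesForProduct("123")); // [s1, s2]
    }
}
```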
On Wed, Aug 17, 2011 at 10:18 AM, Stuti
Versioning can be used to see the previous state of a record. Some people
need this feature, others don't.
One thing that may be worth a review is this...
http://hbase.apache.org/book.html#keysize
... and specifically the fact that all values are freighted with a
timestamp (aka version)
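A back-of-the-envelope sketch of that "freighted" point: every stored cell repeats its row key, column family, and qualifier alongside an 8-byte timestamp (the version) and a 1-byte type, in addition to the value itself. Roughly (ignoring the small length-prefix fields):

```java
// Rough per-cell key overhead in HBase: each stored version carries
// its full coordinates, not just the value. Numbers below ignore the
// small internal length-prefix fields, so treat this as an estimate.
class CellOverhead {
    static int keyOverhead(int rowLen, int familyLen, int qualifierLen) {
        return rowLen + familyLen + qualifierLen + 8 /* timestamp */ + 1 /* type */;
    }

    public static void main(String[] args) {
        // A 50-byte row key, a 2-byte family ("cf"), a 20-byte
        // qualifier: the coordinates alone cost ~81 bytes per stored
        // version, regardless of how small the value is.
        System.out.println(keyOverhead(50, 2, 20)); // 81
    }
}
```

This is why keeping many versions of small values multiplies storage far faster than the value sizes alone suggest.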
Thanks for the suggestions. I tweaked jobclient.completion.poll.interval
and hbase.regionserver.msginterval, but that didn't seem to do much. I'll
just not delete the tables, which is fine since they're all in a mini hbase
anyway.
On Mon, Aug 15, 2011 at 5:37 PM, Bill Graham
Hi St.Ack,
Keying off of what you said: Did you update the
info:regioninfo cell so it has a new hregioninfo with same start and
end row? You know this makes a new region, rather than extend the
range of the previous region? (So the old region will be in the
filesystem still with the old data).
I'm obviously not in a good position to answer since I've been a
committer since 2008, but my experience if you can somehow relate is
the following:
At StumbleUpon we have 2 committers on staff (including me, oh and
we're looking to hire a third one if anyone is interested). We've been
using
I also switched from LZO to GZ a while back. I didn't do any micro-benchmarks,
but I did note that the overall time of some MR jobs on our small cluster (~2B
records at the time IIRC) went down slightly after the change.
The primary reason I switched was not due to performance, however, but
Hi St.Ack
The regions in the filesystem are good; all I am looking to do is change the end
key of that region in the .META. table so that the chaining problem goes away. The
way I am planning to do it is to get the HRegionInfo object for that existing
region key from the .META. table, then create a new
Hi,
On thread operational overhead for HBase, J-D gave out some
interesting insights into automated deployments:
- Have tools to automate cluster maintenance, such as doing
rolling upgrades. We use Puppet and Fabric[2].
I'm currently evaluating the use of Puppet for Hadoop/HBase automated
I think my assessment would be that everyone has their pre-chosen toolset
and goes with it. You can make any of them work (with enough effort).
Personally, we are using chef. They are building service orchestration,
which few toolsets support.
On Aug 17, 2011 1:42 PM, Alex Holmes
I'm currently evaluating the use of Puppet for Hadoop/HBase automated
deploys and Fabric looks a lot simpler and more descriptive. I'm
curious how well Fabric would work in its own right without Puppet for
automated installs?
I'll let my puppet masters answer that.
Apologies if this isn't
We tested gz also, but when we use gz it seems to cause out-of-memory errors.
It may be because gz does not use Deflater/Inflater correctly (the
end() method is not called explicitly).
2011/8/18 Sandy Pratt prat...@adobe.com:
I also switched from LZO to GZ a while back. I didn't do any
micro-benchmarks, but
Hi all, I want to know the actual location of .META. and ROOT data after we
install HBase on our machine. I mean, can I directly see that data in a
particular file or something like that, as opposed to querying and getting some
specific data from them?
And the most important doubt is, which version of
On Wed, Aug 17, 2011 at 4:45 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
I'm currently evaluating the use of Puppet for Hadoop/HBase automated
deploys and Fabric looks a lot simpler and more descriptive. I'm
curious how well Fabric would work in its own right without Puppet for
automate
Hi Vamshi,
at this point HBase needs a version of Hadoop that does not yet have a stable
release.
Check out http://hbase.apache.org/book/notsoquick.html for more details.
We are using the CDH3 distribution and it works very well so far.
We also have successfully used custom builds of HBase