But when I directly load data into HDFS using HDFS API, the disks are
balanced.
I use hadoop-0.20.2.
2010/9/7 Todd Lipcon t...@cloudera.com
On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray jg...@facebook.com wrote:
You're looking at sizes on disk? Then this has nothing to do with HBase load
but yes, you will not have different versions of those objects, since they
are not stored as such in a table. So, that's the downside. If your
objects are write-once, read-many types, I think it should work.
Let's see what others say :)
~Himanshu
On Tue, Sep 7, 2010 at 12:49 AM, Himanshu
Hi,
Came across a problem that I need to walk through.
On the client side, when you instantiate an HTable object, you can specify
HTable.setAutoFlush(true/false). Setting the boolean value to true means that
when you execute a put(), the write is not buffered on the client and will be
sent to the region server immediately.
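That buffering distinction can be sketched with a toy client-side buffer. This is plain Python, not the actual HBase client API; the `ToyHTable` class, the `buffer_limit` of 3, and the `rpcs` list are illustrative assumptions used only to count round trips:

```python
# Toy model of HBase client-side write buffering -- not the real client API.
# With autoflush=True every put() is sent immediately; with autoflush=False
# puts accumulate in a client-side buffer and are sent as one batch when the
# buffer reaches its threshold (or flush() is called explicitly).

class ToyHTable:
    def __init__(self, autoflush=True, buffer_limit=3):
        self.autoflush = autoflush
        self.buffer_limit = buffer_limit
        self.buffer = []
        self.rpcs = []          # each entry models one round trip to the server

    def put(self, row, value):
        self.buffer.append((row, value))
        if self.autoflush or len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.rpcs.append(list(self.buffer))  # one RPC carrying the batch
            self.buffer.clear()

eager = ToyHTable(autoflush=True)
lazy = ToyHTable(autoflush=False, buffer_limit=3)
for i in range(3):
    eager.put(f"row{i}", b"v")
    lazy.put(f"row{i}", b"v")

print(len(eager.rpcs))  # 3 round trips, one per put()
print(len(lazy.rpcs))   # 1 round trip carrying all three puts
```

The fewer-round-trips side of the tradeoff is why bulk loaders usually turn autoFlush off; the cost is that buffered puts can be lost if the client dies before flushing.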
Hi guys,
More and more data in our company is moving from mysql tables to hbase,
and I am increasingly worried about the lack of backups for that data.
I've started looking for possible solutions to back up the data and
found two major options:
1) distcp of /hbase directory somewhere
2)
I think Lars explains it best:
http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
Short version: writing to the WAL is a backup solution if the region
server dies, because it's the MemStore that's being used for reads
(not the WAL). If you autoFlush, then everyone can
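Lars's point can be modeled in a few lines. This is a conceptual sketch, not HBase internals; the `ToyRegionServer` class and its method names are made up to show the relationship: reads are served from the MemStore, and the WAL exists only so a crashed region server can rebuild that MemStore:

```python
# Conceptual sketch of the WAL/MemStore relationship -- not HBase code.
class ToyRegionServer:
    def __init__(self, wal):
        self.wal = wal          # durable append-only log (survives a crash)
        self.memstore = {}      # in-memory store that actually serves reads

    def put(self, row, value):
        self.wal.append((row, value))   # 1. durably log the edit first
        self.memstore[row] = value      # 2. then apply it to the MemStore

    def get(self, row):
        return self.memstore.get(row)   # reads never consult the WAL

    @classmethod
    def recover(cls, wal):
        # After a crash the MemStore is gone; replay the WAL to rebuild it.
        server = cls(wal)
        for row, value in wal:
            server.memstore[row] = value
        return server

wal = []
rs = ToyRegionServer(wal)
rs.put("r1", "v1")
del rs                       # simulate the region server dying
recovered = ToyRegionServer.recover(wal)
print(recovered.get("r1"))   # v1 -- the edit survived via the WAL
```

In this model the WAL is a recovery mechanism, not a backup of the table: it only covers edits since the last flush, which is why it doesn't replace a real backup strategy.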
If you are asking about current solutions, then yes you can distcp
but I would consider that a last resort solution for the reasons you
described (yes, you could end up with an inconsistent state that
requires manual fixing). Also it completely bypasses row locks.
Another choice is using the
Stack,
I don't think that is my case. I am doing random reads across the namespace
and, the way the table is designed, they should be distributed across region
servers. As I understand it, rows are sorted by key, and we should design the
table such that we fetch data across regions, and I have
In addition to what Jon said, please be aware that if compression is specified
in the table schema, it happens at the store file level -- data is compressed
as it is written to disk and decompressed as it is read back, so everything
upstream (RPC, MemStore) sees the uncompressed size. If you transmit a 100MB
object that compresses to 30MB, the performance impact is that of 100MB, not
30MB.
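A back-of-the-envelope sketch of that point (plain Python; the 100MB/30MB figures come from the example above, and the cost breakdown is the illustrative assumption): since compression is applied only when store files hit disk, the RPC and MemStore costs track the uncompressed size.

```python
# Rough cost model for a compressed column family -- illustrative only.
object_size_mb = 100        # size of the value the client transmits
compressed_mb = 30          # size after store-file compression

rpc_transfer_mb = object_size_mb   # sent over the wire uncompressed
memstore_mb = object_size_mb       # buffered in memory uncompressed
store_file_mb = compressed_mb      # only the on-disk store file is compressed

print(rpc_transfer_mb, memstore_mb, store_file_mb)  # 100 100 30
```

So table-schema compression saves disk, but it does not reduce network transfer or MemStore pressure for large values.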
On Mon, Sep 6, 2010 at 11:34 PM, Tao Xie xietao.mail...@gmail.com wrote:
But when I directly load data into HDFS using HDFS API, the disks are
balanced.
I use hadoop-0.20.2.
Yes, the bugs occur when processing a large volume of block deletions. See
HADOOP-5124 and HDFS-611. HBase's
Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and
is still running? I was able to start / stop hbase in the past two months.
Now it suddenly stops working.
I am running hbase-0.20.4 with Linux 64-bit CPU / 64-bit operating system. I
downloaded hbase-0.20.4 and
Never worked for me (and I believe there was a JIRA for that).
On Tue, Sep 7, 2010 at 5:44 PM, Jian Lu j...@local.com wrote:
Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and
is still running? I was able to start / stop hbase in the past two months.
Now it
Check the master log. It'll usually say what it's waiting on. At this
stage, just kill your servers. Try kill PID first. If that doesn't
work, try kill -9 PID. Also, update your hbase to 0.20.6.
St.Ack
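Stack's escalation (plain kill first, kill -9 only as a fallback) can be sketched as a generic helper. This is not an HBase tool; the function name, the 5-second timeout, and the throwaway `sleep` child used in the demo are all assumptions:

```python
# Generic "try SIGTERM, escalate to SIGKILL" helper -- not HBase-specific.
import signal
import subprocess
import time

def stop_process(proc, timeout=5.0):
    """Send SIGTERM (like `kill PID`); if the process is still alive after
    the timeout, send SIGKILL (like `kill -9 PID`)."""
    proc.send_signal(signal.SIGTERM)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if proc.poll() is not None:      # process has exited
            return "terminated"
        time.sleep(0.1)
    proc.send_signal(signal.SIGKILL)     # last resort: cannot be caught
    proc.wait()
    return "killed"

# Demo against a throwaway child process:
child = subprocess.Popen(["sleep", "60"])
print(stop_process(child))  # terminated (sleep exits on SIGTERM)
```

The reason for trying plain kill first is that SIGTERM gives the JVM a chance to run shutdown hooks and close files cleanly; SIGKILL does not.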
On Tue, Sep 7, 2010 at 2:44 PM, Jian Lu j...@local.com wrote:
Hi, could someone please
Don't know if this helps, but here are a couple of reasons I've had this issue
and how I resolved it:
- If zookeeper is not running (or does not have quorum) in a cluster setup,
hbase does not go down; bring up zookeeper.
- Make sure the pid file is not under /tmp; sometimes files get cleaned out of
/tmp.
Hi Ron,
The first thing that jumps out at me is that you are getting localhost as the
address for your zookeeper server. This is almost certainly wrong. You should
be getting a list of your zookeeper quorum here. Until you fix that nothing
will work.
You need something like the following in
Thanks gentlemen! It works now. I manually killed the three PIDs found in the
/tmp dir, and changed all /tmp paths in hbase-env.sh to another dir. Thanks again!
-Original Message-
From: Venkatesh [mailto:vramanatha...@aol.com]
Sent: Tuesday, September 07, 2010 3:13 PM
To: user@hbase.apache.org
We had a weird problem when we accidentally kept old jars (0.20.4) around and tried to connect to
hbase 0.89. Zookeeper would connect but no data would be sent. That may not be your problem, but
it is something to watch out for.
~Jeff
On 9/7/2010 4:18 PM, Taylor, Ronald C wrote:
Hello J-D, David, and Jeff,
Thanks for getting back to me so quickly. Problem has been resolved. I added
/home/hbase/hbase/conf
to my CLASSPATH var, and made sure that both hbase-default.xml and
hbase-site.xml in the
/home/hbase/hbase/conf
directory use the values below
Are you sure you want 9 peers in zookeeper? I think the standard advice is to
have:
* 1 peer for small clusters (up to ~10 nodes)
* 5 peers for medium size clusters (10-40)
* 1 peer per rack for large clusters
9 seems like overkill for a cluster that has 25 nodes. Zookeeper should
probably have its own
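The tradeoff behind "9 seems like overkill" is easy to quantify. A quick sketch (the 2f+1 majority rule is standard ZooKeeper behavior; the helper function itself is just for illustration): an ensemble of N peers stays available while a majority survives, so it tolerates floor((N-1)/2) failures.

```python
# ZooKeeper majority rule: an ensemble of n peers needs a majority up to
# form a quorum, so it tolerates f = (n - 1) // 2 simultaneous failures.
def tolerated_failures(n_peers):
    return (n_peers - 1) // 2

for n in (3, 5, 9):
    print(n, "peers tolerate", tolerated_failures(n), "failures")
# 3 peers tolerate 1, 5 tolerate 2, 9 tolerate 4
```

Going from 5 to 9 peers buys two more tolerated failures, but every write must still be acknowledged by a majority of the ensemble, so larger ensembles add write latency; that is why 5 is usually enough for a 25-node cluster.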
Thanks - I'll talk to Tim as to cutting down on the zookeeper peers. At the
moment we at least don't have to worry about storage space - we have 25 TB of
disk on each node - 600 TB total to play with, which is plenty for us. (I'd
trade some of that disk capacity for more RAM per node, but have
Hi,
Thanks for your reply. How about the row size? I read that a row should not
be larger than the HDFS file on the region server, which is 256MB by default.
Is that right? Many thanks.
William
On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org wrote:
In addition to what Jon said
Jinsong Hu wrote:
I tried, this doesn't work. I noticed
$transport->open();
is missing in this code, so I added it.
Yup. Sorry about that. Copy and paste error :(
The following code first successfully prints all tables, then throws an
exception at the getRow() line; even with the ruby client, the row
You can go way beyond the max region split / split size. HBase will never
split a region once it is down to a single row, even if it is beyond the split
size. Also, if you're using large values, you should have region sizes much
larger than the default. It's common to run with 1-2GB regions in many cases.
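Jonathan's two rules (splits are size-triggered, but a single-row region can never split, since the split point must fall between two rows) can be sketched as follows. This is conceptual Python, not HBase's actual split policy code; the function name and the 256MB default are illustrative:

```python
# Toy version of the region split decision -- not HBase's real policy class.
def should_split(region_size_bytes, distinct_row_count,
                 max_file_size=256 * 1024 * 1024):
    # A region only splits when it outgrows the configured max size...
    if region_size_bytes <= max_file_size:
        return False
    # ...and a split needs a midpoint between two rows, so a region holding
    # a single (arbitrarily large) row can never be split.
    return distinct_row_count > 1

print(should_split(100 * 1024**2, 1000))  # False: under the size threshold
print(should_split(2 * 1024**3, 1000))    # True: oversized with many rows
print(should_split(2 * 1024**3, 1))       # False: one giant row never splits
```

This is why a very large single row works at all, and also why large-value tables run better with the max region size raised to 1-2GB: fewer, larger regions mean fewer splits and less region-management overhead.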
There is no firewall. As you can see, on the same client machine, I am able
to get the ruby version of the code to work.
This confirms that the thrift server is not the problem. Basically I am just
trying to fetch the same row of data
as that of the ruby program.
I am not running thrift server
Hi,
What does the performance look like if we put large cells in HDFS vs the local
file system? Random access to HDFS would be slow, right?
William
On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray jg...@facebook.com wrote:
You can go way beyond the max region split / split size. HBase will never
There are 2 definitions of random access:
1) within a file (hdfs can be less than ideal)
2) randomly getting an entire file (not usually considered random gets)
For the latter, streaming an entire file from HDFS is actually pretty
good. You can see performance at substantial percentages (think