Re: question about RegionManager

2010-09-07 Thread Tao Xie
But when I directly load data into HDFS using the HDFS API, the disks are balanced. I use hadoop-0.20.2. 2010/9/7 Todd Lipcon t...@cloudera.com On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray jg...@facebook.com wrote: You're looking at sizes on disk? Then this has nothing to do with HBase load

Re: Limits on HBase

2010-09-07 Thread Himanshu Vashishtha
But yes, you will not have different versions of those objects, as they are not stored as such in a table. So that's the downside. If your objects are write-once, read-many types, I think it should work. Let's see what others say :) ~Himanshu On Tue, Sep 7, 2010 at 12:49 AM, Himanshu

Client Side buffering vs WAL

2010-09-07 Thread Michael Segel
Hi, I came across a problem that I need to walk through. On the client side, when you instantiate an HTable object, you can call HTable.setAutoFlush(true/false). Setting the boolean value to true means that when you execute a put(), the write is not buffered on the client and will be
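For reference, a minimal sketch of the two modes against the 0.20-era client API (table, family, and column names here are made up):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AutoFlushDemo {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        // autoFlush = true (the default): each put() goes to the
        // region server immediately; nothing is buffered on the client.
        table.setAutoFlush(true);
        Put p1 = new Put(Bytes.toBytes("row1"));
        p1.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("v1"));
        table.put(p1);

        // autoFlush = false: puts accumulate in the client-side write
        // buffer and are shipped in batches; flushCommits() pushes out
        // whatever is still buffered.
        table.setAutoFlush(false);
        Put p2 = new Put(Bytes.toBytes("row2"));
        p2.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("v2"));
        table.put(p2);
        table.flushCommits();
      }
    }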

Hbase Backups

2010-09-07 Thread Alexey Kovyrin
Hi guys, More and more of our company's data is moving from MySQL tables to HBase, and I'm getting more and more worried about the lack of backups for that data. I've started looking for possible solutions to back up the data and found two major options: 1) distcp of the /hbase directory somewhere 2)
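For reference, option 1 boils down to a plain distcp of HBase's root directory (paths and hosts below are hypothetical); the consistency caveat with this approach comes up in the reply further down:

    hadoop distcp hdfs://namenode:8020/hbase \
        hdfs://backup-namenode:8020/hbase-backup-20100907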

Re: Client Side buffering vs WAL

2010-09-07 Thread Jean-Daniel Cryans
I think Lars explains it best: http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html Short version: writing to the WAL is a backup solution if the region server dies, because it's the MemStore that's being used for reads (not the WAL). If you autoFlush, then everyone can
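To make the distinction concrete: client-side buffering (autoFlush) and the WAL are independent knobs. A hedged sketch against the 0.20-era API, where the WAL can also be skipped per-mutation:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalDemo {
      static void putWithoutWal(HTable table) throws IOException {
        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("v1"));
        // Skip the WAL for this edit: faster, but the write lives only
        // in the MemStore until a flush, so a region server crash can
        // lose it. Orthogonal to HTable.setAutoFlush().
        p.setWriteToWAL(false);
        table.put(p);
      }
    }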

Re: Hbase Backups

2010-09-07 Thread Jean-Daniel Cryans
If you are asking about current solutions, then yes you can distcp but I would consider that a last resort solution for the reasons you described (yes, you could end up with an inconsistent state that requires manual fixing). Also it completely bypasses row locks. Another choice is using the

RE: regionserver skew

2010-09-07 Thread Sharma, Avani
Stack, I don't think that is my case. I am doing random reads across the namespace, and the way the table is designed, they should be distributed across region servers. As I understand it, rows are sorted by key, and we should design the table such that we fetch data across regions, and I have

RE: Limits on HBase

2010-09-07 Thread Andrew Purtell
In addition to what Jon said please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O, before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not
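For context, compression is declared per column family in the table schema; a minimal HBase shell example (table and family names made up):

    create 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}

The cells are stored compressed in the store files but travel uncompressed over the RPC, which is the point about the 100MB object above.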

Re: question about RegionManager

2010-09-07 Thread Todd Lipcon
On Mon, Sep 6, 2010 at 11:34 PM, Tao Xie xietao.mail...@gmail.com wrote: But when I directly load data into HDFS using HDFS API, the disks are balanced. I use hadoop-0.20.2. Yes, the bugs occur when processing a large volume of block deletions. See HADOOP-5124 and HDFS-611. HBase's

stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Jian Lu
Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase over the past two months. Now it suddenly stopped working. I am running hbase-0.20.4 on a Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and

Re: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Alexey Kovyrin
Never worked for me (and I believe there was a JIRA for that). On Tue, Sep 7, 2010 at 5:44 PM, Jian Lu j...@local.com wrote: Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase over the past two months. Now it

Re: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Stack
Check the master log. It'll usually say what it's waiting on. At this stage, just kill your servers. Try kill PID first. If that doesn't work, try kill -9 PID. Also, update your hbase to 0.20.6. St.Ack On Tue, Sep 7, 2010 at 2:44 PM, Jian Lu j...@local.com wrote: Hi, could someone please

Re: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Venkatesh
Don't know if this helps, but here are a couple of reasons I've had this issue and how I resolved it: - If zookeeper is not running (or does not have a quorum) in a cluster setup, hbase does not go down; bring up zookeeper. - Make sure the pid file is not under /tmp...sometimes files get cleaned out of
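For the second point, the pid directory can be moved out of /tmp in conf/hbase-env.sh; the path below is just an example:

    # conf/hbase-env.sh
    # Keep pid files somewhere tmpwatch won't clean them up.
    export HBASE_PID_DIR=/var/hbase/pids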

RE: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Buttler, David
Hi Ron, The first thing that jumps out at me is that you are getting localhost as the address for your zookeeper server. This is almost certainly wrong. You should be getting a list of your zookeeper quorum here. Until you fix that, nothing will work. You need something like the following in
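Presumably something along these lines in hbase-site.xml (hostnames hypothetical):

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
    </property>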

RE: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Jian Lu
Thanks gentlemen! It works now. I manually killed the three PIDs found in the /tmp dir, and changed all /tmp paths in hbase-env.sh to another dir. Thanks again! -Original Message- From: Venkatesh [mailto:vramanatha...@aol.com] Sent: Tuesday, September 07, 2010 3:13 PM To: user@hbase.apache.org

Re: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Jeff Whiting
We had a weird problem when we accidentally kept old jars (0.20.4) around and tried to connect to hbase 0.89. Zookeeper would connect but no data would be sent. That may not be your problem, but it is something to watch out for. ~Jeff On 9/7/2010 4:18 PM, Taylor, Ronald C wrote: Hello

Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Taylor, Ronald C
J-D, David, and Jeff, Thanks for getting back to me so quickly. The problem has been resolved. I added /home/hbase/hbase/conf to my CLASSPATH var, and made sure that both of these files: hbase-default.xml and hbase-site.xml in the /home/hbase/hbase/conf directory use the values below
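In other words, the fix amounts to putting the conf directory on the client classpath so hbase-site.xml is picked up, e.g.:

    export CLASSPATH=$CLASSPATH:/home/hbase/hbase/conf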

RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Buttler, David
Are you sure you want 9 peers in zookeeper? I think the standard advice is to have: * 1 peer for clusters of size < 10 * 5 peers for medium-size clusters (10-40) * 1 peer per rack for large clusters. 9 seems like overkill for a cluster that has 25 nodes. Zookeeper should probably have its own

RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Taylor, Ronald C
Thanks - I'll talk to Tim about cutting down on the zookeeper peers. At the moment we at least don't have to worry about storage space - we have 25 TB of disk on each node - 600 TB total to play with, which is plenty for us. (I'd trade some of that disk capacity for more RAM per node, but have

Re: Limits on HBase

2010-09-07 Thread William Kang
Hi, Thanks for your reply. What about the row size? I read that a row should not be larger than the HDFS file on the region server, which is 256MB by default. Is that right? Many thanks. William On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org wrote: In addition to what Jon said

Re: thrift for hbase in CDH3 broken ?

2010-09-07 Thread Igor Ranitovic
Jinsong Hu wrote: I tried, this doesn't work. I noticed $transport->open(); is missing in this code. So I added it. Yup. Sorry about that. Copy and paste error :( The following code first successfully prints all tables, then in the line getRow(), it throws an exception, even with the ruby client, the row

RE: Limits on HBase

2010-09-07 Thread Jonathan Gray
You can go way beyond the max region split / split size. HBase will never split the region once it is a single row, even if beyond the split size. Also, if you're using large values, you should have region sizes much larger than the default. It's common to run with 1-2GB regions in many
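The split threshold Jonathan refers to is hbase.hregion.max.filesize; for example, raising it to 1GB in hbase-site.xml (the 0.20 default is 256MB):

    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>
    </property>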

Re: thrift for hbase in CDH3 broken ?

2010-09-07 Thread Jinsong Hu
There is no firewall. As you can see, on the same client machine, I am able to get the ruby version of the code to work. This confirms that the thrift server is not the problem. Basically I am just trying to fetch the same row of data as the ruby program does. I am not running the thrift server

Re: Limits on HBase

2010-09-07 Thread William Kang
Hi, What does the performance look like if we put large cells in HDFS vs the local file system? Random access to HDFS would be slow, right? William On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray jg...@facebook.com wrote: You can go way beyond the max region split / split size. HBase will never

Re: Limits on HBase

2010-09-07 Thread Ryan Rawson
There are 2 definitions of random access: 1) within a file (hdfs can be less than ideal) 2) randomly getting an entire file (not usually considered random gets). For the latter, streaming an entire file from HDFS is actually pretty good. You can see performance at substantial percentages (think