Re: question about RegionManager
But when I directly load data into HDFS using the HDFS API, the disks are balanced. I use hadoop-0.20.2.

2010/9/7 Todd Lipcon t...@cloudera.com

On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray jg...@facebook.com wrote: You're looking at sizes on disk? Then this has nothing to do with HBase load balancing. HBase does not move blocks around on the HDFS layer or deal with which physical disks are used; that is completely the responsibility of HDFS. Periodically HBase will perform major compactions on regions, which causes data to be rewritten. This creates new files, so it could change what is in HDFS. There are some bugs in HDFS in 0.20 which can create this out-of-balance scenario. If you use CDH3b2 you should have a few patches which help to rectify the situation, in particular HDFS-611. Thanks -Todd JG

-----Original Message----- From: Tao Xie [mailto:xietao.mail...@gmail.com] Sent: Monday, September 06, 2010 8:38 PM To: user@hbase.apache.org Subject: Re: question about RegionManager

Actually, I'm a newbie to HBase. I went to read the region assignment code because I met a load imbalance problem in my hbase cluster. I run a 1+6 node hbase cluster: 1 node as master & client, the other nodes as region servers and data nodes. I run YCSB to insert records. While inserting, I find the data written to the data nodes has different sizes on disk. I think HDFS does well at balancing writes, so is this problem due to HBase? Btw, a few minutes after the writes finished, the disks finally got balanced. I think maybe there is a LoadBalance-like daemon thread working on this. Can anyone explain this? Many thanks.

After inserting 160M 1k records, my six datanodes are greatly imbalanced.

10.1.0.125: /dev/sdb1 280G 89G 178G 34% /mnt/DP_disk1
10.1.0.125: /dev/sdc1 280G 91G 176G 35% /mnt/DP_disk2
10.1.0.125: /dev/sdd1 280G 91G 176G 34% /mnt/DP_disk3
10.1.0.121: /dev/sdb1 280G 15G 251G 6% /mnt/DP_disk1
10.1.0.121: /dev/sdc1 280G 16G 250G 6% /mnt/DP_disk2
10.1.0.121: /dev/sdd1 280G 15G 251G 6% /mnt/DP_disk3
10.1.0.122: /dev/sdb1 280G 15G 251G 6% /mnt/DP_disk1
10.1.0.122: /dev/sdc1 280G 15G 252G 6% /mnt/DP_disk2
10.1.0.122: /dev/sdd1 280G 13G 253G 5% /mnt/DP_disk3
10.1.0.124: /dev/sdb1 280G 14G 253G 5% /mnt/DP_disk1
10.1.0.124: /dev/sdc1 280G 15G 252G 6% /mnt/DP_disk2
10.1.0.124: /dev/sdd1 280G 14G 253G 6% /mnt/DP_disk3
10.1.0.123: /dev/sdb1 280G 66G 200G 25% /mnt/DP_disk1
10.1.0.123: /dev/sdc1 280G 65G 201G 25% /mnt/DP_disk2
10.1.0.123: /dev/sdd1 280G 65G 202G 25% /mnt/DP_disk3
10.1.0.126: /dev/sdb1 280G 14G 252G 6% /mnt/DP_disk1
10.1.0.126: /dev/sdc1 280G 14G 252G 6% /mnt/DP_disk2
10.1.0.126: /dev/sdd1 280G 13G 253G 5% /mnt/DP_disk3

2010/9/7 Tao Xie xietao.mail...@gmail.com

I had a look at the following method in 0.89. Is the following line correct?

    nRegions *= e.getValue().size();

private int regionsToGiveOtherServers(final int numUnassignedRegions,
    final HServerLoad thisServersLoad) {
  SortedMap<HServerLoad, Set<String>> lightServers =
    new TreeMap<HServerLoad, Set<String>>();
  this.master.getLightServers(thisServersLoad, lightServers);
  // Examine the list of servers that are more lightly loaded than this one.
  // Pretend that we will assign regions to these more lightly loaded servers
  // until they reach load equal with ours. Then, see how many regions are
  // left unassigned. That is how many regions we should assign to this server.
  int nRegions = 0;
  for (Map.Entry<HServerLoad, Set<String>> e: lightServers.entrySet()) {
    HServerLoad lightLoad = new HServerLoad(e.getKey());
    do {
      lightLoad.setNumberOfRegions(lightLoad.getNumberOfRegions() + 1);
      nRegions += 1;
    } while (lightLoad.compareTo(thisServersLoad) <= 0
        && nRegions < numUnassignedRegions);
    nRegions *= e.getValue().size();
    if (nRegions >= numUnassignedRegions) {
      break;
    }
  }
  return nRegions;
}

2010/9/7 Jonathan Gray jg...@facebook.com

That code does actually exist in the latest 0.89 release. It was a protection put in place to guard against a weird behavior that we had seen during load balancing. As Ryan suggests, this code was in need of a rewrite and was just committed last week to trunk/0.90. If you're interested in the new load
Re: Limits on HBase
but yes, you will not be having different versions of those objects as they are not stored as such in a table. So, that's the downside. In case your objects are write-once read-many types, I think it should work. Let's see what others say :) ~Himanshu

On Tue, Sep 7, 2010 at 12:49 AM, Himanshu Vashishtha vashishth...@gmail.com wrote: Assuming you will be using hdfs as the file system: wouldn't saving those large objects in the fs and keeping a pointer to them in an hbase table serve the purpose? [I haven't done it myself but I can't see it not working. In fact, I remember reading it somewhere in the list.] ~Himanshu

On Mon, Sep 6, 2010 at 11:40 PM, William Kang weliam.cl...@gmail.com wrote: Hi JG, Thanks for your reply. As far as I have read in HBase's documentation and wiki, the cell size is not supposed to be larger than 10 MB. For the row, I am not quite sure, but it looks like 256 MB is the upper limit. I am considering storing some binary data that used to be stored in RDBMS blob fields. The size of those binary objects may vary from hundreds of KB to hundreds of MB. What would be a good way to use HBase for it? We really want to use hbase to avoid that scaling problem. Many thanks. William

On Mon, Sep 6, 2010 at 7:10 PM, Jonathan Gray jg...@facebook.com wrote: I'm not sure what you mean by optimized cell size or whether you're just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that level, I have seen some weird performance issues. The most important thing is to be sure to tweak all of your settings. If you have 20MB cells, you need to be sure to increase the flush size beyond 64MB and the split size beyond 256MB. You also need enough memory to support all this large object allocation. And of course, test test test. That's the easiest way to see if what you want to do will work :) When you run into problems, e-mail the list. As far as row size is concerned, the only issue is that a row can never span multiple regions, so a given row can only be in one region and thus be hosted on one server at a time. JG

-----Original Message----- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Monday, September 06, 2010 1:57 PM To: hbase-user Subject: Limits on HBase

Hi folks, I know this question may have been asked many times, but I am wondering if there is any update on the optimized cell size (in megabytes) and row size (in megabytes)? Many thanks. William
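A minimal sketch of the pointer approach described above, against the 0.20-era Java client API; the table name ("objects"), family ("meta"), and the /blobs path layout are hypothetical, not anything HBase prescribes:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BlobPointerSketch {
  public static void storeBlob(String rowKey, byte[] blob) throws IOException {
    // Write the large object itself into HDFS, where object size is not a problem.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path blobPath = new Path("/blobs/" + rowKey);   // hypothetical layout
    FSDataOutputStream out = fs.create(blobPath);
    out.write(blob);
    out.close();
    // Store only the small HDFS path in HBase, keyed by the logical row key.
    HTable table = new HTable(new HBaseConfiguration(), "objects");
    Put p = new Put(Bytes.toBytes(rowKey));
    p.add(Bytes.toBytes("meta"), Bytes.toBytes("path"),
        Bytes.toBytes(blobPath.toString()));
    table.put(p);
    table.flushCommits();
  }
}

The downside Himanshu notes still applies: HBase versioning only covers the pointer cell, and deleting a row must also clean up the HDFS file.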
Client Side buffering vs WAL
Hi, Came across a problem that I need to walk through.

On the client side, when you instantiate an HTable object, you can specify HTable.setAutoFlush(true/false). Setting the boolean value to true means that when you execute a put(), the write is not buffered on the client and will be written directly to HBase. This overrides the client-side buffering that you can set in your configuration files. While for many applications it's OK for the app to buffer up its writes, there's a set of apps where you don't want to do this: when your app writes a record to HBase, you want it exposed ASAP.

On the server side, you have the Write Ahead Log. If I understand the WAL, it abstracts the actual process of writing to disk, so that as far as your application is concerned, when you write to the WAL, it's in HBase.

So, my question is: how long does it take for a record in the WAL to be written to disk? Also, if a record is in the WAL and I did a get(), will the record be found? It's possible that in a m/r job, client-side buffering could mean that it takes a relatively 'long' time to actually have a record written to HBase, whereas once the record is written to the WAL, it should be consistent in the time it takes to be written to disk for access by other HBase apps. Or what am I missing? Thx -Mike
Hbase Backups
Hi guys, More and more data in our company is moving from mysql tables to hbase, and the more that moves, the more worried I am about the no-backups situation with that data. I've started looking for possible solutions to back up the data and found two major options: 1) distcp of the /hbase directory somewhere 2) HBASE-1684

So, I have a few questions for hbase users: 1) How do you back up your small (up to a hundred GB) tables? 2) How do you back up your huge (terabytes in size) tables?

And a question for hbase developers: what kind of problems could a distcp from a non-locked hbase table cause (there is no way to lock table writes while backing it up AFAIU)? I understand I could lose writes made after I begin the backup, but if my distcp takes an hour to complete, I imagine lots of things will happen on the filesystem during this period of time. Will hbase be able to recover from this kind of mess? Thanks a lot for your comments. -- Alexey Kovyrin http://kovyrin.net/
Re: Client Side buffering vs WAL
I think Lars explains it best: http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

Short version: writing to the WAL is a backup solution for when the region server dies, because it's the MemStore that's being used for reads (not the WAL). If you autoFlush, then everyone can read the data once your put() call returns without errors.

J-D

On Tue, Sep 7, 2010 at 7:44 AM, Michael Segel michael_se...@hotmail.com wrote: Hi, Came across a problem that I need to walk through. On the client side, when you instantiate an HTable object, you can specify HTable.setAutoFlush(true/false). Setting the boolean value to true means that when you execute a put(), the write is not buffered on the client and will be written directly to HBase. This overrides the client-side buffering that you can set in your configuration files. While for many applications it's OK for the app to buffer up its writes, there's a set of apps where you don't want to do this: when your app writes a record to HBase, you want it exposed ASAP. On the server side, you have the Write Ahead Log. If I understand the WAL, it abstracts the actual process of writing to disk, so that as far as your application is concerned, when you write to the WAL, it's in HBase. So, my question is: how long does it take for a record in the WAL to be written to disk? Also, if a record is in the WAL and I did a get(), will the record be found? It's possible that in a m/r job, client-side buffering could mean that it takes a relatively 'long' time to actually have a record written to HBase, whereas once the record is written to the WAL, it should be consistent in the time it takes to be written to disk for access by other HBase apps. Or what am I missing? Thx -Mike
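To make the client-side half of this concrete, a small sketch of the two modes (the table, family, and values are made up; the API is the 0.20-era client):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AutoFlushSketch {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");

    // Mode 1: autoFlush on (the default). Each put() is sent to the region
    // server immediately; once it returns without error, any client's get()
    // will see the row (served from the MemStore).
    table.setAutoFlush(true);
    Put p = new Put(Bytes.toBytes("row1"));
    p.add(Bytes.toBytes("f1"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(p);

    // Mode 2: autoFlush off. Puts accumulate in a client-side write buffer
    // and are invisible to other readers until the buffer is flushed,
    // either when it fills up or explicitly:
    table.setAutoFlush(false);
    table.put(p);          // buffered locally, not yet visible to anyone
    table.flushCommits();  // now it is on the server and readable
  }
}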
Re: Hbase Backups
If you are asking about current solutions, then yes, you can distcp, but I would consider that a last-resort solution for the reasons you described (yes, you could end up with an inconsistent state that requires manual fixing). Also, it completely bypasses row locks.

Another choice is using the Export MR job, using the start time option to do incremental backups. But then you have to distcp the result of that MR. And it's not a point in time that you are snapshotting, since it doesn't lock all rows (and you don't really want that hehe).

Since you are on 0.89, you can use cluster replication. This will keep an almost up-to-date replica on another cluster. Cons are that it requires another cluster (may be a good thing to have in any case), and it's still experimental, so you could run into issues. See http://hbase.apache.org/docs/r0.89.20100726/replication.html

In the future there's HBASE-50 that should also be useful.

J-D

On Tue, Sep 7, 2010 at 9:27 AM, Alexey Kovyrin ale...@kovyrin.net wrote: Hi guys, More and more data in our company is moving from mysql tables to hbase, and the more that moves, the more worried I am about the no-backups situation with that data. I've started looking for possible solutions to back up the data and found two major options: 1) distcp of the /hbase directory somewhere 2) HBASE-1684 So, I have a few questions for hbase users: 1) How do you back up your small (up to a hundred GB) tables? 2) How do you back up your huge (terabytes in size) tables? And a question for hbase developers: what kind of problems could a distcp from a non-locked hbase table cause (there is no way to lock table writes while backing it up AFAIU)? I understand I could lose writes made after I begin the backup, but if my distcp takes an hour to complete, I imagine lots of things will happen on the filesystem during this period of time. Will hbase be able to recover from this kind of mess? Thanks a lot for your comments. -- Alexey Kovyrin http://kovyrin.net/
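For reference, the Export job J-D mentions is driven from the command line. A hedged sketch of an incremental run might look like the following; the table name, output path, and start timestamp are made up, and the exact argument list (versions, start time, end time) should be checked against your HBase version:

bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable-20100907 1 1283817600000
hadoop distcp /backups/mytable-20100907 hdfs://backup-nn:8020/backups/mytable-20100907

The last argument is a start time in epoch milliseconds, so each run only exports cells written since the previous backup; the distcp then ships the result off-cluster.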
RE: regionserver skew
Stack, I don't think that is my case. I am doing random reads across the namespace, and the way the table is designed, they should be distributed across region servers. As I understand it, rows are sorted by key, and we should design the table such that we fetch data across regions; I have tried to achieve that. If there is something else you want me to read, please point me to it. I have read the HBase Architecture doc and also the one Lars George has posted.

I have one 2G file and other smaller ones on the cluster, but currently I am fetching data from this 2G lookup only. The number of regions is as follows:

Server1: regions=41, 2G heap, also the hbase master, regionserver, namenode, tasktracker, jobtracker, datanode
Server2: regions=36, 4G heap, datanode, tasktracker and regionserver
Server3: regions=37 - this server gets 0 requests or 0 hitRatio, 4G heap, datanode, tasktracker and regionserver
Total: 114

That link mentions that some servers have 0 hitRatio and says that is acceptable (?), but that's for inserts - I am not sure if the same applies to reads. http://search-hadoop.com/m/ESeeZ1B082l

How do I confirm where the .META. table is hosted? Currently, I look at the master log and check the machine it is hitting for the .META. table.

My main concern is that before the upgrade to 0.20.6, .5M rows took 520 seconds (which you thought was slow) on this 3-node cluster, and now, after the upgrade and whatever other changes hbase/hdfs went through, it takes nearly an hour to do the same (with the same data and same rows being fetched). There is something really wrong with HDFS/HBase here. I need help with diagnosing this. Let me know if you need any logs from me for this. I did send some logs last time. Did you get a chance to look at those? Thanks.

-----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Monday, September 06, 2010 12:04 PM To: user@hbase.apache.org Subject: Re: regionserver skew

On Fri, Sep 3, 2010 at 6:22 PM, Sharma, Avani agsha...@ebay.com wrote: I read on the mailing list that the region server that has the .META. table handles more requests. That sounds okay, but in my case the 3rd regionserver has 0 requests! And I feel that's what is slowing down the read performance. Also, the hit ratio at the other regionserver is 87% or so. Only the one that hosts .META. has a 95+% hit ratio.

Are your reads distributed across the whole namespace or are they only fetching some subset? If a subset, it can be the case that the subset is totally hosted by a single regionserver, and while your test is running, it's only pulling from this single server. Is that your case? (You do understand how rows are distributed on an hbase cluster?)

Also, how many regions do you have? You said you have 2G of data total at one stage. That likely does not make for many regions. If so, it could also be the case that the server that is not fielding requests is not actually carrying data, or is carrying little data. Is this your case? St.Ack
RE: Limits on HBase
In addition to what Jon said, please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O, before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not 30MB. I also try not to go above 50MB as the largest cell size, for the same reason. I have tried storing objects larger than 100MB, but this can cause out-of-memory issues on busy regionservers no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue.

Best regards, - Andy

Why is this email five sentences or less? http://five.sentenc.es/

--- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote: From: Jonathan Gray jg...@facebook.com Subject: RE: Limits on HBase To: user@hbase.apache.org Date: Monday, September 6, 2010, 4:10 PM

I'm not sure what you mean by optimized cell size or whether you're just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that level, I have seen some weird performance issues. The most important thing is to be sure to tweak all of your settings. If you have 20MB cells, you need to be sure to increase the flush size beyond 64MB and the split size beyond 256MB. You also need enough memory to support all this large object allocation. And of course, test test test. That's the easiest way to see if what you want to do will work :) When you run into problems, e-mail the list. As far as row size is concerned, the only issue is that a row can never span multiple regions, so a given row can only be in one region and thus be hosted on one server at a time. JG

-----Original Message----- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Monday, September 06, 2010 1:57 PM To: hbase-user Subject: Limits on HBase

Hi folks, I know this question may have been asked many times, but I am wondering if there is any update on the optimized cell size (in megabytes) and row size (in megabytes)? Many thanks. William
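Since compression is a per-family schema setting, here is a hedged sketch of declaring it at table creation time with the 0.20-era admin API (the table and family names and the choice of LZO are illustrative; LZO also requires its native libraries on the regionservers):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CompressedTableSketch {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTableDescriptor desc = new HTableDescriptor("docs");
    HColumnDescriptor family = new HColumnDescriptor("content");
    // Store files for this family are compressed on flush/compaction;
    // data still crosses the wire uncompressed, as Andrew notes above.
    family.setCompressionType(Compression.Algorithm.LZO);
    desc.addFamily(family);
    new HBaseAdmin(conf).createTable(desc);
  }
}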
Re: question about RegionManager
On Mon, Sep 6, 2010 at 11:34 PM, Tao Xie xietao.mail...@gmail.com wrote: But when I directly load data into HDFS using the HDFS API, the disks are balanced. I use hadoop-0.20.2.

Yes, the bugs occur when processing a large volume of block deletions. See HADOOP-5124 and HDFS-611. HBase's compactions cause a larger deletion rate than typical HDFS usage. -Todd

2010/9/7 Todd Lipcon t...@cloudera.com

On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray jg...@facebook.com wrote: You're looking at sizes on disk? Then this has nothing to do with HBase load balancing. HBase does not move blocks around on the HDFS layer or deal with which physical disks are used; that is completely the responsibility of HDFS. Periodically HBase will perform major compactions on regions, which causes data to be rewritten. This creates new files, so it could change what is in HDFS. There are some bugs in HDFS in 0.20 which can create this out-of-balance scenario. If you use CDH3b2 you should have a few patches which help to rectify the situation, in particular HDFS-611. Thanks -Todd JG

-----Original Message----- From: Tao Xie [mailto:xietao.mail...@gmail.com] Sent: Monday, September 06, 2010 8:38 PM To: user@hbase.apache.org Subject: Re: question about RegionManager

Actually, I'm a newbie to HBase. I went to read the region assignment code because I met a load imbalance problem in my hbase cluster. I run a 1+6 node hbase cluster: 1 node as master & client, the other nodes as region servers and data nodes. I run YCSB to insert records. While inserting, I find the data written to the data nodes has different sizes on disk. I think HDFS does well at balancing writes, so is this problem due to HBase? Btw, a few minutes after the writes finished, the disks finally got balanced. I think maybe there is a LoadBalance-like daemon thread working on this. Can anyone explain this? Many thanks.

After inserting 160M 1k records, my six datanodes are greatly imbalanced.

10.1.0.125: /dev/sdb1 280G 89G 178G 34% /mnt/DP_disk1
10.1.0.125: /dev/sdc1 280G 91G 176G 35% /mnt/DP_disk2
10.1.0.125: /dev/sdd1 280G 91G 176G 34% /mnt/DP_disk3
10.1.0.121: /dev/sdb1 280G 15G 251G 6% /mnt/DP_disk1
10.1.0.121: /dev/sdc1 280G 16G 250G 6% /mnt/DP_disk2
10.1.0.121: /dev/sdd1 280G 15G 251G 6% /mnt/DP_disk3
10.1.0.122: /dev/sdb1 280G 15G 251G 6% /mnt/DP_disk1
10.1.0.122: /dev/sdc1 280G 15G 252G 6% /mnt/DP_disk2
10.1.0.122: /dev/sdd1 280G 13G 253G 5% /mnt/DP_disk3
10.1.0.124: /dev/sdb1 280G 14G 253G 5% /mnt/DP_disk1
10.1.0.124: /dev/sdc1 280G 15G 252G 6% /mnt/DP_disk2
10.1.0.124: /dev/sdd1 280G 14G 253G 6% /mnt/DP_disk3
10.1.0.123: /dev/sdb1 280G 66G 200G 25% /mnt/DP_disk1
10.1.0.123: /dev/sdc1 280G 65G 201G 25% /mnt/DP_disk2
10.1.0.123: /dev/sdd1 280G 65G 202G 25% /mnt/DP_disk3
10.1.0.126: /dev/sdb1 280G 14G 252G 6% /mnt/DP_disk1
10.1.0.126: /dev/sdc1 280G 14G 252G 6% /mnt/DP_disk2
10.1.0.126: /dev/sdd1 280G 13G 253G 5% /mnt/DP_disk3

2010/9/7 Tao Xie xietao.mail...@gmail.com

I had a look at the following method in 0.89. Is the following line correct?

    nRegions *= e.getValue().size();

private int regionsToGiveOtherServers(final int numUnassignedRegions,
    final HServerLoad thisServersLoad) {
  SortedMap<HServerLoad, Set<String>> lightServers =
    new TreeMap<HServerLoad, Set<String>>();
  this.master.getLightServers(thisServersLoad, lightServers);
  // Examine the list of servers that are more lightly loaded than this one.
  // Pretend that we will assign regions to these more lightly loaded servers
  // until they reach load equal with ours. Then, see how many regions are
  // left unassigned. That is how many regions we should assign to this server.
  int nRegions = 0;
  for (Map.Entry<HServerLoad, Set<String>> e: lightServers.entrySet()) {
    HServerLoad lightLoad = new HServerLoad(e.getKey());
    do {
      lightLoad.setNumberOfRegions(lightLoad.getNumberOfRegions() + 1);
      nRegions += 1;
    } while (lightLoad.compareTo(thisServersLoad) <= 0
        && nRegions < numUnassignedRegions);
    nRegions *= e.getValue().size();
    if (nRegions >= numUnassignedRegions) {
      break;
    }
  }
  return nRegions;
}
stop-hbase.sh takes forever (never stops)
Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase in the past two months. Now it suddenly stops working. I am running hbase-0.20.4 with a Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and run it in standalone mode (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description) Thanks! Jack.
Re: stop-hbase.sh takes forever (never stops)
Never worked for me (and I believe there was a JIRA for that).

On Tue, Sep 7, 2010 at 5:44 PM, Jian Lu j...@local.com wrote: Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase in the past two months. Now it suddenly stops working. I am running hbase-0.20.4 with a Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and run it in standalone mode (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description) Thanks! Jack.

-- Alexey Kovyrin http://kovyrin.net/
Re: stop-hbase.sh takes forever (never stops)
Check the master log. It'll usually say what it's waiting on. At this stage, just kill your servers. Try kill PID first. If that doesn't work, try kill -9 PID. Also, update your hbase to 0.20.6. St.Ack

On Tue, Sep 7, 2010 at 2:44 PM, Jian Lu j...@local.com wrote: Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase in the past two months. Now it suddenly stops working. I am running hbase-0.20.4 with a Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and run it in standalone mode (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description) Thanks! Jack.
Re: stop-hbase.sh takes forever (never stops)
Don't know if this helps, but here are a couple of reasons for when I had the issue and how I resolved it:
- If zookeeper is not running (or does not have quorum) in a cluster setup, hbase does not go down; bring up zookeeper.
- Make sure the pid file is not under /tmp; sometimes files get cleaned out of /tmp. Change *env.sh to point to a different dir.

-----Original Message----- From: Jian Lu j...@local.com To: user@hbase.apache.org Sent: Tue, Sep 7, 2010 5:44 pm Subject: stop-hbase.sh takes forever (never stops)

Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase in the past two months. Now it suddenly stops working. I am running hbase-0.20.4 with a Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and run it in standalone mode (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description) Thanks! Jack.
RE: Question on Hbase 0.89 - interactive shell works, programs don't - could use help
Hi Ron,

The first thing that jumps out at me is that you are getting localhost as the address for your zookeeper server. This is almost certainly wrong. You should be getting a list of your zookeeper quorum here. Until you fix that nothing will work. You need something like the following in your hbase-site.xml file (and your hbase-site.xml file should be in the classpath of all of the jobs you expect to run against your cluster):

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>The port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps, Dave

-----Original Message----- From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov] Sent: Tuesday, September 07, 2010 3:18 PM To: 'hbase-u...@hadoop.apache.org' Cc: Taylor, Ronald C; Witteveen, Tim Subject: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Hello folks, We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 here at our government lab. Got a problem. The Hbase interactive shell works fine. I can create a table with a column family, add a couple rows, get the rows back out. Also, the Hbase web site on our cluster at http://h01.emsl.pnl.gov:60010/master.jsp doesn't appear (to our untrained eyes) to show anything going wrong.

However, the Hbase programs that I used on another cluster that ran an earlier version of Hbase no longer run. I altered such a program to use the new API, and it compiles fine. However, when I try to run it, I get the error msgs seen below. So - I downloaded the sample 0.89 Hbase program from the Hbase web site and tried that, simply altering the table name used to peptideTable, the column family to f1, and the column to name. The interactive shell shows that the table and data are there. But the slightly altered program from the Hbase web site, while compiling fine, again shows the same errors as I got using my own Hbase program. I've tried running the programs in both my own 'rtaylor' account and in the 'hbase' account - I get the same errors. So my colleague Tim and I think we missed something in the install.

I have appended the test program in full below, followed by the error msgs that it generated. Lastly, I have appended a screen dump of the contents of the web page at http://h01.emsl.pnl.gov:60010/master.jsp on our cluster. We would very much appreciate some guidance. Cheers, Ron Taylor

___________________________________________ Ronald Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory 902 Battelle Boulevard P.O. Box 999, Mail Stop J4-33 Richland, WA 99352 USA Office: 509-372-6568 Email: ronald.tay...@pnl.gov

% Contents of MyLittleHBaseClient.java:

// javac MyLittleHBaseClient.java
// javac -Xlint MyLittleHBaseClient.java
// java MyLittleHBaseClient

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Class that has nothing but a main.
// Does a Put, Get and a Scan against an hbase table.
public class MyLittleHBaseClient {
  public static void main(String[] args) throws IOException {
    // You need a configuration object to tell the client where to connect.
    // When you create a HBaseConfiguration, it reads in whatever you've set
    // into your hbase-site.xml and in hbase-default.xml, as long as these
    // can be found on the CLASSPATH
    HBaseConfiguration config = new HBaseConfiguration();

    // This instantiates an HTable object that connects you to
    // the "peptideTable" table.
    HTable table = new HTable(config, "peptideTable");

    // To add to a row, use Put. A Put constructor takes the name of the row
    // you want to insert into as a byte array. In HBase, the Bytes class has
    // utility for converting all kinds of java types to byte arrays. In the
    // below, we are converting the String "myLittleRow" into a byte array
    // to use as a row key for our update.
RE: stop-hbase.sh takes forever (never stops)
Thanks gentlemen! It works now. I manually killed the three PIDs found in the /tmp dir, and changed all /tmp paths in hbase-env.sh to another dir. Thanks again!

-----Original Message----- From: Venkatesh [mailto:vramanatha...@aol.com] Sent: Tuesday, September 07, 2010 3:13 PM To: user@hbase.apache.org Subject: Re: stop-hbase.sh takes forever (never stops)

Don't know if this helps, but here are a couple of reasons for when I had the issue and how I resolved it:
- If zookeeper is not running (or does not have quorum) in a cluster setup, hbase does not go down; bring up zookeeper.
- Make sure the pid file is not under /tmp; sometimes files get cleaned out of /tmp. Change *env.sh to point to a different dir.

-----Original Message----- From: Jian Lu j...@local.com To: user@hbase.apache.org Sent: Tue, Sep 7, 2010 5:44 pm Subject: stop-hbase.sh takes forever (never stops)

Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase in the past two months. Now it suddenly stops working. I am running hbase-0.20.4 with a Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and run it in standalone mode (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description) Thanks! Jack.
Re: Question on Hbase 0.89 - interactive shell works, programs don't - could use help
We had a weird problem when we accidentally kept old jars (0.20.4) around and tried to connect to hbase 0.89. Zookeeper would connect but no data would be sent. That may not be your problem, but it is something to watch out for. ~Jeff

On 9/7/2010 4:18 PM, Taylor, Ronald C wrote:

Hello folks, We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 here at our government lab. Got a problem. The Hbase interactive shell works fine. I can create a table with a column family, add a couple rows, get the rows back out. Also, the Hbase web site on our cluster at http://h01.emsl.pnl.gov:60010/master.jsp doesn't appear (to our untrained eyes) to show anything going wrong.

However, the Hbase programs that I used on another cluster that ran an earlier version of Hbase no longer run. I altered such a program to use the new API, and it compiles fine. However, when I try to run it, I get the error msgs seen below. So - I downloaded the sample 0.89 Hbase program from the Hbase web site and tried that, simply altering the table name used to peptideTable, the column family to f1, and the column to name. The interactive shell shows that the table and data are there. But the slightly altered program from the Hbase web site, while compiling fine, again shows the same errors as I got using my own Hbase program. I've tried running the programs in both my own 'rtaylor' account and in the 'hbase' account - I get the same errors. So my colleague Tim and I think we missed something in the install.

I have appended the test program in full below, followed by the error msgs that it generated. Lastly, I have appended a screen dump of the contents of the web page at http://h01.emsl.pnl.gov:60010/master.jsp on our cluster. We would very much appreciate some guidance. Cheers, Ron Taylor

___________________________________________ Ronald Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory 902 Battelle Boulevard P.O. Box 999, Mail Stop J4-33 Richland, WA 99352 USA Office: 509-372-6568 Email: ronald.tay...@pnl.gov

% Contents of MyLittleHBaseClient.java:

// javac MyLittleHBaseClient.java
// javac -Xlint MyLittleHBaseClient.java
// java MyLittleHBaseClient

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Class that has nothing but a main.
// Does a Put, Get and a Scan against an hbase table.
public class MyLittleHBaseClient {
  public static void main(String[] args) throws IOException {
    // You need a configuration object to tell the client where to connect.
    // When you create a HBaseConfiguration, it reads in whatever you've set
    // into your hbase-site.xml and in hbase-default.xml, as long as these
    // can be found on the CLASSPATH
    HBaseConfiguration config = new HBaseConfiguration();

    // This instantiates an HTable object that connects you to
    // the "peptideTable" table.
    HTable table = new HTable(config, "peptideTable");

    // To add to a row, use Put. A Put constructor takes the name of the row
    // you want to insert into as a byte array. In HBase, the Bytes class has
    // utility for converting all kinds of java types to byte arrays. In the
    // below, we are converting the String "myLittleRow" into a byte array to
    // use as a row key for our update. Once you have a Put instance, you can
    // adorn it by setting the names of columns you want to update on the row,
    // the timestamp to use in your update, etc. If no timestamp, the server
    // applies current time to the edits.
    Put p = new Put(Bytes.toBytes("2001"));

    // To set the value you'd like to update in the row 'myLittleRow', specify
    // the column family, column qualifier, and value of the table cell you'd
    // like to update. The column family must already exist in your table
    // schema. The qualifier can be anything. All must be specified as byte
    // arrays as hbase is all about byte arrays. Lets pretend the table
    // 'myLittleHBaseTable' was created with a family 'myLittleFamily'.
    p.add(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes("p2001"));

    // Once you've adorned your Put instance with all the updates you want to
    // make, to commit it do the following (The HTable#put method takes the
    // Put instance you've been building and pushes the changes you made into
    // hbase)
Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help
J-D, David, and Jeff,

Thanks for getting back to me so quickly. The problem has been resolved. I added /home/hbase/hbase/conf to my CLASSPATH var, and made sure that both of these files - hbase-default.xml and hbase-site.xml in the /home/hbase/hbase/conf directory - use the values below for setting the quorum (using the h02, h03, etc. nodes on our cluster):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>h02,h03,h04,h05,h06,h07,h08,h09,h10</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

This appears to have fixed the problem. Thanks again. Ron

___________________________________________ Ronald Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory 902 Battelle Boulevard P.O. Box 999, Mail Stop J4-33 Richland, WA 99352 USA Office: 509-372-6568 Email: ronald.tay...@pnl.gov

-----Original Message----- From: Buttler, David [mailto:buttl...@llnl.gov] Sent: Tuesday, September 07, 2010 3:24 PM To: user@hbase.apache.org; 'hbase-u...@hadoop.apache.org' Cc: Witteveen, Tim Subject: RE: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Hi Ron, The first thing that jumps out at me is that you are getting localhost as the address for your zookeeper server. This is almost certainly wrong. You should be getting a list of your zookeeper quorum here. Until you fix that nothing will work. You need something like the following in your hbase-site.xml file (and your hbase-site.xml file should be in the classpath of all of the jobs you expect to run against your cluster):

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>The port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps, Dave

-----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Tuesday, September 07, 2010 3:23 PM To: user@hbase.apache.org Subject: Re: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Your client is trying to connect to a local zookeeper ensemble (grep for connectString in the message). This means that the client doesn't know about the proper configurations in order to connect to the cluster. Either put your hbase-site.xml on the client's classpath or set the proper settings on the HBaseConfiguration object. J-D

On Tue, Sep 7, 2010 at 3:18 PM, Taylor, Ronald C ronald.tay...@pnl.gov wrote: Hello folks, We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 here at our government lab. Got a problem. The Hbase interactive shell works fine. I can create a table with a column family, add a couple rows, get the rows back out. Also, the Hbase web site on our cluster at http://h01.emsl.pnl.gov:60010/master.jsp doesn't appear (to our untrained eyes) to show anything going wrong. However, the Hbase programs that I used on another cluster that ran an earlier version of Hbase no longer run. I altered such a program to use the new API, and it compiles fine. However, when I try to run it, I get the error msgs seen below. So - I downloaded the sample 0.89 Hbase program from the Hbase web site and tried that, simply altering the table name used to peptideTable, the column family to f1, and the column to name. The interactive shell shows that the table and data are there. But the slightly altered program from the Hbase web site, while compiling fine, again shows the same errors as I got using my own Hbase program. I've tried running the programs in both my own 'rtaylor' account and in the 'hbase' account - I get the same errors. So my colleague Tim and I think we missed something in the install. I have appended the test program in full below, followed by the error msgs that it generated. Lastly, I have appended a screen dump of the contents of the web page at http://h01.emsl.pnl.gov:60010/master.jsp on our cluster.
RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help
Are you sure you want 9 peers in zookeeper? I think the standard advice is to have:
* 1 peer for clusters of size < 10
* 5 peers for medium size clusters (10-40)
* 1 peer per rack for large clusters

9 seems like overkill for a cluster that has 25 nodes. Zookeeper should probably have its own disk on each device (which will reduce your potential storage space), and it has to write to disk on every peer before a zookeeper write will succeed -- more peers means that the cost per write is higher.

Dave

-----Original Message----- From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov] Sent: Tuesday, September 07, 2010 4:40 PM To: 'user@hbase.apache.org' Cc: Taylor, Ronald C; Witteveen, Tim Subject: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

J-D, David, and Jeff, Thanks for getting back to me so quickly. The problem has been resolved. I added /home/hbase/hbase/conf to my CLASSPATH var, and made sure that both of these files - hbase-default.xml and hbase-site.xml in the /home/hbase/hbase/conf directory - use the values below for setting the quorum (using the h02, h03, etc. nodes on our cluster):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>h02,h03,h04,h05,h06,h07,h08,h09,h10</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

This appears to have fixed the problem. Thanks again. Ron

___________________________________________ Ronald Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory 902 Battelle Boulevard P.O. Box 999, Mail Stop J4-33 Richland, WA 99352 USA Office: 509-372-6568 Email: ronald.tay...@pnl.gov

-----Original Message----- From: Buttler, David [mailto:buttl...@llnl.gov] Sent: Tuesday, September 07, 2010 3:24 PM To: user@hbase.apache.org; 'hbase-u...@hadoop.apache.org' Cc: Witteveen, Tim Subject: RE: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Hi Ron, The first thing that jumps out at me is that you are getting localhost as the address for your zookeeper server. This is almost certainly wrong. You should be getting a list of your zookeeper quorum here. Until you fix that nothing will work. You need something like the following in your hbase-site.xml file (and your hbase-site.xml file should be in the classpath of all of the jobs you expect to run against your cluster):

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>The port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps, Dave

-----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Tuesday, September 07, 2010 3:23 PM To: user@hbase.apache.org Subject: Re: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Your client is trying to connect to a local zookeeper ensemble (grep for connectString in the message). This means that the client doesn't know about the proper configurations in order to connect to the cluster. Either put your hbase-site.xml on the client's classpath or set the proper settings on the HBaseConfiguration object. J-D

On Tue, Sep 7, 2010 at 3:18 PM, Taylor, Ronald C ronald.tay...@pnl.gov wrote: Hello folks, We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 here at our government lab. Got a problem. The Hbase interactive shell works fine. I can create a table with a column family, add a couple rows, get the rows back out. Also, the Hbase web site on our cluster at http://h01.emsl.pnl.gov:60010/master.jsp doesn't appear (to our untrained eyes) to show anything going wrong. However, the Hbase programs that I used on another cluster that ran an earlier version of Hbase no longer run. I altered such a program to use the new API, and it compiles fine. However, when I try to run it, I get the error msgs seen below. So - I
RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help
Thanks - I'll talk to Tim about cutting down on the zookeeper peers. At the moment we at least don't have to worry about storage space - we have 25 TB of disk on each node, 600 TB total to play with, which is plenty for us. (I'd trade some of that disk capacity for more RAM per node, but I have to work with the cluster we were given for testing purposes - hopefully we'll expand in the future.) Ron

___________________________________________ Ronald Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory 902 Battelle Boulevard P.O. Box 999, Mail Stop J4-33 Richland, WA 99352 USA Office: 509-372-6568 Email: ronald.tay...@pnl.gov

-----Original Message----- From: Buttler, David [mailto:buttl...@llnl.gov] Sent: Tuesday, September 07, 2010 4:47 PM To: user@hbase.apache.org Cc: Witteveen, Tim Subject: RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Are you sure you want 9 peers in zookeeper? I think the standard advice is to have:
* 1 peer for clusters of size < 10
* 5 peers for medium size clusters (10-40)
* 1 peer per rack for large clusters

9 seems like overkill for a cluster that has 25 nodes. Zookeeper should probably have its own disk on each device (which will reduce your potential storage space), and it has to write to disk on every peer before a zookeeper write will succeed -- more peers means that the cost per write is higher. Dave

-----Original Message----- From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov] Sent: Tuesday, September 07, 2010 4:40 PM To: 'user@hbase.apache.org' Cc: Taylor, Ronald C; Witteveen, Tim Subject: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

J-D, David, and Jeff, Thanks for getting back to me so quickly. The problem has been resolved. I added /home/hbase/hbase/conf to my CLASSPATH var, and made sure that both of these files - hbase-default.xml and hbase-site.xml in the /home/hbase/hbase/conf directory - use the values below for setting the quorum (using the h02, h03, etc. nodes on our cluster):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>h02,h03,h04,h05,h06,h07,h08,h09,h10</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

This appears to have fixed the problem. Thanks again. Ron

___________________________________________ Ronald Taylor, Ph.D. Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory 902 Battelle Boulevard P.O. Box 999, Mail Stop J4-33 Richland, WA 99352 USA Office: 509-372-6568 Email: ronald.tay...@pnl.gov

-----Original Message----- From: Buttler, David [mailto:buttl...@llnl.gov] Sent: Tuesday, September 07, 2010 3:24 PM To: user@hbase.apache.org; 'hbase-u...@hadoop.apache.org' Cc: Witteveen, Tim Subject: RE: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Hi Ron, The first thing that jumps out at me is that you are getting localhost as the address for your zookeeper server. This is almost certainly wrong. You should be getting a list of your zookeeper quorum here. Until you fix that nothing will work. You need something like the following in your hbase-site.xml file (and your hbase-site.xml file should be in the classpath of all of the jobs you expect to run against your cluster):

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>The port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps, Dave

-----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Tuesday, September 07, 2010 3:23 PM To: user@hbase.apache.org Subject: Re: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

Your client is trying to connect to a local zookeeper ensemble (grep for connectString in the message). This means that the client doesn't know about the proper configurations in order to connect to the cluster.
Re: Limits on HBase
Hi, Thanks for your reply. How about the row size? I read that a row should not be larger than the hdfs file on the region server, which is 256M by default. Is that right? Many thanks. William

On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org wrote: In addition to what Jon said, please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O, before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not 30MB. I also try not to go above 50MB as the largest cell size, for the same reason. I have tried storing objects larger than 100MB, but this can cause out-of-memory issues on busy regionservers no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue. Best regards, - Andy

Why is this email five sentences or less? http://five.sentenc.es/

--- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote: From: Jonathan Gray jg...@facebook.com Subject: RE: Limits on HBase To: user@hbase.apache.org Date: Monday, September 6, 2010, 4:10 PM

I'm not sure what you mean by optimized cell size or whether you're just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that level, I have seen some weird performance issues. The most important thing is to be sure to tweak all of your settings. If you have 20MB cells, you need to be sure to increase the flush size beyond 64MB and the split size beyond 256MB. You also need enough memory to support all this large object allocation. And of course, test test test. That's the easiest way to see if what you want to do will work :) When you run into problems, e-mail the list. As far as row size is concerned, the only issue is that a row can never span multiple regions, so a given row can only be in one region and thus be hosted on one server at a time. JG

-----Original Message----- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Monday, September 06, 2010 1:57 PM To: hbase-user Subject: Limits on HBase

Hi folks, I know this question may have been asked many times, but I am wondering if there is any update on the optimized cell size (in megabytes) and row size (in megabytes)? Many thanks. William
Re: thrift for hbase in CDH3 broken ?
Jinsong Hu wrote: I tried this, and it doesn't work. I noticed $transport->open(); is missing in this code, so I added it.

Yup. Sorry about that. Copy and paste error :(

The following code first successfully prints all tables, then on the getRow() line it throws an exception, even though with the ruby client the row data is there:

$transport->open();
my @names = $client->getTableNames();
print Dumper(@names);
print "\n";
my $row = $client->getRow('table12345', 'key123');
print Dumper($row);
print "\n";
$transport->close();

So you can scan the META table on the master, but can't fetch a row from a RS. Are there any firewalls in place? Are you running thrift servers on the same nodes as region servers? What kind of exception do you get? i.

-------------------------------------------------- From: Igor Ranitovic irani...@gmail.com Sent: Friday, September 03, 2010 11:45 AM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?

Not sure what the test code is... would this test your setup?

#!/usr/bin/env perl
use strict;
use warnings;
use Thrift::BinaryProtocol;
use Thrift::BufferedTransport;
use Thrift::Socket;
use Hbase::Hbase;
use Data::Dumper;

my $sock = Thrift::Socket->new('127.0.0.1', '9090');
$sock->setRecvTimeout(6);
my $transport = Thrift::BufferedTransport->new($sock);
my $protocol = Thrift::BinaryProtocol->new($transport);
my $client = Hbase::HbaseClient->new($protocol);
my $row = $client->getRow('table_test', 'row_123');
print Dumper($row);
$transport->close();

BTW, I am not sure why you would want to use java to talk to HBase via the thrift server. i.

Jinsong Hu wrote: by the way, does anybody have a perl version of the test code? Jimmy

-------------------------------------------------- From: Jinsong Hu jinsong...@hotmail.com Sent: Friday, September 03, 2010 11:17 AM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?

I tried your code and indeed it works, but the java version doesn't. So it looks like it is a bug in the java library supplied by the thrift-0.2.0 version. Jimmy.

-------------------------------------------------- From: Alexey Kovyrin ale...@kovyrin.net Sent: Friday, September 03, 2010 12:31 AM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?

yes, Centos 5.5 + CDH3b2

On Fri, Sep 3, 2010 at 3:26 AM, Jinsong Hu jinsong...@hotmail.com wrote: are you using the CDH3 distribution? Jinsong

-------------------------------------------------- From: Alexey Kovyrin ale...@kovyrin.net Sent: Friday, September 03, 2010 12:04 AM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?

http://github.com/kovyrin/hbase-thrift-client-examples - just wrote this example and tested it in our cluster, works as expected. For this to work you'd need to install rubygems and the thrift gem (gem install thrift).

On Fri, Sep 3, 2010 at 12:01 AM, Jinsong Hu jinsong...@hotmail.com wrote: Can you send me some ruby test code so I can try against the latest CDH3? Jimmy.

-------------------------------------------------- From: Alexey Kovyrin ale...@kovyrin.net Sent: Thursday, September 02, 2010 8:15 PM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?

We use it in Scribd.com. All clients are ruby web apps.

On Thu, Sep 2, 2010 at 10:49 PM, Todd Lipcon t...@cloudera.com wrote: On Thu, Sep 2, 2010 at 5:35 PM, Jinsong Hu jinsong...@hotmail.com wrote: Yes, I confirmed that it is indeed the thrift server, and the fact that the API

List<byte[]> tableNamesList = client.getTableNames();
for (byte[] name : tableNamesList) {
  System.out.println(new String(name));
}

successfully printed all table names shows that it is indeed the thrift server. If it were hue, it wouldn't print the table names.

Ah, sorry, I missed that in your original message. Not sure what's up, then - we don't have any changes in CDH that would affect this. Anyone here used thrift on 0.89.20100621? -Todd

Jimmy.

-------------------------------------------------- From: Todd Lipcon t...@cloudera.com Sent: Thursday, September 02, 2010 5:18 PM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?

Hi Jinsong, Are you sure that the port you're connecting to is indeed the thrift server? Unfortunately both the HBase thrift server and the Hue namenode plugin listen on port 9090, so you might be having an issue where your HBase client is trying to connect to the Namenode server instead of HBase. You can verify the ports using a command like /sbin/fuser -n tcp 9090 to see which pid has it open, then cross-reference against sudo jps. Thanks -Todd

On Thu, Sep 2, 2010 at 4:40 PM, Jinsong Hu jinsong...@hotmail.com wrote: Hi there, I am trying to test and see if thrift for hbase works. I followed the example from http://www.workhabit.com/labs/centos-55-and-thriftscribe http://incubator.apache.org/thrift/
RE: Limits on HBase
You can go way beyond the max region split / split size. HBase will never split a region once it is down to a single row, even if it is beyond the split size. Also, if you're using large values, you should have region sizes much larger than the default; it's common to run with 1-2GB regions. What you may have seen are recommendations that if your cell values approach the default HDFS block size (64MB), you should consider putting the data directly into HDFS rather than HBase. JG -Original Message- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Tuesday, September 07, 2010 7:36 PM To: user@hbase.apache.org; apurt...@apache.org Subject: Re: Limits on HBase Hi, Thanks for your reply. How about the row size? I read that a row should not be larger than the HDFS file on the region server, which is 256M by default. Is that right? Many thanks. William On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org wrote: In addition to what Jon said, please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O and before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not 30MB. I also try not to go above 50MB as the largest cell size, for the same reason. I have tried storing objects larger than 100MB, but this can cause out-of-memory issues on busy regionservers no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue. Best regards, - Andy Why is this email five sentences or less? http://five.sentenc.es/ --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote: From: Jonathan Gray jg...@facebook.com Subject: RE: Limits on HBase To: user@hbase.apache.org Date: Monday, September 6, 2010, 4:10 PM I'm not sure whether you mean an optimized cell size or are just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells of several megabytes, up to about 50MB. At that level, I have seen some weird performance issues. The most important thing is to tweak all of your settings: if you have 20MB cells, you need to increase the flush size beyond 64MB and the split size beyond 256MB. You also need enough memory to support all this large-object allocation. And of course, test test test. That's the easiest way to see if what you want to do will work :) When you run into problems, e-mail the list. As far as row size is concerned, the only issue is that a row can never span multiple regions, so a given row can only be in one region and thus be hosted on one server at a time. JG -Original Message- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Monday, September 06, 2010 1:57 PM To: hbase-user Subject: Limits on HBase Hi folks, I know this question may have been asked many times, but I am wondering if there is any update on the optimized cell size (in megabytes) and row size (in megabytes)? Many thanks. William
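As a concrete illustration of the tuning JG describes, here is one way to raise the split and flush thresholds per table from the Java client API of that era. This is a hedged sketch, not code from the thread: the table name, column family, and the 2GB/256MB figures are illustrative placeholders, and it assumes the HTableDescriptor setMaxFileSize/setMemStoreFlushSize setters present in the 0.20/0.89 client.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateLargeValueTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

    HTableDescriptor desc = new HTableDescriptor("big_values");
    // Split regions at 2GB instead of the 256MB default.
    desc.setMaxFileSize(2L * 1024 * 1024 * 1024);
    // Flush memstores at 256MB instead of the 64MB default.
    desc.setMemStoreFlushSize(256L * 1024 * 1024);
    desc.addFamily(new HColumnDescriptor("data"));

    admin.createTable(desc);
  }
}

The corresponding cluster-wide knobs in hbase-site.xml are hbase.hregion.max.filesize and hbase.hregion.memstore.flush.size.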
Re: thrift for hbase in CDH3 broken ?
There is no firewall. As you can see, on the same client machine I am able to get the ruby version of the code to work, which confirms that the thrift server itself is not the problem. Basically I am just trying to fetch the same row of data that the ruby program fetches. I am not running the thrift server on a region server; it runs on a standalone machine configured to point at the zookeeper quorum for the hbase cluster. I also tried the java version and it doesn't work either. In a previous post somebody asked why I use java: the reason is that I want to test whether the thrift server works, and I have never managed to get java working, even now. Have you gotten the perl version to work? Have you been able to read a row of data using perl? Jimmy. -- From: Igor Ranitovic irani...@gmail.com Sent: Tuesday, September 07, 2010 8:18 PM To: user@hbase.apache.org Subject: Re: thrift for hbase in CDH3 broken ?
Re: Limits on HBase
Hi, What does the performance look like if we put a large cell in HDFS vs the local file system? Random access to HDFS would be slow, right? William
Re: Limits on HBase
There are 2 definitions of random access: 1) within a file (hdfs can be less than ideal), and 2) randomly getting an entire file (not usually considered a random get). For the latter, streaming an entire file from HDFS is actually pretty good: you can see a substantial percentage (think 80%+) of the raw disk performance. I benched hdfs at 90MB/sec some time last year just writing raw files. -ryan On Tue, Sep 7, 2010 at 9:07 PM, William Kang weliam.cl...@gmail.com wrote: Hi, What does the performance look like if we put a large cell in HDFS vs the local file system? Random access to HDFS would be slow, right? William
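To make Ryan's two definitions concrete, the sketch below contrasts both access patterns using the plain Hadoop FileSystem API: a positioned read in the middle of a file (the case where HDFS can be less than ideal) versus streaming the file end to end (the case that approaches raw disk speed). This is an illustrative sketch, not code from the thread; the path, offset, and buffer size are placeholders, and it assumes a 0.20-era Hadoop client on the classpath.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAccessPatterns {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/data/large_cell_blob");  // placeholder path

    byte[] buf = new byte[64 * 1024];
    FSDataInputStream in = fs.open(file);

    // Pattern 1: random access within a file. The positioned read has to
    // reach the right block and offset first, which is where HDFS lags.
    in.readFully(123456789L, buf, 0, buf.length);

    // Pattern 2: stream the whole file from the beginning. This is the
    // case Ryan benched at a large fraction of raw disk throughput.
    in.seek(0);
    long total = 0;
    int n;
    while ((n = in.read(buf)) > 0) {
      total += n;
    }
    in.close();
    System.out.println("streamed " + total + " bytes");
  }
}

The contrast is Ryan's point: seek cost dominates the first pattern, while the second is limited mostly by disk and network throughput.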