Re: Bug when reusing org.apache.hadoop.conf.Configuration in HbaseConfiguration
Hi St.Ack, thinking about it again, I think that HBase cannot really do much about it. I checked in hadoop/trunk and Configuration still has the WeakHashMap REGISTRY that is used upon calling addDefaultResource (which is called from mapred classes at least) to reset all reachable Configuration objects, plus it has addResource(InputStream). I sent a note to the hadoop mailing list on this. Thanks, Henning

On Wed, 2010-10-13 at 18:06 -0400, Stack wrote:

On Wed, Oct 13, 2010 at 10:09 AM, Henning Blohm henning.bl...@zfabrik.de wrote: Hi, I used Configuration cfg = new Configuration(); cfg.addResource(InputStream) on org.apache.hadoop.conf.Configuration to create a hadoop configuration. That configuration is used to construct HbaseConfiguration objects several times later on: HbaseConfiguration hbcfg = new HbaseConfiguration(cfg)

In TRUNK HBaseConfiguration is deprecated. Do you see the same phenomenon when you do HBaseConfiguration.create(cfg)? See below...

At the second use this leads to an error, as the passed-on configuration object is asked to load its properties again, trying to read from the input stream once more (as the stream is remembered!!). The fact that the properties of the passed-on configuration object have been reset is somewhat surprising: when using the hbase configuration object constructed above in HbaseAdmin, this will eventually load JobConf, which has a static initializer that calls Configuration.addDefaultResource("mapred-default.xml"), which in turn goes through the Configuration.REGISTRY, which holds on to any previously not-yet-collected Configuration objects to (hold your breath!) force them to reload their configuration by calling reloadConfiguration, which (now we are there) sets properties=null. Did anybody follow that? It seems there are somewhat surprising side effects in hadoop/hbase configuration handling.
Wouldn't it be better to have the default resource (programmatically) defined once in Configuration and not (even think about) touch already-instantiated config objects later on?

Yes. That sounds completely reasonable. See if HBaseConfiguration.create(cfg) gives you the same issue. If so, then it's the way Hadoop Configuration works currently. I haven't spent time on it in a while but I remember getting into interesting scenarios loading properties. If you can confine the issue some, let's file an issue up in Hadoop Common? Thanks Henning, St.Ack
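The registry interaction described above can be sketched with a small self-contained Python model. To be clear, this mimics and is not Hadoop's actual Configuration code; names like add_default_resource merely mirror the Java methods, and the property parsing is fake:

```python
import io
import weakref

class Config:
    """Toy model of org.apache.hadoop.conf.Configuration resource handling."""
    REGISTRY = weakref.WeakSet()   # stands in for Configuration.REGISTRY
    DEFAULTS = []

    def __init__(self):
        self.resources = []        # the streams themselves are remembered
        self.properties = None
        Config.REGISTRY.add(self)

    def add_resource(self, stream):
        self.resources.append(stream)
        self.reload()

    def reload(self):
        # models reloadConfiguration() + re-parse: every remembered stream is
        # read again, and a second read of an exhausted stream yields nothing
        self.properties = {}
        for res in Config.DEFAULTS + self.resources:
            if isinstance(res, str):
                continue  # a named default resource would be re-opened
            for line in res.read().decode().splitlines():
                k, _, v = line.partition("=")
                if k:
                    self.properties[k] = v

    @classmethod
    def add_default_resource(cls, name):
        # models the static JobConf initializer: resets ALL live configs
        cls.DEFAULTS.append(name)
        for live in cls.REGISTRY:
            live.reload()

cfg = Config()
cfg.add_resource(io.BytesIO(b"hbase.rootdir=hdfs://localhost:8020/hbase"))
assert cfg.properties["hbase.rootdir"] == "hdfs://localhost:8020/hbase"

Config.add_default_resource("mapred-default.xml")  # e.g. JobConf class-load
print(cfg.properties)  # stream already consumed: {} -- settings silently lost
```

The same shape explains the report above: the stream-backed resource cannot be re-read, so the forced reload wipes the properties.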
Advice sought for mixed hardware installation
Hi all, We are about to set up a new installation using the following machines, and CDH3 beta 3:
- 10 nodes of single quad core, 8GB memory, 2x500GB SATA
- 3 nodes of dual quad core, 24GB memory, 6x250GB SATA

We are finding our feet, and will blog tests, metrics etc. as we go, but our initial usage patterns will be:
- initial load of 250 million records to HBase
- data harvesters pushing 300-600 records per second of insert or update (under 1KB per record) to TABLE_1 in HBase
- MR job processing changed content in TABLE_1 into TABLE_2 on an (e.g.) 6 hourly cron job (potentially using co-processors in the future)
- MR job processing changed content in TABLE_2 into TABLE_3 on an (e.g.) 6 hourly cron job (potentially using co-processors in the future)
- MR jobs building Lucene, SOLR, PostGIS (hive+sqoop) indexes on a 6, 12 or 24 hourly cron job, either by a) bulk export from HBase to .txt and then Hive or custom MR processing, or b) Hive or custom MR processing straight from HBase tables as the input format
- MR jobs building analytical counts (e.g. 4-way group bys in SQL using Hive) on a 6, 12 or 24 hourly cron, either by a) bulk export from HBase to .txt and then Hive / custom MR processing, or b) Hive / MR processing straight from HBase tables

To give an idea, at the moment on the 10 node cluster Hive against .txt files does a full scan in 3-4 minutes (our live system is MySQL and we export to .txt for Hive).

I see we have 2 options, but I am inexperienced and seek any guidance:
a) run HDFS across all 13 nodes, MR on the 10 small nodes, region servers on the 3 big nodes - MR will never benefit from data locality when using HBase (? I think)
b) run 2 completely separate clusters: clu1: 10 nodes, HDFS, MR; clu2: 3 nodes, HDFS, MR, RegionServer

With option b) we would do 6 hourly exports from clu2 to clu1 and really keep the processing load off the HBase cluster. We are prepared to run both, benchmark and provide metrics, but I wonder if someone has some advice beforehand.
We are anticipating:
- NN, 2nd NN, JT on 3 of the 10 smaller nodes
- HBase master on 1 of the 3 big nodes
- 1 ZK daemon on 1 of the 3 big nodes (or should we go for an ensemble of 3, with one on each?)

Thanks for any help anyone can provide, Tim (- and Lars F.)
Re: How to set HBase to listen on aparticular IP
Below is my configuration:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>192.168.0.2:6</value>
  </property>
</configuration>

Is there a way to set the IP address on which hbase can listen? *Clement Jebakumar,* 111/27 Keelamutharamman Kovil Street, Tenkasi, 627 811 http://www.declum.com/clement.html

On 14 October 2010 00:13, Stack st...@duboce.net wrote: It's unlikely hbase is pulling 192.168.2.2 from thin air. Can you check your networking? Have you made any configuration in hbase-site.xml? Thanks, St.Ack

On Wed, Oct 13, 2010 at 1:50 PM, Clement Jebakumar jeba.r...@gmail.com wrote: I was setting up HBase, and found it was trying to listen on 192.168.2.2. But my PC's IP is 192.168.0.2. So how to set this up with hbase? Because of this, I am not able to launch the HBase master. Can someone help? *Clement Jebakumar,* 111/27 Keelamutharamman Kovil Street, Tenkasi, 627 811 http://www.declum.com/clement.html
Re: Hbase thrift interface and non ascii characters
On Wed, 13 Oct 2010 19:23:05 -0400 Stack wrote: I asked our thrift expert Bryan and he thinks it could be because of old python bindings: I think the summary is that you want to use a more recent version of Python Thrift and generate your clientside with the utf8strings option. This causes strings to be utf8-encoded and -decoded when communicating with servers.

Thank you very much for this information. On the server I'm using the Hadoop distribution from Cloudera. To enable Thrift I have installed hadoop-hbase-thrift (version: 0.89.20100924+28-1~lucid-cdh3b3). On the client side I have installed the latest Python Thrift with pip from http://pypi.python.org/pypi/hbase-thrift/0.20.4 Are there newer versions available? Do you know how to generate the clientside while using the Cloudera distribution? I don't have the program thrift and can't find a Hbase.thrift file on my system. best wishes, Björn
Re: Advice sought for mixed hardware installation
That's a lot of information to digest Tim, so bear with me if I miss some details :)

a) isn't really good; the big nodes have a lot of computational power AND spindles, so leaving them like that is a waste, plus there's 0 locality (MR to HBase, HBase to HDFS). b) sounds weird, I would need more time to think about it, so let me propose c) the 10 nodes with HDFS and HBase, the big nodes with HDFS and MR.

- My main concern in this setup is giving HBase some processing power and lots of RAM. In this case you can give 6GB to the RSs, 1GB to the DN, and 1GB for the OS (caching, etc).
- On the 3 nodes, set up MR so that it uses as many tasks as those machines can support (do they have hyper-threading? if so you can even use more than 8 tasks). At the same time, the tasks can enjoy a full 1GB of heap each.
- On locality, HBase will be collocated with DNs, so this is great in many ways, better than collocating HBase with MR since that's not always useful (like on a batch import job, the tasks may use different regions at the same time and you cannot predict that... so they still go on the network).
- One other thing on locality: MR tasks do write intermediate data on HDFS, so having them collocated with DNs will help.

Regarding the master/NN/ZK, since it's a very small cluster I would use one of the small nodes to collocate the 3 of them (this means you will only have 9 RSs). You don't really need an ensemble, unless you're planning to share that ZK setup with other apps. In any case, you should test all setups.
J-D On Thu, Oct 14, 2010 at 4:51 AM, Tim Robertson timrobertson...@gmail.com wrote: Hi all, We are about to setup a new installation using the following machines, and CDH3 beta 3: - 10 nodes of single quad core, 8GB memory, 2x500GB SATA - 3 nodes of dual quad core, 24GB memory, 6x250GB SATA We are finding our feet, and will blog tests, metrics etc as we go but our initial usage patterns will be: - initial load of 250 million records to HBase - data harvesters pushing 300-600 records per second of insert or update (under 1KB per record) to TABLE_1 in HBase - MR job processing changed content in TABLE_1 into TABLE_2 on an (e.g.) 6 hourly cron job (potentially using co-processors in the future) - MR job processing changed content in TABLE_2 into TABLE_3 on an (e.g.) 6 hourly cron job (potentially using co-processors in the future) - MR jobs building Lucene, SOLR, PostGIS (hive+sqoop) indexes on a 6,12 or 24 hourly cron job either by a) bulk export from HBase to .txt and then Hive or custom MR processing b) hive or custom MR processing straight from HBase tables as the input format - MR jobs building analytical counts (e.g. 4 way group bys in SQL using Hive) on 6,12,4 hourly cron either by a) bulk export from HBase to .txt and then Hive / custom MR processing b) hive, MR processing straight from HBase tables To give an idea, at the moment on the 10 node cluster Hive against .txt files does full scan in 3-4 minutes (our live system is Mysql and we export to .txt for Hive) I see we have 2 options, but I am inexperienced and seek any guidance: a) run HDFS across all 13 nodes, MR on the 10 small nodes, region servers on the 3 big nodes - MR will never benefit from data locality when using HBase (? 
I think) b) run 2 completely separate clusters clu1: 10 nodes, HDFS, MR clu2: 3 nodes, HDFS, MR, RegionServer With option b) we would do 6 hourly exports from clu2 - clu1 and really keep the processing load off the HBase cluster We are prepared to run both, benchmark and provide metrics, but I wonder if someone has some advice beforehand. We are anticipating: - NN, 2nd NN, JT on 3 of the 10 smaller nodes - HBase master on 1 of the 3 big nodes - 1 ZK daemon on 1 of the 3 big nodes (or should we go for an ensemble of 3, with one on each) Thanks for any help anyone can provide, Tim (- and Lars F.)
Re: How to set HBase to listen on aparticular IP
Is 192.168.0.2 == localhost? Try setting hbase.rootdir to hdfs://192.168.0.2:8020/hbase. See what happens. Is it possible that localhost is 192.168.2.2 in your local networking? St.Ack

On Thu, Oct 14, 2010 at 4:09 AM, Clement Jebakumar jeba.r...@gmail.com wrote: Below is my configuration:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>192.168.0.2:6</value>
  </property>
</configuration>

Is there a way to set the IP address on which hbase can listen? *Clement Jebakumar,* 111/27 Keelamutharamman Kovil Street, Tenkasi, 627 811 http://www.declum.com/clement.html On 14 October 2010 00:13, Stack st...@duboce.net wrote: It's unlikely hbase is pulling 192.168.2.2 from thin air. Can you check your networking? Have you made any configuration in hbase-site.xml? Thanks, St.Ack On Wed, Oct 13, 2010 at 1:50 PM, Clement Jebakumar jeba.r...@gmail.com wrote: I was setting up HBase, and found it was trying to listen on 192.168.2.2. But my PC's IP is 192.168.0.2. So how to set this up with hbase? Because of this, I am not able to launch the HBase master. Can someone help? *Clement Jebakumar,* 111/27 Keelamutharamman Kovil Street, Tenkasi, 627 811 http://www.declum.com/clement.html
Re: Number of column families vs Number of column family qualifiers
So, even if I use get.addColumn(byte[] family, byte[] qualifier) for a certain cell, HBase will have to traverse from the beginning of the column family to the qualifier I defined? Is it because HBase has to traverse all the blocks in the HFile to find the row key or the qualifier?

The answer is different for 0.20 and 0.90, but the short version would be: sometimes yes and sometimes not all of the KVs will be read. HBase is getting better at this but there's still work to do.

I am confused here: in the key-value pairs in the data block, does the key refer to the row key or does it refer to the qualifier? Where is the row key and where is the qualifier? This has bothered me for a while. It would be nice to figure it out. Many thanks.

Down in the HBase internals we use KeyValue, where the key is basically row + family + qualifier + timestamp. See http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/KeyValue.html J-D
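The key layout J-D describes can be sketched with a toy model. These are plain Python tuples, not HBase's real binary KeyValue encoding; the negated timestamp is just a stand-in for HBase's newest-version-first sort order:

```python
# Toy model of KeyValue ordering: keys are (row, family, qualifier, -ts)
# tuples kept sorted, so all cells of a row are adjacent and a lookup for
# one (row, family, qualifier) is a range probe within that row's keys.
import bisect

kvs = sorted([
    ("row1", "cf", "a", -100),
    ("row1", "cf", "b", -100),
    ("row1", "cf", "b", -90),   # older version sorts after the newer one
    ("row2", "cf", "a", -100),
])

def get_cell(kvs, row, family, qualifier):
    # probe with -inf timestamp so we land just before the newest version
    i = bisect.bisect_left(kvs, (row, family, qualifier, float("-inf")))
    if i < len(kvs) and kvs[i][:3] == (row, family, qualifier):
        return kvs[i]
    return None

print(get_cell(kvs, "row1", "cf", "b"))  # ('row1', 'cf', 'b', -100)
```

This is why "the key" in a data block is neither the row key alone nor the qualifier alone: it is the whole composite.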
Re: Advice sought for mixed hardware installation
Thanks again. One of the things we struggle with currently on the RDBMS is the organisation of 250 million records into complex taxonomies, and also point-in-polygon intersections. Having such memory available to the MR jobs allows us to consider loading taxonomies / polygons / R-tree indexes into memory to do those calculations in parallel with MR. I was playing with that a couple of years ago when I first ventured into Hadoop (http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html) but might get back into it... Tim

On Thu, Oct 14, 2010 at 8:07 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I had it in my mind that HBase liked big memory, hence assuming the region servers should stay on the 24G machines with plenty of memory at their disposal. We'll come up with a test platform and then try some benchmarking and do a blog on it all and share.

They do, but because of JVM limitations the recommended setting is around 4-8GB. Giving more would cause bigger heap fragmentation issues, leading to full GC pauses, which could cause session timeouts. J-D
Re: Advice sought for mixed hardware installation
That'd be one more argument to put MR only on the big memory machines. J-D On Thu, Oct 14, 2010 at 2:15 PM, Tim Robertson timrobertson...@gmail.com wrote: Thanks again. One of the things we struggle with currently on the RDBMS, is the organisation of 250million records to complex taxonomies, and also point in polygon intersections. Having such memory available the MR jobs allows us to consider loading taxonomies / polygons / RTree indexes into memory to do those calculations in parallel with MR. I was playing with that a couple of years ago when I first ventured into Hadoop (http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html) but might get back into it... Tim On Thu, Oct 14, 2010 at 8:07 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I had it in my mind that HBase liked big memory, hence assuming the region servers should stay on the 24G machines with plenty of memory at their disposal. We'll come up with a test platform and then try some benchmarking and do a blog on it all and share. They do, but because of JVM limitations the recommended setting is around 4-8GB. Giving more would cause bigger heap fragmentation issues, leading to full GC pauses, which could cause session timeouts. J-D
Re: Help needed! Performance related questions
Hi guys, Thanks so much for answering my questions. I really appreciate that. They help a lot! I have a few more follow-up questions though.

1. About the row searching mechanism: I understand the part before HBase locates which region the row resides in. I am confused after that. So, I am going to write down what I understand so far; please correct me if it's wrong. a. The HRegion Store identifies which HFile the row is in. b. There is a block index in the HFile identifying which block this row resides in. c. If the row size is smaller than the block size (which means a block has multiple rows), HBase has to traverse within that block to locate the row matching the key. The traversal is sequential.

2. And if the row size is larger than the block size, what's going to happen? Does the block index in the HFile point to multiple blocks which contain different cells of that row?

3. Does a column family have to reside inside one block, which means a column family cannot be larger than a block?

Many thanks! William

On Thu, Oct 14, 2010 at 2:16 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: If you could answer any of the following questions, I would be greatly grateful for that.

People usually give me beer in exchange for quick help, let me know if that works for you ;)

1. For cell size, why should it not be larger than 20MB in general?

General answer: it pokes HBase in all the corner cases. You have to change a lot of default configs in order to keep some sort of efficiency.

2. What is the block size if the cell is 20MB? Can a cell cover multiple blocks?

No, one HFile block per cell (KeyValue) in this case. It basically gives you a perfect index.

3. For a single-cell column family (it has only one cell), does it share the same size limit as a cell? In other words, should a single-cell column family be smaller than 20MB?

It's the same to me.

4. Is there any advantage to putting rows close together in HBase, if these rows have a high chance of being queried together?
If you do Scans, then you want your rows together, right?

5. Any general rule for row size?

Try not to go into the MBs; it's currently missing some optimizations that would make this use case work perfectly.

6. Where does the HRegion host the row keys, in the HFile or other files?

In the block index in the HFile; not all the row keys are there if a single block fits more than one row. J-D
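The lookup path discussed in this thread (block index first, then sequential traversal inside the chosen block) can be modeled roughly like this. This is a hypothetical sketch, not HFile's actual code:

```python
# Toy model of an HFile block index lookup: binary-search the first key of
# each block, then scan sequentially inside that block.
import bisect

blocks = [  # (first_key, [(key, value), ...]) -- keys sorted across blocks
    ("row01", [("row01", "v1"), ("row03", "v3")]),
    ("row05", [("row05", "v5"), ("row08", "v8")]),
    ("row10", [("row10", "v10"), ("row12", "v12")]),
]
first_keys = [b[0] for b in blocks]

def lookup(key):
    # block index: last block whose first key <= search key
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None
    # inside the block: plain sequential traversal, as described above
    for k, v in blocks[i][1]:
        if k == key:
            return v
    return None

print(lookup("row08"))  # 'v8', found by scanning the block starting at 'row05'
```

Note how "row08" never appears in the index itself; only blocks whose first key it falls between are known, which is why not every row key lives in the block index.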
Re: Incredibly slow response to Scan
If setCaching didn't make a difference, either the API isn't used properly or the bottleneck is elsewhere. J-D

On Thu, Oct 7, 2010 at 9:33 PM, Venkatesh vramanatha...@aol.com wrote: J-D et al., I've put the mapreduce issue that I had on the back burner for now. I'm getting incredibly slow response to Scan. On a 10 node cluster, for a table with 1200 regions, it takes 20 minutes to scan a column for a given value. Got 100 or so records in the response. Is this normal? thanks venkatesh PS. setCaching(100) didn't make a dent in performance
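A rough model of why setCaching matters when it is actually applied to the Scan being executed. This is pure arithmetic, not the client API, and it ignores region boundaries; the default of 1 row per fetch is the commonly cited value for that era:

```python
import math

def scan_rpcs(total_rows, caching):
    """Approximate client<->regionserver round trips for a scan:
    one RPC fetches `caching` rows."""
    return math.ceil(total_rows / caching)

print(scan_rpcs(100, 1))    # 100 round trips with caching=1
print(scan_rpcs(100, 100))  # 1 round trip with setCaching(100)
```

If the 100-result scan above still takes 20 minutes with caching=100, the round-trip count is clearly not the bottleneck, which is J-D's point: either the setting never reached the executed Scan, or the time is going into scanning 1200 regions server-side.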
Re: Help needed! Performance related questions
1. About the row searching mechanism: I understand the part before HBase locates which region the row resides in. I am confused after that. So, I am going to write down what I understand so far; please correct me if it's wrong. a. The HRegion Store identifies which HFile the row is in. b. There is a block index in the HFile identifying which block this row resides in. c. If the row size is smaller than the block size (which means a block has multiple rows), HBase has to traverse within that block to locate the row matching the key. The traversal is sequential.

More or less.

2. And if the row size is larger than the block size, what's going to happen? Does the block index in the HFile point to multiple blocks which contain different cells of that row?

The block index stores full keys, row+family+qualifier+timestamp, so it's not talking in terms of total row size. A single row can have multiple blocks (in multiple files) with possibly as many entries in the block index. If a single cell is larger than the block size, then the size of that block will be the size of that cell.

3. Does a column family have to reside inside one block, which means a column family cannot be larger than a block?

My previous answer covers this. J-D
Re: hbase.client.retries.number
Thanks J-D. Yeah, found out the hard way in prod :) Set it to zero since client requests were backing up; everything stopped working / region server wouldn't come up, etc. (did not realize an hbase client property would be used by the server :) I reverted all retries back to default. So far everything seems good (fingers crossed) after making several tunings along the way.

- Using HBase 0.20.6
- Processing about 300 million event puts
- 85% of requests are under 10 milliseconds, while the mean is about 300 milliseconds. Trying to narrow down whether that's during our client GC or an HBase pause. Tuning region server handler count.
- mapreduce job to process 40 million records takes about an hour, the majority in the reduce phase. Trying to optimize that by varying the buffer size of writes. Going to try the IN_MEMORY option as well.
- Full table scan takes about 30 minutes. Is that reasonable for a table size of 10 million records?
- hbase.client.scanner.caching - If set in hbase-site.xml, Scan calls should pick that up, correct?

thanks venkatesh

-Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Thu, Oct 14, 2010 2:39 pm Subject: Re: hbase.client.retries.number

hbase.client.retries.number is used by HConnectionManager, so this means anything that uses the HBase client. I think some parts of the region server code use it, or used it at some point, I'd have to dig in. But definitely never set this to 0, as any region move/split will kill your client.

About this RetriesExhaustedException, it seems that either the region is in an unknown state or that it just took a lot of time to close and be moved. You need to correlate this with the master log (look for this region's name) since the client cannot possibly know what went on inside the cluster. Also, which version are you using? J-D

On Mon, Oct 11, 2010 at 3:06 PM, Venkatesh vramanatha...@aol.com wrote: BTW, I get this exception while trying a new put...
Also, get this exception on gets on some region servers org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=user_activity,1286789413060_atanackovics_30306_4a3e0812,1286789581757 for region user_activity,1286789413060_30306_4a3e0812,1286789581757, row '1286823659253_v6_1_df34b22f', but failed after 10 attempts. Exceptions: org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230) org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666) org.apache.hadoop.hbase.client.HTable.close(HTable.java:682) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.processGetRequest(notifyEmailSendActivity.java:363) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.doGet(notifyEmailSendActivity.java:450) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) -Original Message- From: Venkatesh vramanatha...@aol.com To: user@hbase.apache.org Sent: Mon, Oct 11, 2010 2:35 pm Subject: hbase.client.retries.number HBase was seamless for first couple of weeks..now all kinds of issues in production :) fun fun.. Curious ..does this property have to match up on hbase client side region server side.. I've this number set to 0 on region server side default on client side.. I can't do any put (new) thanks venkatesh
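For reference, the two client-side properties discussed in this thread go in hbase-site.xml. The values below are illustrative only; the "numtries=10" in the exception above suggests 10 was the shipped retries default, but verify against your version's hbase-default.xml:

```
<!-- hbase-site.xml: illustrative values; never set the retries to 0,
     since any region move/split would then kill the client -->
<configuration>
  <property>
    <name>hbase.client.retries.number</name>
    <value>10</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>
</configuration>
```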
Re: Help needed! Performance related questions
Hey J-D, Thanks a lot! That has cleared a lot of my confusions. :) I really appreciate it. William On Thu, Oct 14, 2010 at 2:51 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: 1. about the row searching mechanism, I understand the part before the HBase locate where the row resides in which region. I am confused after that. So, I am going to write down what I understand so far, please correct me if it's wrong. a. The HRegion Store identifies where the row is in which HFile. b. There is a block index in HFile identify which block this row resides. c. If the row size is smaller than block size (which mean a block has multiple rows), HBase has to traverse in that block to locate the row matching the key. The traverse is sequence traverse. More or less. 2. And if the row size is larger than the block size, what's going to happen? Does the block index in HFile point to multiple blocks which contains different cells of that row? The block index stores full keys, row+family+qualifier+timestamp, so it's not talking in terms of total row size. A single row can have multiple blocks (in multiple files) with possibly as many entries in the block index. If a single cell is larger than the block size, then the size of that block will be the size of that cell. 3. Does a column family has to reside inside one block, which means a column family cannot be larger than a block? My previous answer covers this. J-D
Re: Increase region server throughput
Though this setup, setAutoFlush(false), increases the throughput, the data loss rate increases significantly -- there is no way for the client to know what has been lost and what has gone through. That bothers me. Sean

On Tue, Oct 12, 2010 at 11:32 AM, Stack st...@duboce.net wrote: Have you played with these settings in the HTable API? http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean) http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setWriteBufferSize(long)

There is something seriously wrong if you are seeing 5 seconds per put (unless your put is gigabytes in size?). Are you doing 'new HTable(tablename)' in your client or are you doing 'new HTable(conf, tablename)'? Do the latter if not -- share the configuration with HTable instances. St.Ack

On Mon, Oct 11, 2010 at 10:47 PM, Venkatesh vramanatha...@aol.com wrote: I would like to tune the region server to increase throughput. On a 10 node cluster, I'm getting 5 sec per put (this is unbatched/unbuffered). Other than the region server handler count property, is there anything else I can tune to increase throughput? (for this operation I can't use buffered writes without a code change) thx venkatesh
Re: Increase region server throughput
Thanks St.Ack. I've set both of those, autoflush and write buffer size. I'll try the new HTable(conf, ..) (I just have new HTable(table) now). Right now up to 85% are under 10ms; I'm trying to bring the mean down. PS: I can tolerate some loss of data for getting better throughput.

-Original Message- From: Sean Bigdatafun sean.bigdata...@gmail.com To: user@hbase.apache.org Sent: Thu, Oct 14, 2010 8:11 pm Subject: Re: Increase region server throughput

Though this setup, setAutoFlush(false), increases the throughput, the data loss rate increases significantly -- there is no way for the client to know what has been lost and what has gone through. That bothers me. Sean

On Tue, Oct 12, 2010 at 11:32 AM, Stack st...@duboce.net wrote: Have you played with these settings in the HTable API? http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean) http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setWriteBufferSize(long)

There is something seriously wrong if you are seeing 5 seconds per put (unless your put is gigabytes in size?). Are you doing 'new HTable(tablename)' in your client or are you doing 'new HTable(conf, tablename)'? Do the latter if not -- share the configuration with HTable instances. St.Ack

On Mon, Oct 11, 2010 at 10:47 PM, Venkatesh vramanatha...@aol.com wrote: I would like to tune the region server to increase throughput. On a 10 node cluster, I'm getting 5 sec per put (this is unbatched/unbuffered). Other than the region server handler count property, is there anything else I can tune to increase throughput? (for this operation I can't use buffered writes without a code change) thx venkatesh
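The throughput difference between autoflushed and buffered puts can be approximated with a rough model. This is pure arithmetic, not the HTable API, and the 2MB write buffer used here is an assumed default:

```python
def put_rpcs(num_puts, avg_put_bytes, autoflush, buffer_bytes=2 * 1024 * 1024):
    """Rough count of client->regionserver RPCs for a stream of Puts.
    autoflush=True  -> one RPC per Put.
    autoflush=False -> Puts accumulate until the write buffer fills.
    (Ignores the final explicit flushCommits of a partial buffer.)"""
    if autoflush:
        return num_puts
    puts_per_flush = max(1, buffer_bytes // avg_put_bytes)
    return num_puts // puts_per_flush

print(put_rpcs(100_000, 1024, autoflush=True))   # 100000 RPCs
print(put_rpcs(100_000, 1024, autoflush=False))  # 48 RPCs with a 2MB buffer
```

The trade-off Sean raises above falls directly out of this model: everything sitting in the client-side buffer between flushes is lost if the client dies, and the client has no way to know which buffered puts made it.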
Does HBase need caching layer if it served webpage
Google mentioned that it uses Bigtable to serve map tiles, which basically means it uses Bigtable to serve a huge amount of data online, but it does not mention any caching layer. But we all know HBase's random read performance sucks (this is true for Bigtable as well; see their paper: even if you increase the number of servers 500 times, the random read throughput just increases 3-4 times). I wonder how people handle it; any experience to share? Also, can someone talk about what kind of caching layer can be added, if anyone practices that? Sean
Re: Limits on HBase
If you have a single row that approaches then exceeds the size of a region, eventually you will end up having that row as a single region, with the region encompassing only that one row. The reason for HBase and Bigtable is the overhead that HDFS has... every file in HDFS uses an amount of namenode RAM that is not dependent on the size of the file. Meaning the more small files you have, the more RAM you use, and you run out of namenode scalability. So HBase exists to store smaller values. There is some overhead. Thus once you start putting in larger values, you might as well avoid the overhead and go straight to/from HDFS. -ryan

On Thu, Oct 14, 2010 at 5:23 PM, Sean Bigdatafun sean.bigdata...@gmail.com wrote: Let me ask this question from another angle:

The first question is --- if I have millions of columns in a column family in the same row, such that the sum of the key-value pairs exceeds 256MB, what will happen? Example: I have a column with a key of 256 bytes and a value of 2K; then let's assume (256 + timestamp size + 2056) ~= 2.5K, so I understand I can at most store 256 * 1024 / 2.5 ≈ 104,857 columns in this column family at this row. Anyone have comments on the math I gave above?

The second question is -- By the way, if I do not turn on LZO, is my data also compressed (by the system)? -- if so, then the above number will increase a couple of times, but still there exists a limit on how many columns I can put in a row.

The third question is -- If I do turn on LZO, does that mean the value gets compressed first, and then the HBase mechanism further compresses the key-value pair?

Thanks, Sean

On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray jg...@facebook.com wrote: You can go way beyond the max region split / split size. HBase will never split the region once it is a single row, even if beyond the split size. Also, if you're using large values, you should have region sizes much larger than the default.
It's common to run with 1-2GB regions in many cases. What you may have seen are recommendations that if your cell values are approaching the default block size on HDFS (64MB), you should consider putting the data directly into HDFS rather than HBase. JG -Original Message- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Tuesday, September 07, 2010 7:36 PM To: user@hbase.apache.org; apurt...@apache.org Subject: Re: Limits on HBase Hi, Thanks for your reply. How about the row size? I read that a row should not be larger than the hdfs file on region server which is 256M in default. Is it right? Many thanks. William On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org wrote: In addition to what Jon said please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O, before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not 30MB. I also try not to go above 50MB as largest cell size, for the same reason. I have tried storing objects larger than 100MB but this can cause out of memory issues on busy regionservers no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue. Best regards, - Andy Why is this email five sentences or less? http://five.sentenc.es/ --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote: From: Jonathan Gray jg...@facebook.com Subject: RE: Limits on HBase To: user@hbase.apache.org user@hbase.apache.org Date: Monday, September 6, 2010, 4:10 PM I'm not sure what you mean by optimized cell size or whether you're just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that level, I have seen some weird performance issues. 
The most important thing is to be sure to tweak all of your settings. If you have 20MB cells, you need to increase the flush size beyond 64MB and the split size beyond 256MB. You also need enough memory to support all this large-object allocation. And of course, test, test, test; that's the easiest way to see if what you want to do will work :) When you run into problems, e-mail the list.

As far as row size is concerned, the only issue is that a row can never span multiple regions, so a given row can only be in one region and thus be hosted on one server at a time. JG

-----Original Message----- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Monday, September 06, 2010 1:57 PM To: hbase-user Subject: Limits on HBase
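The flush-size and split-size tuning JG describes maps to two properties in hbase-site.xml. A sketch with illustrative values for the large-cell case (the defaults mentioned in the thread are 64MB flush and 256MB split; the values below are just examples, not recommendations):

```xml
<!-- hbase-site.xml: illustrative settings for tables with ~20MB cells -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- default 64MB; raised so a handful of large cells don't force a flush -->
  <value>268435456</value> <!-- 256MB -->
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- default 256MB; raised so regions hold more than a few large cells -->
  <value>1073741824</value> <!-- 1GB -->
</property>
```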
Re: Does HBase need caching layer if it served webpage
HBase has excellent performance and is suitable for serving webpages. We do this at Stumbleupon. We can get responses out of hbase in milliseconds, and generally things are great. You will need to adjust all these expectations for your environment; we run i7-based hardware in our own datacenter. You can't really get much better perf for your buck, and EC2 has substantially lower performance. -ryan

On Thu, Oct 14, 2010 at 5:32 PM, Sean Bigdatafun sean.bigdata...@gmail.com wrote:

Google mentioned that it uses Bigtable to serve map tiles, which basically means it uses Bigtable to serve huge amounts of data online, but it does not mention any caching layer. But we all know HBase's random read performance sucks (this is true for Bigtable as well; see their paper: even if you increase the number of servers by 500 times, the random read throughput only increases 3-4 times). I wonder how people handle it; any experience to share? Also, can someone talk about what kind of caching layer can be added, if anyone practices that? Sean
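When a caching layer is added in front of HBase, it is typically a read-through cache (memcached in many deployments of this era). A minimal sketch of the pattern; the `fetch_from_store` function is a hypothetical stand-in for a real HBase Get, and the in-process dict stands in for memcached:

```python
# Read-through cache sketch. On a miss, fetch from the backing store
# (here a stand-in for an HBase Get) and remember the result.
class ReadThroughCache:
    def __init__(self, fetch_from_store):
        self._fetch = fetch_from_store   # called only on a cache miss
        self._cache = {}                 # real deployments: memcached, with TTLs

    def get(self, row_key):
        if row_key not in self._cache:
            self._cache[row_key] = self._fetch(row_key)  # miss: one store read
        return self._cache[row_key]

# Usage: count how often the (slow) backing store is actually consulted.
calls = []
def fetch_from_store(key):
    calls.append(key)                    # record each store access
    return "value-for-" + key

cache = ReadThroughCache(fetch_from_store)
cache.get("row1")
cache.get("row1")
print(len(calls))  # prints 1: the second read is served from the cache
```

This only helps read-heavy, skewed workloads; it does nothing for uniformly random reads like the Bigtable benchmark Sean cites, and it adds the usual invalidation problem on writes.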
hmaster reports 0 region servers
I applied the patch for HBASE-2939. (The patch is for 0.89 but my code is 0.20.6; I checked the patch and found it only changes one connection thread on the client side to a pool strategy.) But when I rebuilt the source and started the hbase cluster, the master could not recognize the regionservers even though they are running. Has anyone encountered this problem before? Thanks for any suggestions. See the logs below.

master:

2010-10-15 18:25:13,220 INFO org.apache.hadoop.hbase.master.ServerManager: 0 region servers, 0 dead, average load NaN
2010-10-15 18:25:13,268 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
2010-10-15 18:26:13,224 INFO org.apache.hadoop.hbase.master.ServerManager: 0 region servers, 0 dead, average load NaN
2010-10-15 18:26:13,272 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
2010-10-15 18:27:13,228 INFO org.apache.hadoop.hbase.master.ServerManager: 0 region servers, 0 dead, average load NaN
2010-10-15 18:27:13,276 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META.
region(s) scanned

zookeeper:

2010-10-15 18:21:03,577 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.15:42299 lastZxid 0
2010-10-15 18:21:03,577 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c67
2010-10-15 18:21:03,578 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.17:45088 lastZxid 0
2010-10-15 18:21:03,579 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c670001
2010-10-15 18:21:03,580 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.16:40916 lastZxid 0
2010-10-15 18:21:03,580 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c670002
2010-10-15 18:21:03,587 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c67 valid:true
2010-10-15 18:21:03,588 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c670001 valid:true
2010-10-15 18:21:03,588 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c670002 valid:true
2010-10-15 18:21:03,730 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.20:57115 lastZxid 0
2010-10-15 18:21:03,730 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c670003
2010-10-15 18:21:03,731 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c670003 valid:true
2010-10-15 18:21:03,776 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.19:29878 lastZxid 0
2010-10-15 18:21:03,776 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c670004
2010-10-15 18:21:03,777 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c670004 valid:true
2010-10-15 18:21:03,831 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.22:40435 lastZxid 0
2010-10-15 18:21:03,831 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c670005
2010-10-15 18:21:03,832 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c670005 valid:true
2010-10-15 18:21:13,211 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.0.14:36176 lastZxid 0
2010-10-15 18:21:13,211 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12baf682c670006
2010-10-15 18:21:13,212 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12baf682c670006 valid:true
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70003
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x12baf34a9d70003
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70006
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x12baf34a9d70006
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70005
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x12baf34a9d70005
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70001
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x12baf34a9d70001
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70004
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x12baf34a9d70004
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70002
2010-10-15 18:23:06,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x12baf34a9d70002
2010-10-15 18:23:06,004 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x12baf34a9d70007
2010-10-15 18:23:06,004 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring