re: A question about Hmaster startup.
It reproduces when HMaster is started for the first time and the NN is started without starting any DN. So it may be nothing. HBase version 0.90.1:

  public static void waitOnSafeMode(final Configuration conf, final long wait)
  throws IOException {
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) return;
    DistributedFileSystem dfs = (DistributedFileSystem)fs;
    // Are there any data nodes up yet?
    // Currently the safe mode check falls through if the namenode is up but no
    // datanodes have reported in yet.
    try {
      // This block was deleted in 0.90.2
      while (dfs.getDataNodeStats().length == 0) {
        LOG.info("Waiting for dfs to come up...");
        try {
          Thread.sleep(wait);
        } catch (InterruptedException e) {
          // continue
        }
      }
    } catch (IOException e) {
      // getDataNodeStats can fail if superuser privilege is required to run
      // the datanode report, just ignore it
    }
    // Make sure dfs is not in safe mode
    while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
      LOG.info("Waiting for dfs to exit safe mode...");
      try {
        Thread.sleep(wait);
      } catch (InterruptedException e) {
        // continue
      }
    }
  }

HBase version 0.90.2:

  public static void waitOnSafeMode(final Configuration conf, final long wait)
  throws IOException {
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) return;
    DistributedFileSystem dfs = (DistributedFileSystem)fs;
    // Make sure dfs is not in safe mode
    while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
      LOG.info("Waiting for dfs to exit safe mode...");
      try {
        Thread.sleep(wait);
      } catch (InterruptedException e) {
        // continue
      }
    }
  }

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: April 19, 2011 13:15
To: user@hbase.apache.org
Subject: Re: A question about Hmaster startup.

On Mon, Apr 18, 2011 at 9:26 PM, Gaojinchao gaojinc...@huawei.com wrote:
> Sorry. My question is: if HMaster is started after the NN without starting a DN in HBase 0.90.2, then HMaster is not able to start, due to an AlreadyBeingCreatedException for /hbase/hbase.version.
> In HBase version 0.90.1 it will wait for a DataNode to start up. I tried to dig into the code and found the change in HBase 0.90.2, but can't find an issue for it.

Thanks for digging in. I don't see the code block you are referring to in HMaster in 0.90.1. As per J-D, it's out in FSUtils.java when we get to 0.90 (I checked 0.90.0 and it's not there either). What you are seeing seems similar to: HBASE-3502 "Can't open region because can't open .regioninfo because AlreadyBeingCreatedException", except in your case it's hbase.version. Is there another master running by chance that still has the lease on this file? Looking at the code, it should be doing as it used to. We go into checkRootDir, and the first thing we call is FSUtils.waitOnSafeMode, and then we just hang there till dfs says it has left safe mode. Maybe add some logging in there? St.Ack
Re: A question about Hmaster startup.
I think it needs a fix, because HMaster can't start up until a DN is up. Can we restore the deleted code? HMaster logs:

2011-04-19 16:49:09,208 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: A master is now available
2011-04-19 16:49:09,400 WARN org.apache.hadoop.hbase.util.FSUtils: Version file was empty, odd, will try to set it.
2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1310)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
    at org.apache.hadoop.ipc.Client.call(Client.java:817)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
    at $Proxy5.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy5.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /hbase/hbase.version - Aborting...
2011-04-19 16:51:09,674 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at hdfs://C4C1:9000/hbase, retrying: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1310)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
    at org.apache.hadoop.ipc.Client.call(Client.java:817)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
    at $Proxy5.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy5.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
    at
Re: Do u know more details about facebook use hbase
I don't know the details, but I believe they had a good idea of the key space, since versions of the applications now running on HBase were migrated from elsewhere. In conversations they've said that they have disabled splitting and run splits manually on Tuesdays, from which I understand that someone takes a look at the cluster periodically to see how it's doing, and if a region seems hot, they'll manually split it from the shell or UI. St.Ack

On Mon, Apr 18, 2011 at 10:58 PM, BlueDavy Lin blued...@gmail.com wrote:
> hi! At QCon, Facebook said they use constant region counts and rolling splits. Does anybody know more details about it, such as how they initially define the regions? The problem we've hit is that it is difficult to design the initial region ranges.
> -- = | BlueDavy | | http://www.bluedavy.com | =
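For anyone wanting to try the pre-split approach described above, here is a rough sketch of generating evenly spaced initial region boundaries. This is only an illustration of the idea, not Facebook's actual scheme; the HBaseAdmin call is shown in a comment since it needs a running cluster.

```java
import java.util.Arrays;

public class SplitKeys {
    // Generate (numRegions - 1) split keys evenly spaced over the
    // single-byte prefix range 0x00..0xFF. With keys like these,
    // HBaseAdmin.createTable(desc, splits) pre-creates numRegions
    // regions instead of starting with one and splitting under load.
    static byte[][] evenOneByteSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = evenOneByteSplits(4);
        // 3 boundaries for 4 regions
        System.out.println(splits.length);
        System.out.println(Arrays.toString(splits[0]));
        // Against a live 0.90 cluster this would be used roughly as:
        // admin.createTable(new HTableDescriptor("t"), splits);
    }
}
```

This only works if your row keys actually spread over the whole byte range (e.g. hashed prefixes); for keys that cluster in a narrow range, the boundaries have to come from knowledge of the key space, which is exactly the hard part BlueDavy is asking about.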
Re: A question about Hmaster startup.
Mind making an issue and a patch? We can apply it for 0.90.3, which should be out soon. Thank you Gaojinchao. St.Ack

2011/4/19 Gaojinchao gaojinc...@huawei.com:
> I think it needs a fix, because HMaster can't start up until a DN is up. Can we restore the deleted code? HMaster logs:
> 2011-04-19 16:49:09,208 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: A master is now available
> 2011-04-19 16:49:09,400 WARN org.apache.hadoop.hbase.util.FSUtils: Version file was empty, odd, will try to set it.
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
> [...]
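For reference, the fix being discussed amounts to putting the wait-for-datanodes loop back in front of the version-file write. Here is a standalone sketch of that wait pattern; the datanode count comes from a pluggable source, since the real dfs.getDataNodeStats() needs a live cluster, and the names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class WaitForDatanodes {
    // Block until the reported datanode count is non-zero, polling
    // every waitMillis, up to maxTries attempts. Mirrors the 0.90.1
    // loop around dfs.getDataNodeStats().length == 0.
    static boolean waitOnDatanodes(IntSupplier dataNodeCount,
                                   long waitMillis, int maxTries)
            throws InterruptedException {
        for (int tries = 0; tries < maxTries; tries++) {
            if (dataNodeCount.getAsInt() > 0) {
                return true;              // a datanode has reported in
            }
            Thread.sleep(waitMillis);     // "Waiting for dfs to come up..."
        }
        return false;                     // gave up
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a datanode that reports in on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean up = waitOnDatanodes(
            () -> polls.incrementAndGet() >= 3 ? 1 : 0, 1L, 10);
        System.out.println(up);   // true
    }
}
```

The 0.90.1 version looped forever rather than bounding the tries; a bound (or at least a log line per attempt, as Stack suggests) makes the hang visible instead of silent.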
Re: HBase 0.90.2 CDH3B4 -Compression algorithm 'lzo' previously failed test
Good morning. No, it seems the test was successful. The native libraries are located in /home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/ and were compiled after building the project.

[hbase@dhbasetest01 shell]$ ../hbase/bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://dhbasetest01.tag-dev.com:9000/hbase lzo
11/04/19 09:47:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
11/04/19 09:47:40 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 4a3bd479610257d27e4ece33ce3f772ea6830d6d]
11/04/19 09:47:40 INFO compress.CodecPool: Got brand-new compressor

On Mon, Apr 18, 2011 at 10:47 PM, Stack st...@duboce.net wrote:
> It looks like it's still not installed correctly; HBase can't find it when it runs. (You restarted between installs? What does the CompressionTest tool say when you run it? See http://hbase.apache.org/book.html#compression — the same thing?) Where did you put the native libs? St.Ack
>
> On Mon, Apr 18, 2011 at 10:41 PM, Vadim Keylis vkeylis2...@gmail.com wrote:
>> Good evening. I've read the article (https://github.com/toddlipcon/hadoop-lzo) and configured things the way it describes, but I still get the same error:
>>
>> 2011-04-18 15:09:11,635 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=user,10L\xA2\x13\xDB\xDB\xDB\xDB\xDA\xDA\xB0\x05Z\xCCw!\xCC\x93\xB0!\xCC\x93\x93\xB0!\xCCw!\xCCw!\xCCw ,1303164462772.dbbb665cfc461075670f06f4dfe6b632.
>> java.io.IOException: Compression algorithm 'lzo' previously failed test.
>>     at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:77)
>>     at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2555)
>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2544)
>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
>>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
>>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
>>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:619)
>>
>> I checked out hadoop-lzo from the link you provided and rebuilt the library based on the instructions in the FAQ. Here is my configuration:
>>
>> hadoop-env.sh:
>> export JAVA_LIBRARY_PATH=/home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/
>> # Extra Java CLASSPATH elements. Optional.
>> export HADOOP_CLASSPATH=/home/hbase/hadoop-lzo-0.4.10/hadoop-lzo-0.4.10.jar
>>
>> core-site.xml:
>> <property>
>>   <name>io.compression.codecs</name>
>>   <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
>> </property>
>> <property>
>>   <name>io.compression.codec.lzo.class</name>
>>   <value>com.hadoop.compression.lzo.LzoCodec</value>
>> </property>
>>
>> hbase-env.sh:
>> export HBASE_CLASSPATH=/home/hbase/hadoop-lzo-0.4.10/hadoop-lzo-0.4.10.jar:$HBASE_CLASSPATH
>>
>> What am I doing wrong?
Re: HBase 0.90.2 CDH3B4 -Compression algorithm 'lzo' previously failed test
Thanks so much. Figured out the problem that caused LZO not to work. Thanks again.

On Tue, Apr 19, 2011 at 9:50 AM, Vadim Keylis vkeylis2...@gmail.com wrote:
> Good morning. No, it seems the test was successful. The native libraries are located in /home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/ and were compiled after building the project.
> [...]
[ANN]: HBaseWD: Distribute Sequential Writes in HBase
Hello guys, I'd like to introduce a new small Java project/lib around HBase: HBaseWD. It is aimed at helping with distribution of the load (across region servers) when writing records that are sequential because of the row key nature. It implements the solution which was discussed several times on this mailing list (e.g. here: http://search-hadoop.com/m/gNRA82No5Wk). Please find the sources at https://github.com/sematext/HBaseWD (there's also a jar of the current version for convenience). It is very easy to make use of it: e.g. I added it to one existing project with 1+2 lines of code (one where I write to HBase and two for configuring the MapReduce job). Any feedback is highly appreciated! Please find below a short intro to the lib [1].

Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

[1] Description:

HBaseWD stands for Distributing (sequential) Writes. It was inspired by discussions on the HBase mailing lists around the problem of choosing between:
* writing records with sequential row keys (e.g. time-series data with the row key built from a timestamp)
* using random unique IDs for records

The first approach makes it possible to perform fast range scans by setting start/stop keys on the Scanner, but creates a single-region-server hot-spotting problem when writing data (as row keys go in sequence, all records end up written into a single region at a time). The second approach aims for the fastest write performance by distributing new records over random regions, but makes fast range scans over the written data impossible. The suggested approach sits in the middle of the two above, and has proved to perform well by distributing records over the cluster during writes while still allowing range scans over them. HBaseWD provides a very simple API to work with, which makes it easy to use with existing code. Please refer to the unit tests for lib usage info, as they are meant to act as examples.
Brief Usage Info (Examples):

Distributing records with sequential keys which are being written into up to Byte.MAX_VALUE buckets:

  byte bucketsCount = (byte) 32; // distributing into 32 buckets
  RowKeyDistributor keyDistributor =
      new RowKeyDistributorByOneBytePrefix(bucketsCount);
  for (int i = 0; i < 100; i++) {
    Put put = new Put(keyDistributor.getDistributedKey(originalKey));
    ... // add values
    hTable.put(put);
  }

Performing a range scan over the written data (internally, bucketsCount scanners are executed):

  Scan scan = new Scan(startKey, stopKey);
  ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
  for (Result current : rs) {
    ...
  }

Performing a mapreduce job over a written data chunk specified by a Scan:

  Configuration conf = HBaseConfiguration.create();
  Job job = new Job(conf, "testMapreduceJob");
  Scan scan = new Scan(startKey, stopKey);
  TableMapReduceUtil.initTableMapperJob(table, scan,
      RowCounterMapper.class, ImmutableBytesWritable.class,
      Result.class, job);
  // Substituting standard TableInputFormat which was set in
  // TableMapReduceUtil.initTableMapperJob(...)
  job.setInputFormatClass(WdTableInputFormat.class);
  keyDistributor.addInfo(job.getConfiguration());

Extending Row Key Distributing Patterns:

HBaseWD is designed to be flexible and to support custom row key distribution approaches. To define custom row key distributing logic, just extend the AbstractRowKeyDistributor abstract class, which is really very simple:

  public abstract class AbstractRowKeyDistributor implements Parametrizable {
    public abstract byte[] getDistributedKey(byte[] originalKey);
    public abstract byte[] getOriginalKey(byte[] adjustedKey);
    public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
    ... // some utility methods
  }
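To make the prefixing idea concrete without pulling in the HBaseWD jar, here is a standalone sketch of a one-byte-prefix scheme. The class name and the hash-modulo bucket choice are illustrative only — this is not HBaseWD's exact implementation.

```java
import java.util.Arrays;

public class OneBytePrefixDistributor {
    private final byte buckets;

    public OneBytePrefixDistributor(byte buckets) {
        this.buckets = buckets;
    }

    // Prepend a bucket byte derived from the original key's hash, so
    // sequential keys spread across `buckets` distinct key ranges.
    public byte[] getDistributedKey(byte[] originalKey) {
        byte prefix =
            (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % buckets);
        byte[] result = new byte[originalKey.length + 1];
        result[0] = prefix;
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    // Strip the bucket byte to recover the key the application wrote.
    public byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        OneBytePrefixDistributor d = new OneBytePrefixDistributor((byte) 32);
        byte[] key = "row-000001".getBytes();
        byte[] distributed = d.getDistributedKey(key);
        // Round-trips: the original key is recoverable.
        System.out.println(Arrays.equals(key, d.getOriginalKey(distributed)));
    }
}
```

A range scan over such keys has to run one scanner per bucket and merge the results — which is exactly what DistributedScanner does for you in the library.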
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau alex.barano...@gmail.com wrote:
> Hello guys,

And girls! Thanks for making this addition Alex (and for posting it to the list). Good stuff, St.Ack
Re: HBase 0.90.2 CDH3B4 -Compression algorithm 'lzo' previously failed test
What was the issue (so the rest of us can learn from your experience)? Thanks Vadim, St.Ack

On Tue, Apr 19, 2011 at 10:20 AM, Vadim Keylis vkeylis2...@gmail.com wrote:
> Thanks so much. Figured out the problem that caused LZO not to work. Thanks again.
> [...]
hbase 0.90.2 - incredibly slow response
Just upgraded to 0.90.2 from 0.20.6. Doing a simple put to a table (< 100 bytes per put). The only code change was to retrofit the HTable API to work with 0.90.2: initializing HBaseConfiguration in servlet.init() and reusing that config for the HTable constructor doing the put. Performance is very slow: 90% of requests are well over 2 sec (with 0.20.6, 90% used to be < 10 millisec). I did run set_meta_memstore_size.rb as per the book. Any help to debug is appreciated. I also see periodic pauses between HBase puts. thanks, v
Re: Repeating log message in a [custom] unit test
So you have your special lucene region that's opened on some region server, and when the master starts shutting down it doesn't seem to see it, because while closing regions it says:

2011-04-18 21:35:09,221 INFO [IPC Server handler 4 on 32141] master.ServerManager(283): Only catalog regions remaining; running unassign

But the region is still assigned. I see that first it did:

2011-04-18 21:35:08,474 INFO [RegionServer:0;j-laptop,56437,1303187684214.compactor] regionserver.SplitTransaction(207): Starting split of region lucene,,1303187697156.d9ccbf93327587883207d3151bd74e76.

and just moments after that:

2011-04-18 21:35:08,477 INFO [main] hbase.HBaseTestingUtility(410): Shutting down minicluster

and splitting is still going on; it's eventually done when this is printed:

2011-04-18 21:35:08,621 INFO [RegionServer:0;j-laptop,56437,1303187684214.compactor] catalog.MetaEditor(85): Offlined parent region lucene,,1303187697156.d9ccbf93327587883207d3151bd74e76. in META

Calling a split is async, so your client doesn't wait for that operation to end; it goes forward and the test ends (by closing the cluster). It seems to be a bug: if any region is split and the parent is marked offline while no other region is opened, the master will start closing root and meta, which screws up the opening of the daughters:

2011-04-18 21:35:09,853 INFO [j-laptop,56437,1303187684214-daughterOpener=6ee9617a0f64eeeca10c6807eb807b84] catalog.CatalogTracker(441): Failed verification of .META.,,1 at address=j-laptop:56437; org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1

Please open a jira. J-D
Re: HBase Performance
On Wed, Apr 6, 2011 at 2:39 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
> Look for how Facebook is using HBase for messages. Also look for how we have been using HBase at StumbleUpon for 2 years now, for both live and batch queries. Numbers are usually included in the decks.

In addition to this, one of our clusters also houses our time series database (see http://opentsdb.net), and it stores many billions of data points, writes about 250 million new data points per day (that's between 2500 and 3500 writes per second, 24/7), and it's frequently asked to read back hundreds of thousands of data points per second (when someone opens a dashboard, for instance). Note that this makes up only a small fraction of the workload on that cluster.

The writes are fast. I looked at a few hundred requests on one of our TSDs (the application server of OpenTSDB that talks to HBase) and here's what I saw for writes:
Average: 22ms
Median: 19ms (±14ms)
99th percentile: 62ms

Note that the TSD batches multiple writes together, so the timings above are for batches of edits. The batches have the following sizes:
Average: 15KB
Median: 10KB (±21KB)
99th percentile: 126KB

Reads of similar sizes tend to be equally fast, especially if you're able to achieve a good hit rate in the block cache. The main gotcha with using HBase for user-facing, latency-sensitive applications is that right now HBase is pretty slow to split or move regions around. When this happens, data will become unavailable for up to several seconds. You need to make sure that your application can handle such periods of unavailability every once in a while. When an HBase server fails, it might take several seconds, or sometimes over a minute, for HBase to bring the data that was served by that server back online on another server. The recovery time depends on many factors (how many regions you have, what size they are, and how many logs and what size they are, chiefly).
Having said that, there are strategies to minimize the time during which data is unavailable, so HBase can only improve in this respect. -- Benoit tsuna Sigoure Software Engineer @ www.StumbleUpon.com
Re: Repeating log message in a [custom] unit test
Some more digging, the reason it stays stuck is that the DaughterOpener thread uses the region server's CatalogTracker which has a default timeout of Integer.MAX_VALUE and it was stuck in this code: while(!stopped !metaAvailable.get() (timeout == 0 || System.currentTimeMillis() stop)) { if (getMetaServerConnection(true) != null) { return metaLocation; } metaAvailable.wait(timeout == 0 ? 50 : timeout); } I can figure that getMetaServerConnection was called, then it wasn't able to find .META. and then -ROOT-, so it returned null and started waiting. Instead we should wait in increments (basically always sleep a small amount of time, up to the specified timeout). On a future loop it would have seen that the server was stopped. J-D On Tue, Apr 19, 2011 at 11:04 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: So you have your special lucene region that's opened on some region server and when the master starts shutting down, it doesn't seem to see it because while closing regions it says: 2011-04-18 21:35:09,221 INFO [IPC Server handler 4 on 32141] master.ServerManager(283): Only catalog regions remaining; running unassign But the region is still assigned. I see that first it did: 2011-04-18 21:35:08,474 INFO [RegionServer:0;j-laptop,56437,1303187684214.compactor] regionserver.SplitTransaction(207): Starting split of region lucene,,1303187697156.d9ccbf93327587883207d3151bd74e76. and just moments after that: 2011-04-18 21:35:08,477 INFO [main] hbase.HBaseTestingUtility(410): Shutting down minicluster and splitting is still going on, it's eventually done when this is printed: 2011-04-18 21:35:08,621 INFO [RegionServer:0;j-laptop,56437,1303187684214.compactor] catalog.MetaEditor(85): Offlined parent region lucene,,1303187697156.d9ccbf93327587883207d3151bd74e76. in META Calling a split is async, so your client doesn't wait for that operation to end so it goes forward and the test ends (by closing the cluster). 
It seems to be a bug, if any region is split and the parent is marked offline when no other region is opened, the master will start closing root and meta which screws the opening of the daughters: 2011-04-18 21:35:09,853 INFO [j-laptop,56437,1303187684214-daughterOpener=6ee9617a0f64eeeca10c6807eb807b84] catalog.CatalogTracker(441): Failed verification of .META.,,1 at address=j-laptop:56437; org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1 Please open a jira. J-D
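The incremental-wait fix J-D describes (always sleep a small bounded slice, so a stop request is noticed even when the caller passed a huge timeout) can be sketched as a standalone model. This is a hypothetical simplification, not the actual CatalogTracker code; the field names only mirror the snippet quoted above:

```java
public class IncrementalWaiter {
    private volatile boolean stopped = false;
    private volatile boolean available = false;
    private final Object lock = new Object();

    /** Wait until available (or stopped, or the timeout expires), sleeping
     *  in 50 ms increments instead of one monolithic wait(timeout). */
    public boolean waitForAvailable(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            while (!stopped && !available && System.currentTimeMillis() < deadline) {
                long remaining = deadline - System.currentTimeMillis();
                // Bounded slice: on the next loop iteration a stop is seen,
                // even if timeoutMs was Integer.MAX_VALUE.
                lock.wait(Math.min(50, Math.max(1, remaining)));
            }
            return available;
        }
    }

    public void stop() {
        stopped = true;
        synchronized (lock) { lock.notifyAll(); }
    }

    public void makeAvailable() {
        available = true;
        synchronized (lock) { lock.notifyAll(); }
    }
}
```

With the original single `wait(timeout)`, a stop issued mid-wait would not be observed until the full timeout elapsed; here the loop re-checks the stop flag at most 50 ms later.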
Re: hbase 0.90.2 - incredibly slow response
I was hoping that too.. I don't have scripts to generate # requests from shell..I will try that.. I didn't pre-create regions in 0.20.6 and it handled the same load fine.. I'll try performance in 0.90.2 by precreating regions.. Would sharing a single HBaseConfiguration object for all threads hurt performance? frustrating..thanks for your help -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Sent: Tue, Apr 19, 2011 1:40 pm Subject: Re: hbase 0.90.2 - incredibly slow response 0.90.2 should be faster. Running same query from shell, it gives you same lag? St.Ack On Tue, Apr 19, 2011 at 10:35 AM, Venkatesh vramanatha...@aol.com wrote: Just upgraded to 0.90.2 from 0.20.6..Doing a simple put to table ( 100 bytes per put).. Only code change was to retrofit the HTable API to work with 0.90.2 Initializing HBaseConfiguration in servlet.init()... reusing that config for HTable constructor doing put Performance is very slow 90% of requests are well over 2 sec..(With 0.20.6, 90% used to be 10 milli sec) I did run set_meta_memstore_size.rb as per the book.. Any help to debug is appreciated..I also see periodic pauses between hbase puts thanks v
Latency related configs for 0.90
Hi all, In this chapter of our 0.89 to 0.90 migration saga, we are seeing what we suspect might be latency related artifacts. The setting: - Our EC2 dev environment running our CI builds - CDH3 U0 (both hadoop and hbase) setup in pseudo-clustered mode We have several unit tests that have started mysteriously failing in random ways as soon as we migrated our EC2 CI build to the new 0.90 CDH3. Those tests used to run against 0.89 and never failed before. They also run OK on our local macbooks. On EC2, we are seeing lots of issues where the setup data is not being persisted in time for the tests to assert against them. They are also not always being torn down properly. We first suspected our new code around secondary indexes; we do have extensive unit tests around it that provide us with a solid level of confidence that it works properly in our CRUD scenarios. We also performance tested against the old hbase-trx contrib code and our new secondary indexes seem to be running slightly faster as well (of course, that could be due to the bump from 0.89 to 0.90). We first started seeing issues running our hudson build on the same machine as the hbase pseudo-cluster. We figured that was putting too much load on the box, so we created a separate large instance on EC2 to host just the 0.90 stack. This migration nearly quadrupled the number of unit tests failing at times. The only difference between the first and second CI setups is the network in between. Before we start tearing down our code line by line, I'd like to see if there are latency related configuration tweaks we could try to make the setup more resilient to network lag. Are there any hbase/zookeeper settings that might help? For instance, we see things such as HBASE_SLAVE_SLEEP in hbase-env.sh . Can that help? Any suggestions are more than welcome. Also, the overview above may not be enough to go on, so please let me know if I could provide more details. Thank you in advance for any help. -GS
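For what it's worth, the usual client-side latency knobs are the retry pause/count and the ZooKeeper session timeout. A minimal hbase-site.xml sketch with purely illustrative values (not recommendations; check the defaults shipped with your release before changing anything):

```xml
<!-- Illustrative values only; defaults differ per release. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value> <!-- ms before a ZK session is declared dead -->
</property>
<property>
  <name>hbase.client.pause</name>
  <value>1000</value> <!-- ms the client sleeps between retries -->
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value> <!-- retries before a client operation fails -->
</property>
```

Raising the pause/retry values makes clients more tolerant of a laggy network at the cost of slower failure detection.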
Region replication?
Hi, I imagine lots of HBase folks have read or will want to read http://blog.milford.io/2011/04/why-i-am-very-excited-about-datastaxs-brisk/ , including comments. My question has to do with one of the good comments from Edward Capriolo, who pointed out that some of the Configurations he described in his Cassandra as Memcached talk ( http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) are not possible with HBase because in HBase there is only 1 copy of any given Region and it lives on a single RegionServer (I'm assuming this is correct?), thus making it impossible to spread reads of data from one Region over multiple RegionServers: http://blog.milford.io/2011/04/why-i-am-very-excited-about-datastaxs-brisk/#comment-187253604 So I poked around on search-hadoop.com and JIRA, and looked at http://hbase.apache.org/book/regions.arch.html to see about this limitation, whether it's even mentioned as a limitation, whether there are plans to change it or if there are some configuration alternatives that would make some of those configurations described by Ed possible with HBase, but I actually didn't find any explicit information about that. Would anyone care to comment? :) Many thanks, Otis -- We're hiring HBase hackers for Data Mining and Analytics http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-machine-learning-hackers/
Re: apache hbase 0.90.2 vs CDH3 hbase0.90.1+15.18
On Tue, Apr 12, 2011 at 11:01 AM, Stack st...@duboce.net wrote: On Tue, Apr 12, 2011 at 7:28 AM, 茅旭峰 m9s...@gmail.com wrote: Hi, I've noticed that Cloudera has announced the CDH3 release, but the apache hbase 0.90.2 is also just released. All should upgrade to the CDH3 release. It includes hdfs-1520, hdfs-1555, and hdfs-1554 -- important features not present in CDH3b2/3/4 (but in branch-0.20-append for a while now) -- and the changes to hbase to make use of these additions to hdfs. CDH3 release cherry-picked critical fixes from hbase 0.90.2 so it has the good stuff. Otherwise, all should be on tip of branch-0.20-append and hbase 0.90.2. I think both of them could run smoothly on CDH3 hadoop 0.20.2+923.21 Is this some clouderism? I'm not familiar. Yep, this is the patchlevel for CDH3u0 (aka GA or stable) I agree with Stack's assessment above. We'll continue to cherry-pick bug fixes back into the CDH3 version of HBase for our quarterly update releases (CDH3u1, CDH3u2, etc). -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Region replication?
This is kind of true. There is only one regionserver to handle the reads, but there are multiple copies of the data to handle fail-over. On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Re: Region replication?
We have something on the menu: https://issues.apache.org/jira/browse/HBASE-2357 Coprocessors: Add read-only region replicas (slaves) for availability and fast region recovery Something to keep in mind is that you have to cache the data for each replica, so a row could be in 3 different caches (which also have to be warmed). I guess this is useful for very hot rows compared to a much larger read distribution, in which case you'd really want to cache it only once else you'd need 3x the memory to hold your dataset in cache. J-D On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Re: Region replication?
On Tue, Apr 19, 2011 at 4:09 PM, Ted Dunning tdunn...@maprtech.com wrote: This is kind of true. There is only one regionserver to handle the reads, but there are multiple copies of the data to handle fail-over. It is not kind of true. It is true. A summary of slide 22 is: Cassandra 20 nodes Replication Factor 20 Results in: 20 nodes capable of serving these reads! With HBase, regardless of how many HDFS file copies exist, only one RegionServer can actively serve a region.
Re: Region replication?
Thanks J-D! Yeah, what you describe below is also something that I think Edward pointed out in some of his slides - that you could route all requests for X to the place where X is when you don't want to have X cached (in app-level caches and/or OS-level caches) on multiple servers, but that sometimes you do want to waste memory like this because you have to spread requests for X over more servers. Are these two modes going to be supported in HBase? Thanks, Otis We're hiring HBase hackers for Data Mining and Analytics http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-machine-learning-hackers/ - Original Message From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, April 19, 2011 5:10:07 PM Subject: Re: Region replication?
Re: Region replication?
I don't know why you would want to serve from other region servers if all they did was transferring data, the current situation would be better. J-D On Tue, Apr 19, 2011 at 2:26 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Re: Region replication?
To make Configuration 4 possible (last slide in http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) -- Big Request Load, not so Big Data. Otis -- We're hiring HBase hackers for Data Mining and Analytics http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-machine-learning-hackers/ - Original Message From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, April 19, 2011 5:28:46 PM Subject: Re: Region replication?
Re: Region replication?
That configuration is more like what 2357 would be used for. You wrote: that you could route all requests for X to the place where X is when you don't want to have X cached And it's for that case that I say you should not go through the nodes and talk directly to the RS. J-D On Tue, Apr 19, 2011 at 2:36 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Re: hlog async replay tool.
I put a script up in https://issues.apache.org/jira/browse/HBASE-3752. I did some basic testing. Try it out. If it works for you, add a comment to the issue. St.Ack On Mon, Apr 18, 2011 at 10:49 PM, Jack Levin magn...@gmail.com wrote: In some cases its important to bring hbase up after hdfs crash without recovering hlogs first, is it possible to have a tool that just takes hlogs and replays them after HBASE already been started? -Jack
HBase - Map Reduce - Client Question
Hidey Ho, I went to a talk last week on HBase Do's and Don'ts and discovered the Java client I used to populate my HBase tables is a don't. I spent the weekend trying to come up with a better way to populate the table but couldn't, so I throw the question to the group. Conditions: Receive a new log file every ten minutes. The log files contain anywhere from 500-2,000k rows. The rows contain anywhere from 28 to 100 columns of data to be parsed. Receive a new Click Log every morning. The Click Log contains around 300-400k rows with each row having 15 columns of data. I have a six node cluster (32bit 4G RAM) with four of the servers being Region Servers. Constraints: The data in HBase from the Search Logs can't lag by more than ten minutes. Queries to HBase must have an average return time of less than one second, worst case four seconds. Reports are based on a summary of a day's data. Need to add new reports rapidly. (Under a day). Currently my 'solution' consists of a long running Java application that reads in a new Search Log when it appears, aggregates the required columns and then updates the HBase Tables. I keep a running total of the day's aggregated columns in Maps so I don't have to reread the day's data to update my totals. Currently a day's worth of data fits in 10G of memory but that won't scale for ever. The Click Logs are only read once from a database and then placed into an HBase table. I can add a new report by updating the import to collect the new data and then store that new data in a new HBase Table. I then create a new query just for that table. My question is... What would be a better approach (map/reduce, etc) that with the current conditions satisfies my constraints? Thanks -Pete
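Pete's running-totals scheme (fold each 10-minute log into per-day, per-column counters kept in Maps) can be sketched as plain Java. This is a simplified standalone model; the class name, the day key, and the column names are hypothetical, and the real application would periodically flush these totals into HBase puts rather than keep a whole day in heap:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class DailyAggregator {
    // day key -> (column -> running total); in the real app this would be
    // flushed to an HBase table after each 10-minute log cycle.
    private final Map<String, Map<String, Long>> totals = new HashMap<>();

    /** Fold one parsed log row (per-column counts) into the day's running totals. */
    public void add(String day, Map<String, Long> row) {
        Map<String, Long> dayTotals =
            totals.computeIfAbsent(day, d -> new HashMap<>());
        for (Map.Entry<String, Long> e : row.entrySet()) {
            dayTotals.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    /** Current total for one day/column; 0 if nothing has been seen yet. */
    public long get(String day, String column) {
        return totals.getOrDefault(day, Collections.<String, Long>emptyMap())
                     .getOrDefault(column, 0L);
    }
}
```

Flushing the counters per cycle (instead of holding a full day) keeps memory bounded, which addresses the "won't scale for ever" concern.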
0.90 latency performance, cdh3b4
Hi, I would like to see how i can attack hbase performance. Right now i am shooting scans returning between 3 and 40 rows and regardless of data size, approximately 500-400 QPS. The data tables are almost empty and in-memory, so they surely should fit in those 40% heap dedicated to them. My local 1-node test shows read times between 1 and 2 ms. Great. As soon as i go to our 10-node cluster, the response times drop to 25ms per scan, regardless of # of records. I set scan block cache size to 100 (rows?), otherwise i was getting outrageous numbers reaching as far out as 300-400ms. It's my understanding the timing should be actually still much closer to my local tests than to 25ms. So... how do i attack this ? increase regionserver handler count? What latency should i be able to reach for extremely small data records (<= 200 bytes)? (CDH3b4). HBase debug logging switched off. Thanks in advance. -Dmitriy
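Client-side numbers like the 1-2 ms vs 25 ms above can be collected with a tiny percentile harness; a plain-Java sketch, where the class and method are hypothetical (not an HBase API) and `op` stands in for the actual HTable scan call:

```java
import java.util.Arrays;

public class LatencyProbe {
    /** Time n invocations of op and return the latency in milliseconds
     *  at the given percentile (e.g. 50 for median, 99 for tail). */
    public static double percentileMillis(Runnable op, int n, double pct) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            op.run();                       // in the real test: one scan
            nanos[i] = System.nanoTime() - t0;
        }
        Arrays.sort(nanos);
        // Index of the pct-th percentile in the sorted samples.
        int idx = Math.min(n - 1, (int) Math.ceil(pct / 100.0 * n) - 1);
        return nanos[Math.max(0, idx)] / 1_000_000.0;
    }
}
```

Comparing the 50th and 99th percentile (rather than a mean) makes it easier to tell a uniform network penalty apart from occasional outliers like GC pauses.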
Re: 0.90 latency performance, cdh3b4
How many regions? How are they distributed? Typically it is good to fill the table somewhat and then drive some splits and balance operations via the shell. One more split to make the regions be local and you should be good to go. Make sure you have enough keys in the table to support these splits, of course. Under load, you can look at the hbase home page to see how transactions are spread around your cluster. Without splits and local region files, you aren't going to see what you want in terms of performance. On Tue, Apr 19, 2011 at 4:46 PM, Dmitriy Lyubimov dlyubi...@apache.org wrote: Hi, I would like to see how i can attack hbase performance. Right now i am shooting scans returning between 3 and 40 rows and regardless of data size, approximately 500-400 QPS. The data tables are almost empty and in-memory, so they surely should fit in those 40% heap dedicated to them. My local 1-node test shows read times between 1 and 2 ms. Great. As soon as i go to our 10-node cluster, the response times drop to 25ms per scan, regardless of # of records.
Re: 0.90 latency performance, cdh3b4
for this test, there's just no more than 40 rows in every given table. This is just a laugh check. so i think it's safe to assume it all goes to same region server. But latency would not depend on which server call is going to, would it? Only throughput would, assuming we are not overloading. And we clearly are not as my single-node local version runs quite ok response times with the same throughput. It's something with either client connections or network latency or ... i don't know what it is. I did not set up the cluster but i gotta troubleshoot it now :) On Tue, Apr 19, 2011 at 5:23 PM, Ted Dunning tdunn...@maprtech.com wrote:
Re: 0.90 latency performance, cdh3b4
PS so what should latency be for reads in 0.90, assuming moderate thruput? On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Re: HBase - Map Reduce - Client Question
We've been using pig to read bulk data from hdfs, transform it and load it into HBase using the HBaseStorage class, which has worked well for us. If you try it out you'll want to build from the 0.9.0 branch (being cut as we speak I beleive) or the trunk. There's an open pig JIRA with a patch to disable the WAL that you might want to consider too, but I don't recall the jira # OTTOMH. Bill On Tuesday, April 19, 2011, Peter Haidinyak phaidin...@local.com wrote: Hidey Ho, I went to a talk last week on HBase Do's and Don'ts and discovered the Java client I used to populate my HBase tables is a don't. I spent the weekend trying to come up with a better way to populate the table but couldn't, so I throw the question to the group. Conditions: Receive a new log file every ten minutes. The log files contain anywhere from 500-2,000k rows. The rows contain anywhere from 28 to 100 columns of data to be parsed. Receive a new Click Log every morning. The Click Log contains around 300-400k rows with each row having 15 columns of data. I have a six node cluster (32bit 4G RAM) with four of the servers being Region Servers. Constraints: The data in HBase from the Search Logs can't lag by more than ten minutes. Queries to HBase must have an average return time of less than one second, worst case four seconds. Reports are based on a summary of a day's data. Need to add new reports rapidly. (Under a day). Currently my 'solution' consists of a long running Java application that reads in a new Search Log when it appears, aggregates the required columns and then updates the HBase Tables. I keep a running total of the day's aggregated columns in Maps so I don't have to reread the day's data to update my totals. Currently a day's worth of data fits in 10G of memory but that won't scale for ever. The Click Logs are only read once from a database and the placed into an HBase table. 
I can add a new report by updating the import to collect the new data and then store that new data in a new HBase table. I then create a new query just for that table.

My question is... what would be a better approach (map/reduce, etc.) that, given the current conditions, satisfies my constraints?

Thanks
-Pete
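The usual "don't" called out in such talks is issuing one autoflushed Put per row, which costs one RPC per row. In the 0.90 client, HTable.setAutoFlush(false) together with setWriteBufferSize(...) batches puts client-side. Below is a minimal, HBase-free sketch of that buffering pattern; the class name and threshold are illustrative, not the actual client code:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of client-side write buffering -- the pattern that
// HTable.setAutoFlush(false) + setWriteBufferSize(...) enable in the 0.90
// client: puts accumulate locally and are shipped in one RPC per flush.
public class BufferedWriterSketch {
    private final List<String> buffer = new ArrayList<String>();
    private final int flushThreshold;
    private int flushCount = 0; // how many batched "RPCs" we issued

    public BufferedWriterSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(String row) {
        buffer.add(row);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        // In the real client this is where buffered Puts go out in one call.
        buffer.clear();
        flushCount++;
    }

    public int getFlushCount() { return flushCount; }

    public static void main(String[] args) {
        BufferedWriterSketch w = new BufferedWriterSketch(1000);
        for (int i = 0; i < 5000; i++) {
            w.put("row-" + i);
        }
        w.flush(); // always flush leftovers before closing
        System.out.println("batched RPCs: " + w.getFlushCount()); // 5 instead of 5000
    }
}
```

With a real HTable the shape is the same: disable autoflush, put in a loop, and call flushCommits() (or close the table) at the end so the last partial buffer is not lost.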
Re: 0.90 latency performance, cdh3b4
Also, we had another cluster running previous CDH versions with pre-0.89 HBase, and the latencies weren't nearly as bad.

On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
PS so what should latency be for reads in 0.90, assuming moderate throughput?
Re: 0.90 latency performance, cdh3b4
For a tiny test like this, everything should be in memory and latency should be very low.

On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
PS so what should latency be for reads in 0.90, assuming moderate throughput?
Re: Is it necessary to handle the ZooKeeper ConnectionLossException in ZKUtil.getDataAndWatch?
Thanks J-D. I have learned that several things can lead to a ConnectionLossException, such as full GCs, heavy swapping, or I/O waits. About the I/O-wait causes in particular, do you have any suggestions on the network setup? In my current environment I run ZooKeeper, HDFS and HBase on the same machine; is there any problem with that?

Regards,
Jieshan Bean

-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: April 19, 2011 1:14
To: user@hbase.apache.org
Subject: Re: Is it necessary to handle the ZooKeeper ConnectionLossException in ZKUtil.getDataAndWatch?

Take a look at the ZooKeeper server log, it should give you a clue. If it says there are too many connections, then you're hitting a well-known problem with HBase 0.90; just look for the other threads in this mailing list about that.

J-D

On Sat, Apr 16, 2011 at 3:01 AM, bijieshan bijies...@huawei.com wrote:
Thanks for Jean-Daniel Cryans's reply. I have referred to HBASE-3065, and it's indeed the same problem. Liyin Tang has given a fix for this issue: when the ConnectionLossException happens, retry reconnecting to the ZK server a few times. It can reconnect successfully with high probability, but not always. In my scenario:
1. The ConnectionLossException happened.
2. The HMaster process aborted because its session expired.
3. When I restarted the HMaster process, the ConnectionLossException happened again, so the initialization failed and the HMaster aborted again.
My question is: under what conditions does the ConnectionLossException happen? I know network problems can cause it. Are there any other possibilities? Thanks!

Jieshan Bean
===
-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: April 15, 2011 2:27
To: user@hbase.apache.org
Subject: Re: Is it necessary to handle the ZooKeeper ConnectionLossException in ZKUtil.getDataAndWatch?
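The retry approach discussed above can be sketched as a small bounded-retry helper with exponential backoff. This is an illustration of the idea, not HBase's or HBASE-3065's actual code; the exception type, retry count and backoff values are all placeholders (in real code the recoverable case is KeeperException.ConnectionLossException):

```java
import java.util.concurrent.Callable;

// Sketch of bounded retries with exponential backoff for a recoverable
// error. RecoverableException stands in for ZooKeeper's
// KeeperException.ConnectionLossException.
public class ZkRetrySketch {
    static class RecoverableException extends Exception {}

    public static <T> T withRetries(Callable<T> op, int maxRetries, long backoffMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (RecoverableException e) {
                if (attempt >= maxRetries) throw e; // give up: likely a real outage
                Thread.sleep(backoffMs << attempt);  // back off: 1x, 2x, 4x, ...
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] failures = {2}; // fail twice, then succeed
        String result = withRetries(new Callable<String>() {
            public String call() throws Exception {
                if (failures[0]-- > 0) throw new RecoverableException();
                return "data";
            }
        }, 3, 10);
        System.out.println(result); // prints "data" after two retries
    }
}
```

Note this only helps with transient blips; if the cause is a long GC pause or swapping on the shared machine, the session still expires once the pause exceeds the ZooKeeper session timeout, and no amount of client retrying fixes that.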
I guess we should; there's https://issues.apache.org/jira/browse/HBASE-3065 that's open, but in your case, like I mentioned in your other email, there seems to be something weird in your environment.

J-D

On Thu, Apr 14, 2011 at 12:51 AM, bijieshan bijies...@huawei.com wrote:
Hi,
The KeeperException$ConnectionLossException exception occurred while the cluster was running. As we know, it's a recoverable ZooKeeper exception (and it has been handled in the ZooKeeperWatcher.ZooKeeperWatcher method), and the suggestion is that we should retry for a while. Is that necessary? Here are the exception logs:

2011-03-21 13:26:53,135 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6-0x22e8e6ee15f0046 Unable to get data of znode /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
    at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
2011-03-21 13:26:53,137 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:6-0x22e8e6ee15f0046 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
    at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)

Looking forward to your reply! Thank you.

Regards,
Jieshan Bean
Re: Possible dead lock
I see what you are saying, and I understand the deadlock, but what escapes me is why ResourceBundle has to go touch all the classes every time to find the locale, as I see two threads doing the same thing. Maybe my understanding of what it does is just poor, but I also see that you are using the YourKit profiler, so that's one more variable in the equation. In any case, using a Date strikes me as odd; using a long representing System.currentTimeMillis() is usually what we do.

J-D

On Tue, Apr 19, 2011 at 9:16 PM, Ramkrishna S Vasudevan ramakrish...@huawei.com wrote:
Possible deadlock in HBase.
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Interesting project, Alex. Since there are bucketsCount scanners compared to one scanner originally, have you performed load testing to see the impact?

Thanks

On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau alex.barano...@gmail.com wrote:
Hello guys,

I'd like to introduce a new small Java project/lib around HBase: HBaseWD. It is aimed to help with distributing the load (across region servers) when writing records that have sequential row keys. It implements the solution which was discussed several times on this mailing list (e.g. here: http://search-hadoop.com/m/gNRA82No5Wk).

Please find the sources at https://github.com/sematext/HBaseWD (there's also a jar of the current version for convenience). It is very easy to make use of: e.g. I added it to one existing project with 1+2 lines of code (one where I write to HBase and 2 for configuring the MapReduce job).

Any feedback is highly appreciated! Please find below a short intro to the lib [1].

Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

[1] Description:

HBaseWD stands for Distributing (sequential) Writes. It was inspired by discussions on the HBase mailing lists around the problem of choosing between:
* writing records with sequential row keys (e.g. time-series data with a row key built from a timestamp)
* using random unique IDs for records

The first approach makes it possible to perform fast range scans with the help of start/stop keys on a Scanner, but creates a single-region-server hot-spotting problem when writing data (as row keys go in sequence, all records end up written into a single region at a time). The second approach aims for the fastest write performance by distributing new records over random regions, but makes fast range scans over the written data impossible. The suggested approach sits between the two above, and proved to perform well by distributing records over the cluster during writes while still allowing range scans over the data.
HBaseWD provides a very simple API, which makes it easy to use with existing code. Please refer to the unit tests for usage info, as they are meant to act as examples.

Brief Usage Info (Examples):

Distributing records with sequential keys over up to Byte.MAX_VALUE buckets:

  byte bucketsCount = (byte) 32; // distributing into 32 buckets
  RowKeyDistributor keyDistributor =
      new RowKeyDistributorByOneBytePrefix(bucketsCount);
  for (int i = 0; i < 100; i++) {
    Put put = new Put(keyDistributor.getDistributedKey(originalKey));
    ... // add values
    hTable.put(put);
  }

Performing a range scan over written data (internally bucketsCount scanners are executed):

  Scan scan = new Scan(startKey, stopKey);
  ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
  for (Result current : rs) {
    ...
  }

Performing a MapReduce job over the data chunk specified by a Scan:

  Configuration conf = HBaseConfiguration.create();
  Job job = new Job(conf, "testMapreduceJob");
  Scan scan = new Scan(startKey, stopKey);
  TableMapReduceUtil.initTableMapperJob("table", scan,
      RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
  // Substituting the standard TableInputFormat which was set in
  // TableMapReduceUtil.initTableMapperJob(...)
  job.setInputFormatClass(WdTableInputFormat.class);
  keyDistributor.addInfo(job.getConfiguration());

Extending Row Key Distribution Patterns:

HBaseWD is designed to be flexible and to support custom row key distribution approaches. To define custom row key distribution logic, just extend the AbstractRowKeyDistributor abstract class, which is really very simple:

  public abstract class AbstractRowKeyDistributor implements Parametrizable {
    public abstract byte[] getDistributedKey(byte[] originalKey);
    public abstract byte[] getOriginalKey(byte[] adjustedKey);
    public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
    ... // some utility methods
  }
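For readers without the library at hand, the one-byte-prefix idea can be sketched in isolation. The sketch below mirrors the getDistributedKey/getOriginalKey pair but does not depend on HBaseWD and is not its actual implementation; the hashing choice is an assumption made for illustration:

```java
import java.util.Arrays;

// Self-contained sketch of the one-byte-prefix idea behind
// RowKeyDistributorByOneBytePrefix: prepend hash(key) % bucketsCount as a
// single byte so sequential keys spread over bucketsCount key ranges, and
// strip it again when reading back the original key.
public class OneBytePrefixSketch {
    private final byte bucketsCount;

    public OneBytePrefixSketch(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    public byte[] getDistributedKey(byte[] originalKey) {
        // Deterministic bucket in [0, bucketsCount) for a given key.
        byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] out = new byte[originalKey.length + 1];
        out[0] = prefix;
        System.arraycopy(originalKey, 0, out, 1, originalKey.length);
        return out;
    }

    public byte[] getOriginalKey(byte[] distributedKey) {
        // Dropping the first byte recovers the original key.
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        OneBytePrefixSketch d = new OneBytePrefixSketch((byte) 32);
        byte[] key = "event-0001".getBytes();
        byte[] dist = d.getDistributedKey(key);
        // Round trip: stripping the prefix recovers the original key.
        System.out.println(Arrays.equals(key, d.getOriginalKey(dist))); // true
    }
}
```

This also shows why a distributed range scan needs bucketsCount scanners, as Ted asked about above: one contiguous original key range becomes bucketsCount disjoint prefixed ranges, one per possible prefix byte.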