Hi there,

As a sanity check on the write side, have you double-checked this section of the RefGuide, particularly the parts on pre-created regions and monotonically increasing keys?

http://hbase.apache.org/book.html#perf.writing

Also, refer to this case study as a diagnostic roadmap:

http://hbase.apache.org/book.html#casestudies.perftroub

On 4/26/12 7:38 AM, "Rajgopal Vaithiyanathan" <[email protected]> wrote:

>Hey all,
>
>The default HBaseStorage() takes a hell of a lot of time for puts.
>
>In a cluster of 5 machines, inserting 175 million records took 4 hours 45 minutes.
>Question: is this good enough?
>Each machine has 32 cores and 32 GB RAM with 7 x 600 GB hard disks. HBase's heap has been configured to 8 GB.
>If the put speed is low, how can I improve it?
>
>I tried tweaking the TableOutputFormat by increasing the WriteBufferSize to 24 MB and adding the multi-put feature (adding 10,000 puts to an ArrayList and putting them as a batch). After doing this, it started throwing:
>
>java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>
>Which I assume is because the clients took too long to put.
>
>The detailed log from one of the reduce jobs is as follows.
>
>I've 'censored' some of the details, which I assume is okay! :P
>2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>2012-04-23 20:07:13,097 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=*****.*****
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_22
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=****************************
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=**********************
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=***************************
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.38-8-server
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=raj
>
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=*********
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=**********************:
>2012-04-23 20:07:13,790 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>2012-04-23 20:07:13,822 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /172.21.208.180:2181
>2012-04-23 20:07:13,823 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
>2012-04-23 20:07:13,825 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to master/172.21.208.180:2181, initiating session
>2012-04-23 20:07:13,840 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server master/172.21.208.180:2181, sessionid = 0x136dfa124e90015, negotiated timeout = 180000
>2012-04-23 20:07:14,129 INFO com.raj.OptimisedTableOutputFormat: Created table instance for index
>2012-04-23 20:07:14,184 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
>2012-04-23 20:07:14,205 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4513e9fd
>2012-04-23 20:08:49,852 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb., hostname=slave1, port=60020
>java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
>    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
>    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
>    at com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
>    at com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
>    at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:416)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>    at org.apache.hadoop.mapred.Child.main(Child.java:249)
>Caused by: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
>    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
>    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>    at $Proxy7.multi(Unknown Source)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>    at java.lang.Thread.run(Thread.java:679)
>Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>    at java.io.FilterInputStream.read(FilterInputStream.java:133)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
>    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>    at java.io.DataInputStream.readInt(DataInputStream.java:387)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
>2012-04-23 20:09:51,018 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb., hostname=slave1, port=60020
>java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
>    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
>    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
>    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
>    at com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
>    at com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
>    at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:416)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>    at org.apache.hadoop.mapred.Child.main(Child.java:249)
>Caused by: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
>    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
>    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
>    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>    at $Proxy7.multi(Unknown Source)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>    at java.lang.Thread.run(Thread.java:679)
>Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
>    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>    at java.io.FilterInputStream.read(FilterInputStream.java:133)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
>    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>    at java.io.DataInputStream.readInt(DataInputStream.java:387)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
>
>--
>Thanks and Regards,
>Raj
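
P.S. On the pre-created regions and monotonically increasing keys point: both failures in the quoted log are against the same region of "index" (the one with the empty start key), which may or may not mean all writers are piling onto one region. If the row keys are sequential, one common workaround is to prefix each key with a hash-derived bucket and pre-create one region per bucket. The following is only a minimal sketch of that idea using the plain JDK. The class name SaltedKeys, the two-digit prefix format, and the bucket count are hypothetical choices for illustration, not anything from Raj's code or the RefGuide. The split points it produces would be handed to whatever table-creation call your HBase version offers (for example HBaseAdmin.createTable with split keys).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: spreads sequential row keys over a fixed number of buckets.
final class SaltedKeys {

    /** Maps a row key to a bucket in [0, buckets) using the key's hash code. */
    static int bucketFor(String rowKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(rowKey.hashCode(), buckets);
    }

    /** Prefixes the key with a two-digit bucket, e.g. "03-<original key>". */
    static String salt(String rowKey, int buckets) {
        return String.format(Locale.ROOT, "%02d-%s", bucketFor(rowKey, buckets), rowKey);
    }

    /** Region boundaries "01-", "02-", ...: one split point per bucket after the first. */
    static List<byte[]> splitPoints(int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        List<byte[]> splits = new ArrayList<>();
        for (int b = 1; b < buckets; b++) {
            splits.add(String.format(Locale.ROOT, "%02d-", b).getBytes(StandardCharsets.UTF_8));
        }
        return splits;
    }

    public static void main(String[] args) {
        int buckets = 5; // assumption: one bucket per machine in a 5-node cluster
        for (int i = 1; i <= 3; i++) {
            String sequentialKey = String.format(Locale.ROOT, "20120423-%06d", i);
            System.out.println(sequentialKey + " -> " + salt(sequentialKey, buckets));
        }
        System.out.println("split points: " + splitPoints(buckets).size());
    }
}
```

The trade-off is that a range scan over the original key order then has to be issued once per bucket, so this only makes sense if write throughput matters more than simple scans.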
