One thing to watch out for: you are starting with 421 regions on the source, but if the destination table isn't pre-split, the copy will slam all the data into a single region, which then has to split (and split, and split, etc.). Creating the destination table with split points up front avoids that.
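If it helps, here is a rough sketch of how the split points could be worked out. It has no HBase dependency and just computes evenly spaced keys between two boundary row keys. The class and method names (SplitKeys, evenSplits, toFixedWidth) are made up for illustration, and it assumes fixed-width, byte-comparable row keys, which may not match your key format. Since most of your source regions are empty, you would probably want to pass the boundaries of the live key range rather than the whole key space.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of HBase: computes evenly spaced split keys.
class SplitKeys {

    // Returns numRegions - 1 split keys strictly between startKey and endKey,
    // treating both keys as unsigned big-endian integers of equal length.
    static byte[][] evenSplits(byte[] startKey, byte[] endKey, int numRegions) {
        if (startKey.length != endKey.length) {
            throw new IllegalArgumentException("keys must have the same length");
        }
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger lo = new BigInteger(1, startKey);
        BigInteger hi = new BigInteger(1, endKey);
        BigInteger range = hi.subtract(lo);
        BigInteger n = BigInteger.valueOf(numRegions);
        // A range smaller than numRegions would produce duplicate split keys.
        if (range.compareTo(n) < 0) {
            throw new IllegalArgumentException("key range too small for " + numRegions + " regions");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = lo.add(range.multiply(BigInteger.valueOf(i)).divide(n));
            splits[i - 1] = toFixedWidth(point, startKey.length);
        }
        return splits;
    }

    // Pads with leading zeros or drops BigInteger's sign byte to get exactly width bytes.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Toy example: 4 regions over the one-byte key range 0x00..0x80.
        byte[][] splits = evenSplits(new byte[] {0x00}, new byte[] {(byte) 0x80}, 4);
        for (byte[] key : splits) {
            StringBuilder hex = new StringBuilder();
            for (byte b : key) {
                hex.append(String.format("%02x", b & 0xff));
            }
            System.out.println(hex); // prints 20, 40, 60 on separate lines
        }
    }
}
```

The resulting byte[][] is the shape that HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys) takes, so the destination table could be created with it before running CopyTable.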
http://hbase.apache.org/book.html#perf.writing

On 5/7/12 8:22 AM, "Damien HARDY" <[email protected]> wrote:

>Hello,
>
>I am trying to copy a table from one cluster to another.
>
>The source is a 2-node cluster, 16 CPU / 32 GB RAM (hadoop001, hadoop002).
>The destination is a 3-node cluster, 16 CPU / 64 GB RAM (hbase01, hbase02,
>hbase04).
>All nodes run the datanode, regionserver, master and
>zookeeper of CDH3u3.
>Region size is 2G on the source and 4G on the destination.
>Max heap is 8G on the region servers.
>There are 421 regions on the source (80% are empty because of TTL and a
>time-based rowkey),
>holding about 400k rows of syslogs (1 or 2 KB each).
>
>My problem appears when I run an MR job with
>hbase org.apache.hadoop.hbase.mapreduce.CopyTable
>--peer.adr=hbase01,hbase02,hbase04:2181:/hbase <table>
>on the source cluster. The nodes don't seem to be doing much (load average
>is low) and the load on the regionservers is uneven (going from 0 to 60k
>requests/sec in bursts).
>
>12/05/07 12:08:17 INFO mapred.JobClient: Task Id :
>attempt_201204241409_0005_m_000343_0, Status : FAILED
>org.apache.hadoop.hbase.client.ScannerTimeoutException: 184008ms passed
>since the last invocation, timeout is currently set to 120000
>    at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1179)
>    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:133)
>    at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
>    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
>    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.j
>
>After some retries the tasks seem to succeed, but it takes very long.
>
>Parameters that I tried changing when submitting the job:
>hbase.client.scanner.caching is 2000
>hbase.regionserver.lease.period is 120000
>
>What other parameters could help maximise the use of the
>cluster resources for this simple copy?
>I can't see where the bottleneck is.
>
>Thank you.
>
>--
>Damien
>
