Looks like you're running into HBASE-5898.
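
As far as I recall, that JIRA is about double-checked locking around the block cache: look in the cache first without taking any lock, and only go through the per-block lock (the IdLock.getLockEntry call in your trace) on a miss. Here is a minimal sketch of that pattern. It is not HBase code; the class, the loader callback, and the counter are all made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

/** Illustrative sketch only; names are hypothetical and not taken from HBase. */
class DoubleCheckedBlockCache {
    // Cached blocks keyed by block offset.
    private final ConcurrentHashMap<Long, byte[]> cache = new ConcurrentHashMap<>();
    // One lock object per offset. Never cleaned up here; a real implementation
    // would release entries once the load is finished.
    private final ConcurrentHashMap<Long, Object> locks = new ConcurrentHashMap<>();
    // Counts how often the slow (locked) path is taken.
    private final AtomicInteger lockAcquisitions = new AtomicInteger();
    // Stand-in for the expensive read from disk.
    private final LongFunction<byte[]> loader;

    DoubleCheckedBlockCache(LongFunction<byte[]> loader) {
        this.loader = loader;
    }

    byte[] readBlock(long offset) {
        // First check, lock-free: the common case when the block is hot.
        byte[] block = cache.get(offset);
        if (block != null) {
            return block;
        }
        // Miss: serialize loaders of the same block so it is read only once.
        Object lock = locks.computeIfAbsent(offset, k -> new Object());
        synchronized (lock) {
            lockAcquisitions.incrementAndGet();
            // Second check: another thread may have loaded it while we waited.
            block = cache.get(offset);
            if (block == null) {
                block = loader.apply(offset);
                cache.put(offset, block);
            }
            return block;
        }
    }

    int lockAcquisitions() {
        return lockAcquisitions.get();
    }
}
```

With a hot block (and the .META. blocks are about as hot as it gets), every read after the first one returns from the first check and never touches the lock map, which is where your 200 handlers are parked.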

----- Original Message -----
From: Varun Sharma <[email protected]>
To: [email protected]
Cc:
Sent: Wednesday, December 5, 2012 3:51 PM
Subject: .META. region server DDOSed by too many clients

Hi,

I am running hbase 0.94.0 and I have a significant write load being put on a table with 98 regions on a 15 node cluster - also this write load comes from a very large number of clients (~ 1000). I am running with 10 priority IPC handlers and 200 IPC handlers. It seems the region server holding .META. is DDOSed. All 200 handlers are busy serving the .META. region and they are all locked onto one object. The jstack for the region server is here:

"IPC Server handler 182 on 60020" daemon prio=10 tid=0x00007f329872c800 nid=0x4401 waiting on condition [0x00007f328807f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000542d72e30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:445)
        at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:925)
        at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:71)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:290)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
        - locked <0x000000063b4965d0> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
        - locked <0x000000063b4965d0> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
        - locked <0x0000000523c211e0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
        - locked <0x0000000523c211e0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4039)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1941)

The client side trace shows that we are looking for the META region.

"thrift-worker-3499" daemon prio=10 tid=0x00007f789dd98800 nid=0xb52 waiting for monitor entry [0x00007f778672d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:943)
        - waiting to lock <0x0000000707978298> (a java.lang.Object)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:729)
        - locked <0x000000070821d5a0> (a org.apache.hadoop.hbase.client.HTable)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:698)
        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:371)

On the RS page, I see 68 million read requests for the META region, while the other 98 regions have served about 20 million write requests in total. Regions have not moved around at all and no crashes have happened.

Why do we have such an incredible number of scans over META, and is there something I can do about this issue?

Varun
