> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?
I've been having difficulty reproducing this behavior in a controlled
manner. While I haven't been able to get my client to hang while
doing deletes, I have found a query that, when issued after a big delete,
takes a very long time (>10 minutes).
The timeline for this is:
1) insert a bunch of data
2) delete it all in 500 calls to HTable.delete(List<Delete>)
3) scan the table for the data that was just deleted (500 scans with
various start/end rows, where each scan bails as soon as it finds the
first key of the first row for its particular start/end pair)
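To make the timeline concrete, here is a toy model of steps 1-3 in plain Java. It is not HBase code: a sorted map stands in for a store file, and the class and method names (TombstoneScanModel, firstLiveRow, key) are made up for illustration. It assumes, without confirmation from this thread, that the slowdown comes from scans having to walk past deleted cells that are still physically present. Under that assumption, a scan that bails at the first live key is cheap on live data but has to examine the whole range once everything in it has been deleted.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for a store file. This is NOT HBase code.
class TombstoneScanModel {
    // row key -> true if the cell is live, false if it is masked by a delete
    private final TreeMap<String, Boolean> cells = new TreeMap<String, Boolean>();
    // number of cells the most recent scan had to examine
    long lastScanExamined = 0;

    void put(String row) {
        cells.put(row, Boolean.TRUE);
    }

    // Assumption: a delete masks the cell but leaves it in place.
    void delete(String row) {
        if (cells.containsKey(row)) {
            cells.put(row, Boolean.FALSE);
        }
    }

    // Returns the first live row in [startRow, stopRow), or null if there is none.
    String firstLiveRow(String startRow, String stopRow) {
        lastScanExamined = 0;
        for (Map.Entry<String, Boolean> e : cells.subMap(startRow, stopRow).entrySet()) {
            lastScanExamined++;
            if (e.getValue()) {
                return e.getKey(); // bail as soon as a live key is found
            }
        }
        return null;
    }

    // Zero-padded keys so that string order matches numeric order.
    static String key(int i) {
        return String.format("row%06d", i);
    }

    public static void main(String[] args) {
        TombstoneScanModel table = new TombstoneScanModel();
        for (int i = 0; i < 100000; i++) table.put(key(i));      // 1) insert
        table.firstLiveRow(key(0), key(100000));
        System.out.println("examined on live data: " + table.lastScanExamined);
        for (int i = 0; i < 100000; i++) table.delete(key(i));   // 2) delete it all
        table.firstLiveRow(key(0), key(100000));                 // 3) scan again
        System.out.println("examined after delete: " + table.lastScanExamined);
    }
}
```

The only thing the model tracks is how many entries a scan touches before it can return, which is the quantity that would differ between the fast and slow cases if the assumption holds.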
The last step is very fast on undeleted data. For my recently deleted
data it takes ~15 min. When I look at the RS CPU, it is spiking just as
it does during my unresponsive episodes in the "wild".
Looking at the busy RS, I see:
Thread 50 (IPC Server handler 1 on 60020):
State: RUNNABLE
Blocked count: 384389
Waited count: 12192804
Stack:
org.apache.hadoop.hbase.regionserver.StoreFileScanner.peek(StoreFileScanner.java:121)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:282)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)
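For anyone who wants the same fields from their own process: the state, blocked count, waited count and stack shown above are all available through the standard java.lang.management API. A minimal sketch follows, with a made-up class name (RunnableThreadDump). It only inspects the JVM it runs in, so it illustrates the fields and is not a way to dump a remote region server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class RunnableThreadDump {
    // Formats every RUNNABLE thread of the current JVM in a layout similar
    // to the dump above.
    static String dumpRunnable() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked-monitor and synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            out.append("Thread ").append(info.getThreadId())
               .append(" (").append(info.getThreadName()).append("):\n");
            out.append("State: ").append(info.getThreadState()).append('\n');
            out.append("Blocked count: ").append(info.getBlockedCount()).append('\n');
            out.append("Waited count: ").append(info.getWaitedCount()).append('\n');
            out.append("Stack:\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpRunnable());
    }
}
```

Taking a few dumps some seconds apart and comparing the top frames of the busy handler threads shows whether they stay in the same code path.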
And lots of block cache churn in the logs:
2012-06-20 13:13:55,572 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 409.4 MB of total=3.4 GB
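To put a number on the churn, eviction lines like the one above can be pulled out of the RS log and summed. The sketch below is a hypothetical helper (EvictionLogParser is not part of HBase). Its pattern is written against the single sample line shown here, and the KB/MB/GB unit set is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EvictionLogParser {
    // Matches the tail of the sample eviction line; the unit set is assumed.
    private static final Pattern EVICTION = Pattern.compile(
        "Attempting to free ([0-9.]+) (KB|MB|GB) of total=([0-9.]+) (KB|MB|GB)");

    static double toMegabytes(double value, String unit) {
        if (unit.equals("KB")) return value / 1024.0;
        if (unit.equals("GB")) return value * 1024.0;
        return value; // already MB
    }

    // Returns {MB to free, total MB}, or null if the line is not an eviction line.
    static double[] parse(String line) {
        Matcher m = EVICTION.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            toMegabytes(Double.parseDouble(m.group(1)), m.group(2)),
            toMegabytes(Double.parseDouble(m.group(3)), m.group(4))
        };
    }

    public static void main(String[] args) {
        double[] r = parse("Block cache LRU eviction started; "
            + "Attempting to free 409.4 MB of total=3.4 GB");
        System.out.println("to free (MB): " + r[0] + ", total (MB): " + r[1]);
    }
}
```

Summing the first value over a time window gives a rough measure of how much of the cache is being cycled during the slow scans.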
> It would also help to see the log during
> that time
More of the logs are here: http://pastebin.com/4annviTS