What you are describing here seems very different from what was shown earlier.

In any case, a few remarks:

 - You have major compactions running during the time of that log
trace, this usually sucks up a lot of IO. See
http://hbase.apache.org/book.html#managed.compactions
 - That data you are deleting still needs to be read when you scan;
like I said earlier, a delete is in fact an insert in HBase, and it
isn't cleaned up until a major compaction happens. This is a lot of
dead data to cache during a scan :)
 - Do you have scanner caching turned on? Just to be sure, set
scan.setCaching(1) and see if it makes any difference.
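
To make the caching suggestion concrete, here is a minimal sketch
against the 0.92-era client API (the table name is made up; this
obviously needs a live cluster and the HBase jars to actually run):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table name
        Scan scan = new Scan();
        // Fetch one row per RPC so a single slow next() call is easy to
        // spot in the region server logs; raise this again once the
        // delete-marker problem is sorted out.
        scan.setCaching(1);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```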

Also I was re-reading your previous answers and I'd like more info about this:

> I estimate about 1000 column qualifiers per row for
> a total of about 144k cells per batch.

Are you saying that you have Delete objects on which you called
deleteColumn() 1000 times? If so, look no further: there's your
problem. It would also explain why in your previous log we don't see
any blocking, just calls that take more than 5 minutes to run. Like
Stack mentioned, a cell delete requires reading the old cell first.
Let's say that takes on average 1.5ms. Then doing the insert takes
another 1.5ms. The total time to process a 144k cell batch would be:

144000 * (1.5 + 1.5) / 1000 = 432 seconds
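
Spelled out as a tiny self-contained program (the 1.5ms figures are
the assumed averages from above, not measurements):

```java
public class DeleteCostEstimate {
    public static void main(String[] args) {
        int cellsPerBatch = 144_000;  // ~1000 qualifiers/row, per the thread
        double readMs = 1.5;          // assumed avg time to read the old cell
        double insertMs = 1.5;        // assumed avg time to write the marker
        // Read and insert happen one after the other, so the costs add:
        double totalSeconds = cellsPerBatch * (readMs + insertMs) / 1000.0;
        System.out.println(totalSeconds + " seconds"); // 432.0 seconds
    }
}
```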

This is not very far from the numbers I'm seeing in your original log:
http://pastebin.com/z9qxGuJS

"processingtimems":314804
"processingtimems":356151
"processingtimems":398570
etc

It seems you need to review how you are using HBase, because right
now, unless you aggressively major compact your tables, you won't get
good performance with all those delete markers.
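
If you don't actually need the latest-version-only semantics of
deleteColumn(), a cheaper shape of the same batch would avoid the
read entirely. A sketch against the 0.92-era API (the family name
and row list are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Delete;

public class CheaperDeletes {
    public static List<Delete> buildDeletes(List<byte[]> rows, byte[] family) {
        List<Delete> deletes = new ArrayList<Delete>();
        for (byte[] row : rows) {
            Delete d = new Delete(row);
            // deleteColumn(fam, qual) masks only the *latest* version, so
            // the server must first read the cell to find its timestamp --
            // 1000x per row in this workload. deleteFamily() (or one
            // deleteColumns(fam, qual) per qualifier, if only some columns
            // must go) writes a marker without that read.
            d.deleteFamily(family);
            deletes.add(d);
        }
        return deletes;
    }
}
```

The batch is then handed to HTable.delete(List<Delete>) exactly as
before; only the per-cell read on the server side goes away.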

J-D

On Wed, Jun 20, 2012 at 1:49 PM, Ted Tuttle <[email protected]> wrote:
>> Like Stack said in his reply, have you thread dumped the slow region
>> servers when this happens?
>
> I've been having difficulty reproducing this behavior in controlled
> manner. While I haven't been able to get my client to hang up while
> doing deletes, I have found a query that when issued after a big delete
> takes a very long (>10m) time.
>
> The timeline for this is:
>
> 1) insert a bunch of data
> 2) delete it all in 500 calls to HTable.delete(List<Delete>)
> 3) scan table for data that was just deleted (500 scans with various row
> start/end, where scan bails as soon as first key of first row is found
> for a particular row start/end pair)
>
> The last part is very fast on undeleted data.  For my recently deleted
> data this takes ~15min.  When I look at RS CPU it is spiking like my
> unresponsive episodes in the "wild".
>
> Looking at busy RS I see:
>
> Thread 50 (IPC Server handler 1 on 60020):
>  State: RUNNABLE
>  Blocked count: 384389
>  Waited count: 12192804
>  Stack:
>
>    org.apache.hadoop.hbase.regionserver.StoreFileScanner.peek(StoreFileScanner.java:121)
>    org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:282)
>    org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
>    org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
>    org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
>    org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
>    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
>    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
>    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
>    org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
>    sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
>    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    java.lang.reflect.Method.invoke(Method.java:597)
>    org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>    org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)
>
> And lots of block cache churn in the logs:
>
> 2012-06-20 13:13:55,572 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> started; Attempting to free 409.4 MB of total=3.4 GB
>
>> It would also help to see the log during
>> that time
>
> More of logs here: http://pastebin.com/4annviTS
>
