Looking at the stack trace, I found the following hot spot:

   1. org.apache.hadoop.hbase.regionserver.StoreFileScanner.realSeekDone(StoreFileScanner.java:340)
   2. org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:331)
   3. org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:105)
   4. org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:406)
   5. org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
   6. org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
   7. org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
   8. org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
   9. org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
   10. org.apache.hadoop.hbase.regionserver.HRegion.prepareDeleteTimestamps(HRegion.java:1710)
   11. org.apache.hadoop.hbase.regionserver.HRegion.internalDelete(HRegion.java:1753)

From HRegion:

      for (KeyValue kv: kvs) {
        //  Check if time is LATEST, change to time of most recent addition if so
        //  This is expensive.
        if (kv.isLatestTimestamp() && kv.isDeleteType()) {
...
          List<KeyValue> result = get(get, false);

We perform a get() for each KeyValue whose timestamp is LATEST; that explains
the unresponsiveness.
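
For anyone hitting the same thing, one client-side workaround (a minimal
sketch only, assuming the 0.9x client API; the table handle, row, family,
qualifiers and timestamp below are placeholders, not names from this thread)
is to give each deleteColumn() an explicit timestamp, or use deleteColumns()
to drop all versions, so kv.isLatestTimestamp() is false and the per-cell
get() above is skipped:

      import java.io.IOException;
      import java.util.List;
      import org.apache.hadoop.hbase.client.Delete;
      import org.apache.hadoop.hbase.client.HTable;

      public class BulkCellDelete {
        // Delete many cells of one row without forcing the RS to resolve
        // LATEST_TIMESTAMP via an internal get() per cell.
        static void deleteCells(HTable table, byte[] row, byte[] family,
                                List<byte[]> qualifiers, long knownTs)
            throws IOException {
          Delete d = new Delete(row);
          for (byte[] q : qualifiers) {
            // Explicit timestamp => kv.isLatestTimestamp() is false.
            d.deleteColumn(family, q, knownTs);
            // Alternative: d.deleteColumns(family, q);
            // (deletes all versions via a DeleteColumn marker, which also
            // avoids the lookup)
          }
          table.delete(d);
        }
      }

Whether this is usable of course depends on whether the client actually knows
the timestamps it wrote.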

FYI

On Wed, Jun 20, 2012 at 5:07 PM, Ted Tuttle <[email protected]> wrote:

> First off, J-D, thanks for helping me work through this.  You've
> inspired some different angles and I think I've finally made it bleed in
> a controlled way.
>
> > - That data you are deleting needs to be read when you scan, like I
> > said earlier a delete is in fact an insert in HBase and this isn't
> > cleared up until a major compaction happens.
>
> I manually compacted (via UI) the table that I deleted from.  The scan
> times are still >10min.  When reading through each node's log, I see
> some messages indicating the major compactions were going to be skipped.
> Is it safe to say that hitting that 'Compact' button is just a
> recommendation?  Is there an operation we can perform after a big delete
> to guarantee that deletes get compacted away?
>
> > Do you have scanner caching turned on? Just to be sure set
> > scan.setCaching(1) and see if it makes any difference.
>
> A bit confused here.  Under what conditions would you recommend setting
> the scan caching to 1?  My read path doesn't know about whether a lot of
> data was recently deleted so I can't disable it conditionally. I want
> scan caching in general, I believe.
>
> > Are you saying that you have Delete objects on which you did
> > deleteColumn() 1000x? If so, look no further there's your problem.
>
> I am calling deleteColumn() thousands of times per Delete object.
>
> I can delete a row w/ 20k keys in ~2 sec. If I issue 10 of these (they
> appear to be fired off asynchronously by the client), the unresponsive RS
> behavior ensues.  Here is a stack dump from a RS that is running at >90%
> utilization as it processes my deletes:
>
> http://pastebin.com/8y5x4xU7
>
> Some logs around this time:
>
> http://pastebin.com/UpPMbsmn
>
> So, my takeaway is the RSs don't like being slammed w/ 100s of thousands of
> cell deletes.  I can be more measured about these deletes going forward.
> That the RSs don't handle this more gracefully sounds like a bug. At a
> minimum, there appears to be a nonlinear response. What do you think?
