Hmm...

Does something like the below help?


diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index f9627ed..0cee8e3 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2137,11 +2137,7 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
       }
       throw e;
     }
-    Leases.Lease lease = null;
     try {
-      // Remove lease while its being processed in server; protects against case
-      // where processing of request takes > lease expiration time.
-      lease = this.leases.removeLease(scannerName);
       List<Result> results = new ArrayList<Result>(nbRows);
       long currentScanResultSize = 0;
       List<KeyValue> values = new ArrayList<KeyValue>();
@@ -2197,10 +2193,9 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
       }
       throw convertThrowableToIOE(cleanup(t));
     } finally {
-      // We're done. On way out readd the above removed lease.  Adding resets
-      // expiration time on lease.
+      // We're done. On way out reset expiration time on lease.
       if (this.scanners.containsKey(scannerName)) {
-        if (lease != null) this.leases.addLease(lease);
+        this.leases.renewLease(scannerName);
       }
     }
   }
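
The idea is just to renew the scanner lease in place on the way out of next(),
instead of removing it for the duration of the call and re-adding it afterwards.
As a rough illustration only (this is NOT the real o.a.h.h.regionserver.Leases
class, just a standalone sketch of the renew-in-place behavior the patch leans on):

  // Hypothetical sketch: lease names mapped to expiration timestamps.
  import java.io.IOException;
  import java.util.concurrent.ConcurrentHashMap;

  class LeaseSketch {
    private final ConcurrentHashMap<String, Long> expirations =
        new ConcurrentHashMap<String, Long>();
    private final long leasePeriodMillis;

    LeaseSketch(long leasePeriodMillis) {
      this.leasePeriodMillis = leasePeriodMillis;
    }

    void addLease(String name) {
      expirations.put(name, System.currentTimeMillis() + leasePeriodMillis);
    }

    // Renew by pushing the expiration forward; a missing entry is roughly the
    // point where the real server would surface a LeaseException.
    void renewLease(String name) throws IOException {
      Long renewed = System.currentTimeMillis() + leasePeriodMillis;
      if (expirations.replace(name, renewed) == null) {
        throw new IOException("lease '" + name + "' does not exist");
      }
    }
  }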


 
Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



----- Original Message -----
> From: Jean-Daniel Cryans <[email protected]>
> To: [email protected]
> Cc: 
> Sent: Wednesday, February 15, 2012 10:17 AM
> Subject: Re: LeaseException while extracting data via pig/hbase integration
> 
> You would have to grep the lease's id, in your first email it was
> "-7220618182832784549".
> 
> About the time it takes to process each row, I meant client (pig) side
> not in the RS.
> 
> J-D
> 
> On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <[email protected]> wrote:
>>  Please see answer inline
>>  Thanks
>>  Mikael.S
>> 
>>  On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> 
>>>  On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <[email protected]> wrote:
>>>  > hi,
>>>  > Well no, i can't figure out what is the problem, but i saw that someone
>>>  > else had the same problem (see email: "LeaseException despite high
>>>  > hbase.regionserver.lease.period")
>>>  > What i can tell is the following:
>>>  > Last week the problem was consistent
>>>  > 1. I updated hbase.regionserver.lease.period=300000 (5 mins), restarted the
>>>  > cluster and still got the problem, the map got this exception even before
>>>  > the 5 mins, (some after 1 min and 20 sec)
>>> 
>>>  That's extremely suspicious. Are you sure the setting is getting picked
>>>  up? :) I hope so :-)
>>> 
>>>  You should be able to tell when the lease really expires by simply
>>>  grepping for the number in the region server log, it should give you a
>>>  good idea of what your lease period is.
>>>   grepping on which value? the lease configured here: 300000? It does not
>>>  return anything, also tried in current execution where some were ok and
>>>  some were not
>>> 
>>>  > 2. The problem occurs only on jobs that will extract a large number of
>>>  > columns (>150 cols per row)
>>> 
>>>  What's your scanner caching set to? Are you spending a lot of time
>>>  processing each row? from the job configuration generated by pig i can see
>>>  caching set to 1, regarding the processing time of each row i have no clue
>>>  how much time it spent. the data for each row is 150 columns of 2k each.
>>>  This is approx 5 blocks to bring.
>>> 
>>>  > 3. The problem never occurred when only 1 map per server is running (i have
>>>  > 8 CPU with hyper-threading enabled = 16, so using only 1 map per machine is
>>>  > just a waste), (at this stage I was thinking perhaps there is a
>>>  > multi-threaded problem)
>>> 
>>>  More mappers would pull more data from the region servers so more
>>>  concurrency from the disks, using more mappers might just slow you
>>>  down enough that you hit the issue.
>>> 
>>  Today i ran with 8 mappers and some failed and some didn't (2 of 4), they
>>  got the lease exception after 5 mins, i will try to check the
>>  logs/sar/metric files for additional info
>> 
>>> 
>>>  >
>>>  > This week i got a slightly different behavior, after having restarted the
>>>  > servers. The extracts were able to run ok in most of the runs even with 4
>>>  > maps running (per server), i got the exception only once but the job was
>>>  > not killed as in other runs last week
>>> 
>>>  If the client got an UnknownScannerException before the timeout
>>>  expires (the client also keeps track of it, although it may have a
>>>  different configuration), it will recreate the scanner.
>>> 
>>  No this is not the case.
>> 
>>> 
>>>  Which reminds me, are your regions moving around? If so, and your
>>>  clients don't know about the high timeout, then they might let the
>>>  exception pass on to your own code.
>>> 
>>  Regions are presplit ahead, i do not have any region split during the run,
>>  region size is set to 8GB, storefile is around 3.5G
>> 
>>  The test was run after major compaction, so the number of store files is 1
>>  per RS/family
>> 
>> 
>>> 
>>>  J-D
>>> 
>> 
>> 
>> 
>>  --
>>  Mikael.S
>
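
(A footnote on the two knobs discussed in the quoted thread above:
hbase.regionserver.lease.period is a server-side setting, so it has to go into the
region servers' hbase-site.xml and the servers be restarted for it to take effect,
while scanner caching is purely a client-side choice. Below is a rough client-side
sketch only, assuming the 0.90/0.92-era client API and a made-up table name:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class ScanCachingSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Client-side copy of the lease period so the client's own expectations
      // line up; the authoritative value lives on the region servers.
      conf.setLong("hbase.regionserver.lease.period", 300000L); // 5 minutes

      HTable table = new HTable(conf, "mytable"); // hypothetical table name
      Scan scan = new Scan();
      // Pig was observed using caching = 1 (one RPC per row); a larger value
      // fetches more rows per next() RPC. The lease is refreshed on each RPC,
      // so the client still has to come back within the lease period between batches.
      scan.setCaching(100);

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // process the row here
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }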
