[ https://issues.apache.org/jira/browse/HBASE-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-26997.
---------------------------------------
    Resolution: Not A Problem

I'm going to close this after all. I realized that retries of 
UnknownScannerException are scoped to the individual RPC, not to the scanner as 
a whole. So, in theory, a job could exceed the scanner lease timeout on every 
single next() call and still continue. That may not be the most efficient 
approach, but by that point the job is already not running efficiently, and 
calling renewLease() probably won't improve that.
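To make the argument concrete, here is a toy Java model (hypothetical classes, not HBase's actual ClientScanner) in which each next() call corresponds to one RPC. When the lease has expired, the client reopens the scanner at the first row it has not yet returned and retries, and the retry budget resets on every call. Under that assumption, a scan whose caller stalls past the lease on every call still returns every row, at the cost of one reopen per call.

```java
// Hypothetical stand-in for HBase's UnknownScannerException.
final class LeaseExpiredException extends RuntimeException {
    LeaseExpiredException() {
        super("scanner lease expired");
    }
}

// Toy server-side scanner over rows 0..rowCount-1. Its lease expires when
// more than leaseMs of simulated time passes between calls.
final class ToyServerScanner {
    private final int rowCount;
    private final long leaseMs;
    private int position;
    private long lastCallMs;

    ToyServerScanner(int rowCount, int startRow, long leaseMs, long nowMs) {
        this.rowCount = rowCount;
        this.position = startRow;
        this.leaseMs = leaseMs;
        this.lastCallMs = nowMs;
    }

    // One "RPC": returns the next row, or -1 when the scan is exhausted.
    int next(long nowMs) {
        if (nowMs - lastCallMs > leaseMs) {
            throw new LeaseExpiredException();
        }
        lastCallMs = nowMs;
        return position < rowCount ? position++ : -1;
    }
}

// Toy client scanner. The retry budget is per next() call, so an expired
// lease costs a reopen rather than failing the whole scan.
final class ToyClientScanner {
    private final int rowCount;
    private final long leaseMs;
    private final int maxRetriesPerCall;
    private ToyServerScanner server;
    private int nextRow;  // first row not yet returned to the caller
    int reopens;          // how many times the scanner was reopened

    ToyClientScanner(int rowCount, long leaseMs, int maxRetriesPerCall, long nowMs) {
        this.rowCount = rowCount;
        this.leaseMs = leaseMs;
        this.maxRetriesPerCall = maxRetriesPerCall;
        this.server = new ToyServerScanner(rowCount, 0, leaseMs, nowMs);
    }

    int next(long nowMs) {
        // attempt is local, so the retry budget starts fresh on every call.
        for (int attempt = 0; ; attempt++) {
            try {
                int row = server.next(nowMs);
                if (row >= 0) {
                    nextRow = row + 1;
                }
                return row;
            } catch (LeaseExpiredException e) {
                if (attempt >= maxRetriesPerCall) {
                    throw e;
                }
                // Reopen at the first unreturned row and retry this call.
                server = new ToyServerScanner(rowCount, nextRow, leaseMs, nowMs);
                reopens++;
            }
        }
    }
}
```

With a 100 ms lease and 1000 ms between calls, every call hits an expired lease, yet the scan returns rows 0 through 4 and reopens the scanner 6 times (once per call, including the final call that finds the scan exhausted). If the same 1-retry budget were instead shared across the whole scan, the second call would already fail.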

> Auto renew scanner lease in TableRecordReader
> ---------------------------------------------
>
>                 Key: HBASE-26997
>                 URL: https://issues.apache.org/jira/browse/HBASE-26997
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>
> A common problem with Hadoop jobs is a mapper that takes too long to 
> process individual inputs. This is especially problematic with 
> TableInputFormat: if you don't finish processing a scanner.next() batch 
> within the scanner timeout period, your job will fail with 
> UnknownScannerException.
> The usual fix is to lower the value passed to Scan.setCaching, so that fewer 
> rows are returned in each batch. This isn't always a great solution, because 
> batches may not be uniform in processing time, or even a single row (the 
> smallest caching size) might take a long time to process.
> We can improve this for users by providing a configurable period at which 
> the TableRecordReader automatically calls scanner.renewLease(), unless 
> next() was called recently.
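The trade-off behind lowering Scan.setCaching is simple arithmetic: a batch of n rows that each take up to r ms must be processed within the scanner timeout T, so n can be at most floor(T / r), and once r exceeds T even a caching value of 1 cannot fit. The following Java sketch is a hypothetical helper (not an HBase API) that computes that bound, assuming a known worst-case per-row time; T would normally be the value of hbase.client.scanner.timeout.period, which defaults to 60000 ms.

```java
// Hypothetical helper (not an HBase API): the largest Scan caching value
// whose batch fits in the scanner lease, given a worst-case per-row time.
final class CachingEstimate {
    private CachingEstimate() {}

    /**
     * @param scannerTimeoutMs lease period, e.g. the value of
     *                         hbase.client.scanner.timeout.period
     * @param worstCaseRowMs   assumed worst-case mapper time for one row
     * @return the largest n with n * worstCaseRowMs <= scannerTimeoutMs,
     *         or 0 when even a single row does not fit
     */
    static int maxSafeCaching(long scannerTimeoutMs, long worstCaseRowMs) {
        if (scannerTimeoutMs <= 0 || worstCaseRowMs <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        long n = scannerTimeoutMs / worstCaseRowMs;  // floor(T / r)
        return (int) Math.min(n, Integer.MAX_VALUE);
    }
}
```

For example, with the 60-second default and a worst case of 500 ms per row, the bound is 120 rows; at 90 seconds per row it is 0, which is exactly the case where reducing caching cannot help.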

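A minimal Java sketch of the proposed behavior follows. It assumes a hypothetical LeaseRenewable interface in place of the scanner; HBase's ResultScanner does have a renewLease() method, but every other name here is illustrative, and this is not the HBASE-26997 patch. A background task ticks every periodMs and renews the lease only if next() has not been called within the last periodMs; synchronization keeps renewLease() from running concurrently with next(). The clock is injectable so the renewal rule can be checked without sleeping.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical interface standing in for the scanner; only renewLease()
// mirrors a method that HBase's ResultScanner actually has.
interface LeaseRenewable {
    boolean renewLease();
}

// Sketch of the proposed auto-renewal: every periodMs, renew the lease
// unless next() was called within the last periodMs.
final class LeaseAutoRenewer implements AutoCloseable {
    private final LeaseRenewable scanner;
    private final long periodMs;
    private final LongSupplier clockMs;  // injectable for testing
    private long lastNextMs;
    private ScheduledExecutorService executor;

    LeaseAutoRenewer(LeaseRenewable scanner, long periodMs, LongSupplier clockMs) {
        this.scanner = scanner;
        this.periodMs = periodMs;
        this.clockMs = clockMs;
        this.lastNextMs = clockMs.getAsLong();
    }

    // Starts the background task on a daemon thread, so it never blocks JVM exit.
    synchronized void start() {
        if (executor != null) {
            return;
        }
        executor = Executors.newSingleThreadScheduledExecutor(task -> {
            Thread t = new Thread(task, "scanner-lease-renewer");
            t.setDaemon(true);
            return t;
        });
        executor.scheduleAtFixedRate(() -> {
            try {
                maybeRenew();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel future ticks; a real
                // reader would log it here and keep going.
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // Wraps a call to scanner.next(); synchronized so renewLease() never
    // runs concurrently with it.
    synchronized <T> T callNext(Supplier<T> nextCall) {
        T result = nextCall.get();
        lastNextMs = clockMs.getAsLong();
        return result;
    }

    // One tick: renew only if next() has been idle for at least periodMs.
    synchronized boolean maybeRenew() {
        if (clockMs.getAsLong() - lastNextMs >= periodMs) {
            return scanner.renewLease();
        }
        return false;
    }

    @Override
    public synchronized void close() {
        if (executor != null) {
            executor.shutdownNow();
        }
    }
}
```

Because ticks run at a fixed rate, a next() just after one tick suppresses the following tick, so the lease can go up to about 2 × periodMs without either a next() or a renewal. Under this design, periodMs should be comfortably below half the scanner timeout.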


--
This message was sent by Atlassian Jira
(v8.20.7#820007)
