[jira] [Updated] (HBASE-26997) Auto renew scanner lease in TableRecordReader

2022-05-04 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault updated HBASE-26997:
--
Status: Open  (was: Patch Available)

> Auto renew scanner lease in TableRecordReader
> -
>
> Key: HBASE-26997
> URL: https://issues.apache.org/jira/browse/HBASE-26997
> Project: HBase
>  Issue Type: New Feature
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: patch-available
>
> A common problem with Hadoop jobs is a mapper that takes too long to
> process individual inputs. This is especially problematic with
> TableInputFormat: if you don't process a scanner.next() batch within
> the scanner timeout period, your job fails with UnknownScannerException.
> The usual fix is to reduce Scan.setCaching so that fewer rows are
> returned in each batch. This isn't always a good solution, because
> batches may not be uniform in their processing time, and even
> processing a single row (the smallest caching size) might take a while.
> We can improve this for users by providing a configurable period at which the
> TableRecordReader automatically calls scanner.renewLease() unless next()
> was called recently.
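
The renewal rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the patch attached to the issue: the class name IdleLeaseRenewer, its methods, and the 30-second period are all hypothetical, and the clock is injected so the decision logic can be exercised without waiting. The only HBase call assumed is ResultScanner.renewLease(), which would be passed in as the renewLease callback (for example scanner::renewLease).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch of "renew unless next() was recently called". */
public class LeaseRenewalSketch {

  static final class IdleLeaseRenewer {
    private final BooleanSupplier renewLease; // assumption: wraps scanner.renewLease()
    private final LongSupplier clockMillis;   // injectable clock, e.g. System::currentTimeMillis
    private final long periodMillis;          // the configurable renewal period
    private final AtomicLong lastActivity;    // time of the last next() or renewal
    private final AtomicInteger renewals = new AtomicInteger();

    IdleLeaseRenewer(BooleanSupplier renewLease, LongSupplier clockMillis, long periodMillis) {
      this.renewLease = renewLease;
      this.clockMillis = clockMillis;
      this.periodMillis = periodMillis;
      this.lastActivity = new AtomicLong(clockMillis.getAsLong());
    }

    /** The record reader would call this each time it calls scanner.next(). */
    void onNext() {
      lastActivity.set(clockMillis.getAsLong());
    }

    /**
     * Intended to run periodically, e.g. from a ScheduledExecutorService
     * (wiring not shown). Returns true only if a renewal was sent and succeeded.
     */
    boolean tick() {
      long now = clockMillis.getAsLong();
      if (now - lastActivity.get() < periodMillis) {
        return false; // next() or a renewal happened recently; nothing to do
      }
      if (renewLease.getAsBoolean()) {
        lastActivity.set(now);
        renewals.incrementAndGet();
        return true;
      }
      return false; // renewal failed; the caller decides how to react
    }

    int renewals() {
      return renewals.get();
    }
  }

  public static void main(String[] args) {
    AtomicLong fakeClock = new AtomicLong(0);
    IdleLeaseRenewer renewer = new IdleLeaseRenewer(() -> true, fakeClock::get, 30_000);

    fakeClock.set(10_000);
    System.out.println(renewer.tick()); // false: still inside the period

    fakeClock.set(30_000);
    System.out.println(renewer.tick()); // true: idle for a full period, lease renewed
  }
}
```

In a real reader the period would need to sit comfortably below the scanner timeout, and whether renewLease() can safely be called from a thread other than the one calling next() is something to check against the client documentation rather than assume from this sketch.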



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26997) Auto renew scanner lease in TableRecordReader

2022-05-04 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault updated HBASE-26997:
--
Labels: patch-available  (was: )
Status: Patch Available  (was: Open)
