Which release of HBase are you using? Since the query to HBase comes from Spark, I assume there is no HBase Filter involved, so HBASE-13704 wouldn't be applicable in your case.
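One thing worth trying: OutOfOrderScannerNextException typically shows up when a single scanner.next() call takes longer than the scanner lease, so lowering hbase.client.scanner.caching (fewer rows fetched per RPC) is often more effective than raising timeouts further. A minimal sketch, assuming you read through TableInputFormat with newAPIHadoopRDD (the table name and values are placeholders, not your actual settings):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object ScanSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-scan-sketch"))

    // Lower caching so each scanner.next() RPC returns quickly, and keep
    // the scanner lease / RPC timeouts in sync. Values are illustrative.
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table")        // hypothetical table name
    conf.set("hbase.client.scanner.caching", "100")           // rows fetched per next() call
    conf.set("hbase.client.scanner.timeout.period", "120000") // scanner lease, in ms
    conf.set("hbase.rpc.timeout", "120000")                   // client RPC timeout, in ms

    val rdd = sc.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(rdd.count())
  }
}

Note that the scanner timeout has to be raised on the region servers as well as the client for it to take effect.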
Can you pastebin region server log(s) around the OutOfOrderScannerNextException?

Thanks

On Fri, Oct 28, 2016 at 8:38 AM, Pat Ferrel <[email protected]> wrote:

> I'm getting data from HBase using a large Spark cluster with parallelism
> of near 400. The query fails quite often with the message below. Sometimes
> a retry will work and sometimes the ultimate failure results (below).
>
> If I reduce parallelism in Spark it slows other parts of the algorithm
> unacceptably. I have also experimented with very large RPC/Scanner timeouts
> of many minutes, to no avail.
>
> Any clues about what to look for or what may be set up wrong in my tables?
>
> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> ip-172-16-3-9.eu-central-1.compute.internal):
> org.apache.hadoop.hbase.DoNotRetryIOException:
> Failed after retry of OutOfOrderScannerNextException: was there a rpc
> timeout?
>   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
>   at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
>   at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
>   at ...
