How did you change the timeout(s)?

bq. timeout is currently set to 60000
Did you pass hbase-site.xml using --files to the Spark job?

Cheers

On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

> Using standalone Spark. I don't recall seeing connection-lost errors, but
> there are lots of logs. I've set the scanner and RPC timeouts to large
> numbers on the servers.
>
> But I also saw in the logs:
>
> org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> passed since the last invocation, timeout is currently set to 60000
>
> Not sure where that is coming from. Does the driver machine making the
> queries need to have the timeout config also?
>
> And why so large, am I doing something wrong?
>
>
> On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:
>
> Mich:
> The OutOfOrderScannerNextException indicates a problem with the read from
> HBase.
>
> How did you know the connection to the Spark cluster was lost?
>
> Cheers
>
> On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh
> <[email protected]> wrote:
>
>> Looks like it lost the connection to the Spark cluster.
>>
>> What mode are you using with Spark: standalone, YARN, or another? The
>> issue looks like a resource-manager issue.
>>
>> I have seen this when running Zeppelin with Spark on HBase.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:
>>
>>> I'm getting data from HBase using a large Spark cluster with parallelism
>>> of near 400. The query fails quite often with the message below.
>>> Sometimes a retry will work and sometimes the ultimate failure results
>>> (below).
>>>
>>> If I reduce parallelism in Spark it slows other parts of the algorithm
>>> unacceptably. I have also experimented with very large RPC/scanner
>>> timeouts of many minutes, to no avail.
>>>
>>> Any clues about what to look for or what may be set up wrong in my
>>> tables?
>>>
>>> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
>>> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
>>> ip-172-16-3-9.eu-central-1.compute.internal):
>>> org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
>>> OutOfOrderScannerNextException: was there a rpc timeout?
>>> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
>>> at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
>>> at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
>>> at ...
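To illustrate the point in the replies above (the scanner and RPC timeouts are client-side settings, so the Spark driver and executors must see them too, not just the region servers), here is a minimal sketch of reading an HBase table from Spark with those keys set explicitly. The table name, timeout values, and caching value are illustrative assumptions, not taken from the thread. Note also that `381788ms passed since the last invocation` means the client waited too long between `next()` calls; lowering `hbase.client.scanner.caching` so each scanner RPC returns fewer rows (and therefore returns sooner) is often a more direct fix for OutOfOrderScannerNextException than ever-larger timeouts.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext

// A sketch only: "my_table" and the timeout/caching values are assumptions.
def scanTable(sc: SparkContext, table: String = "my_table") = {
  // Picks up hbase-site.xml if it is on the driver/executor classpath.
  val conf = HBaseConfiguration.create()
  conf.set(TableInputFormat.INPUT_TABLE, table)

  // Client-side timeouts; 60000 ms is the default visible in the error above.
  conf.set("hbase.client.scanner.timeout.period", "600000") // 10 minutes
  conf.set("hbase.rpc.timeout", "600000")

  // Fetch fewer rows per scanner RPC so each next() call returns well inside
  // the lease period; large caching values under slow per-row processing are
  // a common cause of OutOfOrderScannerNextException.
  conf.set("hbase.client.scanner.caching", "100")

  sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
    classOf[ImmutableBytesWritable], classOf[Result])
}
```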

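On the `--files` question: `spark-submit --files /etc/hbase/conf/hbase-site.xml ...` ships the file to each executor's working directory, but `HBaseConfiguration.create()` only reads it if it ends up on the classpath; otherwise the client silently falls back to the 60000 ms defaults from hbase-default.xml. A quick way to check which configuration the driver actually picked up is to print the effective values. This is a sketch, not from the thread:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration

object CheckHBaseTimeouts {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    // If these print 60000, the edited hbase-site.xml was never picked up --
    // which is exactly what shipping it with --files (or putting it on the
    // driver/executor classpath) is meant to fix.
    println("scanner timeout = " + conf.get("hbase.client.scanner.timeout.period"))
    println("rpc timeout     = " + conf.get("hbase.rpc.timeout"))
  }
}
```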