I will check that, but if it is a server-side setting I was not aware I also had to send it to the executors. So it’s something like a connection timeout coming from executor code?
On Oct 28, 2016, at 9:48 AM, Ted Yu <[email protected]> wrote:

How did you change the timeout(s)?

bq. timeout is currently set to 60000

Did you pass hbase-site.xml using --files to the Spark job?

Cheers

On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

> Using standalone Spark. I don’t recall seeing connection lost errors, but
> there are lots of logs. I’ve set the scanner and RPC timeouts to large
> numbers on the servers.
>
> But I also saw in the logs:
>
> org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> passed since the last invocation, timeout is currently set to 60000
>
> Not sure where that is coming from. Does the driver machine making queries
> need to have the timeout config also?
>
> And why so large, am I doing something wrong?
>
>
> On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:
>
> Mich:
> The OutOfOrderScannerNextException indicated a problem with the read from HBase.
>
> How did you know the connection to the Spark cluster was lost?
>
> Cheers
>
> On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <[email protected]> wrote:
>
>> Looks like it lost the connection to the Spark cluster.
>>
>> What mode are you using with Spark: Standalone, Yarn, or others? The issue
>> looks like a resource manager issue.
>>
>> I have seen this when running Zeppelin with Spark on Hbase.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
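[For context, a minimal sketch of the client-side hbase-site.xml Ted is asking about. Property names assume HBase 1.x; the values are illustrative, not recommendations. Shipping it with `--files hbase-site.xml` puts it on the executors' classpath rather than only the driver's:]

```xml
<!-- hbase-site.xml (client side), illustrative values only -->
<configuration>
  <!-- Client scanner lease timeout; the default of 60000 ms matches
       the "timeout is currently set to 60000" message in the logs. -->
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>600000</value>
  </property>
  <!-- The RPC timeout is usually raised alongside it. -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>600000</value>
  </property>
</configuration>
```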
>>
>> On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:
>>
>>> I’m getting data from HBase using a large Spark cluster with parallelism
>>> of near 400. The query fails quite often with the message below. Sometimes
>>> a retry will work and sometimes the ultimate failure results (below).
>>>
>>> If I reduce parallelism in Spark it slows other parts of the algorithm
>>> unacceptably. I have also experimented with very large RPC/Scanner timeouts
>>> of many minutes, to no avail.
>>>
>>> Any clues about what to look for or what may be set up wrong in my tables?
>>>
>>> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
>>> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
>>> ip-172-16-3-9.eu-central-1.compute.internal):
>>> org.apache.hadoop.hbase.DoNotRetryIOException:
>>> Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
>>>   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
>>>   at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
>>>   at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
>>>   at ...
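[The "381788ms passed since the last invocation" line suggests the gap between successive scanner next() calls on an executor exceeded the client lease timeout, which can happen when scanner caching is high and per-row work in the Spark task is slow. A rough back-of-envelope sketch of that relationship; all numbers are hypothetical except the two quoted in the error:]

```python
# Rough model of the HBase scanner lease: the server expires a scanner if
# more than `timeout_ms` elapses between consecutive next() RPCs. With
# scanner caching, one RPC fetches a batch of rows, so the gap between
# RPCs is roughly (rows per batch) x (processing time per row).

def scan_times_out(rows_per_rpc: int, ms_per_row: float, timeout_ms: int) -> bool:
    """True if processing one cached batch takes longer than the lease timeout."""
    return rows_per_rpc * ms_per_row > timeout_ms

# The gap reported in the error (381788 ms) clearly exceeds the 60000 ms lease:
assert 381788 > 60000

# Hypothetical illustration: caching 1000 rows with 400 ms of work per row
# means 400000 ms between next() calls, so the lease expires.
print(scan_times_out(rows_per_rpc=1000, ms_per_row=400, timeout_ms=60000))  # True
# Lowering caching keeps each gap under the lease (100 * 400 = 40000 ms):
print(scan_times_out(rows_per_rpc=100, ms_per_row=400, timeout_ms=60000))   # False
```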
