I will check that, but if that is a server startup setting I was not aware I
had to send it to the executors. So is it a connection timeout from executor
code?


On Oct 28, 2016, at 9:48 AM, Ted Yu <[email protected]> wrote:

How did you change the timeout(s) ?

bq. timeout is currently set to 60000

Did you pass hbase-site.xml using --files to Spark job ?
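
For reference, a hedged sketch of what shipping the HBase client config with
the job typically looks like (the master URL, config path, and jar name are
placeholders, not taken from this thread):

```shell
# Ship the client-side HBase config to the driver and every executor.
# All names below are illustrative placeholders.
spark-submit \
  --master spark://master:7077 \
  --files /etc/hbase/conf/hbase-site.xml \
  my-app.jar
```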

Cheers
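
The 60000 ms in the error below is the HBase client-side scanner timeout
default. A hedged hbase-site.xml fragment with the settings being discussed
(the values are illustrative, not from the thread), which would need to reach
the client/executors, not just the region servers:

```xml
<!-- Illustrative client-side settings; values are examples only -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
```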

On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

> Using standalone Spark. I don’t recall seeing connection lost errors, but
> there are lots of logs. I’ve set the scanner and RPC timeouts to large
> numbers on the servers.
> 
> But I also saw in the logs:
> 
>    org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> passed since the last invocation, timeout is currently set to 60000
> 
> Not sure where that is coming from. Does the driver machine making queries
> need to have the timeout config also?
> 
> And why so large, am I doing something wrong?
> 
> 
> On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:
> 
> Mich:
> The OutOfOrderScannerNextException indicated problem with read from hbase.
> 
> How did you know connection to Spark cluster was lost ?
> 
> Cheers
> 
> On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <
> [email protected]>
> wrote:
> 
>> Looks like it lost the connection to Spark cluster.
>> 
>> What mode are you using with Spark: Standalone, YARN, or another? This
>> looks like a resource manager issue.
>> 
>> I have seen this when running Zeppelin with Spark on Hbase.
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>> 
>> 
>> 
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> 
>> 
>> 
>> http://talebzadehmich.wordpress.com
>> 
>> 
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
>> 
>> 
>> 
>> On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:
>> 
>>> I’m getting data from HBase using a large Spark cluster with parallelism
>>> of near 400. The query fails quite often with the message below. Sometimes
>>> a retry will work and sometimes it ends in the ultimate failure (below).
>>> 
>>> If I reduce parallelism in Spark it slows other parts of the algorithm
>>> unacceptably. I have also experimented with very large RPC/scanner
>>> timeouts of many minutes, to no avail.
>>> 
>>> Any clues about what to look for or what may be set up wrong in my
>>> tables?
>>> 
>>> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
>>> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
>>> ip-172-16-3-9.eu-central-1.compute.internal):
>>> org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
>>> OutOfOrderScannerNextException: was there a rpc timeout?
>>> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
>>> at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
>>> at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at
>>> 
>> 
> 
> 
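
For context, a hedged Scala sketch of the kind of HBase-from-Spark read being
discussed, with the client timeouts set on the configuration handed to the
executors. The table name and values are placeholders; the thread does not
show the actual job code:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder job setup; names are illustrative only.
val sc = new SparkContext(new SparkConf().setAppName("hbase-scan"))

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "my_table")

// Client-side timeouts: these must be visible where the scan runs
// (the executors), e.g. via an hbase-site.xml shipped with --files.
conf.setInt("hbase.client.scanner.timeout.period", 600000)
conf.setInt("hbase.rpc.timeout", 600000)

// Fewer rows per RPC means more frequent next() calls, which can keep
// the scanner lease from expiring between invocations.
conf.set(TableInputFormat.SCAN_CACHEDROWS, "100")

val rdd = sc.newAPIHadoopRDD(conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
```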
