So, to clarify: there are values in hbase/conf/hbase-site.xml that are needed by the calling code in the Spark driver and executors, and so the file must be passed to spark-submit using --files? If so, I can do this.
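For reference, a minimal sketch of that approach (the master URL, class name, jar, and paths below are all hypothetical):

// Launched with something like:
//   spark-submit --master spark://host:7077 \
//     --files /etc/hbase/conf/hbase-site.xml \
//     --class com.example.HBaseJob myapp.jar
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// HBaseConfiguration.create() reads hbase-site.xml from the classpath.
// A file shipped with --files lands in each executor's working directory,
// so if it is not already on the classpath it can be added explicitly:
val hbaseConf = HBaseConfiguration.create()
hbaseConf.addResource(new Path("hbase-site.xml"))

Whether create() already finds the file depends on how the executor classpath is set up, hence the explicit addResource as a fallback.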
But do I have a deeper issue? Is it typical to need a scan like this? Have I missed indexing some column, maybe?

On Oct 28, 2016, at 9:59 AM, Ted Yu <[email protected]> wrote:

Mich:

bq. on table 'hbase:meta' at region=hbase:meta,,1.1588230740

What you observed was a different issue. The above looks like trouble with locating region(s) during the scan.

On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <[email protected]> wrote:

This is an example I got:

warning: there were two deprecation warnings; re-run with -deprecation for details
rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77] at map at <console>:151
defined class columns
dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER: string]
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68411: row 'MARKETDATAHBASE,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)

On 28 October 2016 at 17:52, Pat Ferrel <[email protected]> wrote:

I will check that, but if that is a server startup thing, I was not aware I had to send it to the executors. So it's like a connection timeout from executor code?

On Oct 28, 2016, at 9:48 AM, Ted Yu <[email protected]> wrote:

How did you change the timeout(s)?

bq. timeout is currently set to 60000

Did you pass hbase-site.xml using --files to the Spark job?

Cheers

On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

Using standalone Spark. I don't recall seeing connection-lost errors, but there are lots of logs. I've set the scanner and RPC timeouts to large numbers on the servers.

But I also saw in the logs:

org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms passed since the last invocation, timeout is currently set to 60000

Not sure where that is coming from. Does the driver machine making queries need to have the timeout config also?

And why so large? Am I doing something wrong?
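The 60000 in that message is HBase's default client-side scanner timeout (hbase.client.scanner.timeout.period), so the configuration loaded by the driver and executors has to carry any override; the region servers keep their own copy of the same property for the scanner lease, so it generally needs to match there as well. A minimal sketch of setting it programmatically (the values are illustrative, not recommendations):

import org.apache.hadoop.hbase.HBaseConfiguration

val conf = HBaseConfiguration.create()
// Client-side scanner timeout; the default 60000 ms matches the
// "timeout is currently set to 60000" in the log above:
conf.setLong("hbase.client.scanner.timeout.period", 600000L)
// Timeout for individual RPC calls:
conf.setLong("hbase.rpc.timeout", 600000L)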
On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:

Mich:
The OutOfOrderScannerNextException indicated a problem with the read from HBase.

How did you know the connection to the Spark cluster was lost?

Cheers

On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <[email protected]> wrote:

Looks like it lost the connection to the Spark cluster.

What mode are you using with Spark: Standalone, YARN, or others? The issue looks like a resource manager issue.

I have seen this when running Zeppelin with Spark on HBase.

HTH

On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:

I'm getting data from HBase using a large Spark cluster with parallelism of near 400. The query fails quite often with the message below. Sometimes a retry will work, and sometimes the ultimate failure results (below).

If I reduce parallelism in Spark, it slows other parts of the algorithm unacceptably. I have also experimented with very large RPC/scanner timeouts of many minutes, to no avail.

Any clues about what to look for, or what may be set up wrong in my tables?

Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times, most recent failure: Lost task 44.3 in stage 147.0 (TID 24833, ip-172-16-3-9.eu-central-1.compute.internal): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
        at ...
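A note on the OutOfOrderScannerNextException above: it typically surfaces when a scanner lease expires between two next() calls and the retried fetch comes back out of step, which long per-row processing on a busy executor makes more likely. Lowering scanner caching is a common mitigation: smaller batches are consumed faster, so the client checks in with the region server more often and the lease is less likely to lapse. A sketch, assuming sc is the SparkContext and using a hypothetical table name:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table name
// Rows fetched per scanner RPC; smaller batches mean more frequent
// round trips, keeping the scanner lease alive:
conf.setInt(TableInputFormat.SCAN_CACHEDROWS, 100)

val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

The trade-off is more RPC round trips per scan, so the caching value is worth tuning rather than simply minimizing.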
