How did you change the timeout(s)?

bq. timeout is currently set to 60000
Did you pass hbase-site.xml using --files to the Spark job?

Cheers

On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

> Using standalone Spark. I don't recall seeing connection-lost errors, but
> there are lots of logs. I've set the scanner and RPC timeouts to large
> numbers on the servers.
>
> But I also saw in the logs:
>
> org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> passed since the last invocation, timeout is currently set to 60000
>
> Not sure where that is coming from. Does the driver machine making the
> queries need to have the timeout config also?
>
> And why so large, am I doing something wrong?
>
>
> On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:
>
> Mich:
> The OutOfOrderScannerNextException indicates a problem with the read from
> HBase.
>
> How did you know the connection to the Spark cluster was lost?
>
> Cheers
>
> On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh
> <[email protected]> wrote:
>
>> Looks like it lost the connection to the Spark cluster.
>>
>> What mode are you using with Spark: standalone, YARN, or another? The
>> issue looks like a resource-manager issue.
>>
>> I have seen this when running Zeppelin with Spark on HBase.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:
>>
>>> I'm getting data from HBase using a large Spark cluster with parallelism
>>> of near 400. The query fails quite often with the message below.
>>> Sometimes a retry will work and sometimes the ultimate failure results
>>> (below).
>>>
>>> If I reduce parallelism in Spark it slows other parts of the algorithm
>>> unacceptably. I have also experimented with very large RPC/scanner
>>> timeouts of many minutes, to no avail.
>>>
>>> Any clues about what to look for or what may be set up wrong in my
>>> tables?
>>>
>>> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
>>> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
>>> ip-172-16-3-9.eu-central-1.compute.internal):
>>> org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
>>> OutOfOrderScannerNextException: was there a rpc timeout?
>>> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
>>> at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
>>> at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
>>> at ...
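To illustrate the point in the replies above (the scanner and RPC timeouts are client-side settings, so the Spark driver and executors must see them too, not just the region servers), here is a minimal sketch of reading an HBase table from Spark with those keys set explicitly. The table name, timeout values, and caching value are illustrative assumptions, not taken from the thread. Note also that `381788ms passed since the last invocation` means the client waited too long between `next()` calls; lowering `hbase.client.scanner.caching` so each scanner RPC returns fewer rows (and therefore returns sooner) is often a more direct fix for OutOfOrderScannerNextException than ever-larger timeouts.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext

// A sketch only: "my_table" and the timeout/caching values are assumptions.
def scanTable(sc: SparkContext, table: String = "my_table") = {
  // Picks up hbase-site.xml if it is on the driver/executor classpath.
  val conf = HBaseConfiguration.create()
  conf.set(TableInputFormat.INPUT_TABLE, table)

  // Client-side timeouts; 60000 ms is the default visible in the error above.
  conf.set("hbase.client.scanner.timeout.period", "600000") // 10 minutes
  conf.set("hbase.rpc.timeout", "600000")

  // Fetch fewer rows per scanner RPC so each next() call returns well inside
  // the lease period; large caching values under slow per-row processing are
  // a common cause of OutOfOrderScannerNextException.
  conf.set("hbase.client.scanner.caching", "100")

  sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
    classOf[ImmutableBytesWritable], classOf[Result])
}
```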

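On the `--files` question: `spark-submit --files /etc/hbase/conf/hbase-site.xml ...` ships the file to each executor's working directory, but `HBaseConfiguration.create()` only reads it if it ends up on the classpath; otherwise the client silently falls back to the 60000 ms defaults from hbase-default.xml. A quick way to check which configuration the driver actually picked up is to print the effective values. This is a sketch, not from the thread:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration

object CheckHBaseTimeouts {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    // If these print 60000, the edited hbase-site.xml was never picked up --
    // which is exactly what shipping it with --files (or putting it on the
    // driver/executor classpath) is meant to fix.
    println("scanner timeout = " + conf.get("hbase.client.scanner.timeout.period"))
    println("rpc timeout     = " + conf.get("hbase.rpc.timeout"))
  }
}
```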