Can you start a separate thread?
On Oct 28, 2016, at 10:29 AM, Mich Talebzadeh <[email protected]> wrote:

Sorry, do you mean that in my error case the issue was locating regions during the scan? In that case I do not know why it works through spark-shell but not through Zeppelin.

Thanks,
Mich

On 28 October 2016 at 17:59, Ted Yu <[email protected]> wrote:

Mich:
bq. on table 'hbase:meta' at region=hbase:meta,,1.1588230740

What you observed was a different issue. The above looks like trouble locating region(s) during the scan.

On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <[email protected]> wrote:

This is an example I got:

warning: there were two deprecation warnings; re-run with -deprecation for details
rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77] at map at <console>:151
defined class columns
dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER: string]
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68411: row 'MARKETDATAHBASE,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044, seqNum=0
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
  at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)

On 28 October 2016 at 17:52, Pat Ferrel <[email protected]> wrote:

I will check that, but if that is a server startup thing I was not aware I had to send it to the executors. So it's like a connection timeout from executor code?

On Oct 28, 2016, at 9:48 AM, Ted Yu <[email protected]> wrote:

How did you change the timeout(s)?

bq. timeout is currently set to 60000

Did you pass hbase-site.xml using --files to the Spark job?

Cheers
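(A minimal sketch of the suggestion above: the two usual routes are shipping the client hbase-site.xml with the job, or setting the client properties directly on the Configuration the RDD is built from, which Spark broadcasts to the executors. The table name and timeout values below are placeholders, and --files only helps if the file ends up on the executor classpath.)

  // Sketch only: property values and the table name are illustrative placeholders.
  // Option 1: ship the client config with the job, e.g.
  //   spark-submit --files /etc/hbase/conf/hbase-site.xml ...
  // Option 2: set the client-side timeouts on the Configuration used to build the RDD.
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable
  import org.apache.hadoop.hbase.client.Result

  val hbaseConf = HBaseConfiguration.create()
  hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")         // placeholder table name
  hbaseConf.set("hbase.rpc.timeout", "120000")                    // ms, illustrative value
  hbaseConf.set("hbase.client.scanner.timeout.period", "120000")  // ms, illustrative value

  // sc is the SparkContext (predefined in spark-shell and Zeppelin); the Configuration
  // passed here is what the executors' TableRecordReaders will use.
  val hbaseRDD = sc.newAPIHadoopRDD(hbaseConf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result])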
On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

Using standalone Spark. I don't recall seeing connection-lost errors, but there are lots of logs. I've set the scanner and RPC timeouts to large numbers on the servers.

But I also saw this in the logs:

org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms passed since the last invocation, timeout is currently set to 60000

Not sure where that is coming from. Does the driver machine making the queries need to have the timeout config as well? And why so large, am I doing something wrong?

On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:

Mich:
The OutOfOrderScannerNextException indicates a problem with the read from HBase.

How did you know the connection to the Spark cluster was lost?

Cheers

On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <[email protected]> wrote:

Looks like it lost the connection to the Spark cluster.

What mode are you using with Spark: standalone, YARN or something else? The issue looks like a resource manager issue. I have seen this when running Zeppelin with Spark on HBase.

HTH,
Mich

On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:

I'm getting data from HBase using a large Spark cluster with parallelism of near 400. The query fails quite often with the message below. Sometimes a retry will work and sometimes the ultimate failure results (below).

If I reduce parallelism in Spark it slows other parts of the algorithm unacceptably. I have also experimented with very large RPC/scanner timeouts of many minutes, to no avail.

Any clues about what to look for, or what may be set up wrong in my tables?

Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times, most recent failure: Lost task 44.3 in stage 147.0 (TID 24833, ip-172-16-3-9.eu-central-1.compute.internal): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
  at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
  at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
  at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
  ...
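(The trace ends in TableRecordReaderImpl, so the scan is driven by TableInputFormat, and the exception message itself points at the likely cause: a scanner next() call was retried after the server-side scanner had already timed out, which matches the "381788ms passed since the last invocation, timeout is currently set to 60000" line earlier in the thread. Besides raising hbase.client.scanner.timeout.period on both the client Configuration and the region servers, a common mitigation is to fetch fewer rows per next() call, so each batch is processed well inside the timeout even when ~400 tasks are competing for executor time. Below is a sketch under those assumptions, with a placeholder table name and illustrative values.)

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable
  import org.apache.hadoop.hbase.client.Result

  val hbaseConf = HBaseConfiguration.create()
  hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")           // placeholder table name
  // Fewer rows per scanner.next() RPC: each call returns (and renews the scanner
  // lease on the region server) sooner, at the cost of more round trips.
  hbaseConf.set(TableInputFormat.SCAN_CACHEDROWS, "100")            // illustrative value
  // Client-side scanner timeout; the region servers need a matching value in their
  // own hbase-site.xml, otherwise the server-side lease still expires first.
  hbaseConf.set("hbase.client.scanner.timeout.period", "300000")    // ms, illustrative value

  val rows = sc.newAPIHadoopRDD(hbaseConf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result])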
