So, to clarify: there are values in hbase/conf/hbase-site.xml that are needed by the calling code in the Spark driver and executors, and so the file must be passed to spark-submit using --files? If so, I can do this.
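For reference, a minimal sketch of that approach (the master URL, class name, jar, and paths below are all hypothetical):

// Launched with something like:
//   spark-submit --master spark://host:7077 \
//     --files /etc/hbase/conf/hbase-site.xml \
//     --class com.example.HBaseJob myapp.jar
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// HBaseConfiguration.create() reads hbase-site.xml from the classpath.
// A file shipped with --files lands in each executor's working directory,
// so if it is not already on the classpath it can be added explicitly:
val hbaseConf = HBaseConfiguration.create()
hbaseConf.addResource(new Path("hbase-site.xml"))

Whether create() already finds the file depends on how the executor classpath is set up, hence the explicit addResource as a fallback.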
But do I have a deeper issue? Is it typical to need a scan like this? Have I missed indexing some column, maybe?

On Oct 28, 2016, at 9:59 AM, Ted Yu <[email protected]> wrote:

Mich:

bq. on table 'hbase:meta' at region=hbase:meta,,1.1588230740

What you observed was a different issue. The above looks like trouble with locating region(s) during the scan.

On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <[email protected]> wrote:

This is an example I got:

warning: there were two deprecation warnings; re-run with -deprecation for details
rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77] at map at <console>:151
defined class columns
dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER: string]
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68411: row 'MARKETDATAHBASE,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)

On 28 October 2016 at 17:52, Pat Ferrel <[email protected]> wrote:

I will check that, but if that is a server startup thing, I was not aware I had to send it to the executors. So it's like a connection timeout from executor code?

On Oct 28, 2016, at 9:48 AM, Ted Yu <[email protected]> wrote:

How did you change the timeout(s)?

bq. timeout is currently set to 60000

Did you pass hbase-site.xml using --files to the Spark job?

Cheers

On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <[email protected]> wrote:

Using standalone Spark. I don't recall seeing connection-lost errors, but there are lots of logs. I've set the scanner and RPC timeouts to large numbers on the servers.

But I also saw in the logs:

org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms passed since the last invocation, timeout is currently set to 60000

Not sure where that is coming from. Does the driver machine making queries need to have the timeout config also?

And why so large? Am I doing something wrong?
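The 60000 in that message is HBase's default client-side scanner timeout (hbase.client.scanner.timeout.period), so the configuration loaded by the driver and executors has to carry any override; the region servers keep their own copy of the same property for the scanner lease, so it generally needs to match there as well. A minimal sketch of setting it programmatically (the values are illustrative, not recommendations):

import org.apache.hadoop.hbase.HBaseConfiguration

val conf = HBaseConfiguration.create()
// Client-side scanner timeout; the default 60000 ms matches the
// "timeout is currently set to 60000" in the log above:
conf.setLong("hbase.client.scanner.timeout.period", 600000L)
// Timeout for individual RPC calls:
conf.setLong("hbase.rpc.timeout", 600000L)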
On Oct 28, 2016, at 8:50 AM, Ted Yu <[email protected]> wrote:

Mich:
The OutOfOrderScannerNextException indicated a problem with the read from HBase.

How did you know the connection to the Spark cluster was lost?

Cheers

On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <[email protected]> wrote:

Looks like it lost the connection to the Spark cluster.

What mode are you using with Spark: Standalone, YARN, or others? The issue looks like a resource manager issue.

I have seen this when running Zeppelin with Spark on HBase.

HTH

On 28 October 2016 at 16:38, Pat Ferrel <[email protected]> wrote:

I'm getting data from HBase using a large Spark cluster with parallelism of near 400. The query fails quite often with the message below. Sometimes a retry will work, and sometimes the ultimate failure results (below).

If I reduce parallelism in Spark, it slows other parts of the algorithm unacceptably. I have also experimented with very large RPC/scanner timeouts of many minutes, to no avail.

Any clues about what to look for, or what may be set up wrong in my tables?

Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times, most recent failure: Lost task 44.3 in stage 147.0 (TID 24833, ip-172-16-3-9.eu-central-1.compute.internal): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
        at ...
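A note on the OutOfOrderScannerNextException above: it typically surfaces when a scanner lease expires between two next() calls and the retried fetch comes back out of step, which long per-row processing on a busy executor makes more likely. Lowering scanner caching is a common mitigation: smaller batches are consumed faster, so the client checks in with the region server more often and the lease is less likely to lapse. A sketch, assuming sc is the SparkContext and using a hypothetical table name:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table name
// Rows fetched per scanner RPC; smaller batches mean more frequent
// round trips, keeping the scanner lease alive:
conf.setInt(TableInputFormat.SCAN_CACHEDROWS, 100)

val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

The trade-off is more RPC round trips per scan, so the caching value is worth tuning rather than simply minimizing.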
