Thanks Josh, Option 1 did the trick. It is a very global approach though, so I'll raise a ticket to support configuration via the DataFrame option() method, which is cleaner in the long run.
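
For reference, the client-side hbase-site.xml I ended up putting on the Spark classpath simply mirrors the timeout values from the region server config quoted at the bottom of the thread, roughly:

<configuration>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>1200000</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>1200000</value>
  </property>
</configuration>

What I have in mind for the ticket is passing the same settings per read through the reader, something along these lines (hypothetical, the connector only accepts 'table' and 'zkUrl' today):

val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", "EVENT")
  .option("zkUrl", "dc-hadoop-master.local:2181") // placeholder quorum
  .option("hbase.rpc.timeout", "1200000")         // the proposed pass-through
  .load()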
Although option 2 would work, those methods require knowing in advance which columns you want to read, which in my case I cannot anticipate short of parsing the SQL query (a rough sketch of that approach is at the bottom of this message).

*Pierre Lacave*
171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
Phone : +353879128708

On Sun, Jan 24, 2016 at 6:14 PM, Josh Mahonin <[email protected]> wrote:

> Hi Pierre,
>
> You have a few options:
>
> 1) If your app ships an hbase-site.xml resource, or you specify one on the
> Spark classpath, this should get picked up by the Phoenix client and used
> for client-side settings.
>
> 2) Bypass the Spark SQLContext loader, and invoke the more direct
> 'phoenixTableAsDataFrame' method, passing in your own Configuration object,
> as described in the 'Load as a DataFrame directly using a Configuration
> object' section of the docs [1].
>
> It may be a good idea to also pass specific tuning parameters at runtime
> through the Spark DataFrame API, using the 'option()' method. Feel free to
> file a JIRA ticket as an enhancement.
>
> Thanks!
>
> Josh
>
> [1] https://phoenix.apache.org/phoenix_spark.html
>
>
> On Sat, Jan 23, 2016 at 3:19 PM, pierre lacave <[email protected]> wrote:
>
>> Is it possible to set an HBase config when using Phoenix in Spark as a
>> datasource?
>>
>> I had a peek at the source, and it looks like I can only set the table
>> and the zookeeper quorum.
>>
>> What would be my options if I want to set "hbase.rpc.timeout" in the
>> client?
>>
>> // Override 'RelationProvider.createRelation', this enables DataFrame.load()
>> override def createRelation(sqlContext: SQLContext,
>>                             parameters: Map[String, String]): BaseRelation = {
>>   verifyParameters(parameters)
>>
>>   new PhoenixRelation(
>>     parameters("table"),
>>     parameters("zkUrl")
>>   )(sqlContext)
>> }
>>
>> A bit of background: I am getting this exception when running queries in
>> a Spark job:
>>
>> Sat Jan 23 20:13:57 GMT 2016, null, java.net.SocketTimeoutException:
>> callTimeout=60000, callDuration=60303: row '�Ԑx~R�Traders' on table 'EVENT'
>> at region=EVENT,,1453212975336.b28c683cea467d1a2fff4c495e39a3d1.,
>> hostname=dc-hadoop-slave2.local,17020,1453578819465, seqNum=16367961
>>
>> at org.apache.phoenix.mapreduce.PhoenixRecordReader.nextKeyValue(PhoenixRecordReader.java:159)
>> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:143)
>> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:97)
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>> at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.phoenix.exception.PhoenixIOException: Failed after
>> attempts=36, exceptions:
>> Sat Jan 23 20:13:57 GMT 2016, null, java.net.SocketTimeoutException:
>> callTimeout=60000, callDuration=60303: row '�Ԑx~R�Traders' on table 'EVENT'
>> at region=EVENT,,1453212975336.b28c683cea467d1a2fff4c495e39a3d1.,
>> hostname=dc-hadoop-slave2.local,17020,1453578819465, seqNum=16367961
>>
>> at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
>> at org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:65)
>> at org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:110)
>> at org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)
>> at org.apache.phoenix.iterate.LookAheadResultIterator.init(LookAheadResultIterator.java:59)
>> at org.apache.phoenix.iterate.LookAheadResultIterator.next(LookAheadResultIterator.java:65)
>> at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778)
>> at org.apache.phoenix.mapreduce.PhoenixRecordReader.nextKeyValue(PhoenixRecordReader.java:152)
>> ... 25 more
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Failed after attempts=36, exceptions:
>> Sat Jan 23 20:13:57 GMT 2016, null, java.net.SocketTimeoutException:
>> callTimeout=60000, callDuration=60303: row '�Ԑx~R�Traders' on table 'EVENT'
>> at region=EVENT,,1453212975336.b28c683cea467d1a2fff4c495e39a3d1.,
>> hostname=dc-hadoop-slave2.local,17020,1453578819465, seqNum=16367961
>>
>> at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>> at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>> at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
>> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>> at org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:55)
>> ... 31 more
>> Caused by: java.net.SocketTimeoutException: callTimeout=60000,
>> callDuration=60303: row '�Ԑx~R�Traders' on table 'EVENT' at
>> region=EVENT,,1453212975336.b28c683cea467d1a2fff4c495e39a3d1.,
>> hostname=dc-hadoop-slave2.local,17020,1453578819465, seqNum=16367961
>> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
>> at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
>> ... 3 more
>> Caused by: java.io.IOException: Call to dc-hadoop-slave2.local/172.20.1.185:17020
>> failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
>> Call id=257, waitTime=60001, operationTimeout=60000 expired.
>> at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1284)
>> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1252)
>> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
>> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
>> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
>> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>> ... 4 more
>> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=257,
>> waitTime=60001, operationTimeout=60000 expired.
>> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
>> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1226)
>> ... 13 more
>>
>> The HBase region servers have this config:
>>
>> <property>
>>   <name>hbase.client.scanner.timeout.period</name>
>>   <value>1200000</value>
>> </property>
>> <property>
>>   <name>hbase.rpc.timeout</name>
>>   <value>1200000</value>
>> </property>
>> <property>
>>   <name>phoenix.query.keepAliveMs</name>
>>   <value>240000</value>
>> </property>
>>
>> Thanks
>>
>> *Pierre Lacave*
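
To make option 2 concrete, this is roughly the shape it would take with a Configuration object, following the 'Load as a DataFrame directly using a Configuration object' section of the docs [1]. A sketch only: the quorum and the column list below are placeholders, and the column list is exactly the part I cannot know without parsing the query.

import org.apache.hadoop.conf.Configuration
import org.apache.phoenix.spark._

// assuming a SQLContext named 'sqlContext' is already in scope
val configuration = new Configuration()
// client-side overrides; the zookeeper quorum is required
configuration.set("hbase.zookeeper.quorum", "dc-hadoop-master.local:2181") // placeholder
configuration.set("hbase.rpc.timeout", "1200000")
configuration.set("hbase.client.scanner.timeout.period", "1200000")

// columns must be listed up front, hence my reservation above
val df = sqlContext.phoenixTableAsDataFrame(
  "EVENT", Seq("ID", "TRADER"), conf = configuration) // placeholder columns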
