Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread Mich Talebzadeh
Hi

CLOSE_WAIT!

According to this <https://access.redhat.com/solutions/437133> link

- CLOSE_WAIT indicates that the server has received the first FIN signal
from the client and the connection is in the process of being closed. So
this is essentially a state where the socket is waiting for the
application to execute close(). A socket can remain in CLOSE_WAIT
indefinitely until the application closes it. Faulty scenarios include a
file descriptor leak, or the server never executing close() on the socket,
leading to a pile-up of CLOSE_WAIT sockets.
- The CLOSE_WAIT status means that the other side has initiated a
connection close, but the application on the local side has not yet closed
the socket.

Normally it should be LISTEN or ESTABLISHED.
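
As an illustration only (a minimal sketch with placeholder host, realm and query, not the
code from SparkHiveJDBCTest): a JDBC connection that a task opens and never close()s will
sit in CLOSE_WAIT after HiveServer2 closes its end, whereas try-with-resources releases
the socket as soon as the block exits.

// Minimal sketch: if close() is never called, the client-side socket can linger in
// CLOSE_WAIT once the server has closed its half of the connection.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcCloseExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; substitute the real HiveServer2 host, principal and realm.
        String url = "jdbc:hive2://<hive server2 host>:10001/default;"
                + "principal=hive/<hive server2 host>@<REALM>;"
                + "transportMode=http;httpPath=cliservice";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } // conn, stmt and rs are closed here, so the socket is released promptly
    }
}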

HTH




Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 September 2016 at 16:14, <anupama.gangad...@daimler.com> wrote:

> Hi,
>
>
>
> Yes. I am able to connect to Hive from a simple Java program running in the
> cluster. When using spark-submit I faced the issue.
>
> The output of the command is given below
>
>
>
> $> netstat -alnp |grep 10001
>
> (Not all processes could be identified, non-owned process info
>
> will not be shown, you would have to be root to see it all.)
>
> tcp        1      0 53.244.194.223:25612    53.244.194.221:10001    CLOSE_WAIT  -
>
>
>
> Thanks
>
> Anupama
>
>
>
> *From:* Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
> *Sent:* Saturday, September 17, 2016 12:36 AM
> *To:* Gangadhar, Anupama (623)
> *Cc:* user @spark
> *Subject:* Re: Error trying to connect to Hive from Spark (Yarn-Cluster
> Mode)
>
>
>
> Is your Hive Thrift Server up and running on port 10001?
>
>
>
> Do the following
>
>
>
>  netstat -alnp |grep 10001
>
> and see whether it is actually running
>
>
>
> HTH
>
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
> On 16 September 2016 at 19:53, <anupama.gangad...@daimler.com> wrote:
>
> Hi,
>
>
>
> I am trying to connect to Hive from a Spark application in a Kerberized
> cluster and get the following exception. Spark version is 1.4.1 and Hive
> is 1.2.1. Outside of Spark the connection goes through fine.
>
> Am I missing any configuration parameters?
>
>
>
> java.sql.SQLException: Could not open connection to
> jdbc:hive2://<hive server2 host>:10001/default;principal=hive/<hive server2 host>;ssl=false;transportMode=http;httpPath=cliservice: null
>
>at org.apache.hive.jdbc.HiveConnection.openTransport(
> HiveConnection.java:206)
>
>at org.apache.hive.jdbc.HiveConnection.<init>(
> HiveConnection.java:178)
>
>at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.
> java:105)
>
>at java.sql.DriverManager.getConnection(DriverManager.
> java:571)
>
>at java.sql.DriverManager.getConnection(DriverManager.
> java:215)
>
>at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
>
>at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
>
>at org.apache.spark.api.java.JavaPairRDD$$anonfun$
> toScalaFunction$1.apply(JavaPairRDD.scala:1027)
>
>at scala.collection.Iterator$$anon$11.next(Iterator.scala:
> 328)
>
>at scala.collection.Iterator$$anon$11.next(Iterator.scala:
> 328)
>
>at org.apache.spark.rdd.PairRDDFunctions$$anonfun$
> saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.
> apply$mcV$sp(PairRDDFunctions.scala:1109)
>
>at org.apache.spark.r

RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Hi,

Yes. I am able to connect to Hive from a simple Java program running in the
cluster. When using spark-submit I faced the issue.
The output of the command is given below

$> netstat -alnp |grep 10001
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp        1      0 53.244.194.223:25612    53.244.194.221:10001    CLOSE_WAIT  -

Thanks
Anupama

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Saturday, September 17, 2016 12:36 AM
To: Gangadhar, Anupama (623)
Cc: user @spark
Subject: Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

Is your Hive Thrift Server up and running on port 10001?

Do the following

 netstat -alnp |grep 10001

and see whether it is actually running

HTH







Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 16 September 2016 at 19:53, 
<anupama.gangad...@daimler.com> wrote:
Hi,

I am trying to connect to Hive from a Spark application in a Kerberized cluster and
get the following exception. Spark version is 1.4.1 and Hive is 1.2.1. Outside
of Spark the connection goes through fine.
Am I missing any configuration parameters?

java.sql.SQLException: Could not open connection to jdbc:hive2://<hive server2 host>:10001/default;principal=hive/<hive server2 host>;ssl=false;transportMode=http;httpPath=cliservice: null
   at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
   at 
org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
   at java.sql.DriverManager.getConnection(DriverManager.java:571)
   at java.sql.DriverManager.getConnection(DriverManager.java:215)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
   at 
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
   at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:70)
   at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
   at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
   at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
   at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
   at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
   at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInform

RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Hi,
@Deepak
I have used a separate user keytab (not the hadoop services keytab) and am able to
connect to Hive via a simple Java program.
I am able to connect to Hive from spark-shell as well. However, when I submit a
Spark job using this same keytab, I see the issue.
Does the ticket cache have a role to play here? In the cluster, the transport mode
is http and SSL is disabled.
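
For what it is worth, one pattern that is often used for this in yarn-cluster mode is to
log in from the user keytab inside the task itself, since the executor JVMs do not share
the submitting machine's ticket cache. The sketch below assumes the keytab has been
shipped to the executors (for example with spark-submit --files user.keytab); the keytab
path, principal, host and realm are placeholders, not your actual values.

import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHiveConnect {
    public static Connection connect() throws Exception {
        // Tell Hadoop security that Kerberos is in use before logging in.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Obtain a TGT from the user keytab on this executor node.
        UserGroupInformation.loginUserFromKeytab("<user>@<REALM>", "user.keytab");

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://<hive server2 host>:10001/default;"
                + "principal=hive/<hive server2 host>@<REALM>;"
                + "transportMode=http;httpPath=cliservice";
        return DriverManager.getConnection(url);
    }
}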

Thanks

Anupama



From: Deepak Sharma [mailto:deepakmc...@gmail.com]
Sent: Saturday, September 17, 2016 8:35 AM
To: Gangadhar, Anupama (623)
Cc: spark users
Subject: Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)


Hi Anupama

To me it looks like an issue with the SPN with which you are trying to connect to
hive2, i.e. hive@hostname.

Are you able to connect to hive from spark-shell?

Try getting the ticket using any other user keytab (not the hadoop services keytab)
and then try running the spark-submit.



Thanks

Deepak

On 17 Sep 2016 12:23 am, 
<anupama.gangad...@daimler.com> wrote:
Hi,

I am trying to connect to Hive from a Spark application in a Kerberized cluster and
get the following exception. Spark version is 1.4.1 and Hive is 1.2.1. Outside
of Spark the connection goes through fine.
Am I missing any configuration parameters?

java.sql.SQLException: Could not open connection to jdbc:hive2://<hive server2 host>:10001/default;principal=hive/<hive server2 host>;ssl=false;transportMode=http;httpPath=cliservice: null
   at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
   at 
org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
   at java.sql.DriverManager.getConnection(DriverManager.java:571)
   at java.sql.DriverManager.getConnection(DriverManager.java:215)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
   at 
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
   at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
   at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:70)
   at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
   at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
   at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
   at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
   at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
   at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
   at java.security.AccessController.doPrivileged(Native Method)
   at 
javax.security.auth.Subject.doAs(Subject.java:415)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
   at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
   ... 21 more

In spark conf directory hive-site.xm

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread Deepak Sharma
Hi Anupama

To me it looks like an issue with the SPN with which you are trying to connect
to hive2, i.e. hive@hostname.
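
As a side note on the SPN: the _HOST token in the configured principal is expanded to the
server's canonical host name, which is why the principal in the JDBC URL has to name the
actual HiveServer2 host. A small sketch of that substitution (the realm and host name
below are placeholders):

import org.apache.hadoop.security.SecurityUtil;

public class HostTokenExample {
    public static void main(String[] args) throws Exception {
        // As configured in hive-site.xml, e.g. hive.server2.authentication.kerberos.principal
        String configured = "hive/_HOST@EXAMPLE.COM";
        // _HOST is replaced with the concrete host name of the server being contacted.
        String expanded = SecurityUtil.getServerPrincipal(configured, "hiveserver2.example.com");
        System.out.println(expanded);  // prints hive/hiveserver2.example.com@EXAMPLE.COM
    }
}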

Are you able to connect to hive from spark-shell?

Try getting the ticket using any other user keytab (not the hadoop services
keytab) and then try running the spark-submit.


Thanks

Deepak

On 17 Sep 2016 12:23 am, <anupama.gangad...@daimler.com> wrote:

> Hi,
>
>
>
> I am trying to connect to Hive from a Spark application in a Kerberized
> cluster and get the following exception. Spark version is 1.4.1 and Hive
> is 1.2.1. Outside of Spark the connection goes through fine.
>
> Am I missing any configuration parameters?
>
>
>
> java.sql.SQLException: Could not open connection to
> jdbc:hive2://<hive server2 host>:10001/default;principal=hive/<hive server2 host>;ssl=false;transportMode=http;httpPath=cliservice: null
>
>at org.apache.hive.jdbc.HiveConne
> ction.openTransport(HiveConnection.java:206)
>
>at org.apache.hive.jdbc.HiveConne
> ction.<init>(HiveConnection.java:178)
>
>at org.apache.hive.jdbc.HiveDrive
> r.connect(HiveDriver.java:105)
>
>at java.sql.DriverManager.getConn
> ection(DriverManager.java:571)
>
>at java.sql.DriverManager.getConn
> ection(DriverManager.java:215)
>
>at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
>
>at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
>
>at org.apache.spark.api.java.Java
> PairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
>
>at scala.collection.Iterator$$ano
> n$11.next(Iterator.scala:328)
>
>at scala.collection.Iterator$$ano
> n$11.next(Iterator.scala:328)
>
>at org.apache.spark.rdd.PairRDDFu
> nctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$
> apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
>
>at org.apache.spark.rdd.PairRDDFu
> nctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(
> PairRDDFunctions.scala:1108)
>
>at org.apache.spark.rdd.PairRDDFu
> nctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(
> PairRDDFunctions.scala:1108)
>
>at org.apache.spark.util.Utils$.t
> ryWithSafeFinally(Utils.scala:1285)
>
>at org.apache.spark.rdd.PairRDDFu
> nctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(Pai
> rRDDFunctions.scala:1116)
>
>at org.apache.spark.rdd.PairRDDFu
> nctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(Pai
> rRDDFunctions.scala:1095)
>
>at org.apache.spark.scheduler.Res
> ultTask.runTask(ResultTask.scala:63)
>
>at org.apache.spark.scheduler.Task.run(Task.scala:70)
>
>at org.apache.spark.executor.Exec
> utor$TaskRunner.run(Executor.scala:213)
>
>at java.util.concurrent.ThreadPoo
> lExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>at java.util.concurrent.ThreadPoo
> lExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>at java.lang.Thread.run(Thread.java:745)
>
> Caused by: org.apache.thrift.transport.TTransportException
>
>at org.apache.thrift.transport.TI
> OStreamTransport.read(TIOStreamTransport.java:132)
>
>at org.apache.thrift.transport.TT
> ransport.readAll(TTransport.java:84)
>
>at org.apache.thrift.transport.TS
> aslTransport.receiveSaslMessage(TSaslTransport.java:182)
>
>at org.apache.thrift.transport.TS
> aslTransport.open(TSaslTransport.java:258)
>
>at org.apache.thrift.transport.TS
> aslClientTransport.open(TSaslClientTransport.java:37)
>
>at org.apache.hadoop.hive.thrift.
> client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>
>at org.apache.hadoop.hive.thrift.
> client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>
>at java.security.AccessController.doPrivileged(Native
> Method)
>
>at javax.security.auth.Subject.doAs(Subject.java:415)
>
>at org.apache.hadoop.security.Use
> rGroupInformation.doAs(UserGroupInformation.java:1657)
>
>at org.apache.hadoop.hive.thrift.
> client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>
>at org.apache.hive.jdbc.HiveConne
> ction.openTransport(HiveConnection.java:203)
>
>... 21 more
>
>
>
> In spark conf directory hive-site.xml has the following properties
>
>
>
> <configuration>
>
> <property>
>   <name>hive.metastore.kerberos.keytab.file</name>
>   <value>/etc/security/keytabs/hive.service.keytab</value>
> </property>
>
> <property>
>   <name>hive.metastore.kerberos.principal</name>
>   <value>hive/_HOST@</value>
> </property>
>
> <property>
>   <name>hive.metastore.sasl.enabled</name>
>   <value>true</value>
> </property>
>
> <property>
>   <name>hive.metastore.uris</name>
>   <value>thrift://:9083</value>
> </property>
>
> <property>
>   <name>hive.server2.authentication</name>
>   <value>KERBEROS</value>
> </property>
>
> <property>
>   

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread Mich Talebzadeh
Is your Hive Thrift Server up and running on port 10001?

Do the following

 netstat -alnp |grep 10001

and see whether it is actually running

HTH





Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 16 September 2016 at 19:53, <anupama.gangad...@daimler.com> wrote:

> Hi,
>
>
>
> I am trying to connect to Hive from a Spark application in a Kerberized
> cluster and get the following exception. Spark version is 1.4.1 and Hive
> is 1.2.1. Outside of Spark the connection goes through fine.
>
> Am I missing any configuration parameters?
>
>
>
> java.sql.SQLException: Could not open connection to
> jdbc:hive2://<hive server2 host>:10001/default;principal=hive/<hive server2 host>;ssl=false;transportMode=http;httpPath=cliservice: null
>
>at org.apache.hive.jdbc.HiveConnection.openTransport(
> HiveConnection.java:206)
>
>at org.apache.hive.jdbc.HiveConnection.<init>(
> HiveConnection.java:178)
>
>at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.
> java:105)
>
>at java.sql.DriverManager.getConnection(DriverManager.
> java:571)
>
>at java.sql.DriverManager.getConnection(DriverManager.
> java:215)
>
>at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
>
>at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
>
>at org.apache.spark.api.java.JavaPairRDD$$anonfun$
> toScalaFunction$1.apply(JavaPairRDD.scala:1027)
>
>at scala.collection.Iterator$$anon$11.next(Iterator.scala:
> 328)
>
>at scala.collection.Iterator$$anon$11.next(Iterator.scala:
> 328)
>
>at org.apache.spark.rdd.PairRDDFunctions$$anonfun$
> saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.
> apply$mcV$sp(PairRDDFunctions.scala:1109)
>
>at org.apache.spark.rdd.PairRDDFunctions$$anonfun$
> saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.
> apply(PairRDDFunctions.scala:1108)
>
>at org.apache.spark.rdd.PairRDDFunctions$$anonfun$
> saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.
> apply(PairRDDFunctions.scala:1108)
>
>at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.
> scala:1285)
>
>at org.apache.spark.rdd.PairRDDFunctions$$anonfun$
> saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
>
>at org.apache.spark.rdd.PairRDDFunctions$$anonfun$
> saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>
>at org.apache.spark.scheduler.
> ResultTask.runTask(ResultTask.scala:63)
>
>at org.apache.spark.scheduler.Task.run(Task.scala:70)
>
>at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:213)
>
>at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>at java.lang.Thread.run(Thread.java:745)
>
> Caused by: org.apache.thrift.transport.TTransportException
>
>at org.apache.thrift.transport.TIOStreamTransport.read(
> TIOStreamTransport.java:132)
>
>at org.apache.thrift.transport.
> TTransport.readAll(TTransport.java:84)
>
>at org.apache.thrift.transport.TSaslTransport.
> receiveSaslMessage(TSaslTransport.java:182)
>
>at org.apache.thrift.transport.TSaslTransport.open(
> TSaslTransport.java:258)
>
>at org.apache.thrift.transport.TSaslClientTransport.open(
> TSaslClientTransport.java:37)
>
>at org.apache.hadoop.hive.thrift.
> client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>
>at org.apache.hadoop.hive.thrift.
> client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>
>at java.security.AccessController.doPrivileged(Native
> Method)
>
>at javax.security.auth.Subject.doAs(Subject.java:415)
>
>at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1657)
>
>at org.apache.hadoop.hive.thrift.
> client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>
>at org.apache.hive.jdbc.HiveConnection.openTransport(
> HiveConnection.java:203)
>
>... 21 more
>
>
>
> In spark conf directory hive-site.xml has the following properties
>
>
>
> <configuration>
>
> <property>
>
>   

Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread anupama . gangadhar
Hi,

I am trying to connect to Hive from a Spark application in a Kerberized cluster and
get the following exception. Spark version is 1.4.1 and Hive is 1.2.1. Outside
of Spark the connection goes through fine.
Am I missing any configuration parameters?

java.sql.SQLException: Could not open connection to jdbc:hive2://<hive server2 host>:10001/default;principal=hive/<hive server2 host>;ssl=false;transportMode=http;httpPath=cliservice: null
   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
   at java.sql.DriverManager.getConnection(DriverManager.java:571)
   at java.sql.DriverManager.getConnection(DriverManager.java:215)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
   at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:70)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
   at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
   ... 21 more

In spark conf directory hive-site.xml has the following properties




<configuration>

<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@</value>
</property>

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://:9083</value>
</property>

<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@</value>
</property>

<property>
  <name>hive.server2.authentication.spnego.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.spnego.principal</name>
  <value>HTTP/_HOST@</value>
</property>

</configuration>
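
One quick way to confirm that the JVM running the job actually sees this hive-site.xml
(a sketch only; the class name is illustrative, and in yarn-cluster mode it is worth
running the check inside the task, since the executor classpath can differ from the
edge node's):

import java.net.URL;
import org.apache.hadoop.conf.Configuration;

public class HiveSiteCheck {
    public static void main(String[] args) {
        // Where (if anywhere) hive-site.xml is being picked up from on this classpath.
        URL hiveSite = Thread.currentThread().getContextClassLoader().getResource("hive-site.xml");
        System.out.println("hive-site.xml found at: " + hiveSite);

        // Load only hive-site.xml and echo the security-related values listed above.
        Configuration conf = new Configuration(false);
        conf.addResource("hive-site.xml");
        System.out.println("hive.server2.authentication = " + conf.get("hive.server2.authentication"));
        System.out.println("hive.metastore.uris = " + conf.get("hive.metastore.uris"));
    }
}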

--Thank you

If you are not the addressee, please inform us immediately that you have 
received this e-mail by mistake, and delete it. We thank you for your support.