RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Hi,

Yes, I am able to connect to Hive from a simple Java program running in the 
cluster. The issue only appears when I use spark-submit.
The output of the command is given below:

$> netstat -alnp |grep 10001
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp        1      0 53.244.194.223:25612    53.244.194.221:10001    CLOSE_WAIT  -

Thanks
Anupama

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Saturday, September 17, 2016 12:36 AM
To: Gangadhar, Anupama (623)
Cc: user @spark
Subject: Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

Is your Hive Thrift Server up and running on port 10001 (jdbc:hive2://10001)?

Run the following

 netstat -alnp |grep 10001

and see whether it is actually running.
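
For a check that can also be run from the client side, the same question ("is anything accepting connections on that port?") can be asked from Java. This is only a minimal sketch: the class name `PortProbe`, the method `isListening`, the default host, and the 2000 ms timeout are illustrative assumptions, not part of the original thread. A successful TCP connect only shows that a listener exists; it says nothing about the Kerberos/SASL handshake.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: check whether a TCP listener accepts connections on host:port. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, unresolved, or timed out: no usable listener.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real HiveServer2 host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10001;
        System.out.println(host + ":" + port + " listening = " + isListening(host, port, 2000));
    }
}
```

Running the probe from the node where the failing code executes, rather than from the edge node, tests the same network path the job uses.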

HTH







Dr Mich Talebzadeh




RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Hi,
@Deepak
I used a separate user keytab (not the Hadoop services keytab) and am able to 
connect to Hive from a simple Java program.
I am able to connect to Hive from spark-shell as well. However, when I submit a 
Spark job using the same keytab, I see the issue.
Does the cache have a role to play here? In the cluster, the transport mode is 
http and SSL is disabled.

Thanks

Anupama



From: Deepak Sharma [mailto:deepakmc...@gmail.com]
Sent: Saturday, September 17, 2016 8:35 AM
To: Gangadhar, Anupama (623)
Cc: spark users
Subject: Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)


Hi Anupama

To me it looks like an issue with the SPN with which you are trying to connect 
to hive2, i.e. hive@hostname.

Are you able to connect to Hive from spark-shell?

Try getting the ticket using any other user keytab, not the Hadoop services 
keytab, and then try running spark-submit.
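
As a rough illustration of logging in from a user keytab in code rather than relying on an existing ticket cache, the sketch below builds a JAAS configuration for the JDK's `Krb5LoginModule`. It uses only JDK classes; Hadoop applications more commonly go through `UserGroupInformation`, which is not shown here. The class name `KeytabLoginSketch`, the principal `appuser@EXAMPLE.COM`, the keytab path, and the entry name `hive-client` are all hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Sketch: an in-memory JAAS configuration for a keytab-based Kerberos login. */
final class KeytabLoginSketch {

    /** Builds a Configuration that always returns one Krb5LoginModule entry. */
    static Configuration keytabConfig(String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");        // read credentials from the keytab
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");      // never fall back to a password prompt
        options.put("useTicketCache", "false");  // ignore any cached ticket

        AppConfigurationEntry entry = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options);

        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return new AppConfigurationEntry[] { entry };
            }
        };
    }

    public static void main(String[] args) {
        // Placeholder principal and path; substitute the real user keytab.
        Configuration conf = keytabConfig("appuser@EXAMPLE.COM", "/path/to/appuser.keytab");
        AppConfigurationEntry entry = conf.getAppConfigurationEntry("hive-client")[0];
        System.out.println(entry.getLoginModuleName() + " " + entry.getOptions());
    }
}
```

The login itself would be performed by passing this configuration to `new LoginContext("hive-client", null, null, conf)` and calling `login()`, which needs a reachable KDC and a valid keytab, so it is left out of the sketch.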



Thanks

Deepak


Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread anupama . gangadhar
Hi,

I am trying to connect to Hive from a Spark application in a Kerberized cluster 
and get the following exception. The Spark version is 1.4.1 and Hive is 1.2.1. 
Outside of Spark the connection goes through fine.
Am I missing any configuration parameters?

java.sql.SQLException: Could not open connection to jdbc:hive2://10001/default;principal=hive/;ssl=false;transportMode=http;httpPath=cliservice: null
   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
   at java.sql.DriverManager.getConnection(DriverManager.java:571)
   at java.sql.DriverManager.getConnection(DriverManager.java:215)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:124)
   at SparkHiveJDBCTest$1.call(SparkHiveJDBCTest.java:1)
   at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:70)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
   at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
   ... 21 more

In the Spark conf directory, hive-site.xml has the following properties:

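<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@</value>
</property>

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://:9083</value>
</property>

<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@</value>
</property>

<property>
  <name>hive.server2.authentication.spnego.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.spnego.principal</name>
  <value>HTTP/_HOST@</value>
</property>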
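
For reference, the connection string in the exception above follows the pattern host, port, database, then semicolon-separated session parameters. The sketch below assembles a URL of that shape so each part is visible. The class name `HiveJdbcUrlSketch`, the host `hs2.example.com`, and the realm `EXAMPLE.COM` are hypothetical placeholders, since the real values are not shown in this thread.

```java
/** Sketch: assemble a HiveServer2 JDBC URL for Kerberos over HTTP transport. */
final class HiveJdbcUrlSketch {

    /**
     * Builds a URL shaped like the one in the exception:
     * jdbc:hive2://host:port/db;principal=...;ssl=false;transportMode=http;httpPath=...
     */
    static String buildUrl(String host, int port, String database,
                           String serverPrincipal, String httpPath) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + ";principal=" + serverPrincipal  // HiveServer2 service principal
                + ";ssl=false"                     // SSL is disabled in this cluster
                + ";transportMode=http"            // HTTP transport, as in the thread
                + ";httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // Placeholder host and realm; substitute the cluster's real values.
        String url = buildUrl("hs2.example.com", 10001, "default",
                "hive/hs2.example.com@EXAMPLE.COM", "cliservice");
        System.out.println(url);
    }
}
```

The resulting string would be passed to `DriverManager.getConnection(url)` with the Hive JDBC driver on the classpath; that call is omitted so the sketch runs without Hive libraries.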


  

--Thank you
