[jira] [Commented] (SPARK-14162) java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver

2016-05-10 Thread Kevin McHale (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277955#comment-15277955
 ] 

Kevin McHale commented on SPARK-14162:
--

[~sunrui] you are incorrect.

You should take a look at https://issues.apache.org/jira/browse/SPARK-14204 and 
the associated GitHub pull request, because:

1. The temporary workaround that I list there could not have solved the problem 
if the cause were what you describe.

2. There is a blatant error in the code.
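
A commonly suggested mitigation for this class of error (not necessarily the 
exact workaround described in SPARK-14204) is to name the driver class 
explicitly through the JDBC "driver" option, so Spark does not have to look it 
up among the drivers registered on the executor. A minimal PySpark sketch, 
reusing the reporter's connection_script placeholder:

{quote}
# Hedged sketch, not the exact SPARK-14204 workaround: pass the driver class
# explicitly instead of relying on executor-side driver registration.
df = sqlContext.read.format('jdbc').options(
    url='jdbc:oracle:thin:' + connection_script,
    dbtable='bi.contact',
    driver='oracle.jdbc.OracleDriver').load()
print(df.count())
{quote}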

> java.lang.IllegalStateException: Did not find registered driver with class 
> oracle.jdbc.OracleDriver
> ---
>
> Key: SPARK-14162
> URL: https://issues.apache.org/jira/browse/SPARK-14162
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Zoltan Fedor
>
> This is an interesting one.
> We are using JupyterHub with Python to connect to a Hadoop cluster to run 
> Spark jobs and, as new Spark versions come out, I compile them and add them as 
> new kernels to JupyterHub to be used.
> There are also some libraries we are using, like ojdbc to connect to an 
> Oracle database.
> Now the interesting thing is that ojdbc worked fine in Spark 1.6.0 but 
> suddenly "it cannot be found" in 1.6.1.
> All settings are the same when starting pyspark 1.6.1 and 1.6.0, 
> so there is no reason for it not to work in 1.6.1 if it works in 1.6.0.
> This is the pyspark code I am running in both 1.6.1 and 1.6.0:
> {quote}
> df = sqlContext.read.format('jdbc').options(url='jdbc:oracle:thin:'+connection_script+'', dbtable='bi.contact').load()
> print(df.count())
> {quote}
> And it throws this error in 1.6.1 only:
> {quote}
> java.lang.IllegalStateException: Did not find registered driver with class 
> oracle.jdbc.OracleDriver
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){quote}
> I know that this usually means that the ojdbc driver is not available on the 
> executor, but it is. Spark is being started the exact same way in 1.6.1 as in 
> 1.6.0 and it does find it on 1.6.0.
> I can steadily reproduce this, so the only conclusion is that something must 
> have changed between 1.6.0 and 1.6.1 to cause this, but I have seen no 
> deprecation notice of anything that could cause this.
> Environment variables set when starting pyspark 1.6.1:
> {quote}
>   "SPARK_HOME": "/usr/lib/spark-1.6.1-hive",
>   "SCALA_HOME": "/usr/lib/scala",
>   "HADOOP_CONF_DIR": "/etc/hadoop/venus-hadoop-conf",
>   "HADOOP_HOME": "/usr/bin/hadoop",
>   "HIVE_HOME": "/usr/bin/hive",
>   "LD_LIBRARY_PATH": "/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH",
>   "YARN_HOME": "",
>   "SPARK_DIST_CLASSPATH": 
> "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
>   "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
>   

[jira] [Commented] (SPARK-14162) java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver

2016-05-10 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277682#comment-15277682
 ] 

Sun Rui commented on SPARK-14162:
-

We met the same error. The cause was that on one worker node, the MySQL JDBC 
driver was not on the CLASSPATH.

[~mchalek] It seems this is not a bug. In your case, for some reason, the 
ojdbc6 driver is apparently not automatically loaded and registered on one of 
the worker nodes.
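
For completeness, one way to make sure a JDBC driver jar actually reaches every 
executor is to ship it with the application (for example via --jars or 
--driver-class-path when launching pyspark/spark-submit) rather than relying on 
it already being installed on each worker. A hedged sketch of the equivalent 
configuration set programmatically; the jar path is an assumed example, not a 
path taken from this issue:

{quote}
# Sketch only: distribute the Oracle driver jar with the application and put it
# on the executor classpath. The jar location below is hypothetical. Note that
# driver-side classpath entries generally have to be given at launch time
# (e.g. --driver-class-path), before the driver JVM starts.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

ojdbc_jar = '/opt/jars/ojdbc6.jar'  # hypothetical location of the driver jar

conf = (SparkConf()
        .set('spark.jars', ojdbc_jar)                      # ship the jar with the app
        .set('spark.executor.extraClassPath', ojdbc_jar))  # executor-side classpath

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
{quote}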


[jira] [Commented] (SPARK-14162) java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver

2016-05-04 Thread Kevin McHale (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270800#comment-15270800
 ] 

Kevin McHale commented on SPARK-14162:
--

I documented this issue here: https://issues.apache.org/jira/browse/SPARK-14204

And fixed it here: https://github.com/apache/spark/pull/12000

Nobody is responding to my multiple requests to merge the PR; maybe others will 
have some luck convincing them?


[jira] [Commented] (SPARK-14162) java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver

2016-04-30 Thread Martin Hall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265310#comment-15265310
 ] 

Martin Hall commented on SPARK-14162:
-

I got the same error when I had forgotten to copy the Oracle JDBC jar file 
(ojdbc6.jar) to one of the Spark worker nodes.
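
A quick way to confirm this from the driver, rather than logging in to every 
node, is to run a trivial job that checks whether the jar exists at the expected 
path on the machines that execute tasks. This is only a sketch; the jar path is 
an assumed example and the check only covers workers that happen to run one of 
these tasks:

{quote}
# Sketch: report, per worker hostname, whether the driver jar is present at the
# expected filesystem path. Assumes an existing SparkContext `sc` (as in the
# pyspark shell); the path below is hypothetical.
import os

jar_path = '/opt/jars/ojdbc6.jar'  # hypothetical location of ojdbc6.jar

results = (sc.parallelize(range(200), 200)
             .map(lambda _: (os.uname()[1], os.path.exists(jar_path)))
             .distinct()
             .collect())

for host, present in sorted(results):
    print(host, present)
{quote}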
