[jira] [Commented] (SPARK-14162) java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
[ https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277955#comment-15277955 ]

Kevin McHale commented on SPARK-14162:
--------------------------------------

[~sunrui] You are incorrect. You should take a look at https://issues.apache.org/jira/browse/SPARK-14204 and the GitHub pull request, because:
1. The temporary workaround that I list there could not have solved the problem as you describe it.
2. There is a blatant error in the code.

> java.lang.IllegalStateException: Did not find registered driver with class
> oracle.jdbc.OracleDriver
> ---
>
> Key: SPARK-14162
> URL: https://issues.apache.org/jira/browse/SPARK-14162
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Reporter: Zoltan Fedor
>
> This is an interesting one.
> We are using JupyterHub with Python to connect to a Hadoop cluster to run
> Spark jobs, and as new Spark versions come out I compile them and add them
> as new kernels to JupyterHub.
> There are also some libraries we use, such as ojdbc to connect to an
> Oracle database.
> Now the interesting thing: ojdbc worked fine in Spark 1.6.0, but suddenly
> "it cannot be found" in 1.6.1.
> All settings are the same when starting pyspark 1.6.1 and 1.6.0, so there
> is no reason for it not to work in 1.6.1 if it works in 1.6.0.
> This is the pyspark code I am running in both 1.6.1 and 1.6.0:
> {quote}
> df = sqlContext.read.format('jdbc').options(url='jdbc:oracle:thin:'+connection_script+'', dbtable='bi.contact').load()
> print(df.count())
> {quote}
> And it throws this error in 1.6.1 only:
> {quote}
> java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {quote}
> I know that this usually means that the ojdbc driver is not available on the
> executor, but it is. Spark is being started the exact same way in 1.6.1 as in
> 1.6.0, and it does find the driver in 1.6.0.
> I can reliably reproduce this, so the only conclusion is that something must
> have changed between 1.6.0 and 1.6.1 to cause this, but I have seen no
> deprecation notice of anything that could cause this.
> Environment variables set when starting pyspark 1.6.1:
> {quote}
> "SPARK_HOME": "/usr/lib/spark-1.6.1-hive",
> "SCALA_HOME": "/usr/lib/scala",
> "HADOOP_CONF_DIR": "/etc/hadoop/venus-hadoop-conf",
> "HADOOP_HOME": "/usr/bin/hadoop",
> "HIVE_HOME": "/usr/bin/hive",
> "LD_LIBRARY_PATH": "/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH",
> "YARN_HOME": "",
> "SPARK_DIST_CLASSPATH": "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
> "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
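For readers hitting the same error: one commonly suggested mitigation (not confirmed as the fix in this thread) is to name the JDBC driver class explicitly via the `driver` option of Spark's JDBC data source, rather than relying on it being auto-registered on every executor. A minimal sketch; the connection URL and table name below are placeholders, not values from this report:

```python
# Sketch only: 'db-host', the service name, and the table are placeholders.
# Setting 'driver' tells Spark's JDBC source which class to load explicitly,
# instead of depending on DriverManager auto-registration on each executor.
jdbc_options = {
    'url': 'jdbc:oracle:thin:@//db-host:1521/service',   # hypothetical URL
    'dbtable': 'bi.contact',
    'driver': 'oracle.jdbc.OracleDriver',                # explicit driver class
}

# With a live SQLContext this would be:
# df = sqlContext.read.format('jdbc').options(**jdbc_options).load()
```

This only helps if the driver jar is actually on the executor classpath; it addresses registration, not distribution.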
[ https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277682#comment-15277682 ]

Sun Rui commented on SPARK-14162:
---------------------------------

We met the same error. The cause was that on one worker node, the MySQL JDBC driver was not on the CLASSPATH. [~mchalek] It seems this is not a bug: in your case, for some reason, the ojdbc6 driver was not automatically loaded and registered on one worker node.
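When the suspicion is a missing or wrong jar on one particular node, a quick check is whether the jar on that node actually contains the driver class. A generic standard-library sketch (the jar path in the usage comment is hypothetical):

```python
import zipfile

def jar_has_class(jar_path, class_name):
    """Return True if the jar contains the given fully-qualified class."""
    # A jar is a zip archive; class com.foo.Bar lives at entry com/foo/Bar.class.
    entry = class_name.replace('.', '/') + '.class'
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Hypothetical usage on a suspect worker node:
# jar_has_class('/usr/lib/spark/extras/ojdbc6.jar', 'oracle.jdbc.OracleDriver')
```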
[ https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270800#comment-15270800 ]

Kevin McHale commented on SPARK-14162:
--------------------------------------

I documented this issue here: https://issues.apache.org/jira/browse/SPARK-14204
And fixed it here: https://github.com/apache/spark/pull/12000
Nobody has responded to my multiple requests to merge the PR; maybe others will have better luck convincing them?
[ https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265310#comment-15265310 ]

Martin Hall commented on SPARK-14162:
-------------------------------------

I got the same error when I had forgotten to copy the Oracle JDBC jar file (ojdbc6.jar) to one of the Spark worker nodes.
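The hand-copy scenario above (jar present on some nodes but not others) can often be avoided by letting spark-submit ship the jar itself. A sketch of the usual flags; the jar path and application name are hypothetical, and the snippet builds the command line rather than invoking it:

```python
# Hypothetical paths: '--jars' ships the jar to executors,
# '--driver-class-path' exposes it to the driver JVM, and
# spark.executor.extraClassPath puts it on each executor's classpath
# so the driver class can be registered on every worker.
jar = '/opt/jdbc/ojdbc6.jar'  # hypothetical location of the Oracle JDBC jar
submit_args = [
    'spark-submit',
    '--jars', jar,
    '--driver-class-path', jar,
    '--conf', 'spark.executor.extraClassPath=' + jar,
    'my_job.py',  # hypothetical PySpark application
]
print(' '.join(submit_args))
```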