Hi guys,

Trying to use a Spark SQL context's .load("jdbc", ...) method to create a DataFrame from a JDBC data source. All seems to work well locally (master = local[*]), but as soon as we try to run on YARN we have problems.
We seem to be running into problems with the classpath and loading of the JDBC driver. I'm using the jTDS 1.3.1 driver, net.sourceforge.jtds.jdbc.Driver:

    ./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master yarn-client

When trying to run I get an exception:

    scala> sqlContext.load("jdbc", Map(
             "url" -> "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd",
             "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))
    java.sql.SQLException: No suitable driver found for jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd

Thinking maybe we need to force-load the driver, if I supply "driver" -> "net.sourceforge.jtds.jdbc.Driver" to .load we get:

    scala> sqlContext.load("jdbc", Map(
             "url" -> "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd",
             "driver" -> "net.sourceforge.jtds.jdbc.Driver",
             "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))
    java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:97)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:21)

Yet if I run a Class.forName() just from the shell, it has no problem finding the JAR:

    scala> Class.forName("net.sourceforge.jtds.jdbc.Driver")
    res1: Class[_] = class net.sourceforge.jtds.jdbc.Driver
I've tried both the shell and spark-submit (packaging the driver in with my application as a fat JAR). Nothing seems to work. I can also get a connection in the driver/shell with no problem:

    scala> import java.sql.DriverManager
    import java.sql.DriverManager

    scala> DriverManager.getConnection("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")
    res3: java.sql.Connection = net.sourceforge.jtds.jdbc.JtdsConnection@2a67ecd0

I'm probably missing some classpath setting here. In jdbc.DefaultSource.createRelation it looks like the call to Class.forName doesn't specify a class loader, so it just uses the default Java behaviour of resolving reflectively against the caller's class loader. It almost feels like it's using a different class loader.

I also checked whether the driver is on the classpath of all my executors by running:

    import scala.collection.JavaConverters._
    sc.parallelize(Seq(1, 2, 3, 4)).flatMap { _ =>
      java.sql.DriverManager.getDrivers().asScala.map { d =>
        s"$d | ${d.acceptsURL("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")}"
      }
    }.collect().foreach(println)

This successfully returns:

    15/04/15 01:07:37 INFO scheduler.DAGScheduler: Job 0 finished: collect at Main.scala:46, took 1.495597 s
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true

As a final test we tried the PostgreSQL driver and had the same problem.

Any ideas?

Cheers,
Nathan
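P.S. One thing we may try next is registering the driver with DriverManager ourselves before calling .load, going through the context classloader explicitly. A sketch of the registration pattern — FakeDriver is a hypothetical stand-in so the snippet runs without the jTDS jar; with jTDS you would instead load net.sourceforge.jtds.jdbc.Driver through Thread.currentThread().getContextClassLoader as shown in the comment:

```scala
import java.sql.{Connection, Driver, DriverManager, DriverPropertyInfo}
import java.util.Properties
import java.util.logging.Logger

// Hypothetical stand-in driver so the registration pattern is runnable
// without the real jar. With jTDS, the equivalent would be:
//   val d = Class.forName("net.sourceforge.jtds.jdbc.Driver", true,
//       Thread.currentThread().getContextClassLoader)
//     .newInstance().asInstanceOf[Driver]
//   DriverManager.registerDriver(d)
class FakeDriver extends Driver {
  def connect(url: String, info: Properties): Connection = null
  def acceptsURL(url: String): Boolean = url.startsWith("jdbc:fake:")
  def getPropertyInfo(url: String, info: Properties): Array[DriverPropertyInfo] =
    Array.empty
  def getMajorVersion: Int = 1
  def getMinorVersion: Int = 0
  def jdbcCompliant(): Boolean = false
  def getParentLogger: Logger = Logger.getGlobal
}

// Register it; DriverManager will now consider it for matching URLs.
DriverManager.registerDriver(new FakeDriver)
```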