Like many things, it is not that straightforward!
You need to explicitly reference the Oracle jar file with the --jars switch as well:

spark-shell --master yarn --deploy-mode client --driver-class-path /home/hduser/jars/ojdbc6.jar --jars /home/hduser/jars/ojdbc6.jar

--driver-class-path only puts the jar on the driver's classpath, whereas --jars also ships it to the executors, which is where the tasks are failing.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 13 February 2016 15:25
To: user@spark.apache.org
Subject: jdbc driver used by spark fails following first stage

Hi,

I start my spark shell with:

--driver-class-path /home/hduser/jars/ojdbc6.jar

It finds the driver, and any map read returns the correct structure for the Oracle tables. Even when I join columns I can see the joined structure:

scala> empDepartments.printSchema()
root
 |-- DEPARTMENT_ID: decimal(4,0) (nullable = false)
 |-- DEPARTMENT_NAME: string (nullable = false)
 |-- MANAGER_ID: decimal(6,0) (nullable = true)
 |-- LOCATION_ID: decimal(4,0) (nullable = true)
 |-- DEPARTMENT_ID: decimal(4,0) (nullable = false)
 |-- DEPARTMENT_NAME: string (nullable = false)
 |-- MANAGER_ID: decimal(6,0) (nullable = true)
 |-- LOCATION_ID: decimal(4,0) (nullable = true)

However, any operation touching the rows themselves fails, as shown below.
scala> empDepartments.foreach(println)
16/02/13 15:32:56 INFO SparkContext: Starting job: foreach at <console>:37
16/02/13 15:32:56 INFO DAGScheduler: Got job 5 (foreach at <console>:37) with 200 output partitions
16/02/13 15:32:56 INFO DAGScheduler: Final stage: ResultStage 11(foreach at <console>:37)
16/02/13 15:32:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 9, ShuffleMapStage 10)
16/02/13 15:32:56 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 9, ShuffleMapStage 10)
16/02/13 15:32:56 INFO DAGScheduler: Submitting ShuffleMapStage 9 (MapPartitionsRDD[12] at foreach at <console>:37), which has no missing parents
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(8136) called with curMem=44967, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(3976) called with curMem=53103, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 50.140.197.217:40741 (size: 3.9 KB, free: 529.9 MB)
16/02/13 15:32:57 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:861
16/02/13 15:32:57 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[12] at foreach at <console>:37)
16/02/13 15:32:57 INFO YarnScheduler: Adding task set 9.0 with 1 tasks
16/02/13 15:32:57 INFO DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[8] at foreach at <console>:37), which has no missing parents
16/02/13 15:32:57 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 27, rhes564, PROCESS_LOCAL, 1918 bytes)
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(8136) called with curMem=57079, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 7.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(3978) called with curMem=65215, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 3.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 50.140.197.217:40741 (size: 3.9 KB, free: 529.9 MB)
16/02/13 15:32:57 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:861
16/02/13 15:32:57 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[8] at foreach at <console>:37)
16/02/13 15:32:57 INFO YarnScheduler: Adding task set 10.0 with 1 tasks
16/02/13 15:32:57 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 28, rhes564, PROCESS_LOCAL, 1918 bytes)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on rhes564:23270 (size: 3.9 KB, free: 1589.7 MB)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on rhes564:23270 (size: 3.9 KB, free: 1589.7 MB)
16/02/13 15:32:57 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 27, rhes564): java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb
        at java.sql.DriverManager.getConnection(DriverManager.java:596)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:188)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:181)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:360)

Mich Talebzadeh
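The executor-side failure above can be reproduced outside Spark entirely: java.sql.DriverManager throws this same SQLException whenever no driver on the current classpath claims the URL, which is the position the executors are in when the jar is supplied only via --driver-class-path. A minimal sketch in plain JDBC, run without the Oracle jar on the classpath (the hostname and SID are just the ones from the log; DriverManager gives up before attempting any network I/O, so no database is needed):

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class NoSuitableDriverDemo {
    public static void main(String[] args) {
        try {
            // No driver jar is on the classpath, so DriverManager finds no
            // registered driver that accepts this URL and throws immediately.
            DriverManager.getConnection("jdbc:oracle:thin:@rhes564:1521:mydb");
        } catch (SQLException e) {
            // Same "No suitable driver found for ..." message that the
            // executors log in the failed stage.
            System.out.println(e.getMessage());
        }
    }
}
```

This is why the schema operations succeed (they run on the driver, where --driver-class-path applies) while anything that touches rows fails: the row-reading tasks open their own JDBC connections on the executors, which never received the jar.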