I have no problems when submitting the job with spark-submit: supplying the list of required jars via the --jars option works.
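For reference, the invocation is along these lines (the script name here is a stand-in for my actual entry point):

    spark-submit \
      --jars /usr/lib/spark/extras/lib/spark-streaming-kafka.jar,/opt/kafka/libs/scala-library-2.10.1.jar,/opt/kafka/libs/kafka_2.10-0.8.1.1.jar,/opt/kafka/libs/metrics-core-2.2.0.jar,/usr/share/java/mysql.jar \
      my_job.py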
In the output I can see the jars being added:

    16/02/10 11:14:24 INFO spark.SparkContext: Added JAR file:/usr/lib/spark/extras/lib/spark-streaming-kafka.jar at http://192.168.10.4:33820/jars/spark-streaming-kafka.jar with timestamp 1455102864058
    16/02/10 11:14:24 INFO spark.SparkContext: Added JAR file:/opt/kafka/libs/scala-library-2.10.1.jar at http://192.168.10.4:33820/jars/scala-library-2.10.1.jar with timestamp 1455102864077
    16/02/10 11:14:24 INFO spark.SparkContext: Added JAR file:/opt/kafka/libs/kafka_2.10-0.8.1.1.jar at http://192.168.10.4:33820/jars/kafka_2.10-0.8.1.1.jar with timestamp 1455102864085
    16/02/10 11:14:24 INFO spark.SparkContext: Added JAR file:/opt/kafka/libs/metrics-core-2.2.0.jar at http://192.168.10.4:33820/jars/metrics-core-2.2.0.jar with timestamp 1455102864086
    16/02/10 11:14:24 INFO spark.SparkContext: Added JAR file:/usr/share/java/mysql.jar at http://192.168.10.4:33820/jars/mysql.jar with timestamp 1455102864090

But when I try to create a context programmatically in Python (I want to set up some tests), I see none of this and I end up with class-not-found errors. Trying to cover all the bases, even though I suspect most of these are redundant when running local, I've tried:

    from pyspark import SparkConf, SparkContext

    # the same comma-separated list is used for every option below
    jars = ','.join([
        '/usr/lib/spark/extras/lib/spark-streaming-kafka.jar',
        '/opt/kafka/libs/scala-library-2.10.1.jar',
        '/opt/kafka/libs/kafka_2.10-0.8.1.1.jar',
        '/opt/kafka/libs/metrics-core-2.2.0.jar',
        '/usr/share/java/mysql.jar',
    ])

    spark_conf = SparkConf()
    spark_conf.setMaster('local[4]')
    spark_conf.set('spark.executor.extraLibraryPath', jars)
    spark_conf.set('spark.executor.extraClassPath', jars)
    spark_conf.set('spark.driver.extraClassPath', jars)
    spark_conf.set('spark.driver.extraLibraryPath', jars)
    self.spark_context = SparkContext(conf=spark_conf)

But I still get the same failure to find the same class:

    Py4JJavaError: An error occurred while calling o30.loadClass.
    : java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper

The class is certainly in spark-streaming-kafka.jar, and that jar is present in the filesystem at the location given above. I'm under the impression that if I were using Java I'd be able to use the setJars method on the conf to take care of this, but there doesn't appear to be anything corresponding for Python.

Hacking about, I found that adding:

    spark_conf.set('spark.jars', jars)

got the logging to admit to adding the jars to the HTTP server (just as in the spark-submit output above), but whether I leave the other config options in place or remove them, the class is still not found.

Is this not possible in Python? Incidentally, I have also tried SPARK_CLASSPATH (which only gets me a message that it's deprecated and ignored anyway), and I cannot find anything else to try.

Can anybody help?

David K.
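P.S. In case it's useful, here is a minimal sketch of the kind of test I'm running, pared down to the call that blows up. The ZooKeeper address, group id and topic name are placeholders for my real values:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    jars = ','.join([
        '/usr/lib/spark/extras/lib/spark-streaming-kafka.jar',
        '/opt/kafka/libs/scala-library-2.10.1.jar',
        '/opt/kafka/libs/kafka_2.10-0.8.1.1.jar',
        '/opt/kafka/libs/metrics-core-2.2.0.jar',
        '/usr/share/java/mysql.jar',
    ])

    conf = SparkConf().setMaster('local[4]').setAppName('kafka-jar-test')
    conf.set('spark.jars', jars)
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 1)  # 1-second batches

    # This is the call that fails: as far as I can tell from the traceback,
    # KafkaUtils loads org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
    # on the JVM side here, and the ClassNotFoundException above is the result.
    stream = KafkaUtils.createStream(ssc, 'localhost:2181', 'test-group',
                                     {'test-topic': 1})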