Please disregard - I used an obsolete version of the jar, which indeed did not have the classes in it.
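In case it helps anyone hitting the same symptom, a quick way to confirm that the assembly actually contains the data source classes is to list the jar's entries before submitting. This is only a sketch; the jar path, package and class names below are placeholders, not the real ones from this project:

    import zipfile

    # Hypothetical paths/names - substitute your own assembly jar and provider class.
    JAR = "/path/to/myformat-assembly.jar"
    EXPECTED = "my/custom/format/DefaultSource.class"  # Spark also tries "<format>.DefaultSource"

    with zipfile.ZipFile(JAR) as jar:
        entries = jar.namelist()
        print(EXPECTED in entries)  # True if the provider class is packaged
        # If the format registers itself via DataSourceRegister, there should
        # also be a matching META-INF/services entry:
        print([e for e in entries if e.startswith("META-INF/services/")])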
From: Rabe, Jens <jens.r...@iwes.fraunhofer.de>
Sent: Monday, 8 October 2018 12:31
To: user@livy.incubator.apache.org
Subject: Submitting a PySpark batch job ignores jars sent with it

Hello,

I defined a custom format to read data into Spark. It works when used from Scala Spark or e.g. from Zeppelin, and also with PySpark. I am now trying to use it from Livy. I POST something like this to http://mylivy:8998/batches:

    {
      "file": "/path/to/myjob.py",
      "args": ["foo", "bar"],
      "jars": "/path/to/myformat-assembly.jar"
    }

In the log I can see that the jar gets loaded and added:

    2018-10-08 12:23:28 INFO SparkContext:54 - Added JAR file:///path/to/myformat-assembly.jar at spark://172.30.10.10:45613/jars/myformat-assembly.jar with timestamp 1538994208755

But my PySpark job does not find the format:

    Traceback (most recent call last):
      File "/path/to/myjob.py", line 13, in <module>
        data = spark.read.format("my.custom.format").load(path)
      File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in load
      File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
      File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o29.load.
    : java.lang.ClassNotFoundException: Failed to find data source: my.custom.format. Please find packages at http://spark.apache.org/third-party-projects.html

When I open a session (which loads the same library jar) and send the corresponding command, it fails as well. However, I added a simple object to this library, and calling that works (e.g. via sc._jvm.somepackage.Foo.bar()).

What am I missing?
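
For reference, a minimal sketch of submitting such a batch from Python; the host and file paths are taken from the example payload above and are placeholders:

    import requests  # assumes the requests package is available

    LIVY_URL = "http://mylivy:8998"  # hypothetical Livy endpoint from the example above

    payload = {
        "file": "/path/to/myjob.py",
        "args": ["foo", "bar"],
        # The Livy batch API documents "jars" as a list of strings.
        "jars": ["/path/to/myformat-assembly.jar"],
    }

    resp = requests.post(f"{LIVY_URL}/batches", json=payload)
    print(resp.status_code, resp.json())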
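And a sketch of the two calls being compared in the job itself - the DataFrame reader lookup that fails versus the direct JVM gateway call that works. Format name, package, object and input path are placeholders taken from the message:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # Fails with ClassNotFoundException when the submitted jar does not
    # actually contain the data source implementation:
    df = spark.read.format("my.custom.format").load("/some/input/path")

    # Works once the class is on the driver's JVM classpath, since it goes
    # straight through the Py4J gateway (names are placeholders):
    result = sc._jvm.somepackage.Foo.bar()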