I’ve tried it both ways. The uber jar gives me the following:
- Caused by: java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html

If I instead do minimal packaging, add org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar via --packages, and also add it to the --driver-class-path, then I get past that error, but I hit the error I showed in the original post.

I agree that it seems to be missing the kafka-clients jar, since that is where ByteArrayDeserializer lives, but as far as I can tell the jar is present. I can see the following two packages in the ClassPath entries on the history server (though the source shows: **********(redacted) — not sure why?):

- spark://<ip>:<port>/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
- spark://<ip>:<port>/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar

As a side note, I’m running both a master and a worker on the same system, just to test out running in cluster mode. Not sure if that would have anything to do with it; I would think it would make things easier, since everything has access to the same file system... but I’m pretty new to Spark.

I have also read through and followed those instructions, as well as many others at this point.

Thanks!
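(A note for readers who land on this thread with the same uber-jar symptom: Spark discovers data sources via java.util.ServiceLoader, and by default maven-shade-plugin keeps only one copy of each META-INF/services resource, so the entry that spark-sql-kafka contributes to META-INF/services/org.apache.spark.sql.sources.DataSourceRegister can get dropped during shading, which produces exactly the "Failed to find data source: kafka" error. A minimal sketch of the usual fix, assuming maven-shade-plugin is doing the packaging; the plugin version here is an assumption, adjust to your build:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <transformers>
                        <!-- Merge META-INF/services files from all jars instead of
                             keeping only the first, so DataSourceRegister still
                             lists the kafka source in the shaded jar. -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>

With the service files merged, an uber jar built by mvn package should register the kafka source without any extra --packages or --driver-class-path flags.)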
On Wed, Dec 27, 2017 at 12:56 AM, Eyal Zituny <eyal.zit...@equalum.io> wrote:

> Hi,
> it seems that you're missing the kafka-clients jar (and probably some
> other dependencies as well).
> How did you package your application jar? Does it include all the
> required dependencies (as an uber jar)?
> If it's not an uber jar, you need to pass via the driver-class-path and the
> executor-class-path all the files/dirs where your dependencies can be found
> (note that those must be accessible from each node in the cluster).
> I suggest going over the manual:
> <https://spark.apache.org/docs/latest/submitting-applications.html>
>
> Eyal
>
> On Wed, Dec 27, 2017 at 1:08 AM, Geoff Von Allmen <ge...@ibleducation.com>
> wrote:
>
>> I am trying to deploy to a standalone cluster but am running into
>> ClassNotFound errors.
>>
>> I have tried a whole myriad of different approaches, ranging from
>> packaging all dependencies into a single JAR to using the --packages
>> and --driver-class-path options.
>>
>> I've got a master node started, a slave node running on the same system,
>> and am using spark-submit to get the streaming job kicked off.
>>
>> Here is the error I'm getting:
>>
>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>>     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
>> Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
>>     at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
>>     at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
>>     at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
>>     at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
>>     at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
>>     at com.Customer.start(Customer.scala:47)
>>     at com.Main$.main(Main.scala:23)
>>     at com.Main.main(Main.scala)
>>     ... 6 more
>> Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>     ... 18 more
>>
>> Here is the spark-submit command I'm using:
>>
>> ./spark-submit \
>>     --master spark://<domain>:<port> \
>>     --files jaas.conf \
>>     --deploy-mode cluster \
>>     --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
>>     --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
>>     --packages org.apache.spark:spark-sql-kafka-0-10_2.11 \
>>     --driver-class-path ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
>>     --class <class_main> \
>>     --verbose \
>>     my_jar.jar
>>
>> I've tried all sorts of combinations of including different packages and
>> driver-class-path jar files. As far as I can find, the serializer should be
>> in the kafka-clients jar file, which I've tried including, to no success.
>>
>> Pom dependencies are as follows:
>>
>> <dependencies>
>>     <dependency>
>>         <groupId>org.scala-lang</groupId>
>>         <artifactId>scala-library</artifactId>
>>         <version>2.11.12</version>
>>     </dependency>
>>     <dependency>
>>         <groupId>org.apache.spark</groupId>
>>         <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>>         <version>2.2.1</version>
>>     </dependency>
>>     <dependency>
>>         <groupId>org.apache.spark</groupId>
>>         <artifactId>spark-core_2.11</artifactId>
>>         <version>2.2.1</version>
>>     </dependency>
>>     <dependency>
>>         <groupId>org.apache.spark</groupId>
>>         <artifactId>spark-sql_2.11</artifactId>
>>         <version>2.2.1</version>
>>     </dependency>
>>     <dependency>
>>         <groupId>org.apache.spark</groupId>
>>         <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
>>         <version>2.2.1</version>
>>     </dependency>
>>     <dependency>
>>         <groupId>mysql</groupId>
>>         <artifactId>mysql-connector-java</artifactId>
>>         <version>8.0.8-dmr</version>
>>     </dependency>
>>     <dependency>
>>         <groupId>joda-time</groupId>
>>         <artifactId>joda-time</artifactId>
>>         <version>2.9.9</version>
>>     </dependency>
>> </dependencies>
>>
>> If I remove --deploy-mode and run it as client … it works just fine.
>>
>> Thanks Everyone -
>>
>> Geoff V.
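(Two details in the thread above worth flagging for anyone debugging the same setup. First, --packages expects a full groupId:artifactId:version Maven coordinate, and it resolves transitive dependencies, kafka-clients included, for both driver and executors, so hand-placing a single jar on --driver-class-path should not be necessary. A sketch of the same command with a fully qualified coordinate; the host, class, and jar names are placeholders carried over from the original post:

    ./spark-submit \
        --master spark://<domain>:<port> \
        --deploy-mode cluster \
        --files jaas.conf \
        --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
        --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
        --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1 \
        --class <class_main> \
        --verbose \
        my_jar.jar

That said, --packages has historically been unreliable for the driver in standalone cluster mode, which is one reason the uber-jar route, with the service-file merge sketched earlier in the thread, tends to be the safer fix. Second, when building that uber jar, spark-core and spark-sql can be marked <scope>provided</scope> in the pom, since the cluster already ships them, while spark-sql-kafka-0-10 should stay in the default compile scope: it is not part of the Spark distribution, and it is what pulls in kafka-clients, the jar that contains ByteArrayDeserializer.)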