I’ve tried it both ways.

The uber jar gives me the following:

   - Caused by: java.lang.ClassNotFoundException: Failed to find data
   source: kafka. Please find packages at
   http://spark.apache.org/third-party-projects.html
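One thing I’ve been poking at on the uber-jar side (just a guess on my part,
and assuming the shade plugin is doing the packaging) is whether the Kafka
source’s service registration survived the merge of the META-INF/services
files:

    unzip -p my_jar.jar META-INF/services/org.apache.spark.sql.sources.DataSourceRegister

If the kafka provider isn’t listed in that output, that would explain the
"Failed to find data source" error.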

If I instead do minimal packaging, add
org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar via --packages, and
then add it to the --driver-class-path, I get past that error, but I hit
the error I showed in the original post.
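For reference, that variant of the submit looks roughly like this (the ivy
cache path and exact versions are from my setup, so treat them as
illustrative):

    ./spark-submit \
        --master spark://<domain>:<port> \
        --deploy-mode cluster \
        --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 \
        --driver-class-path ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar \
        --class <class_main> \
        my_jar.jar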

I agree it looks like the kafka-clients jar is what’s missing, since that is
where ByteArrayDeserializer lives, though as far as I can tell it is actually
present.
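For what it’s worth, the way I’m checking is just listing the jar contents
(the ivy cache path is from my machine, so adjust as needed):

    unzip -l ~/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar | grep ByteArrayDeserializer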

I can see the following two entries in the Classpath Entries section on the
history server (though the source shows **********(redacted), and I’m not
sure why):

   - spark://<ip>:<port>/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
   - spark://<ip>:<port>/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar

As a side note, I’m running both the master and a worker on the same system,
just to test out running in cluster mode. I’m not sure if that has anything
to do with it; I would think it would make things easier since everything has
access to the same file system, but I’m pretty new to Spark.

I have also read through and followed those instructions as well as many
others at this point.

Thanks!

On Wed, Dec 27, 2017 at 12:56 AM, Eyal Zituny <eyal.zit...@equalum.io>
wrote:

> Hi,
> it seems that you're missing the kafka-clients jar (and probably some
> other dependencies as well).
> how did you package your application jar? does it include all the
> required dependencies (as an uber jar)?
> if it's not an uber jar, you need to pass all the files/dirs where your
> dependencies can be found via the driver-class-path and the
> executor-class-path (note that those must be accessible from every node
> in the cluster).
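> for example, something along these lines (the dependency directory is just
> a placeholder):
>
>     spark-submit \
>       --driver-class-path "/path/to/deps/*" \
>       --conf "spark.executor.extraClassPath=/path/to/deps/*" \
>       --class <class_main> my_jar.jar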
> i suggest going over the manual
> <https://spark.apache.org/docs/latest/submitting-applications.html>
>
> Eyal
>
>
> On Wed, Dec 27, 2017 at 1:08 AM, Geoff Von Allmen <ge...@ibleducation.com>
> wrote:
>
>> I am trying to deploy a standalone cluster but running into ClassNotFound
>> errors.
>>
>> I have tried a myriad of different approaches, ranging from packaging all
>> dependencies into a single JAR to using the --packages and
>> --driver-class-path options.
>>
>> I’ve got a master node started, a slave node running on the same system,
>> and am using spark-submit to get the streaming job kicked off.
>>
>> Here is the error I’m getting:
>>
>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at 
>> org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>>     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
>> Caused by: java.lang.NoClassDefFoundError: 
>> org/apache/kafka/common/serialization/ByteArrayDeserializer
>>     at 
>> org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
>>     at 
>> org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
>>     at 
>> org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
>>     at 
>> org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
>>     at 
>> org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
>>     at 
>> org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
>>     at 
>> org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
>>     at 
>> org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
>>     at 
>> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
>>     at com.Customer.start(Customer.scala:47)
>>     at com.Main$.main(Main.scala:23)
>>     at com.Main.main(Main.scala)
>>     ... 6 more
>> Caused by: java.lang.ClassNotFoundException: 
>> org.apache.kafka.common.serialization.ByteArrayDeserializer
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>     ... 18 more
>>
>> Here is the spark submit command I’m using:
>>
>> ./spark-submit \
>>     --master spark://<domain>:<port> \
>>     --files jaas.conf \
>>     --deploy-mode cluster \
>>     --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
>>     --conf 
>> "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf"
>>  \
>>     --packages org.apache.spark:spark-sql-kafka-0-10_2.11 \
>>     --driver-class-path 
>> ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
>>     --class <class_main> \
>>     --verbose \
>>     my_jar.jar
>>
>> I’ve tried all sorts of combinations of different --packages and
>> --driver-class-path jar files. As far as I can tell, the serializer should
>> be in the kafka-clients jar file, which I’ve tried including, with no
>> success.
>>
>> Pom Dependencies are as follows:
>>
>>     <dependencies>
>>         <dependency>
>>             <groupId>org.scala-lang</groupId>
>>             <artifactId>scala-library</artifactId>
>>             <version>2.11.12</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.spark</groupId>
>>             <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>>             <version>2.2.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.spark</groupId>
>>             <artifactId>spark-core_2.11</artifactId>
>>             <version>2.2.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.spark</groupId>
>>             <artifactId>spark-sql_2.11</artifactId>
>>             <version>2.2.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.spark</groupId>
>>             <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
>>             <version>2.2.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>mysql</groupId>
>>             <artifactId>mysql-connector-java</artifactId>
>>             <version>8.0.8-dmr</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>joda-time</groupId>
>>             <artifactId>joda-time</artifactId>
>>             <version>2.9.9</version>
>>         </dependency>
>>     </dependencies>
>>
>> If I remove --deploy-mode and run it as client … it works just fine.
>>
>> Thanks Everyone -
>>
>> Geoff V.
>>
>
>
