Hi,
I am having difficulty finding the proper Kafka libraries for Spark. The
HDP version is 3.1; I tried the libraries below, but they produce the
issue shown further down.
*POM entries:*
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.0.0.3.1.0.0-78</version>
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>2.0.0.3.1.0.0-78</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.compat.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.2.3.1.0.0-78</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.3.2.3.1.0.0-78</version>
</dependency>
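(For reference: as far as I understand, the "kafka" data source used by spark.read.format("kafka") lives in a separate connector artifact, spark-sql-kafka-0-10, which is not in my POM above. Assuming the same HDP build version as the other Spark artifacts, the entry would presumably be:)

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <!-- assumed: same HDP 3.1 build version as spark-core/spark-streaming above -->
  <version>2.3.2.3.1.0.0-78</version>
</dependency>

Note this artifact must end up on the application classpath at runtime (e.g. shaded into the jar), not marked provided, since the HDP Spark distribution does not ship it by default.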
*Issue during spark-submit:*
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
Could someone tell me if I am doing something wrong?
*Spark Submit:*
export KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
export KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
export SPARK_KAFKA_VERSION=NONE
spark-submit --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties" --files "kafka.consumer.properties" --class com.example.ReadDataFromKafka HelloKafka-1.0-SNAPSHOT.jar
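(As an alternative to bundling the connector into the application jar, spark-submit can pull it in at launch time with --packages or --jars. A sketch, assuming Maven Central coordinates for Spark 2.3.2 / Scala 2.11 — the version should match the cluster's Spark build:)

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2 \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties" \
  --files "kafka.consumer.properties" \
  --class com.example.ReadDataFromKafka \
  HelloKafka-1.0-SNAPSHOT.jar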
*Consumer Code:*
https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/
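(For anyone reading without following the link, the consumer is essentially the batch-read-from-Kafka pattern shown on that page; a minimal sketch — broker address and topic name are placeholders — looks like this:)

import org.apache.spark.sql.SparkSession

object ReadDataFromKafka {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadDataFromKafka")
      .getOrCreate()

    // Batch read; the "kafka" format is provided by the
    // spark-sql-kafka-0-10 connector artifact
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:6667") // placeholder
      .option("subscribe", "my-topic")                   // placeholder
      .load()

    // key/value arrive as binary; cast to strings for display
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show(false)

    spark.stop()
  }
}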
Regards,
William R