Re: HDP 3.1 spark Kafka dependency

2020-03-18 Thread Zahid Rahman
I have run into many library-incompatibility issues myself, including
headless-JVM problems where I had to uninstall the headless JVM, install a
full JDK, and work through them one by one. Anyway,
the page below shows the same error as yours;
you may get away with making the changes to your pom.xml suggested there:
https://stackoverflow.com/questions/41303037/why-does-spark-application-fail-with-classnotfoundexception-failed-to-find-dat
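That error usually means the Spark Structured Streaming Kafka connector is missing from the classpath; neither kafka-clients nor kafka_2.11 provides the `kafka` data source. A minimal sketch of the POM addition, assuming the same HDP build suffix as your other Spark artifacts (check what your HDP repository actually publishes):

```
<!-- Kafka source for Spark SQL / Structured Streaming.
     The HDP build suffix mirrors the other Spark artifacts in the
     question and is an assumption; verify it against your HDP repo. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
</dependency>
```

Note that this dependency should not be marked `provided` unless the jar really is on the cluster classpath; alternatively, you can pull it in at submit time with `--packages` on the spark-submit command line.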

Good Luck !

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org



On Wed, 18 Mar 2020 at 16:36, William R  wrote:

> Hi,
>
> I am having difficulty finding the proper Kafka libraries for Spark. The
> HDP version is 3.1, and I tried the libraries below, but they produce the
> error that follows.
>
> *POM entries:*
>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka-clients</artifactId>
>     <version>2.0.0.3.1.0.0-78</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka_2.11</artifactId>
>     <version>2.0.0.3.1.0.0-78</version>
> </dependency>
>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_${scala.compat.version}</artifactId>
>     <version>${spark.version}</version>
>     <scope>provided</scope>
> </dependency>
>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.11</artifactId>
>     <version>2.3.2.3.1.0.0-78</version>
>     <scope>provided</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-streaming_2.11</artifactId>
>     <version>2.3.2.3.1.0.0-78</version>
> </dependency>
>
> *Error during spark-submit:*
>
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: kafka. Please find packages at
> http://spark.apache.org/third-party-projects.html
> at
> org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
> at
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
> at
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
> at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
> at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
> at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>
>
> Could someone tell me if I am doing something wrong?
>
> *Spark Submit:*
>
> export
> KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
> export
> KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
> export SPARK_KAFKA_VERSION=NONE
>
> spark-submit --conf
> "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties"
> --files "kafka.consumer.properties" --class com.example.ReadDataFromKafka
> HelloKafka-1.0-SNAPSHOT.jar
>
> *Consumer Code : *
> https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/
>
>
> Regards,
> William R
>
>
>
>

