Well, you are now using a package instead of the jar. There is a difference
between using a jar and using a package in spark-submit: --jars adds only
the jars you list, while --packages adds the jar plus all of its
dependencies as listed in Maven. Packages resolve those dependencies via
Ivy <https://ant.apache.org/ivy/>, which is a dependency manager. You can
of course do the resolution manually, working out the missing jars yourself
through the ~/.ivy2/jars sub-directory, step by step:

cd ~/.ivy2/jars
grep -lRi <missing class>    # example: grep -lRi HttpRequestInitializer

Then you can add the missing jars to --jars and repeat until everything
works (a bit tedious).

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On Wed, 2 Feb 2022 at 23:30, karan alang <karan.al...@gmail.com> wrote:

> Hi Mitch, All -
>
> Thanks, I was able to resolve this using the command below:
>
> gcloud dataproc jobs submit pyspark \
>   /Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb2.py \
>   --cluster dataproc-ss-poc \
>   --properties spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
>   --region us-central1
>
> On Wed, Feb 2, 2022 at 1:25 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> The current Spark version on GCP is 3.1.2.
>>
>> Try using this jar file instead:
>>
>> spark-sql-kafka-0-10_2.12-3.0.1.jar
>>
>> HTH
>>
>> On Wed, 2 Feb 2022 at 06:51, karan alang <karan.al...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I'm running a simple Structured Streaming job on GCP, which reads data
>>> from Kafka and prints it to the console.
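[Editor's note] The grep technique described at the top of the thread works
because jar files store their entry paths as plain strings, so `grep -l` on
a jar will match a class path inside it. A toy sketch of the search step
(the directory and jar file name below are made up for illustration; on a
real machine you would search ~/.ivy2/jars itself):

```shell
# Stand-in for ~/.ivy2/jars: a fake "jar" whose bytes contain the class
# path, the way a real jar's zip directory does. (Path and jar name are
# hypothetical, chosen only to mirror this thread's missing class.)
mkdir -p /tmp/ivy2-demo/jars
printf 'org/apache/kafka/common/serialization/ByteArraySerializer.class' \
  > /tmp/ivy2-demo/jars/kafka-clients-2.6.0.jar

# Same search as in the thread: list every jar mentioning the class.
grep -lRi ByteArraySerializer /tmp/ivy2-demo/jars
```

Each jar this prints is a candidate to append to --jars; rerun the job and
repeat the search for the next NoClassDefFoundError until the job starts.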
>>>
>>> Command:
>>>
>>> gcloud dataproc jobs submit pyspark \
>>>   /Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb1.py \
>>>   --cluster dataproc-ss-poc \
>>>   --jars gs://spark-jars-karan/spark-sql-kafka-0-10_2.12-3.1.2.jar,gs://spark-jars-karan/spark-core_2.12-3.1.2.jar \
>>>   --region us-central1
>>>
>>> I'm getting this error:
>>>
>>> File "/tmp/01c16a55009a42a0a29da6dde9aae4d5/StructuredStreaming_Kafka_GCP-Batch-feb1.py", line 49, in <module>
>>>   df = spark.read.format('kafka')\
>>> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 210, in load
>>> File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
>>> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
>>> File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
>>> py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
>>>
>>> : java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer
>>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:599)
>>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
>>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider.org$apache$spark$sql$kafka010$KafkaSourceProvider$$validateBatchOptions(KafkaSourceProvider.scala:348)
>>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider.createRelation(KafkaSourceProvider.scala:128)
>>>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
>>>   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
>>>   at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
>>>   at scala.Option.getOrElse(Option.scala:189)
>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>>   at py4j.Gateway.invoke(Gateway.java:282)
>>>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>   at py4j.GatewayConnection.run(GatewayConnection.java:238)
>>>   at java.lang.Thread.run(Thread.java:748)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArraySerializer
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>>
>>> Additional details are in this stackoverflow question:
>>>
>>> https://stackoverflow.com/questions/70951195/gcp-dataproc-java-lang-noclassdeffounderror-org-apache-kafka-common-serializa
>>>
>>> Do we need to pass any other jar?
>>> What needs to be done to debug/fix this?
>>>
>>> tia!
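
[Editor's note] For reference, the two submission styles this thread
contrasts can be sketched in spark-submit terms. This is a hedged
illustration, not a verbatim command from the thread: the Kafka package
coordinate matches the one that resolved the issue above, but the
kafka-clients jar version and the script name my_job.py are hypothetical.

```shell
# --jars ships exactly the listed jars and nothing else; transitive
# dependencies such as kafka-clients (which provides the missing
# ByteArraySerializer) must be tracked down and added by hand.
spark-submit \
  --jars spark-sql-kafka-0-10_2.12-3.1.2.jar,kafka-clients-2.6.0.jar \
  my_job.py

# --packages hands Ivy a Maven coordinate; Ivy downloads the jar AND its
# transitive dependencies into ~/.ivy2/jars and puts them on the classpath,
# which is why switching to a package made the NoClassDefFoundError go away.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  my_job.py
```

On Dataproc the same choice shows up as the gcloud property
spark.jars.packages, as in the command that resolved this thread.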