Well, I can't do miracles without access to the cluster and the logs. What I don't understand is why you need a fat jar?! Spark libraries normally need provided scope because they must already exist on all machines... I would take a look at the driver and executor logs, which contain the consumer configs, and I would check the exact version of the consumer (that is printed in the same log, too).
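On YARN the aggregated logs can be pulled roughly like this (the application id is a made-up placeholder, and the grep patterns assume kafka-clients' standard startup logging):

    # placeholder id: take the real one from the YARN RM UI or `yarn application -list`
    yarn logs -applicationId application_1607000000000_0001 > app.log
    # the consumer configs as the consumer actually resolved them
    grep -A 40 "ConsumerConfig values" app.log
    # the exact kafka-clients version loaded at runtime
    grep "Kafka version" app.log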
G

On Mon, Dec 7, 2020 at 5:07 PM Amit Joshi <mailtojoshia...@gmail.com> wrote:

> Hi Gabor,
>
> The code is very simple Kafka consumption of data.
> I guess it may be the cluster.
> Can you please point out the possible problems to look for in the cluster?
>
> Regards
> Amit
>
> On Monday, December 7, 2020, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
>> + Adding back user list.
>>
>> I've had a look at the Spark code and it's not modifying
>> "partition.assignment.strategy", so the problem must be either in your
>> application or in your cluster setup.
>>
>> G
>>
>> On Mon, Dec 7, 2020 at 12:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>
>>> It's super interesting because that field has a default value:
>>> org.apache.kafka.clients.consumer.RangeAssignor
>>>
>>> On Mon, 7 Dec 2020, 10:51 Amit Joshi, <mailtojoshia...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for the reply.
>>>> I tried removing the client version, but got the same exception.
>>>>
>>>> Thanks
>>>>
>>>> On Monday, December 7, 2020, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>
>>>>> +1 on the mentioned change, Spark uses the following kafka-clients library:
>>>>>
>>>>> <kafka.version>2.4.1</kafka.version>
>>>>>
>>>>> G
>>>>>
>>>>> On Mon, Dec 7, 2020 at 9:30 AM German Schiavon <gschiavonsp...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think the issue is that you are overriding the kafka-clients that
>>>>>> comes with <artifactId>spark-sql-kafka-0-10_2.12</artifactId>.
>>>>>>
>>>>>> I'd try removing the kafka-clients and see if it works.
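>>>>>>
>>>>>> A minimal sketch of what I mean (the rest of the pom stays as it is;
>>>>>> spark-sql-kafka-0-10_2.12 pulls in a matching kafka-clients
>>>>>> transitively):
>>>>>>
>>>>>> <!-- Remove this explicit dependency from the pom: -->
>>>>>> <dependency>
>>>>>>   <groupId>org.apache.kafka</groupId>
>>>>>>   <artifactId>kafka-clients</artifactId>
>>>>>>   <version>2.1.0</version>
>>>>>> </dependency>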
>>>>>>
>>>>>> On Sun, 6 Dec 2020 at 08:01, Amit Joshi <mailtojoshia...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am running Spark Structured Streaming along with Kafka.
>>>>>>> Below is the pom.xml:
>>>>>>>
>>>>>>> <properties>
>>>>>>>   <maven.compiler.source>1.8</maven.compiler.source>
>>>>>>>   <maven.compiler.target>1.8</maven.compiler.target>
>>>>>>>   <encoding>UTF-8</encoding>
>>>>>>>   <!-- Put the Scala version of the cluster -->
>>>>>>>   <scalaVersion>2.12.10</scalaVersion>
>>>>>>>   <sparkVersion>3.0.1</sparkVersion>
>>>>>>> </properties>
>>>>>>>
>>>>>>> <dependency>
>>>>>>>   <groupId>org.apache.kafka</groupId>
>>>>>>>   <artifactId>kafka-clients</artifactId>
>>>>>>>   <version>2.1.0</version>
>>>>>>> </dependency>
>>>>>>>
>>>>>>> <dependency>
>>>>>>>   <groupId>org.apache.spark</groupId>
>>>>>>>   <artifactId>spark-core_2.12</artifactId>
>>>>>>>   <version>${sparkVersion}</version>
>>>>>>>   <scope>provided</scope>
>>>>>>> </dependency>
>>>>>>> <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
>>>>>>> <dependency>
>>>>>>>   <groupId>org.apache.spark</groupId>
>>>>>>>   <artifactId>spark-sql_2.12</artifactId>
>>>>>>>   <version>${sparkVersion}</version>
>>>>>>>   <scope>provided</scope>
>>>>>>> </dependency>
>>>>>>> <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10 -->
>>>>>>> <dependency>
>>>>>>>   <groupId>org.apache.spark</groupId>
>>>>>>>   <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
>>>>>>>   <version>${sparkVersion}</version>
>>>>>>> </dependency>
>>>>>>>
>>>>>>> I am building the fat jar with the shade plugin. The jar runs as
>>>>>>> expected in my local setup with the command:
>>>>>>>
>>>>>>> spark-submit --master local[*] --class com.stream.Main --num-executors 3 --driver-memory 2g --executor-cores 2 --executor-memory 3g prism-event-synch-rta.jar
>>>>>>>
>>>>>>> But when I try to run the same jar on the Spark cluster using YARN
>>>>>>> with the command:
>>>>>>>
>>>>>>> spark-submit --master yarn --deploy-mode cluster --class com.stream.Main --num-executors 4 --driver-memory 2g --executor-cores 1 --executor-memory 4g gs://jars/prism-event-synch-rta.jar
>>>>>>>
>>>>>>> I am getting this exception:
>>>>>>>
>>>>>>> at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:355)
>>>>>>> at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
>>>>>>> Caused by: org.apache.kafka.common.config.ConfigException: Missing required configuration "partition.assignment.strategy" which has no default value.
>>>>>>> at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
>>>>>>>
>>>>>>> I have tried setting "partition.assignment.strategy" explicitly, but
>>>>>>> it is still not working.
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Regards
>>>>>>> Amit Joshi
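>>>>>>>
>>>>>>> P.S. For reference, this is roughly how the consumer is configured
>>>>>>> (a trimmed sketch with placeholder brokers and topic instead of the
>>>>>>> real values; consumer properties need the "kafka." prefix to reach
>>>>>>> the Kafka source):
>>>>>>>
>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>
>>>>>>> val spark = SparkSession.builder.appName("prism-event-synch-rta").getOrCreate()
>>>>>>>
>>>>>>> val df = spark.readStream
>>>>>>>   .format("kafka")
>>>>>>>   .option("kafka.bootstrap.servers", "broker-1:9092") // placeholder broker
>>>>>>>   .option("subscribe", "events")                      // placeholder topic
>>>>>>>   // consumer properties are passed through with the "kafka." prefix
>>>>>>>   .option("kafka.partition.assignment.strategy",
>>>>>>>     "org.apache.kafka.clients.consumer.RangeAssignor")
>>>>>>>   .load()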