If you configure too many Kafka partitions, you can run into memory issues;
it significantly increases the memory requirements of the Spark job.
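
For reference, here is a minimal sketch of the setup being discussed
(Scala, assuming the spark-streaming-kafka-0-10 direct stream API; the
broker, topic, group id, and output path are just placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",            // placeholder
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"          -> "example-group"           // placeholder
    )

    // Each Kafka partition maps to one Spark partition, so an 8-partition
    // topic yields 8 input tasks per batch unless the data is repartitioned.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams))

    stream.map(_.value)
      .repartition(100)                          // shuffle: more parallelism, extra CPU/network
      .saveAsTextFiles("hdfs:///path/to/output") // placeholder path

    ssc.start()
    ssc.awaitTermination()
  }
}

Dropping the repartition() avoids the shuffle, but with 100 Kafka
partitions each executor has to keep a consumer and fetch buffers per
input partition, which is roughly where that extra memory comes from.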

Best,
Michael


On Wed, Nov 7, 2018 at 8:28 AM JF Chen <darou...@gmail.com> wrote:

> I have a Spark Streaming application which reads data from Kafka and saves
> the transformation result to HDFS.
> My Kafka topic originally has 8 partitions, and I repartition the data
> to 100 to increase the parallelism of the Spark job.
> Now I am wondering: if I increase the Kafka partition count to 100 instead
> of repartitioning to 100, will the performance improve? (I know the
> repartition action costs a lot of CPU resources.)
> If I set the Kafka partition count to 100, are there any negative effects
> on efficiency?
> I only have a production environment, so it's not convenient for me to run
> the test....
>
> Thanks!
>
> Regards,
> Junfeng Chen
>
