If you configure too many Kafka partitions, you can run into memory issues: it increases the memory requirements of the Spark job significantly.
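As a rough sketch of why: each Kafka partition a consumer reads gets its own fetch buffer, so buffer memory alone grows linearly with partition count. The numbers below are a back-of-the-envelope illustration using Kafka's default `max.partition.fetch.bytes` of 1 MiB; real memory use depends on your actual configs and record sizes.

```python
# Back-of-the-envelope estimate: consumer fetch-buffer memory vs. Kafka
# partition count. Assumes the default max.partition.fetch.bytes (1 MiB);
# the real footprint also includes deserialized records, task overhead, etc.

def consumer_buffer_bytes(num_partitions, max_partition_fetch_bytes=1 << 20):
    """Upper bound on fetch-buffer memory for a consumer reading all partitions."""
    return num_partitions * max_partition_fetch_bytes

# 8 partitions vs. 100 partitions:
print(consumer_buffer_bytes(8) // (1 << 20), "MiB")    # 8 MiB
print(consumer_buffer_bytes(100) // (1 << 20), "MiB")  # 100 MiB
```

So going from 8 to 100 partitions raises the worst-case buffering per consumer by roughly an order of magnitude, before counting any Spark-side overhead.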
Best,
Michael

On Wed, Nov 7, 2018 at 8:28 AM JF Chen <darou...@gmail.com> wrote:

> I have a Spark Streaming application which reads data from Kafka and saves
> the transformation result to HDFS.
> My Kafka topic originally has 8 partitions, and I repartition the data to
> 100 to increase the parallelism of the Spark job.
> Now I am wondering: if I increase the Kafka partition number to 100 instead
> of repartitioning to 100, will performance improve? (I know the
> repartition action costs a lot of CPU.)
> If I set the Kafka partition number to 100, does it have any negative
> effects?
> I only have one production environment, so it is not convenient for me to
> test....
>
> Thanks!
>
> Regards,
> Junfeng Chen