Yes. I have now allocated 100 cores, and the Kafka topic has 8 partitions, which I repartition to 100 to keep all 100 cores busy. The following stage applies a map transformation. Will that also cause a slowdown?
Regards,
Junfeng Chen

On Thu, Nov 8, 2018 at 12:34 AM Shahbaz <shahzadh...@gmail.com> wrote:

> Hi,
>
> - Do you have enough CPU cores allocated to handle the increased number of
>   partitions? Generally, if the number of Kafka partitions is greater than or
>   equal to the total number of CPU cores (number of executor instances *
>   cores per executor), you get more task parallelism in the reader phase.
> - However, if you have too many partitions but not enough cores, the reader
>   will eventually slow down (e.g. 100 partitions and only 20 total cores).
> - Additionally, the next set of transformations will have their own
>   partitions. If they involve a shuffle, spark.sql.shuffle.partitions
>   defines the next level of parallelism. If you don't have any data skew,
>   you should get good performance.
>
> Regards,
> Shahbaz
>
> On Wed, Nov 7, 2018 at 12:58 PM JF Chen <darou...@gmail.com> wrote:
>
>> I have a Spark Streaming application that reads data from Kafka and saves
>> the transformation result to HDFS.
>> My Kafka topic originally has 8 partitions, and I repartition the data to
>> 100 to increase the parallelism of the Spark job.
>> Now I am wondering: if I increase the number of Kafka partitions to 100
>> instead of repartitioning to 100, will performance improve? (I know the
>> repartition operation costs a lot of CPU.)
>> If I set the number of Kafka partitions to 100, is there any negative
>> effect on efficiency?
>> I only have one production environment, so it's not convenient for me to
>> run the test...
>>
>> Thanks!
>>
>> Regards,
>> Junfeng Chen
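The cores-versus-partitions arithmetic discussed in this thread can be made concrete with a small sketch. It uses a deliberately simplified model, which is an assumption and not how Spark reports anything: one task per partition, one core per task (the default `spark.task.cpus = 1`), and every task taking the same time. The function name `stage_waves` is hypothetical. It returns how many "waves" of tasks a stage needs and how many cores sit idle during the last wave.

```python
import math
from typing import Tuple


def stage_waves(num_partitions: int, total_cores: int) -> Tuple[int, int]:
    """Return (waves, idle_cores_in_last_wave) for one stage.

    Hypothetical helper under a simplified model (assumption): one task per
    partition, one core per task, and all tasks take the same time.
    """
    if num_partitions <= 0 or total_cores <= 0:
        raise ValueError("partitions and cores must be positive")
    # Each wave can run at most `total_cores` tasks concurrently.
    waves = math.ceil(num_partitions / total_cores)
    # Tasks left over for the final wave; the remaining cores have nothing to do.
    busy_in_last_wave = num_partitions - (waves - 1) * total_cores
    return waves, total_cores - busy_in_last_wave


if __name__ == "__main__":
    # Read stage with 8 Kafka partitions on 100 cores: 92 cores idle.
    print(stage_waves(8, 100))    # (1, 92)
    # Stage after repartition(100) on 100 cores: every core gets one task.
    print(stage_waves(100, 100))  # (1, 0)
    # Shahbaz's example of 100 partitions on only 20 cores: five waves.
    print(stage_waves(100, 20))   # (5, 0)
```

In this model, a narrow transformation such as `map` that runs right after `repartition(100)` is pipelined into the same stage and keeps its 100 partitions, so it corresponds to the `(100, 100)` case rather than adding another shuffle. The model ignores scheduling overhead, data skew, and uneven task durations, so treat it only as a rough guide.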