Re: How to increase the parallelism of Spark Streaming application?

JF Chen Thu, 08 Nov 2018 13:12:02 -0800

Yes, I have now allocated 100 cores with 8 Kafka partitions, and I
repartition the data to 100 partitions to feed the 100 cores. The following
stage has a map operation; will that also cause a slowdown?
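
As a rough way to reason about the numbers in this thread, here is a minimal
Python sketch of the cores-versus-partitions arithmetic. It is not Spark code:
the function name stage_waves is hypothetical, and it assumes one task per
partition, one core per task (Spark's default for spark.task.cpus), and tasks
of roughly equal duration, so it ignores data skew and scheduling overhead.

```python
import math


def stage_waves(num_partitions: int, total_cores: int) -> dict:
    """Estimate core usage for one stage under a simplified model.

    Assumptions (not taken from Spark itself): one task per partition,
    one core per task, and all tasks take about the same time.
    """
    if num_partitions <= 0 or total_cores <= 0:
        raise ValueError("num_partitions and total_cores must be positive")
    # At most one task per partition can run, and at most one task per core.
    busy_cores = min(num_partitions, total_cores)
    # Tasks beyond the core count wait for a later "wave" of execution.
    waves = math.ceil(num_partitions / total_cores)
    return {
        "busy_cores": busy_cores,
        "idle_cores": total_cores - busy_cores,
        "waves": waves,
    }


# Reading 8 Kafka partitions with 100 cores allocated.
read_stage = stage_waves(num_partitions=8, total_cores=100)
# The stage after repartitioning to 100 partitions.
after_repartition = stage_waves(num_partitions=100, total_cores=100)
# The example from the quoted reply: 100 partitions, only 20 total cores.
undersized = stage_waves(num_partitions=100, total_cores=20)

print("read stage:       ", read_stage)
print("after repartition:", after_repartition)
print("100 parts/20 cores:", undersized)
```

Under these assumptions the read stage keeps only 8 of the 100 cores busy,
the stage after the repartition can use all 100, and the 100-partition,
20-core case needs 5 waves of tasks. A map that follows the repartition is a
narrow transformation, so it runs inside those same tasks and does not add
another shuffle; how much time it adds depends on the work done per record.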


Regards,
Junfeng Chen


On Thu, Nov 8, 2018 at 12:34 AM Shahbaz <shahzadh...@gmail.com> wrote:

> Hi,
>
>    - Do you have adequate CPU cores allocated to handle the increased
>    partitions? Generally, if the number of Kafka partitions is >= (greater
>    than or equal to) the total CPU cores (number of executor instances *
>    cores per executor), you get increased task parallelism in the reader
>    phase.
>    - However, if you have too many partitions but not enough cores, it
>    will eventually slow down the reader (e.g., 100 partitions and only 20
>    total cores).
>    - Additionally, the next set of transformations will have their own
>    partitions. If they involve a shuffle, spark.sql.shuffle.partitions then
>    defines the next level of parallelism. If you do not have any data skew,
>    you should get good performance.
>
>
> Regards,
> Shahbaz
>
> On Wed, Nov 7, 2018 at 12:58 PM JF Chen <darou...@gmail.com> wrote:
>
>> I have a Spark Streaming application that reads data from Kafka and saves
>> the transformation result to HDFS.
>> The Kafka topic originally has 8 partitions, and I repartition the
>> data to 100 to increase the parallelism of the Spark job.
>> Now I am wondering: if I increase the Kafka partition count to 100
>> instead of repartitioning to 100, will performance improve? (I
>> know the repartition operation costs a lot of CPU resources.)
>> If I set the Kafka partition count to 100, does it have any negative
>> effect on efficiency?
>> I have only one production environment, so it is not convenient for me to
>> run the test.
>>
>> Thanks!
>>
>> Regards,
>> Junfeng Chen
>>
>
