Hi Adnan,

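coalescing can involve moving partition data over the network to other
executors (a full shuffle if it is done with repartition, or with coalesce
and shuffle = true). How many executors are configured for that job?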
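
To show why the executor count matters, here is a rough back-of-the-envelope
sketch in plain Python (no Spark involved). It assumes evenly sized
partitions, tasks that run in waves of at most one task per core, and no
shuffle or S3 overhead. The total of about 26,000 MB is taken from the
figures in the question; the function name and the 200 cores are
hypothetical and only there for illustration, so please plug in the real
numbers for your job.

```python
import math


def per_core_megabytes(total_mb, num_partitions, total_cores):
    """Data each busy core has to process, in MB, under a simplified model."""
    # Tasks run in waves: at most `total_cores` tasks execute at the same time.
    waves = math.ceil(num_partitions / total_cores)
    # Assumption: all partitions are the same size.
    mb_per_task = total_mb / num_partitions
    return waves * mb_per_task


TOTAL_MB = 26_000    # roughly 1000 files of 26 MB each, from the question
TOTAL_CORES = 200    # hypothetical, e.g. 50 executors with 4 cores each

for partitions in (30_000, 1_000, 100):
    load = per_core_megabytes(TOTAL_MB, partitions, TOTAL_CORES)
    print(f"{partitions:>6} partitions: ~{load:.0f} MB per busy core")
```

With these assumed numbers, 30,000 and 1,000 partitions put the same load on
each core, while 100 partitions leave half of the cores idle and double the
load on the rest. This model ignores everything else that can make
coalescing slow, so treat it only as a starting point.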

Best regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobile: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Commercial register: Amtsgericht Köln (HRB 92595)
Managing directors: Roland Johann, Uwe Reimann



> On 23.04.2020 at 20:41, dev nan <dnan8...@gmail.com> wrote:
> 
> I would like to know why it is faster to write out an RDD that has 30,000
> partitions as 30,000 files of 1 KB to 2 MB each rather than coalescing it
> to 1000 partitions and writing out 1000 S3 files of roughly 26 MB each, or
> even 100 partitions and 100 S3 files of 260 MB each.
> 
> The coalescing takes a long time.
> 
> 
> Thanks,
> 
> Adnan
