Hi Adnan, coalescing involves network shuffle to other executors. How many executors are configured for that job?
Best regards Roland Johann Software Developer/Data Engineer phenetic GmbH Lütticher Straße 10, 50674 Köln, Germany Mobil: +49 172 365 26 46 Mail: roland.joh...@phenetic.io Web: phenetic.io Handelsregister: Amtsgericht Köln (HRB 92595) Geschäftsführer: Roland Johann, Uwe Reimann > Am 23.04.2020 um 20:41 schrieb dev nan <dnan8...@gmail.com>: > > I would like to know why it is faster to write out an RDD that has 30,000 > partitions as 30,000 files sized 1K-2M rather than coalescing it to 1000 > partitions and writing out 1000 S3 files of roughly 26MB each, or even 100 > partitions and 100 S3 files of 260MB each. > > The coalescing takes a long time. > > > Thanks, > > Adnan