Hi Adnan,
Coalescing can involve a network shuffle to other executors. How many executors
are configured for that job?
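
To show why the executor count matters, here is a rough back-of-the-envelope sketch in plain Python (no Spark needed). It takes the total data size from the numbers in your question (1000 files of roughly 26MB) and assumes a hypothetical cluster of 50 executors with 4 cores each; those two values and the helper name `summarize` are my assumptions, so please substitute your real settings. It only estimates average file size, how many task slots are busy, and how many task waves are needed; it does not model S3 latency or shuffle cost.

```python
import math

# Total output size implied by the question: 1000 files of roughly 26MB each.
TOTAL_MB = 1000 * 26


def summarize(num_partitions, executors, cores_per_executor):
    """Estimate file size and task parallelism for a given partition count.

    `executors` and `cores_per_executor` are hypothetical cluster settings.
    """
    slots = executors * cores_per_executor      # tasks that can run at once
    busy_slots = min(num_partitions, slots)     # slots that actually get work
    waves = math.ceil(num_partitions / slots)   # rounds of tasks needed
    avg_file_mb = round(TOTAL_MB / num_partitions, 2)
    return {
        "partitions": num_partitions,
        "avg_file_mb": avg_file_mb,
        "busy_slots": busy_slots,
        "waves": waves,
    }


if __name__ == "__main__":
    # Assumed cluster: 50 executors x 4 cores = 200 task slots.
    for n in (30000, 1000, 100):
        print(summarize(n, executors=50, cores_per_executor=4))
```

With these assumed settings, 100 partitions would keep only half of the 200 slots busy. As far as I know, when coalesce is called without a shuffle, the work upstream of it also runs in that reduced number of tasks, which could explain part of the slowdown.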
Best regards
Roland Johann
Software Developer/Data Engineer
phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany
Mobile: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
I would like to know why it is faster to write out an RDD that has 30,000
partitions as 30,000 files sized 1K-2M rather than coalescing it to 1000
partitions and writing out 1000 S3 files of roughly 26MB each, or even 100
partitions and 100 S3 files of 260MB each.
The coalescing takes a long