Re: 30000 partitions vs 1000 partitions with Coalescing

2020-04-24 Thread Roland Johann
Hi Adnan, coalescing involves network shuffle to other executors. How many executors are configured for that job? Best regards Roland Johann Software Developer/Data Engineer phenetic GmbH Lütticher Straße 10, 50674 Köln, Germany Mobil: +49 172 365 26 46 Mail: roland.joh...@phenetic.io Web:

30000 partitions vs 1000 partitions with Coalescing

2020-04-23 Thread dev nan
I would like to know why it is faster to write out an RDD that has 30,000 partitions as 30,000 files sized 1K-2M rather than coalescing it to 1000 partitions and writing out 1000 S3 files of roughly 26MB each, or even 100 partitions and 100 S3 files of 260MB each. The coalescing takes a long