GitHub user Achuth17 commented on the issue:
https://github.com/apache/spark/pull/21608
Yes, when the data is stored in S3, I noticed a significant
difference.
Some rough numbers: for a table in S3 with 1,000 partitions, the
calculateTotalSize method took about 90 seconds when run serially vs.
30-40 seconds when run in parallel.