Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20372

It sounds like we fixed one "bug" and made the actual partition size closer to the expected one, but caused another "bug". Two speculations:

1. The expected partition size may not give the best read performance.
2. The open file cost is wrongly estimated.
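To make speculation 2 concrete, here is a simplified Python sketch of how a per-file "open cost" can feed into both the target split size and the greedy packing of file splits into partitions. It is loosely modeled on my understanding of Spark 2.x's file scan planning and is not the actual Spark code. The function names `max_split_bytes` and `pack_splits` are hypothetical. The defaults (128 MB and 4 MB) mirror `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`, but the exact logic may differ between Spark versions.

```python
MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB, open_cost_in_bytes=4 * MB):
    # Assumption: every file is charged its length plus a fixed "open cost".
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Target split size: capped above by max_partition_bytes,
    # floored below by the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def pack_splits(file_sizes, default_parallelism,
                max_partition_bytes=128 * MB, open_cost_in_bytes=4 * MB):
    target = max_split_bytes(file_sizes, default_parallelism,
                             max_partition_bytes, open_cost_in_bytes)
    # Cut each file into chunks of at most `target` bytes.
    splits = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            splits.append(min(target, size - offset))
            offset += target
    # Largest splits first, then fill partitions greedily (next fit decreasing).
    splits.sort(reverse=True)
    partitions, current, current_size = [], [], 0
    for split in splits:
        if current and current_size + split > target:
            partitions.append(current)
            current, current_size = [], 0
        current.append(split)
        # The open cost is charged again for every split added to a partition.
        current_size += split + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions
```

For example, ten 1 MB files with a parallelism of 4 pack into 4 partitions (3, 3, 3 and 1 files) with the default 4 MB open cost. With an open cost of 0 they pack into 5 partitions of 2 files each. This shows how much the open-cost estimate alone can change the partition layout.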