EnricoMi commented on PR #44458:
URL: https://github.com/apache/spark/pull/44458#issuecomment-1867988850

   This is not about the optimal ordering (I presume you refer to rows being 
ordered by the partition columns, which is optimal because only one file writer 
needs to be open at a time), but about additional ordering (an extra order that 
the writer task itself does not require). Sorted partitions are very useful when 
downstream systems that consume the written data can rely on some order beyond 
the partition keys. So users care about the in-partition order.
   
   I am happy as long as `df.repartition("id").sortWithinPartitions("id", 
"time").write.partitionBy("id")` keeps being supported.
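
   To illustrate what that pipeline arranges for the writer, here is a plain-Python sketch (not Spark code; the rows and column names `id` and `time` are hypothetical): `repartition("id")` brings all rows of one `id` into the same partition, and `sortWithinPartitions("id", "time")` orders the rows inside each partition by `(id, time)`, so each per-`id` output file is written in `time` order.

   ```python
   from itertools import groupby

   # Hypothetical toy rows: (id, time, payload).
   rows = [
       (2, "10:05", "b"), (1, "10:01", "a"),
       (2, "10:00", "c"), (1, "10:03", "d"),
   ]

   # repartition("id") + sortWithinPartitions("id", "time"), modeled on a
   # single partition: sort by (id, time) so rows of one id are contiguous
   # and time-ordered within each id group.
   partitioned = sorted(rows, key=lambda r: (r[0], r[1]))

   # partitionBy("id") then streams each contiguous id group to its own file,
   # preserving the in-group (time) order.
   for key, group in groupby(partitioned, key=lambda r: r[0]):
       print(key, [r[1] for r in group])
   ```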


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
