EnricoMi commented on PR #44458:
URL: https://github.com/apache/spark/pull/44458#issuecomment-1867988850
This is not about optimal ordering (I presume you refer to partitions being ordered by the partition columns, which is optimal because only one file writer needs to be open at a time), but about additional ordering (some extra order that the writer task itself does not require). Sorted partitions are very useful when downstream systems that consume the written data can rely on an order beyond the partition keys, so users do care about the in-partition order.
I am happy as long as `df.repartition("id").sortWithinPartitions("id",
"time").write.partitionBy("id")` keeps being supported.
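As a minimal, hypothetical sketch (plain Python, not Spark) of the property that snippet asks to preserve: rows are grouped by the partition key `id` and sorted by (`id`, `time`) within each partition, mirroring what `repartition("id").sortWithinPartitions("id", "time")` guarantees before `write.partitionBy("id")` runs. The record layout and helper name are illustrative assumptions, not Spark internals.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (id, time, value) tuples standing in for DataFrame rows.
records = [
    ("a", 3, "x"), ("b", 1, "y"), ("a", 1, "z"), ("b", 2, "w"), ("a", 2, "v"),
]

def partition_and_sort(rows):
    """Mimic repartition("id") + sortWithinPartitions("id", "time"):
    group rows by the partition key, then sort each partition by (id, time)."""
    by_id = sorted(rows, key=itemgetter(0))  # bring each id's rows together
    return {
        pid: sorted(group, key=itemgetter(0, 1))  # in-partition order: (id, time)
        for pid, group in groupby(by_id, key=itemgetter(0))
    }

partitions = partition_and_sort(records)
# A downstream reader of any one partition can now rely on ascending time order.
for pid, rows in partitions.items():
    times = [t for _, t, _ in rows]
    assert times == sorted(times)
```

The point of the sketch is the invariant checked at the end: a consumer reading the files written for a single partition value sees rows already ordered by `time`, which is exactly the behavior the comment asks to keep supported.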
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]