[GitHub] [spark] HeartSaVioR edited a comment on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

GitBox Wed, 27 Jan 2021 21:05:14 -0800


HeartSaVioR edited a comment on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768801016



   > AQE won't kick in if users specify num partitions, e.g. df.repartition(5), I think the same applies here if the sink requires a certain num partitions.
   
   The case I have in mind is `df.repartition(5).write.(blabla).save()`,
where the sink requires a specific distribution but does not require a
specific number of partitions. Today Spark would repartition to the default
number of shuffle partitions, and I'm not sure that changing it to None
(SPARK-34230) would preserve the user's intention.
   
   There's a gap here: end users may know a good number of partitions but
not the distribution/ordering the sink requires. So they can't add the same
repartition the sink will require (with the number of partitions specified)
and rely on Spark deduplicating the two neighboring repartitions.
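
   To make the scenario concrete, here is a toy model in plain Python. It is
not Spark code: `Repartition`, `collapse_adjacent`, `final_partitions` and the
column name `key` are all hypothetical, and the only Spark fact it relies on is
that `spark.sql.shuffle.partitions` defaults to 200. It assumes that when two
repartitions are adjacent, only the later one (the one the sink adds) survives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Default value of spark.sql.shuffle.partitions.
DEFAULT_SHUFFLE_PARTITIONS = 200


@dataclass(frozen=True)
class Repartition:
    """Hypothetical stand-in for a repartition step in a plan."""
    keys: Tuple[str, ...]           # clustering columns; empty means round-robin
    num_partitions: Optional[int]   # None means "no specific number requested"


def collapse_adjacent(steps: List[Repartition]) -> List[Repartition]:
    """Toy rule (assumption): of a run of adjacent repartitions, keep only the last."""
    return steps[-1:]


def final_partitions(steps: List[Repartition]) -> Optional[int]:
    """Partition count seen by the sink under this toy model."""
    collapsed = collapse_adjacent(steps)
    if not collapsed:
        return None
    n = collapsed[-1].num_partitions
    # With no requested number, fall back to the default shuffle partitions.
    return DEFAULT_SHUFFLE_PARTITIONS if n is None else n


# The user calls df.repartition(5).
user = Repartition(keys=(), num_partitions=5)
# The sink requires a distribution on "key" but no specific partition count.
sink_without_number = Repartition(keys=("key",), num_partitions=None)
# The sink requires the same distribution with a static partition count.
sink_with_number = Repartition(keys=("key",), num_partitions=5)

print(final_partitions([user, sink_without_number]))
print(final_partitions([user, sink_with_number]))
```

   In this model the user's 5 is lost in the first case, because the sink's
repartition replaces it and falls back to the default. Only a repartition that
carries both the sink's distribution and a partition count keeps the number,
and end users cannot write that one themselves without knowing what the sink
requires.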


