Re: [PR] [SPARK-46367][SQL][FOLLOWUP] Group mixed-arity KPs by arity in projectKeyedPartitionings [spark]

via GitHub Thu, 14 May 2026 03:04:59 -0700


peter-toth commented on PR #55876:
URL: https://github.com/apache/spark/pull/55876#issuecomment-4449692202


   @cloud-fan, I don't think this is valid.
   
   There simply can't be mixed-arity `KeyedPartitioning`s in a 
`PartitioningCollection`. This is an invariant required by `GroupPartitions` as 
well: 
https://github.com/apache/spark/blob/5949ab30b41860574ab57b94a8848464b5e127a7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupPartitionsExec.scala#L70-L72
   And actually, it doesn't make sense to have or keep lower-arity KPs in a 
collection if there are higher-arity ones in it, because the expressions of a 
lower-arity KP must be a subset of the expressions of the higher-arity ones.
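The invariant above can be illustrated with a minimal sketch. It assumes each KP is reduced to its list of partitioning expressions, represented here as plain column names. `hasUniformArity` and `isSubsumed` are hypothetical helpers and are not Spark APIs.

```scala
// Hypothetical sketch, NOT Spark's API: each KeyedPartitioning in a collection is
// represented only by its partitioning expressions (column names here).

// Invariant: every KP in one PartitioningCollection has the same arity,
// i.e. the same number of partitioning expressions.
def hasUniformArity(kpExpressions: Seq[Seq[String]]): Boolean =
  kpExpressions.map(_.length).distinct.size <= 1

// A lower-arity KP whose expressions are a subset of a higher-arity KP's
// expressions adds no information, so there is no reason to keep it.
def isSubsumed(lower: Seq[String], higher: Seq[String]): Boolean =
  lower.length < higher.length && lower.toSet.subsetOf(higher.toSet)
```

The `distinct.size <= 1` check treats an empty collection as trivially satisfying the invariant.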
   If we know the partition keys in more detail, like `KP([a, b], [(1, 1), 
(2, 2)])`, we know that `a = 1` and `b = 1` in the first partition and `a = 2` 
and `b = 2` in the second. So why would we keep the less granular 
`KP([a], [(1), (2)])` or `KP([b], [(1), (2)])` in the collection?
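The granularity argument can be made concrete with a minimal sketch. It uses a hypothetical `SimpleKP` case class with integer keys, which is not Spark's `KeyedPartitioning`, and a hypothetical `project` helper. Projecting `KP([a, b], [(1, 1), (2, 2)])` onto `[a]` or `[b]` yields exactly the less granular KPs from the example, so they can always be derived from the higher-arity one.

```scala
// Hypothetical, simplified stand-in for a KeyedPartitioning: NOT Spark's real class.
// partitionKeys(i) holds the key values of partition i, one value per expression.
final case class SimpleKP(expressions: Seq[String], partitionKeys: Seq[Seq[Int]]) {
  require(partitionKeys.forall(_.length == expressions.length),
    "each partition key must have one value per expression")
}

// Project a KP onto a subset of its expressions, keeping one key per partition.
// The result is fully derivable from the input, so it carries no extra information.
def project(kp: SimpleKP, keep: Seq[String]): SimpleKP = {
  val positions = keep.map { e =>
    val i = kp.expressions.indexOf(e)
    require(i >= 0, s"$e is not a partitioning expression of $kp")
    i
  }
  SimpleKP(keep, kp.partitionKeys.map(key => positions.map(i => key(i))))
}

val ab = SimpleKP(Seq("a", "b"), Seq(Seq(1, 1), Seq(2, 2)))
// Equals SimpleKP(Seq("a"), Seq(Seq(1), Seq(2))), the less granular KP([a], [(1), (2)]).
val projectedA = project(ab, Seq("a"))
```

The projection keeps one key per partition. As a result, two partitions that differ only in `b` would get equal keys after projecting onto `[a]`. The lower-arity KP can always be recomputed from the higher-arity one, but not the other way around.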
   Maybe the wording of the comment is not the best, but "`partitionKeys` 
must match" means that all KPs in a collection must have the same arity. 

