cloud-fan commented on a change in pull request #22957: [SPARK-25951][SQL]
Ignore aliases for distributions and orderings
URL: https://github.com/apache/spark/pull/22957#discussion_r255406405
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -284,6 +298,19 @@ case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)
}
}
}
+
+  override private[spark] def pruneInvalidAttribute(invalidAttr: Attribute): Partitioning = {
+    if (this.references.contains(invalidAttr)) {
+      val validExprs = this.children.takeWhile(!_.references.contains(invalidAttr))
+      if (validExprs.isEmpty) {
+        UnknownPartitioning(numPartitions)
+      } else {
+        RangePartitioning(validExprs, numPartitions)
Review comment:
Think about `RangePartitioning('a.ASC, 'b.ASC)` with output expression
`'a as 'a1`. It cannot satisfy `ClusteredDistribution('a1)`, but it can still
satisfy `OrderedDistribution('a1.ASC)`. I think the expected result should be
`RangePartitioning('a1.ASC, 'b.ASC)`; returning `RangePartitioning('a1.ASC)`
instead is wrong.
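
To make the argument concrete, here is a minimal, self-contained sketch of the
two candidate results. It does not use Spark: `SortKey`, the two distribution
classes, and the `satisfies` rules are simplified stand-ins that only borrow
Spark's names, and the rules themselves are an assumption about how a range
partitioning is matched against a required distribution (common-prefix match
for an ordered requirement, "all partitioning columns are clustering columns"
for a clustered one).

```scala
// Simplified stand-ins that only borrow names from Spark; these are NOT the
// real Catalyst classes, and every sort key is assumed to be ascending.
object AliasPartitioningSketch {
  final case class SortKey(column: String)

  sealed trait Distribution
  final case class ClusteredDistribution(columns: Set[String]) extends Distribution
  final case class OrderedDistribution(ordering: Seq[SortKey]) extends Distribution

  final case class RangePartitioning(ordering: Seq[SortKey]) {
    // Assumed rule: an ordered requirement is met when the two orderings agree
    // on their common prefix; a clustered requirement is met only when every
    // partitioning column is one of the clustering columns.
    def satisfies(required: Distribution): Boolean = required match {
      case OrderedDistribution(req) =>
        val n = math.min(req.length, ordering.length)
        req.take(n) == ordering.take(n)
      case ClusteredDistribution(cols) =>
        ordering.forall(key => cols.contains(key.column))
    }
  }

  // The data is physically range partitioned by (a, b); the projection emits 'a as 'a1.
  // Candidate 1: substitute the alias and keep the remaining keys.
  val renamed: RangePartitioning = RangePartitioning(Seq(SortKey("a1"), SortKey("b")))
  // Candidate 2: keep only the keys before the first one missing from the output.
  val truncated: RangePartitioning = RangePartitioning(Seq(SortKey("a1")))

  val clusteredOnA1: Distribution = ClusteredDistribution(Set("a1"))
  val orderedOnA1: Distribution = OrderedDistribution(Seq(SortKey("a1")))

  def main(args: Array[String]): Unit = {
    println(renamed.satisfies(orderedOnA1))     // ordered requirement on a1
    println(renamed.satisfies(clusteredOnA1))   // clustered requirement on a1
    println(truncated.satisfies(clusteredOnA1)) // same requirement, truncated keys
  }
}
```

Under these assumed rules the renamed partitioning satisfies the ordered
requirement but not the clustered one, while the truncated partitioning reports
that it satisfies the clustered requirement. That last answer over-claims: the
rows were split on `(a, b)`, so two rows with the same `a1` but different `b`
may sit in different partitions.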