peter-toth commented on a change in pull request #31848:
URL: https://github.com/apache/spark/pull/31848#discussion_r597885016
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
##########
@@ -84,11 +85,25 @@ trait FileScan extends Scan
   protected def seqToString(seq: Seq[Any]): String = seq.mkString("[", ", ", "]")
+  private lazy val (normalizedPartitionFilters, normalizedDataFilters) = {
+    val output = readSchema().toAttributes.map(a => a.withName(normalizeName(a.name)))
Review comment:
I see your point and agree that it's the name that matters in these `Filter`-like `Expression`s, but if we go this way then I think (see the sketch after the list below):
- we also need to clear other properties of the `AttributeReference`s, such as `qualifier`
- we need to either explicitly sort the `partitionFilters` and `dataFilters` expression lists (probably with `.sortBy(_.hashCode())`) to make sure they match `f.partitionFilters` and `f.dataFilters`, or use `Set(partitionFilters) == Set(f.partitionFilters)`, because we can't use `ExpressionSet(partitionFilters) == ExpressionSet(f.partitionFilters)` once all the expr ids have been removed
- we need to reorder the descendants of each `partitionFilters` and `dataFilters` expression (with `Canonicalize.expressionReorder()`) to make sure that `id = 1` matches `1 = id` (and `Canonicalize.ignoreTimeZone()` also needs to be applied)
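Just to illustrate the first two points, a rough sketch (not the PR's code) of a helper that could live in `FileScan`, reusing the `normalizeName` helper already visible in the diff above; the commutative reordering and time-zone handling from the third point are deliberately left out here:
```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, ExprId, Expression}

// Strip everything from the attribute references except the (case-normalized)
// name, so that structurally identical filters coming from different plan
// instances compare equal under plain ==.
private def normalizeFilters(filters: Seq[Expression]): Set[Expression] = {
  filters.map { filter =>
    filter.transform {
      case a: AttributeReference =>
        a.withName(normalizeName(a.name))
          .withQualifier(Seq.empty)
          .withExprId(ExprId(0))
    }
  }.toSet  // comparing Sets sidesteps the ordering of the two filter lists
}

// and then in equals():
//   normalizeFilters(partitionFilters) == normalizeFilters(f.partitionFilters) &&
//     normalizeFilters(dataFilters) == normalizeFilters(f.dataFilters)
// Commutative reordering (the Canonicalize.expressionReorder part) and ignoring
// Cast time zones would still have to be handled on top of this.
```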
And just a side note: I think we could do most of the above at
https://github.com/apache/spark/blob/v3.1.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala#L120-L121
before `withFilters()`, and then `FileScan.equals()` would become very simple.
But I wonder whether all these changes would really be simpler than the current PR?
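To make that side note concrete, a hedged illustration (again not the PR's code) of roughly how simple the check could stay if `PruneFileSourcePartitions` normalized the expressions before calling `withFilters()`; a real implementation would of course also have to compare whatever other state a concrete scan carries (e.g. options):
```scala
// Assuming partitionFilters and dataFilters arrive already normalized (names
// case-normalized, qualifiers/expr ids cleared, children canonically ordered),
// plain structural set equality is enough here.
override def equals(obj: Any): Boolean = obj match {
  case f: FileScan =>
    fileIndex == f.fileIndex &&
      readSchema() == f.readSchema() &&
      partitionFilters.toSet == f.partitionFilters.toSet &&
      dataFilters.toSet == f.dataFilters.toSet
  case _ => false
}
```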