GitHub user ajithme opened a pull request:
https://github.com/apache/spark/pull/22277
[SPARK-25276] Redundant constrains when using alias
Attaching a test to reproduce the issue. The test fails with the following
message:
test("redundant constraints") {
  val tr = LocalRelation('a.int, 'b.string, 'c.int)
  val aliasedRelation = tr.where('a.attr > 10)
    .select('a.as('x), 'b, 'b.as('y), 'a.as('z))
  verifyConstraints(aliasedRelation.analyze.constraints,
    ExpressionSet(Seq(
      resolveColumn(aliasedRelation.analyze, "x") > 10,
      IsNotNull(resolveColumn(aliasedRelation.analyze, "x")),
      resolveColumn(aliasedRelation.analyze, "b") <=>
        resolveColumn(aliasedRelation.analyze, "y"),
      resolveColumn(aliasedRelation.analyze, "z") <=>
        resolveColumn(aliasedRelation.analyze, "x"))))
}
== FAIL: Constraints do not match ===
Found: isnotnull(z#5),(z#5 > 10),(x#3 > 10),(z#5 <=> x#3),(b#1 <=>
y#4),isnotnull(x#3)
Expected: (x#3 > 10),isnotnull(x#3),(b#1 <=> y#4),(z#5 <=> x#3)
== Result ==
Missing: N/A
Found but not expected: isnotnull(z#5),(z#5 > 10)
Here I think that, since z has an EqualNullSafe comparison with x, also
carrying isnotnull(z#5) and (z#5 > 10) is redundant. If a query has a lot of
aliases, this may cause overhead.
So I suggest that at
https://github.com/apache/spark/blob/v2.3.2-rc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L254
we should just assign (=) instead of addAll (++=).
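To illustrate the difference between the two operators, here is a minimal toy sketch. This is NOT the actual Spark code; the Expr hierarchy and the substitute helper below are hypothetical stand-ins for Catalyst expressions, used only to show why appending (++=) keeps the pre-alias constraints alongside their rewritten copies, while assigning (=) keeps only the rewritten set:

```scala
// Toy model of constraint rewriting under aliasing (NOT real Spark code).
sealed trait Expr
case class Attr(name: String) extends Expr
case class Gt(e: Expr, v: Int) extends Expr
case class IsNotNull(e: Expr) extends Expr

// Rewrite every occurrence of `from` into `to` inside a constraint.
def substitute(c: Expr, from: Expr, to: Expr): Expr = c match {
  case e if e == from => to
  case Gt(e, v)       => Gt(substitute(e, from, to), v)
  case IsNotNull(e)   => IsNotNull(substitute(e, from, to))
  case other          => other
}

val a = Attr("a")
val z = Attr("z")
val childConstraints: Set[Expr] = Set(Gt(a, 10), IsNotNull(a))

// Roughly what ++= does today: the original constraints and their
// aliased copies both survive, so each alias grows the set.
val appended = childConstraints ++ childConstraints.map(substitute(_, a, z))

// Roughly what = would do: only the rewritten constraints survive.
val assigned = childConstraints.map(substitute(_, a, z))

println(appended.size) // 4: {a>10, isnotnull(a), z>10, isnotnull(z)}
println(assigned.size) // 2: {z>10, isnotnull(z)}
```

In the failing test above, the extra isnotnull(z#5) and (z#5 > 10) entries correspond to the appended copies in this sketch.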
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ajithme/spark SPARK-25276
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22277.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22277
----
commit 5be1df1215299758d6e684ec9a5f6ba12693280e
Author: Ajith <ajith2489@...>
Date: 2018-08-30T03:41:36Z
[SPARK-25276] Redundant constrains when using alias
----