GitHub user ajithme opened a pull request:

    https://github.com/apache/spark/pull/22277

    [SPARK-25276] Redundant constrains when using alias

    Attaching a test that reproduces the issue. It fails with the message
    shown after the test code.
    
      test("redundant constrains") {
        val tr = LocalRelation('a.int, 'b.string, 'c.int)
        val aliasedRelation = tr.where('a.attr > 10)
          .select('a.as('x), 'b, 'b.as('y), 'a.as('z))
    
        verifyConstraints(aliasedRelation.analyze.constraints,
          ExpressionSet(Seq(resolveColumn(aliasedRelation.analyze, "x") > 10,
            IsNotNull(resolveColumn(aliasedRelation.analyze, "x")),
            resolveColumn(aliasedRelation.analyze, "b") <=>
              resolveColumn(aliasedRelation.analyze, "y"),
            resolveColumn(aliasedRelation.analyze, "z") <=>
              resolveColumn(aliasedRelation.analyze, "x"))))
      }
    
    == FAIL: Constraints do not match ===
    Found: isnotnull(z#5),(z#5 > 10),(x#3 > 10),(z#5 <=> x#3),(b#1 <=> y#4),isnotnull(x#3)
    Expected: (x#3 > 10),isnotnull(x#3),(b#1 <=> y#4),(z#5 <=> x#3)
    == Result ==
    Missing: N/A
    Found but not expected: isnotnull(z#5),(z#5 > 10)

    I think that because z already has an EqualNullSafe comparison with x,
    the additional constraints isnotnull(z#5) and (z#5 > 10) are redundant.
    If a query has a lot of aliases, this may cause overhead.
    
    So I suggest that at
    https://github.com/apache/spark/blob/v2.3.2-rc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L254
    we simply assign with = instead of appending with ++=.
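
The difference between appending and assigning can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `Expr`, `Col`, `Gt`, `EqNullSafe`, `rename`, `references`, and `aliasedConstraints` are hypothetical stand-ins written for this illustration, not Catalyst classes or the actual code at the linked line. The `append` flag switches between accumulating the rewritten constraints (`++=`) and replacing the set with them (`=`).

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Gt(child: Expr, value: Int) extends Expr
case class IsNotNull(child: Expr) extends Expr
case class EqNullSafe(left: Expr, right: Expr) extends Expr

object AliasConstraintsSketch {
  // Replace every reference to column `from` with column `to`.
  def rename(e: Expr, from: String, to: String): Expr = e match {
    case Col(n) if n == from => Col(to)
    case c: Col              => c
    case Gt(c, v)            => Gt(rename(c, from, to), v)
    case IsNotNull(c)        => IsNotNull(rename(c, from, to))
    case EqNullSafe(l, r)    => EqNullSafe(rename(l, from, to), rename(r, from, to))
  }

  // Column names an expression refers to.
  def references(e: Expr): Set[String] = e match {
    case Col(n)           => Set(n)
    case Gt(c, _)         => references(c)
    case IsNotNull(c)     => references(c)
    case EqNullSafe(l, r) => references(l) ++ references(r)
  }

  // `aliases` holds (source column, alias name) pairs in projection order.
  // append = true  keeps the old constraints and adds the rewritten ones (++=).
  // append = false replaces the set with the rewritten constraints (=).
  def aliasedConstraints(
      child: Set[Expr],
      aliases: Seq[(String, String)],
      output: Set[String],
      append: Boolean): Set[Expr] = {
    var all = child
    aliases.foreach { case (src, alias) =>
      val rewritten = all.map(rename(_, src, alias))
      all = if (append) all ++ rewritten else rewritten
      all += EqNullSafe(Col(src), Col(alias))
    }
    // Keep only new constraints that mention output columns exclusively.
    (all -- child).filter(c => references(c).subsetOf(output))
  }

  // Inputs mirroring the test in this pull request.
  val child: Set[Expr] = Set(Gt(Col("a"), 10), IsNotNull(Col("a")))
  val aliases = Seq("a" -> "x", "b" -> "y", "a" -> "z")
  val output = Set("x", "b", "y", "z")

  def main(args: Array[String]): Unit = {
    println(aliasedConstraints(child, aliases, output, append = true).size)  // 6
    println(aliasedConstraints(child, aliases, output, append = false).size) // 4
  }
}
```

In this toy model, appending yields the six constraints listed under "Found", while assigning yields the four listed under "Expected". The sketch covers only the projection in the test above; it does not show whether assignment is safe for other projections.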

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajithme/spark SPARK-25276

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22277
    
----
commit 5be1df1215299758d6e684ec9a5f6ba12693280e
Author: Ajith <ajith2489@...>
Date:   2018-08-30T03:41:36Z

    [SPARK-25276] Redundant constrains when using alias

----