GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14920
[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child orde…
## What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira/browse/SPARK-17271
The planner adds an unneeded SORT operation due to a bug in the way
`SortOrder` instances are compared at
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253
`SortOrder` needs to be compared semantically because the `Expression`s within
two `SortOrder`s can be "semantically equal" without being literally equal objects.
For example, in the case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`:
Expression in required SortOrder:
```
AttributeReference(
name = "col1",
dataType = LongType,
nullable = false
) (exprId = exprId,
qualifier = Some("a")
)
```
Expression in child SortOrder:
```
AttributeReference(
name = "col1",
dataType = LongType,
nullable = false
) (exprId = exprId)
```
Notice that the attribute in the required ordering has a qualifier while the
child attribute does not, but the underlying expression is the same. Hence in
this case the child satisfies the required sort order.
This PR includes the following changes:
- Added a `semanticEquals` method to `SortOrder` so that it can compare
underlying child expressions semantically (and not using default Object.equals)
- Fixed `EnsureRequirements` to use semantic comparison of SortOrder
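
The difference between the two comparisons can be sketched as follows. This is a minimal, self-contained illustration and not the code in this patch: `Attr`, `SortOrder`, `Direction`, `satisfiedLiterally` and `satisfiedSemantically` are simplified, hypothetical stand-ins for the Catalyst classes and the check in `EnsureRequirements`, and the sketch assumes that two attributes are semantically equal when their `exprId`s match.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's Catalyst classes.
object SortOrderSketch {
  sealed trait Direction
  case object Ascending extends Direction
  case object Descending extends Direction

  // Hypothetical attribute: exprId identifies the column, the qualifier is only a label.
  final case class Attr(name: String, exprId: Long, qualifier: Option[String] = None) {
    // Assumption: semantic equality ignores the qualifier and compares exprId only.
    def semanticEquals(other: Attr): Boolean = exprId == other.exprId
  }

  final case class SortOrder(child: Attr, direction: Direction) {
    // Same direction and semantically equal child expression.
    def semanticEquals(other: SortOrder): Boolean =
      direction == other.direction && child.semanticEquals(other.child)
  }

  // Literal comparison: relies on case-class equals, so a differing qualifier breaks it.
  def satisfiedLiterally(required: Seq[SortOrder], childOrdering: Seq[SortOrder]): Boolean =
    required == childOrdering.take(required.length)

  // Semantic comparison: the required ordering must be a semantic prefix of the child's.
  def satisfiedSemantically(required: Seq[SortOrder], childOrdering: Seq[SortOrder]): Boolean =
    required.length <= childOrdering.length &&
      required.zip(childOrdering).forall { case (r, c) => r.semanticEquals(c) }
}
```

With a required ordering on `Attr("col1", 1L, Some("a"))` and a child ordering on `Attr("col1", 1L)`, `satisfiedLiterally` returns `false` (so an extra sort would be planned) while `satisfiedSemantically` returns `true`.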
## How was this patch tested?
- Added a test case to `PlannerSuite`. Ran the rest of the tests in `PlannerSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tejasapatil/spark SPARK-17271_2.0_port
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14920.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14920
----
commit 2c9ab99d29f7d890e0d777e7e4c109cf60aa7323
Author: Tejas Patil <[email protected]>
Date: 2016-09-01T14:22:03Z
[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child ordering
is semantically same as required ordering
----