GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/18429
[SPARK-21222] Move elimination of Distinct clause from analyzer to optimizer
## What changes were proposed in this pull request?
Move the elimination of the Distinct clause from the analyzer to the optimizer.
A Distinct clause is redundant inside a MAX/MIN aggregate. For example,
"SELECT MAX(DISTINCT a) FROM src"
is equivalent to
"SELECT MAX(a) FROM src"
However, this optimization is currently implemented in the analyzer. It
should be in the optimizer.
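
The equivalence above can be illustrated with a small, self-contained toy model. This is only a sketch: `Aggregate`, `eliminate_distinct`, and `evaluate` are hypothetical names invented for illustration and are not Spark's Catalyst classes or the rule added by this patch. It shows that dropping DISTINCT is a pure rewrite that preserves the result for MAX/MIN, but must leave other aggregates such as COUNT untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical stand-in for an aggregate expression node."""
    func: str                  # "max", "min", or "count" in this toy model
    column: str
    is_distinct: bool = False


def eliminate_distinct(expr: Aggregate) -> Aggregate:
    """Drop DISTINCT when it cannot change the result (MAX and MIN only)."""
    if expr.is_distinct and expr.func in ("max", "min"):
        return replace(expr, is_distinct=False)
    # Any other aggregate (e.g. COUNT) is returned unchanged.
    return expr


def evaluate(expr: Aggregate, values):
    """Evaluate the toy aggregate over a list of column values."""
    data = set(values) if expr.is_distinct else list(values)
    funcs = {"max": max, "min": min, "count": len}
    return funcs[expr.func](data)
```

Because the rewrite changes only how the query is executed and not what it means, it is the kind of transformation that fits an optimization phase rather than name and type resolution.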
## How was this patch tested?
Unit tests.
@gatorsmile @cloud-fan
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark distinct_opt
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18429.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18429
----
commit f60c1843ec6a4c0c380825cb316e918e6fd7ceba
Author: Wang Gengliang <[email protected]>
Date: 2017-06-24T02:54:44Z
save for now
commit 7604811863567cc81778b0f0cb39c1385564781c
Author: Wang Gengliang <[email protected]>
Date: 2017-06-27T02:26:03Z
finish implementation and test cases
----