Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/482#issuecomment-41746809
I made some more comments about the correctness here. I'm pretty worried
about the fact that our tests did not catch any of these regressions. I think
we need to add a new set of tests before merging the PR. I've sketched them
out [here](https://github.com/chenghao-intel/spark/pull/1) and I made a PR
against your branch. In addition to this change, I think you should also
create evaluation tests for each new optimization case.
Testing is a little harder for aggregate expressions, so how about instead
we add a HiveQueryTest for each of these. That way we make sure we are obeying
the correct semantics (i.e. `COUNT(null) => 0`).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---