GitHub user concretevitamin opened a pull request:
https://github.com/apache/spark/pull/1055
SPARK-2053: add Catalyst expressions for CASE WHEN.
JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2053
This PR adds support for the two forms of CASE statement present in Hive. The
first form is `CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END`, whose
semantics resemble a chain of if statements; it is implemented in `CaseWhen`.
The second form is `CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END`, whose
semantics resemble a switch statement on the key `a`; it is implemented in
`CaseKeyWhen`.
[This
link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions)
contains more detailed descriptions of their semantics.
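For reviewers who want a quick mental model of the two forms, here is a minimal sketch in plain Python. It is not the Catalyst code in this PR. The function names `case_when` and `case_key_when` are hypothetical, `None` stands in for SQL NULL, and the NULL handling (a NULL condition or a NULL key never matches) is my assumption based on standard SQL comparison semantics.

```python
from typing import Any, Optional, Sequence, Tuple


def case_when(branches: Sequence[Tuple[Optional[bool], Any]],
              else_value: Any = None) -> Any:
    """Model of CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END.

    Each branch is (condition, value). The first branch whose condition
    is exactly True wins; None (NULL) and False do not match.
    """
    for condition, value in branches:
        if condition is True:
            return value
    # No branch matched: fall back to ELSE, or NULL when ELSE is absent.
    return else_value


def case_key_when(key: Any,
                  branches: Sequence[Tuple[Any, Any]],
                  else_value: Any = None) -> Any:
    """Model of CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END.

    Each branch is (candidate, value). The first candidate equal to the
    key wins. Assumption: a NULL on either side never compares equal.
    """
    for candidate, value in branches:
        if key is not None and candidate is not None and key == candidate:
            return value
    return else_value
```

For example, `case_when([(False, "b"), (True, "d")], "e")` picks the second branch, while `case_key_when(None, [(None, "c")], "f")` falls through to the ELSE value under the NULL assumption stated above. Branch values are passed in already evaluated, so this sketch does not model lazy evaluation of branches.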
Notes / Open issues:
* Please look over the two new case classes and check whether the
implementations break any implicit contracts / invariants. I am not very
familiar with these, and I currently find violations tricky to spot.
* We should decide whether a non-boolean condition is allowed in a branch of
`CaseWhen`. Hive throws a `SemanticException` in this situation, and I think it
would be good to mimic that. The question is where in the Spark SQL pipeline we
should signal an exception for such a query.
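
To make the second open issue concrete, here is a minimal sketch in plain Python of the kind of check being discussed. It is an illustration only, not code from this PR: `SemanticError`, `check_conditions_boolean`, and the use of type-name strings are all hypothetical, and the sketch says nothing about which stage of the Spark SQL pipeline should run such a check.

```python
from typing import Sequence


class SemanticError(Exception):
    """Hypothetical stand-in for an analysis-time error such as Hive's."""


def check_conditions_boolean(condition_types: Sequence[str]) -> None:
    """Reject a CASE WHEN whose branch conditions are not all boolean.

    `condition_types` holds the (hypothetical) type name of each WHEN
    condition, in branch order.
    """
    for index, type_name in enumerate(condition_types):
        if type_name != "boolean":
            raise SemanticError(
                f"WHEN condition {index} has type {type_name}, "
                "expected boolean"
            )
```

Calling `check_conditions_boolean(["boolean", "int"])` raises `SemanticError` for the second condition, while a list containing only `"boolean"` entries passes silently.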
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/concretevitamin/spark caseWhen
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1055.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1055
----
commit a31d7820c7a0cc3f1ec93bb40f4625d10c8d1c35
Author: Zongheng Yang <[email protected]>
Date: 2014-06-09T23:51:22Z
Finish up Case.
commit 7d81e95e28c7fe6b745a8981499166b6557acda4
Author: Zongheng Yang <[email protected]>
Date: 2014-06-10T20:35:04Z
Clean up resolved.
commit efd019b0ca22e70d01cdcb668d94af2a1ad6f406
Author: Zongheng Yang <[email protected]>
Date: 2014-06-10T20:57:20Z
eval() and toString() bug fixes.
commit 5906f75201a0c62a78454634fb18e4713f924275
Author: Zongheng Yang <[email protected]>
Date: 2014-06-10T21:32:56Z
WIP
commit f2bcb9d3be72e9675a3e1a84829c44a0c9ba3a84
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T02:03:39Z
WIP
commit be54bc8865784c1c9e4c0523db1eb33c75f9e8f6
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T19:57:48Z
Rewrite eval() to a low-level implementation. Separate two CASE stmts.
commit 3f9ef0a5523ab281917139a629e9fe9bace04962
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T20:52:50Z
Cleanups and bug fixes (mainly in eval() and resolved).
commit db51a85324079ece6bf62fb908f07be7f7f10d3d
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T22:13:32Z
Add allCondBooleans check; uncomment tests.
commit 9f84b40c2c2eeaff5396b29bced71936dca80a5b
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T22:14:02Z
Add golden outputs from Hive.
commit 2cf08bbf8a1fd7e9e6206eac32fa79c82dad879c
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T22:16:07Z
Merge branch 'master' into caseWhen
Conflicts:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
commit 7392f3a376e2dbe58e4d34f7b2fe62756c521d74
Author: Zongheng Yang <[email protected]>
Date: 2014-06-11T22:18:51Z
Minor cleanup.
----