GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/7949
[SPARK-8231][SQL] Add array_contains
This PR is based on #7580, thanks to @EntilZha.
PR for work on https://issues.apache.org/jira/browse/SPARK-8231
Currently, I have an initial implementation of array_contains. Based on the
discussion on JIRA, it should behave the same as Hive:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128
Main points are:
1. If the array is empty or null, or the value is null, return false
2. If there is a type mismatch, throw an error
3. If comparison is not supported for the type, throw an error
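The three rules above can be modeled in a few lines. The following is a minimal pure-Python sketch, not Spark's or Hive's code: the names `check_input_types`, `array_contains`, and `NON_COMPARABLE` are hypothetical, Python types stand in for SQL types, and `dict` is assumed to stand in for a non-comparable type such as a map. The sketch separates the type checks (rules 2 and 3) from per-row evaluation (rule 1), roughly mirroring how a Hive GenericUDF validates its argument types in `initialize` before `evaluate` sees any rows.

```python
from typing import Any, Optional, Sequence

# Assumption: Python types stand in for SQL types, and dict stands in for a
# type that does not support comparison (e.g. a map).
NON_COMPARABLE = (dict,)


def check_input_types(element_type: type, value_type: type) -> None:
    """Hypothetical analysis-time check covering rules 2 and 3."""
    # Rule 2: the value must have the same type as the array elements.
    if element_type is not value_type:
        raise TypeError(
            f"type mismatch: array<{element_type.__name__}> vs {value_type.__name__}"
        )
    # Rule 3: the element type must support comparison.
    if issubclass(value_type, NON_COMPARABLE):
        raise TypeError(f"comparison not supported for {value_type.__name__}")


def array_contains(array: Optional[Sequence[Any]], value: Any) -> bool:
    """Hypothetical row-level evaluation covering rule 1 plus the lookup."""
    # Rule 1: a null or empty array, or a null value, yields False.
    if array is None or len(array) == 0 or value is None:
        return False
    # Linear scan; null elements never equal the (non-null) value.
    return any(element == value for element in array)


if __name__ == "__main__":
    check_input_types(str, str)             # passes: same, comparable type
    print(array_contains(["a", "b"], "a"))  # True
    print(array_contains([], "a"))          # False
    print(array_contains(None, "a"))        # False
    print(array_contains(["a"], None))      # False
```

Because the type checks run once up front, a mismatched or non-comparable type is rejected even for rows whose array happens to be empty, and per-row work is reduced to the null/empty short-circuit plus a linear scan.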
Closes #7580
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark array_contains
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7949.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7949
----
commit 9e0bfc4d7f497861180cc1b0974831e2ec911fd6
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-21T22:09:18Z
initial attempt at implementation
commit 69c46fb2c17aa0cf52954840f2b8e3de5ba33e03
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-21T23:27:28Z
added tests and codegen
commit 72cb4b1db2308de9c60d67195bbf8eb50c82f24a
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T00:52:55Z
added checkInputTypes and docs
commit 4b4425be1657edcdd2998d7fe8f331d94dd62f16
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T01:07:58Z
changed Arrays in tests to Seqs
commit 9623c6497b7a41ef2254f30a70ab4c6e0b8603e3
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T01:29:39Z
fixed test
commit 33b45aa0f7b1f486fbc21d5c8bba06d8ade7cc94
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T01:54:34Z
reordered test
commit 65b562c5a73aba0a42b05abb4b3bc9a08c841878
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T17:25:29Z
made array_contains nullable false
commit e8a20a90dce5f08d41f0fac61bf191bc9ba57dad
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T17:43:12Z
added python df (broken atm)
commit 12f8795a168f207805cb88574d40cf5b52fa1c2e
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-22T18:19:39Z
fix scala style checks
commit 28b4f716ac43e30e61cc3a92a99b9c685880fb3f
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T06:49:12Z
fixed bug with type conversions and re-added tests
commit 2517a5818d31500ce9bbc1b171b29f173869cd0f
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T06:50:14Z
removed unused import
commit d262e9dbb6b879069afcc93aea02f26819be779c
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T07:30:03Z
reworked type checking code and added more tests
commit 686e02937a9bd9068e21350c681fd609e249fb5b
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T07:32:22Z
fix scala style
commit 85280272a9635b250a3be5a82665f10b9c2c9b66
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T17:21:24Z
added more tests
commit d3ca01383e95fe7309f57517da7e77259454b196
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T19:55:36Z
Fixed type checking to match Hive behavior, then added tests to ensure this
commit 46f9789b43b776d880ac744cd21ea16950cc63ce
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-23T21:16:00Z
reverted change
commit 308239927c92cc30865db114e7bbefb247cb505d
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-27T19:46:15Z
fixed unit test
commit 4e7dce35013f3e00db17fecceed570e52bff401d
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-27T21:17:43Z
added more docs
commit b5ffae81f3a7167c7c92c1955f35a0cacda4c155
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-28T07:00:53Z
fixed pyspark test
commit 7a22debf04ecd1f21bd7b74558029185582c45a3
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-28T23:20:47Z
Changed test to use strings instead of longs/ints, which differ
between Python 2 and 3
commit ffc0591b00aedfc6d54a33d66b35040eff601c9c
Author: Pedro Rodriguez <[email protected]>
Date: 2015-07-29T06:12:24Z
fixed unit test
commit 4d5b0ff014ea600e68c1dbe5adaad6dc1daabbb2
Author: Pedro Rodriguez <[email protected]>
Date: 2015-08-03T05:19:37Z
added docs and another type check
commit e352cf9647b476c4b03f60aca118d2c3fd9dd113
Author: Pedro Rodriguez <[email protected]>
Date: 2015-08-03T05:31:48Z
fixed diff from master
commit 719e37dc8f717df8f282549eb9c69dbcba473771
Author: Davies Liu <[email protected]>
Date: 2015-08-04T22:05:52Z
Merge branch 'master' of github.com:apache/spark into array_contains
commit bc3d1fef8166b5192b3948f3fc28582625bcc45a
Author: Davies Liu <[email protected]>
Date: 2015-08-04T22:56:07Z
fix array_contains
----