GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/7949

    [SPARK-8231][SQL] Add array_contains

    This PR is based on #7580 , thanks to @EntilZha 
    
    PR for work on https://issues.apache.org/jira/browse/SPARK-8231
    
    Currently, I have an initial implementation of contains. Based on the
    discussion on JIRA, it should behave the same as Hive:
    https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128
    
    Main points are:
    1. If the array is empty or null, or the value is null, return false
    2. If there is a type mismatch, throw an error
    3. If the comparison is not supported, throw an error
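
As a rough illustration of the three points above, here is a minimal pure-Python sketch. It is not the Spark or Hive implementation: the function name array_contains is reused only for readability, and SUPPORTED_TYPES, the skipping of null elements, and the choice to check rule 1 before the type rules are all assumptions made for this example.

```python
from typing import Any, Optional, Sequence

# Assumption for this sketch: only these Python types count as "comparable".
SUPPORTED_TYPES = (bool, int, float, str, bytes)


def array_contains(arr: Optional[Sequence[Any]], value: Any) -> bool:
    """Toy model of the three rules; not Spark's or Hive's code."""
    # Rule 1: a null array, an empty array, or a null value yields false.
    if arr is None or len(arr) == 0 or value is None:
        return False

    # Rule 3: reject value types that this sketch does not know how to compare.
    if type(value) not in SUPPORTED_TYPES:
        raise TypeError(f"comparison not supported for {type(value).__name__}")

    for element in arr:
        if element is None:
            # Assumption: null elements never match and are skipped.
            continue
        # Rule 2: the element type must match the value type exactly.
        if type(element) is not type(value):
            raise TypeError(
                f"type mismatch: array element is {type(element).__name__}, "
                f"value is {type(value).__name__}"
            )
        if element == value:
            return True
    return False
```

With this sketch, array_contains(["a", "b"], "b") is true, array_contains([], "a") and array_contains(["a"], None) are false, and array_contains(["a"], 1) raises a TypeError. A real engine would do the type checks once against the declared column types rather than per element at run time.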
    
    Closes #7580 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark array_contains

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7949
    
----
commit 9e0bfc4d7f497861180cc1b0974831e2ec911fd6
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-21T22:09:18Z

    initial attempt at implementation

commit 69c46fb2c17aa0cf52954840f2b8e3de5ba33e03
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-21T23:27:28Z

    added tests and codegen

commit 72cb4b1db2308de9c60d67195bbf8eb50c82f24a
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T00:52:55Z

    added checkInputTypes and docs

commit 4b4425be1657edcdd2998d7fe8f331d94dd62f16
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T01:07:58Z

    changed Arrays in tests to Seqs

commit 9623c6497b7a41ef2254f30a70ab4c6e0b8603e3
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T01:29:39Z

    fixed test

commit 33b45aa0f7b1f486fbc21d5c8bba06d8ade7cc94
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T01:54:34Z

    reordered test

commit 65b562c5a73aba0a42b05abb4b3bc9a08c841878
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T17:25:29Z

    made array_contains nullable false

commit e8a20a90dce5f08d41f0fac61bf191bc9ba57dad
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T17:43:12Z

    added python df (broken atm)

commit 12f8795a168f207805cb88574d40cf5b52fa1c2e
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-22T18:19:39Z

    fix scala style checks

commit 28b4f716ac43e30e61cc3a92a99b9c685880fb3f
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T06:49:12Z

    fixed bug with type conversions and re-added tests

commit 2517a5818d31500ce9bbc1b171b29f173869cd0f
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T06:50:14Z

    removed unused import

commit d262e9dbb6b879069afcc93aea02f26819be779c
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T07:30:03Z

    reworked type checking code and added more tests

commit 686e02937a9bd9068e21350c681fd609e249fb5b
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T07:32:22Z

    fix scala style

commit 85280272a9635b250a3be5a82665f10b9c2c9b66
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T17:21:24Z

    added more tests

commit d3ca01383e95fe7309f57517da7e77259454b196
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T19:55:36Z

    Fixed type checking to match hive behavior, then added tests to ensure this

commit 46f9789b43b776d880ac744cd21ea16950cc63ce
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-23T21:16:00Z

    reverted change

commit 308239927c92cc30865db114e7bbefb247cb505d
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-27T19:46:15Z

    fixed unit test

commit 4e7dce35013f3e00db17fecceed570e52bff401d
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-27T21:17:43Z

    added more docs

commit b5ffae81f3a7167c7c92c1955f35a0cacda4c155
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-28T07:00:53Z

    fixed pyspark test

commit 7a22debf04ecd1f21bd7b74558029185582c45a3
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-28T23:20:47Z

    Changed test to use strings instead of long/ints which are different 
between python 2 and 3

commit ffc0591b00aedfc6d54a33d66b35040eff601c9c
Author: Pedro Rodriguez <[email protected]>
Date:   2015-07-29T06:12:24Z

    fixed unit test

commit 4d5b0ff014ea600e68c1dbe5adaad6dc1daabbb2
Author: Pedro Rodriguez <[email protected]>
Date:   2015-08-03T05:19:37Z

    added docs and another type check

commit e352cf9647b476c4b03f60aca118d2c3fd9dd113
Author: Pedro Rodriguez <[email protected]>
Date:   2015-08-03T05:31:48Z

    fixed diff from master

commit 719e37dc8f717df8f282549eb9c69dbcba473771
Author: Davies Liu <[email protected]>
Date:   2015-08-04T22:05:52Z

    Merge branch 'master' of github.com:apache/spark into array_contains

commit bc3d1fef8166b5192b3948f3fc28582625bcc45a
Author: Davies Liu <[email protected]>
Date:   2015-08-04T22:56:07Z

    fix array_contains

----

