GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/18453

    [SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data

    ## What changes were proposed in this pull request?
    This PR maintains API parity with the changes made in SPARK-17498 by supporting
    the new 'keep' option of StringIndexer in PySpark, so that unseen labels or NULL
    values can be handled.
    Note: The primary author of this PR is @VinceShieh.
    ## How was this patch tested?
    Unit tests.
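In PySpark the option is passed as `StringIndexer(inputCol=..., outputCol=..., handleInvalid="keep")`. Per the Spark parameter documentation, 'error' (the default) raises on invalid data, 'skip' filters those rows out, and 'keep' puts unseen labels and NULLs into one extra bucket at index numLabels. The plain-Python sketch below only illustrates those three behaviors; it is not the Spark implementation. `fit_labels` and `transform` are hypothetical helpers, and breaking frequency ties alphabetically is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Optional


def fit_labels(values: List[Optional[str]]) -> Dict[str, int]:
    """Hypothetical stand-in for StringIndexer.fit.

    Maps each distinct non-null label to an index, most frequent label first
    (index 0). Nulls are ignored while fitting. Alphabetical tie-breaking is
    an assumption of this sketch, not documented Spark behavior.
    """
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def transform(values: List[Optional[str]], labels: Dict[str, int],
              handle_invalid: str = "error") -> List[float]:
    """Hypothetical stand-in for StringIndexerModel.transform on one column."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError("unsupported handleInvalid: %r" % handle_invalid)
    out: List[float] = []
    for v in values:
        if v is not None and v in labels:
            # Known label: emit its fitted index (Spark outputs doubles).
            out.append(float(labels[v]))
        elif handle_invalid == "keep":
            # Unseen labels and nulls share one extra bucket at index numLabels.
            out.append(float(len(labels)))
        elif handle_invalid == "skip":
            # Drop the invalid row entirely.
            continue
        else:
            raise ValueError("invalid label: %r" % (v,))
    return out


# Counts are a=3, c=2, b=1, so the fitted mapping is {"a": 0, "c": 1, "b": 2}.
labels = fit_labels(["a", "b", "a", "c", "a", "c"])
print(transform(["a", "d", None, "b"], labels, "keep"))  # [0.0, 3.0, 3.0, 2.0]
print(transform(["a", "d", None, "b"], labels, "skip"))  # [0.0, 2.0]
```

With 'keep', downstream stages see one additional category rather than an error or a shrinking dataset, which is why the extra index is placed right after the fitted labels.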

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-19852

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18453
    
----
commit 62c54c060769d64816a928841613f02fda58fa0e
Author: VinceShieh <[email protected]>
Date:   2017-03-10T06:50:41Z

    [SPARK-19852][PYSPARK][ML] Update Python API for StringIndexer setHandleInvalid
    
    This PR reflects the changes made in SPARK-17498 in PySpark, supporting a new
    option 'keep' in StringIndexer to handle unseen labels.
    
    Signed-off-by: VinceShieh <[email protected]>

commit 1327860b059b72981867876c328db801805dd714
Author: VinceShieh <[email protected]>
Date:   2017-03-10T07:21:47Z

    fix compilation issues
    
    Signed-off-by: VinceShieh <[email protected]>

commit 55400b58ba0e11a5014accd59c149367e09481a8
Author: VinceShieh <[email protected]>
Date:   2017-03-10T08:20:14Z

    doctest
    
    Signed-off-by: VinceShieh <[email protected]>

commit 81341882cc4e9fa073b7508c941df4837864a216
Author: VinceShieh <[email protected]>
Date:   2017-03-10T08:50:09Z

    update doctest
    
    Signed-off-by: VinceShieh <[email protected]>

commit 7e0050d86f2b5b52cde6dc25909383893321002c
Author: VinceShieh <[email protected]>
Date:   2017-03-17T02:55:15Z

    include changes made by SPARK-11569
    
    Signed-off-by: VinceShieh <[email protected]>

commit a4bc74ca90140925981afffc41b758882eb75146
Author: Yanbo Liang <[email protected]>
Date:   2017-06-28T10:12:33Z

    Move doc tests to tests.py.

----
