GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/18453
[SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data
## What changes were proposed in this pull request?
This PR maintains API parity with the changes made in SPARK-17498: it adds
PySpark support for the new 'keep' option in StringIndexer, which handles
unseen labels and NULL values.
Note: The primary author of this PR is @VinceShieh.
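To illustrate the option described above, here is a minimal plain-Python sketch of how the three `handleInvalid` modes ('error', 'skip', 'keep') treat unseen labels and NULLs. It models the behavior only and is not Spark's implementation. The function names `fit_labels` and `index_values` are hypothetical, ties in label frequency are broken alphabetically as a simplifying assumption, and 'skip' drops individual values here rather than whole DataFrame rows.

```python
from collections import Counter


def fit_labels(values):
    """Order distinct non-null labels by descending frequency.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ordered]


def index_values(values, labels, handle_invalid="error"):
    """Map each value to its label index, applying the chosen invalid-data policy."""
    lookup = {label: float(i) for i, label in enumerate(labels)}
    indexed = []
    for v in values:
        if v in lookup:
            indexed.append(lookup[v])
        elif handle_invalid == "keep":
            # Unseen labels and NULLs share one extra bucket at index len(labels).
            indexed.append(float(len(labels)))
        elif handle_invalid == "skip":
            # Drop the invalid entry entirely.
            continue
        elif handle_invalid == "error":
            raise ValueError("Unseen label or NULL: %r" % (v,))
        else:
            raise ValueError("Unknown handle_invalid option: %r" % (handle_invalid,))
    return indexed


labels = fit_labels(["a", "b", "a", "c", "a", "b"])
kept = index_values(["a", "d", None, "c"], labels, handle_invalid="keep")
skipped = index_values(["a", "d", None, "c"], labels, handle_invalid="skip")
print(labels, kept, skipped)
```

In PySpark itself, the mode would be selected through the `handleInvalid` parameter of `StringIndexer` (for example `handleInvalid="keep"`) or through `setHandleInvalid("keep")`.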
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanboliang/spark spark-19852
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18453.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18453
----
commit 62c54c060769d64816a928841613f02fda58fa0e
Author: VinceShieh <[email protected]>
Date: 2017-03-10T06:50:41Z
[SPARK-19852][PYSPARK][ML] Update Python API for StringIndexer setHandleInvalid
This PR reflect the changes made in SPARK-17498 on pyspark to support a new
option 'keep' in StringIndexer to handle unseen labels
Signed-off-by: VinceShieh <[email protected]>
commit 1327860b059b72981867876c328db801805dd714
Author: VinceShieh <[email protected]>
Date: 2017-03-10T07:21:47Z
fix compilation issues
Signed-off-by: VinceShieh <[email protected]>
commit 55400b58ba0e11a5014accd59c149367e09481a8
Author: VinceShieh <[email protected]>
Date: 2017-03-10T08:20:14Z
doctest
Signed-off-by: VinceShieh <[email protected]>
commit 81341882cc4e9fa073b7508c941df4837864a216
Author: VinceShieh <[email protected]>
Date: 2017-03-10T08:50:09Z
update doctest
Signed-off-by: VinceShieh <[email protected]>
commit 7e0050d86f2b5b52cde6dc25909383893321002c
Author: VinceShieh <[email protected]>
Date: 2017-03-17T02:55:15Z
include changes made by SPARK-11569
Signed-off-by: VinceShieh <[email protected]>
commit a4bc74ca90140925981afffc41b758882eb75146
Author: Yanbo Liang <[email protected]>
Date: 2017-06-28T10:12:33Z
Move doc tests to tests.py.
----