GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19588
[SPARK-12375][ML] VectorIndexerModel: support handling unseen categories via
handleInvalid
## What changes were proposed in this pull request?
`VectorIndexerModel` now supports the skip/error/keep strategies for unseen
categories via `handleInvalid`, similar to `StringIndexer`.
This is implemented through `try...catch` to avoid a possible performance
impact.
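
For reviewers, the three strategies can be illustrated with a minimal standalone sketch. It is written in plain Python rather than taken from the patch, and the names `index_vector`, `category_maps`, and `handle_invalid` are hypothetical. It assumes that "keep" maps an unseen value to an extra bucket at index `len(mapping)`, by analogy with `StringIndexer`, and it uses `try`/`except` on the lookup so that valid values pay no extra membership check.

```python
from typing import Dict, List, Optional


def index_vector(
    values: List[float],
    category_maps: Dict[int, Dict[float, int]],
    handle_invalid: str = "error",
) -> Optional[List[float]]:
    """Index the categorical features of one vector (illustrative only).

    category_maps maps a feature index to {raw value: category index}.
    Features without an entry are treated as continuous and left unchanged.
    Returns None when the row should be skipped.
    """
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")

    indexed = list(values)
    for feature, mapping in category_maps.items():
        raw = values[feature]
        try:
            # Fast path: a plain lookup, with no separate "is it known?" test.
            indexed[feature] = float(mapping[raw])
        except KeyError:
            # Slow path: only reached for categories unseen during fitting.
            if handle_invalid == "error":
                raise ValueError(
                    f"feature {feature}: unseen category {raw}"
                ) from None
            if handle_invalid == "skip":
                return None
            # "keep": assumed to use one extra bucket after the known ones.
            indexed[feature] = float(len(mapping))
    return indexed
```

With a map such as `{0: {-1.0: 0, 0.0: 1, 1.0: 2}}`, a row whose first feature holds an unseen value would raise under "error", be dropped under "skip", and land in the extra bucket under "keep".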
## How was this patch tested?
Unit test added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/WeichenXu123/spark handle_invalid_for_vector_indexer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19588.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19588
----
commit d06399cb54ed4d2420e0982238bc1dd5f5a425bd
Author: WeichenXu <[email protected]>
Date: 2017-10-27T15:32:47Z
init pr
----