GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15851
[SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms
training on libsvm data
## What changes were proposed in this pull request?
* Fix the following exception, which is thrown when
```spark.randomForest``` (classification), ```spark.gbt``` (classification),
```spark.naiveBayes``` and ```spark.glm``` (binomial family) are fitted on
libsvm data.
```
java.lang.IllegalArgumentException: requirement failed: If label column
already exists, forceIndexLabel can not be set with true.
```
See [SPARK-18412](https://issues.apache.org/jira/browse/SPARK-18412) for
details on how to reproduce this bug.
* Move ```getFeaturesAndLabels``` into RWrapperUtils, since many
ML algorithm wrappers use this function.
* Drop some unwanted columns when making predictions.
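The name collision behind the exception can be illustrated with a small, dependency-free sketch. This is not the code in this pull request: the function names, the `_indexed` suffix, and the numeric-suffix scheme are hypothetical, chosen only to show one way a wrapper might pick a non-conflicting output label column when the input already has a `label` column (as libsvm data does) and label indexing is forced.

```python
def unique_column_name(base, existing):
    """Return `base` if it is unused, otherwise append a numeric suffix.

    Hypothetical helper for illustration; not part of Spark.
    """
    taken = set(existing)
    if base not in taken:
        return base
    counter = 1
    while f"{base}_{counter}" in taken:
        counter += 1
    return f"{base}_{counter}"


def resolve_label_column(columns, label_col="label", force_index_label=True):
    """Choose the output label column name for a formula-style transformer.

    Assumption: when indexing is forced and the requested output name is
    already present in the input, write the indexed label to a fresh column
    instead of failing.
    """
    if force_index_label and label_col in columns:
        return unique_column_name(f"{label_col}_indexed", columns)
    return label_col


# Data loaded from the libsvm source has the columns "label" and "features".
libsvm_columns = ["label", "features"]
print(resolve_label_column(libsvm_columns))
```

With such a scheme, the existing `label` column stays untouched and the classifier is pointed at the newly chosen column name instead.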
## How was this patch tested?
Add unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanboliang/spark spark-18412
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15851.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15851
----
commit 4752fe2c1e0e211ae2e27a0a7807f141c91430a2
Author: Yanbo Liang <[email protected]>
Date: 2016-11-11T10:29:27Z
Handle the case label column already exists and forceIndexLabel = true.
commit 6262178be4b2a085fb48ad0be8b1bf61c7812689
Author: Yanbo Liang <[email protected]>
Date: 2016-11-11T10:42:17Z
Add unit tests.
commit 26eb40aaca3b8e4de4d2f1922a83dc2198754c6a
Author: Yanbo Liang <[email protected]>
Date: 2016-11-11T11:16:12Z
Set correct label column for classification algorithms.
commit d0d7c28b05bbba51266a9a1364b7fe9e4c452ed9
Author: Yanbo Liang <[email protected]>
Date: 2016-11-11T11:47:57Z
Divide spark.gbt test into two parts: classification and regression.
----