GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/19516
[SPARK-22277][ML]fix the bug of ChiSqSelector on preparing the output column
## What changes were proposed in this pull request?
To prepare the output columns when using ChiSqSelector, the method on master
adds some additional feature attributes. This is not necessary and sometimes
causes errors.
` val featureAttributes: Array[Attribute] = if (origAttrGroup.attributes.nonEmpty) {
    origAttrGroup.attributes.get.zipWithIndex.filter(x => selector.contains(x._2)).map(_._1)
  } else {
    Array.fill[Attribute](selector.size)(NominalAttribute.defaultAttr)
  }
  val newAttributeGroup = new AttributeGroup($(outputCol), featureAttributes)`
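To illustrate what the snippet above computes, here is a minimal, Spark-free sketch. The types `Attr`, `Named`, and `DefaultNominal`, and the object `OutputAttrsSketch`, are hypothetical stand-ins for Spark's attribute classes, introduced only for this example. It shows the two branches: keeping the original attributes at the selected indices, or falling back to one default attribute per selected feature.

```scala
// Hypothetical stand-ins for Spark's Attribute types (illustration only).
sealed trait Attr
case class Named(name: String) extends Attr
case object DefaultNominal extends Attr

object OutputAttrsSketch {
  // Mirrors the quoted logic: keep the attributes whose index is selected,
  // or, if the input column has no attributes, emit one default per selected feature.
  def featureAttributes(orig: Option[Array[Attr]], selected: Set[Int]): Array[Attr] =
    orig match {
      case Some(attrs) =>
        attrs.zipWithIndex.filter { case (_, i) => selected.contains(i) }.map(_._1)
      case None =>
        Array.fill[Attr](selected.size)(DefaultNominal)
    }

  def main(args: Array[String]): Unit = {
    val orig: Array[Attr] = Array(Named("f0"), Named("f1"), Named("f2"), Named("f3"))
    // Attributes present: only indices 1 and 3 survive.
    println(featureAttributes(Some(orig), Set(1, 3)).mkString(", ")) // Named(f1), Named(f3)
    // No attributes on the input column: defaults are generated instead.
    println(featureAttributes(None, Set(1, 3)).mkString(", ")) // DefaultNominal, DefaultNominal
  }
}
```

In both branches the result has exactly one attribute per selected feature, so the output attribute group matches the size of the selected feature vector.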
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mpjlu/spark testDFdirect
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19516.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19516
----
commit 3128133d76348666df82bf43aa42cd9ebae70faf
Author: Peng Meng <[email protected]>
Date: 2017-10-17T13:04:08Z
fix the bug of ChiSqSelector on preparing the output column
----