[GitHub] spark pull request #19516: [SPARK-22277][ML]fix the bug of ChiSqSelector on ...

mpjlu Tue, 17 Oct 2017 06:16:06 -0700

GitHub user mpjlu opened a pull request:

    https://github.com/apache/spark/pull/19516


    [SPARK-22277][ML]fix the bug of ChiSqSelector on preparing the output column

    ## What changes were proposed in this pull request?
    
    To prepare the output column for ChiSqSelector, the method on master adds
    extra feature attributes. This is unnecessary and sometimes causes errors.
    
        val featureAttributes: Array[Attribute] = if (origAttrGroup.attributes.nonEmpty) {
          origAttrGroup.attributes.get.zipWithIndex.filter(x => selector.contains(x._2)).map(_._1)
        } else {
          Array.fill[Attribute](selector.size)(NominalAttribute.defaultAttr)
        }
        val newAttributeGroup = new AttributeGroup($(outputCol), featureAttributes)
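
For readers without the Spark source at hand, the attribute-selection logic in the snippet above can be sketched without any Spark dependency. This is only an illustration under stated assumptions: `Attr`, `OutputAttributes`, `DefaultAttr` and `select` are hypothetical stand-ins (not Spark APIs) for `Attribute`, `NominalAttribute.defaultAttr` and the code quoted above, and it does not represent the change made by this pull request.

```scala
// Simplified, hypothetical stand-in for Spark ML's Attribute; only a name is modelled.
final case class Attr(name: String)

object OutputAttributes {
  // Hypothetical placeholder standing in for NominalAttribute.defaultAttr.
  val DefaultAttr: Attr = Attr("default")

  /** Keep the attributes whose index is in `selected`. When the input column
    * carries no per-feature attributes, fall back to one default attribute
    * per selected feature, mirroring the if/else in the quoted snippet.
    */
  def select(orig: Option[Array[Attr]], selected: Set[Int]): Array[Attr] =
    orig match {
      case Some(attrs) =>
        // zipWithIndex preserves order, so the kept attributes stay in input order.
        attrs.zipWithIndex.collect { case (a, i) if selected.contains(i) => a }
      case None =>
        Array.fill(selected.size)(DefaultAttr)
    }
}
```

Under these assumptions, calling `OutputAttributes.select(Some(Array(Attr("a"), Attr("b"), Attr("c"))), Set(0, 2))` keeps the attributes at positions 0 and 2, while passing `None` produces one default attribute per selected index.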
    
    ## How was this patch tested?
    The existing unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mpjlu/spark testDFdirect

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19516
    
----
commit 3128133d76348666df82bf43aa42cd9ebae70faf
Author: Peng Meng <[email protected]>
Date:   2017-10-17T13:04:08Z

    fix the bug of ChiSqSelector on preparing the output column

----


---

