Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14597#discussion_r79311682
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
    @@ -54,11 +55,44 @@ private[feature] trait ChiSqSelectorParams extends Params
     
       /** @group getParam */
       def getNumTopFeatures: Int = $(numTopFeatures)
    +
    +  final val percentile = new DoubleParam(this, "percentile",
     +    "Percentile of features that selector will select, ordered by statistics value descending.",
    +    ParamValidators.inRange(0, 1))
    +  setDefault(percentile -> 0.1)
    +
    +  /** @group getParam */
    +  def getPercentile: Double = $(percentile)
    +
    +  final val alpha = new DoubleParam(this, "alpha",
    +    "The highest p-value for features to be kept.",
    +    ParamValidators.inRange(0, 1))
    +  setDefault(alpha -> 0.05)
    +
    +  /** @group getParam */
    +  def getAlpha: Double = $(alpha)
    +
    +  /**
     +   * ChiSqSelector supports KBest, Percentile and FPR selection,
     +   * the same as the ChiSqSelectorType defined in MLlib.
     +   * When setNumTopFeatures is called, selectorType is set to KBest;
     +   * when setPercentile is called, selectorType is set to Percentile;
     +   * when setAlpha is called, selectorType is set to FPR.
    +   */
    +  final val selectorType = new Param[String](this, "selectorType",
    +    "ChiSqSelector Type: KBest, Percentile, FPR")
    +  setDefault(selectorType -> ChiSqSelectorType.KBest.toString)
    +
    +  /** @group getParam */
    +  def getChiSqSelectorType: String = $(selectorType)
     }
     
     /**
      * Chi-Squared feature selection, which selects categorical features to use for predicting a
      * categorical label.
     + * The selector supports three selection methods: KBest, Percentile and FPR.
    --- End diff --
    
    This is a good start but I think we could say some more. I suggest something like ...
    
    The selector supports three selection methods: `KBest`, `Percentile`, and `FPR`. `KBest` chooses the _k_ top features according to a chi-squared test. `Percentile` is similar but chooses a fraction of all features instead of a fixed number. `FPR` chooses all features whose false positive rate meets some threshold.
    
    Should this doc be applied to Python too?
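
    To make the difference between the three methods concrete, here is a small self-contained Scala sketch (plain Scala, not the Spark implementation; the p-values, k, fraction, and alpha are made up for illustration) of how each method would pick feature indices from chi-squared p-values:

    ```scala
    object SelectionSketch extends App {
      // Hypothetical chi-squared p-values for five features (made up for illustration).
      val pValues = Array(0.001, 0.20, 0.03, 0.97, 0.01)

      // Feature indices ranked by p-value ascending, i.e. by test statistic descending.
      val ranked = pValues.zipWithIndex.sortBy(_._1).map(_._2)

      // KBest: keep the k top-ranked features (here k = 2).
      val kBest = ranked.take(2)

      // Percentile: keep the top fraction of all features (here 40%).
      val percentile = ranked.take(math.ceil(0.4 * pValues.length).toInt)

      // FPR: keep every feature whose p-value is below the threshold (here alpha = 0.05).
      val fpr = pValues.zipWithIndex.collect { case (p, i) if p < 0.05 => i }

      println(kBest.mkString(","))      // 0,4
      println(percentile.mkString(",")) // 0,4
      println(fpr.mkString(","))        // 0,2,4
    }
    ```

    Note that KBest and Percentile both rank by the test statistic and differ only in how the cutoff is expressed, while FPR is an absolute threshold on the p-value, so the number of selected features varies with the data.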

