Github user mpjlu commented on the issue:

    https://github.com/apache/spark/pull/14597
  
    Hi @srowen, I have added a parameter to control the feature selection type.
    The usage looks like this:
    **var selector = new ChiSqSelector()
    var model = selector.fit(df) // by default, the selector selects numTopFeatures (50)
    var newModel = selector.selectKBest(10) // or: var newModel = selector.selectPercentile(5), or ...**
    You can fit the DataFrame once and generate the model multiple times.
    
    The indices are sorted internally in the model, as we discussed.
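
    To illustrate the "fit once, select many times" idea with sorted indices,
    here is a minimal sketch in plain Scala. It is not the code in this PR:
    FittedStats and its methods are hypothetical names, the statistics are
    made-up numbers, and rounding the percentile down is an assumption.

```scala
// Hypothetical sketch: these names are illustrative, not the PR's API.
// Holds the per-feature chi-squared statistics computed once by a "fit" step.
final case class FittedStats(statistics: Vector[Double]) {
  // Feature indices ranked by descending statistic (highest score first).
  private val ranked: Vector[Int] =
    statistics.zipWithIndex.sortWith(_._1 > _._1).map(_._2)

  // Keep the k highest-scoring features; return their indices in ascending order.
  def selectKBest(k: Int): Vector[Int] = ranked.take(k).sorted

  // Keep the top `percent` percent of features (count rounded down: an assumption).
  def selectPercentile(percent: Double): Vector[Int] =
    selectKBest((statistics.length * percent / 100.0).toInt)
}

object SelectionDemo {
  // Made-up statistics for five features.
  val fitted: FittedStats = FittedStats(Vector(1.2, 9.5, 0.3, 7.1, 4.4))

  // Both selections reuse the same fitted statistics; nothing is recomputed.
  val top2: Vector[Int] = fitted.selectKBest(2)
  val top60: Vector[Int] = fitted.selectPercentile(60.0)
}
```

    Ranking happens once when the statistics are stored, so each selection is
    only a take-and-sort over the cached ranking.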
    
    This update does not include passing the p-value to the model. For KBest
    and Percentile selection, the fit function uses ChiSqTestResult.statistic
    to generate the model; for Fpr, it uses ChiSqTestResult.pValue. So it may
    be better to pass the ChiSqTestResult to the model and expose it to the
    caller. I think it is better to submit another PR for the "pass value to
    model and expose to the caller" problem, because it requires changing a
    lot of code, including which data should be passed to the model, how to
    save the model, and how to test the model.
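
    The difference between the selection types can be sketched in plain Scala
    as follows. This is an illustration under stated assumptions, not the PR's
    implementation: FeatureTest is a hypothetical stand-in whose fields mirror
    statistic and pValue on ChiSqTestResult, the numbers are made up, and
    "p-value strictly below alpha" is an assumed threshold rule.

```scala
// Hypothetical stand-in for one feature's test result (not Spark's class).
final case class FeatureTest(index: Int, statistic: Double, pValue: Double)

object FprDemo {
  // Rank-based selection reads only the statistic.
  def kBest(results: Seq[FeatureTest], k: Int): Seq[Int] =
    results.sortWith(_.statistic > _.statistic).take(k).map(_.index).sorted

  // FPR-style selection reads only the p-value (assumed rule: pValue < alpha).
  def fpr(results: Seq[FeatureTest], alpha: Double): Seq[Int] =
    results.filter(_.pValue < alpha).map(_.index).sorted

  // Made-up results for four features.
  val results: Seq[FeatureTest] = Seq(
    FeatureTest(0, 1.2, 0.27),
    FeatureTest(1, 9.5, 0.002),
    FeatureTest(2, 7.1, 0.008),
    FeatureTest(3, 4.4, 0.036)
  )

  val byRank: Seq[Int] = kBest(results, 2)
  val byFpr: Seq[Int] = fpr(results, 0.05)
}
```

    Because each rule needs a different field, carrying the whole result
    object lets one fitted model serve all selection types.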
    


