huaxingao opened a new pull request #27954: [SPARK-31885][ML] Implement VarianceThresholdSelector
URL: https://github.com/apache/spark/pull/27954
 
 
   
   ### What changes were proposed in this pull request?
   Implement a feature selector that removes all low-variance features. Features with a variance lower than the threshold will be removed. The default is to keep all features with non-zero variance, i.e. to remove the features that have the same value in all samples.
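
   The selection rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code in this PR: the function name `select_by_variance` and the sample data are hypothetical, population variance is assumed (the PR description does not say which variance estimator is used), and a feature is kept only when its variance is strictly greater than the threshold, which matches the stated default of dropping constant features at threshold 0.

```python
from statistics import pvariance


def select_by_variance(rows, threshold=0.0):
    """Keep the feature columns whose variance exceeds `threshold`.

    Hypothetical helper for illustration only. Assumes population
    variance and a strict `>` comparison against the threshold.
    Returns (kept_column_indices, rows_with_only_kept_columns).
    """
    if not rows:
        return [], []
    # Transpose the row-major samples into per-feature columns.
    columns = list(zip(*rows))
    variances = [pvariance(col) for col in columns]
    kept = [i for i, v in enumerate(variances) if v > threshold]
    filtered = [[row[i] for i in kept] for row in rows]
    return kept, filtered


# Made-up samples: feature 0 is constant, features 1 and 2 vary.
samples = [
    [0.0, 2.0, 1.0],
    [0.0, 1.0, 4.0],
    [0.0, 3.0, 1.0],
]

kept, filtered = select_by_variance(samples)
print(kept)      # indices of the features that survive at threshold 0
print(filtered)  # the samples restricted to those features
```

   With the default threshold only the constant feature is dropped; raising the threshold removes progressively more features.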
   
   
   ### Why are the changes needed?
   VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. The idea is that when a feature doesn't vary much within itself, it generally has very little predictive power.
   scikit-learn has implemented this selector:
   https://scikit-learn.org/stable/modules/feature_selection.html#variance-threshold
   
   
   
   ### Does this PR introduce any user-facing change?
   Yes. 
   
   
   ### How was this patch tested?
   Added a new test suite.
   
   
