mgaido91 opened a new pull request #23773: [SPARK-26721][ML] Avoid per-tree normalization in featureImportance for GBT
URL: https://github.com/apache/spark/pull/23773
 
 
   ## What changes were proposed in this pull request?
   
   Our feature importance calculation is taken from sklearn's, which was recently fixed (in https://github.com/scikit-learn/scikit-learn/pull/11176). Quoting the description of that PR:
   
   > Because the feature importances are (currently, by default) normalized and 
then averaged, feature importances from later stages are overweighted.
   
   This PR applies a fix similar to sklearn's: for GBT, the per-tree normalization of the feature importances is skipped.
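To make the overweighting concrete, here is a minimal Python sketch (not Spark's or sklearn's actual code). It assumes each tree's importances are given as a map from feature index to the total impurity decrease (gain) that tree attributes to the feature. The function name `ensemble_importances` and the gain values are hypothetical. In the example, the first tree captures most of the signal through feature 0, while two later trees only fit small residuals through feature 1.

```python
from typing import Dict, List


def ensemble_importances(per_tree_gains: List[Dict[int, float]],
                         per_tree_normalization: bool) -> Dict[int, float]:
    """Aggregate per-tree feature gains into one importance map summing to 1.

    per_tree_gains[t][j] is the total impurity decrease that tree t
    attributes to feature j (hypothetical input format for illustration).
    """
    totals: Dict[int, float] = {}
    for gains in per_tree_gains:
        tree_norm = sum(gains.values())
        if tree_norm == 0:
            continue  # a tree without splits contributes nothing
        for feature, gain in gains.items():
            # Per-tree normalization rescales every tree so its importances
            # sum to 1, giving each tree equal weight no matter how much it
            # actually reduced the loss.
            value = gain / tree_norm if per_tree_normalization else gain
            totals[feature] = totals.get(feature, 0.0) + value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Normalize once over the whole ensemble.
    return {f: v / grand_total for f, v in totals.items()}


# Hypothetical gains: a strong first tree, then two residual-fitting trees.
trees = [{0: 90.0, 1: 10.0}, {1: 1.0}, {1: 1.0}]

old = ensemble_importances(trees, per_tree_normalization=True)
new = ensemble_importances(trees, per_tree_normalization=False)
print({f: round(v, 3) for f, v in old.items()})  # {0: 0.3, 1: 0.7}
print({f: round(v, 3) for f, v in new.items()})  # {0: 0.882, 1: 0.118}
```

With per-tree normalization, each of the two small late trees counts as much as the first one, so feature 1 appears more important than feature 0. Summing the raw gains and normalizing only once keeps feature 0 dominant.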
   
   Credit to Daniel Jumper for clearly pointing out the issue and the sklearn PR.
   
   ## How was this patch tested?
   
   Modified the existing unit test and checked that the `featureImportance` computed in that test is similar to sklearn's (it can't be identical, because the trees may be slightly different).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
