GitHub user sethah commented on the pull request:

    https://github.com/apache/spark/pull/9912#issuecomment-159058114
  
    After further review, it seems generally accepted in the literature 
that this method of computing feature importance for decision trees has high 
variance when predictors are correlated. One way to compensate for this would 
be to incorporate surrogate splits into the computation, but surrogate splits 
are not currently tracked in spark.ml.
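    
    For reference, here is a minimal sketch (in Python, for brevity) of the 
general impurity-decrease idea: each split credits its feature with the node's 
count-weighted impurity minus its children's count-weighted impurities, and the 
totals are normalized to sum to 1. The `Node` class and `impurity_importances` 
function are hypothetical illustrations, not the spark.ml API or the code in 
this PR, and the example tree and impurities are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical minimal tree node (not a spark.ml class).
    impurity: float                 # e.g. Gini impurity of the rows at this node
    count: int                      # number of training rows reaching this node
    feature: Optional[int] = None   # split feature index; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def impurity_importances(root: Node, num_features: int) -> List[float]:
    """Sum count-weighted impurity decreases per feature, then normalize."""
    totals = [0.0] * num_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:
            continue  # leaves do not split, so they credit no feature
        left, right = node.left, node.right
        # Weighted impurity decrease produced by this split.
        gain = (node.count * node.impurity
                - left.count * left.impurity
                - right.count * right.impurity)
        totals[node.feature] += gain
        stack.extend([left, right])
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals


# Example: 100 rows (50/50 classes, Gini 0.5) split on feature 0 into a pure
# 40-row leaf and a 60-row node (50/10, Gini 10/36), which feature 1 then
# splits into two pure leaves. Feature 2 is never used.
tree = Node(impurity=0.5, count=100, feature=0,
            left=Node(impurity=0.0, count=40),
            right=Node(impurity=10 / 36, count=60, feature=1,
                       left=Node(impurity=0.0, count=50),
                       right=Node(impurity=0.0, count=10)))
imps = impurity_importances(tree, num_features=3)
print(imps)  # approximately [0.667, 0.333, 0.0]
```

    Because each split's credit goes entirely to whichever feature the tree 
happened to choose, two strongly correlated features can receive very 
different shares across fits on similar data, which is the variance issue 
described above.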
    
    Despite these shortcomings, I think offering it is still appropriate, 
since both scikit-learn and R (package: rpart) do. We could include a warning 
message. Thoughts?

