GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/7838#issuecomment-126859467
@feynmanliang Thanks for writing those tests. I could not think of a good
way to make the tests robust.
The issue is that we could only run Random Forests identically in MLlib and
sklearn by turning off resampling for each tree. But then every tree in the
forest would be the same, so the test would not exercise the feature importance
calculation much.
So I wrote some tests by hand instead. Not a great solution, but hopefully
good enough for now.