GitHub user rkarimi commented on the issue:
https://github.com/apache/spark/pull/17972
Perhaps related: big random forest models (for example, 100 or more trees with a depth of around 20).
Big models can be trained effectively even on machines with limited RAM (such
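The model size described above maps directly onto Spark MLlib's forest parameters. A minimal sketch, assuming Spark's `spark.ml` API and an existing `SparkSession`; the column names and the `trainingData` DataFrame are illustrative assumptions, not from the thread:

```scala
import org.apache.spark.ml.classification.RandomForestClassifier

// A "big" forest in the sense described above: 100 trees, depth around 20.
val rf = new RandomForestClassifier()
  .setNumTrees(100)           // 100 or more trees
  .setMaxDepth(20)            // depth of around 20 (Spark caps maxDepth at 30)
  .setFeaturesCol("features") // hypothetical feature column
  .setLabelCol("label")       // hypothetical label column

// val model = rf.fit(trainingData) // trainingData is a hypothetical DataFrame
```

Whether such a model fits in limited RAM depends on driver memory, since the fitted trees are collected on the driver.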
Github user rkarimi closed the pull request at:
https://github.com/apache/spark/pull/16838
---
GitHub user rkarimi opened a pull request:
https://github.com/apache/spark/pull/16838
Branch 2.1
Fix the execution plan: remove the unnecessary sort step from the plan for
df.sort(...).count()
It should not do the sort, since counting does not depend on row order. It should be absolutely fast if
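The claim can be checked by inspecting the physical plan. A minimal sketch, assuming a local Spark 2.x session; the toy DataFrame is an illustrative assumption (a global aggregation via `groupBy().count()` is used here so the plan of the count can be printed with `explain()`, since `count()` itself is an action):

```scala
import org.apache.spark.sql.SparkSession

object SortCountPlanDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sort-count-plan")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

    // Count preceded by a sort: the sort contributes nothing to the result,
    // so ideally the optimizer would drop it from the plan entirely.
    df.sort($"id").groupBy().count().explain()

    // The same count without the sort, for comparison.
    df.groupBy().count().explain()

    spark.stop()
  }
}
```

If the sort survives in the first plan but not the second, the query is doing avoidable work, which is the behavior this pull request targets.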