GitHub user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2063#discussion_r16504941
--- Diff: docs/mllib-decision-tree.md ---
@@ -7,20 +7,26 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Decision Tree
* Table of contents
{:toc}
-Decision trees and their ensembles are popular methods for the machine learning tasks of
+[Decision trees](http://en.wikipedia.org/wiki/Decision_tree_learning)
+and their ensembles are popular methods for the machine learning tasks of
classification and regression. Decision trees are widely used since they are easy to interpret,
-handle categorical variables, extend to the multiclass classification setting, do not require
+handle categorical features, extend to the multiclass classification setting, do not require
feature scaling and are able to capture nonlinearities and feature interactions. Tree ensemble
-algorithms such as decision forest and boosting are among the top performers for classification and
+algorithms such as decision forests and boosting are among the top performers for classification and
--- End diff --
Should we just call them random forests instead of decision forests? :-)