Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21068173
--- Diff: docs/mllib-gbt.md ---
@@ -0,0 +1,308 @@
+---
+layout: global
+title: Gradient-Boosted Trees - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Gradient-Boosted Trees
+---
+
+* Table of contents
+{:toc}
+
+[Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting)
+are ensembles of [decision trees](mllib-decision-tree.html).
+GBTs iteratively train decision trees in order to minimize a loss function.
+Like decision trees, GBTs handle categorical features,
+extend to the multiclass classification setting, do not require
+feature scaling, and are able to capture non-linearities and feature interactions.
+
+MLlib supports GBTs for binary classification and for regression,
+using both continuous and categorical features.
+MLlib implements GBTs using the existing [decision tree](mllib-decision-tree.html) implementation. Please see the decision tree guide for more information on trees.
+
+*Note*: GBTs do not yet support multiclass classification. For multiclass problems, please use
+[decision trees](mllib-decision-tree.html) or [Random Forests](mllib-random-forest.html).
+
+## Basic algorithm
+
+Gradient boosting iteratively trains a sequence of decision trees.
+On each iteration, the algorithm uses the current ensemble to predict the label of each training instance and then compares the prediction with the true label. The dataset is re-labeled to put more weight on training instances with poor predictions. Thus, in the next iteration, the decision tree will help correct for previous mistakes.
+
+The specific weight mechanism is defined by a loss function (discussed below). With each iteration, GBTs further reduce this loss function on the training data.
+
+### Comparison with Random Forests
--- End diff --
I really like this section; it is very useful information. We should try to
add some graphs here in a separate PR. However, shouldn't this go in a
separate section under Ensemble that compares the RF and boosting algorithms
in terms of performance and accuracy?
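
As a side note on the "Basic algorithm" text quoted above, here is a minimal Python sketch of the re-labeling step. It assumes squared-error loss, where the new labels are simply the residuals of the current ensemble. The stump learner, the function names, the toy data, and the learning rate are all illustrative assumptions; this is not MLlib's implementation or API.

```python
# Minimal gradient boosting sketch for regression with squared-error loss.
# Assumptions (not from the PR): depth-1 trees (stumps), a single feature,
# and a fixed learning rate. All names here are hypothetical.

def fit_stump(xs, ys):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue  # a split must leave points on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbt(xs, ys, num_iterations=20, learning_rate=0.5):
    """Train a sequence of stumps; return them with the loss per iteration."""
    stumps = []
    preds = [0.0] * len(xs)  # the empty ensemble predicts 0
    losses = []
    for _ in range(num_iterations):
        # Re-label: with squared error, the new labels are the residuals,
        # so badly predicted instances get the largest targets.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return stumps, losses


def predict_gbt(stumps, x, learning_rate=0.5):
    """Sum the scaled predictions of every stump in the ensemble."""
    return sum(learning_rate * predict_stump(s, x) for s in stumps)


# Toy data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
stumps, losses = fit_gbt(xs, ys)
```

In this sketch the training loss recorded in `losses` never increases from one iteration to the next, which is the behavior the quoted paragraph describes. Other losses would change only how the residual-like labels are computed.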