Thank you, Peter.

I just want to be sure: even if I use the "classification" setting, does the
GBT use regression trees and not classification trees?

I know the difference between the two (theoretically) is only in the loss
and impurity functions, so if it does use classification trees, doing what
you proposed would just give the classification itself.

Also, by looking at the Scala API, I found that each Node holds a Predict
object which contains the "probability of the label (classification only)"
(https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.mllib.tree.model.Predict).
This is what I called confidence.


So, to sum up my questions and confusion:
1. Does GBT use classification trees when it is set to classification, or
does it always use regression trees?
2. If it does use classification trees, how could I efficiently get to the
confidence, i.e. Node.predict.prob? (See the sketch below for what I mean.)
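
To make question 2 concrete, here is roughly what I imagine doing per tree
(just a rough sketch against the mllib tree API linked above, assuming the
trees actually store class probabilities; the leafPredict helper name is mine):

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.tree.configuration.FeatureType
import org.apache.spark.mllib.tree.model.{DecisionTreeModel, Node, Predict}

// Walk a single tree from its root to the leaf that `features` falls into and
// return that leaf's Predict, which holds the predicted label and (for
// classification trees) the probability of that label.
def leafPredict(tree: DecisionTreeModel, features: Vector): Predict = {
    var node: Node = tree.topNode
    while (!node.isLeaf) {
        val split = node.split.get
        val goLeft =
            if (split.featureType == FeatureType.Continuous)
                features(split.feature) <= split.threshold
            else
                split.categories.contains(features(split.feature))
        node = if (goLeft) node.leftNode.get else node.rightNode.get
    }
    node.predict
}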

Thanks again,
Michael



On Mon, Apr 13, 2015 at 10:13 AM, pprett [via Apache Spark User List] <
ml-node+s1001560n22470...@n3.nabble.com> wrote:

> Hi Mike,
>
> Gradient Boosted Trees (or gradient boosted regression trees) don't store
> probabilities in each leaf node but rather model a continuous function,
> which is then transformed via a logistic sigmoid (i.e. like in a Logistic
> Regression model).
> If you are just interested in a confidence, you can use this continuous
> function directly: it's just the (weighted) sum of the predictions of the
> individual regression trees. Use the absolute value for the confidence and
> the sign to determine the class label.
> Here is an example:
>
> // gbdt is the trained GradientBoostedTreesModel; blas is a BLAS instance
> // (e.g. com.github.fommil.netlib.BLAS.getInstance()).
> def score(features: Vector): Double = {
>     val treePredictions = gbdt.trees.map(_.predict(features))
>     // weighted dot product of the per-tree predictions and the tree weights
>     blas.ddot(gbdt.numTrees, treePredictions, 1, gbdt.treeWeights, 1)
> }
>
> If you are interested in probabilities instead, just pass the function
> value through a logistic sigmoid.
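>
> For instance, a small sketch on top of the score function above (the
> probability name here is just illustrative):
>
> def probability(features: Vector): Double = {
>     val margin = score(features)      // weighted sum of the tree predictions
>     1.0 / (1.0 + math.exp(-margin))   // logistic sigmoid squashes it into (0, 1)
> }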
>
> best,
>  Peter
>



