Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3439#discussion_r20910282
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala ---
    @@ -45,19 +46,21 @@ object LogLoss extends Loss {
           model: TreeEnsembleModel,
           point: LabeledPoint): Double = {
         val prediction = model.predict(point.features)
    -    1.0 / (1.0 + math.exp(-prediction)) - point.label
    +    - 4.0 * point.label / (1.0 + math.exp(2.0 * point.label * prediction))
       }
     
       /**
    -   * Method to calculate error of the base learner for the gradient boosting calculation.
    +   * Method to calculate loss of the base learner for the gradient boosting calculation.
        * Note: This method is not used by the gradient boosting algorithm but is useful for debugging
        * purposes.
    -   * @param model Model of the weak learner.
    +   * @param model Ensemble model
        * @param data Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
    -   * @return
    +   * @return Mean log loss of model on data
        */
       override def computeError(model: TreeEnsembleModel, data: RDD[LabeledPoint]): Double = {
    -    val wrongPredictions = data.filter(lp => model.predict(lp.features) != lp.label).count()
    -    wrongPredictions / data.count
    +    data.map { case point =>
    +      val prediction = model.predict(point.features)
    +      2.0 * math.log(1 + math.exp(-2.0 * point.label * prediction))
    --- End diff --
    
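    Oh, good point!  It looks like option #2, `math.log1p(math.exp(-a))`, is best. It is the only one of the three that still returns a nonzero value at a = 40 and a = 100: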
    ```
    scala> def test(a: Double) = (math.log(1 + math.exp(-a)), math.log1p(math.exp(-a)), -a + math.log1p(math.exp(a)))
    test: (a: Double)(Double, Double, Double)
    
    scala> test(1)
    res6: (Double, Double, Double) = (0.31326168751822286,0.31326168751822286,0.3132616875182228)
    
    scala> test(10)
    res7: (Double, Double, Double) = (4.5398899216870535E-5,4.539889921686465E-5,4.5398899217730104E-5)
    
    scala> test(20)
    res8: (Double, Double, Double) = (2.0611536900435727E-9,2.061153620314381E-9,2.061153026033935E-9)
    
    scala> test(30)
    res9: (Double, Double, Double) = (9.348077867343381E-14,9.357622968839737E-14,9.237055564881302E-14)
    
    scala> test(40)
    res10: (Double, Double, Double) = (0.0,4.248354255291589E-18,0.0)
    
    scala> test(100)
    res11: (Double, Double, Double) = (0.0,3.720075976020836E-44,0.0)
    ```
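
    One caveat with option #2 by itself: for a large negative `a`, `math.exp(-a)` overflows to `Infinity`, so the result is `Infinity` as well. A possible way around that is to switch to the third form whenever `a <= 0`. The sketch below is only an illustration, not what the PR does: `StableLogLoss`, `log1pExpNeg` and `pointLoss` are made-up names, and `pointLoss` assumes labels in {-1, +1}, as the formula in the diff suggests.

    ```
    object StableLogLoss {
      // Hypothetical helper: computes log(1 + exp(-a)).
      // For a > 0, exp(-a) is small, so log1p keeps the precision that
      // math.log(1 + ...) loses (option #2 from the test above).
      // For a <= 0, exp(-a) could overflow, so use the algebraically
      // equivalent form -a + log1p(exp(a)) instead (option #3).
      def log1pExpNeg(a: Double): Double =
        if (a > 0) math.log1p(math.exp(-a))
        else -a + math.log1p(math.exp(a))

      // Per-point loss in the form used in the diff:
      // 2 * log(1 + exp(-2 * label * prediction)).
      // Assumption: label is -1.0 or +1.0.
      def pointLoss(label: Double, prediction: Double): Double =
        2.0 * log1pExpNeg(2.0 * label * prediction)
    }
    ```

    With this, `StableLogLoss.log1pExpNeg(100)` matches option #2 from the test above, and `StableLogLoss.log1pExpNeg(-1000)` stays finite instead of overflowing.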

