Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3915#discussion_r22573290
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
    @@ -61,14 +62,27 @@ abstract class Gradient extends Serializable {
     class LogisticGradient extends Gradient {
       override def compute(data: Vector, label: Double, weights: Vector): 
(Vector, Double) = {
         val margin = -1.0 * dot(data, weights)
    -    val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
    +    /**
    +     * gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
    --- End diff --
    
    Yeah, but in MLOR, the formula is very different since we have k-1 margins, 
and we need to carry out the largest margin. As a result, it will be very hard 
to generalized for BLOR and MLOR. I plan to have it as it since we probably 
will not use it in other place. The following is the version for MLOR from the 
comment of my another PR.
    
         * `multiplier = exp(margins(i)) / (1 + \sum_k exp(margins(k))) -
         *    \alpha(label) * \delta_{label}{i+1}`
         *
         * where \alpha(label) = 1 if label != 0;
         *       \alpha(label) = 0 if label == 0, and
         *       \delta_{i}{j} = 1 if i == j;
         *       \delta_{i}{j} = 0 if i != j
         *
         * See the reference for the detailed mathematical derivation.
         *
         * However, the first part of multiplier will be likely suffered from 
arithmetic overflow,
         * if any one of the `margins` is larger than 709.78.
         *
         * Let's say the largest `margins(l)` is `maxMargin` and positive, then 
the formula can be
         * rewritten into equivalent formula with more numerical stability by
         *
         * exp(margins(i)) / (1 + \sum_k exp(margins(k))) =
         * exp(margins(i) / maxMargin) / (exp(-margins(l)) + \sum_k 
exp(margins(k) / maxMargin))
         *
         * If we define margins'(i) = margins(i) / maxMargin when i != l,
         * and margins'(l) = -margins(l), we will have equivalent numerically 
stable formula as
         *
         * `multiplier = exp(margins'(i)) / (1 + \sum_k exp(margins'(k))) -
         *    \alpha(label) * \delta_{label}{i+1}`
         *
         * Note that if the largest `margins` is negative, the original formula 
is stable;
         * as a result, we don't need to change it.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to