Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/16037
Right, ok. So I think the approach of making the zero vector sparse and then
calling `toDense` in `seqOp`, as @srowen suggested, makes the most sense.
Currently the gradient vector *must* be dense in MLlib, since both `axpy`
and the logic for multinomial logistic regression require it. So the value
initially serialized with the task should be tiny, and the call to `toDense`
for the first instance in each partition will materialize the dense zero
vector. Thereafter it should be a no-op, as the vector will already be dense
and `toDense` will just wrap a reference to the existing values array.
Can we see if this works:
```scala
val zeroVector = Vectors.sparse(n, Seq())
val (gradientSum, lossSum) = data.treeAggregate((zeroVector, 0.0))(
  seqOp = (c, v) => (c, v) match {
    case ((grad, loss), (label, features)) =>
      val denseGrad = grad.toDense
      val l = localGradient.compute(features, label, bcW.value, denseGrad)
      (denseGrad, loss + l)
  },
  combOp = (c1, c2) => (c1, c2) match {
    case ((grad1, loss1), (grad2, loss2)) =>
      // An empty partition can still surface the sparse zero here,
      // so densify both sides before the in-place axpy.
      val denseGrad1 = grad1.toDense
      val denseGrad2 = grad2.toDense
      axpy(1.0, denseGrad2, denseGrad1)
      (denseGrad1, loss1 + loss2)
  })
```
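To make the cost argument concrete, here is a minimal standalone sketch of the trick, using toy `SparseVec`/`DenseVec` stand-ins (illustrative names, not Spark's actual classes): the sparse zero carries almost no data when serialized, the first `toDense` in a partition pays the one-time allocation, and every later `toDense` is a no-op that keeps the same backing array.

```scala
// Toy stand-ins for Spark's Vector hierarchy; names are hypothetical.
sealed trait Vec { def toDense: DenseVec }

final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  // Materializes a full-size array: paid once per partition.
  def toDense: DenseVec = {
    val arr = new Array[Double](size)
    var i = 0
    while (i < indices.length) { arr(indices(i)) = values(i); i += 1 }
    DenseVec(arr)
  }
}

final case class DenseVec(values: Array[Double]) extends Vec {
  // Already dense: no copy, just the same backing array.
  def toDense: DenseVec = this
}

// Tiny to serialize with the task: no values array of length n.
val zero: Vec = SparseVec(1000000, Array.empty, Array.empty)

// First seqOp call in a partition: allocates the full dense array once.
val dense = zero.toDense

// Thereafter a no-op: the values array is shared, not copied.
assert(dense.toDense.values eq dense.values)
```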