Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/16037
Right, ok. So I think the approach of making the zero vector sparse and then
calling `toDense` in `seqOp`, as @srowen suggested, makes the most sense.
Currently the gradient vector *must* be dense in MLlib, since both `axpy`
and the logic for multinomial logistic regression require it. So the value
initially serialized with the task should be tiny, and the call to `toDense`
for the first instance in each partition will materialize the dense zero
vector. Thereafter it should be a no-op, as the vector will already be dense
and `toDense` will just wrap a reference to the existing values array.
Can we see if this works:
```scala
val zeroVector = Vectors.sparse(n, Seq())
val (gradientSum, lossSum) = data.treeAggregate((zeroVector, 0.0))(
  seqOp = (c, v) => (c, v) match {
    case ((grad, loss), (label, features)) =>
      val denseGrad = grad.toDense
      val l = localGradient.compute(features, label, bcW.value, denseGrad)
      (denseGrad, loss + l)
  },
  combOp = (c1, c2) => (c1, c2) match {
    case ((grad1, loss1), (grad2, loss2)) =>
      // An empty partition can still surface the sparse zero here,
      // so densify both sides before the in-place axpy.
      val denseGrad1 = grad1.toDense
      val denseGrad2 = grad2.toDense
      axpy(1.0, denseGrad2, denseGrad1)
      (denseGrad1, loss1 + loss2)
  })
```
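To make the cost argument concrete, here is a minimal standalone sketch of the trick, using toy `SparseVec`/`DenseVec` stand-ins (illustrative names, not Spark's actual classes): the sparse zero carries almost no data when serialized, the first `toDense` in a partition pays the one-time allocation, and every later `toDense` is a no-op that keeps the same backing array.

```scala
// Toy stand-ins for Spark's Vector hierarchy; names are hypothetical.
sealed trait Vec { def toDense: DenseVec }

final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  // Materializes a full-size array: paid once per partition.
  def toDense: DenseVec = {
    val arr = new Array[Double](size)
    var i = 0
    while (i < indices.length) { arr(indices(i)) = values(i); i += 1 }
    DenseVec(arr)
  }
}

final case class DenseVec(values: Array[Double]) extends Vec {
  // Already dense: no copy, just the same backing array.
  def toDense: DenseVec = this
}

// Tiny to serialize with the task: no values array of length n.
val zero: Vec = SparseVec(1000000, Array.empty, Array.empty)

// First seqOp call in a partition: allocates the full dense array once.
val dense = zero.toDense

// Thereafter a no-op: the values array is shared, not copied.
assert(dense.toDense.values eq dense.values)
```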