Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/17078#discussion_r103154658
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1447,7 +1447,7 @@ private class LogisticAggregator(
label: Double): Unit = {
val localFeaturesStd = bcFeaturesStd.value
- val localCoefficients = bcCoefficients.value
+ val localCoefficients = bcCoefficients.value.toArray
--- End diff --
In the first version of LOR, we had the following code, which avoids the
issue you pointed out.
```scala
private val weightsArray = weights match {
  case dv: DenseVector => dv.values
  case _ =>
    throw new IllegalArgumentException(
      s"weights only supports dense vector but got type ${weights.getClass}.")
}
```
I think the older approach will be more efficient since `toArray` is only
called once (you can add a case for sparse vectors), and for sparse initial
coefficients, we will not convert from sparse to dense again and again.
This can be future work. With L1 applied, the coefficients can be very
sparse, so we can compress the coefficients for each iteration and have a
specialized implementation for `UpdateInPlace`.
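
To make the suggestion concrete, here is a minimal sketch of resolving the
array once with the sparse case added. The names `CoefficientsOnce` and
`toArrayOnce` are hypothetical and not part of this PR; the sketch only
assumes `DenseVector.values` and `Vector.toArray` from
`org.apache.spark.ml.linalg`, so it needs Spark's linalg classes on the
classpath.
```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector}

// Hypothetical helper for illustration; not the code in this PR.
object CoefficientsOnce {
  // Resolve the coefficients to a dense array a single time, so that
  // per-instance updates can index into it without converting again.
  def toArrayOnce(coefficients: Vector): Array[Double] = coefficients match {
    case dv: DenseVector => dv.values   // reuses the backing array, no copy
    case sv: SparseVector => sv.toArray // one sparse-to-dense copy
    case other =>
      throw new IllegalArgumentException(
        s"coefficients must be a dense or sparse vector but got type ${other.getClass}.")
  }
}
```
Inside the aggregator, the result could be held in a field (for example a
`@transient lazy val`, which is an assumption about how one might wire it
up) so the match runs once per aggregator instead of once per instance.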