Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/15593
@MLnick I'm hoping that we can abstract away the implementation of the column-major format as much as possible so that, in the future, new developers can understand the code without having to think hard. I agree that we have good comments and documentation in the code now; having said that, with some effort, we can hide the details by using column-major matrices.
For example, if we have the coefficients and gradient in column-major matrices, we can write the L2 part of the gradient as the following code, without computing `isIntercept` and `featureIndex` through complicated index arithmetic.
```scala
// numFeaturesPlusIntercept is numFeatures + 1 when fitIntercept is true
// (the intercepts occupy the last column), and numFeatures otherwise.
val coeffMatrix = new DenseMatrix(numClasses, numFeaturesPlusIntercept,
  new Array[Double](numClasses * numFeaturesPlusIntercept), isTransposed = false)
val totalGradientMatrix = new DenseMatrix(numClasses, numFeaturesPlusIntercept,
  new Array[Double](numClasses * numFeaturesPlusIntercept), isTransposed = false)

// L2 regularization term of the loss; the L2 part of the gradient is
// accumulated into totalGradientMatrix as a side effect.
val regVal = {
  var sum = 0.0
  coeffMatrix.foreachActive { case (classIndex, featureIndex, value) =>
    // We do not apply regularization to the intercepts.
    val isIntercept = fitIntercept && featureIndex == numFeatures
    if (!isIntercept) {
      // The following code computes the loss of the regularization as well
      // as the gradient of the regularization, and adds the gradient back
      // to totalGradientMatrix.
      sum += {
        if (standardization) {
          totalGradientMatrix.update(classIndex, featureIndex,
            totalGradientMatrix(classIndex, featureIndex) + regParamL2 * value)
          value * value
        } else if (featuresStd(featureIndex) != 0.0) {
          // If `standardization` is false, we still standardize the data
          // to improve the rate of convergence; as a result, we have to
          // perform this reverse standardization by penalizing each
          // component differently to get effectively the same objective
          // function when the training dataset is not standardized.
          val temp = value / (featuresStd(featureIndex) * featuresStd(featureIndex))
          totalGradientMatrix.update(classIndex, featureIndex,
            totalGradientMatrix(classIndex, featureIndex) + regParamL2 * temp)
          value * temp
        } else {
          0.0
        }
      }
    }
  }
  0.5 * regParamL2 * sum
}
```
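As a side note, here's a minimal, self-contained sketch (using Breeze's column-major `DenseMatrix`; the names and values are purely illustrative, not this PR's code) of how iterating `(classIndex, featureIndex, value)` triples hides the flat-index arithmetic:
```scala
import breeze.linalg.{DenseMatrix => BDM}

object ColumnMajorDemo extends App {
  // 2 classes x 3 features; the backing array is laid out column by column
  // (column-major), so column j holds the coefficients for feature j.
  val coeffs = new BDM[Double](2, 3, Array(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

  // activeIterator yields ((row, col), value) pairs in column-major order,
  // so callers never compute `row + col * numRows` themselves.
  coeffs.activeIterator.foreach { case ((classIndex, featureIndex), value) =>
    println(s"class=$classIndex feature=$featureIndex value=$value")
  }
}
```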
For the L1 part, I'm thinking of doing the following:
```scala
new BreezeOWLQN[(Int, Int), BDM[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
```
then we can implement the L1 function as
```scala
// OWLQN's key type is (Int, Int) here, so the penalty function takes the
// (classIndex, featureIndex) key as a single tuple argument.
def regParamL1Fun = (index: (Int, Int)) => {
  val (_, featureIndex) = index
  // Remove the L1 penalization on the intercept.
  val isIntercept = fitIntercept && featureIndex == numFeatures
  if (isIntercept) {
    0.0
  } else if (standardization) {
    regParamL1
  } else if (featuresStd(featureIndex) != 0.0) {
    regParamL1 / featuresStd(featureIndex)
  } else {
    0.0
  }
}
```
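For contrast, without the matrix abstraction the same penalty needs hand-rolled flat-index bookkeeping, roughly like the sketch below (the names are illustrative, not this PR's code):
```scala
// Flat-index version: featureIndex has to be decoded from a single
// column-major offset by hand -- exactly the kind of index arithmetic the
// keyed (Int, Int) formulation above avoids.
def flatRegParamL1Fun(
    numClasses: Int,
    numFeatures: Int,
    fitIntercept: Boolean,
    standardization: Boolean,
    featuresStd: Array[Double],
    regParamL1: Double): Int => Double = { index =>
  val featureIndex = index / numClasses // column in the column-major layout
  val isIntercept = fitIntercept && featureIndex == numFeatures
  if (isIntercept) {
    0.0
  } else if (standardization) {
    regParamL1
  } else if (featuresStd(featureIndex) != 0.0) {
    regParamL1 / featuresStd(featureIndex)
  } else {
    0.0
  }
}
```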
By having the coefficients and gradient sum as matrices, we also open up the opportunity to store both of them in a sparse representation. In high-dimensional problems, both of them can often be very sparse.
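A minimal sketch of what that could look like with Breeze's CSC sparse matrix (illustrative only; this PR itself keeps dense matrices):
```scala
import breeze.linalg.CSCMatrix

object SparseCoeffsDemo extends App {
  val numClasses = 2
  val numFeatures = 100000

  // Only non-zero entries are stored, which matters when numFeatures is
  // large and most coefficients are zero.
  val builder = new CSCMatrix.Builder[Double](rows = numClasses, cols = numFeatures)
  builder.add(0, 12345, 0.7) // (classIndex, featureIndex, value)
  builder.add(1, 98765, -0.3)
  val sparseCoeffs = builder.result()

  // activeIterator visits only the stored non-zeros, so a regularization
  // loop like the one above would skip zero coefficients entirely.
  sparseCoeffs.activeIterator.foreach { case ((classIndex, featureIndex), value) =>
    println(s"class=$classIndex feature=$featureIndex value=$value")
  }
}
```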
I'm also aware that we're cutting the 2.1 release now, and this is an important performance improvement, so I'm open to dealing with this cleanup work in a separate PR so that at least we're able to merge this one.
Thanks.