Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/15593

@MLnick I'm hoping that we could abstract away the implementation detail of using the column-major format as much as possible, so that in the future new developers can understand the code without thinking hard. I agree that we have good comments and documentation in the code now; having said that, with some effort we can hide the column-major details entirely. For example, if we keep the coefficients and the gradient in column-major matrices, we can write the L2 part of the gradient as follows, without computing `isIntercept` and `featureIndex` through complicated index arithmetic:

```scala
// When fitIntercept is true, the intercepts occupy the last column,
// so the matrices have numFeaturesPlusIntercept = numFeatures + 1 columns.
val coeffMatrix = new DenseMatrix(numClasses, numFeaturesPlusIntercept,
  Array.ofDim[Double](numClasses * numFeaturesPlusIntercept), isTransposed = false)
val totalGradientMatrix = new DenseMatrix(numClasses, numFeaturesPlusIntercept,
  Array.ofDim[Double](numClasses * numFeaturesPlusIntercept), isTransposed = false)

var sum = 0.0
coeffMatrix.foreachActive { case (classIndex, featureIndex, value) =>
  // We do not apply regularization to the intercepts
  val isIntercept = fitIntercept && featureIndex == numFeatures
  if (!isIntercept) {
    // The following code computes the loss and the gradient of the
    // regularization term, and adds the gradient back to totalGradientMatrix.
    sum += {
      if (standardization) {
        totalGradientMatrix.update(classIndex, featureIndex,
          totalGradientMatrix(classIndex, featureIndex) + regParamL2 * value)
        value * value
      } else {
        if (featuresStd(featureIndex) != 0.0) {
          // If `standardization` is false, we still standardize the data
          // to improve the rate of convergence; as a result, we have to
          // perform this reverse standardization by penalizing each component
          // differently, to get effectively the same objective function as
          // when the training dataset is not standardized.
          val temp = value / (featuresStd(featureIndex) * featuresStd(featureIndex))
          totalGradientMatrix.update(classIndex, featureIndex,
            totalGradientMatrix(classIndex, featureIndex) + regParamL2 * temp)
          value * temp
        } else {
          0.0
        }
      }
    }
  }
}
0.5 * regParamL2 * sum
```

For the L1 part, I'm thinking of doing the following,

```scala
new BreezeOWLQN[(Int, Int), BDM[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
```

so that we can implement the L1 penalty function as

```scala
def regParamL1Fun = (classIndex: Int, featureIndex: Int) => {
  // Remove the L1 penalization on the intercept
  val isIntercept = fitIntercept && featureIndex == numFeatures
  if (isIntercept) {
    0.0
  } else {
    if (standardizationParam) {
      regParamL1
    } else {
      if (featuresStd(featureIndex) != 0.0) {
        regParamL1 / featuresStd(featureIndex)
      } else {
        0.0
      }
    }
  }
}
```

By keeping the coefficients and the gradient sum in matrices, we also open up the opportunity to store both of them in a sparse representation. In high-dimensional problems, both can often be very sparse.

I'm also aware that we're cutting the 2.1 release now, and this is an important performance improvement, so I'm open to dealing with this cleanup work in a separate PR so that at least we're able to merge this. Thanks.
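To make the column-major idea above concrete, here is a minimal, self-contained sketch. The `ColMajorMatrix` class and its `foreachActive` are illustrative stand-ins for Spark's `DenseMatrix` (not its real API), and `l2SumSkippingIntercept` mirrors the `isIntercept` check in the snippet above under the assumption that the intercepts occupy the last column:

```scala
object ColumnMajorDemo {
  // Illustrative column-major matrix: values(j * numRows + i) holds entry
  // (i, j), so each column is stored contiguously.
  final class ColMajorMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) {
    require(values.length == numRows * numCols, "values must have numRows * numCols entries")
    def apply(i: Int, j: Int): Double = values(j * numRows + i)
    // Visit every entry with its (rowIndex, colIndex, value), column by column.
    def foreachActive(f: (Int, Int, Double) => Unit): Unit =
      for (j <- 0 until numCols; i <- 0 until numRows) f(i, j, apply(i, j))
  }

  // Sum of squared coefficients for the L2 term, skipping the intercept
  // column (assumed here to be the last column of the matrix).
  def l2SumSkippingIntercept(m: ColMajorMatrix): Double = {
    var sum = 0.0
    m.foreachActive { (classIndex, featureIndex, value) =>
      val isIntercept = featureIndex == m.numCols - 1
      if (!isIntercept) sum += value * value
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    // 2 classes, 2 features plus 1 intercept column; column-major values:
    // col0 = (1, 2), col1 = (3, 4), col2 = intercepts (0.5, -0.5).
    val coeff = new ColMajorMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 0.5, -0.5))
    println(l2SumSkippingIntercept(coeff)) // 1 + 4 + 9 + 16 = 30.0
  }
}
```

The point of the sketch is that the traversal hands the loop body a ready-made `(classIndex, featureIndex)` pair, so the intercept test is a single comparison rather than index arithmetic over a flat array.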