[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23146#discussion_r238104839

--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala ---
@@ -82,7 +82,72 @@ private[ml] class L2Regularization(
         }
         (0.5 * sum * regParam, Vectors.dense(gradient))
       case _: SparseVector =>
-        throw new IllegalArgumentException("Sparse coefficients are not currently supported.")
+        throw new IllegalArgumentException(
+          "Sparse coefficients are not currently supported.")
+    }
+  }
+}
+
+/**
+ * Implements regularization for Maximum A Posteriori (MAP) optimization
+ * based on prior means (coefficients) and precisions.
+ *
+ * @param priorMean Prior coefficients (multivariate mean).
+ * @param priorPrecisions Prior precisions.
+ * @param regParam The magnitude of the regularization.
+ * @param shouldApply A function (Int => Boolean) indicating whether a given index should have
+ *                    regularization applied to it. Usually we don't apply regularization to
+ *                    the intercept.
+ * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the
+ *                         feature standard deviation. Since we always standardize the data during
+ *                         training, if `standardization` is false, we have to reverse
+ *                         standardization by penalizing each component differently by this param.
+ *                         If `standardization` is true, this should be `None`.
+ */
+private[ml] class PriorRegularization(
+    priorMean: Array[Double],
+    priorPrecisions: Array[Double],
+    override val regParam: Double,
+    shouldApply: Int => Boolean,
+    applyFeaturesStd: Option[Int => Double])
+  extends DifferentiableRegularization[Vector] {
+
+  override def calculate(coefficients: Vector): (Double, Vector) = {
+    coefficients match {
+      case dv: DenseVector =>
+        var sum = 0.0
+        val gradient = new Array[Double](dv.size)
+        dv.values.indices.filter(shouldApply).foreach { j =>
+          val coef = coefficients(j)
+          val priorCoef = priorMean(j)
+          val priorPrecision = priorPrecisions(j)
+          applyFeaturesStd match {
+            case Some(getStd) =>
+              // If `standardization` is false, we still standardize the data
+              // to improve the rate of convergence; as a result, we have to
+              // perform this reverse standardization by penalizing each component
+              // differently to get effectively the same objective function when
+              // the training dataset is not standardized.
+              val std = getStd(j)
+              if (std != 0.0) {
+                val temp = (coef - priorCoef) / (std * std)
+                sum += (coef - priorCoef) * temp * priorPrecision
+                gradient(j) = regParam * priorPrecision * temp
+              } else {
+                0.0
--- End diff --

Who consumes `0.0`?

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
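The quoted `PriorRegularization.calculate` computes the Gaussian-prior (MAP) penalty value and its gradient: penalty = 0.5 * regParam * Σ_j precision_j * ((w_j - mu_j) / std_j)^2, with gradient_j = regParam * precision_j * (w_j - mu_j) / std_j^2. A minimal Python sketch of that math (illustrative only, not Spark source; all names are made up, and the zero-std branch the reviewer asks about is rendered here as a plain skip):

```python
def prior_regularization(coefficients, prior_mean, prior_precisions,
                         reg_param, should_apply, features_std=None):
    """Value and gradient of a Gaussian-prior (MAP) penalty.

    Mirrors the structure of the PriorRegularization class quoted above:
    penalty    = 0.5 * reg_param * sum_j precision_j * ((w_j - mu_j) / std_j)**2
    gradient_j =       reg_param * precision_j * (w_j - mu_j) / std_j**2
    """
    total = 0.0
    gradient = [0.0] * len(coefficients)
    for j, coef in enumerate(coefficients):
        if not should_apply(j):
            continue  # typically the intercept is not regularized
        diff = coef - prior_mean[j]
        precision = prior_precisions[j]
        if features_std is not None:
            std = features_std(j)
            if std == 0.0:
                continue  # constant feature: no penalty contribution
            temp = diff / (std * std)
        else:
            temp = diff  # data already standardized; no reverse scaling
        total += diff * temp * precision
        gradient[j] = reg_param * precision * temp
    return 0.5 * total * reg_param, gradient
```

For example, with coefficients `[1.0, 2.0]`, prior mean `[0.0, 1.0]`, precisions `[1.0, 4.0]`, and `reg_param = 0.1`, the penalty is `0.5 * 0.1 * (1 + 4) = 0.25` and the gradient is `[0.1, 0.4]`.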
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23146#discussion_r237888294

--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularizationSuite.scala ---
@@ -16,9 +16,13 @@
  */
 package org.apache.spark.ml.optim.loss

+import org.scalactic.{Equality, TolerantNumerics}
--- End diff --

We have `~==` etc. for approximate comparison
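The point of srowen's comment is that Spark's test utilities already ship a tolerant-equality matcher (`~==`), so pulling in Scalactic's `TolerantNumerics` is redundant. The idea behind such a matcher, sketched in Python (illustrative only, not either library's actual API):

```python
def approx_eq(a, b, eps=1e-8):
    """Tolerant floating-point comparison, in the spirit of Spark's ~== matcher.

    Uses an absolute tolerance for values near zero and a relative
    tolerance otherwise, so the check behaves sensibly at any magnitude.
    """
    if a == b:
        return True
    if abs(a) < eps or abs(b) < eps:
        return abs(a - b) < eps      # absolute tolerance near zero
    return abs(a - b) / max(abs(a), abs(b)) < eps  # relative tolerance
```

In a suite this lets assertions like `approx_eq(computed_loss, expected_loss)` pass despite benign rounding differences, without hard-coding digit counts.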
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23146#discussion_r237888645

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -250,6 +250,66 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
     isSet(lowerBoundsOnIntercepts) || isSet(upperBoundsOnIntercepts)
   }

+  /**
+   * The prior multivariate mean (coefficients) for Maximum A Posteriori (MAP) optimization.
+   * Default is none.
+   *
+   * @group expertParam
+   */
+  @Since("2.4.0")
+  val priorMean: DoubleArrayParam = new DoubleArrayParam(this, "priorMean",
+  "The prior mean used for Prior regularization.")
--- End diff --

These need continuation indents
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23146#discussion_r237888349

--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularizationSuite.scala ---
@@ -16,9 +16,13 @@
  */
 package org.apache.spark.ml.optim.loss

+import org.scalactic.{Equality, TolerantNumerics}
+
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{BLAS, Vectors}
+
--- End diff --

Nit: remove these lines
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23146#discussion_r237888585

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -250,6 +250,66 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
     isSet(lowerBoundsOnIntercepts) || isSet(upperBoundsOnIntercepts)
   }

+  /**
+   * The prior multivariate mean (coefficients) for Maximum A Posteriori (MAP) optimization.
+   * Default is none.
+   *
+   * @group expertParam
+   */
+  @Since("2.4.0")
--- End diff --

This would have to be 3.0.0
GitHub user elfausto opened a pull request: https://github.com/apache/spark/pull/23146

[SPARK-26173] [MLlib] Prior regularization for Logistic Regression

Implementation of [SPARK-26173](https://issues.apache.org/jira/browse/SPARK-26173). Unit tests have been added to the modified classes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/affectv/spark SPARK-26173

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23146.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23146

commit d8c35bb897cc0e2248810bfc7208e642deb5c55d
Author: Facundo Bellosi
Date: 2018-11-26T15:31:26Z

    SPARK-26173. Prior regularization for Logistic Regression.

commit 52918b7bd30f8fdccce450d7f41e34f8bedf32ad
Author: Facundo Bellosi
Date: 2018-11-26T15:50:11Z

    SPARK-26173. Prior regularization for Logistic Regression.