[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-12-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23146#discussion_r238104839
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala ---
@@ -82,7 +82,72 @@ private[ml] class L2Regularization(
         }
         (0.5 * sum * regParam, Vectors.dense(gradient))
       case _: SparseVector =>
-        throw new IllegalArgumentException("Sparse coefficients are not currently supported.")
+        throw new IllegalArgumentException(
+          "Sparse coefficients are not currently supported.")
+    }
+  }
+}
+
+
+/**
+ * Implements regularization for Maximum A Posteriori (MAP) optimization
+ * based on prior means (coefficients) and precisions.
+ *
+ * @param priorMean Prior coefficients (multivariate mean).
+ * @param priorPrecisions Prior precisions.
+ * @param regParam The magnitude of the regularization.
+ * @param shouldApply A function (Int => Boolean) indicating whether a given index should have
+ *                    regularization applied to it. Usually we don't apply regularization to
+ *                    the intercept.
+ * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the
+ *                         feature standard deviation. Since we always standardize the data during
+ *                         training, if `standardization` is false, we have to reverse
+ *                         standardization by penalizing each component differently by this param.
+ *                         If `standardization` is true, this should be `None`.
+ */
+private[ml] class PriorRegularization(
+    priorMean: Array[Double],
+    priorPrecisions: Array[Double],
+    override val regParam: Double,
+    shouldApply: Int => Boolean,
+    applyFeaturesStd: Option[Int => Double])
+  extends DifferentiableRegularization[Vector] {
+
+  override def calculate(coefficients: Vector): (Double, Vector) = {
+    coefficients match {
+      case dv: DenseVector =>
+        var sum = 0.0
+        val gradient = new Array[Double](dv.size)
+        dv.values.indices.filter(shouldApply).foreach { j =>
+          val coef = coefficients(j)
+          val priorCoef = priorMean(j)
+          val priorPrecision = priorPrecisions(j)
+          applyFeaturesStd match {
+            case Some(getStd) =>
+              // If `standardization` is false, we still standardize the data
+              // to improve the rate of convergence; as a result, we have to
+              // perform this reverse standardization by penalizing each component
+              // differently to get effectively the same objective function when
+              // the training dataset is not standardized.
+              val std = getStd(j)
+              if (std != 0.0) {
+                val temp = (coef - priorCoef) / (std * std)
+                sum += (coef - priorCoef) * temp * priorPrecision
+                gradient(j) = regParam * priorPrecision * temp
+              } else {
+                0.0
--- End diff --

Who consumes `0.0`?
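For context, the value/gradient computation in the diff above can be sketched as a standalone function (a hypothetical re-implementation on plain arrays, not the Spark class itself). It also makes kiszk's point concrete: in the `std == 0.0` branch the `0.0` is a dangling expression whose value is simply discarded, so the branch is a no-op.

```scala
object PriorRegSketch {
  // Hypothetical standalone sketch of the MAP prior penalty from the diff:
  // value = 0.5 * regParam * sum_j precision_j * (coef_j - mean_j)^2 / std_j^2
  // gradient_j = regParam * precision_j * (coef_j - mean_j) / std_j^2
  def calculate(
      coefficients: Array[Double],
      priorMean: Array[Double],
      priorPrecisions: Array[Double],
      regParam: Double,
      shouldApply: Int => Boolean,
      featuresStd: Option[Int => Double]): (Double, Array[Double]) = {
    var sum = 0.0
    val gradient = new Array[Double](coefficients.length)
    coefficients.indices.filter(shouldApply).foreach { j =>
      val diff = coefficients(j) - priorMean(j)
      val precision = priorPrecisions(j)
      featuresStd match {
        case Some(getStd) =>
          val std = getStd(j)
          if (std != 0.0) {
            // Reverse standardization: penalize each component by 1 / std^2.
            val temp = diff / (std * std)
            sum += diff * temp * precision
            gradient(j) = regParam * precision * temp
          }
          // A constant feature (std == 0) contributes neither penalty nor
          // gradient; the diff's dangling `0.0` here has no consumer.
        case None =>
          sum += diff * diff * precision
          gradient(j) = regParam * precision * diff
      }
    }
    (0.5 * sum * regParam, gradient)
  }
}
```

For example, with coefficients `[1, 3]`, prior mean `[0, 1]`, precisions `[2, 4]`, and `regParam = 0.5` (no standardization), the penalty is `0.5 * 0.5 * (2 + 16) = 4.5` with gradient `[1.0, 4.0]`.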


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-11-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23146#discussion_r237888294
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularizationSuite.scala ---
@@ -16,9 +16,13 @@
  */
 package org.apache.spark.ml.optim.loss
 
+import org.scalactic.{Equality, TolerantNumerics}
--- End diff --

We have `~==` etc. for approximate comparison
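The `~==` operator srowen mentions comes from Spark's internal test utilities rather than Scalactic. A minimal plain-Scala sketch of the same idea (hypothetical names, absolute tolerance only):

```scala
object TolerantEq {
  // Tolerant Double equality in the spirit of `~==`: two values are
  // considered equal when they differ by at most the implicit tolerance.
  implicit class TolerantDouble(val value: Double) extends AnyVal {
    def ~==(other: Double)(implicit tol: Double): Boolean =
      math.abs(value - other) <= tol
  }
  // Default absolute tolerance; a real utility would also offer relTol.
  implicit val defaultTol: Double = 1e-8
}
```

With this in scope, `(0.1 + 0.2) ~== 0.3` holds even though strict `==` fails for the floating-point sum.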


---




[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-11-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23146#discussion_r237888645
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -250,6 +250,66 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       isSet(lowerBoundsOnIntercepts) || isSet(upperBoundsOnIntercepts)
   }
 
+  /**
+   * The prior multivariate mean (coefficients) for Maximum A Posteriori (MAP) optimization.
+   * Default is none.
+   *
+   * @group expertParam */
+  @Since("2.4.0")
+  val priorMean: DoubleArrayParam = new DoubleArrayParam(this, "priorMean",
+  "The prior mean used for Prior regularization.")
--- End diff --

These need continuation indents
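Spark's Scala style wraps arguments with an extra level of indentation (four spaces from the start of the declaration) rather than aligning the continuation with the `val`. A self-contained sketch of the intended layout, with a hypothetical `Param` class standing in for `DoubleArrayParam`:

```scala
object IndentStyle {
  // Hypothetical stand-in for Spark's DoubleArrayParam, only here so the
  // snippet compiles on its own.
  class Param(owner: AnyRef, name: String, val doc: String)

  // Continuation indent: the wrapped argument is indented one extra level,
  // so it is visually distinct from the lines that follow the declaration.
  val priorMean: Param = new Param(this, "priorMean",
    "The prior mean used for Prior regularization.")
}
```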


---




[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-11-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23146#discussion_r237888349
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularizationSuite.scala ---
@@ -16,9 +16,13 @@
  */
 package org.apache.spark.ml.optim.loss
 
+import org.scalactic.{Equality, TolerantNumerics}
+
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{BLAS, Vectors}
 
+
--- End diff --

Nit: remove these lines


---




[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-11-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23146#discussion_r237888585
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -250,6 +250,66 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       isSet(lowerBoundsOnIntercepts) || isSet(upperBoundsOnIntercepts)
   }
 
+  /**
+   * The prior multivariate mean (coefficients) for Maximum A Posteriori (MAP) optimization.
+   * Default is none.
+   *
+   * @group expertParam */
+  @Since("2.4.0")
+  @Since("2.4.0")
--- End diff --

This would have to be `3.0.0`


---




[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-11-26 Thread elfausto
GitHub user elfausto opened a pull request:

https://github.com/apache/spark/pull/23146

[SPARK-26173] [MLlib] Prior regularization for Logistic Regression

Implementation of [SPARK-26173](https://issues.apache.org/jira/browse/SPARK-26173).
Unit tests have been added to the modified classes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/affectv/spark SPARK-26173

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23146.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #23146


commit d8c35bb897cc0e2248810bfc7208e642deb5c55d
Author: Facundo Bellosi 
Date:   2018-11-26T15:31:26Z

SPARK-26173. Prior regularization for Logistic Regression.

commit 52918b7bd30f8fdccce450d7f41e34f8bedf32ad
Author: Facundo Bellosi 
Date:   2018-11-26T15:50:11Z

SPARK-26173. Prior regularization for Logistic Regression.




---
