Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51677772
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +583,86 @@ class LinearRegressionSuite
}
}
+ test("linear regression model with constant label") {
+ /*
+ R code:
+ for (formula in c(b.const ~ . -1, b.const ~ .)) {
+ model <- lm(formula, data=df.const.label, weights=w)
+ print(as.vector(coef(model)))
+ }
+ [1] -9.221298 3.394343
+ [1] 17 0 0
+ */
+ val expected = Seq(
+ Vectors.dense(0.0, -9.221298, 3.394343),
+ Vectors.dense(17.0, 0.0, 0.0))
+
+ Seq("auto", "l-bfgs", "normal").foreach { solver =>
+ var idx = 0
+ for (fitIntercept <- Seq(false, true)) {
+ val model1 = new LinearRegression()
+ .setFitIntercept(fitIntercept)
+ .setWeightCol("weight")
+ .setSolver(solver)
+ .fit(datasetWithWeightConstantLabel)
+ val actual1 = Vectors.dense(model1.intercept,
model1.coefficients(0),
+ model1.coefficients(1))
+ assert(actual1 ~== expected(idx) absTol 1e-4)
+
+ val model2 = new LinearRegression()
+ .setFitIntercept(fitIntercept)
+ .setWeightCol("weight")
+ .setSolver(solver)
+ .fit(datasetWithWeightZeroLabel)
+ val actual2 = Vectors.dense(model2.intercept,
model2.coefficients(0),
+ model2.coefficients(1))
+ assert(actual2 ~== Vectors.dense(0.0, 0.0, 0.0) absTol 1e-4)
+ idx += 1
+ }
+ }
+ }
+
+ test("regularized linear regression through origin with constant label")
{
+ // The problem is ill-defined if fitIntercept=false and regParam is
non-zero.
+ // An exception is thrown in this case.
+ Seq("auto", "l-bfgs", "normal").foreach { solver =>
+ for (standardization <- Seq(false, true)) {
+ val model = new LinearRegression().setFitIntercept(false)
+
.setRegParam(0.1).setStandardization(standardization).setSolver(solver)
+ intercept[IllegalArgumentException] {
+ model.fit(datasetWithWeightConstantLabel)
+ }
+ }
+ }
+ }
+
+ test("linear regression with l-bfgs when training is not needed") {
+ // When label is constant, l-bfgs solver returns results without
training.
+ // There are two possibilities: If the label is non-zero but constant,
+ // and fitIntercept is true, then the model returns yMean as intercept
without training.
+ // If label is all zeros, then all coefficients are zero regardless of
fitIntercept, so
+ // no training is needed.
+ for (fitIntercept <- Seq(false, true)) {
+ for (standardization <- Seq(false, true)) {
+ val model1 = new LinearRegression()
+ .setFitIntercept(fitIntercept)
+ .setStandardization(standardization)
+ .setWeightCol("weight")
+ .setSolver("l-bfgs")
+ .fit(datasetWithWeightConstantLabel)
+ if (fitIntercept) {
+ assert(model1.summary.objectiveHistory(0) ~== 0.0 absTol 1e-4)
--- End diff --
I think this should be `model1.summary.objectiveHistory.length == 0`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]