[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14856


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-30 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76844386
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
   expect_match(out[2], "Deviance Residuals:")
   expect_true(any(grepl("AIC: 59.22", out)))
 
+  # Test spark.glm works with regularization parameter
+  regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+  expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

though it's very likely that the result would not change :)





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-30 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76844264
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
   expect_match(out[2], "Deviance Residuals:")
   expect_true(any(grepl("AIC: 59.22", out)))
 
+  # Test spark.glm works with regularization parameter
+  regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+  expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

I recall that it should match the result of `glmnet`. Perhaps you can try the 
same example there, or take a look at 
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala#L307
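For reference, a hedged sketch of what a `glmnet` cross-check might look like. This is an assumption, not the suite's actual code: `glmnet` standardizes predictors by default and does not report AIC, so the coefficients, not the AIC, are the natural target of comparison, and its `lambda` scaling may not match Spark's `regParam` exactly.

```r
# Sketch only: fit the same ridge model with glmnet for comparison.
# alpha = 0 selects a pure L2 (ridge) penalty; standardize = FALSE keeps
# the penalty on the raw predictor scale, closer to Spark's semantics.
library(glmnet)

x <- model.matrix(Sepal.Width ~ Sepal.Length + Species, iris)[, -1]
y <- iris$Sepal.Width
fit <- glmnet(x, y, family = "gaussian", alpha = 0,
              lambda = 0.3, standardize = FALSE)
coef(fit)  # compare against summary(spark.glm(...))$coefficients
```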





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-30 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76842594
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
   expect_match(out[2], "Deviance Residuals:")
   expect_true(any(grepl("AIC: 59.22", out)))
 
+  # Test spark.glm works with regularization parameter
+  regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+  expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

Oh, I just checked the output of the model stats.

Maybe there is a better way to test it?





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-30 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76841246
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
   expect_match(out[2], "Deviance Residuals:")
   expect_true(any(grepl("AIC: 59.22", out)))
 
+  # Test spark.glm works with regularization parameter
+  regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+  expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

How was this number computed?





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-29 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76669742
  
--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
-  function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+  function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+   weightCol = NULL) {
--- End diff --

+1 - we should try to avoid breaking existing callers





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-28 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76553172
  
--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
-  function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+  function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+   weightCol = NULL) {
--- End diff --

If, say, an R user calls the function positionally as `spark.glm(df, label ~ feature, 
gaussian, 1e-6, 25)`, this change will break their code.
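The breakage is plain R positional argument matching; a minimal sketch with hypothetical stand-in functions (not the SparkR implementation) shows how inserting `regParam` mid-list silently re-binds an existing positional argument:

```r
# Hypothetical stand-ins for the old and new signatures, for illustration.
old_glm <- function(data, formula, family = "gaussian", tol = 1e-6,
                    maxIter = 25, weightCol = NULL) {
  c(maxIter = maxIter)
}
new_glm <- function(data, formula, family = "gaussian", tol = 1e-6,
                    regParam = 0.0, maxIter = 25, weightCol = NULL) {
  c(regParam = regParam, maxIter = maxIter)
}

old_glm(NULL, y ~ x, "gaussian", 1e-6, 25)  # maxIter = 25, as intended
new_glm(NULL, y ~ x, "gaussian", 1e-6, 25)  # regParam = 25; maxIter falls back to 25
```

The call still runs, which makes the bug worse: the fifth argument is now silently treated as a regularization strength instead of an iteration cap.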





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-28 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76552881
  
--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
-  function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+  function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+   weightCol = NULL) {
--- End diff --

Check the `fit()` method of the wrapper; as long as the parameter order 
matches, it's OK.

I've already tested it in an R terminal.





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-28 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14856#discussion_r76552745
  
--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
-  function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+  function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+   weightCol = NULL) {
--- End diff --

Perhaps we can add that to the end of the argument list so that it doesn't 
break the existing calls to the function?





[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-28 Thread keypointt
GitHub user keypointt opened a pull request:

https://github.com/apache/spark/pull/14856

[SPARK-17241][SparkR][MLlib] SparkR spark.glm should have configurable 
regularization parameter

https://issues.apache.org/jira/browse/SPARK-17241

## What changes were proposed in this pull request?

Spark has a configurable L2 regularization parameter for generalized linear 
regression. It is important to expose it in SparkR so that users can run 
ridge regression.
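A minimal usage sketch (assumes a running SparkR session; `createDataFrame(iris)` converts the dotted column names to underscores, with a warning):

```r
library(SparkR)
sparkR.session()

training <- suppressWarnings(createDataFrame(iris))

# regParam = 0.3 adds an L2 penalty, i.e. ridge regression.
model <- spark.glm(training, Sepal_Width ~ Sepal_Length + Species,
                   family = "gaussian", regParam = 0.3)
summary(model)
```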

## How was this patch tested?

Tested manually on a local laptop.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/keypointt/spark SPARK-17241

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14856


commit 6417049e9185434bc23c651217d73a88abe4f606
Author: Xin Ren 
Date:   2016-08-28T23:01:37Z

[SPARK-17241] add configurable regularization parameter



