[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...
GitHub user asfgit closed the pull request at: https://github.com/apache/spark/pull/14856

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user junyangq commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76844386

--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
 expect_match(out[2], "Deviance Residuals:")
 expect_true(any(grepl("AIC: 59.22", out)))
+ # Test spark.glm works with regularization parameter
+ regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+ expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

Though it's very likely that the result would not change :)
GitHub user junyangq commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76844264

--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
 expect_match(out[2], "Deviance Residuals:")
 expect_true(any(grepl("AIC: 59.22", out)))
+ # Test spark.glm works with regularization parameter
+ regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+ expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

If I remember correctly, this should match the result of `glmnet`. Perhaps you can try the same example there, or take a look at https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala#L307
GitHub user keypointt commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76842594

--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
 expect_match(out[2], "Deviance Residuals:")
 expect_true(any(grepl("AIC: 59.22", out)))
+ # Test spark.glm works with regularization parameter
+ regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+ expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

Oh, I just checked the output of the model stats. Maybe there is a better way to test it?
GitHub user junyangq commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76841246

--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -99,6 +99,10 @@ test_that("spark.glm summary", {
 expect_match(out[2], "Deviance Residuals:")
 expect_true(any(grepl("AIC: 59.22", out)))
+ # Test spark.glm works with regularization parameter
+ regStats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, regParam = 0.3))
+ expect_equal(regStats$aic, 136.7, tolerance = 1e-3)
--- End diff --

How was this number computed?
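For reference on the question above: for a Gaussian-family fit, R's `glm()` reports AIC as n(log(2π·RSS/n) + 1) + 2k, where RSS is the residual deviance and k counts the regression coefficients plus one for the dispersion parameter. The base-R sketch below checks that formula against `AIC()` on an unregularized model, assuming `training` in the test is built from R's built-in `iris` data (the SparkR column names `Sepal_Width`, `Sepal_Length` and `Species` suggest so). It does not reproduce the 136.7 value, and whether Spark's summary applies the same formula to a regularized fit is an assumption, not something confirmed in this thread.

```r
# Gaussian-family AIC computed by hand and compared with R's own value.
# The iris columns use dots; the SparkR test refers to them with underscores.
fit <- glm(Sepal.Width ~ Sepal.Length + Species, data = iris, family = gaussian)

n   <- nrow(iris)
rss <- deviance(fit)          # residual deviance = residual sum of squares here
k   <- length(coef(fit)) + 1  # regression coefficients + dispersion parameter

aic_by_hand <- n * (log(2 * pi * rss / n) + 1) + 2 * k

print(c(by_hand = aic_by_hand, glm = AIC(fit)))
```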
GitHub user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76669742

--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
 setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
- function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+ function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+ weightCol = NULL) {
--- End diff --

+1, we should try to avoid breaking existing callers.
GitHub user junyangq commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76553172

--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
 setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
- function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+ function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+ weightCol = NULL) {
--- End diff --

Say an R user calls the function as `spark.glm(df, label ~ feature, gaussian, 1e-6, 25)`. With the new signature, the fifth positional argument (25) is bound to `regParam` instead of `maxIter`, which breaks their code.
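A base-R sketch of the positional-matching problem raised here, and of the review suggestion to append `regParam` at the end of the argument list. The functions `before`, `inserted` and `appended` are hypothetical stand-ins that only copy the argument lists from the diff and report how R bound each argument; they do not call SparkR, and `NULL` stands in for the SparkDataFrame.

```r
# Hypothetical stand-ins that copy only the argument lists of spark.glm;
# each one just reports how R bound the arguments.
before <- function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25,
                   weightCol = NULL) {
  list(tol = tol, regParam = NA, maxIter = maxIter)
}

# regParam inserted before maxIter, as in the diff under review.
inserted <- function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0,
                     maxIter = 25, weightCol = NULL) {
  list(tol = tol, regParam = regParam, maxIter = maxIter)
}

# regParam appended after the existing arguments, as suggested in review.
appended <- function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25,
                     weightCol = NULL, regParam = 0.0) {
  list(tol = tol, regParam = regParam, maxIter = maxIter)
}

# A positional call in the style of the review comment, using maxIter = 50
# so that the effect is visible.
r_before   <- before(NULL, label ~ feature, gaussian, 1e-6, 50)
r_inserted <- inserted(NULL, label ~ feature, gaussian, 1e-6, 50)
r_appended <- appended(NULL, label ~ feature, gaussian, 1e-6, 50)

str(r_inserted)  # 50 is bound to regParam; maxIter keeps its default of 25

# Named arguments are unaffected by where regParam sits in the list.
r_named <- inserted(NULL, label ~ feature, maxIter = 50)
```

Callers who pass optional arguments by name are unaffected either way; only positional callers depend on the order, which is why appending new parameters is the safer choice for an existing public function.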
GitHub user keypointt commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76552881

--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
 setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
- function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+ function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+ weightCol = NULL) {
--- End diff --

Check the `fit()` method of the wrapper: as long as the parameter order matches, it's OK. I've already tested it in the R terminal.
GitHub user junyangq commented on a diff in the pull request:
https://github.com/apache/spark/pull/14856#discussion_r76552745

--- Diff: R/pkg/R/mllib.R ---
@@ -171,7 +172,8 @@ predict_internal <- function(object, newData) {
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
 setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
- function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) {
+ function(data, formula, family = gaussian, tol = 1e-6, regParam = 0.0, maxIter = 25,
+ weightCol = NULL) {
--- End diff --

Perhaps we can add it to the end of the argument list so that it doesn't break existing calls to the function?
GitHub user keypointt opened a pull request:

https://github.com/apache/spark/pull/14856

[SPARK-17241][SparkR][MLlib] SparkR spark.glm should have configurable regularization parameter

https://issues.apache.org/jira/browse/SPARK-17241

## What changes were proposed in this pull request?

Spark has a configurable L2 regularization parameter for generalized linear regression. It is important to expose it in SparkR so that users can run ridge regression.

## How was this patch tested?

Tested manually on a local laptop.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/keypointt/spark SPARK-17241

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14856.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14856

commit 6417049e9185434bc23c651217d73a88abe4f606
Author: Xin Ren
Date: 2016-08-28T23:01:37Z

    [SPARK-17241] add configurable regularization parameter
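The PR description mentions ridge regression. As a rough illustration of what an L2 penalty does (not Spark's implementation), the base-R sketch below solves the penalized normal equations (X'X + P) b = X'y, where P places `lambda` on every coefficient except the intercept. `ridge_fit` is a hypothetical helper. Spark's GeneralizedLinearRegression scales the loss and the penalty differently and may standardize features, so its coefficients for a given `regParam` will not match this sketch numerically.

```r
# Ridge regression (L2-penalized least squares) via the penalized normal
# equations (X'X + P) b = X'y, with P = diag(0, lambda, ..., lambda).
# The intercept (first column) is deliberately left unpenalized.
ridge_fit <- function(X, y, lambda) {
  X1 <- cbind(Intercept = 1, X)
  P  <- diag(c(0, rep(lambda, ncol(X))))
  drop(solve(crossprod(X1) + P, crossprod(X1, y)))
}

X <- as.matrix(iris[, "Sepal.Length", drop = FALSE])
y <- iris$Sepal.Width

b_ols   <- ridge_fit(X, y, lambda = 0)   # no penalty: ordinary least squares
b_ridge <- ridge_fit(X, y, lambda = 10)  # penalty shrinks the slope toward zero

print(rbind(ols = b_ols, ridge = b_ridge))
```

With `lambda = 0` the helper reduces to ordinary least squares, which makes a convenient sanity check against `lm()`; larger values shrink the slope toward zero while the unpenalized intercept absorbs the mean of the response.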