huaxingao commented on a change in pull request #27593: 
[SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
URL: https://github.com/apache/spark/pull/27593#discussion_r396696280
 
 

 ##########
 File path: R/pkg/R/mllib_regression.R
 ##########
 @@ -540,3 +546,145 @@ setMethod("write.ml", signature(object = 
"AFTSurvivalRegressionModel", path = "c
           function(object, path, overwrite = FALSE) {
             write_internal(object, path, overwrite)
           })
+
+#' Linear Regression Model
+#'
+#' \code{spark.lm} fits a linear regression model against a SparkDataFrame.
+#' Users can call \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load 
fitted models.
+#'
+#' @param data a \code{SparkDataFrame} of observations and labels for model 
fitting.
+#' @param formula a symbolic description of the model to be fitted. Currently 
only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and 
'-'.
+#' @param maxIter maximum iteration number.
+#' @param regParam the regularization parameter.
+#' @param elasticNetParam the ElasticNet mixing parameter, in range [0, 1].
+#'        For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an 
L1 penalty.
+#' @param tol convergence tolerance of iterations.
+#' @param standardization whether to standardize the training features before 
fitting the model.
+#' @param weightCol weight column name.
+#' @param aggregationDepth suggested depth for treeAggregate (>= 2).
+#' @param loss the loss function to be optimized. Supported options: 
"squaredError" and "huber"
+#' @param epsilon the shape parameter to control the amount of robustness.
+#' @param solver The solver algorithm for optimization.
+#'        Supported options: "l-bfgs", "normal" and "auto".
+#' @param stringIndexerOrderType how to order categories of a string feature 
column. This is used to
+#'                               decide the base level of a string feature as 
the last category
+#'                               after ordering is dropped when encoding 
strings. Supported options
+#'                               are "frequencyDesc", "frequencyAsc", 
"alphabetDesc", and
+#'                               "alphabetAsc". The default value is 
"frequencyDesc". When the
+#'                               ordering is set to "alphabetDesc", this drops 
the same category
+#'                               as R when encoding strings.
+#' @param ... additional arguments passed to the method.
+#' @return \code{spark.lm} returns a fitted Linear Regression Model.
+#'
+#' @rdname spark.lm
+#' @aliases spark.lm,SparkDataFrame,formula-method
+#' @name spark.lm
+#' @seealso \link{read.ml}
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = 
"libsvm")
+#'
+#' # fit Linear Regression Model
+#' model <- spark.lm(
+#'            df, label ~ features,
+#'            regParam = 0.01, maxIter = 10
+#'          )
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.lm since 3.1.0
+setMethod("spark.lm", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula,
+                   maxIter = 100L, regParam = 0.0, elasticNetParam = 0.0,
+                   tol = 1e-6, standardization = TRUE,
+                   solver = c("auto", "l-bfgs", "normal"),
+                   weightCol = NULL, aggregationDepth = 2L,
+                   loss = c("squaredError", "huber"), epsilon = 1.35,
+                   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+                                              "alphabetDesc", "alphabetAsc")) {
+
+            formula <- paste(deparse(formula), collapse = "")
+
+            solver <- match.arg(solver)
+            loss <- match.arg(loss)
+            stringIndexerOrderType <- match.arg(stringIndexerOrderType)
+
+            jobj <- 
callJStatic("org.apache.spark.ml.r.LinearRegressionWrapper",
+                                "fit",
+                                data@sdf,
+                                formula,
+                                as.integer(maxIter),
+                                as.numeric(regParam),
+                                as.numeric(elasticNetParam),
+                                as.numeric(tol),
+                                as.logical(standardization),
+                                solver,
+                                weightCol,
 
 Review comment:
   Do the following for ```weightCol```?
   ```
               if (!is.null(weightCol) && weightCol == "") {
                 weightCol <- NULL
               } else if (!is.null(weightCol)) {
                 weightCol <- as.character(weightCol)
               }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to