[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

mengxr Sat, 30 Apr 2016 06:27:33 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12813#discussion_r61669606
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -271,22 +274,25 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
     #' Fit a k-means model, similarly to R's kmeans().
     #'
     #' @param data SparkDataFrame for training
    -#' @param k Number of centers
    -#' @param maxIter Maximum iteration number
    -#' @param initializationMode Algorithm choosen to fit the model
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#'                Note that the response variable of formula is empty in spark.kmeans.
    +#' @param centers Number of centers
    +#' @param iter.max Maximum iteration number
    +#' @param algorithm The initialization algorithm choosen to fit the model
     #' @return A fitted k-means model
     #' @rdname spark.kmeans
     #' @export
     #' @examples
     #' \dontrun{
    -#' model <- spark.kmeans(data, k = 2, initializationMode="random")
    +#' model <- spark.kmeans(data, ~ ., centers = 2, algorithm="random")
     #' }
    -setMethod("spark.kmeans", signature(data = "SparkDataFrame"),
    -          function(data, k, maxIter = 10, initializationMode = c("random", "k-means||")) {
    -            columnNames <- as.array(colnames(data))
    -            initializationMode <- match.arg(initializationMode)
    -            jobj <- callJStatic("org.apache.spark.ml.r.KMeansWrapper", "fit", data@sdf,
    -                                k, maxIter, initializationMode, columnNames)
    +setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, centers, iter.max = 10, algorithm = c("random", "k-means||")) {
    --- End diff --
    
    Since `spark.kmeans` already indicates that this method is different from
    R's `kmeans`, we made the param names consistent with MLlib params. In
    particular, renaming `algorithm` to `initMode` makes more sense. We should
    discuss whether to use `maxIter`/`initMode` or `max.iter`/`init.mode` as the
    param names.
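
To make the two naming options concrete, here is a minimal sketch in plain R. It only mimics the argument handling and does not call Spark; the helper names `kmeans_args_camel` and `kmeans_args_dotted` are hypothetical and are not part of SparkR or of this pull request.

```r
# Hypothetical stand-ins for the argument handling only; neither calls Spark.

# Option 1: camelCase names, matching the MLlib params.
kmeans_args_camel <- function(k, maxIter = 10,
                              initMode = c("random", "k-means||")) {
  initMode <- match.arg(initMode)  # validate against the allowed choices
  list(k = k, maxIter = maxIter, initMode = initMode)
}

# Option 2: dotted names, closer to common R style.
kmeans_args_dotted <- function(k, max.iter = 10,
                               init.mode = c("random", "k-means||")) {
  init.mode <- match.arg(init.mode)
  list(k = k, maxIter = max.iter, initMode = init.mode)
}

# Both spellings carry the same information; only the caller-facing names differ.
a <- kmeans_args_camel(k = 2, initMode = "random")
b <- kmeans_args_dotted(k = 2, init.mode = "random")
stopifnot(identical(a, b))
```

Either spelling maps onto the same underlying values, so the choice is purely about which convention R users should see.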



