Github user junyangq commented on a diff in the pull request:
https://github.com/apache/spark/pull/14384#discussion_r75408767
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
          function(object, newData) {
            return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
          })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via
+#' alternating least squares. Users can call \code{summary} to obtain fitted
+#' latent factors, \code{predict} to make predictions on new data, and
+#' \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param ratingCol column name for ratings.
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers.
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers.
+#' @param rank rank of the matrix factorization (> 0).
+#' @param reg regularization parameter (>= 0).
+#' @param maxIter maximum number of iterations (>= 0).
+#' @param nonnegative logical value indicating whether to apply nonnegativity constraints.
+#' @param implicitPrefs logical value indicating whether to use implicit preference.
+#' @param alpha alpha parameter in the implicit preference formulation (>= 0).
+#' @param seed integer seed for random number generation.
+#' @param numUserBlocks number of user blocks used to parallelize computation (> 0).
+#' @param numItemBlocks number of item blocks used to parallelize computation (> 0).
+#' @param checkpointInterval number of iterations between checkpoints (>= 1), or -1 to
+#'        disable checkpointing.
+#'
+#' @return \code{spark.als} returns a fitted ALS model.
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame-method
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE,
+                   implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10,
--- End diff ---
In fact it doesn't matter, I think, since R's default numeric type is double; but it's
clearer to differentiate it from the other integer parameters. Done, thanks!
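
The point about R's default numeric type can be checked directly in an R session
(an illustrative sketch, not part of the diff):

```r
# An unsuffixed numeric literal in R is a double; an integer
# literal requires the L suffix, so writing alpha = 1 already
# yields a double without needing to spell it as 1.0.
typeof(1)    # "double"
typeof(1L)   # "integer"
typeof(1.0)  # "double" -- same type as the bare 1
```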