GitHub user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/23072#discussion_r234432019
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -610,3 +616,57 @@ setMethod("write.ml", signature(object = "LDAModel",
path = "character"),
function(object, path, overwrite = FALSE) {
write_internal(object, path, overwrite)
})
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+# Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data A SparkDataFrame.
+#' @param k The number of clusters to create.
+#' @param initMode Param for the initialization algorithm.
+#' @param maxIter Param for maximum number of iterations.
+#' @param srcCol Param for the name of the input column for source vertex IDs.
+#' @param dstCol Name of the input column for destination vertex IDs.
+#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
+#' we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#' list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#' list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+ signature(data = "SparkDataFrame"),
+ function(data, k = 2L, initMode = "random", maxIter = 20L, srcCol = "src",
--- End diff --
Set the valid values for `initMode` and check the argument against them, e.g. as in
https://github.com/apache/spark/pull/23072/files#diff-d9f92e07db6424e2527a7f9d7caa9013R355
and with `match.arg(initMode)`.
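
A minimal sketch of the `match.arg` pattern, for illustration only. The function name `assignClustersSketch` is hypothetical, and the choice vector `c("random", "degree")` is an assumption based on the default and the example value that appear in the diff; it is not the code that ended up in the PR.

```r
# Hypothetical stand-in for the method body; it only demonstrates validation.
# Listing the allowed values as the default lets match.arg() enforce them.
assignClustersSketch <- function(k = 2L, initMode = c("random", "degree"),
                                 maxIter = 20L) {
  # match.arg() selects the first choice when initMode is not supplied and
  # raises an error when the supplied value matches none of the choices.
  initMode <- match.arg(initMode)
  list(k = k, initMode = initMode, maxIter = maxIter)
}

defaultMode <- assignClustersSketch()$initMode
degreeMode <- assignClustersSketch(initMode = "degree")$initMode

# An unsupported value fails fast on the R side instead of reaching the JVM.
rejected <- tryCatch(assignClustersSketch(initMode = "bogus"),
                     error = function(e) "rejected")
```

With this shape the default stays `"random"` because it is the first element of the choice vector, so callers that omit `initMode` see no change in behavior.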
---