[GitHub] felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#discussion_r240491948
 
 

 ##
 File path: R/pkg/R/mllib_clustering.R
 ##
 @@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
 
 Review comment:
  remove the empty line - an empty line is significant in roxygen2
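
For illustration, the quoted roxygen2 block with the trailing empty `#'` line dropped might look like this (a sketch; only the blank comment line is removed):

```r
#' PowerIterationClustering
#'
#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
#' return a cluster assignment for each input vertex.
#' @param data a SparkDataFrame.
```

In roxygen2, blank `#'` lines separate the title, description, and details sections, so a stray one can push the text that follows into a different section of the generated Rd file.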


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#discussion_r240492041
 
 

 ##
 File path: R/pkg/R/mllib_clustering.R
 ##
 @@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data a SparkDataFrame.
+#' @param k the number of clusters to create.
+#' @param initMode the initialization algorithm.
 
 Review comment:
   add `One of "random", "degree"`?
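
Applying the suggestion, the tag could read as follows (a sketch of the suggested wording, not the PR's final text):

```r
#' @param initMode the initialization algorithm. One of "random", "degree".
```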





[GitHub] felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#discussion_r240492482
 
 

 ##
 File path: R/pkg/R/mllib_clustering.R
 ##
 @@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data a SparkDataFrame.
+#' @param k the number of clusters to create.
+#' @param initMode the initialization algorithm.
+#' @param maxIter the maximum number of iterations.
+#' @param sourceCol the name of the input column for source vertex IDs.
+#' @param destinationCol the name of the input column for destination vertex IDs
+#' @param weightCol weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
 
 Review comment:
  mm, this won't format correctly - roxygen strips all the whitespace.
  Also, Long and Int are not proper types in R.
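
One possible rewrite that addresses both points is to let roxygen2 format the schema as a list and describe the columns with R-facing types (the exact wording and types here are an assumption, not what the PR settled on):

```r
#' @return A SparkDataFrame containing a column of vertex ids and the assigned cluster:
#' \itemize{
#'   \item{\code{id}: the vertex id (integer)}
#'   \item{\code{cluster}: the assigned cluster (integer)}
#' }
```

`\itemize` is rendered as a bulleted list in the generated Rd help page, so the schema survives whitespace stripping.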





[GitHub] felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#discussion_r240492887
 
 

 ##
 File path: R/pkg/R/mllib_clustering.R
 ##
 @@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data a SparkDataFrame.
+#' @param k the number of clusters to create.
+#' @param initMode the initialization algorithm.
+#' @param maxIter the maximum number of iterations.
+#' @param sourceCol the name of the input column for source vertex IDs.
+#' @param destinationCol the name of the input column for destination vertex IDs
+#' @param weightCol weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'                            list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'                            list(4L, 0L, 0.1)),
+#'                       schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
 
 Review comment:
  add spaces around `=` per the style convention
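
With the style fix applied, the example call from the quoted hunk would read:

```r
clusters <- spark.assignClusters(df, initMode = "degree", weightCol = "weight")
```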





[GitHub] felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#discussion_r240492789
 
 

 ##
 File path: R/pkg/R/mllib_clustering.R
 ##
 @@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data a SparkDataFrame.
+#' @param k the number of clusters to create.
+#' @param initMode the initialization algorithm.
+#' @param maxIter the maximum number of iterations.
+#' @param sourceCol the name of the input column for source vertex IDs.
+#' @param destinationCol the name of the input column for destination vertex IDs
+#' @param weightCol weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
 
 Review comment:
  wait, this alias doesn't make sense. Could you test with `?assignClusters`
  in an R shell whether this works?

  This should be `@aliases spark.assignClusters,SparkDataFrame-method`
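
That is, the corrected tag as suggested would be:

```r
#' @aliases spark.assignClusters,SparkDataFrame-method
```

The alias names an S4 method as `generic,signature-method`; since the generic is `spark.assignClusters` dispatched on a single `SparkDataFrame` argument, listing `PowerIterationClustering` in the signature would not match any defined method.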





[GitHub] felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
felixcheung commented on a change in pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#discussion_r240493499
 
 

 ##
 File path: R/pkg/R/mllib_clustering.R
 ##
 @@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data a SparkDataFrame.
+#' @param k the number of clusters to create.
+#' @param initMode the initialization algorithm.
+#' @param maxIter the maximum number of iterations.
+#' @param sourceCol the name of the input column for source vertex IDs.
+#' @param destinationCol the name of the input column for destination vertex IDs
+#' @param weightCol weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'                            list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'                            list(4L, 0L, 0.1)),
+#'                       schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+          signature(data = "SparkDataFrame"),
+          function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
+                   sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
+            if (!is.numeric(k) || k < 1) {
+              stop("k should be a number with value >= 1.")
+            }
+            if (!is.integer(maxIter) || maxIter <= 0) {
 Review comment:
  if maxIter should be an integer, should we also check that k is an integer?
  It's fixed when it is passed, so this is just a minor point of consistency
  in the value check.
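
A sketch of what a consistent check might look like (hypothetical; the PR's final validation may differ):

```r
# Hypothetical helper sketch: validate k and maxIter the same way,
# accepting whole-number numerics since callers may pass 2 instead of 2L.
checkClusterParams <- function(k, maxIter) {
  if (!is.numeric(k) || k != floor(k) || k < 1) {
    stop("k should be an integer with value >= 1.")
  }
  if (!is.numeric(maxIter) || maxIter != floor(maxIter) || maxIter <= 0) {
    stop("maxIter should be an integer with value > 0.")
  }
}
```

Using the same shape of predicate for both parameters avoids the asymmetry the comment points out, where `k` accepts any numeric but `maxIter` insists on an integer vector.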

