GitHub user NarineK commented on a diff in the pull request:
https://github.com/apache/spark/pull/12989#discussion_r62576985
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1197,21 +1215,57 @@ setMethod("summarize",
setMethod("dapply",
signature(x = "SparkDataFrame", func = "function", schema = "structType"),
function(x, func, schema) {
- packageNamesArr <- serialize(.sparkREnv[[".packages"]],
- connection = NULL)
-
- broadcastArr <- lapply(ls(.broadcastNames),
- function(name) { get(name, .broadcastNames) })
-
- sdf <- callJStatic(
- "org.apache.spark.sql.api.r.SQLUtils",
- "dapply",
- x@sdf,
- serialize(cleanClosure(func), connection = NULL),
- packageNamesArr,
- broadcastArr,
- schema$jobj)
- dataFrame(sdf)
+ dapplyInternal(x, func, schema)
+ })
+
+#' dapplyCollect
+#'
+#' Apply a function to each partition of a SparkDataFrame and collect the result back
+#' to R as a data.frame.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each partition of the SparkDataFrame.
+#' func should have only one parameter, to which a data.frame corresponds
+#' to each partition will be passed.
+#' The output of func should be a data.frame.
+#' @family SparkDataFrame functions
+#' @rdname dapply
+#' @name dapplyCollect
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame (sqlContext, iris)
+#' ldf <- dapplyCollect(df, function(x) { x })
+#'
+#' # filter and add a column
+#' df <- createDataFrame (
+#' sqlContext,
+#' list(list(1L, 1, "1"), list(2L, 2, "2"), list(3L, 3, "3")),
+#' c("a", "b", "c"))
+#' ldf <- dapplyCollect(
+#' df,
+#' function(x) {
+#' y <- x[x[1] > 1, ]
+#' y <- cbind(y, y[1] + 1L)
+#' })
+#' # the result
+#' # a b c d
+#' # 2 2 2 3
+#' # 3 3 3 4
--- End diff --
These seem to be the same examples from dapply ;)
Maybe we can have different ones here :)