Github user sun-rui commented on a diff in the pull request:
https://github.com/apache/spark/pull/12493#discussion_r60394359
--- Diff: R/pkg/R/generics.R ---
@@ -439,6 +439,10 @@ setGeneric("covar_samp", function(col1, col2) { standardGeneric("covar_samp") })
#' @export
setGeneric("covar_pop", function(col1, col2) { standardGeneric("covar_pop") })
+#' @rdname dapply
+#' @export
+setGeneric("dapply", function(x, func, schema = NULL) { standardGeneric("dapply") })
--- End diff --
Allowing schema to be NULL is influenced by the internals of the SparkR RDD API.
There are two purposes behind this (see the sketch after the list):
1. For multiple successive calls to dapply(), users only need to provide a schema
for the last call. This eases programming.
2. A kind of optimization. If a user collects the data back to the R side
immediately after a call to dapply(), without any further DataFrame operations,
keeping the result as serialized R data improves performance by avoiding the
unnecessary serde between JVM types and R types.
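
A rough sketch of the intended usage from the user side. This is illustrative only:
it assumes the dapply() signature proposed in this PR, the SparkR API of that time
(createDataFrame taking a sqlContext), and the existing structType()/structField()
helpers.

```r
## Hypothetical example data, not from the PR.
df <- createDataFrame(sqlContext, data.frame(mpg = c(21, 22.8), cyl = c(6, 4)))

## 1. Chained calls: intermediate dapply() calls can leave schema = NULL;
##    only the last call needs to describe the output columns.
step1 <- dapply(df, function(x) { x$kpl <- x$mpg * 0.425; x })
result <- dapply(step1,
                 function(x) { x[x$cyl == 4, ] },
                 schema = structType(structField("mpg", "double"),
                                     structField("cyl", "double"),
                                     structField("kpl", "double")))

## 2. Collect immediately: with schema = NULL the partitions stay as
##    serialized R data, so collect() can skip the JVM <-> R serde round trip.
local_df <- collect(dapply(df, function(x) { x$kpl <- x$mpg * 0.425; x }))
```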