Github user sun-rui commented on a diff in the pull request:
https://github.com/apache/spark/pull/12493#discussion_r60394359
--- Diff: R/pkg/R/generics.R ---
@@ -439,6 +439,10 @@ setGeneric("covar_samp", function(col1, col2) { standardGeneric("covar_samp") })
#' @export
setGeneric("covar_pop", function(col1, col2) { standardGeneric("covar_pop") })
+#' @rdname dapply
+#' @export
+setGeneric("dapply", function(x, func, schema = NULL) { standardGeneric("dapply") })
--- End diff --
Allowing schema to be NULL is influenced by the internals of the SparkR RDD API.
There are two purposes behind this (see the sketch after the list):
1. For multiple successive calls to dapply(), users only need to provide a schema
for the last call. This eases programming.
2. A kind of optimization. If a user collects the data back to the R side
immediately after a call to dapply(), without any further DataFrame operations,
keeping the result as serialized R data improves performance by avoiding the
unnecessary serde between JVM types and R types.
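
A rough sketch of the intended usage from the user side. This is illustrative only:
it assumes the dapply() signature proposed in this PR, the SparkR API of that time
(createDataFrame taking a sqlContext), and the existing structType()/structField()
helpers.

```r
## Hypothetical example data, not from the PR.
df <- createDataFrame(sqlContext, data.frame(mpg = c(21, 22.8), cyl = c(6, 4)))

## 1. Chained calls: intermediate dapply() calls can leave schema = NULL;
##    only the last call needs to describe the output columns.
step1 <- dapply(df, function(x) { x$kpl <- x$mpg * 0.425; x })
result <- dapply(step1,
                 function(x) { x[x$cyl == 4, ] },
                 schema = structType(structField("mpg", "double"),
                                     structField("cyl", "double"),
                                     structField("kpl", "double")))

## 2. Collect immediately: with schema = NULL the partitions stay as
##    serialized R data, so collect() can skip the JVM <-> R serde round trip.
local_df <- collect(dapply(df, function(x) { x$kpl <- x$mpg * 0.425; x }))
```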