[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...

shivaram Thu, 21 Apr 2016 12:49:48 -0700

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12493#discussion_r60644001
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -439,6 +439,10 @@ setGeneric("covar_samp", function(col1, col2) {standardGeneric("covar_samp") })
     #' @export
     setGeneric("covar_pop", function(col1, col2) {standardGeneric("covar_pop") })
     
    +#' @rdname dapply
    +#' @export
    +setGeneric("dapply", function(x, func, schema = NULL) { standardGeneric("dapply") })
    --- End diff --
    
    My vote would be to add complexity gradually:
    1. `dapply` with a schema - this PR
    2. `dapplyCollect`, which needs no schema - a follow-up PR
    3. Auto-schema detection in the worker.R file, to figure out the schema
       of the data.frame returned by the UDF. If auto-detection fails, we
       fall back to serialized R data for specific columns, etc. But this
       will be complex and can wait.
    
    @rxin @davies Does this sound good?



