[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...

davies Wed, 20 Apr 2016 09:53:22 -0700

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12493#discussion_r60446033
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -439,6 +439,10 @@ setGeneric("covar_samp", function(col1, col2) {standardGeneric("covar_samp") })
     #' @export
     setGeneric("covar_pop", function(col1, col2) {standardGeneric("covar_pop") })
     
    +#' @rdname dapply
    +#' @export
    +setGeneric("dapply", function(x, func, schema = NULL) { standardGeneric("dapply") })
    --- End diff --
    
    The two use cases could be handled as follows:
    1) Combine multiple directly chained functions into a single one. This is useful even if we support the no-schema mapPartition().
    2) The serializer in the JVM should be much faster than the one in R. To speed up collect for R, another format (columns in R) might be faster than the current one (rows in both R and Java). This could be a separate topic.
    
    Once we expose the raw format to users, they may treat it as a public interface and, for example, save the binary as a Parquet file. That would make it hard to maintain the format in a compatible way.
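
Point 1 in the comment above (fusing directly chained functions into one) can be illustrated with a minimal sketch in plain R. This is not how SparkR implements it; `fuse`, `add_total`, and `keep_large` are hypothetical names introduced only for illustration, and each function is assumed to take a data.frame for one partition and return a data.frame.

```r
# Hypothetical helper: fuse a chain of per-partition functions into one,
# so the data would cross the JVM/R boundary once instead of once per function.
fuse <- function(...) {
  funcs <- list(...)
  # Apply the functions left to right, feeding each result into the next.
  function(df) Reduce(function(acc, f) f(acc), funcs, df)
}

# Two illustrative (assumed) partition functions.
add_total <- function(df) {
  df$total <- df$a + df$b
  df
}
keep_large <- function(df) df[df$total > 10, , drop = FALSE]

fused <- fuse(add_total, keep_large)

# A local data.frame standing in for one partition.
part <- data.frame(a = c(1, 8, 5), b = c(2, 9, 7))
result <- fused(part)
```

Given the generic signature in the diff, the fused function would presumably be passed once, as in `dapply(x, fused, schema)`, rather than calling `dapply` once per step. That usage is an assumption based on the signature alone.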



