[GitHub] HyukjinKwon commented on a change in pull request #23746: [SPARK-26761][SQL][R] Vectorized R gapply() implementation

GitBox Tue, 12 Feb 2019 00:41:23 -0800

HyukjinKwon commented on a change in pull request #23746: [SPARK-26761][SQL][R] Vectorized R gapply() implementation
URL: https://github.com/apache/spark/pull/23746#discussion_r255849370


 ##########
 File path: R/pkg/R/group.R
 ##########
 @@ -229,6 +229,24 @@ gapplyInternal <- function(x, func, schema) {
   if (is.character(schema)) {
     schema <- structType(schema)
   }
+  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == "true"
+  if (arrowEnabled) {
+    if (is.null(schema)) {
+      stop(paste0("Arrow optimization does not support gapplyCollect yet. Please use ",
+                  "'collect' and 'gapply' APIs instead."))
 
 Review comment:
   @felixcheung, while double-checking each case one by one, I realised that `gapplyCollect()` needs some more fixes. For now, I have disabled it when Arrow is enabled:
   
   ```r
   > df <- createDataFrame(mtcars)
   > gapplyCollect(df,
   +               "gear",
   +               function(key, group) {
   +                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
   +               })
   Error in gapplyInternal(x, func, NULL) :
     Arrow optimization does not support gapplyCollect yet. Please use 'collect' and 'gapply' APIs instead.
   ```
   
   I need a few lines of changes (I guess between 10 and 20 lines) to support `gapplyCollect()`, but let me do this separately with a separate set of tests.
   
   I filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-26858. I will work on this as soon as this PR gets merged.
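   
   As a rough sketch of the workaround the error message points to (calling `gapply` with an explicit schema and then `collect`), something like the following should be equivalent to the `gapplyCollect()` call above. The schema is an assumption on my side (`gear` as double because `mtcars$gear` is numeric, `disp` as boolean because the function returns a comparison result), and the session setup is only illustrative; the Arrow setting from the diff is deliberately not set here.
   
   ```r
   library(SparkR)
   
   # Illustrative local session; the Arrow config from the diff above is
   # not enabled in this sketch.
   sparkR.session(master = "local[1]")
   
   df <- createDataFrame(mtcars)
   
   # Assumed output schema: one double column and one boolean column,
   # matching the data.frame returned by the function below.
   schema <- structType(structField("gear", "double"),
                        structField("disp", "boolean"))
   
   # gapply() with an explicit schema returns a SparkDataFrame;
   # collect() then brings it back as a local R data.frame.
   result <- collect(gapply(df,
                            "gear",
                            function(key, group) {
                              data.frame(gear = key[[1]],
                                         disp = mean(group$disp) > group$disp)
                            },
                            schema))
   ```
   
   Unlike `gapplyCollect()`, this path needs the output schema up front, which is why `schema` is non-NULL and the `is.null(schema)` check in the diff is not triggered.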
