zero323 commented on a change in pull request #29813:
URL: https://github.com/apache/spark/pull/29813#discussion_r491707650



##########
File path: R/pkg/R/DataFrame.R
##########
@@ -2863,11 +2863,19 @@ setMethod("unionAll",
 #' \code{UNION ALL} and \code{UNION DISTINCT} in SQL as column positions are not taken
 #' into account. Input SparkDataFrames can have different data types in the schema.
 #'
+#' When the parameter `allowMissingColumns` is `TRUE`, this function allows
+#' different set of column names between two `SparkDataFrames`.
+#' Missing columns at each side, will be filled with null values.
+#' The missing columns at left `SparkDataFrame` will be added at the end in the schema
+#' of the union result.
+#'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
 #' This function resolves columns by name (not by position).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame
+#' @param allowMissingColumns logical
+#' @param ... further arguments to be passed to or from other methods.

Review comment:
   That's correct, but I am not sure if there is a better way of handling that.
   
   Right now we have generic as follows:
   
   ```R
   setGeneric("unionByName", function(x, y, ...) { standardGeneric("unionByName") })
   ```
   
   As far as I am aware, this is the convention we use in SparkR for handling optional arguments.
   
   Technically speaking we could have
   
   ```R
   setGeneric("unionByName", function(x, y, allowMissingColumns) { standardGeneric("unionByName") })
   ```
   
   but then we'd have to support
   
   ```R
   signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "missing")
   ```
   
   and 
   
   ```R
   signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "logical")
   ```
   
   if I am not mistaken, and in the past I've been told that's too much.
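   
   To make the trade-off concrete, here is a minimal sketch that runs without Spark. Everything in it is made up for illustration and is not SparkR code: `ToyFrame` stands in for `SparkDataFrame`, `toyUnion` is a hypothetical helper, and the two generics are given distinct names (`unionByNameDots`, `unionByNameExplicit`) only so both variants can live in one session.
   
   ```R
   library(methods)
   
   # Hypothetical stand-in for SparkDataFrame: it only tracks column names.
   setClass("ToyFrame", representation(cols = "character"))
   
   # Hypothetical helper shared by both variants.
   toyUnion <- function(x, y, allowMissingColumns) {
     if (!allowMissingColumns && !setequal(x@cols, y@cols)) {
       stop("column names differ and allowMissingColumns is FALSE")
     }
     new("ToyFrame", cols = union(x@cols, y@cols))
   }
   
   # Variant 1: the generic exposes only x, y and `...`;
   # the method adds the optional argument with a default.
   setGeneric("unionByNameDots", function(x, y, ...) {
     standardGeneric("unionByNameDots")
   })
   
   setMethod("unionByNameDots",
             signature(x = "ToyFrame", y = "ToyFrame"),
             function(x, y, allowMissingColumns = FALSE) {
               toyUnion(x, y, allowMissingColumns)
             })
   
   # Variant 2: the generic names the argument, so dispatch also happens on it
   # and both the "logical" and the "missing" case need their own method.
   setGeneric("unionByNameExplicit", function(x, y, allowMissingColumns) {
     standardGeneric("unionByNameExplicit")
   })
   
   setMethod("unionByNameExplicit",
             signature(x = "ToyFrame", y = "ToyFrame", allowMissingColumns = "logical"),
             function(x, y, allowMissingColumns) {
               toyUnion(x, y, allowMissingColumns)
             })
   
   setMethod("unionByNameExplicit",
             signature(x = "ToyFrame", y = "ToyFrame", allowMissingColumns = "missing"),
             function(x, y, allowMissingColumns) {
               # Argument omitted by the caller: fall back to the default behaviour.
               unionByNameExplicit(x, y, FALSE)
             })
   ```
   
   In this sketch both variants behave the same from the caller's side; the second one just needs an extra method per optional argument, which is the overhead I mean.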
   
   Am I missing something?



