Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18025#discussion_r117602305
--- Diff: R/pkg/R/generics.R ---
@@ -396,67 +397,81 @@ setGeneric("agg", function (x, ...) {
standardGeneric("agg") })
#' @param object x a SparkDataFrame or a Column
#' @param data new name to use
#' @return a SparkDataFrame or a Column
+#' @noRd
NULL
#' @rdname arrange
#' @export
+#' @noRd
setGeneric("arrange", function(x, col, ...) { standardGeneric("arrange") })
#' @rdname as.data.frame
#' @export
+#' @noRd
setGeneric("as.data.frame",
function(x, row.names = NULL, optional = FALSE, ...) {
standardGeneric("as.data.frame")
})
#' @rdname attach
#' @export
+#' @noRd
setGeneric("attach")
#' @rdname cache
#' @export
+#' @noRd
setGeneric("cache", function(x) { standardGeneric("cache") })
#' @rdname checkpoint
#' @export
+#' @noRd
setGeneric("checkpoint", function(x, eager = TRUE) {
standardGeneric("checkpoint") })
#' @rdname coalesce
#' @param x a Column or a SparkDataFrame.
--- End diff --
wait, I think some wires are crossed. let me clarify. let's take `coalesce`
as an example.
there are 2 coalesce, one `coalesce(df)` like repartition, and one
`coalesce(df$foo)` on a column like in SQL. so therefore, these are in fact 2
`coalesce` and `x is either a SparkDataFrame or a Column`.
and to elaborate, the history behind that approach we have today is because
we use to have this
```
@param x a Column
...
function(x = "Column"...)
```
and at the same time, in a different .R file
```
@param x a SparkDataFrame
...
function(x = "SparkDataFrame"
```
they seem fine when we write the code and it seems logical/easy to
maintain, but when the rd/doc page is generated it has
```
x a SparkDataFrame
x a Column
```
here `x` is explained twice. worse, the order is largely random (it's the
alphabetic order of the .R file)
and it is going against the standard R pattern of one description line for
each parameter with the choice of type separated by `.. or..` like
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html, and not to
mention CRAN check I think complain about parameter documented more than once.
so in short, having one `@param x a SparkDataFrame or a Column` is
intentional. since this is describing 2 things, from discussions back then it
feels more nature to put it some where independent - in fact like you are
touching on, I'd argue it's better to look it up in generic.R rather than
trying to figure out what other existing overload class that method already has.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]