Date: Fri, 6 Jun 2025 11:59:08 -0700
From: Michael Chirico<michaelchiri...@gmail.com>
There is a big difference in how to think of '...' for non-generic
functions like data.frame() vs. S3 generics.
In the former, it means "any number of inputs" [e.g. columns]; in the
latter, it means "any number of inputs [think c()], as well as any
arguments that might be interpreted by class implementations".
Understanding the difference for a given generic can require carefully
reading lots of documentation. print(<generic>), which is useful for
so many other contexts, can be a dead end.
One idea is to extend the print() method to suggest to the reader
which other arguments are available (among registered generics). Often
?<generic> will include the most common implementation, but not always
so.
For rbind (in a --vanilla session), we currently have one method,
rbind.data.frame, that offers three arguments not present in the
generic: make.row.names, stringsAsFactors, and factor.exclude. The
proposal would be to mention this in the print(rbind) output somehow,
e.g.
print(rbind)
function (..., deparse.level = 1)
.Internal(rbind(deparse.level, ...))
<bytecode: 0x73d4fd824e20>
<environment: namespace:base>
+Other arguments implemented by methods
+ factor.exclude: rbind.data.frame
+ make.row.names: rbind.data.frame
+ stringsAsFactors: rbind.data.frame
I suggest grouping by argument, not generic, although something like
this could be OK too:
+Signatures of other methods
+ rbind.data.frame(..., deparse.level = 1, make.row.names = TRUE,
stringsAsFactors = FALSE,
+ factor.exclude = TRUE)
Where it gets more interesting is when there are many methods, e.g.
for as.data.frame (again, in a --vanilla session):
print(as.data.frame)
function (x, row.names = NULL, optional = FALSE, ...)
{
if (is.null(x))
return(as.data.frame(list()))
UseMethod("as.data.frame")
}
<bytecode: 0x73d4fc1e70d0>
<environment: namespace:base>
+Other arguments implemented by methods
+ base: as.data.frame.table
+ check.names: as.data.frame.list
+ col.names: as.data.frame.list
+ cut.names: as.data.frame.list
+ fix.empty.names: as.data.frame.list
+ make.names: as.data.frame.matrix, as.data.frame.model.matrix
+ new.names: as.data.frame.list
+ nm: as.data.frame.bibentry, as.data.frame.complex, as.data.frame.Date,
+ as.data.frame.difftime, as.data.frame.factor, as.data.frame.integer,
+ as.data.frame.logical, as.data.frame.noquote, as.data.frame.numeric,
+ as.data.frame.numeric_version, as.data.frame.ordered,
+ as.data.frame.person, as.data.frame.POSIXct, as.data.frame.raw
+ responseName: as.data.frame.table
+ sep: as.data.frame.table
+ stringsAsFactors: as.data.frame.character, as.data.frame.list,
+ as.data.frame.matrix, as.data.frame.table
Or
+Signatures of other methods
+ as.data.frame.aovproj(x, ...)
+ as.data.frame.array(x, row.names = NULL, optional = FALSE, ...)
+ as.data.frame.AsIs(x, row.names = NULL, optional = FALSE, ...)
+ as.data.frame.bibentry(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.character(x, ..., stringsAsFactors = FALSE)
+ as.data.frame.citation(x, row.names = NULL, optional = FALSE, ...)
+ as.data.frame.complex(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.data.frame(x, row.names = NULL, ...)
+ as.data.frame.Date(x, row.names = NULL, optional = FALSE, ..., nm =
deparse1(substitute(x)))
+ as.data.frame.default(x, ...)
+ as.data.frame.difftime(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.factor(x, row.names = NULL, optional = FALSE, ..., nm
= deparse1(substitute(x)))
+ as.data.frame.ftable(x, row.names = NULL, optional = FALSE, ...)
+ as.data.frame.integer(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.list(x, row.names = NULL, optional = FALSE, ...,
cut.names = FALSE,
+ col.names = names(x), fix.empty.names = TRUE, new.names =
!missing(col.names),
+ check.names = !optional, stringsAsFactors = FALSE)
+ as.data.frame.logical(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.logLik(x, ...)
+ as.data.frame.matrix(x, row.names = NULL, optional = FALSE,
make.names = TRUE,
+ ..., stringsAsFactors = FALSE)
+ as.data.frame.model.matrix(x, row.names = NULL, optional = FALSE,
make.names = TRUE,
+ ...)
+ as.data.frame.noquote(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.numeric(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.numeric_version(x, row.names = NULL, optional =
FALSE, ..., nm = deparse1(substitute(x)))
+ as.data.frame.ordered(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.person(x, row.names = NULL, optional = FALSE, ..., nm
= deparse1(substitute(x)))
+ as.data.frame.POSIXct(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+ as.data.frame.POSIXlt(x, row.names = NULL, optional = FALSE, ...)
+ as.data.frame.raw(x, row.names = NULL, optional = FALSE, ..., nm =
deparse1(substitute(x)))
+ as.data.frame.table(x, row.names = NULL, ..., responseName =
"Freq", stringsAsFactors = TRUE,
+ sep = "", base = list(LETTERS))
+ as.data.frame.ts(x, ...)
Obviously that's a bit more cluttered, but as.data.frame() should be a
pretty unusual case. It also highlights better the differences in the
two approaches: the former economizes on space and focuses on what
sorts of arguments are available; the latter shows the defaults, does
not hide the arguments shared with the generic, and will always
produce as many lines as there are methods.
There are other edge cases to think through (multiple registrations,
interactions with S4, primitives, ...), but I want to first check with
the list if this looks workable & valuable enough to pursue.
Mike C
----
Code that helped with the above:
f = as.data.frame
# NB: methods() and getAnywhere() require {utils}
m = methods(f)
generic_args = names(formals(f))
f_methods = lapply(m, \(fn) getAnywhere(fn)$objs[[1L]])
names(f_methods) = m
new_args = sapply(f_methods, \(g) setdiff(names(formals(g)), generic_args))
with( # group by argument name
data.frame(method = rep(names(new_args), lengths(new_args)), arg =
unlist(new_args), row.names=NULL),
{tbl = tapply(method, arg, toString); writeLines(paste0(names(tbl),
": ", tbl))}
)
signatures=sapply(f_methods, \(g) paste(head(format(args(g)), -1),
collapse="\n"))
writeLines(paste0(names(signatures), gsub("^\\s*function\\s*", "", signatures)))