Re: [Rd] Suggestion: default print method for S3 generics could offer some insights on '...' among registered methods

Duncan Murdoch Tue, 10 Jun 2025 12:11:03 -0700

On 2025-06-10 1:46 p.m., Michael Chirico wrote:

When it comes to adding more info in the help pages, we'd be remiss
not to point out the great engineering work the tidyverse folks have
put in to this end:


https://dplyr.tidyverse.org/reference/mutate.html

Methods available in currently loaded packages: dbplyr (tbl_lazy), dplyr 
(data.frame) .


https://github.com/tidyverse/dplyr/blob/9df9d63234f34837fa058bbd081a42dfb8a4eeb0/R/mutate.R#L78-L79
https://github.com/tidyverse/dplyr/blob/be3e3a05fd0081cb53168d6aedb417d62139b75d/R/doc-methods.R#L57-L75


Those lines arise from this entry in the Rd file:

  \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("mutate")}

The dplyr:::methods_rd function basically just formats the results of acall similar to `methods("mutate")`. All the rest of that is by R Core,not tidyverse folks.


Duncan Murdoch


On Mon, Jun 9, 2025 at 2:25 PM Lluís Revilla <[email protected]> wrote:


Hi,

I agree that showing that there are other methods might help.
The print.function method could be modified to add this in addition to 
print.default output.

But I guess (new) users would check the help page with ?as.data.frame and not 
print the method or use args() (if they don't check with their prefered 
LLM/agent).
Would it be helpful to also report these numbers on the documentation page of 
the generic too?

Best,

Lluís


PS: Trying to see what happens with ? and looking for a specific method I found that on 
my computer with R-devel (2025-06-08 r88288), the expressions below the "Not 
run:" on ? examples raise errors.


On Mon, 9 Jun 2025 at 08:07, Michael Chirico <[email protected]> wrote:


Thanks Josh,

With fresh eyes, it's definitely information overload for the
suggested output to take up more space than the function body itself.

I'm not sure your suggestion gets at the heart of the issue, though,
which is about steering the user with regards to interpreting '...'
they see in the printout.

Therefore I would suggest something like this as the economized
version of my original suggestion:

print(rbind)

function (..., deparse.level = 1)
# ...
<environment: namespace:base>
+ 1 other method defines 3 other arguments. See methods(rbind).

print(as.data.frame)

function (x, row.names = NULL, optional = FALSE, ...)
# ...
<environment: namespace:base>
+ 29 other methods define 11 other arguments. See methods(as.data.frame).

Mike C

On Sun, Jun 8, 2025 at 3:57 PM Joshua Ulrich <[email protected]> wrote:


Hi Mike,

On Fri, Jun 6, 2025 at 1:59 PM Michael Chirico
<[email protected]> wrote:


There is a big difference in how to think of '...' for non-generic
functions like data.frame() vs. S3 generics.

In the former, it means "any number of inputs" [e.g. columns]; in the
latter, it means "any number of inputs [think c()], as well as any
arguments that might be interpreted by class implementations".

Understanding the difference for a given generic can require carefully
reading lots of documentation. print(<generic>), which is useful for
so many other contexts, can be a dead end.

One idea is to extend the print() method to suggest to the reader
which other arguments are available (among registered generics). Often
?<generic> will include the most common implementation, but not always
so.

For rbind (in a --vanilla session), we currently have one method,
rbind.data.frame, that offers three arguments not present in the
generic: make.row.names, stringsAsFactors, and factor.exclude. The
proposal would be to mention this in the print(rbind) output somehow,
e.g.

print(rbind)

function (..., deparse.level = 1)
.Internal(rbind(deparse.level, ...))
<bytecode: 0x73d4fd824e20>
<environment: namespace:base>

+Other arguments implemented by methods
+  factor.exclude: rbind.data.frame
+  make.row.names: rbind.data.frame
+  stringsAsFactors: rbind.data.frame

I suggest grouping by argument, not generic, although something like
this could be OK too:

+Signatures of other methods
+  rbind.data.frame(..., deparse.level = 1, make.row.names = TRUE,
stringsAsFactors = FALSE,
+      factor.exclude = TRUE)

Where it gets more interesting is when there are many methods, e.g.
for as.data.frame (again, in a --vanilla session):

print(as.data.frame)

function (x, row.names = NULL, optional = FALSE, ...)
{
     if (is.null(x))
         return(as.data.frame(list()))
     UseMethod("as.data.frame")
}
<bytecode: 0x73d4fc1e70d0>
<environment: namespace:base>

+Other arguments implemented by methods
+  base: as.data.frame.table
+  check.names: as.data.frame.list
+  col.names: as.data.frame.list
+  cut.names: as.data.frame.list
+  fix.empty.names: as.data.frame.list
+  make.names: as.data.frame.matrix, as.data.frame.model.matrix
+  new.names: as.data.frame.list
+  nm: as.data.frame.bibentry, as.data.frame.complex, as.data.frame.Date,
+    as.data.frame.difftime, as.data.frame.factor, as.data.frame.integer,
+    as.data.frame.logical, as.data.frame.noquote, as.data.frame.numeric,
+    as.data.frame.numeric_version, as.data.frame.ordered,
+    as.data.frame.person, as.data.frame.POSIXct, as.data.frame.raw
+  responseName: as.data.frame.table
+  sep: as.data.frame.table
+  stringsAsFactors: as.data.frame.character, as.data.frame.list,
+    as.data.frame.matrix, as.data.frame.table

Or

+Signatures of other methods
+  as.data.frame.aovproj(x, ...)
+  as.data.frame.array(x, row.names = NULL, optional = FALSE, ...)
+  as.data.frame.AsIs(x, row.names = NULL, optional = FALSE, ...)
+  as.data.frame.bibentry(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.character(x, ..., stringsAsFactors = FALSE)
+  as.data.frame.citation(x, row.names = NULL, optional = FALSE, ...)
+  as.data.frame.complex(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.data.frame(x, row.names = NULL, ...)
+  as.data.frame.Date(x, row.names = NULL, optional = FALSE, ..., nm =
deparse1(substitute(x)))
+  as.data.frame.default(x, ...)
+  as.data.frame.difftime(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.factor(x, row.names = NULL, optional = FALSE, ..., nm
= deparse1(substitute(x)))
+  as.data.frame.ftable(x, row.names = NULL, optional = FALSE, ...)
+  as.data.frame.integer(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.list(x, row.names = NULL, optional = FALSE, ...,
cut.names = FALSE,
+      col.names = names(x), fix.empty.names = TRUE, new.names =
!missing(col.names),
+      check.names = !optional, stringsAsFactors = FALSE)
+  as.data.frame.logical(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.logLik(x, ...)
+  as.data.frame.matrix(x, row.names = NULL, optional = FALSE,
make.names = TRUE,
+      ..., stringsAsFactors = FALSE)
+  as.data.frame.model.matrix(x, row.names = NULL, optional = FALSE,
make.names = TRUE,
+      ...)
+  as.data.frame.noquote(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.numeric(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.numeric_version(x, row.names = NULL, optional =
FALSE, ..., nm = deparse1(substitute(x)))
+  as.data.frame.ordered(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.person(x, row.names = NULL, optional = FALSE, ..., nm
= deparse1(substitute(x)))
+  as.data.frame.POSIXct(x, row.names = NULL, optional = FALSE, ...,
nm = deparse1(substitute(x)))
+  as.data.frame.POSIXlt(x, row.names = NULL, optional = FALSE, ...)
+  as.data.frame.raw(x, row.names = NULL, optional = FALSE, ..., nm =
deparse1(substitute(x)))
+  as.data.frame.table(x, row.names = NULL, ..., responseName =
"Freq", stringsAsFactors = TRUE,
+      sep = "", base = list(LETTERS))
+  as.data.frame.ts(x, ...)

Obviously that's a bit more cluttered, but as.data.frame() should be a
pretty unusual case. It also highlights better the differences in the
two approaches: the former economizes on space and focuses on what
sorts of arguments are available; the latter shows the defaults, does
not hide the arguments shared with the generic, and will always
produce as many lines as there are methods.

There are other edge cases to think through (multiple registrations,
interactions with S4, primitives, ...), but I want to first check with
the list if this looks workable & valuable enough to pursue.

I like and appreciate the intent behind your suggestion, though I
don't like all the extra output from printing the generic. I want to
look at the function body when I print it. And as you show, it can
output a lot of information you're probably not interested in.

What about adding the number of methods to printed output for
generics, and a suggestion to use `methods(some_generic)` to get a
list of them? Then you can use help(some_method) or args(some_method)
to get more information about the specific method(s) you're interested
in.

Best,
Josh

Mike C

----

Code that helped with the above:

f = as.data.frame
# NB: methods() and getAnywhere() require {utils}
m = methods(f)
generic_args = names(formals(f))
f_methods = lapply(m, \(fn) getAnywhere(fn)$objs[[1L]])
names(f_methods) = m
new_args = sapply(f_methods, \(g) setdiff(names(formals(g)), generic_args))
with( # group by argument name
   data.frame(method = rep(names(new_args), lengths(new_args)), arg =
unlist(new_args), row.names=NULL),
   {tbl = tapply(method, arg, toString); writeLines(paste0(names(tbl),
": ", tbl))}
)
signatures=sapply(f_methods, \(g) paste(head(format(args(g)), -1),
collapse="\n"))
writeLines(paste0(names(signatures), gsub("^\\s*function\\s*", "", signatures)))

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Suggestion: default print method for S3 generics could offer some insights on '...' among registered methods

Reply via email to