Date: Wed, 17 Dec 2025 11:50:21 -0800
From: Josiah Parry <[email protected]>

I am writing to understand what limitations there may be to making the set
operations in base into S3 generic functions. Are there any technical
limitations that would make this impossible?

The set ops {intersect, union, setdiff, setequal}, %in%, and %notin% are all
generic-like by virtue of being composed of generic functions for vector-like
classes.

If you have a vector-like class and you define (as needed) methods for '[',
'c', 'mtfrm', 'names<-', and 'unique', then the set ops work automatically and
correctly. The built-in classes 'Date', 'POSIXct', 'POSIXlt', 'difftime', and
'factor' provide a good model here.
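
To make this concrete, here is a minimal sketch of such a vector-like class. It assumes a reasonably recent R. The class name `hypothetical_id` and the constructor `new_id()` are invented for illustration. The class wraps a character vector and defines methods for '[', 'c', 'unique' and 'mtfrm'. It leaves out 'names<-' because this toy class never carries names. The base set ops themselves are not modified. In a package, these methods would be registered with S3method() in the NAMESPACE file rather than defined at top level.

```r
# Minimal sketch of a hypothetical vector-like S3 class wrapping a character
# vector. None of these names come from base R or any package.
new_id <- function(x) structure(as.character(x), class = "hypothetical_id")

# Subsetting keeps the class.
`[.hypothetical_id` <- function(x, i) new_id(unclass(x)[i])

# Combining strips the class from each piece, concatenates, and re-wraps.
c.hypothetical_id <- function(...) {
  new_id(unlist(lapply(list(...), unclass)))
}

# Deduplicate on the underlying character data, then re-wrap.
unique.hypothetical_id <- function(x, incomparables = FALSE, ...) {
  new_id(unique(unclass(x), incomparables = incomparables, ...))
}

# Tell match() (and hence %in%) what to compare, on R versions where
# match() calls mtfrm() for classed objects.
mtfrm.hypothetical_id <- function(x) unclass(x)

a <- new_id(c("a", "b", "c"))
b <- new_id(c("b", "c", "d"))

# The unmodified base set ops now work on the class.
union(a, b)
intersect(a, b)
setdiff(a, b)
a %in% b
setequal(a, new_id(c("c", "b", "a")))
```

Depending on the R version, union() and friends may return plain character vectors rather than `hypothetical_id` objects. The values are the same either way.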

S3 generic set ops would really only be needed for those non-vector-like
classes for which set ops happen to have a meaningful definition: 'nb' is a
good example, but are there many others?

A benefit of having a minimal set of generic functions in base (and composing
them to form a larger set of generic-like functions) is that it limits growth
of the base namespace. Every new generic function base::generic requires a
corresponding default method base::generic.default.

While writing a reply on R-Sig-Geo (1) today, I was reminded that `spdep`'s
set operations are not exported as S3 methods (e.g. one must call
spdep::union.nb()) because there is no generic declared in `base`.

I think the R ecosystem would benefit greatly from generics declared in base
for these functions. For example, the `generics` (2) package, published in
2018, includes S3 generics for set operations that mask base. `generics` has
189 reverse imports, and I suspect quite a few of them are for the set
operations.

GitHub usage of `generics` (counts include duplicates from forks):
- 353 results for importFrom(generics, union) (3)
- 361 results for importFrom(generics, intersect) (4)
- 355 results for importFrom(generics, setdiff) (5)

There are also a number of manual implementations of S3 generics for set ops
that mask base. See the following GitHub search results:
- 249 results for UseMethod("union") (6)
- 208 results for UseMethod("intersect") (7)
- 199 results for UseMethod("setdiff") (8)
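
The manual implementations turned up by those searches presumably follow a pattern much like the one below. A package-level generic masks base::union, and a default method defers to base. This is a minimal sketch only. `toy_nb` is a hypothetical list-of-neighbours class loosely inspired by the 'nb' example above. It is not spdep's actual class or code.

```r
# A generic that masks base::union, in the style of the UseMethod("union")
# search results. Defining it in a package or the global environment
# shadows base::union on the search path.
union <- function(x, y, ...) UseMethod("union")

# The default method defers to base, so ordinary vectors behave as before.
union.default <- function(x, y, ...) base::union(x, y)

# Hypothetical non-vector-like class: a list whose i-th element holds the
# integer indices of the neighbours of region i.
toy_nb <- function(...) structure(list(...), class = "toy_nb")

# Element-wise union of two neighbour lists defined over the same regions.
union.toy_nb <- function(x, y, ...) {
  stopifnot(inherits(y, "toy_nb"), length(x) == length(y))
  out <- lapply(seq_along(x), function(i) sort(base::union(x[[i]], y[[i]])))
  structure(out, class = "toy_nb")
}

x <- toy_nb(2L, c(1L, 3L), 2L)
y <- toy_nb(3L, 1L, 1L)
union(x, y)       # dispatches to union.toy_nb()
union(1:3, 2:5)   # dispatches to union.default(), i.e. base::union()
```

The methods call base::union() explicitly to avoid recursing into the masking generic. The sketch also shows the cost of this approach. Every package that masks base this way ships its own generic and default method. Methods registered on one package's `union` generic may not be found when a different package's `union` is the one on the search path.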

My guess is that in most of these examples, masking the base set ops would not
be necessary if the vector-like class in question were implemented more
rigorously, i.e., with methods for '[', 'c', etc.

Mikael

References:
1. https://stat.ethz.ch/pipermail/r-sig-geo/2025-December/029582.html
2. https://cran.r-project.org/src/contrib/Archive/generics
3. https://github.com/search?q=importFrom%28generics%2Cunion%29+&type=code
4. https://github.com/search?q=importFrom%28generics%2Cintersect%29+&type=code
5. https://github.com/search?q=importFrom%28generics%2Csetdiff%29+&type=code
6. https://github.com/search?q=UseMethod%28%22union%22%29+language%3AR&type=code
7. https://github.com/search?q=UseMethod%28%22intersect%22%29+language%3AR&type=code
8. https://github.com/search?q=UseMethod%28%22setdiff%22%29+language%3AR&type=code