Synopsis: In multistep expressions, e.g.:
fun <- function(a, b, c) (a + b) / c
`fun` returns an unexpected and counterintuitive result when:
- a, b, and c are vectors
- c is the longest vector
- the lengths of a, b, and c are not integer multiples of one another.
In this case, because of the way vectors are being recycled:
fun(a, b, c)
returns a different result from:
mapply(fun, a, b, c)
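For contrast, a small sketch of the case where the lengths are integer multiples of one another, in which the two calls agree. The vectors below (lengths 2, 4, and 8) and the names `vectorized` and `elementwise` are illustrative only and are not part of the example further down:

```r
fun <- function(a, b, c) (a + b) / c

# Lengths 2, 4 and 8: each longer length is an integer multiple of the shorter ones
a <- c(1, 2)
b <- c(10, 20, 30, 40)
c <- c(100, 200, 300, 400, 500, 600, 700, 800)

vectorized  <- fun(a, b, c)          # recycling happens at each operation
elementwise <- mapply(fun, a, b, c)  # all arguments recycled to length 8 first

# TRUE here: with these lengths both recycling schemes pair up the same elements
isTRUE(all.equal(vectorized, elementwise))
```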
Description:
The R documentation in "An Introduction to R" Section 2.2 states:
"Vectors occurring in the same expression need not all be of the same
length. If they are not, the value of the expression is a vector with
the same length as the longest vector which occurs in the expression.
Shorter vectors in the expression are recycled as often as need be
(perhaps fractionally) until they match the length of the longest
vector."
Based on this documentation, I would expect that all vectors in an
expression are recycled to match the length of the longest vector
before element-wise operations are performed. However, R appears to
perform recycling independently at each operation, which produces
different results than the documented behavior would suggest.
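One way to see the per-operation behavior is to write the expression as the nested function calls it corresponds to, since R's arithmetic operators are ordinary functions. A minimal sketch using the vectors from the example below (the names `infix` and `nested` are illustrative; warnings are suppressed only to keep the output short):

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# The infix form and the explicit nested calls are the same computation:
# `+` is called with only a and b, and `/` is called with only the sum and c.
infix  <- suppressWarnings((a + b) / c)
nested <- suppressWarnings(`/`(`+`(a, b), c))

identical(infix, nested)

# The inner call never sees c, so its result has length 4, not 7
length(suppressWarnings(a + b))
```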
Minimal reproducible example:
```r
# Simple function demonstrating the issue
f <- function(a, b, c) {
  (a + b) / c
}
# Vectors of different lengths (not multiples of each other)
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)
# Direct call
direct_result <- f(a, b, c)
# mapply (recycles all inputs to length 7 first, then applies element-wise)
mapply_result <- mapply(f, a, b, c)
# Compare results
cat("Direct call result:\n")
print(direct_result)
cat("\nmapply result:\n")
print(mapply_result)
cat("\nResults are identical:", identical(direct_result,
mapply_result), "\n")
sessionInfo()
```
Output:
f <- function(a, b, c) {
+ (a + b) / c
+ }
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)
direct_result <- f(a, b, c)
Warning messages:
1: In a + b :
longer object length is not a multiple of shorter object length
2: In (a + b)/c :
longer object length is not a multiple of shorter object length
mapply_result <- mapply(f, a, b, c)
Warning messages:
1: In mapply(f, a, b, c) :
longer argument not a multiple of length of shorter
2: In mapply(f, a, b, c) :
longer argument not a multiple of length of shorter
print(direct_result)
[1] 0.11000000 0.11000000 0.11000000 0.03500000 0.02200000 0.03666667 0.04714286
print(mapply_result)
[1] 0.11000000 0.11000000 0.11000000 0.03500000 0.04200000 0.05333333 0.01857143
cat("\nResults are identical:", identical(direct_result,
mapply_result), "\n")
Results are identical: FALSE
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 22.2
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
time zone: America/Denver
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.3 tools_4.3.3
Explanation of what is happening:
In the direct call, recycling occurs independently at each binary
operation:
1. `a + b` is evaluated first: `a` (length 4) and `b` (length 3) are
   recycled to length 4, producing `[11, 22, 33, 14]`
2. The length-4 result is then divided by `c` (length 7): the length-4
   result is recycled to length 7 as `[11, 22, 33, 14, 11, 22, 33]`,
   then divided by `c`
3. Final result: `[0.11, 0.11, 0.11, 0.035, 0.022, 0.0367, 0.0471]`
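The steps above can be reproduced with explicit recycling via base R's `rep_len()`. This is only a sketch of the observed behavior, not a description of R's internals; `step1` and `step2` are illustrative names:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Step 1: `+` recycles b to the length of a (4)
step1 <- a + rep_len(b, length(a))      # 11 22 33 14

# Step 2: `/` recycles the length-4 sum to the length of c (7)
step2 <- rep_len(step1, length(c)) / c

# Matches the direct call (warnings suppressed for brevity)
isTRUE(all.equal(step2, suppressWarnings((a + b) / c)))
```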
However, based on my reading of the documentation, I would expect all
three vectors (`a`, `b`, and `c`) to be recycled to length 7 (the length
of the longest vector in the expression) before any operations are
performed, which is what `mapply` does:
- `a` recycled to length 7: `[1, 2, 3, 4, 1, 2, 3]`
- `b` recycled to length 7: `[10, 20, 30, 10, 20, 30, 10]`
- `c` unchanged: `[100, 200, 300, 400, 500, 600, 700]`
- Then `(a + b) / c` computed element-wise:
  `[0.11, 0.11, 0.11, 0.035, 0.042, 0.0533, 0.0186]`
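The same up-front recycling can be written out by hand and compared with `mapply`. A sketch under the assumption that recycling every argument to the longest length is the behavior wanted; `a7`, `b7`, and `upfront` are illustrative names:

```r
f <- function(a, b, c) (a + b) / c
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

n  <- max(length(a), length(b), length(c))  # 7, the longest length
a7 <- rep_len(a, n)                         # 1 2 3 4 1 2 3
b7 <- rep_len(b, n)                         # 10 20 30 10 20 30 10

# All operands now have length 7, so no further recycling takes place
upfront <- (a7 + b7) / c

# Agrees with mapply (warnings suppressed for brevity)
isTRUE(all.equal(upfront, suppressWarnings(mapply(f, a, b, c))))
```

Because every operand already has the common length, this form also runs without the recycling warnings.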
The key difference is at positions 5, 6, and 7. In the direct call, the
intermediate result `(a + b)` has length 4 and is recycled independently
of the original vectors when dividing by `c`.
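A quick check of which positions disagree, as a sketch reusing the example's function and vectors (`direct` and `elementwise` are illustrative names; warnings are suppressed for brevity):

```r
f <- function(a, b, c) (a + b) / c
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

direct      <- suppressWarnings(f(a, b, c))
elementwise <- suppressWarnings(mapply(f, a, b, c))

# Positions where the two results differ: everything past the
# length of the intermediate sum (4)
which(direct != elementwise)   # 5 6 7
```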
Question:
Does this example represent a bug in R's recycling behavior, or is the
documentation in Section 2.2 not intended to describe how recycling
works in expressions with multiple binary operations? If the current
behavior is intentional, could the documentation be clarified to explain
that recycling occurs at each binary operation rather than globally
across the expression?
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel