Synopsis: In multistep expressions, e.g.:
fun <- function(a, b, c) (a + b) / c
`fun` returns an unexpected and non-intuitive result when:
- a, b, and c are vectors
- c is the longest vector
- the lengths of a, b, and c are not even multiples of one another.
In this case, because of the way vectors are being recycled:
fun(a, b, c)
returns a different result from:
mapply(fun, a, b, c)
Description:
The R documentation in "An Introduction to R" Section 2.2 states:
"Vectors occurring in the same expression need not all be of the same length.
If they are not, the value of the expression is a vector with the same length
as the longest vector which occurs in the expression. Shorter vectors in the
expression are recycled as often as need be (perhaps fractionally) until they
match the length of the longest vector."
Based on this documentation, I would expect that all vectors in an expression
are recycled to match the length of the longest vector before element-wise
operations are performed. However, R appears to perform recycling independently
at each operation, which produces different results than the documented
behavior would suggest.
Minimal reproducible example:
```r
# Simple function demonstrating the issue
f <- function(a, b, c) {
  (a + b) / c
}
# Vectors of different lengths (not multiples of each other)
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)
# Direct call
direct_result <- f(a, b, c)
# mapply (recycles all inputs to length 7 first, then applies element-wise)
mapply_result <- mapply(f, a, b, c)
# Compare results
cat("Direct call result:\n")
print(direct_result)
cat("\nmapply result:\n")
print(mapply_result)
cat("\nResults are identical:", identical(direct_result, mapply_result), "\n")
sessionInfo()
```
Output:
f <- function(a, b, c) {
+ (a + b) / c
+ }
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)
direct_result <- f(a, b, c)
Warning messages:
1: In a + b :
longer object length is not a multiple of shorter object length
2: In (a + b)/c :
longer object length is not a multiple of shorter object length
mapply_result <- mapply(f, a, b, c)
Warning messages:
1: In mapply(f, a, b, c) :
longer argument not a multiple of length of shorter
2: In mapply(f, a, b, c) :
longer argument not a multiple of length of shorter
print(direct_result)
[1] 0.11000000 0.11000000 0.11000000 0.03500000 0.02200000 0.03666667 0.04714286
print(mapply_result)
[1] 0.11000000 0.11000000 0.11000000 0.03500000 0.04200000 0.05333333 0.01857143
cat("\nResults are identical:", identical(direct_result, mapply_result), "\n")
Results are identical: FALSE
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 22.2
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Denver
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.3 tools_4.3.3
Explanation of what is happening:
In the direct call, recycling occurs independently at each binary operation:
1. `a + b` is evaluated first: `b` (length 3) is recycled to the length of `a`
(length 4), producing `[11, 22, 33, 14]`
2. The length-4 result is then divided by `c` (length 7): the length-4 result
is recycled to length 7 as `[11, 22, 33, 14, 11, 22, 33]`, then divided by
`c`
3. Final result: `[0.11, 0.11, 0.11, 0.035, 0.022, 0.0367, 0.0471]`
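The three steps above can be written out explicitly with `rep_len`, which makes each recycling step visible. This is only a sketch of my understanding of the per-operation behavior, not a description of R's internals; the names `step1`, `step2`, and `per_operation` are mine.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Step 1: `a + b` recycles only `b`, to the length of `a` (4).
step1 <- a + rep_len(b, length(a))   # 11 22 33 14

# Step 2: the length-4 intermediate is recycled to the length of `c` (7).
step2 <- rep_len(step1, length(c))   # 11 22 33 14 11 22 33

# Step 3: element-wise division; the lengths now match, so no warning.
per_operation <- step2 / c

# Compare with R's own evaluation of the expression.
direct <- suppressWarnings((a + b) / c)
print(isTRUE(all.equal(per_operation, direct)))   # TRUE
```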
However, based on my reading of the documentation, I would expect all three
vectors (`a`, `b`, and `c`) to be recycled to length 7 (the length of the
longest vector in the expression) before any operations are performed, which
is what `mapply` does:
- `a` recycled to length 7: `[1, 2, 3, 4, 1, 2, 3]`
- `b` recycled to length 7: `[10, 20, 30, 10, 20, 30, 10]`
- `c` unchanged: `[100, 200, 300, 400, 500, 600, 700]`
- Then `(a + b) / c` computed element-wise: `[0.11, 0.11, 0.11, 0.035, 0.042,
0.0533, 0.0186]`
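The recycling listed above can likewise be spelled out with `rep_len` and checked against `mapply`. Again, this is a sketch under my reading of the documentation; `a7`, `b7`, and `global` are names I introduce for illustration.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Recycle every input to the length of the longest vector first.
n  <- max(length(a), length(b), length(c))   # 7
a7 <- rep_len(a, n)                          # 1 2 3 4 1 2 3
b7 <- rep_len(b, n)                          # 10 20 30 10 20 30 10

# All operands now have length 7, so no further recycling takes place.
global <- (a7 + b7) / c

# mapply pairs up the recycled inputs element by element.
mapply_result <- suppressWarnings(
  mapply(function(a, b, c) (a + b) / c, a, b, c)
)
print(isTRUE(all.equal(global, mapply_result)))   # TRUE

# Positions where the direct call disagrees with global recycling.
direct <- suppressWarnings((a + b) / c)
print(which(global != direct))                    # 5 6 7
```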
The key difference is at positions 5, 6, and 7. In the direct call, the
intermediate result `(a + b)` has length 4 and is recycled independently of
the original vectors when dividing by `c`.
Question:
Does this example represent a bug in R's recycling behavior, or is the
documentation in Section 2.2 not intended to describe how recycling works in
expressions with multiple binary operations? If the current behavior is
intentional, could the documentation be clarified to explain that recycling
occurs at each binary operation rather than globally across the expression?
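For reference, the "global" interpretation can be obtained today by recycling the arguments before calling the function. The helper below, `recycle_to_longest`, is hypothetical (it is not part of base R) and is only a minimal sketch; it does not handle zero-length arguments.

```r
# Hypothetical helper: recycle every argument to the length of the longest.
recycle_to_longest <- function(...) {
  args <- list(...)
  n <- max(lengths(args))
  lapply(args, rep_len, length.out = n)
}

f <- function(a, b, c) (a + b) / c

a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# All three vectors have length 7 before any arithmetic happens, so no
# recycling (and no warning) occurs inside `f`.
global_result <- do.call(f, recycle_to_longest(a, b, c))
print(global_result)
```

With the vectors from the example, `global_result` agrees with the `mapply` output rather than with the direct call.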
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel