Thanks. by() is implemented on top of tapply(), and tapply() seems to
behave rather inconsistently for empty inputs:
> FUN = function(x) x[1,]
> FUNx <- function(x) FUN(df[x, , drop = FALSE])
> storage.mode(tapply(seq_len(10), df$b, FUNx)) # "list" (a list-mode array)
> storage.mode(tapply(seq_len(0), c(), FUNx)) # "logical"
> ta
You can see this a bit more clearly with e.g.
> storage.mode(byy)
[1] "list"
> storage.mode(byy.empty)
[1] "logical"
So even though both objects have S3 class "by", they have a different
underlying internal storage mode (as simplifying the result of 'by'
has given you a 0-length logical, instead
Try adding simplify=FALSE to the call to by().
-Bill
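
A minimal sketch of the simplify=FALSE suggestion above, reusing the toy data from the original post. It assumes base R only. The names first_row, byy.list and byy.empty.list are made up here for illustration and do not appear in the thread.

```r
# Toy data from the original post
df <- data.frame(a = seq(10), b = rep(1:2, 5))
df.empty <- subset(df, a > 10)

# Hypothetical helper name; same body as the FUN used in the thread
first_row <- function(x) x[1, ]

# simplify = FALSE is passed through to tapply(), which then builds a
# list-mode result whether or not there are any groups.
byy.list <- by(data = df, INDICES = df$b, FUN = first_row,
               simplify = FALSE)
byy.empty.list <- by(data = df.empty, INDICES = df.empty$b, FUN = first_row,
                     simplify = FALSE)

storage.mode(byy.list)        # "list"
storage.mode(byy.empty.list)  # "list"
is.list(byy.empty.list)       # TRUE
length(byy.empty.list)        # 0
```

With this, code that checks is.list() or iterates over the result with lapply() should treat the empty and non-empty cases the same way.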
On Tue, Nov 16, 2021 at 4:04 AM Ofek Shilon wrote:
Take this toy code:
df <- data.frame(a=seq(10), b=rep(1:2, 5))
df.empty <- subset(df, a>10)
byy <- by(data=df, INDICES=df$b, FUN=function(x) x[1,])
byy.empty <- by(data=df.empty, INDICES=df.empty$b, FUN=function(x) x[1,])
class(byy) # "by"
class(byy.empty) # "by"
is.list(b