Hello,
I found what looks to me like an odd edge case for duplicated(), unique() etc. 
on data frames with zero columns, due to duplicated() returning a zero-length 
vector for them, regardless of the number of rows:

df <- data.frame(a = 1:5)
df$a <- NULL
nrow(df)          # 5 (row count preserved by row.names)
duplicated(df)    # logical(0), should be c(FALSE, TRUE, TRUE, TRUE, TRUE)
anyDuplicated(df) # 0, should be 2
nrow(unique(df))  # 0, should be 1

This behaviour isn't mentioned in the documentation; is there a reason for it 
to work like this? I'm struggling to see this as anything other than unintended 
behaviour, as a consequence of the do.call(Map, `names<-`(c(list, x), NULL)) 
expression in duplicated.data.frame returning an empty list instead of a list 
of empty lists.
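
To make that concrete, here is a minimal sketch that evaluates the same expression by hand on a zero-column data frame. The names `rows` and `expected_rows` are only illustrative, and `expected_rows` is an assumption about what a "list of empty lists" result would look like, not something taken from the R sources.

```r
# Zero-column data frame that still has 5 rows.
df <- data.frame(a = 1:5)
df$a <- NULL

# The expression used inside duplicated.data.frame, applied directly.
# With no columns, Map() has nothing to iterate over.
rows <- do.call(Map, `names<-`(c(list, df), NULL))
length(rows)      # 0
duplicated(rows)  # logical(0)

# Assumed shape of the result if each row were represented
# by an empty list instead.
expected_rows <- rep(list(list()), nrow(df))
duplicated(expected_rows)
```

Under that assumption, duplicated() marks every row after the first as a duplicate, which matches the results I would expect above.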
Other data frame libraries have similar behaviour: tibble does the same; 
data.table, Python's pandas and Rust's polars drop all the rows as soon as 
there are zero columns, because they don't preserve the row count via the row 
names.
---
I admit this is a case we rarely care about. However, for an example of this 
being an issue, I've been running into it when treating data frames as database 
relations, where they have one or more candidate keys (irreducible subsets of 
the columns for which every row must have a unique value set). Sometimes, a 
generated relation can have an empty candidate key, which limits it to only 
having zero or one rows.

Usually, I can check a relation contains no duplicated key values by using 
anyDuplicated:

df2 <- unique(ChickWeight[, c("Chick", "Diet")])
keycols <- "Chick" # Each chick only has one diet (Chick -> Diet)
!anyDuplicated(df2[, keycols, drop = FALSE]) # TRUE, so Chick values are unique

When the key is empty, any row after the first must be a duplicate, but 
anyDuplicated doesn't detect these because of the above edge case, so I have to 
add special handling:

df3 <- data.frame(a = rep(1, 5)) # relations shouldn't have duplicate rows
keycols <- character(0) # a is constant, so key is empty
!anyDuplicated(df3[, keycols, drop = FALSE])
# TRUE because equivalent to !any(logical(0)) by above, should be FALSE
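
For illustration, the special handling could look something like the sketch below. `any_duplicated_rows` is a hypothetical helper name, not a base R function, and it assumes that all rows of a zero-column data frame should count as equal to each other.

```r
# Hypothetical helper: behaves like anyDuplicated() on a data frame,
# except that rows of a zero-column data frame are treated as identical.
any_duplicated_rows <- function(df) {
  if (ncol(df) == 0L) {
    # With no columns, any row after the first duplicates the first,
    # so the first duplicate (if there is one) is at index 2.
    if (nrow(df) > 1L) 2L else 0L
  } else {
    anyDuplicated(df)
  }
}

df3 <- data.frame(a = rep(1, 5))
keycols <- character(0)
!any_duplicated_rows(df3[, keycols, drop = FALSE]) # FALSE
```

Non-empty keys fall through to anyDuplicated() unchanged, so only the empty-key case is affected.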
---
Best Regards,
Mark

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
