On Thu, 9 Jun 2005, Thomas Lumley wrote:
On Thu, 9 Jun 2005, McGehee, Robert wrote:
I'm seeing some inconsistent behavior when re-assigning values in a data
frame. The first assignment turns all of the 0s in my data frame to 2s,
the second fails to do so.
But they differ in several ways, so why is this labelled `inconsistent'?
Why not ask `what is the difference'?
The answer to the pertinent question is `the number of items to be
replaced'.
df1 <- data.frame(a = c(NA, 0, 3, 4))
df2 <- data.frame(a = c(NA, 0, 0, 4))
df1[df1 == 0] <- 2 ## Works
df2[df2 == 0] <- 2
Error: NAs are not allowed in subscripted assignments
Hmm. This looks like a bug to me.
Checking an old news file I see this:
o Subassignments involving NAs and with a replacement value of
length > 1 are now disallowed. (They were handled
inconsistently in R < 2.0.0, see PR#7210.) For data frames
they are disallowed altogether, even for logical matrix indices
(the only case which used to work).
which leads me to believe that the assignment for both df1 and df2
should fail ("data frame ... disallowed altogether"), however that seems
not to be the case, since the example works for df1.
Yes, I think the bug is that it works.
It has since been allowed in a few cases to avoid needlessly breaking
existing code. (The curse of back-compatibility.)
In the first example there is only one value to be replaced, so there is
no ambiguity in the meaning. In the second the replacement has to be
replicated to the needed length and so the rules for vectors give the
error message.
Another case which is allowed is if none of the values are to be replaced:
that is all the logical indices are FALSE or NA.
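A minimal sketch of that last case, using a made-up data frame `df3` (not from the original post) whose column has an NA but no zeros, so the logical index holds only FALSE and NA:

```r
# Hypothetical data frame: one NA, no zeros.
df3 <- data.frame(a = c(NA, 1, 3, 4))
before <- df3

ind <- df3 == 0     # logical matrix: NA, FALSE, FALSE, FALSE
df3[ind] <- 2       # no element is selected, so nothing is replaced

unchanged <- identical(df3, before)
```

Since no TRUE appears in the index, the meaning of the assignment is unambiguous and the data frame should come back unchanged.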
Also, the vectorized version works as expected (because the replacement
value has a length of 1).
vec1 <- c(NA, 0, 3, 4)
vec2 <- c(NA, 0, 0, 4)
vec1[vec1 == 0] <- 2 ## Works
vec2[vec2 == 0] <- 2 ## Also works
I'm not sure that this is supposed to work, either, but it might be.
Reading help("[") should help alleviate your uncertainty, for this is
explicitly documented there.
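As an illustration of the documented rule for plain vectors (an NA in a logical subscript selects nothing on assignment, which is permitted only when the replacement has length one), here is a small sketch. The names `v1`, `v2` and `res` are illustrative only:

```r
vec <- c(NA, 0, 0, 4)
idx <- vec == 0                 # NA TRUE TRUE FALSE

# Length-one replacement: the NA in the index selects nothing.
v1 <- vec
v1[idx] <- 2                    # v1 is now NA 2 2 4

# Longer replacement: it is ambiguous whether the NA position should
# consume an element of the right-hand side, so R signals an error.
v2 <- vec
res <- tryCatch({
  v2[idx] <- c(5, 6)
  "assigned"
}, error = function(e) conditionMessage(e))
```

Here `res` should hold the error message rather than "assigned", and `v2` is left as it was.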
Is this behavior for data frames intentional? What's the best
alternative to df1[df1 == 0] <- 2 that doesn't fail in situations such
as df2? A simple loop over columns?
df2[df2 %in% 0] is the recommended method.
That index is a logical vector of length one. Try
ind <- df2 == 0
df2[ind & !is.na(ind)] <- 2
but this is really just a loop over columns implemented in [<-.data.frame.
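For comparison, the explicit loop over columns asked about in the question might look like the sketch below. The second column `b` is a hypothetical addition to show more than one column being processed; this is not the internal code of [<-.data.frame, only an equivalent written by hand:

```r
# 'a' is from the original example; 'b' is a hypothetical extra column.
df2 <- data.frame(a = c(NA, 0, 0, 4), b = c(0, NA, 1, 0))

for (j in seq_along(df2)) {
  col <- df2[[j]]
  hit <- !is.na(col) & col == 0   # treat NA comparisons as FALSE
  df2[[j]][hit] <- 2              # length-one replacement, no NA in index
}
```

Because the index for each column contains no NA, the assignment does not depend on how NA subscripts are handled.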
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595