Re: [R] Subassignments involving NAs in data frames

Prof Brian Ripley Thu, 09 Jun 2005 13:10:45 -0700

On Thu, 9 Jun 2005, Thomas Lumley wrote:

> On Thu, 9 Jun 2005, McGehee, Robert wrote:
>
>> I'm seeing some inconsistent behavior when re-assigning values in a data
>> frame. The first assignment turns all of the 0s in my data frame to 2s,
>> the second fails to do so.


But they differ in several ways, so why is this labelled `inconsistent'?
Why not ask `what is the difference'?

The answer to the pertinent question is `the number of items to be replaced'.

>> df1 <- data.frame(a = c(NA, 0, 3, 4))
>> df2 <- data.frame(a = c(NA, 0, 0, 4))
>> df1[df1 == 0] <- 2 ## Works
>> df2[df2 == 0] <- 2
>>
>> Error: NAs are not allowed in subscripted assignments
>
> Hmm. This looks like a bug to me.

>> Checking an old news file I see this:
>>    o    Subassignments involving NAs and with a replacement value of
>>         length > 1 are now disallowed.       (They were handled
>>         inconsistently in R < 2.0.0, see PR#7210.)  For data frames
>>         they are disallowed altogether, even for logical matrix indices
>>         (the only case which used to work).
>>
>> which leaves me to believe that the assignment for both df1 and df2
>> should fail ("data frame ... disallowed altogether"), however that seems
>> not to be the case, since the example works for df1.
>
> Yes, I think the bug is that it works

It has since been allowed in a few cases to avoid needlessly breaking existing code. (The curse of back-compatibility.)

In the first example there is only one value to be replaced, so there is no ambiguity in the meaning. In the second the replacement has to be replicated to the needed length, and so the rules for vectors give the error message.
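A minimal sketch of that count, using the data frames from the question. It assumes a reasonably current R (anyNA() needs R >= 3.1.0). The error itself belongs to R 2.1.x, so the failing assignment is not repeated here. The point is only that the logical index from == contains an NA in both cases, but it selects one element in df1 and two in df2.

```r
# Same data frames as in the original question
df1 <- data.frame(a = c(NA, 0, 3, 4))
df2 <- data.frame(a = c(NA, 0, 0, 4))

# `==` propagates the NA, so both logical index matrices contain an NA
ind1 <- df1 == 0
ind2 <- df2 == 0
has_na <- c(anyNA(ind1), anyNA(ind2))

# Number of elements actually selected for replacement (an NA selects nothing)
n1 <- sum(ind1, na.rm = TRUE)  # a single value: no ambiguity
n2 <- sum(ind2, na.rm = TRUE)  # the value 2 must be replicated to this length
```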

Another case which is allowed is if none of the values are to be replaced: that is, all the logical indices are FALSE or NA.
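A minimal sketch of that second case, with a made-up data frame df3 that contains an NA but no zeros. The index selects nothing, so the assignment should be a no-op. The sketch assumes the behaviour of current R, where [<-.data.frame returns its argument unchanged when a full-size logical matrix selects no elements. R 2.1.x may reach the same result by a different route.

```r
# Hypothetical data frame: one NA, no zeros
df3 <- data.frame(a = c(NA, 1, 3, 4))
before <- df3

ind3 <- df3 == 0               # every entry is FALSE or NA
n3 <- sum(ind3, na.rm = TRUE)  # nothing is selected

df3[ind3] <- 2                 # allowed: no element is to be replaced
unchanged <- identical(df3, before)
```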

>> Also, the
>> vectorized version works as expected (because the replacement value has
>> a length of 1).
>>
>> vec1 <- c(NA, 0, 3, 4)
>> vec2 <- c(NA, 0, 0, 4)
>> vec1[vec1 == 0] <- 2 ## Works
>> vec2[vec2 == 0] <- 2 ## Also works
>
> I'm not sure that this is supposed to work, either, but it might be.

Reading help("[") should help alleviate your uncertainty, for this is explicitly documented there.
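Roughly, the rule in help("[") is as follows. An NA in a replacement index selects nothing. Because it is then ambiguous whether an element of the right-hand side should be used for that position, the NA is allowed only when the replacement has length one. The following is a minimal vector sketch of that rule. The names v, idx, v1, v2 and res are only illustrative.

```r
v <- c(NA, 0, 0, 4)
idx <- v == 0                  # NA TRUE TRUE FALSE

# Length-one replacement: allowed; the NA position is simply left alone
v1 <- v
v1[idx] <- 2

# Longer replacement: ambiguous whether the NA position consumes an element,
# so R refuses and signals an error
v2 <- v
res <- tryCatch({ v2[idx] <- c(5, 6); "ok" },
                error = function(e) conditionMessage(e))
```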

>> Is this behavior for data frames intentional? What's the best
>> alternative to df1[df1 == 0] <- 2 that doesn't fail in situations such
>> as df2? A simple loop over columns?
>
> df2[df2 %in% 0] is the recommended method.


That index is a logical vector of length one.  Try

ind <- df2 == 0
df2[ind & !is.na(ind)] <- 2

but this is really just a loop over columns implemented in [<-.data.frame.
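The question also asked about a simple loop over columns. A minimal sketch of that loop, written out by hand, assumes numeric columns. The data frame dd and its second column b are made up for illustration. The loop uses which(), which drops the NA positions, so every assignment sees an NA-free integer index. Applying col %in% 0 to a single column would serve equally well, since %in% never returns NA.

```r
# Hypothetical data frame with NAs and zeros in more than one column
dd <- data.frame(a = c(NA, 0, 0, 4), b = c(0, 1, NA, 0))

for (j in seq_along(dd)) {
  # which() returns positions of TRUE only, silently dropping NA
  hit <- which(dd[[j]] == 0)
  dd[[j]][hit] <- 2
}
```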

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
