Re: [R] [FORGED] Q re: logical indexing with is.na

Duncan Murdoch Sun, 10 Mar 2019 04:46:47 -0700

On 10/03/2019 1:15 a.m., David Goldsmith wrote:

Thanks, all.  I had read about recycling, but I guess I didn't fully
appreciate all the "weirdness" it might produce. :/


With this explained, I'm going to ask a follow-up, which is only
contextually related: the impetus for this discovery was checking "corner
cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to
determine equality of two vectors containing NA's.  Between the above
result; my related discovery that this indexing preserves relative
positional info but not absolute positional info; and the performance
penalty when comparing long vectors that may be unequal "early on";  I've
concluded that--if it (can be made to) "short circuit"--it would probably
be better to use an implicit loop.  So that's my Q: will (or can) an
implicit loop (be made to) "exit early" if a specified condition is met
before all indices have been checked?
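A minimal sketch of why the proposed check is not sufficient on its own. Dropping the NAs separately from each vector discards their absolute positions, so two vectors whose NAs sit in different places can still pass. The helper name naive_equal is hypothetical and only wraps the expression quoted above.

```r
# The proposed check: compare only the non-NA values of each vector
# (naive_equal is a hypothetical name for illustration)
naive_equal <- function(x, y) all(x[!is.na(x)] == y[!is.na(y)])

x <- c(1, NA, 2)
y <- c(1, 2, NA)

naive_equal(x, y)  # TRUE, although x and y differ at positions 2 and 3
identical(x, y)    # FALSE
```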

You could use the identical() function. When I have vectors of length 1 million, all(x == y) takes about 3 milliseconds when the difference is in the last value, 2 milliseconds when it comes first. identical(x, y) takes about 5 milliseconds when the difference comes last, but 0.006 milliseconds when it comes first. Of course, all(x == y) and identical(x, y) do slightly different tests: read the docs!
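A small sketch of the "slightly different tests" mentioned above. all(x == y) propagates NA, while identical() treats NAs in matching positions as equal but also requires the same type and attributes. na_equal() is a hypothetical helper, not a base R function; it compares element by element with NA matching NA, and unlike identical() it evaluates every element rather than stopping at the first difference.

```r
x <- c(1, NA, 3)
y <- c(1, NA, 3)

all(x == y)        # NA: any comparison involving NA yields NA
identical(x, y)    # TRUE: NA in the same position counts as equal

# identical() also checks type and attributes; all(x == y) does not
all(1:3 == c(1, 2, 3))       # TRUE
identical(1:3, c(1, 2, 3))   # FALSE: integer vs double

# Hypothetical NA-aware elementwise check that ignores type differences
na_equal <- function(x, y) {
  length(x) == length(y) &&
    isTRUE(all(x == y | (is.na(x) & is.na(y))))
}
na_equal(x, y)               # TRUE
na_equal(c(1, NA), c(1, 2))  # FALSE: NA vs 2 is treated as a mismatch
```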


Duncan Murdoch


Thanks again!

DLG

On Sat, Mar 9, 2019 at 9:07 PM Jeff Newmiller <[email protected]>
wrote:

Regarding the mention of logical indexing, under ?Extract I see:

For [-indexing only: i, j, ... can be logical vectors, indicating
elements/slices to select. Such vectors are recycled if necessary to match
the corresponding extent. i, j, ... can also be negative integers,
indicating elements/slices to leave out of the selection.
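A quick illustration of both rules in that passage, using an arbitrary example vector x: a short logical index is recycled along x, and negative integers drop elements instead of selecting them.

```r
x <- c(10, 20, 30, 40, 50, 60)

# A length-2 logical index is recycled to TRUE FALSE TRUE FALSE TRUE FALSE
x[c(TRUE, FALSE)]   # 10 30 50

# Negative integers leave elements out of the selection
x[-c(1, 2)]         # 30 40 50 60
```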

On March 9, 2019 6:57:05 PM PST, Rolf Turner <[email protected]>
wrote:

On 3/10/19 2:36 PM, David Goldsmith wrote:

Hi!  Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
new to statistics (have had grad-level courses and work experience in
statistics) or vectorized programming syntax (have extensive experience
with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time
ago--of experience w/ S-plus).

In exploring the use of is.na in the context of logical indexing, I've come
across the following puzzling-to-me result:

y; !is.na(y[1:3]); y[!is.na(y[1:3])]

[1]  0.3534253 -1.6731597         NA -0.2079209
[1]  TRUE  TRUE FALSE
[1]  0.3534253 -1.6731597 -0.2079209

As you can see, y is a four-element vector, the third element of which is
NA; the next line gives what I would expect--T T F--because the first two
elements are not NA but the third element is.  The third line is what
confuses me: why is the result not the two-element vector consisting of
simply the first two elements of the vector (or, if vectorized indexing in
R is implemented to return a vector the same length as the logical index
vector, which appears to be the case, at least the first two elements and
then either NA or NaN in the third slot, where the logical indexing vector
is FALSE): why does the implementation "go looking" for an element whose
index in the "original" vector, 4, is larger than BOTH the largest index
specified in the inner-most subsetting index AND the size of the resulting
indexing vector?  (Note: at first I didn't even understand why the result
wasn't simply

0.3534253 -1.6731597         NA

but then I realized that the third logical index being FALSE, there was no
reason for *any* element to be there; but if there is, due to some
overriding rule regarding the length of the result relative to the length
of the indexer, shouldn't it revert back to *something* that indicates the
"FALSE"ness of that indexing element?)

Thanks!
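A minimal reproduction of the puzzle, assuming y is re-entered from the printed output above (so the values are rounded to 7 digits). rep_len() writes out the recycled logical mask that [ effectively uses when the index is shorter than y, which is why element 4 is selected.

```r
# y typed in from the printed output above (rounded to 7 digits)
y <- c(0.3534253, -1.6731597, NA, -0.2079209)

idx <- !is.na(y[1:3])   # TRUE TRUE FALSE (length 3)

# [ recycles a short logical index to length(y);
# rep_len() builds that recycled mask explicitly
mask <- rep_len(idx, length(y))   # TRUE TRUE FALSE TRUE
y[idx]                            # elements 1, 2 and 4
identical(y[idx], y[mask])        # TRUE
```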


It happens because R is eco-conscious and re-cycles. :-)

Try:

ok <- c(TRUE,TRUE,FALSE)
(1:4)[ok]

In general in R if there is an operation involving two vectors then
the shorter one gets recycled to provide sufficiently many entries to
match those of the longer vector.

Thus in the foregoing example the first entry of "ok" gets used again,
to make a length 4 vector to match up with 1:4.  The result is the same
as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].

If you did (1:7)[ok] you'd get the same result as that from
(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
recycled 2 and 1/3 times.

Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .

Note that in the first two instances you get warnings, but in the third
you don't, since 6 is an integer multiple of 3.

Why aren't there warnings when logical indexing is used?  I guess
because it would be annoying.  Maybe.
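A small sketch that checks which of these expressions actually warn. warned() is a hypothetical helper written for this example: it records whether evaluating an expression raised a warning and muffles that warning.

```r
# Hypothetical helper: TRUE if evaluating expr raised a warning
warned <- function(expr) {
  w <- FALSE
  withCallingHandlers(expr, warning = function(cond) {
    w <<- TRUE
    invokeRestart("muffleWarning")
  })
  w
}

suppressWarnings(10 * (1:3) + 1:4)   # 11 22 33 14
10 * (1:3) + 1:6                     # 11 22 33 14 25 36

warned(10 * (1:3) + 1:4)             # TRUE: 4 is not a multiple of 3
warned(10 * (1:3) + 1:5)             # TRUE
warned(10 * (1:3) + 1:6)             # FALSE
warned((1:4)[c(TRUE, TRUE, FALSE)])  # FALSE: logical indexing recycles silently
```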

Note that integer indices do not get recycled: each one simply selects
the element at that position.  So

(1:4)[1:3] just (sensibly) gives

[1] 1 2 3

and *not*

[1] 1 2 3 1

Perhaps a bit subtle, but it gives what you'd actually *want* rather
than being pedantic about rules with a result that you wouldn't want.
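A few cases of integer indexing for comparison, on the example vector 1:4. Each integer index selects the element at that position, repeats are allowed, and a position past the end gives NA. The last line shows that a logical index longer than x behaves similarly: the extra TRUE yields NA rather than being dropped.

```r
x <- 1:4

x[1:3]          # 1 2 3: each index picks the element at that position
x[c(1, 1, 2)]   # 1 1 2: repeated positions are allowed
x[5]            # NA: a position beyond length(x)

# A logical index longer than x: the extra TRUE gives NA
x[c(TRUE, TRUE, TRUE, TRUE, TRUE)]   # 1 2 3 4 NA
```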

cheers,

Rolf Turner

P.S.  If you do

y[1:3][!is.na(y[1:3])]

i.e. if you're careful to match the length of the vector and that of
the indices, you get what you initially expected.

R. T.
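A short sketch of the fix described in the P.S., assuming y is re-entered from the printed output earlier in the thread. It also shows a variant with which(), which converts the logical mask into the positions 1 and 2, so only those two elements are selected.

```r
# y typed in from the printed output earlier in the thread
y <- c(0.3534253, -1.6731597, NA, -0.2079209)

# Subset y to the same length as the logical index first
y[1:3][!is.na(y[1:3])]     # 0.3534253 -1.6731597

# which() turns the mask into positions (here 1 and 2)
y[which(!is.na(y[1:3]))]   # 0.3534253 -1.6731597
```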

P^2.S.  To the younger and wiser heads on this list:  the help on "["
does not mention that the index vectors can be logical.  I couldn't find
anything about logical indexing in the R help files.  Is something
missing here, or am I just not looking in the right place?

R. T.





______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


