Re: [R] any and all

Lennart Kasserra Sat, 13 Apr 2024 07:16:49 -0700

Hi Avi,

As Dénes Tóth has rightly diagnosed, you are building an "all or nothing" filter. However, you do not need to explicitly spell out all columns that you want to filter for; the "tidy" way would be to use a helper function like `if_all()` or `if_any()`. Consider this example (I hope I understand your intentions correctly):


```

library(dplyr)


data <- tribble(
  ~first.a, ~first.b, ~first.c,
  1L,        1L,       0L,
  NA,       1L,       0L,
  1L,        0L,       NA,
  NA,       NA,       1L
)

```
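To make the "all or nothing" behaviour concrete, here is a small sketch on the same example data (the names `keep` and `keep_rowwise` are only illustrative). `any()` and `all()` reduce everything they are given to a single `TRUE` or `FALSE`, so inside `filter()` that one value is applied to every row, whereas `|` and `&` work element by element:

```r
library(dplyr)

data <- tribble(
  ~first.a, ~first.b, ~first.c,
  1L,       1L,       0L,
  NA,       1L,       0L,
  1L,       0L,       NA,
  NA,       NA,       1L
)

# any() collapses both whole columns into one logical value
keep <- any(!is.na(data$first.a), !is.na(data$first.b))
keep
# [1] TRUE

# `|` compares element by element and returns one value per row
keep_rowwise <- !is.na(data$first.a) | !is.na(data$first.b)
keep_rowwise
# [1]  TRUE  TRUE  TRUE FALSE

# The single TRUE is recycled by filter(), so every row is kept,
# including the last one where both columns are NA
n_kept <- nrow(filter(data, any(!is.na(first.a), !is.na(first.b))))
n_kept
# [1] 4
```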

Let's say we only want to keep rows that have a non-missing value for either `first.a` or `first.b` (or hypothetical later generations like `second.a` and `second.b` etc.):


```

data |>
  filter(if_any(ends_with(c(".a", ".b")), \(x) !is.na(x)))

```

So: `filter()` (keep observations) `if_any` of the columns ending with .a or .b is not `NA` (we have to wrap `!is.na` into an anonymous function for it to be a valid argument type). This would yield


```

# A tibble: 3 × 3
  first.a first.b first.c
    <int>   <int>   <int>
1       1       1       0
2      NA       1       0
3       1       0      NA

```

Discarding only the row where both of them are missing. Another way of writing this would be


```

data |>
  filter(!if_all(ends_with(c(".a", ".b")), is.na))

```

i.e. don't keep rows where all columns ending in .a or .b are `NA`, which returns the same result. Hope this helps,


Lennart Kasserra

On 12.04.24 at 21:52, avi.e.gr...@gmail.com wrote:

Base R has generic functions called any() and all() that I am having trouble
using.

It works fine when I play with it in a base R context as in:

all(any(TRUE, TRUE), any(TRUE, FALSE))

[1] TRUE

all(any(TRUE, TRUE), any(FALSE, FALSE))

[1] FALSE

But in a tidyverse/dplyr environment, it returns wrong answers.

Consider this example. I have data I have joined together with pairs of
columns representing a first generation and several other pairs representing
additional generations. I want to consider any pair where at least one of
the pair is not NA as a success. But in order to keep the entire row, I want
all three pairs to have some valid data. This seems like a fairly common
reasonable thing often needed when evaluating data.

So to make it very general, I chose to do something a bit like this:

result <- filter(mydata,
                  all(
                    any(!is.na(first.a), !is.na(first.b)),
                    any(!is.na(second.a), !is.na(second.b)),
                    any(!is.na(third.a), !is.na(third.b))))

I apologize if the formatting is not seen properly. The above logically
should work. And it should be extendable to scenarios where you want at
least one of M columns to contain data as a group with N such groups of any
size.

But since it did not work, I tried a plan that did work and feels silly. I
used mutate() to make new columns such as:

result <-
   mydata |>
   mutate(
     usable.1 = (!is.na(first.a) | !is.na(first.b)),
     usable.2 = (!is.na(second.a) | !is.na(second.b)),
     usable.3 = (!is.na(third.a) | !is.na(third.b)),
     usable = (usable.1 & usable.2 & usable.3)
   ) |>
   filter(usable == TRUE)

The above wastes time and effort making new columns so I can check the
calculations then uses the combined columns to make a Boolean that can be
used to filter the result.
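The helper columns are not strictly required. As a minimal sketch, assuming a small made-up `mydata` (only the column names are taken from the post above, the values are invented for illustration), the same row-wise conditions can be passed straight to `filter()`, which combines comma-separated conditions with a logical AND:

```r
library(dplyr)

# Hypothetical stand-in for `mydata`; the values are invented
mydata <- tribble(
  ~first.a, ~first.b, ~second.a, ~second.b, ~third.a, ~third.b,
  1L,       NA,       2L,        NA,        NA,       3L,  # every pair has data
  NA,       NA,       2L,        2L,        3L,       3L,  # first pair is empty
  1L,       1L,       NA,        2L,        NA,       NA   # third pair is empty
)

# Each condition is evaluated element by element, one value per row;
# a row is kept only if all three conditions are TRUE for it
result <- mydata |>
  filter(
    !is.na(first.a)  | !is.na(first.b),
    !is.na(second.a) | !is.na(second.b),
    !is.na(third.a)  | !is.na(third.b)
  )

nrow(result)
# [1] 1
```

Only the first row survives, because it is the only one in which each of the three pairs has at least one non-missing value.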

I know this is not the place to discuss dplyr. I want to check first if I am
doing anything wrong in how I use any/all. One guess is that the generic is
messed with by dplyr or other packages I libraried.

And, of course, some aspects of delayed evaluation can interfere in subtle
ways.

I note I have had other problems with these base R functions before and
generally solved them by not using them, as shown above. I would much rather
use them, or something similar.

Avi


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

