Hi Avi,

As Dénes Tóth has rightly diagnosed, you are building an "all or nothing" filter. However, you do not need to explicitly spell out all columns that you want to filter for; the "tidy" way would be to use a helper function like `if_all()` or `if_any()`. Consider this example (I hope I understand your intentions correctly):

```
library(dplyr)

data <- tribble(
  ~first.a, ~first.b, ~first.c,
  1L,       1L,       0L,
  NA,       1L,       0L,
  1L,       0L,       NA,
  NA,       NA,       1L
)
```
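
As an aside, here is a small sketch of why the original `any()`/`all()` approach is "all or nothing". It is my own illustration on the toy data, and the object names (`collapsed`, `per_row`, `kept_collapsed`, `kept_per_row`) are made up for the demonstration. Inside `filter()`, each column is a whole vector, so `any()` reduces everything to a single `TRUE` or `FALSE` instead of giving one value per row:

```r
library(dplyr)

# Same toy data as above, repeated so this block runs on its own.
data <- tribble(
  ~first.a, ~first.b, ~first.c,
  1L,       1L,       0L,
  NA,       1L,       0L,
  1L,       0L,       NA,
  NA,       NA,       1L
)

# any() collapses all of its arguments into ONE logical value ...
collapsed <- with(data, any(!is.na(first.a), !is.na(first.b)))
collapsed  # TRUE

# ... whereas `|` works element-wise and gives one logical per row.
per_row <- with(data, !is.na(first.a) | !is.na(first.b))
per_row    # TRUE TRUE TRUE FALSE

# filter() recycles the single TRUE, so every row is kept,
# including row 4, where both columns are NA.
kept_collapsed <- filter(data, any(!is.na(first.a), !is.na(first.b)))
nrow(kept_collapsed)  # 4

# The element-wise condition drops row 4 as intended.
kept_per_row <- filter(data, !is.na(first.a) | !is.na(first.b))
nrow(kept_per_row)    # 3
```

So base R's `any()` and `all()` are not being altered by dplyr here; they simply answer a question about the whole column rather than about each row.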

Let's say we only want to keep rows that have a non-missing value for either `first.a` or `first.b` (or hypothetical later generations like `second.a` and `second.b` etc.):

```
data |>
  filter(if_any(ends_with(c(".a", ".b")), \(x) !is.na(x)))
```

So: `filter()` (keep observations) `if_any` of the columns ending with `.a` or `.b` is not `NA`. The negation has to be wrapped in an anonymous function, because `!is.na` on its own is not a function and so is not a valid argument. This would yield

```
# A tibble: 3 × 3
  first.a first.b first.c
    <int>   <int>   <int>
1       1       1       0
2      NA       1       0
3       1       0      NA
```

This discards only the row where both of them are missing. Another way of writing this would be

```
data |>
  filter(!if_all(ends_with(c(".a", ".b")), is.na))
```

i.e. don't keep rows where all columns ending in `.a` or `.b` are `NA`, which returns the same result.

Hope this helps,

Lennart Kasserra

On 12.04.24 at 21:52, avi.e.gr...@gmail.com wrote:
Base R has generic functions called any() and all() that I am having trouble
using.
It works fine when I play with it in a base R context as in:
all(any(TRUE, TRUE), any(TRUE, FALSE))
[1] TRUE
all(any(TRUE, TRUE), any(FALSE, FALSE))
[1] FALSE
But in a tidyverse/dplyr environment, it returns wrong answers. Consider this example. I have data I have joined together with pairs of
columns representing a first generation and several other pairs representing
additional generations. I want to consider any pair where at least one of
the pair is not NA as a success. But in order to keep the entire row, I want
all three pairs to have some valid data. This seems like a fairly common
reasonable thing often needed when evaluating data.
So to make it very general, I chose to do something a bit like this:
result <- filter(mydata,
                  all(
                    any(!is.na(first.a), !is.na(first.b)),
                    any(!is.na(second.a), !is.na(second.b)),
                    any(!is.na(third.a), !is.na(third.b))))
I apologize if the formatting is not seen properly. The above logically
should work. And it should be extendable to scenarios where you want at
least one of M columns to contain data as a group with N such groups of any
size.
But since it did not work, I tried a plan that did work and feels silly. I
used mutate() to make new columns such as:
result <-
   mydata |>
   mutate(
     usable.1 = (!is.na(first.a) | !is.na(first.b)),
     usable.2 = (!is.na(second.a) | !is.na(second.b)),
     usable.3 = (!is.na(third.a) | !is.na(third.b)),
     usable = (usable.1 & usable.2 & usable.3)
   ) |>
   filter(usable == TRUE)
The above wastes time and effort making new columns so I can check the
calculations then uses the combined columns to make a Boolean that can be
used to filter the result.
I know this is not the place to discuss dplyr. I want to check first if I am
doing anything wrong in how I use any/all. One guess is that the generic is
messed with by dplyr or other packages I libraried.
And, of course, some aspects of delayed evaluation can interfere in subtle
ways.
I note I have had other problems with these base R functions before and
generally solved them by not using them, as shown above. I would much rather
use them, or something similar.
Avi

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
