Hello,

You can pass a grouped tibble to a function with group_modify(), but the function must return a data frame (or similar, such as a tibble).

## this will also do it
#sillyFun <- function(tib){
#  tibble(nrow = nrow(tib), ncol = ncol(tib))
#}


sillyFun <- function(tib){
  data.frame(nrow = nrow(tib), ncol = ncol(tib))
}

tib %>%
  group_by(y) %>%
  group_modify(~ sillyFun(.))
## A tibble: 3 x 3
## Groups:   y [3]
#      y  nrow  ncol
#  <dbl> <int> <int>
#1     1    17     2
#2     2    21     2
#3     3    12     2
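
Note that ncol is 2 above, not 3: by default group_modify() drops the grouping column from each chunk before calling the function. Below is a minimal sketch of how to keep it, assuming dplyr >= 1.0.0 (where the argument is called .keep). The seed and the tibble() construction are my own stand-ins for the data in the question, so the counts will differ from those shown above.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
n <- 50
tib <- tibble(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

sillyFun <- function(tib) {
  data.frame(nrow = nrow(tib), ncol = ncol(tib))
}

# Default (.keep = FALSE): each chunk arrives without y, so ncol is 2
res_default <- tib %>%
  group_by(y) %>%
  group_modify(~ sillyFun(.x))

# .keep = TRUE: each chunk still contains y, so ncol is 3
res_keep <- tib %>%
  group_by(y) %>%
  group_modify(~ sillyFun(.x), .keep = TRUE)
```

Either way, the function's return value must not itself contain the grouping column, since group_modify() adds it back.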


Hope this helps,

Rui Barradas

At 09:43 on 05/07/2020, Chris Evans wrote:
Apologies if this is a stupid question but searching keeps getting things I 
know and don't need.

What I want to do is to use the group_by() power of dplyr to run functions that 
expect a dataframe/tibble per group but I can't see how to do it. Here is a 
reproducible example.

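### load the packages used below and create trivial tibble
library(dplyr)  # group_by(), summarise(), as_tibble()
library(tidyr)  # unnest_wider()
n <- 50
x <- 1:n
y <- sample(1:3, n, replace = TRUE)
z <- rnorm(n)
tib <- as_tibble(cbind(x,y,z))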

### create trivial function that expects a tibble/data frame
sillyFun <- function(tib){
  return(list(nrow = nrow(tib),
              ncol = ncol(tib)))
}

### works fine on the whole tibble
tib %>%
  summarise(dim = list(sillyFun(.))) %>%
  unnest_wider(dim)

That gives me:
# A tibble: 1 x 2
    nrow  ncol
   <int> <int>
1    50     3


### So I try the following hoping to apply the function to the grouped tibble
tib %>%
  group_by(y) %>%
  summarise(dim = list(sillyFun(.))) %>%
  unnest_wider(dim)

### But that gives me:
# A tibble: 3 x 3
       y  nrow  ncol
   <dbl> <int> <int>
1     1    50     3
2     2    50     3
3     3    50     3

Clearly "." is still passing the whole tibble, not the grouped subsets.  What I can't 
find is whether there is an alternative to "." that would pass just the grouped subset of 
the tibble.
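
For what it is worth, one possible alternative to "." inside summarise() is pick(everything()), which returns the current group's rows as a tibble. This is only a sketch and assumes dplyr >= 1.1.0; in dplyr 1.0.x the equivalent was cur_data(), which has since been deprecated. The seed and the tibble() construction are stand-ins for the example data above.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
n <- 50
tib <- tibble(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

sillyFun <- function(tib) {
  list(nrow = nrow(tib), ncol = ncol(tib))
}

# pick(everything()) is evaluated per group, unlike the magrittr ".",
# which always refers to the whole tibble on the left of the pipe
res <- tib %>%
  group_by(y) %>%
  summarise(dim = list(sillyFun(pick(everything())))) %>%
  unnest_wider(dim)
```

The grouping column is not part of what pick() returns, so ncol comes out as 2 for every group here, and the nrow values sum to 50.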

I have bodged my way around this by writing a function that takes individual 
columns and reassembles them into a data frame that the actual functions I need 
to use require but that takes me back to a lot of clumsiness both selecting the 
variables to pass in the dplyr call to the function and putting the 
reassemble-to-data-frame bit in the function I call.  (The functions I really 
need are reliability explorations and can be called on whole dataframes.)

I know I can do this using base R split and lapply but I feel sure it must be 
possible to do this within dplyr/tidyverse.  I'm slowly transferring most of my 
code to the tidyverse and hitting frustrations but also finding that it does 
really help me program more sensibly, handle relational data structures more 
easily, and write code that I seem better at reading when I come back to it 
after months on other things so I am slowly trying to move all my coding to 
tidyverse.  If I could see how to do this, it would help.
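
A rough tidyverse counterpart to the base R split() and lapply() route is group_split() followed by purrr::map(). This is a sketch under the same assumptions as the example above (the seed and the tibble() construction are stand-ins), not necessarily the neatest way to do it.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
n <- 50
tib <- tibble(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

sillyFun <- function(tib) {
  list(nrow = nrow(tib), ncol = ncol(tib))
}

# group_split() returns a plain list of tibbles, one per value of y;
# by default each piece keeps the y column
chunks <- tib %>% group_split(y)

# Apply the function to each piece and stack the results row-wise
res <- chunks %>%
  map(~ as_tibble(sillyFun(.x))) %>%
  bind_rows()
```

Unlike group_modify(), this does not attach the group keys to the result, so the value of y for each row has to be recovered separately (for example from the pieces themselves).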

Very sorry if the answer should be blindingly obvious to me.  I'd also love to 
have pointers to guidance to the tidyverse written for people who aren't 
professional coders or statisticians and that go a bit beyond the obvious 
basics of tidyverse into issues like this.

TIA,

Chris


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.