[R] Tidyverse/dplyr solution for filling values of a tibble/dataframe from a column with a nested list.

Deramus, Thomas Patrick Thu, 18 Apr 2024 14:02:01 -0700

Hi experts.

I have a tibble with a column containing a nested list (<list<list<double>>> 
data type to be specific).


Looks something like the following (but in R/Arrow format):
ID   Nestedvals
001  [[1]](1,0.1)[[2]](2,0.2)[[3]](3,0.3)[[4]](4,0.4)[[5]](5,0.5)
002  [[1]](1,0.1)[[2]](2,0.2)[[3]](3,0.3)[[4]](4,0.4)
003  [[1]](1,0.1)[[2]](2,0.2)[[3]](3,0.3)
004  [[1]](1,0.1)[[2]](2,0.2)
005  [[1]](1,0.1)

Basically, each inner element is a pair of doubles: the first is a specific 
index (based on Python's 0-based indexing), and the second is the value for 
that index (e.g. 0.5).

What I would like to do is generate a set of columns based on the range of 
unique indexes across the nested lists, e.g.:
col_1, col_2, col_3, col_4, col_5

Which I have done with the following:
  tibble[paste0("col_", 1:5)] <- 0
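A sketch of deriving the column range from the data instead of hard-coding `1:5`. It assumes the simplified plain-list example data given near the end of this post (not the Arrow `list_of` class) and uses a hypothetical data frame name `df`, chosen so the `tibble()` function is not masked:

```r
library(tibble)

# Example data from the post: a plain list column, not the Arrow/vctrs class
df <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

# First element (the index) of every pair, across all rows
all_idx <- unlist(lapply(df$nestedvals, function(pairs) {
  vapply(pairs, `[[`, numeric(1), 1)
}))

# One zero-filled column per index in the observed range
idx_range <- seq(min(all_idx), max(all_idx))
df[paste0("col_", idx_range)] <- 0
```

Whether the range should be `seq(min, max)` or only the indices that actually occur (`sort(unique(all_idx))`) depends on whether always-zero columns are wanted.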

Then I want to replace each 0 with the value (the second number in each pair) 
in the column given by the index (the first number in each pair), for each row 
of the tibble.
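One way to do this replacement without a per-row loop is base-R matrix indexing, a swapped-in technique rather than a dplyr verb: flatten all pairs, record which row each came from, and assign every value into a zero matrix in a single step. This is a sketch assuming the plain-list example data near the end of this post, a hypothetical `df`, and indices that are positive whole numbers usable directly as column positions:

```r
library(tibble)
library(dplyr)

df <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

# Flatten one level: a single list of all (index, value) pairs, row 1's first
pairs  <- unlist(df$nestedvals, recursive = FALSE)
# Which tibble row each pair came from
row_id <- rep(seq_len(nrow(df)), lengths(df$nestedvals))
col_id <- vapply(pairs, `[[`, numeric(1), 1)  # index -> target column
vals   <- vapply(pairs, `[[`, numeric(1), 2)  # value to place there

# Zero matrix with one column per index (assumes indices run 1..max)
n_col <- max(col_id)
out <- matrix(0, nrow = nrow(df), ncol = n_col,
              dimnames = list(NULL, paste0("col_", seq_len(n_col))))

# Fill every (row, index) cell at once via a two-column index matrix
out[cbind(row_id, col_id)] <- vals

result <- bind_cols(df, as_tibble(out))
```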

I wrote a function to split each nested list:

  # Pull element y (1 = index, 2 = value) out of every pair in x
  nestsplit <- function(x, y) {
    unlist(lapply(x, `[[`, y))
  }
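With the backquoted `[[`, `lapply()` calls the extraction operator on each pair, so `nestsplit()` returns either the indices or the values of one row as a plain vector. A quick check on a single row, using a hypothetical `row_pairs` copied from the example data:

```r
# Helper from the post: extract element `y` from every pair in `x`
nestsplit <- function(x, y) {
  unlist(lapply(x, `[[`, y))
}

# A single row's nested list, copied from the example data
row_pairs <- list(c(1, 0.1), c(2, 0.2), c(3, 0.3))

nestsplit(row_pairs, 1)  # 1 2 3        (the indices)
nestsplit(row_pairs, 2)  # 0.1 0.2 0.3  (the values)
```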

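I then used it to build, for each row, list-columns holding the target column 
names (from the indices) and the values of interest, to append to the tibble:

  library(dplyr)

  tibble <- tibble |>
    rowwise() |>
    mutate(
      # column names such as "col_1", built from the first number of each pair
      index_names = list(paste0("col_", as.character(nestsplit(nestedvals, 1)))),
      # the matching values, taken from the second number of each pair
      index_values = list(nestsplit(nestedvals, 2))
    )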

But I would like to see if there is an efficient, tidyverse/dplyr-based 
solution to individually assign these values rather than writing a loop to 
assign each of them by row.

So that an output like this:

ID   Nestedvals                           col_1  col_2  col_3  col_4  col_5
001  <Nested list of 5 pairs of values>   0      0      0      0      0
002  <Nested list of 4 pairs of values>   0      0      0      0      0
003  <Nested list of 3 pairs of values>   0      0      0      0      0
004  <Nested list of 2 pairs of values>   0      0      0      0      0
005  <Nested list of 1 pair of values>    0      0      0      0      0


Looks instead like the following:
ID   Nestedvals                           col_1  col_2  col_3  col_4  col_5
001  <Nested list of 5 pairs of values>   0.1    0.2    0.3    0.4    0.5
002  <Nested list of 4 pairs of values>   0.1    0.2    0.3    0.4    0
003  <Nested list of 3 pairs of values>   0.1    0.2    0.3    0      0
004  <Nested list of 2 pairs of values>   0.1    0.2    0      0      0
005  <Nested list of 1 pair of values>    0.1    0      0      0      0
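A possible tidyverse sketch for producing the second table: tidyr's `unnest_longer()` to get one row per pair, `pivot_wider()` with `values_fill = 0` to spread the pairs into columns, and a `left_join()` to reattach them. It assumes the plain-list example data near the end of this post and a hypothetical `df`; it has not been checked against the Arrow `arrow_list` class itself:

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

wide <- df |>
  # Keep only ID and a renamed copy of the pairs; df$nestedvals is untouched
  select(ID, pair = nestedvals) |>
  # One row per (index, value) pair
  unnest_longer(pair) |>
  # Split each length-2 pair into its index and its value
  mutate(
    index = vapply(pair, `[[`, numeric(1), 1),
    value = vapply(pair, `[[`, numeric(1), 2)
  ) |>
  select(-pair) |>
  # One column per index; indices a row lacks are filled with 0
  pivot_wider(names_from = index, names_prefix = "col_",
              values_from = value, values_fill = 0)

# Reattach the new columns to the original rows, keeping nestedvals
result <- left_join(df, wide, by = "ID")
```

The join is a design choice: pivoting on `ID` alone means the list column never has to serve as a `pivot_wider()` identifier column.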

-------------------------------------------------------------------------------------------------------------------------

I would love to give an example to simulate the exact nature of the data, but 
I'm unfortunately not sure how to recreate this class for an example:
> typeof(tibble$var)
[1] "list"
> class(tibble$var)
[1] "arrow_list"    "vctrs_list_of" "vctrs_vctr"    "list"

The closest I have ever been able to get is with:

library(tibble)
tibble(ID = c("001", "002", "003", "004", "005"),
       nestedvals = list(list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
                         list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
                         list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
                         list(c(1, 0.1), c(2, 0.2)),
                         list(c(1, 0.1))))

This gives a plain list column instead of <list<list<double>>>.
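A closer approximation may be `vctrs::list_of()` nested one level deep, which yields the `vctrs_list_of` class seen in the `class()` output above. That this matches what the Arrow conversion produces is an assumption, and the result will not carry the `arrow_list` class:

```r
library(tibble)
library(vctrs)

# Nested list_of: each row is a list_of<double> holding (index, value) pairs
df <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list_of(
    list_of(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list_of(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list_of(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list_of(c(1, 0.1), c(2, 0.2)),
    list_of(c(1, 0.1))
  )
)

class(df$nestedvals)
```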

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
