Re: [Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-17 Thread Hervé Pagès

On 16/12/2021 23:00, Dario Strbenac via Bioc-devel wrote:


Hello,

Ah, yes, the sample names should of course be in the rows - Friday afternoon error. In 
the question, I specified "largely the same set of features", implying that the 
overlap is not complete. So, the example below will error.

DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                     Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
unlist(DFL)
Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
   the DFrame objects to combine must have the same column names


unlist() uses rbind() internally to combine the rows, and rbind() wants to see 
the same columns in all the DataFrames to combine. combineRows() is a more 
flexible version of rbind() that was added in BioC 3.13:

  do.call(combineRows, unname(as.list(DFL)))

  # DataFrame with 6 rows and 3 columns
  #           a         b         c
  #   <integer> <integer> <integer>
  # A         1         3        NA
  # B         2         2        NA
  # C         3         1        NA
  # T        NA         4         6
  # U        NA         5         5
  # V        NA         6         4
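
To also record which list element each row came from (the original question), the list names can be repeated once per row and stored as an ordinary column. This is only a minimal sketch: the column name "dataset" is an assumption made for illustration, and it requires a Bioconductor release that has combineRows() (3.13 or later).

```r
library(IRanges)  # provides DataFrameList(); attaches S4Vectors as well

DFL <- DataFrameList(
  X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
  Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22])
)

# Row-bind the elements; columns missing from an element are filled with NA.
combined <- do.call(combineRows, unname(as.list(DFL)))

# "dataset" is a hypothetical column name: one list name per original row.
nRows <- vapply(as.list(DFL), nrow, integer(1))
combined$dataset <- rep(names(DFL), times = nRows)

combined
```

The row names still carry the sample identifiers, so the extra column only adds the data set of origin.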

If you want to discuss this further, please ask on the support site.

H.




This is long but works:

allFeatures <- unique(unlist(lapply(DFL, colnames)))
DFL <- lapply(DFL, function(DF)
{
   missingFeatures <- setdiff(allFeatures, colnames(DF))
   DF[missingFeatures] <- NA
   DF
})
DFLflattened <- do.call(rbind, DFL)

Is there a one-line function for it?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-17 Thread Michael Lawrence via Bioc-devel
This is more of a support site question.

The stack() function is relevant here, but it won't fill in the missing columns.

Note though that there are some conveniences that might help a tiny
bit, like how colnames(DFL) returns a CharacterList, so you can do
unique(unlist(colnames(DFL))).
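
As a rough sketch of that convenience, using the DFL from the quoted example below: the union of column names comes straight from colnames(DFL), and the columns each element lacks can be listed with plain setdiff(). The variable names are made up for illustration, and this only reports the missing columns; it does not insert them.

```r
library(IRanges)  # provides DataFrameList(); attaches S4Vectors as well

DFL <- DataFrameList(
  X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
  Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22])
)

# colnames() on the list returns one character vector per element.
allFeatures <- unique(unlist(colnames(DFL), use.names = FALSE))

# For each element, the features it lacks relative to the union.
missingByElement <- lapply(as.list(colnames(DFL)),
                           function(cn) setdiff(allFeatures, cn))
```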

In theory we could make [<-() on a DataFrameList behave more like its
SplitDataFrameList derivative and insert columns into each of its
elements, so you could do something like:

DFL[,psetdiff(unique(unlist(colnames(DFL))), colnames(DFL))] <- NA

I don't know if psetdiff() would work in that way, but it could.

Michael

On Thu, Dec 16, 2021 at 11:01 PM Dario Strbenac via Bioc-devel
 wrote:
>
> Hello,
>
> Ah, yes, the sample names should of course be in the rows - Friday afternoon 
> error. In the question, I specified "largely the same set of features", 
> implying that the overlap is not complete. So, the example below will error.
>
> DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
>                      Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
> unlist(DFL)
> Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
>   the DFrame objects to combine must have the same column names
>
> This is long but works:
>
> allFeatures <- unique(unlist(lapply(DFL, colnames)))
> DFL <- lapply(DFL, function(DF)
> {
>   missingFeatures <- setdiff(allFeatures, colnames(DF))
>   DF[missingFeatures] <- NA
>   DF
> })
> DFLflattened <- do.call(rbind, DFL)
>
> Is there a one-line function for it?
>
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia



-- 
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com




Re: [Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-16 Thread Dario Strbenac via Bioc-devel
Hello,

Ah, yes, the sample names should of course be in the rows - Friday afternoon 
error. In the question, I specified "largely the same set of features", 
implying that the overlap is not complete. So, the example below will error.

DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                     Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
unlist(DFL)
Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
  the DFrame objects to combine must have the same column names

This is long but works:

allFeatures <- unique(unlist(lapply(DFL, colnames)))
DFL <- lapply(DFL, function(DF)
{
  missingFeatures <- setdiff(allFeatures, colnames(DF))
  DF[missingFeatures] <- NA
  DF
})
DFLflattened <- do.call(rbind, DFL)

Is there a one-line function for it?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


Re: [Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-16 Thread Hervé Pagès
A metadata column on a DataFrame runs along its 2nd dimension so is not 
a good place to put the list names.


Have you tried unlist()?

  library(S4Vectors)

  DF <- DataFrame(id=letters[1:10], score=runif(10))

  f <- sample(LETTERS[1:3], 10, replace=TRUE)

  DFL <- split(DF, f)

  DFL
  # SplitDataFrameList of length 3
  # $A
  # DataFrame with 2 rows and 2 columns
  #            id     score
  #   <character> <numeric>
  # 1           f  0.894709
  # 2           h  0.801125
  #
  # $B
  # DataFrame with 1 row and 2 columns
  #            id     score
  #   <character> <numeric>
  # 1           d  0.538166
  #
  # $C
  # DataFrame with 7 rows and 2 columns
  #            id     score
  #   <character> <numeric>
  # 1           a 0.0145477
  # 2           b 0.2507581
  # 3           c 0.4388678
  # 4           e 0.5219524
  # 5           g 0.6377634
  # 6           i 0.1892103
  # 7           j 0.1829650

  unlist(DFL)

  # DataFrame with 10 rows and 2 columns
  #            id     score
  #   <character> <numeric>
  # A           f 0.8947085
  # A           h 0.8011255
  # B           d 0.5381664
  # C           a 0.0145477
  # C           b 0.2507581
  # C           c 0.4388678
  # C           e 0.5219524
  # C           g 0.6377634
  # C           i 0.1892103
  # C           j 0.1829650
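
If the list names are wanted as a regular column rather than as (duplicated) row names, one option is to repeat each name by the number of rows in its element. The following is just a sketch: the column name "group" is an assumption, and a fixed grouping vector replaces sample() so that the result is reproducible.

```r
library(IRanges)  # attaches S4Vectors; needed for the split DataFrame list

DF <- DataFrame(id = letters[1:10], score = seq(0.1, 1, by = 0.1))

# Fixed grouping instead of sample(), so the result is deterministic.
f <- rep(LETTERS[1:3], times = c(2, 1, 7))
DFL <- split(DF, f)

flat <- unlist(DFL)

# "group" is a hypothetical column name holding the originating list name.
flat$group <- rep(names(DFL), times = elementNROWS(DFL))

flat
```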

BTW this is a user question so is more appropriate for the support site.

H.

On 16/12/2021 22:00, Dario Strbenac via Bioc-devel wrote:

Good day,

Is there a function in the S4Vectors API which converts a 
DataFrameList into a DataFrame, automatically putting the list names 
into one of the metadata columns, analogous to MultiAssayExperiment's 
wideFormat function? The scenario is multiple data sets from different 
organisations measuring largely the same set of features and 
patient outcome, but on completely different sets of patients in each 
organisation.


--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



[Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-16 Thread Dario Strbenac via Bioc-devel
Good day,

Is there a function in the S4Vectors API which converts a DataFrameList into a 
DataFrame, automatically putting the list names into one of the metadata 
columns, analogous to MultiAssayExperiment's wideFormat function? The scenario 
is multiple data sets from different organisations measuring largely the 
same set of features and patient outcome, but on completely different sets of 
patients in each organisation.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia