Re: [Bioc-devel] DataFrameList to Wide Format DataFrame
On 16/12/2021 23:00, Dario Strbenac via Bioc-devel wrote:
> Hello,
>
> Ah, yes, the sample names should of course be in the rows - Friday afternoon
> error. In the question, I specified "largely the same set of features",
> implying that the overlap is not complete. So, the example below will error.
>
> DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
>                      Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
> unlist(DFL)
> Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
>   the DFrame objects to combine must have the same column names

unlist() uses rbind() internally to combine the rows, and rbind() wants to see the same columns in all the DataFrame objects to combine. combineRows() is a more flexible version of rbind() that was added in BioC 3.13:

do.call(combineRows, unname(as.list(DFL)))
# DataFrame with 6 rows and 3 columns
#           a         b         c
#   <integer> <integer> <integer>
# A         1         3        NA
# B         2         2        NA
# C         3         1        NA
# T        NA         4         6
# U        NA         5         5
# V        NA         6         4

If you want to discuss this further, please ask on the support site.

H.

> This is long but works:
>
> allFeatures <- unique(unlist(lapply(DFL, colnames)))
> DFL <- lapply(DFL, function(DF)
> {
>   missingFeatures <- setdiff(allFeatures, colnames(DF))
>   DF[missingFeatures] <- NA
>   DF
> })
> DFLflattened <- do.call(rbind, DFL)
>
> Is there a one-line function for it?

--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
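The original question also asked for the list names to end up in a column of the result. A minimal sketch of one way to do that on top of the combineRows() call above, assuming the IRanges package is installed (it provides DataFrameList() and attaches S4Vectors). The column name "dataset" and the variable names are made up for illustration and are not part of any API:

```r
library(IRanges)  # provides DataFrameList(); also attaches S4Vectors

# Same toy data as in the thread: partially overlapping columns.
DFL <- DataFrameList(
    X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
    Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))

# Combine the rows; columns absent from an element are filled with NA.
# unname() stops do.call() from passing the list names as argument names.
combined <- do.call(combineRows, unname(as.list(DFL)))

# Record which list element each row came from.
# "dataset" is a hypothetical column name chosen for this sketch.
rowsPerElement <- vapply(as.list(DFL), nrow, integer(1))
combined$dataset <- rep(names(DFL), times = rowsPerElement)

combined
```

This relies on combineRows() keeping the rows in the order of the list elements, which is what the output shown in the thread suggests.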
Re: [Bioc-devel] DataFrameList to Wide Format DataFrame
This is more of a support site question. The stack() function is relevant here, but it won't fill in the missing columns. Note though that there are some conveniences that might help a tiny bit, like how colnames(DFL) returns a CharacterList, so you can do unique(unlist(colnames(DFL))).

In theory we could make [<-() on a DataFrameList behave more like its SplitDataFrameList derivative and insert columns into each of its elements, so you could do something like:

DFL[,psetdiff(unique(unlist(colnames(DFL))), colnames(DFL))] <- NA

I don't know if psetdiff() would work in that way, but it could.

Michael

--
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com
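For reference, a small sketch of the colnames() convenience mentioned above, using the example data from the thread. It only computes which columns each element lacks. It does not use psetdiff(), whose behaviour here is left open in the message, and it assumes IRanges is installed. The variable names are illustrative:

```r
library(IRanges)  # provides DataFrameList(); also attaches S4Vectors

DFL <- DataFrameList(
    X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
    Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))

# colnames() on a DataFrameList gives one character vector per element;
# as.list() turns the CharacterList into an ordinary list.
perElement <- as.list(colnames(DFL))

# Union of all column names across the list.
allFeatures <- unique(unlist(perElement, use.names = FALSE))

# For each element, the columns it is missing relative to the union.
missingFeatures <- lapply(perElement, function(cn) setdiff(allFeatures, cn))

allFeatures
missingFeatures
```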
Re: [Bioc-devel] DataFrameList to Wide Format DataFrame
Hello,

Ah, yes, the sample names should of course be in the rows - Friday afternoon error. In the question, I specified "largely the same set of features", implying that the overlap is not complete. So, the example below will error.

DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                     Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
unlist(DFL)
Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
  the DFrame objects to combine must have the same column names

This is long but works:

allFeatures <- unique(unlist(lapply(DFL, colnames)))
DFL <- lapply(DFL, function(DF)
{
  missingFeatures <- setdiff(allFeatures, colnames(DF))
  DF[missingFeatures] <- NA
  DF
})
DFLflattened <- do.call(rbind, DFL)

Is there a one-line function for it?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] DataFrameList to Wide Format DataFrame
A metadata column on a DataFrame runs along its 2nd dimension so is not a good place to put the list names. Have you tried unlist()?

library(S4Vectors)
DF <- DataFrame(id=letters[1:10], score=runif(10))
f <- sample(LETTERS[1:3], 10, replace=TRUE)
DFL <- split(DF, f)

DFL
# SplitDataFrameList of length 3
# $A
# DataFrame with 2 rows and 2 columns
#            id     score
#   <character> <numeric>
# 1           f  0.894709
# 2           h  0.801125
#
# $B
# DataFrame with 1 row and 2 columns
#            id     score
#   <character> <numeric>
# 1           d  0.538166
#
# $C
# DataFrame with 7 rows and 2 columns
#            id     score
#   <character> <numeric>
# 1           a 0.0145477
# 2           b 0.2507581
# 3           c 0.4388678
# 4           e 0.5219524
# 5           g 0.6377634
# 6           i 0.1892103
# 7           j 0.1829650

unlist(DFL)
# DataFrame with 10 rows and 2 columns
#            id     score
#   <character> <numeric>
# A           f 0.8947085
# A           h 0.8011255
# B           d 0.5381664
# C           a 0.0145477
# C           b 0.2507581
# C           c 0.4388678
# C           e 0.5219524
# C           g 0.6377634
# C           i 0.1892103
# C           j 0.1829650

BTW this is a user question so is more appropriate for the support site.

H.

On 16/12/2021 22:00, Dario Strbenac via Bioc-devel wrote:
> Good day,
>
> Is there a function in the S4Vectors API which converts a DataFrameList into
> a DataFrame, automatically putting the list names into one of the metadata
> columns, analogous to MultiAssayExperiment's wideFormat function? The
> scenario is multiple data sets from different organisations measuring
> largely the same set of features and patient outcome, but on completely
> different sets of patients in each organisation.

--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
[Bioc-devel] DataFrameList to Wide Format DataFrame
Good day,

Is there a function in the S4Vectors API which converts a DataFrameList into a DataFrame, automatically putting the list names into one of the metadata columns, analogous to MultiAssayExperiment's wideFormat function? The scenario is multiple data sets from different organisations measuring largely the same set of features and patient outcome, but on completely different sets of patients in each organisation.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia