Re: [Bioc-devel] Multiple colData in SummarizedExperiment
I think the cleaner solution for Davide (if he insists on having separate objects; I decided against it in minfi) is to extend the class to allow this.

Kasper

On Thu, Jun 18, 2015 at 12:25 AM, Ryan r...@thompsonclan.org wrote:
> Oh wow, I didn't know you could put a DataFrame into a single column of another DataFrame. That actually solves a problem for me too (I don't intend to expose nested DataFrames to the users though).

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Multiple colData in SummarizedExperiment
Yes, if a formal extension is warranted. The metadata slot could also be used.

On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen kasperdanielhan...@gmail.com wrote:
> I think the cleaner solution for Davide (if he insists on having separate objects; I decided against it in minfi) is to extend the class to allow this.
>
> Kasper
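A minimal sketch of the metadata-slot option, assuming a small made-up data set (the sample names, the `propAligned` and `dupReads` columns, and their values are all hypothetical). It also illustrates one trade-off: `metadata()` is a plain list that is carried along unchanged, so a table stored there is not subset together with the samples and has to be realigned by name.

```r
library(SummarizedExperiment)

# Toy count matrix: 3 genes x 4 samples (made-up values)
counts <- matrix(1:12, nrow=3,
                 dimnames=list(paste0("g", 1:3), paste0("S", 1:4)))

se <- SummarizedExperiment(
    assays=list(counts=counts),
    colData=DataFrame(condition=c("ctrl", "trt", "ctrl", "trt"),
                      row.names=colnames(counts)))

# Store the sample-quality table in the metadata list instead of colData
metadata(se)$quality <- DataFrame(propAligned=c(0.91, 0.88, 0.95, 0.79),
                                  dupReads=c(1200L, 900L, 1500L, 3000L),
                                  row.names=colnames(se))

# Column subsetting does not touch metadata(): the table keeps all 4 rows
sub <- se[, 1:2]
nrow(metadata(sub)$quality)

# Realign by sample name when the subset table is needed
aligned <- metadata(sub)$quality[colnames(sub), ]
aligned
```

Whether the manual realignment is acceptable probably depends on how often the object gets subset in practice.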
Re: [Bioc-devel] Multiple colData in SummarizedExperiment
You can just implement this by having reserved column names in the colData slot; that will work and will take approx. 23 seconds to implement. I agree it is not as clean from a design perspective, but you get 100% of the functionality and you can write a separate checker for the colData argument.

On Thu, Jun 18, 2015 at 2:00 PM, davide risso risso.dav...@gmail.com wrote:
> Thank you all for the responses. I didn't think about the nested DataFrame solution. It should work.
>
> I agree that an extension might be cleaner, but I clearly need to give it more thought. One of the reasons I wanted to have quality and metadata as separate slots is that one could enforce that all the qualities are numeric, and have a quality() method to extract just the quality scores (e.g., for plotting / quality control). Having them in the same slot makes it harder for the user to extract just the scores (if the column order and/or names are not standardized).
>
> Best,
> davide
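The reserved-column-names idea could look roughly like the sketch below. Everything named here is an assumption for illustration: the reserved names in `qualityCols`, the checker `checkColData()`, and the accessor `sampleQuality()` (standing in for the quality() extractor mentioned in the quoted message) are hypothetical, not part of SummarizedExperiment.

```r
library(SummarizedExperiment)

# Hypothetical reserved names for the quality columns (not a Bioconductor convention)
qualityCols <- c("propAligned", "dupReads")

# Separate checker for the colData argument: the reserved columns must
# be present and numeric
checkColData <- function(cd) {
    absent <- setdiff(qualityCols, colnames(cd))
    if (length(absent) > 0)
        stop("colData is missing reserved quality column(s): ",
             paste(absent, collapse=", "))
    isNum <- vapply(qualityCols, function(nm) is.numeric(cd[[nm]]), logical(1))
    if (!all(isNum))
        stop("non-numeric quality column(s): ",
             paste(qualityCols[!isNum], collapse=", "))
    invisible(TRUE)
}

# Extract just the quality scores, regardless of column order in colData
sampleQuality <- function(se) {
    checkColData(colData(se))
    colData(se)[, qualityCols, drop=FALSE]
}

# Toy data (made-up values)
counts <- matrix(1:12, nrow=3,
                 dimnames=list(paste0("g", 1:3), paste0("S", 1:4)))
cd <- DataFrame(condition=c("ctrl", "trt", "ctrl", "trt"),
                propAligned=c(0.91, 0.88, 0.95, 0.79),
                dupReads=c(1200L, 900L, 1500L, 3000L),
                row.names=colnames(counts))
se <- SummarizedExperiment(assays=list(counts=counts), colData=cd)

q <- sampleQuality(se[, c("S1", "S3")])
q
```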
Re: [Bioc-devel] Multiple colData in SummarizedExperiment
Thanks Kasper,

I think that's a good solution.

Best,
Davide

On Thu, Jun 18, 2015 at 11:51 AM Kasper Daniel Hansen kasperdanielhan...@gmail.com wrote:
> You can just implement this by having reserved column names in the colData slot; that will work and will take approx. 23 seconds to implement. I agree it is not as clean from a design perspective, but you get 100% of the functionality and you can write a separate checker for the colData argument.
Re: [Bioc-devel] Multiple colData in SummarizedExperiment
Thank you all for the responses. I didn't think about the nested DataFrame solution. It should work.

I agree that an extension might be cleaner, but I clearly need to give it more thought. One of the reasons I wanted to have quality and metadata as separate slots is that one could enforce that all the qualities are numeric, and have a quality() method to extract just the quality scores (e.g., for plotting / quality control). Having them in the same slot makes it harder for the user to extract just the scores (if the column order and/or names are not standardized).

Best,
davide

On Thu, Jun 18, 2015 at 6:35 AM Vincent Carey st...@channing.harvard.edu wrote:
> Yes, if a formal extension is warranted. The metadata slot could also be used.
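One way the "enforce numeric qualities and provide an accessor" goal might be combined with the nested DataFrame is a thin subclass whose validity method checks the nested column. This is only a sketch under assumptions: the class name `QCSummarizedExperiment`, the generic `sampleQuality()` (standing in for the quality() method described above), and the toy data are hypothetical, and the sketch keeps the scores in colData rather than adding a new slot, so that the existing subsetting code keeps them aligned with the samples.

```r
library(SummarizedExperiment)

# Hypothetical subclass: validity requires a nested 'quality' DataFrame
# in colData whose columns are all numeric
setClass("QCSummarizedExperiment", contains="SummarizedExperiment",
    validity=function(object) {
        q <- colData(object)$quality
        if (!is(q, "DataFrame"))
            return("colData must contain a nested DataFrame column named 'quality'")
        isNum <- vapply(colnames(q), function(nm) is.numeric(q[[nm]]), logical(1))
        if (!all(isNum))
            return(paste("non-numeric quality column(s):",
                         paste(colnames(q)[!isNum], collapse=", ")))
        TRUE
    })

# Accessor that returns just the quality scores
setGeneric("sampleQuality", function(x) standardGeneric("sampleQuality"))
setMethod("sampleQuality", "QCSummarizedExperiment",
          function(x) colData(x)$quality)

# Toy data (made-up values)
counts <- matrix(1:12, nrow=3,
                 dimnames=list(paste0("g", 1:3), paste0("S", 1:4)))
cd <- DataFrame(condition=c("ctrl", "trt", "ctrl", "trt"),
                row.names=colnames(counts))
cd$quality <- DataFrame(propAligned=c(0.91, 0.88, 0.95, 0.79),
                        row.names=colnames(counts))

qse <- new("QCSummarizedExperiment",
           SummarizedExperiment(assays=list(counts=counts), colData=cd))

# The accessor still works after subsetting the samples
sampleQuality(qse[, 1:2])
```

A dedicated slot would separate the two tables more strictly, but it would also need its own subsetting and combining methods, which is part of why this sketch avoids it.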
Re: [Bioc-devel] Multiple colData in SummarizedExperiment
On 06/17/2015 11:41 AM, davide risso wrote:
> Dear list,
>
> I'm creating an R package to store RNA-seq data of a somewhat large project in which I'm involved. One of the initial goals is to compare different pre-processing pipelines, hence I have multiple expression matrices corresponding to the same samples.
>
> The SummarizedExperiment class seems a good candidate, since I have multiple expression matrices with the same rowData and colData information.
>
> I have several sample-specific variables that I want to store with the object, namely, experimental information (e.g., batch, date, experimental condition, ...) and sample quality (e.g., proportion of aligned reads, total duplicate reads, etc...).
>
> Of course, I can always create one big data frame concatenating the two (experimental info + sample quality), but it seems that both conceptually and practically, it might be useful to have two separate data frames.
>
> Since this seems somewhat a reasonably standard type of information that one would want to carry on, I was wondering if it would be possible / useful to allow the user to have multiple data.frames in the colData slot

Actually, colData() is a DataFrame, and a DataFrame column can contain a DataFrame. So after

    example(SummarizedExperiment)

we could make some faux sample quality data

    quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))

add this as a column in the colData()

    colData(se1)$quality = quality

(or create the SummarizedExperiment from a similar DataFrame up-front) and manage our grouped data

    colData(se1)
    DataFrame with 6 rows and 2 columns
        Treatment     quality
      <character> <DataFrame>
    A        ChIP
    B       Input
    C        ChIP
    D       Input
    E        ChIP
    F       Input

    colData(se1[,1:2])$quality
    DataFrame with 2 rows and 2 columns
              x         y
      <integer> <integer>
    A         1         6
    B         2         5

I'm not sure that this is any less confusing to the end user than having to manage a DataFrameList(), but it does not require any new features.

Martin

> of SummarizedExperiment.
>
> Best,
> Davide

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
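The "create the SummarizedExperiment from a similar DataFrame up-front" variant mentioned above might look like the following sketch. The two assay matrices (standing in for two pre-processing pipelines), the sample and gene names, and the quality columns are all made up for illustration.

```r
library(SummarizedExperiment)

set.seed(1)
samples <- paste0("S", 1:4)
genes <- paste0("g", 1:5)

# Two hypothetical expression matrices for the same samples,
# e.g. the output of two pre-processing pipelines
pipelineA <- matrix(rpois(20, 10), nrow=5, dimnames=list(genes, samples))
pipelineB <- matrix(rpois(20, 10), nrow=5, dimnames=list(genes, samples))

# Experimental information, one row per sample
cd <- DataFrame(batch=c("b1", "b1", "b2", "b2"),
                condition=c("ctrl", "trt", "ctrl", "trt"),
                row.names=samples)

# Sample quality kept as its own DataFrame, nested as a single column
cd$quality <- DataFrame(propAligned=c(0.91, 0.88, 0.95, 0.79),
                        dupReads=c(1200L, 900L, 1500L, 3000L),
                        row.names=samples)

se <- SummarizedExperiment(
    assays=list(pipelineA=pipelineA, pipelineB=pipelineB),
    colData=cd)

# Subsetting the samples carries the nested DataFrame along
sub <- se[, c("S2", "S4")]
colData(sub)$quality
```

The nested column is assigned with `$<-` after the outer DataFrame is built, mirroring the assignment used in the message above.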
Re: [Bioc-devel] Multiple colData in SummarizedExperiment
Oh wow, I didn't know you could put a DataFrame into a single column of another DataFrame. That actually solves a problem for me too (I don't intend to expose nested DataFrames to the users though).

On 6/17/15 7:23 PM, Martin Morgan wrote:
> Actually, colData() is a DataFrame, and a DataFrame column can contain a DataFrame.