Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread Kasper Daniel Hansen
I think the cleaner solution for Davide (if he insists on having separate
objects; I decided against it in minfi) is to extend the class to allow
this.

Kasper
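
A minimal sketch of what such an extension might look like, assuming the current SummarizedExperiment and S4Vectors packages. The class name QCSummarizedExperiment, the quality slot, the quality() generic and the toy data are all hypothetical; nothing here is taken from minfi or from Davide's package.

```r
library(SummarizedExperiment)
library(S4Vectors)

# Hypothetical subclass: a SummarizedExperiment plus a separate 'quality' slot.
setClass("QCSummarizedExperiment",
         contains = "SummarizedExperiment",
         slots = c(quality = "DataFrame"))

# Require one row of numeric quality scores per sample (column).
setValidity("QCSummarizedExperiment", function(object) {
    q <- object@quality
    if (nrow(q) != ncol(object))
        return("'quality' must have one row per sample")
    if (!all(vapply(as.list(q), is.numeric, logical(1))))
        return("all 'quality' columns must be numeric")
    TRUE
})

# Hypothetical accessor for the extra slot.
setGeneric("quality", function(x, ...) standardGeneric("quality"))
setMethod("quality", "QCSummarizedExperiment", function(x, ...) x@quality)

# Toy data: 4 features x 6 samples (made up for illustration).
counts <- matrix(1:24, nrow = 4, dimnames = list(NULL, LETTERS[1:6]))
se <- SummarizedExperiment(
    assays = list(counts = counts),
    colData = DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                        row.names = LETTERS[1:6]))
qc <- DataFrame(aligned = c(0.91, 0.88, 0.93, 0.90, 0.87, 0.92),
                duplicates = c(120L, 98L, 143L, 110L, 131L, 105L),
                row.names = colnames(se))

# Build the extended object from an existing SummarizedExperiment.
qse <- new("QCSummarizedExperiment", se, quality = qc)
quality(qse)
```

A real extension would probably also need its own `[` method so that the extra slot stays in sync when samples are subset; the sketch above does not attempt that.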

On Thu, Jun 18, 2015 at 12:25 AM, Ryan r...@thompsonclan.org wrote:

 Oh wow, I didn't know you could put a DataFrame into a single column of
 another DataFrame. That actually solves a problem for me too (I don't
 intend to expose nested DataFrames to the users though).


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread Vincent Carey
Yes, if a formal extension is warranted.  The metadata slot could also be
used.
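
As a rough illustration of the metadata route, assuming the current SummarizedExperiment and S4Vectors packages (the toy data and the element name quality are made up for this sketch):

```r
library(SummarizedExperiment)
library(S4Vectors)

# Toy data: 4 features x 6 samples (made up for illustration).
counts <- matrix(1:24, nrow = 4, dimnames = list(NULL, LETTERS[1:6]))
se <- SummarizedExperiment(
    assays = list(counts = counts),
    colData = DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                        row.names = LETTERS[1:6]))
qc <- DataFrame(aligned = c(0.91, 0.88, 0.93, 0.90, 0.87, 0.92),
                row.names = colnames(se))

# Stash the quality table in the free-form metadata list.
metadata(se)$quality <- qc

# metadata() is carried along unchanged when samples are subset,
# so the table keeps one row per original sample.
sub <- se[, 1:2]
nrow(metadata(sub)$quality)
```

The trade-off is that nothing ties the rows of the stored table to the samples, so keeping the two aligned after subsetting is left to the package author.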

On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen 
kasperdanielhan...@gmail.com wrote:

 I think the cleaner solution for Davide (if he insists on having separate
 objects; I decided against it in minfi) is to extend the class to allow
 this.

 Kasper


Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread Kasper Daniel Hansen
You can just implement this by having reserved column names in the colData
slot; that will work and will take approx. 23 seconds to implement.  I agree
it is not as clean from a design perspective, but you get 100% of the
functionality and you can write a separate checker for the colData argument.
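
One way this could look, as a sketch only: the reserved names in QUALITY_COLS, the checker check_col_data() and the accessor quality() are hypothetical names chosen for illustration, and the toy data is made up.

```r
library(SummarizedExperiment)
library(S4Vectors)

# Hypothetical set of reserved column names for quality scores.
QUALITY_COLS <- c("aligned", "duplicates")

# Checker for the colData argument: reserved columns must exist and be numeric.
check_col_data <- function(cd) {
    missing_cols <- setdiff(QUALITY_COLS, colnames(cd))
    if (length(missing_cols) > 0L)
        stop("missing quality columns: ",
             paste(missing_cols, collapse = ", "))
    ok <- vapply(QUALITY_COLS, function(nm) is.numeric(cd[[nm]]), logical(1))
    if (!all(ok))
        stop("non-numeric quality columns: ",
             paste(QUALITY_COLS[!ok], collapse = ", "))
    invisible(TRUE)
}

# Accessor: extract just the quality scores, whatever the column order.
quality <- function(se) colData(se)[, QUALITY_COLS, drop = FALSE]

# Toy data: 4 features x 6 samples (made up for illustration).
counts <- matrix(1:24, nrow = 4, dimnames = list(NULL, LETTERS[1:6]))
cd <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                aligned = c(0.91, 0.88, 0.93, 0.90, 0.87, 0.92),
                duplicates = c(120L, 98L, 143L, 110L, 131L, 105L),
                row.names = LETTERS[1:6])
check_col_data(cd)
se <- SummarizedExperiment(assays = list(counts = counts), colData = cd)

# The quality columns follow the samples when the object is subset.
quality(se[, 1:2])
```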

On Thu, Jun 18, 2015 at 2:00 PM, davide risso risso.dav...@gmail.com
wrote:

 Thank you all for the responses.

 I didn't think about the nested DataFrame solution.  It should work.
 I agree that an extension might be cleaner, but I clearly need to give it
 more thought.

 One of the reasons I wanted to have quality and metadata as separate slots
 is that one could enforce that all the qualities are numeric, and have a
 quality() method to extract just the quality scores (e.g., for plotting /
 quality control). Having them in the same slot makes it harder for the user
 to extract just the scores (if the column order and/or names are not
 standardized).

 Best,
 davide



Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread davide risso
Thanks Kasper,

I think that's a good solution.

Best,
Davide
On Thu, Jun 18, 2015 at 11:51 AM Kasper Daniel Hansen 
kasperdanielhan...@gmail.com wrote:

 You can just implement this by having reserved column names in the colData
 slot; that will work and will take approx. 23 seconds to implement.  I agree
 it is not as clean from a design perspective, but you get 100% of the
 functionality and you can write a separate checker for the colData argument.


Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread davide risso
Thank you all for the responses.

I didn't think about the nested DataFrame solution.  It should work.
I agree that an extension might be cleaner, but I clearly need to give it
more thought.

One of the reasons I wanted to have quality and metadata as separate slots
is that one could enforce that all the qualities are numeric, and have a
quality() method to extract just the quality scores (e.g., for plotting /
quality control). Having them in the same slot makes it harder for the user
to extract just the scores (if the column order and/or names are not
standardized).

Best,
davide


On Thu, Jun 18, 2015 at 6:35 AM Vincent Carey st...@channing.harvard.edu
wrote:

 Yes, if a formal extension is warranted.  The metadata slot could also be
 used.


Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-17 Thread Martin Morgan

On 06/17/2015 11:41 AM, davide risso wrote:

 Dear list,

 I'm creating an R package to store RNA-seq data of a somewhat large project
 in which I'm involved.

 One of the initial goals is to compare different pre-processing pipelines,
 hence I have multiple expression matrices corresponding to the same samples.
 The SummarizedExperiment class seems a good candidate, since I have
 multiple expression matrices with the same rowData and colData information.

 I have several sample-specific variables that I want to store with the
 object, namely, experimental information (e.g., batch, date, experimental
 condition) and sample quality (e.g., proportion of aligned reads,
 total duplicate reads).

 Of course, I can always create one big data frame concatenating the two
 (experimental info + sample quality), but it seems that both conceptually
 and practically, it might be useful to have two separate data frames.
 Since this seems a reasonably standard type of information that
 one would want to carry along, I was wondering if it would be possible /
 useful to allow the user to have multiple data.frames in the colData slot


Actually, colData() is a DataFrame, and a DataFrame column can contain a 
DataFrame. So after


  example(SummarizedExperiment)

we could make some faux sample quality data

  quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))

add this as a column in the colData()

  colData(se1)$quality = quality

(or create the SummarizedExperiment from a similar DataFrame up-front) and 
manage our grouped data


  > colData(se1)
  DataFrame with 6 rows and 2 columns
      Treatment     quality
    <character> <DataFrame>
  A        ChIP
  B       Input
  C        ChIP
  D       Input
  E        ChIP
  F       Input
  > colData(se1[,1:2])$quality
  DataFrame with 2 rows and 2 columns
            x         y
    <integer> <integer>
  A         1         6
  B         2         5

I'm not sure that this is any less confusing to the end user than having to 
manage a DataFrameList(), but it does not require any new features.
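
For readers who want to try this without relying on the help-page example (whose object names may differ between package versions), here is a self-contained sketch of the same steps. It assumes the current SummarizedExperiment and S4Vectors packages; the toy count matrix is made up, and the object is called se rather than se1.

```r
library(SummarizedExperiment)
library(S4Vectors)

# Toy data: 4 features x 6 samples, with a Treatment column like the one above.
counts <- matrix(1:24, nrow = 4, dimnames = list(NULL, LETTERS[1:6]))
se <- SummarizedExperiment(
    assays = list(counts = counts),
    colData = DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                        row.names = LETTERS[1:6]))

# Faux sample quality data, as in the message above.
quality <- DataFrame(x = 1:6, y = 6:1, row.names = colnames(se))

# Nest it as a single column of colData().
colData(se)$quality <- quality

# Subsetting samples subsets the nested table as well.
sub <- colData(se[, 1:2])$quality
dim(sub)
```

Because the nested table is an ordinary colData() column, it is subset together with the samples, which is the main practical difference from storing it somewhere that is not indexed by sample.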


Martin


 of SummarizedExperiment.

 Best,
 Davide


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-17 Thread Ryan
Oh wow, I didn't know you could put a DataFrame into a single column of 
another DataFrame. That actually solves a problem for me too (I don't 
intend to expose nested DataFrames to the users though).

