[Bioc-devel] change names(assays(SummarizedExperiment)) w/o copy?

2014-05-07, Michael Love

Is there a way that I can change the names of the assays slot of a
SummarizedExperiment, without making a new copy of the data contained
within? Assume I get an SE which has already been constructed, but no
names on the assays() SimpleList.



           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1291106 69.0    1710298  91.4  1590760  85.0
Vcells  1178619  9.0    1925843  14.7  1724123  13.2
> m <- matrix(1:2e7, ncol=10)
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells      129 69.0    1967602 105.1  1590760  85.0
Vcells 11178604 85.3   22482701 171.6 21178631 161.6

# made a ~75 Mb matrix

> colnames(m) <- letters[1:10]
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1291149 69.0    1967602 105.1  1590760  85.0
Vcells 11178679 85.3   22482701 171.6 21179851 161.6
> se <- SummarizedExperiment(m)
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1302603 69.6    1967602 105.1  1623929  86.8
Vcells 12189777 93.1   22482701 171.6 21179851 161.6

# so far no copying

> names(assays(se)) <- "counts"
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1303174  69.6    1967602 105.1  1623929  86.8
Vcells 22190847 169.4   23686836 180.8 22203423 169.4

# last step made a copy

R Under development (unstable) (2014-05-07 r65539)
Platform: x86_64-apple-darwin12.5.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.17.12 GenomeInfoDb_1.1.3    IRanges_1.99.13
[4] S4Vectors_0.0.6       BiocGenerics_0.11.2

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 stats4_3.2.0   XVector_0.5.6

Bioc-devel@r-project.org mailing list

Re: [Bioc-devel] change names(assays(SummarizedExperiment)) w/o copy?

2014-05-07, Martin Morgan

On 05/07/2014 12:06 PM, Michael Love wrote:


Is there a way that I can change the names of the assays slot of a
SummarizedExperiment, without making a new copy of the data contained
within? Assume I get an SE which has already been constructed, but no
names on the assays() SimpleList.

Hi Mike --

  names(assays(se)) = "counts"

extracts the assays from se, applies the names to the SimpleList, and then
re-assigns the SimpleList to the SummarizedExperiment. The memory copy (of the
big data) actually happens in the extraction, assays(se):

> m = matrix(0, 0, 0); tracemem(m)
[1] "<0x3449b4e8>"
> se = SummarizedExperiment(m)
> a = assays(se)
tracemem[0x3449b4e8 -> 0x34ef64f0]: lapply lapply lapply lapply endoapply endoapply assays assays

which can be avoided by asking for the assays without their dimnames:

> a = assays(se, withDimnames=FALSE)

and from there:

  names(a) = "counts"
  assays(se) = a

Verifying that we haven't actually copied the matrix (the address is the same
one that tracemem(m) reported above):

> .Internal(inspect(assays(se, withDimnames=FALSE)[[1]]))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
  @3449b4b0 02 LISTSXP g0c0 []
TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] dim (has value)
@3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
  @3449b4b0 02 LISTSXP g0c0 []
TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] dim (has value)
@3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
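The same mechanism can be shown with a base-R sketch that does not use SummarizedExperiment or its code. Renaming a list leaves the matrix inside it where it is. Setting dimnames on a matrix that is still referenced elsewhere forces R to duplicate all of it, which is what the lapply/endoapply calls in the trace above were doing for each assay. The sketch assumes R >= 3.1.0, where list duplication is shallow. It also assumes an R build with memory profiling enabled, which tracemem() requires.

```r
# Base R only: an illustration of the copying behaviour, not the
# SummarizedExperiment implementation. Assumes R >= 3.1.0 and an R
# build with memory profiling (required by tracemem()).
m <- matrix(0, nrow = 1e5, ncol = 10)   # 1e6 doubles, about 8 MB
addr_m <- tracemem(m)                   # record m's address; later copies are reported

# Renaming the container: at most the list itself is (shallowly) copied,
# so the matrix inside it is shared, not duplicated.
a <- list(m)
names(a) <- "counts"
addr_kept <- tracemem(a[[1]])
identical(addr_m, addr_kept)            # same address, so m was not copied

# Adding dimnames to a matrix that is also bound to 'm' forces a full copy,
# analogous to assays(se) attaching dimnames to every assay.
m2 <- m
dimnames(m2) <- list(NULL, letters[1:10])  # tracemem prints a copy message here
addr_named <- tracemem(m2)
identical(addr_m, addr_named)           # different address: a new ~8 MB matrix

untracemem(m)
untracemem(m2)
```

With large assays, that second pattern is why the plain assays(se) call doubled memory use, and why withDimnames=FALSE avoids the copy.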

One would hope (a) that I'd followed through on a previous promise to apply
the dimnames up front, so that there is no need to use withDimnames=FALSE to
avoid the copying (though there might then be a price paid on the way in), and
(b) that the following would work:

  names(assays(se, withDimnames=FALSE)) = "counts"

It didn't:

> names(assays(se, withDimnames=FALSE)) = "counts"
Error in slot(x, nm) :
  no slot of name "withDimnames" for this object of class "SummarizedExperiment"

but it does in GenomicRanges 1.17.13.
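The nested form depends on how R evaluates replacement calls. Extra arguments go to both the getter and the replacement function, so `names(f(x, arg)) <- v` runs as ``x <- `f<-`(x, arg, value = `names<-`(f(x, arg), value = v))``. The error above suggests that the `assays<-` method in 1.17.12 did not handle withDimnames. The base-R sketch below uses a hypothetical list-based container with hypothetical functions `stash()` and `` `stash<-` ``; this is not Bioconductor code. It shows that the nested assignment works only when the replacement function accepts the same extra argument.

```r
# Hypothetical container: a plain list holding assay matrices plus shared
# dimnames. stash() and `stash<-` are illustrative names, not a real API.
obj <- list(data = list(matrix(0, 2, 2)),
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# Getter: optionally attaches dimnames to every matrix (which copies each one).
stash <- function(x, withDimnames = TRUE) {
  a <- x$data
  if (withDimnames)
    a <- lapply(a, function(m) { dimnames(m) <- x$dimnames; m })
  a
}

# Replacement function that accepts the same extra argument as the getter.
`stash<-` <- function(x, withDimnames = TRUE, value) {
  x$data <- value
  x
}

# R evaluates this as
#   obj <- `stash<-`(obj, withDimnames = FALSE,
#                    value = `names<-`(stash(obj, withDimnames = FALSE), value = "counts"))
names(stash(obj, withDimnames = FALSE)) <- "counts"
names(obj$data)

# A replacement function without the extra argument makes the same call fail,
# because R passes withDimnames = FALSE to it as well.
stash_strict <- stash
`stash_strict<-` <- function(x, value) {
  x$data <- value
  x
}
err <- tryCatch(
  names(stash_strict(obj, withDimnames = FALSE)) <- "other",
  error = function(e) e
)
conditionMessage(err)   # an "unused argument" error from `stash_strict<-`
```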






Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
