Re: [aroma.affymetrix] Raw copy numbers calculation using common reference instead of average of all samples

2013-08-06 Thread Henrik Bengtsson
Hi.

On Tue, Aug 6, 2013 at 12:31 PM, ying chen njs...@gmail.com wrote:
 Hi Henrik,


 Sorry I have one more question to bug you. The urls in the df (by df -
 readDataFrame(db);) point to UCSCgenome browser with NCBI36/hg18 assembly. I
 looked at the http://www.aroma-project.org/chipTypes/GenomeWideSNP_6 page
 resource part, the SNP6 chip type files are listed as based on hg19 (Human
 Genome build v36.1). This is a typo, right? I thought it should be hg18. So
 the Cbs out put is also based on hg18, right?

The genome version (hg18) in the URLs to the USCC Genome Browser are
(still) hard coded.  Please see:

  https://groups.google.com/d/msg/aroma-affymetrix/r8CtCvdGWnE/jzQFMifxt2MJ

for more informations/details.

/Henrik


 Thanks a lot for the help!

 Sean







 On Fri, Aug 2, 2013 at 11:03 AM, Henrik Bengtsson
 henrik.bengts...@aroma-project.org wrote:

 On Thu, Aug 1, 2013 at 6:37 PM, ying chen njs...@gmail.com wrote:
  Hi Henrik,
 
  Thanks a lot for the help.
 
  For sm - CbsModel(dsTR, ref=dfR, tags=HapMapRef), you mean sm -
  CbsModel(dsT, ref=dfR, tags=HapMapRef), right?

 Correct, I meant:

 sm - CbsModel(dsT, ref=dfR, tags=HapMapRef)

 /H

 
  Thanks,
 
  Sean
 
 
 
  On Thu, Aug 1, 2013 at 6:00 PM, Henrik Bengtsson
  henrik.bengts...@aroma-project.org wrote:
 
  On Mon, Jul 29, 2013 at 7:45 PM, ying chen njs...@gmail.com wrote:
   Hi Henrik,
   Thanks a lot for the help!
   Sorry I have more questions. I am following How to: Calculate total
   copy
   number ratios from total (non-polymorphic) signals and Vignette:
   Total
   copy-number segmentation (non-paired CBS), but I am not sure if I do
   it
   correctly.
   I have two SNP6 datasets Tumor and HapMap270 and I want to use
   HpaMap270
   as
   reference to go all the way to CBS step. So I do the following steps
   respectively.
  ds1 - doCRMAv2(HapMap270, chipType=GenomeWideSNP_6,Full)
  ds2 - doCRMAv2(Tumor, chipType=GenomeWideSNP_6,Full)
   After that, I do
 dataSet - HapMap270
 tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 chipType - GenomeWideSNP_6
 dsN - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
   chipType=chipType)
 dataSet - Tumor
 tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 chipType - GenomeWideSNP_6
 dsT - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
   chipType=chipType)
 dfR - getAverageFile(dsN)   # ?
 dsTR - exportTotalCnRatioSet(dsT, ref=dfR) # ?
   Would the above two steps work? My question is how to go ahead from
   here
   to
   do CBS and will sm - CbsModel(dsTR) work?
 
  Skip the exportTotalCnRatioSet() call, and instead use:
 
  sm - CbsModel(dsTR, ref=dfR, tags=HapMapRef);
 
  The 'tags' is just to add an informative tag to the output data set.
 
  /Henrik
 
   Thanks again for your help!
   Sean
  
  
  
  
   On Sun, Jul 28, 2013 at 4:28 PM, Henrik Bengtsson
   henrik.bengts...@aroma-project.org wrote:
  
   Hi.
  
   On Fri, Jul 26, 2013 at 8:02 AM, sean nj njs...@gmail.com wrote:
Hi guys,
   
I have a question regarding how to calculate raw copy numbers
using
common
reference instead of average of all samples of the study.
Basically I
want
to use average of HapMap270 samples as reference for all further
copy
number
calculations.
   
I have a bunch HapMap270 snp6 cel files and I followed Vignette:
Estimation
of total copy numbers using the CRMA v2 method (10K-CytoScanHD) to
Step
5 -
Calculation of raw copy numbers, and generated ceR and saved it as
a
RData
file ceR.Rdata.
  
   It's important to understand that almost all objects in the Aroma
   framework are basically pointers to external files.  For instance,
   your 'ceR', which I assume you've got from something like ceR -
   getAverageFile(ces), is referring to the file with pathname
   getPathname(ceR).  More below...
  
   
My first question is, how to use this data for any future copy
number
analysis? My guess is that instead of calculating the ceR from the
sample
set I can just load the ceR.RData file I saved and use it. Right?
  
   First of all, please note that when do ceR - getAverageFile(ces) on
   the same data set 'ces', the result is already available on file and
   it will be quickly found and returned.  In other words, it will not
   recalcuate the averages again [unless you do ceR -
   getAverageFile(ces, force=TRUE)].
  
   However, I do understand that you may not want to have to keep a
   large
   'ces' data set around, when you're only interested in the pooled
   average.  In that case, I would copy the file containing the
   average
   to a new data set.  Currently, this is not straightforward in Aroma
   (I'll think about something), but you can do the following:
  
   # Calculate the pooled average
ceR - getAverageFile(cesN);
  
   # Copy this file to plmData/HapMap270,pooled/GenomeWideSNP_6/, e.g.
filename - getFilename(ceR);
filename
   [1]
  
   

Re: [aroma.affymetrix] Raw copy numbers calculation using common reference instead of average of all samples

2013-08-01 Thread ying chen
Hi Henrik,

I tried method I mentioned above. But I got an error message when running
fit(sm, verbose=-10).



 Array #1 ('321T') of 291 on chromosome 1...
 Error in UseMethod(getChecksum) :
   no applicable method for 'getChecksum' applied to an object of class
list

What did I do wrong?

Thanks a lot for the help!

Sean


 library(aroma.affymetrix)
 dataSet - HapMap270
 tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 chipType - GenomeWideSNP_6
 dsN - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
chipType=chipType)
 dataSet - Tumor
 dsT - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
chipType=chipType)
 dfR - getAverageFile(dsN)
 dsTR - exportTotalCnRatioSet(dsT, ref=dfR)
There were 50 or more warnings (use warnings() to see the first 50)
 warnings()
Warning messages:
1: In log(C) : NaNs produced
2: In log(C) : NaNs produced
3: In log(C) : NaNs produced
...
 print(dsTR)
AromaUnitTotalCnBinarySet:
Name: Tumor
Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Number of files: 291
Names: 321T, 322T, 323T, ..., T97 [291]
Path (to the first file):
rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
Total file size: 2088.72 MB
RAM: 0.37MB
 print(dsT)
AromaUnitTotalCnBinarySet:
Name: Tumor
Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Number of files: 291
Names: 321T, 322T, 323T, ..., T97 [291]
Path (to the first file):
totalAndFracBData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
Total file size: 2088.64 MB
RAM: 0.37MB

 sm - CbsModel(dsTR)

Loading required package: DNAcopy
 print(sm)
CbsModel:
Name: Tumor
Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired
Chip type (virtual): GenomeWideSNP_6
Path:
cbsData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6
Number of chip types: 1
Sample  reference file pairs:
Chip type #1 ('GenomeWideSNP_6') of 1:
Sample data set:
AromaUnitTotalCnBinarySet:
Name: Tumor
Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Number of files: 291
Names: 321T, 322T, 323T, ..., T97 [291]
Path (to the first file):
rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
Total file size: 2088.72 MB
RAM: 0.37MB
Reference: median of samples
RAM: 0.00MB

 fit(sm, verbose=-10)
Building tuple of reference sets...
 Type of reference: median
 No reference available.
 Calculating average copy-number signals...
  Retrieving average unit signals across 291 arrays...
   AromaUnitTotalCnBinaryFile:
   Name: .average-signals-median-mad
   Tags: 8143f776d59b4c432debaf4bb288
   Full name: .average-signals-median-mad,8143f776d59b4c432debaf4bb288
   Pathname:
rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6/.average-signals-median-mad,8143f776d59b4c432debaf4bb288.asb
   File size: 7.18 MB (7526027 bytes)
   RAM: 0.00 MB
   Number of data rows: 1881415
   File format: v1
   Dimensions: 1881415x1
   Column classes: double
   Number of bytes per column: 4
   Footer: createdOn20130801 10:20:33
EDT/createdOnplatformAffymetrix/platformchipTypeGenomeWideSNP_6,Full/chipTypesrcDetailsnbrOfFiles291/nbrOfFilescheckSum60c565ccf6239a081e4162b3328e1ccb/checkSum/srcDetailsparamsmeanNamemedian/meanNamesdNamemad/sdName/params
   Platform: Affymetrix
   Chip type: GenomeWideSNP_6,Full
   Number of units to be updated: 1
   Processing chunk...
 chr Indices in chunk:
 int 22933
Reading data...
Reading data...done
Estimating averages and standard deviations...
Estimating averages and standard deviations...done
Writing estimates...
Writing estimates...done
   Processing chunk...done
  Retrieving average unit signals across 291 arrays...done
 Calculating average copy-number signals...done
Building tuple of reference sets...done
Using reference tuple:
AromaUnitTotalCnBinarySetTuple:
Name: Tumor
Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Chip types: GenomeWideSNP_6
AromaUnitTotalCnBinarySet:
Name: Tumor
Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
Number of files: 291
Names: .average-signals-median-mad, .average-signals-median-mad,
.average-signals-median-mad, ..., .average-signals-median-mad [291]
Path (to the first file):
rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
Total file size: 2088.62 MB
RAM: 0.37MB
RAM: 0.00MB
Extract DataFileMatrix...
 Array: 1
 Test data sets:
 AromaUnitTotalCnBinarySetTuple:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Chip types: GenomeWideSNP_6
 AromaUnitTotalCnBinarySet:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Number of files: 291
 Names: 321T, 322T, 323T, ..., T97 [291]
 Path (to the first file):
rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
 Total file size: 2088.72 MB
 RAM: 0.37MB
 RAM: 0.00MB
 Test data files:
 $`GenomeWideSNP_6,Full`
 AromaUnitTotalCnBinaryFile:
 Name: 321T
 

Re: [aroma.affymetrix] Raw copy numbers calculation using common reference instead of average of all samples

2013-08-01 Thread ying chen
Sorry, one more question regarding CBS. During

fit(sm, verbose=-10)
Building tuple of reference sets...
 Type of reference: median
 No reference available.
 Calculating average copy-number signals...
  Retrieving average unit signals across 291 arrays...
..


On Thu, Aug 1, 2013 at 10:53 AM, ying chen njs...@gmail.com wrote:

 Hi Henrik,

 I tried method I mentioned above. But I got an error message when running
 fit(sm, verbose=-10).



  Array #1 ('321T') of 291 on chromosome 1...
  Error in UseMethod(getChecksum) :
no applicable method for 'getChecksum' applied to an object of class
 list

 What did I do wrong?

 Thanks a lot for the help!

 Sean


  library(aroma.affymetrix)

  dataSet - HapMap270
  tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
  chipType - GenomeWideSNP_6
  dsN - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType)
  dataSet - Tumor
  dsT - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType)
  dfR - getAverageFile(dsN)
  dsTR - exportTotalCnRatioSet(dsT, ref=dfR)
 There were 50 or more warnings (use warnings() to see the first 50)
  warnings()
 Warning messages:
 1: In log(C) : NaNs produced
 2: In log(C) : NaNs produced
 3: In log(C) : NaNs produced
 ...
  print(dsTR)
 AromaUnitTotalCnBinarySet:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Number of files: 291
 Names: 321T, 322T, 323T, ..., T97 [291]
 Path (to the first file):
 rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
 Total file size: 2088.72 MB
 RAM: 0.37MB
  print(dsT)
 AromaUnitTotalCnBinarySet:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Number of files: 291
 Names: 321T, 322T, 323T, ..., T97 [291]
 Path (to the first file):
 totalAndFracBData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
 Total file size: 2088.64 MB
 RAM: 0.37MB
 
  sm - CbsModel(dsTR)
 
 Loading required package: DNAcopy
  print(sm)
 CbsModel:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired
 Chip type (virtual): GenomeWideSNP_6
 Path:
 cbsData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6
 Number of chip types: 1
 Sample  reference file pairs:
 Chip type #1 ('GenomeWideSNP_6') of 1:
 Sample data set:
 AromaUnitTotalCnBinarySet:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Number of files: 291
 Names: 321T, 322T, 323T, ..., T97 [291]
 Path (to the first file):
 rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
 Total file size: 2088.72 MB
 RAM: 0.37MB
 Reference: median of samples
 RAM: 0.00MB
 
  fit(sm, verbose=-10)
 Building tuple of reference sets...
  Type of reference: median
  No reference available.
  Calculating average copy-number signals...
   Retrieving average unit signals across 291 arrays...
AromaUnitTotalCnBinaryFile:
Name: .average-signals-median-mad
Tags: 8143f776d59b4c432debaf4bb288
Full name: .average-signals-median-mad,8143f776d59b4c432debaf4bb288
Pathname:
 rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6/.average-signals-median-mad,8143f776d59b4c432debaf4bb288.asb
File size: 7.18 MB (7526027 bytes)
RAM: 0.00 MB
Number of data rows: 1881415
File format: v1
Dimensions: 1881415x1
Column classes: double
Number of bytes per column: 4
Footer: createdOn20130801 10:20:33
 EDT/createdOnplatformAffymetrix/platformchipTypeGenomeWideSNP_6,Full/chipTypesrcDetailsnbrOfFiles291/nbrOfFilescheckSum60c565ccf6239a081e4162b3328e1ccb/checkSum/srcDetailsparamsmeanNamemedian/meanNamesdNamemad/sdName/params
Platform: Affymetrix
Chip type: GenomeWideSNP_6,Full
Number of units to be updated: 1
Processing chunk...
  chr Indices in chunk:
  int 22933
 Reading data...
 Reading data...done
 Estimating averages and standard deviations...
 Estimating averages and standard deviations...done
 Writing estimates...
 Writing estimates...done
Processing chunk...done
   Retrieving average unit signals across 291 arrays...done
  Calculating average copy-number signals...done
 Building tuple of reference sets...done
 Using reference tuple:
 AromaUnitTotalCnBinarySetTuple:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Chip types: GenomeWideSNP_6
 AromaUnitTotalCnBinarySet:
 Name: Tumor
 Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Full name: Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
 Number of files: 291
 Names: .average-signals-median-mad, .average-signals-median-mad,
 .average-signals-median-mad, ..., .average-signals-median-mad [291]
 Path (to the first file):
 rawCnData/Tumor,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
 Total file size: 2088.62 MB
 RAM: 0.37MB
 RAM: 0.00MB
 Extract DataFileMatrix...
  Array: 1
  Test data sets:
  AromaUnitTotalCnBinarySetTuple:
  Name: Tumor
  Tags: 

Re: [aroma.affymetrix] Raw copy numbers calculation using common reference instead of average of all samples

2013-08-01 Thread Henrik Bengtsson
On Mon, Jul 29, 2013 at 7:45 PM, ying chen njs...@gmail.com wrote:
 Hi Henrik,
 Thanks a lot for the help!
 Sorry I have more questions. I am following How to: Calculate total copy
 number ratios from total (non-polymorphic) signals and Vignette: Total
 copy-number segmentation (non-paired CBS), but I am not sure if I do it
 correctly.
 I have two SNP6 datasets Tumor and HapMap270 and I want to use HpaMap270 as
 reference to go all the way to CBS step. So I do the following steps
 respectively.
ds1 - doCRMAv2(HapMap270, chipType=GenomeWideSNP_6,Full)
ds2 - doCRMAv2(Tumor, chipType=GenomeWideSNP_6,Full)
 After that, I do
   dataSet - HapMap270
   tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
   chipType - GenomeWideSNP_6
   dsN - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType)
   dataSet - Tumor
   tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
   chipType - GenomeWideSNP_6
   dsT - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType)
   dfR - getAverageFile(dsN)   # ?
   dsTR - exportTotalCnRatioSet(dsT, ref=dfR) # ?
 Would the above two steps work? My question is how to go ahead from here to
 do CBS and will sm - CbsModel(dsTR) work?

Skip the exportTotalCnRatioSet() call, and instead use:

sm - CbsModel(dsTR, ref=dfR, tags=HapMapRef);

The 'tags' is just to add an informative tag to the output data set.

/Henrik

 Thanks again for your help!
 Sean




 On Sun, Jul 28, 2013 at 4:28 PM, Henrik Bengtsson
 henrik.bengts...@aroma-project.org wrote:

 Hi.

 On Fri, Jul 26, 2013 at 8:02 AM, sean nj njs...@gmail.com wrote:
  Hi guys,
 
  I have a question regarding how to calculate raw copy numbers using
  common
  reference instead of average of all samples of the study. Basically I
  want
  to use average of HapMap270 samples as reference for all further copy
  number
  calculations.
 
  I have a bunch HapMap270 snp6 cel files and I followed Vignette:
  Estimation
  of total copy numbers using the CRMA v2 method (10K-CytoScanHD) to Step
  5 -
  Calculation of raw copy numbers, and generated ceR and saved it as a
  RData
  file ceR.Rdata.

 It's important to understand that almost all objects in the Aroma
 framework are basically pointers to external files.  For instance,
 your 'ceR', which I assume you've got from something like ceR -
 getAverageFile(ces), is referring to the file with pathname
 getPathname(ceR).  More below...

 
  My first question is, how to use this data for any future copy number
  analysis? My guess is that instead of calculating the ceR from the
  sample
  set I can just load the ceR.RData file I saved and use it. Right?

 First of all, please note that when do ceR - getAverageFile(ces) on
 the same data set 'ces', the result is already available on file and
 it will be quickly found and returned.  In other words, it will not
 recalcuate the averages again [unless you do ceR -
 getAverageFile(ces, force=TRUE)].

 However, I do understand that you may not want to have to keep a large
 'ces' data set around, when you're only interested in the pooled
 average.  In that case, I would copy the file containing the average
 to a new data set.  Currently, this is not straightforward in Aroma
 (I'll think about something), but you can do the following:

 # Calculate the pooled average
  ceR - getAverageFile(cesN);

 # Copy this file to plmData/HapMap270,pooled/GenomeWideSNP_6/, e.g.
  filename - getFilename(ceR);
  filename
 [1] .average-intensities-median-mad,d03faaf8b707a97c4e43381b1a5d1ef2.CEL
  rootPath - getParent(getPath(cesN), depth=2L);
  dataSet - HapMap270,pooled;
  chipType - getChipType(ceR, fullname=FALSE);
  path - file.path(rootPath, dataSet, chipType);
  path
 [1] plmData/HapMap270,pooled/GenomeWideSNP_6
  mkdirs(path);
  copyFile(getPathname(ceR), file.path(path, filename));

 With this done, you can then grab this pooled reference as:

  library(aroma.affymetrix)
  path - plmData/HapMap270,pooled/GenomeWideSNP_6;
  filename -
  .average-intensities-median-mad,d03faaf8b707a97c4e43381b1a5d1ef2.CEL;
  ceR - CnChipEffectFile(filename, path=path);

 Note, when you save 'ceR', you are basically saving the reference to
 the file.  Yes, you can load it later, but make sure not to move it,
 otherwise you'll get some type of file not found error.

  saveObject(ceR, HapMap270,GenomeWideSNP_6,reference.Rdata);

 If already saved, and file not moved, you can then do:

  library(aroma.affymetrix);
  ceR - loadObject(HapMap270,GenomeWideSNP_6,reference.Rdata);

 All this is very ad hoc (=non-aroma style), and as I said, I'll see if
 I can come up with a cleaner solution for storing and retrieving
 pooled averages.

 
  My second question is, how to go ahead from there to calculate the
  relative
  copy numbers for all unit from all samples? The two examples given in
  the
  Vignette  are for one unit from one sample and for a few unit on
  chromosome 2
  for one sample. What is the function to retrieve all units on all
  chromosomes instead of 

Re: [aroma.affymetrix] Raw copy numbers calculation using common reference instead of average of all samples

2013-07-29 Thread ying chen
Hi Henrik,
Thanks a lot for the help!
Sorry I have more questions. I am following How to: Calculate total copy
number ratios from total (non-polymorphic) signals and Vignette: Total
copy-number segmentation (non-paired CBS), but I am not sure if I do it
correctly.
I have two SNP6 datasets Tumor and HapMap270 and I want to use HpaMap270 as
reference to go all the way to CBS step. So I do the following steps
respectively.
   ds1 - doCRMAv2(HapMap270, chipType=GenomeWideSNP_6,Full)
   ds2 - doCRMAv2(Tumor, chipType=GenomeWideSNP_6,Full)
After that, I do
  dataSet - HapMap270
  tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
  chipType - GenomeWideSNP_6
  dsN - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
chipType=chipType)
  dataSet - Tumor
  tags - ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
  chipType - GenomeWideSNP_6
  dsT - AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
chipType=chipType)
  dfR - getAverageFile(dsN)   # ?
  dsTR - exportTotalCnRatioSet(dsT, ref=dfR) # ?
Would the above two steps work? My question is how to go ahead from here to
do CBS and will sm - CbsModel(dsTR) work?
Thanks again for your help!
Sean




On Sun, Jul 28, 2013 at 4:28 PM, Henrik Bengtsson 
henrik.bengts...@aroma-project.org wrote:

 Hi.

 On Fri, Jul 26, 2013 at 8:02 AM, sean nj njs...@gmail.com wrote:
  Hi guys,
 
  I have a question regarding how to calculate raw copy numbers using
 common
  reference instead of average of all samples of the study. Basically I
 want
  to use average of HapMap270 samples as reference for all further copy
 number
  calculations.
 
  I have a bunch HapMap270 snp6 cel files and I followed Vignette:
 Estimation
  of total copy numbers using the CRMA v2 method (10K-CytoScanHD) to Step
 5 -
  Calculation of raw copy numbers, and generated ceR and saved it as a
 RData
  file ceR.Rdata.

 It's important to understand that almost all objects in the Aroma
 framework are basically pointers to external files.  For instance,
 your 'ceR', which I assume you've got from something like ceR -
 getAverageFile(ces), is referring to the file with pathname
 getPathname(ceR).  More below...

 
  My first question is, how to use this data for any future copy number
  analysis? My guess is that instead of calculating the ceR from the sample
  set I can just load the ceR.RData file I saved and use it. Right?

 First of all, please note that when do ceR - getAverageFile(ces) on
 the same data set 'ces', the result is already available on file and
 it will be quickly found and returned.  In other words, it will not
 recalcuate the averages again [unless you do ceR -
 getAverageFile(ces, force=TRUE)].

 However, I do understand that you may not want to have to keep a large
 'ces' data set around, when you're only interested in the pooled
 average.  In that case, I would copy the file containing the average
 to a new data set.  Currently, this is not straightforward in Aroma
 (I'll think about something), but you can do the following:

 # Calculate the pooled average
  ceR - getAverageFile(cesN);

 # Copy this file to plmData/HapMap270,pooled/GenomeWideSNP_6/, e.g.
  filename - getFilename(ceR);
  filename
 [1] .average-intensities-median-mad,d03faaf8b707a97c4e43381b1a5d1ef2.CEL
  rootPath - getParent(getPath(cesN), depth=2L);
  dataSet - HapMap270,pooled;
  chipType - getChipType(ceR, fullname=FALSE);
  path - file.path(rootPath, dataSet, chipType);
  path
 [1] plmData/HapMap270,pooled/GenomeWideSNP_6
  mkdirs(path);
  copyFile(getPathname(ceR), file.path(path, filename));

 With this done, you can then grab this pooled reference as:

  library(aroma.affymetrix)
  path - plmData/HapMap270,pooled/GenomeWideSNP_6;
  filename -
 .average-intensities-median-mad,d03faaf8b707a97c4e43381b1a5d1ef2.CEL;
  ceR - CnChipEffectFile(filename, path=path);

 Note, when you save 'ceR', you are basically saving the reference to
 the file.  Yes, you can load it later, but make sure not to move it,
 otherwise you'll get some type of file not found error.

  saveObject(ceR, HapMap270,GenomeWideSNP_6,reference.Rdata);

 If already saved, and file not moved, you can then do:

  library(aroma.affymetrix);
  ceR - loadObject(HapMap270,GenomeWideSNP_6,reference.Rdata);

 All this is very ad hoc (=non-aroma style), and as I said, I'll see if
 I can come up with a cleaner solution for storing and retrieving
 pooled averages.

 
  My second question is, how to go ahead from there to calculate the
 relative
  copy numbers for all unit from all samples? The two examples given in the
  Vignette  are for one unit from one sample and for a few unit on
 chromosome 2
  for one sample. What is the function to retrieve all units on all
  chromosomes instead of units - getUnitsOnChromosome(gi, chromosome=2,
  region=c(81,86)*1e6)?

 You can set 'units' to NULL to retrieve all loci, i.e. no need to use
 getUnitsOnChromosome().  FYI, units - NULL will give the same data as
 with units - 1:nbrOfUnits(gi).

  And what is the function to retrieve all samples

Re: [aroma.affymetrix] Raw copy numbers calculation using common reference instead of average of all samples

2013-07-28 Thread Henrik Bengtsson
Hi.

On Fri, Jul 26, 2013 at 8:02 AM, sean nj njs...@gmail.com wrote:
 Hi guys,

 I have a question regarding how to calculate raw copy numbers using common
 reference instead of average of all samples of the study. Basically I want
 to use average of HapMap270 samples as reference for all further copy number
 calculations.

 I have a bunch HapMap270 snp6 cel files and I followed Vignette: Estimation
 of total copy numbers using the CRMA v2 method (10K-CytoScanHD) to Step 5 -
 Calculation of raw copy numbers, and generated ceR and saved it as a RData
 file ceR.Rdata.

It's important to understand that almost all objects in the Aroma
framework are basically pointers to external files.  For instance,
your 'ceR', which I assume you've got from something like ceR -
getAverageFile(ces), is referring to the file with pathname
getPathname(ceR).  More below...


 My first question is, how to use this data for any future copy number
 analysis? My guess is that instead of calculating the ceR from the sample
 set I can just load the ceR.RData file I saved and use it. Right?

First of all, please note that when do ceR - getAverageFile(ces) on
the same data set 'ces', the result is already available on file and
it will be quickly found and returned.  In other words, it will not
recalcuate the averages again [unless you do ceR -
getAverageFile(ces, force=TRUE)].

However, I do understand that you may not want to have to keep a large
'ces' data set around, when you're only interested in the pooled
average.  In that case, I would copy the file containing the average
to a new data set.  Currently, this is not straightforward in Aroma
(I'll think about something), but you can do the following:

# Calculate the pooled average
 ceR - getAverageFile(cesN);

# Copy this file to plmData/HapMap270,pooled/GenomeWideSNP_6/, e.g.
 filename - getFilename(ceR);
 filename
[1] .average-intensities-median-mad,d03faaf8b707a97c4e43381b1a5d1ef2.CEL
 rootPath - getParent(getPath(cesN), depth=2L);
 dataSet - HapMap270,pooled;
 chipType - getChipType(ceR, fullname=FALSE);
 path - file.path(rootPath, dataSet, chipType);
 path
[1] plmData/HapMap270,pooled/GenomeWideSNP_6
 mkdirs(path);
 copyFile(getPathname(ceR), file.path(path, filename));

With this done, you can then grab this pooled reference as:

 library(aroma.affymetrix)
 path - plmData/HapMap270,pooled/GenomeWideSNP_6;
 filename - 
 .average-intensities-median-mad,d03faaf8b707a97c4e43381b1a5d1ef2.CEL;
 ceR - CnChipEffectFile(filename, path=path);

Note, when you save 'ceR', you are basically saving the reference to
the file.  Yes, you can load it later, but make sure not to move it,
otherwise you'll get some type of file not found error.

 saveObject(ceR, HapMap270,GenomeWideSNP_6,reference.Rdata);

If already saved, and file not moved, you can then do:

 library(aroma.affymetrix);
 ceR - loadObject(HapMap270,GenomeWideSNP_6,reference.Rdata);

All this is very ad hoc (=non-aroma style), and as I said, I'll see if
I can come up with a cleaner solution for storing and retrieving
pooled averages.


 My second question is, how to go ahead from there to calculate the relative
 copy numbers for all unit from all samples? The two examples given in the
 Vignette  are for one unit from one sample and for a few unit on chromosome 2
 for one sample. What is the function to retrieve all units on all
 chromosomes instead of units - getUnitsOnChromosome(gi, chromosome=2,
 region=c(81,86)*1e6)?

You can set 'units' to NULL to retrieve all loci, i.e. no need to use
getUnitsOnChromosome().  FYI, units - NULL will give the same data as
with units - 1:nbrOfUnits(gi).

 And what is the function to retrieve all samples
 instead of ce - getFile(cesN, indexOf(cesN, NA06985))?

Hmm... not clear what you mean.  All samples are in 'cesN', and you do
need to iterate over them somehow.  Is this what you're looking for?

for (ii in seq_along(cesN)) {
  ce - getFile(cesN, ii)
  ...
}

Or are you asking how to extract the data from all samples?  Then you can do:

theta - extractTheta(cesN, units=units)

but be careful because that loads a lot of data into memory.

Hope this helps,

Henrik


 Thanks a lot for the help,

 Sean

 --
 --
 When reporting problems on aroma.affymetrix, make sure 1) to run the latest
 version of the package, 2) to report the output of sessionInfo() and
 traceback(), and 3) to post a complete code example.


 You received this message because you are subscribed to the Google Groups
 aroma.affymetrix group with website http://www.aroma-project.org/.
 To post to this group, send email to aroma-affymetrix@googlegroups.com
 To unsubscribe and other options, go to http://www.aroma-project.org/forum/

 ---
 You received this message because you are subscribed to the Google Groups
 aroma.affymetrix group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to aroma-affymetrix+unsubscr...@googlegroups.com.
 For more options, visit