Re: [aroma.affymetrix] computing firma scores on a subset of CEL files

2011-09-21 Thread Henrik Bengtsson
Hi.

On Wed, Sep 21, 2011 at 9:04 AM, Tarca, Adi ata...@med.wayne.edu wrote:

 Hi all,
 I am processing some affymetrix exon arrays and trying to compute FIRMA 
 scores. I did this for all CEL files and now I just want to exclude some CEL 
 files from the analysis without removing them from the rawData folder 
 (rawData/mice2010/MoEx-1_0-st-v1/). I use the extract method to select 38 of 
 the 107 files CEL files.

 ...
 cs - AffymetrixCelSet$byName(mice2010, cdf=cdf)
 cs=extract(cs,1:38)
 

 In the message I get from running the code I can see a line saying  Reading 
 residuals unit by unit for 107 arrays...done and I was wondering if this 
 means that all 107 files were used in fitting the firma model or not.

The problem is that the Aroma framwork cannot tell the difference.
You need to add tags that reflects that you are subsetting to your
analysis pipeline.  If not, the framework will in each step find your
old files and think you've already processed them.

It is not clear from you message, but I assume you basically want to
do the same analysis as if you only had those 38 CEL files.   Assuming
you're following the outline of vignette 'Human exon array analysis'
[http://aroma-project.org/node/37], you could introduce a new tag by:

cs - AffymetrixCelSet$byName(mice2010, cdf=cdf);
cs - extract(cs, 1:38);

bc - RmaBackgroundCorrection(cs, tags=*,1-38,coreR2)
csBC - process(bc, verbose=verbose);

Note that this will redo the background correction for those 38 CEL
files.  You then have to repeat all the following steps.  All
following steps will now have this 1-38 tag added such that the
generated data directories are unique.  If you come up with a
different set of 38 CEL files, you need to use a unique tag for that
set.

ADVANCED:
Now, the RmaBackgroundCorrection method is actually done for each
sample independently, that is, its output does not depend on the other
samples or how many they are.   This means that you will get the
background-corrected signals for those 38 CELs will be the same as
when they where done in the whole lot.  To save yourself some time,
you can therefore subset after this step, i.e.

cs - AffymetrixCelSet$byName(mice2010, cdf=cdf)
bc - RmaBackgroundCorrection(cs, tags=*,coreR2)
csBC - process(bc, verbose=verbose);
csBC - extract(csBC, 1:38);
qn - QuantileNormalization(csBC, typesToUpdate=pm, tags=*,1-38)
print(qn)
csN - process(qn, verbose=verbose)

You have to subset before QN, because it is a method that dependend on
the pool of the arrays going in.

Hope this helps

Henrik

PS. Next time, please show your complete analysis script - helps
showing what to do.

 Thanks,
 Adi


 .
 20110921 10:13:29| Array #37 ('102_R860')...done
 20110921 10:13:29| Array #38 ('107_R955')...
 20110921 10:13:29|  Pathname: 
 plmData/mice2010,coreR1,A20080718,MR,QN,RMA,merged/MoEx-1_0-st-v1/107_R955,residuals.CEL
 20110921 10:13:29|  Already calculated.
 20110921 10:13:29| Array #38 ('107_R955')...done
           used (Mb) gc trigger (Mb) max used (Mb)
  Ncells  451249 24.1     818163 43.7   818163 43.7
  Vcells 1233293  9.5    2371218 18.1  2104276 16.1
 20110921 10:13:34|Calculating PLM residuals...done
 20110921 10:13:34|Fitting model of class FirmaModel...
  FirmaModel:
  Data set: mice2010
  Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
  Input tags: coreR1,A20080718,MR,QN,RMA,merged
  Output tags: coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres
  Parameters: (probeModel: chr pm; shift: num 0).
  Path: 
 firmaData/mice2010,coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres/MoEx-1_0-st-v1
  RAM: 0.00MB
 20110921 10:13:34| Identifying non-estimated units...
 20110921 10:13:37|  Identifying non-assigned units in FIRMA file...
 20110921 10:13:37|   Pathname: 
 firmaData/mice2010,coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres/MoEx-1_0-st-v1/009_R884,FIRMAscores.CEL
 20110921 10:13:37|  Identifying non-assigned units in FIRMA file...done
 20110921 10:13:37|  Reading data for these 17831 units...
 20110921 10:14:24|   Looking for pixels == 0 indicating non-assigned units:
    int [1:17831] 1 2 3 4 5 6 7 8 9 10 ...
 

  10:16:31|    readCelUnits() of AffymetrixCelSet...done
 20110921 10:16:31|   Calling readUnits() in superclass...done
 20110921 10:16:35|  Reading residuals unit by unit for 107 arrays...done
 20110921 10:16:35|  Calculating FIRMA scores...


 --
 When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
 version of the package, 2) to report the output of sessionInfo() and 
 traceback(), and 3) to post a complete code example.


 You received this message because you are subscribed to the Google Groups 
 aroma.affymetrix group with website http://www.aroma-project.org/.
 To post to this group, send email to aroma-affymetrix@googlegroups.com
 To unsubscribe and other options, go to http://www.aroma-project.org/forum/


-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of 

RE: [aroma.affymetrix] computing firma scores on a subset of CEL files

2011-09-21 Thread Tarca, Adi

Thanks Henrik,

I have added a tag at the background processing phase and all seems ok. I use 
something like:

library(aroma.affymetrix)
verbose - Arguments$getVerbose(-8, timestamp=TRUE)
chipType - MoEx-1_0-st-v1
cdf - AffymetrixCdfFile$byChipType(chipType, tags=coreR1,A20080718,MR)
cs - AffymetrixCelSet$byName(mice2010, cdf=cdf)
cs=extract(cs,1:38)

bc - RmaBackgroundCorrection(cs, tag=my1to38,coreR1,A20080718,MR)
csBC - process(bc,verbose=verbose)
qn - QuantileNormalization(csBC, typesToUpdate=pm)
csN - process(qn, verbose=verbose)
plmTr - ExonRmaPlm(csN, mergeGroups=TRUE)
fit(plmTr, verbose=verbose)
firma - FirmaModel(plmTr)
fit(firma, verbose=verbose)
fs - getFirmaScores(firma)


regards,
Adi 



-Original Message-
From: aroma-affymetrix@googlegroups.com 
[mailto:aroma-affymetrix@googlegroups.com] On Behalf Of Henrik Bengtsson
Sent: Wednesday, September 21, 2011 2:00 PM
To: aroma-affymetrix@googlegroups.com
Subject: Re: [aroma.affymetrix] computing firma scores on a subset of CEL files

Hi.

On Wed, Sep 21, 2011 at 9:04 AM, Tarca, Adi ata...@med.wayne.edu wrote:

 Hi all,
 I am processing some affymetrix exon arrays and trying to compute FIRMA 
 scores. I did this for all CEL files and now I just want to exclude some CEL 
 files from the analysis without removing them from the rawData folder 
 (rawData/mice2010/MoEx-1_0-st-v1/). I use the extract method to select 38 of 
 the 107 files CEL files.

 ...
 cs - AffymetrixCelSet$byName(mice2010, cdf=cdf)
 cs=extract(cs,1:38)
 

 In the message I get from running the code I can see a line saying  Reading 
 residuals unit by unit for 107 arrays...done and I was wondering if this 
 means that all 107 files were used in fitting the firma model or not.

The problem is that the Aroma framwork cannot tell the difference.
You need to add tags that reflects that you are subsetting to your
analysis pipeline.  If not, the framework will in each step find your
old files and think you've already processed them.

It is not clear from you message, but I assume you basically want to
do the same analysis as if you only had those 38 CEL files.   Assuming
you're following the outline of vignette 'Human exon array analysis'
[http://aroma-project.org/node/37], you could introduce a new tag by:

cs - AffymetrixCelSet$byName(mice2010, cdf=cdf);
cs - extract(cs, 1:38);

bc - RmaBackgroundCorrection(cs, tags=*,1-38,coreR2)
csBC - process(bc, verbose=verbose);

Note that this will redo the background correction for those 38 CEL
files.  You then have to repeat all the following steps.  All
following steps will now have this 1-38 tag added such that the
generated data directories are unique.  If you come up with a
different set of 38 CEL files, you need to use a unique tag for that
set.

ADVANCED:
Now, the RmaBackgroundCorrection method is actually done for each
sample independently, that is, its output does not depend on the other
samples or how many they are.   This means that you will get the
background-corrected signals for those 38 CELs will be the same as
when they where done in the whole lot.  To save yourself some time,
you can therefore subset after this step, i.e.

cs - AffymetrixCelSet$byName(mice2010, cdf=cdf)
bc - RmaBackgroundCorrection(cs, tags=*,coreR2)
csBC - process(bc, verbose=verbose);
csBC - extract(csBC, 1:38);
qn - QuantileNormalization(csBC, typesToUpdate=pm, tags=*,1-38)
print(qn)
csN - process(qn, verbose=verbose)

You have to subset before QN, because it is a method that dependend on
the pool of the arrays going in.

Hope this helps

Henrik

PS. Next time, please show your complete analysis script - helps
showing what to do.

 Thanks,
 Adi


 .
 20110921 10:13:29| Array #37 ('102_R860')...done
 20110921 10:13:29| Array #38 ('107_R955')...
 20110921 10:13:29|  Pathname: 
 plmData/mice2010,coreR1,A20080718,MR,QN,RMA,merged/MoEx-1_0-st-v1/107_R955,residuals.CEL
 20110921 10:13:29|  Already calculated.
 20110921 10:13:29| Array #38 ('107_R955')...done
           used (Mb) gc trigger (Mb) max used (Mb)
  Ncells  451249 24.1     818163 43.7   818163 43.7
  Vcells 1233293  9.5    2371218 18.1  2104276 16.1
 20110921 10:13:34|Calculating PLM residuals...done
 20110921 10:13:34|Fitting model of class FirmaModel...
  FirmaModel:
  Data set: mice2010
  Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
  Input tags: coreR1,A20080718,MR,QN,RMA,merged
  Output tags: coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres
  Parameters: (probeModel: chr pm; shift: num 0).
  Path: 
 firmaData/mice2010,coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres/MoEx-1_0-st-v1
  RAM: 0.00MB
 20110921 10:13:34| Identifying non-estimated units...
 20110921 10:13:37|  Identifying non-assigned units in FIRMA file...
 20110921 10:13:37|   Pathname: 
 firmaData/mice2010,coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres/MoEx-1_0-st-v1/009_R884,FIRMAscores.CEL
 20110921 10:13:37|  Identifying non-assigned units in FIRMA file...done
 20110921 10:13:37|  Reading data for these 17831 units