subject:"RE\: \[aroma.affymetrix\] computing firma scores on a subset of CEL files"

RE: [aroma.affymetrix] computing firma scores on a subset of CEL files

2011-09-21 Thread Tarca, Adi

Thanks Henrik,

I have added a tag at the background processing phase and all seems ok. I use 
something like:

library(aroma.affymetrix)
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
chipType <- "MoEx-1_0-st-v1"
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR")
cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
cs=extract(cs,1:38)

bc <- RmaBackgroundCorrection(cs, tag="my1to38,coreR1,A20080718,MR")
csBC <- process(bc,verbose=verbose)
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose)
plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
fit(plmTr, verbose=verbose)
firma <- FirmaModel(plmTr)
fit(firma, verbose=verbose)
fs <- getFirmaScores(firma)

regards,
Adi 

-Original Message-
From: aroma-affymetrix@googlegroups.com 
[mailto:aroma-affymetrix@googlegroups.com] On Behalf Of Henrik Bengtsson
Sent: Wednesday, September 21, 2011 2:00 PM
To: aroma-affymetrix@googlegroups.com
Subject: Re: [aroma.affymetrix] computing firma scores on a subset of CEL files

Hi.

On Wed, Sep 21, 2011 at 9:04 AM, Tarca, Adi  wrote:
>
> Hi all,
> I am processing some affymetrix exon arrays and trying to compute FIRMA 
> scores. I did this for all CEL files and now I just want to exclude some CEL 
> files from the analysis without removing them from the rawData folder 
> (rawData/mice2010/MoEx-1_0-st-v1/). I use the extract method to select 38 of 
> the 107 files CEL files.
>
> ...
> cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
> cs=extract(cs,1:38)
> 
>
> In the message I get from running the code I can see a line saying " Reading 
> residuals unit by unit for 107 arrays...done" and I was wondering if this 
> means that all 107 files were used in fitting the firma model or not.

The problem is that the Aroma framwork cannot tell the difference.
You need to add tags that reflects that you are subsetting to your
analysis pipeline.  If not, the framework will in each step find your
old files and think you've already processed them.

It is not clear from you message, but I assume you basically want to
do the same analysis as if you only had those 38 CEL files.   Assuming
you're following the outline of vignette 'Human exon array analysis'
[http://aroma-project.org/node/37], you could introduce a new tag by:

cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf);
cs <- extract(cs, 1:38);

bc <- RmaBackgroundCorrection(cs, tags="*,1-38,coreR2")
csBC <- process(bc, verbose=verbose);

Note that this will redo the background correction for those 38 CEL
files.  You then have to repeat all the following steps.  All
following steps will now have this "1-38" tag added such that the
generated data directories are unique.  If you come up with a
different set of 38 CEL files, you need to use a unique tag for that
set.

ADVANCED:
Now, the RmaBackgroundCorrection method is actually done for each
sample independently, that is, its output does not depend on the other
samples or how many they are.   This means that you will get the
background-corrected signals for those 38 CELs will be the same as
when they where done in the whole lot.  To save yourself some time,
you can therefore subset after this step, i.e.

cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
bc <- RmaBackgroundCorrection(cs, tags="*,coreR2")
csBC <- process(bc, verbose=verbose);
csBC <- extract(csBC, 1:38);
qn <- QuantileNormalization(csBC, typesToUpdate="pm", tags="*,1-38")
print(qn)
csN <- process(qn, verbose=verbose)

You have to subset before QN, because it is a method that dependend on
the pool of the arrays going in.

Hope this helps

Henrik

PS. Next time, please show your complete analysis script - helps
showing what to do.

> Thanks,
> Adi
>
>
> .
> 20110921 10:13:29| Array #37 ('102_R860')...done
> 20110921 10:13:29| Array #38 ('107_R955')...
> 20110921 10:13:29|  Pathname: 
> plmData/mice2010,coreR1,A20080718,MR,QN,RMA,merged/MoEx-1_0-st-v1/107_R955,residuals.CEL
> 20110921 10:13:29|  Already calculated.
> 20110921 10:13:29| Array #38 ('107_R955')...done
>           used (Mb) gc trigger (Mb) max used (Mb)
>  Ncells  451249 24.1     818163 43.7   818163 43.7
>  Vcells 1233293  9.5    2371218 18.1  2104276 16.1
> 20110921 10:13:34|Calculating PLM residuals...done
> 20110921 10:13:34|Fitting model of class FirmaModel...
>  FirmaModel:
>  Data set: mice2010
>  Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
>  Input tags: coreR1,A20080718,MR,QN,RMA,merged
>  Output tags: coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres
>  Parameters: (probeModel: chr "pm"; shift: num 0).
>  Path: 
> firmaData/mice2010,coreR1,A200

Re: [aroma.affymetrix] computing firma scores on a subset of CEL files

2011-09-21 Thread Henrik Bengtsson

Hi.

On Wed, Sep 21, 2011 at 9:04 AM, Tarca, Adi  wrote:
>
> Hi all,
> I am processing some affymetrix exon arrays and trying to compute FIRMA 
> scores. I did this for all CEL files and now I just want to exclude some CEL 
> files from the analysis without removing them from the rawData folder 
> (rawData/mice2010/MoEx-1_0-st-v1/). I use the extract method to select 38 of 
> the 107 files CEL files.
>
> ...
> cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
> cs=extract(cs,1:38)
> 
>
> In the message I get from running the code I can see a line saying " Reading 
> residuals unit by unit for 107 arrays...done" and I was wondering if this 
> means that all 107 files were used in fitting the firma model or not.

The problem is that the Aroma framwork cannot tell the difference.
You need to add tags that reflects that you are subsetting to your
analysis pipeline.  If not, the framework will in each step find your
old files and think you've already processed them.

It is not clear from you message, but I assume you basically want to
do the same analysis as if you only had those 38 CEL files.   Assuming
you're following the outline of vignette 'Human exon array analysis'
[http://aroma-project.org/node/37], you could introduce a new tag by:

cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf);
cs <- extract(cs, 1:38);

bc <- RmaBackgroundCorrection(cs, tags="*,1-38,coreR2")
csBC <- process(bc, verbose=verbose);

Note that this will redo the background correction for those 38 CEL
files.  You then have to repeat all the following steps.  All
following steps will now have this "1-38" tag added such that the
generated data directories are unique.  If you come up with a
different set of 38 CEL files, you need to use a unique tag for that
set.

ADVANCED:
Now, the RmaBackgroundCorrection method is actually done for each
sample independently, that is, its output does not depend on the other
samples or how many they are.   This means that you will get the
background-corrected signals for those 38 CELs will be the same as
when they where done in the whole lot.  To save yourself some time,
you can therefore subset after this step, i.e.

cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
bc <- RmaBackgroundCorrection(cs, tags="*,coreR2")
csBC <- process(bc, verbose=verbose);
csBC <- extract(csBC, 1:38);
qn <- QuantileNormalization(csBC, typesToUpdate="pm", tags="*,1-38")
print(qn)
csN <- process(qn, verbose=verbose)

You have to subset before QN, because it is a method that dependend on
the pool of the arrays going in.

Hope this helps

Henrik

PS. Next time, please show your complete analysis script - helps
showing what to do.

> Thanks,
> Adi
>
>
> .
> 20110921 10:13:29| Array #37 ('102_R860')...done
> 20110921 10:13:29| Array #38 ('107_R955')...
> 20110921 10:13:29|  Pathname: 
> plmData/mice2010,coreR1,A20080718,MR,QN,RMA,merged/MoEx-1_0-st-v1/107_R955,residuals.CEL
> 20110921 10:13:29|  Already calculated.
> 20110921 10:13:29| Array #38 ('107_R955')...done
>           used (Mb) gc trigger (Mb) max used (Mb)
>  Ncells  451249 24.1     818163 43.7   818163 43.7
>  Vcells 1233293  9.5    2371218 18.1  2104276 16.1
> 20110921 10:13:34|Calculating PLM residuals...done
> 20110921 10:13:34|Fitting model of class FirmaModel...
>  FirmaModel:
>  Data set: mice2010
>  Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
>  Input tags: coreR1,A20080718,MR,QN,RMA,merged
>  Output tags: coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres
>  Parameters: (probeModel: chr "pm"; shift: num 0).
>  Path: 
> firmaData/mice2010,coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres/MoEx-1_0-st-v1
>  RAM: 0.00MB
> 20110921 10:13:34| Identifying non-estimated units...
> 20110921 10:13:37|  Identifying non-assigned units in FIRMA file...
> 20110921 10:13:37|   Pathname: 
> firmaData/mice2010,coreR1,A20080718,MR,QN,RMA,merged,FIRMA,medres/MoEx-1_0-st-v1/009_R884,FIRMAscores.CEL
> 20110921 10:13:37|  Identifying non-assigned units in FIRMA file...done
> 20110921 10:13:37|  Reading data for these 17831 units...
> 20110921 10:14:24|   Looking for pixels == 0 indicating non-assigned units:
>    int [1:17831] 1 2 3 4 5 6 7 8 9 10 ...
> 
>
>  10:16:31|    readCelUnits() of AffymetrixCelSet...done
> 20110921 10:16:31|   Calling readUnits() in superclass...done
> 20110921 10:16:35|  Reading residuals unit by unit for 107 arrays...done
> 20110921 10:16:35|  Calculating FIRMA scores...
>
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> version of the package, 2) to report the output of sessionInfo() and 
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups 
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>

-- 
When reporting problems on aroma.affymetrix, mak

RE: [aroma.affymetrix] computing firma scores on a subset of CEL files

Re: [aroma.affymetrix] computing firma scores on a subset of CEL files

2 matches

Site Navigation

Mail list logo

Footer information