Re: [aroma.affymetrix] Use Existing FIRMA model on new data.

2013-12-30 Thread Henrik Bengtsson
Hi.

On Thu, Dec 26, 2013 at 12:47 PM, Sam Danziger wrote:
> I'm using the FIRMA algorithm, and I would like to use an existing FIRMA model
> (for instance one trained on 50 arrays) on new data (e.g. 2 new arrays).
>
> I am able to train a FIRMA model (and an RMA model) quite easily using the
> doFIRMA (and doRMA) functions, which work quite well.  However, I am not
> sure how I could extract the existing models to calculate FIRMA scores for
> the new data.
>
> To be more specific, I understand that FIRMA has basically 5 steps:

Yes, as illustrated by
http://aroma-project.org/vignettes/FIRMA-HumanExonArrayAnalysis and if
you peek inside the top-level doFIRMA() method, you'll see the same.

>
> -RMA steps-
> 1) background correction

More precisely, this is doing what is called RMA-style background
correction, which is an array-by-array background correction method.
This means that it does not matter if you process arrays in batches or
independently - you will get the same results both ways.  In other
words, there is no "across-sample modelling" in this step, so this
needs to be done the same way as you do it now.  This step is
basically:

bc <- RmaBackgroundCorrection(cs, tags="*,coreR2")
csBC <- process(bc)


> 2) normalization

This step is running (rank-based) quantile normalization.  The only
thing that is really "across-sample modelling" in this step is how the
target distribution is calculated.  The default is to use the average
of all the arrays in the batch.  However, you can specify a
pre-calculated target distribution - the difference will be very
small, especially if you use arrays from the same lab:

# Use the average of a pre-determined data set as the target distribution.
# (Needs to be background corrected the same way)
csReference <- extract(csBC, 1:10) # Just an example
cfTarget <- getAverageFile(csReference)

qn <- QuantileNormalization(csBC, typesToUpdate="pm",
                            targetDistribution=cfTarget)
csN <- process(qn)

(FYI, the default is basically cfTarget <- getAverageFile(csBC)).
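
To make the idea of a fixed target distribution concrete, here is a minimal base-R sketch on simulated data.  It is not the aroma.affymetrix implementation: the helper name normalizeToTarget() is hypothetical, and building the target as the average of the sorted reference arrays is an assumption made for illustration.  It only shows that, once the target is fixed, each new array can be normalized on its own:

```r
# Toy illustration in base R (NOT the aroma.affymetrix implementation).
# 'normalizeToTarget' is a hypothetical helper name.
normalizeToTarget <- function(x, target) {
  stopifnot(length(x) == length(target))
  # Replace each value by the target quantile of the same rank
  sort(target)[rank(x, ties.method="first")]
}

set.seed(1)
# Reference batch: 10 simulated arrays with 1000 probes each
ref <- matrix(2^rnorm(10 * 1000, mean=8, sd=2), ncol=10)
# Assumed target: the average of the sorted reference arrays
target <- rowMeans(apply(ref, 2, sort))

# Two "new" arrays, normalized one by one toward the fixed target
new1 <- 2^rnorm(1000, mean=9, sd=2)
new2 <- 2^rnorm(1000, mean=7, sd=1)
n1 <- normalizeToTarget(new1, target)
n2 <- normalizeToTarget(new2, target)

# Both arrays now have exactly the target's empirical distribution,
# regardless of which other arrays are processed alongside them
stopifnot(isTRUE(all.equal(sort(n1), target)),
          isTRUE(all.equal(sort(n2), target)))
```

Because the target is computed once from the reference set, adding or removing other new arrays does not change the normalized values of any given array.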


> 3) summarization

This step is doing:

plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
print(plmTr)

where ExonRmaPlm is just a special version of the RmaPlm
summarization.  Now, to use prior RMA parameter estimates in this
step, all you need to do is pass them as:

plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE,
                    listOfPriors=list(probeAffinities=pfPrior))
print(plmTr)

where 'pfPrior' is a ProbeAffinityFile from a prior ExonRmaPlm model fit, e.g.

# Faking another background-corrected and normalized data set.
csNprior <- extract(csN, 1:10)
plmTrPrior <- ExonRmaPlm(csNprior, mergeGroups=TRUE)
fit(plmTrPrior)
pfPrior <- getProbeAffinityFile(plmTrPrior)

Also, for an example using the RmaPlm model, have a look at the test script:

pathname <- "testScripts/system/chipTypes/HG-U133_Plus_2/11.doRMA,PLM,withPriors.R"
pathname <- system.file(pathname, package="aroma.affymetrix")

So, instead of estimating the FIRMA summarization model parameters
from the batch being processed, you propose to use prior estimates
based on some predetermined set of arrays, which effectively makes the
ExonRmaPlm summarization step an array-by-array method, i.e. it does
not matter which other arrays you include in the data set.  Like any
other RMA summarization step, in order to be able to reuse parameter
estimates this way, the probe affinities in the prior data set need
to be approximately the same as for the "new" arrays.  If that
assumption does not hold, the results will be unreliable.  If it
holds, you should get roughly the same results as if you process all
arrays in one big batch.
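
As a rough sketch of why fixed probe affinities make the summarization array-by-array, consider the log-additive RMA model log2(y_ij) = theta_i + phi_j + e_ij for array i and probe j.  The base-R toy below is not the aroma.affymetrix fitting code: the helper name summarizeWithPriors() is hypothetical, and using a plain median as the robust estimator is an assumption made for illustration.  With phi held fixed, the chip effect of an array depends only on that array's own probe signals:

```r
# Toy illustration in base R (NOT the aroma.affymetrix implementation).
# 'summarizeWithPriors' is a hypothetical helper name.
summarizeWithPriors <- function(y, phi) {
  # y:   probes-by-arrays matrix of log2 intensities for one unit
  # phi: prior probe affinities (log2 scale), one per probe
  stopifnot(nrow(y) == length(phi))
  # Chip effect per array: median of the affinity-corrected probe signals
  apply(y - phi, 2, median)
}

set.seed(1)
phi <- c(-1.0, -0.5, 0.0, 0.5, 1.0)   # prior affinities for 5 probes
theta <- c(8, 10, 12)                 # simulated chip effects for 3 arrays
y <- outer(phi, theta, "+") + matrix(rnorm(15, sd=0.1), nrow=5)

# Summarize all three arrays together ...
thetaAll <- summarizeWithPriors(y, phi)
# ... or only the third one: its estimate is unchanged
thetaOne <- summarizeWithPriors(y[, 3, drop=FALSE], phi)
stopifnot(isTRUE(all.equal(thetaAll[3], thetaOne[1])))
```

If the prior affinities do not match the new arrays, the corrected signals y - phi are biased probe by probe, which is the reason for the caveat above.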

This step is where I'd expect the main speedup to be.


> -FIRMA steps-
> 4) calculate residuals from fitting the standard RMA model

This step will be the same as before.  Nothing to gain here, so (with
'verbose' being whatever verbose setting you already use in your script):

rs <- calculateResidualSet(plmTr, verbose=verbose)

> 5) calculate the FIRMA score.

Same here:

firma <- FirmaModel(plmTr)
fit(firma, verbose=verbose)
fs <- getFirmaScores(firma)
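
For intuition about what steps 4 and 5 compute, here is a minimal base-R sketch on simulated residuals.  It is not the FirmaModel implementation: the helper name firmaScores() is hypothetical, and it assumes that the score for an exon in an array is the median of that exon's probe residuals divided by a robust (MAD) estimate of the residual spread across the whole gene.  FirmaModel has options that may change these details:

```r
# Toy illustration in base R (NOT the FirmaModel implementation).
# 'firmaScores' is a hypothetical helper name.
firmaScores <- function(res, exon) {
  # res:  probes-by-arrays matrix of PLM residuals for one gene
  # exon: exon label for each probe (row) of 'res'
  stopifnot(nrow(res) == length(exon))
  # Robust scale of all residuals in the gene
  s <- mad(as.vector(res))
  scaled <- res / s
  # Median scaled residual per exon and array
  fs <- sapply(unique(exon), function(e) {
    apply(scaled[exon == e, , drop=FALSE], 2, median)
  })
  t(fs)  # exons in rows, arrays in columns
}

set.seed(1)
exon <- rep(c("e1", "e2", "e3"), each=4)       # 3 exons, 4 probes each
res <- matrix(rnorm(12 * 6, sd=0.2), nrow=12)  # 6 simulated arrays
# Simulate exon "e2" being skipped in array 5: strongly negative residuals
res[exon == "e2", 5] <- res[exon == "e2", 5] - 3

fs <- firmaScores(res, exon)
print(round(fs, 2))
```

In this toy data the entry for exon "e2" in array 5 stands out as strongly negative, which is the kind of signal one would look for as evidence of differential exon usage.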


Hope this helps

Henrik

>
> I know that I could copy all 52 arrays (i.e. the old and the new) into a
> single folder and rerun the RMA / FIRMA algorithms on all data.  However,
> this will become slower (and probably less necessary) as I get more and more
> data.  I understand based on the link below that Step 1 treats each array
> independently, and I imagine that I can use the models (steps 2-4)
> calculated from the 50 old experiments on the 2 new experiments.  However, I
> don't know how to do that.
>
> *
> https://groups.google.com/forum/#!searchin/aroma-affymetrix/FIRMA$20same$20model$20on$20new$20data/aroma-affymetrix/Ime1c6DjzBs/7n-vTdo3pCgJ
>
> Can you please help?
>
> Thank you,
> -Sam

[aroma.affymetrix] Use Existing FIRMA model on new data.

2013-12-26 Thread Sam Danziger
I'm using the FIRMA algorithm, and I would like to use an existing FIRMA model
(for instance one trained on 50 arrays) on new data (e.g. 2 new arrays).

I am able to train a FIRMA model (and an RMA model) quite easily using the 
doFIRMA (and doRMA) functions, which work quite well.  However, I am not 
sure how I could extract the existing models to calculate FIRMA scores for 
the new data.

To be more specific, I understand that FIRMA has basically 5 steps:

-RMA steps-
1) background correction
2) normalization
3) summarization
-FIRMA steps-
4) calculate residuals from fitting the standard RMA model
5) calculate the FIRMA score.

I know that I could copy all 52 arrays (i.e. the old and the new) into a 
single folder and rerun the RMA / FIRMA algorithms on all data.  However, 
this will become slower (and probably less necessary) as I get more and 
more data.  I understand based on the link below that Step 1 treats each 
array independently, and I imagine that I can use the models (steps 2-4) 
calculated from the 50 old experiments on the 2 new experiments.  However, 
I don't know how to do that.

*  
https://groups.google.com/forum/#!searchin/aroma-affymetrix/FIRMA$20same$20model$20on$20new$20data/aroma-affymetrix/Ime1c6DjzBs/7n-vTdo3pCgJ

Can you please help?

Thank you,
-Sam

-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/
