So, there are few things here:
1. In the Aroma Framework, we always try to store estimates/signals on
the original scale, e.g. even if, say, chip-effect estimates are
original on the log scale, we unlog those before saving. That is a
policy with the rationale that all data can always be represented on
the original scale but not always on, say, the log scale. An
example of the latter is when a probe signal ends up being slightly
negative after calibration and normalization. If that would have been
stored on the log-scale we would have lost the information, i.e. it
would be stored as NaN. Same problem with a signal that is exactly
zero (-Inf on the log scale). Why bother? For instance, say you end
up with replicated probe signals for a single SNP where they all
fluctuate around zero. After averaging, the average signal may end up
non-negative. If we stored on the log scale the averaging would have
only been on the signals that were positive (because we lost the
non-positive one). So, that's that rational.
2. For methods estimating CNs it depends on their approach/method/design:
(a) For instance, the CRMA v1 and CRMA v2 methods (ours; happens to be
implemented in the Aroma Framework, but should never be referred to as
the Aroma CN method, aroma.affymetrix CN method or similar), we
end up with CN signals that have not yet been standardized relative
to a reference. In other words, those signals are all over the
place, because each of them are scaled with an unknown scale factors.
It is only when you have a reference when you can create CN
ratios, e.g. C = tumor / matched normal or C = sample / pooled
average, etc. Comment: For CRMA v2, this is not that surprising,
because that method was specifically designed to work for each array
independently. That means you get CN signals out from a single
array, but then you need to compare them to a reference.
(b) The ACNE method (however), calculates CN ratios for you. This is
by design of the model, simply because how the NMF factorization model
works, cf. Eqns (10)-(14) in
http://bioinformatics.oxfordjournals.org/content/26/15/1827.full.
Comment: The ACNE method is a multi-array method requiring many arrays
to work and borrows information across arrays (e.g. estimate SNP
effects). Since this information is already done, it can equally well
return CN ratios for you. Having said this, there is nothing
preventing you from taking the ratios relative to another reference
(ratio), e.g. effectively C = (T / R) / (R / R*) = T / R.
So, in the ACNE vignette
[http://www.aroma-project.org/vignettes/ACNE/] that you're following
there is, near the end:
cf - ces[[1]]
data - extractTotalAndFreqB(cf, units=units)
CT - data[,total]
Since you used ACNE to estimate the CNs, here 'CT' is indeed a CN
ratios. The reason why we still refer to it as a total signal, is
because that is the most generic name we could come up with for the
underlying file format.
Hope this clarifies
/Henrik
On Fri, May 29, 2015 at 6:52 AM, hongen xu hongen.hugo...@gmail.com wrote:
Dear Aroma.affymetrix teem,
I followed the vignette Allele-specific copy numbers using non-negative
matrix factorization. I want to extract LRR and BAF to feed into another
software ASCAT to correct aneuploidy and normal contamination.
From the vignette, I can use extractTotalAndFreqB to extract total CN and
BAF. My question is how can I get LRR from Total CN. I went through this
thread creat binary data files containing BAF data, but get confused.
quote from creat binary data files containing BAF data
The take-home message is that
the definition of log2 ratios has become de facto standard in our
field, where as CN is a bit ambiguous, and can mean CN ratio
(C=theta/thetaR, or C=2*theta/thetaR) as well as CN intensity
(theta). By adding tags we try to make this less ambiguous.
How total CN was defined in aroma.affymetrix?
Best regards,
Hongen
my code
library(aroma.affymetrix)
library(ACNE)
log - verbose - Arguments$getVerbose(-8, timestamp=TRUE)
# Don't display too many decimals.
options(digits=4)
#Verifying annotation data files
cdf - AffymetrixCdfFile$byChipType(CytoScanHD_Array);
print(cdf)
gi - getGenomeInformation(cdf)
print(gi)
si - getSnpInformation(cdf)
print(si)
acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
print(acs)
##raw data sets
csR- AffymetrixCelSet$byName(smida, cdf=cdf);
###step 1 Calibration for crosstalk between allele probe pairs
acc - AllelicCrosstalkCalibration(csR, model=CRMAv2)
print(acc)
csC - process(acc, verbose=verbose)
print(csC)
##Step 2 - Normalization for nucleotide-position probe sequence effects
bpn - BasePositionNormalization(csC, target=zero)
csN - process(bpn, verbose=verbose)
cs - csN
#step 3
#Probe summarization using non-negative-matrix factorization (NMF)
plm - NmfSnpPlm(csN, mergeStrands=TRUE)
print(plm)
if (length(findUnitsTodo(plm)) 0) {
# Fit