[aroma.affymetrix] extract LRR and BAF from ACNE results

2015-05-29 Thread hongen xu
Dear Aroma.affymetrix teem,

I followed the vignette Allele-specific copy numbers using non-negative 
matrix factorization. I want to extract LRR and BAF to feed into another 
software ASCAT http://heim.ifi.uio.no/bioinf/Projects/ASCAT/ to correct 
aneuploidy and normal contamination. 

From the vignette, I can use extractTotalAndFreqB to extract total CN and 
BAF. My question is how can I get LRR from Total CN. I went through this 
thread  creat binary data files containing BAF data 
https://groups.google.com/forum/#!searchin/aroma-affymetrix/kai/aroma-affymetrix/1Qcj_nxiQvc/ViKo4wnWcvwJ,
 
but get confused.

quote from creat binary data files containing BAF data 
https://groups.google.com/forum/#%21searchin/aroma-affymetrix/kai/aroma-affymetrix/1Qcj_nxiQvc/ViKo4wnWcvwJ

The take-home message is that
the definition of log2 ratios has become de facto standard in our
field, where as CN is a bit ambiguous, and can mean CN ratio
(C=theta/thetaR, or C=2*theta/thetaR) as well as CN intensity
(theta).  By adding tags we try to make this less ambiguous.

How total CN was defined in aroma.affymetrix?


Best regards,
Hongen








my code 


library(aroma.affymetrix)
library(ACNE)


log - verbose - Arguments$getVerbose(-8, timestamp=TRUE)
# Don't display too many decimals.
options(digits=4)


#Verifying annotation data files
cdf - AffymetrixCdfFile$byChipType(CytoScanHD_Array);
print(cdf)
gi - getGenomeInformation(cdf)
print(gi)
si - getSnpInformation(cdf)
print(si)
acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
print(acs)

##raw data sets
csR- AffymetrixCelSet$byName(smida, cdf=cdf);

###step 1 Calibration for crosstalk between allele probe pairs
acc - AllelicCrosstalkCalibration(csR, model=CRMAv2)

print(acc)

csC - process(acc, verbose=verbose)
print(csC)


##Step 2 - Normalization for nucleotide-position probe sequence effects
bpn - BasePositionNormalization(csC, target=zero)
csN - process(bpn, verbose=verbose)
cs - csN



#step 3
#Probe summarization using non-negative-matrix factorization (NMF)

plm - NmfSnpPlm(csN, mergeStrands=TRUE)
print(plm)

if (length(findUnitsTodo(plm))  0) {
   # Fit CN probes quickly (~5-10s/array + some overhead)
  units - fitCnProbes(plm, verbose=verbose)
  str(units)
  # int [1:945826] 935590 935591 935592 935593 935594 935595 ...

  # Fit remaining units, i.e. SNPs (~5-10min/array)
  units - fit(plm, verbose=verbose)
  str(units)
}

ces - getChipEffectSet(plm)
print(ces)



my session information:
##
 sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 20 (Heisenbug)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
 [1] ACNE_0.8.0  aroma.light_2.2.1   aroma.affymetrix_2.13.2
 [4] aroma.core_2.13.1   R.devices_2.13.0R.filesets_2.7.2   
 [7] R.utils_2.0.2   R.oo_1.19.0 affxparser_1.38.0  
[10] R.methodsS3_1.7.0  

loaded via a namespace (and not attached):
 [1] aroma.apd_0.6.0base64enc_0.1-2digest_0.6.8   
DNAcopy_1.40.0
 [5] MASS_7.3-40matrixStats_0.14.0 PSCBS_0.44.0   
R.cache_0.11.0
 [9] R.huge_0.9.0   R.rsp_0.20.0   tools_3.1.3   



-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [aroma.affymetrix] extract LRR and BAF from ACNE results

2015-05-29 Thread Henrik Bengtsson
So, there are few things here:

1. In the Aroma Framework, we always try to store estimates/signals on
the original scale, e.g. even if, say, chip-effect estimates are
original on the log scale, we unlog those before saving.  That is a
policy with the rationale that all data can always be represented on
the original scale but not always on, say, the log scale.  An
example of the latter is when a probe signal ends up being slightly
negative after calibration and normalization.  If that would have been
stored on the log-scale we would have lost the information, i.e. it
would be stored as NaN.  Same problem with a signal that is exactly
zero (-Inf on the log scale).  Why bother?  For instance, say you end
up with replicated probe signals for a single SNP where they all
fluctuate around zero.  After averaging, the average signal may end up
non-negative. If we stored on the log scale the averaging would have
only been on the signals that were positive (because we lost the
non-positive one).  So, that's that rational.

2. For methods estimating CNs it depends on their approach/method/design:

(a) For instance, the CRMA v1 and CRMA v2 methods (ours; happens to be
implemented in the Aroma Framework, but should never be referred to as
the Aroma CN method, aroma.affymetrix CN method or similar), we
end up with CN signals that have not yet been standardized relative
to a reference.  In other words, those signals are all over the
place, because each of them are scaled with an unknown scale factors.
  It is only when you have a reference when you can create CN
ratios, e.g. C = tumor / matched normal or C = sample / pooled
average, etc.   Comment: For CRMA v2, this is not that surprising,
because that method was specifically designed to work for each array
independently.  That means you get CN signals out from a single
array, but then you need to compare them to a reference.

(b) The ACNE method (however), calculates CN ratios for you.  This is
by design of the model, simply because how the NMF factorization model
works, cf. Eqns (10)-(14) in
http://bioinformatics.oxfordjournals.org/content/26/15/1827.full.
Comment: The ACNE method is a multi-array method requiring many arrays
to work and borrows information across arrays (e.g. estimate SNP
effects).  Since this information is already done, it can equally well
return CN ratios for you.  Having said this, there is nothing
preventing you from taking the ratios relative to another reference
(ratio), e.g. effectively C = (T / R) / (R / R*) = T / R.

So, in the ACNE vignette
[http://www.aroma-project.org/vignettes/ACNE/] that you're following
there is, near the end:

cf - ces[[1]]
data - extractTotalAndFreqB(cf, units=units)
CT - data[,total]

Since you used ACNE to estimate the CNs, here 'CT' is indeed a CN
ratios.  The reason why we still refer to it as a total signal, is
because that is the most generic name we could come up with for the
underlying file format.

Hope this clarifies


/Henrik

On Fri, May 29, 2015 at 6:52 AM, hongen xu hongen.hugo...@gmail.com wrote:
 Dear Aroma.affymetrix teem,

 I followed the vignette Allele-specific copy numbers using non-negative
 matrix factorization. I want to extract LRR and BAF to feed into another
 software ASCAT to correct aneuploidy and normal contamination.

 From the vignette, I can use extractTotalAndFreqB to extract total CN and
 BAF. My question is how can I get LRR from Total CN. I went through this
 thread  creat binary data files containing BAF data, but get confused.

 quote from creat binary data files containing BAF data
 The take-home message is that
 the definition of log2 ratios has become de facto standard in our
 field, where as CN is a bit ambiguous, and can mean CN ratio
 (C=theta/thetaR, or C=2*theta/thetaR) as well as CN intensity
 (theta).  By adding tags we try to make this less ambiguous.

 How total CN was defined in aroma.affymetrix?


 Best regards,
 Hongen








 my code

 
 library(aroma.affymetrix)
 library(ACNE)


 log - verbose - Arguments$getVerbose(-8, timestamp=TRUE)
 # Don't display too many decimals.
 options(digits=4)


 #Verifying annotation data files
 cdf - AffymetrixCdfFile$byChipType(CytoScanHD_Array);
 print(cdf)
 gi - getGenomeInformation(cdf)
 print(gi)
 si - getSnpInformation(cdf)
 print(si)
 acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
 print(acs)

 ##raw data sets
 csR- AffymetrixCelSet$byName(smida, cdf=cdf);

 ###step 1 Calibration for crosstalk between allele probe pairs
 acc - AllelicCrosstalkCalibration(csR, model=CRMAv2)

 print(acc)

 csC - process(acc, verbose=verbose)
 print(csC)


 ##Step 2 - Normalization for nucleotide-position probe sequence effects
 bpn - BasePositionNormalization(csC, target=zero)
 csN - process(bpn, verbose=verbose)
 cs - csN



 #step 3
 #Probe summarization using non-negative-matrix factorization (NMF)

 plm - NmfSnpPlm(csN, mergeStrands=TRUE)
 print(plm)

 if (length(findUnitsTodo(plm))  0) {
# Fit