[aroma.affymetrix] extract LRR and BAF from ACNE results

2015-05-29 Thread hongen xu
Dear Aroma.affymetrix teem,

I followed the vignette "Allele-specific copy numbers using non-negative 
matrix factorization". I want to extract LRR and BAF to feed into another 
software "ASCAT " to correct 
aneuploidy and normal contamination. 

>From the vignette, I can use extractTotalAndFreqB to extract total CN and 
BAF. My question is how can I get LRR from Total CN. I went through this 
thread  "creat binary data files containing BAF data 
",
 
but get confused.

quote from "creat binary data files containing BAF data 

"
"The take-home message is that
the definition of "log2 ratios" has become de facto standard in our
field, where as "CN" is a bit ambiguous, and can mean "CN ratio"
(C=theta/thetaR, or C=2*theta/thetaR) as well as "CN intensity"
(theta).  By adding tags we try to make this less ambiguous."

How total CN was defined in aroma.affymetrix?


Best regards,
Hongen








my code 


library("aroma.affymetrix")
library("ACNE")


log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
# Don't display too many decimals.
options(digits=4)


#Verifying annotation data files
cdf <- AffymetrixCdfFile$byChipType("CytoScanHD_Array");
print(cdf)
gi <- getGenomeInformation(cdf)
print(gi)
si <- getSnpInformation(cdf)
print(si)
acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
print(acs)

##raw data sets
csR<- AffymetrixCelSet$byName("smida", cdf=cdf);

###step 1 Calibration for crosstalk between allele probe pairs
acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")

print(acc)

csC <- process(acc, verbose=verbose)
print(csC)


##Step 2 - Normalization for nucleotide-position probe sequence effects
bpn <- BasePositionNormalization(csC, target="zero")
csN <- process(bpn, verbose=verbose)
cs <- csN



#step 3
#Probe summarization using non-negative-matrix factorization (NMF)

plm <- NmfSnpPlm(csN, mergeStrands=TRUE)
print(plm)

if (length(findUnitsTodo(plm)) > 0) {
   # Fit CN probes quickly (~5-10s/array + some overhead)
  units <- fitCnProbes(plm, verbose=verbose)
  str(units)
  # int [1:945826] 935590 935591 935592 935593 935594 935595 ...

  # Fit remaining units, i.e. SNPs (~5-10min/array)
  units <- fit(plm, verbose=verbose)
  str(units)
}

ces <- getChipEffectSet(plm)
print(ces)



my session information:
##
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 20 (Heisenbug)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
 [1] ACNE_0.8.0  aroma.light_2.2.1   aroma.affymetrix_2.13.2
 [4] aroma.core_2.13.1   R.devices_2.13.0R.filesets_2.7.2   
 [7] R.utils_2.0.2   R.oo_1.19.0 affxparser_1.38.0  
[10] R.methodsS3_1.7.0  

loaded via a namespace (and not attached):
 [1] aroma.apd_0.6.0base64enc_0.1-2digest_0.6.8   
DNAcopy_1.40.0
 [5] MASS_7.3-40matrixStats_0.14.0 PSCBS_0.44.0   
R.cache_0.11.0
 [9] R.huge_0.9.0   R.rsp_0.20.0   tools_3.1.3   



-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [aroma.affymetrix] extract LRR and BAF from ACNE results

2015-05-29 Thread Henrik Bengtsson
So, there are few things here:

1. In the Aroma Framework, we always try to store estimates/signals on
the "original" scale, e.g. even if, say, chip-effect estimates are
original on the log scale, we unlog those before saving.  That is a
policy with the rationale that all data can always be represented on
the "original" scale but not always on, say, the log scale.  An
example of the latter is when a probe signal ends up being slightly
negative after calibration and normalization.  If that would have been
stored on the log-scale we would have lost the information, i.e. it
would be stored as NaN.  Same problem with a signal that is exactly
zero (-Inf on the log scale).  Why bother?  For instance, say you end
up with replicated probe signals for a single SNP where they all
fluctuate around zero.  After averaging, the average signal may end up
non-negative. If we stored on the log scale the averaging would have
only been on the signals that were positive (because we lost the
non-positive one).  So, that's that rational.

2. For methods estimating "CNs" it depends on their approach/method/design:

(a) For instance, the CRMA v1 and CRMA v2 methods (ours; happens to be
implemented in the Aroma Framework, but should never be referred to as
the "Aroma CN method", "aroma.affymetrix CN method" or similar), we
end up with "CN signals" that have not yet been standardized relative
to a reference.  In other words, those signals are "all over the
place", because each of them are scaled with an unknown scale factors.
  It is only when you have a reference when you can create "CN
ratios", e.g. C = tumor / matched normal or C = sample / pooled
average, etc.   Comment: For CRMA v2, this is not that surprising,
because that method was specifically designed to work for each array
independently.  That means you get "CN signals" out from a single
array, but then you need to compare them to a reference.

(b) The ACNE method (however), calculates CN ratios for you.  This is
by design of the model, simply because how the NMF factorization model
works, cf. Eqns (10)-(14) in
http://bioinformatics.oxfordjournals.org/content/26/15/1827.full.
Comment: The ACNE method is a multi-array method requiring many arrays
to work and borrows information across arrays (e.g. estimate SNP
effects).  Since this information is already done, it can equally well
return CN ratios for you.  Having said this, there is nothing
preventing you from taking the ratios relative to another reference
(ratio), e.g. effectively C = (T / R) / (R / R*) = T / R.

So, in the ACNE vignette
[http://www.aroma-project.org/vignettes/ACNE/] that you're following
there is, near the end:

cf <- ces[[1]]
data <- extractTotalAndFreqB(cf, units=units)
CT <- data[,"total"]

Since you used ACNE to estimate the CNs, here 'CT' is indeed a "CN
ratios".  The reason why we still refer to it as a "total" signal, is
because that is the most generic name we could come up with for the
underlying file format.

Hope this clarifies


/Henrik

On Fri, May 29, 2015 at 6:52 AM, hongen xu  wrote:
> Dear Aroma.affymetrix teem,
>
> I followed the vignette "Allele-specific copy numbers using non-negative
> matrix factorization". I want to extract LRR and BAF to feed into another
> software "ASCAT" to correct aneuploidy and normal contamination.
>
> From the vignette, I can use extractTotalAndFreqB to extract total CN and
> BAF. My question is how can I get LRR from Total CN. I went through this
> thread  "creat binary data files containing BAF data", but get confused.
>
> quote from "creat binary data files containing BAF data"
> "The take-home message is that
> the definition of "log2 ratios" has become de facto standard in our
> field, where as "CN" is a bit ambiguous, and can mean "CN ratio"
> (C=theta/thetaR, or C=2*theta/thetaR) as well as "CN intensity"
> (theta).  By adding tags we try to make this less ambiguous."
>
> How total CN was defined in aroma.affymetrix?
>
>
> Best regards,
> Hongen
>
>
>
>
>
>
>
>
> my code
>
> 
> library("aroma.affymetrix")
> library("ACNE")
>
>
> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> # Don't display too many decimals.
> options(digits=4)
>
>
> #Verifying annotation data files
> cdf <- AffymetrixCdfFile$byChipType("CytoScanHD_Array");
> print(cdf)
> gi <- getGenomeInformation(cdf)
> print(gi)
> si <- getSnpInformation(cdf)
> print(si)
> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
> print(acs)
>
> ##raw data sets
> csR<- AffymetrixCelSet$byName("smida", cdf=cdf);
>
> ###step 1 Calibration for crosstalk between allele probe pairs
> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")
>
> print(acc)
>
> csC <- process(acc, verbose=verbose)
> print(csC)
>
>
> ##Step 2 - Normalization for nucleotide-position probe sequence effects
> bpn <- BasePositionNormalization(csC, target="zero")
> csN <- process(bpn, verbose=verbose)
> cs <- csN
>
>
>
> #step 3
> #Probe summarization using non-negative-matrix factor

Re: [aroma.affymetrix] extract LRR and BAF from ACNE results

2015-06-01 Thread hongen xu
Dear Henrik,

many thanks for your detailed explanation.



在 2015年5月29日星期五 UTC+2下午9:20:34,Henrik Bengtsson写道:
>
> So, there are few things here: 
>
> 1. In the Aroma Framework, we always try to store estimates/signals on 
> the "original" scale, e.g. even if, say, chip-effect estimates are 
> original on the log scale, we unlog those before saving.  That is a 
> policy with the rationale that all data can always be represented on 
> the "original" scale but not always on, say, the log scale.  An 
> example of the latter is when a probe signal ends up being slightly 
> negative after calibration and normalization.  If that would have been 
> stored on the log-scale we would have lost the information, i.e. it 
> would be stored as NaN.  Same problem with a signal that is exactly 
> zero (-Inf on the log scale).  Why bother?  For instance, say you end 
> up with replicated probe signals for a single SNP where they all 
> fluctuate around zero.  After averaging, the average signal may end up 
> non-negative. If we stored on the log scale the averaging would have 
> only been on the signals that were positive (because we lost the 
> non-positive one).  So, that's that rational. 
>
> 2. For methods estimating "CNs" it depends on their 
> approach/method/design: 
>
> (a) For instance, the CRMA v1 and CRMA v2 methods (ours; happens to be 
> implemented in the Aroma Framework, but should never be referred to as 
> the "Aroma CN method", "aroma.affymetrix CN method" or similar), we 
> end up with "CN signals" that have not yet been standardized relative 
> to a reference.  In other words, those signals are "all over the 
> place", because each of them are scaled with an unknown scale factors. 
>   It is only when you have a reference when you can create "CN 
> ratios", e.g. C = tumor / matched normal or C = sample / pooled 
> average, etc.   Comment: For CRMA v2, this is not that surprising, 
> because that method was specifically designed to work for each array 
> independently.  That means you get "CN signals" out from a single 
> array, but then you need to compare them to a reference. 
>
> (b) The ACNE method (however), calculates CN ratios for you.  This is 
> by design of the model, simply because how the NMF factorization model 
> works, cf. Eqns (10)-(14) in 
> http://bioinformatics.oxfordjournals.org/content/26/15/1827.full. 
> Comment: The ACNE method is a multi-array method requiring many arrays 
> to work and borrows information across arrays (e.g. estimate SNP 
> effects).  Since this information is already done, it can equally well 
> return CN ratios for you.  Having said this, there is nothing 
> preventing you from taking the ratios relative to another reference 
> (ratio), e.g. effectively C = (T / R) / (R / R*) = T / R. 
>
> So, in the ACNE vignette 
> [http://www.aroma-project.org/vignettes/ACNE/] that you're following 
> there is, near the end: 
>
> cf <- ces[[1]] 
> data <- extractTotalAndFreqB(cf, units=units) 
> CT <- data[,"total"] 
>
> Since you used ACNE to estimate the CNs, here 'CT' is indeed a "CN 
> ratios".  The reason why we still refer to it as a "total" signal, is 
> because that is the most generic name we could come up with for the 
> underlying file format. 
>
> Hope this clarifies 
>
>
> /Henrik 
>
> On Fri, May 29, 2015 at 6:52 AM, hongen xu  > wrote: 
> > Dear Aroma.affymetrix teem, 
> > 
> > I followed the vignette "Allele-specific copy numbers using non-negative 
> > matrix factorization". I want to extract LRR and BAF to feed into 
> another 
> > software "ASCAT" to correct aneuploidy and normal contamination. 
> > 
> > From the vignette, I can use extractTotalAndFreqB to extract total CN 
> and 
> > BAF. My question is how can I get LRR from Total CN. I went through this 
> > thread  "creat binary data files containing BAF data", but get confused. 
> > 
> > quote from "creat binary data files containing BAF data" 
> > "The take-home message is that 
> > the definition of "log2 ratios" has become de facto standard in our 
> > field, where as "CN" is a bit ambiguous, and can mean "CN ratio" 
> > (C=theta/thetaR, or C=2*theta/thetaR) as well as "CN intensity" 
> > (theta).  By adding tags we try to make this less ambiguous." 
> > 
> > How total CN was defined in aroma.affymetrix? 
> > 
> > 
> > Best regards, 
> > Hongen 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > my code 
> > 
> >  
> > library("aroma.affymetrix") 
> > library("ACNE") 
> > 
> > 
> > log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE) 
> > # Don't display too many decimals. 
> > options(digits=4) 
> > 
> > 
> > #Verifying annotation data files 
> > cdf <- AffymetrixCdfFile$byChipType("CytoScanHD_Array"); 
> > print(cdf) 
> > gi <- getGenomeInformation(cdf) 
> > print(gi) 
> > si <- getSnpInformation(cdf) 
> > print(si) 
> > acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, 
> fullname=FALSE)) 
> > print(acs) 
> > 
> > ##raw data sets 
> > csR<- AffymetrixCelSet$byName("smida", cdf=