Re: [aroma.affymetrix] smoothing after CRMAv2 and before CBS?

2013-12-16 Thread Henrik Bengtsson
On Mon, Dec 16, 2013 at 12:03 PM, Emilie emilie.lalo...@gmail.com wrote:

 Yep, I think you're right, for this sample there are many points away from
 the y=x line.


 https://lh4.googleusercontent.com/-rLT39dQUYe0/Uq9aLgEKsPI/KWY/Uv1cJZGoVTo/s1600/2013-12-16_KB170TvsKB170B_beta_TvsN.png
 Thankfully this isn't the case for all the samples though. I suppose other
 than finding the right match algorithmically (as this is public data), I
 could use a pooled reference instead of the matched normal.


Being able to call a tumor-normal pair matched or not would be useful for
QC purposes such as detecting misannotated samples as here. Shouldn't be
too hard, but someone needs to do it.  It's not unusual that you find
misannotated samples in data sets - it happens all the time and it has been
observed several times in GEO data sets (e.g. the folks behind
progenetix.org/arraymap.org spend a lot of time fixing up such mistakes).

If you know there are matched normals, you should spend the time finding
them, because that is so much more valuable than a global reference when
dealing with parent-specific CNs.  The matched normal is needed to identify
the heterozygote SNPs, which are used to construct the DH track.


 Two more questions that I haven't been able to find in the documentation..
 1- Is there a way to remove (or ideally prevent) very small segments? I'm
 finding several small (2-10 probes) segments with very high copy number
 (10). I'm doubtful that they are all real.


This is called over segmentation.  It's a known issue of several
segmentation methods, including CBS (which is used by PSCBS).  My view on
this is that it needs to be taken care of downstream from the actual
segmentation (=finding change points) - in the steps where one cluster/call
segments.  We are working on methods for dealing with this, but nothing is
ready yet.

My best advice is to look at Section 'Pruning segmentation pro file' in the
PSCBS vignette, e.g.

 fitP - pruneByHClust(fit, h=0.25)

which merges segments with similar segment levels.  The downside is that
you manually have to tune 'h'.  There is currently know other way to set
that parameter.



 2- I am finding very few deletions (5 per genome, vs dozens of
 amplifications) in my samples. I'm calling gains/dels based on the standard
 deviation of the median probe log-ratio. Is this skewed distribution a
 common phenomena and should my threshold for deletions be more lenient than
 my threshold for gains?


First, I don't think one can make any general statements about the
distribution of number of losses and gains, so I wouldn't worry too much,
but I might be wrong.   However, identifying where your copy neutral level
is, that is, identify the ploidy of your tumor, will affect the
distribution.  In other words, you will get different calls if you think
your tumor is diploid or tetraploid.  (working on this as well)

Second, contrary to popular belief, calling gains and losses is not easy
(we're working on that too as part of the problem).  The problem is find
find thresholds for gains and losses that are adjusted for normal
contamination, other background signals, and ploidy (and even tumor
heterogeneity in the worst case) - parameters that can be rather tricky to
estimate reliably.  I know that there methods for calling gains and losses
that estimate these thresholds base on the noise level of the probes.  I'm
not very fond of such methods for the simple reason that two replicated
runs can have identical segment means but completely different noise levels
- you want to use the same calling thresholds in both cases.  What to use
instead?  (We're working on it for the PSCBS case).  For the total CN case
I would use callers similar to those used by the GLAD segmentation method.

Hope this helps

Henrik





 Thanks!

 Emilie



 On Friday, December 13, 2013 4:55:26 PM UTC-5, Henrik Bengtsson wrote:

 On Fri, Dec 13, 2013 at 11:15 AM, Emilie emilie@gmail.com wrote:

 Thanks again Henrick. I do see 3 bands, but not sure they are
 necessarily clean/distinct.


 https://lh5.googleusercontent.com/-yPngBXg2loA/Uqtcqi2z4YI/KWI/mcUOS0DWiyQ/s1600/2013-12-13_KB170B_BAF.png


 These normal BAFs look ok to me.


 https://lh5.googleusercontent.com/-yPngBXg2loA/Uqtcqi2z4YI/KWI/mcUOS0DWiyQ/s1600/2013-12-13_KB170B_BAF.png
 A similar but slightly noisier pattern is observed in the tumour sample.
 I also down-sampled the data 50x to be able to see the patterns (vs a black
 blob). Would you consider this as noise/a bad run?


 Down-sampling is alright when plotting whole-genome data.

 If the tumor BAFs ('betaT') are not much noisier than the normal BAFs
 ('betaN'), I suspect a mismatched pair.  Next, look at tumor vs normal
 BAFs, e.g.

 plot(betaN, betaT, xlim=c(0,1), ylim=c(0,1))

 What do you get?  You should see most data points scattered along the
 diagonal line, similar to the ones in Figure 4 of the online CalMaTe
 vignette [ 

Re: [aroma.affymetrix] smoothing after CRMAv2 and before CBS?

2013-12-13 Thread Henrik Bengtsson
On Fri, Dec 13, 2013 at 11:15 AM, Emilie emilie.lalo...@gmail.com wrote:

 Thanks again Henrick. I do see 3 bands, but not sure they are necessarily
 clean/distinct.


 https://lh5.googleusercontent.com/-yPngBXg2loA/Uqtcqi2z4YI/KWI/mcUOS0DWiyQ/s1600/2013-12-13_KB170B_BAF.png


These normal BAFs look ok to me.


 https://lh5.googleusercontent.com/-yPngBXg2loA/Uqtcqi2z4YI/KWI/mcUOS0DWiyQ/s1600/2013-12-13_KB170B_BAF.png
 A similar but slightly noisier pattern is observed in the tumour sample. I
 also down-sampled the data 50x to be able to see the patterns (vs a black
 blob). Would you consider this as noise/a bad run?


Down-sampling is alright when plotting whole-genome data.

If the tumor BAFs ('betaT') are not much noisier than the normal BAFs
('betaN'), I suspect a mismatched pair.  Next, look at tumor vs normal
BAFs, e.g.

plot(betaN, betaT, xlim=c(0,1), ylim=c(0,1))

What do you get?  You should see most data points scattered along the
diagonal line, similar to the ones in Figure 4 of the online CalMaTe
vignette [ http://aroma-project.org/vignettes/CalMaTe ].  If you see data
points all over the place, particularly in upper-left and the lower-right
corners, your tumor normal pair is not from the same patient.

/Henrik


 Emilie



 On Thursday, December 12, 2013 6:01:48 PM UTC-5, Henrik Bengtsson wrote:

 The tumor DH panel makes me believe that either your tumor or your normal
 chip data is bad, or  alternatively that the tumor and normal are not
 matched.

 Check the allele B fraction of your normal. It should show three distinct
 bands.  Do the same for the tumor. It should also show distinct bands with
 varying  of bands depending on aberrations.  If both look clean, then it's
 likely they're not matched.  If one is very noisy, then that one is simply
 a bad run/sample.

 Henrik
 On Dec 12, 2013 12:42 PM, Emilie emilie@gmail.com wrote:

 Thank you both very much! I was indeed referring to smooth.cna, sorry
 about that confusion.

 I've switched over to PSCBS and used the dropSegmentationOutliers- it
 seems to be running well. I've noticed that some of my samples have very
 fragmented profiles (see attached). Does this suggest poor quality data, or
 maybe an error in my normalization/plotting? Not all samples are like this,
 but it almost seems like the order of the of the probes is scrambled?


 Emilie


 On Thursday, December 5, 2013 1:08:46 PM UTC-5, Henrik Bengtsson wrote:

 Pierre beat me to this one.  Comments below...

 On Thu, Dec 5, 2013 at 9:20 AM, Pierre Neuvial
 pierre@genopole.cnrs.fr wrote:
  Hi Emilie,
 
  OK, so you are referring to the  “smooth.CNA function in the DNAcopy
  package, cf
  http://www.bioconductor.org/packages/2.13/bioc/vignettes/DNA
 copy/inst/doc/DNAcopy.pdf
 
  What this function is doing is detecting outliers (based on how far
 their
  signal value is from their neighbors) and shrink their signal values
 toward
  those of their neighbors.
 
  This is indeed appropriate and recommended.  I thought that by
 smoothing
  you meant performing some kind of local averaging of the original
 signal
  (e.g. using a mobile median or by binning): this I don't recommend.
  Sorry
  for the confusion.
 
 
  To drop outliers, one possibility is to use the
 dropSegmentationOutliers
  function from the PSCBS package.  See the vignettes at
  http://cran.fhcrc.org/web/packages/PSCBS/index.html
 
  Another comment: since you are following the vignette for paired CNA
  analysis, I am guessing that you are working with tumor/normal pairs.
  If
  so, then you should use PSCBS rather than CBS for segmentation.
  PSCBS is an
  extension of CBS to segment not only total copy numbers but also
 allelic
  ratios. See the PSCBS vignette in the above URL.

 To balance this a little bit, I would say there may exist outliers in
 the total copy number (TCN) signals that are so sever that they bias
 the estimators/test statistic of CBS (which assumes Gaussian signals).
  If one believes there are such outliers and worries that they are so
 extreme that they would affect the segmentation severely, one could
 either (i) drop or (ii) shrink (smooth) them.  In the vignettes of
 the PSCBS package, I've last night [PSCBS (= 0.39.8)]
 corrected/clarified Section 'Dropping TCN outliers' to say the
 following:

 There may be some outliers among the TCNs.  In
 CBS~\citep{OlshenA_etal_2004,VenkatramanOlshen_2007}, the authors
 propose a method for identifying outliers and then to shrink such
 values toward their neighbors (smooth) before performing
 segmentation.  At the time CBS was developed it made sense to not just
 to drop outliers because the resolution was low and every datapoint
 was valuable.  With modern technologies the resolution is much higher
 and we can afford dropping such outliers, which can be done by:

  data - dropSegmentationOutliers(data)

 Dropping TCN outliers is optional.

 Hope this clarifies.

 Back to the original question: It is not possible to drop (or 

Re: [aroma.affymetrix] smoothing after CRMAv2 and before CBS?

2013-12-05 Thread Henrik Bengtsson
Pierre beat me to this one.  Comments below...

On Thu, Dec 5, 2013 at 9:20 AM, Pierre Neuvial
pierre.neuv...@genopole.cnrs.fr wrote:
 Hi Emilie,

 OK, so you are referring to the  “smooth.CNA function in the DNAcopy
 package, cf
 http://www.bioconductor.org/packages/2.13/bioc/vignettes/DNAcopy/inst/doc/DNAcopy.pdf

 What this function is doing is detecting outliers (based on how far their
 signal value is from their neighbors) and shrink their signal values toward
 those of their neighbors.

 This is indeed appropriate and recommended.  I thought that by smoothing
 you meant performing some kind of local averaging of the original signal
 (e.g. using a mobile median or by binning): this I don't recommend.  Sorry
 for the confusion.


 To drop outliers, one possibility is to use the dropSegmentationOutliers
 function from the PSCBS package.  See the vignettes at
 http://cran.fhcrc.org/web/packages/PSCBS/index.html

 Another comment: since you are following the vignette for paired CNA
 analysis, I am guessing that you are working with tumor/normal pairs.  If
 so, then you should use PSCBS rather than CBS for segmentation.  PSCBS is an
 extension of CBS to segment not only total copy numbers but also allelic
 ratios. See the PSCBS vignette in the above URL.

To balance this a little bit, I would say there may exist outliers in
the total copy number (TCN) signals that are so sever that they bias
the estimators/test statistic of CBS (which assumes Gaussian signals).
 If one believes there are such outliers and worries that they are so
extreme that they would affect the segmentation severely, one could
either (i) drop or (ii) shrink (smooth) them.  In the vignettes of
the PSCBS package, I've last night [PSCBS (= 0.39.8)]
corrected/clarified Section 'Dropping TCN outliers' to say the
following:

There may be some outliers among the TCNs.  In
CBS~\citep{OlshenA_etal_2004,VenkatramanOlshen_2007}, the authors
propose a method for identifying outliers and then to shrink such
values toward their neighbors (smooth) before performing
segmentation.  At the time CBS was developed it made sense to not just
to drop outliers because the resolution was low and every datapoint
was valuable.  With modern technologies the resolution is much higher
and we can afford dropping such outliers, which can be done by:

 data - dropSegmentationOutliers(data)

Dropping TCN outliers is optional.

Hope this clarifies.

Back to the original question: It is not possible to drop (or smooth)
outliers using the CbsModel() pipeline [I'll add that to the todo
list].  The easiest is to turn use the PSCBS package, where you can do
plain old single-track CBS segmentation, paired PSCBS segmentation and
also non-paired PSCBS segmentation.  As Pierre says, if you have tumor
SNP data, you should look into doing parent-specific CN analysis,
which you can do either via paired or non-paired PSCBS depending on
whether you have match normals or not.

To take your allele-specific CRMAv2 and bring it into a format
recognized by the PSCBS package, see
http://aroma-project.org/vignettes/PairedPSCBS-lowlevel

/Henrik


 Best,

 Pierre


 On Wed, Dec 4, 2013 at 5:29 PM, Emilie emilie.lalo...@gmail.com wrote:

 Hi Pierre,

 Thanks for your answer. I may be wrong but I thought smoothing prior to
 segmentation was somewhat common. It is shown in the vignettes for DNACopy
 and seems to be fairly common in the literature (this approach was used in
 the Metabric paper for example,
 http://www.ncbi.nlm.nih.gov/pubmed/22522925).

 I'd be interested in hearing more of your thoughts against this. Do you
 have an idea of how much resolution is lost by smoothing?

 Emilie



 On Tuesday, December 3, 2013 5:26:38 PM UTC-5, Pierre Neuvial wrote:

 Hi Emilie,

 It's certainly possible to do this within the Aroma framework (e.g. using
 the function binnedSmoothing).  It's probably not as straightforward as
 running the segmentation directly, though, because this is not a typical use
 case.

 In fact, I'm not sure why you want to perform smoothing before
 segmentation ?  Smoothing is definitely not required before segmentation,
 and I would actually discourage to go this path because it will end up in a
 loss of resolution along the genome at the smoothing step.

 Best,

 Pierre


 On Tue, Dec 3, 2013 at 8:53 PM, Emilie emilie@gmail.com wrote:

 Hi there,

 I'm new to processing Affy SNP6 chips and so am mainly experimenting
 with different methods to date. I ran CRMAv2 and followed steps 1-4 from 
 the
 vignette (http://aroma-project.org/vignettes/CRMAv2). For step 5, I want to
 do a paired analysis.

 Previously I've used DNAcopy to perform CBS for other array types, and
 would like to follow a similar procedure, which includes smoothing prior to
 segmentation. Is this possible using the aroma.affymetrix package? So far
 I've followed the vignette for paired CNA analysis
 (http://aroma-project.org/vignettes/pairedTotalCopyNumberAnalysis) but
 haven't seen any options for 

Re: [aroma.affymetrix] smoothing after CRMAv2 and before CBS?

2013-12-04 Thread Emilie
Hi Pierre,

Thanks for your answer. I may be wrong but I thought smoothing prior to 
segmentation was somewhat common. It is shown in the vignettes for DNACopy 
and seems to be fairly common in the literature (this approach was used in 
the Metabric paper for example, http://www.ncbi.nlm.nih.gov/pubmed/22522925
). 

I'd be interested in hearing more of your thoughts against this. Do you 
have an idea of how much resolution is lost by smoothing?

Emilie



On Tuesday, December 3, 2013 5:26:38 PM UTC-5, Pierre Neuvial wrote:

 Hi Emilie,

 It's certainly possible to do this within the Aroma framework (e.g. using 
 the function binnedSmoothing).  It's probably not as straightforward as 
 running the segmentation directly, though, because this is not a typical 
 use case.  

 In fact, I'm not sure why you want to perform smoothing before 
 segmentation ?  Smoothing is definitely not required before segmentation, 
 and I would actually discourage to go this path because it will end up in a 
 loss of resolution along the genome at the smoothing step.

 Best,

 Pierre


 On Tue, Dec 3, 2013 at 8:53 PM, Emilie emilie@gmail.com javascript:
  wrote:

 Hi there,

 I'm new to processing Affy SNP6 chips and so am mainly experimenting with 
 different methods to date. I ran CRMAv2 and followed steps 1-4 from the 
 vignette (http://aroma-project.org/vignettes/CRMAv2). For step 5, I want 
 to do a paired analysis.

 Previously I've used DNAcopy to perform CBS for other array types, and 
 would like to follow a similar procedure, which includes smoothing prior to 
 segmentation. Is this possible using the aroma.affymetrix package? So far 
 I've followed the vignette for paired CNA analysis (
 http://aroma-project.org/vignettes/pairedTotalCopyNumberAnalysis) but 
 haven't seen any options for smoothing. 

 thank you very much,

 emilie

 -- 
 -- 
 When reporting problems on aroma.affymetrix, make sure 1) to run the 
 latest version of the package, 2) to report the output of sessionInfo() and 
 traceback(), and 3) to post a complete code example.
  
  
 You received this message because you are subscribed to the Google Groups 
 aroma.affymetrix group with website http://www.aroma-project.org/.
 To post to this group, send email to 
 aroma-af...@googlegroups.comjavascript:
 To unsubscribe and other options, go to 
 http://www.aroma-project.org/forum/
  
 --- 
 You received this message because you are subscribed to the Google Groups 
 aroma.affymetrix group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to aroma-affymetr...@googlegroups.com javascript:.
 For more options, visit https://groups.google.com/groups/opt_out.




-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [aroma.affymetrix] smoothing after CRMAv2 and before CBS?

2013-12-03 Thread Pierre Neuvial
Hi Emilie,

It's certainly possible to do this within the Aroma framework (e.g. using
the function binnedSmoothing).  It's probably not as straightforward as
running the segmentation directly, though, because this is not a typical
use case.

In fact, I'm not sure why you want to perform smoothing before segmentation
?  Smoothing is definitely not required before segmentation, and I would
actually discourage to go this path because it will end up in a loss of
resolution along the genome at the smoothing step.

Best,

Pierre


On Tue, Dec 3, 2013 at 8:53 PM, Emilie emilie.lalo...@gmail.com wrote:

 Hi there,

 I'm new to processing Affy SNP6 chips and so am mainly experimenting with
 different methods to date. I ran CRMAv2 and followed steps 1-4 from the
 vignette (http://aroma-project.org/vignettes/CRMAv2). For step 5, I want
 to do a paired analysis.

 Previously I've used DNAcopy to perform CBS for other array types, and
 would like to follow a similar procedure, which includes smoothing prior to
 segmentation. Is this possible using the aroma.affymetrix package? So far
 I've followed the vignette for paired CNA analysis (
 http://aroma-project.org/vignettes/pairedTotalCopyNumberAnalysis) but
 haven't seen any options for smoothing.

 thank you very much,

 emilie

 --
 --
 When reporting problems on aroma.affymetrix, make sure 1) to run the
 latest version of the package, 2) to report the output of sessionInfo() and
 traceback(), and 3) to post a complete code example.


 You received this message because you are subscribed to the Google Groups
 aroma.affymetrix group with website http://www.aroma-project.org/.
 To post to this group, send email to aroma-affymetrix@googlegroups.com
 To unsubscribe and other options, go to
 http://www.aroma-project.org/forum/

 ---
 You received this message because you are subscribed to the Google Groups
 aroma.affymetrix group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to aroma-affymetrix+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/groups/opt_out.


-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.