Re: [aroma.affymetrix] An error message from running PSCBS

2012-01-15 Thread Henrik Bengtsson
On Sat, Jan 14, 2012 at 10:02 PM, Hoon Kim wise...@gmail.com wrote:
 Dear Henrik,

 Thank you very much for your time to reply to my questions, and your
 suggestions worked very well. By following your advice, I successfully
 evaluated LOHs for all samples.

 I wonder if I can ask you two more questions, one related to your previous
 email and the other related to the output of PSCBS.

 1.  A question related to your previous email:
 --A part of your previous email
 # Estimate the LOH *threshold* from autosomal chromosomes alone
 DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
 # Call LOH
 fitC - callLOH(fit, delta=DeltaLOH);
 -
 In this case, do I also have to estimate the Alleleic Balance *threshold*
 (deltaAB) from autosomal chromosomes alone in order to do callAB?

You could do that, but since the AB caller relies on weaker
assumptions it is more likely to work well anyway.  The advantage of
estimating the DeltaAB and DeltaLOH threshold from only the autosomal
chromosomes is that (ceteris paribus) the estimates will be the same
regardless of the germline being XX or XY; include X and Y in those
estimates will give a tiny (and negligible) bias between XXs and XYs.


 2. Please, find the attached file that contains the PSCBS output for two
 patients. In Patient 1, most lohCalls are NA rather than False, while
 only a few of NAs are shown in Patient 2.  I would greatly appreciate it
 if you let me know why Patient 1 has many NAs rather than False.
 Unfortunately, I couldn't find a reference where I can find answer myself,
 but I might have missed.

See argument 'xorCalls' in callLOH(): If @TRUE, a region already
called AB, will not be called LOH.  The basic idea is that if a
segment is already called AB, it cannot be LOH.  Thus, if AB = TRUE
then LOH = FALSE, i.e. (AB,LOH)=(TRUE,FALSE).  However, we still run
the LOH caller and gets the calls.  Now, to distinguish the cases
where the AB and LOH agree from when they disagree, we report either
(AB,LOH)=(TRUE,FALSE) or (TRUE,NA).  The former is reported when they
agree and the latter when they disagree.  You can always override this
yourself by the logic: LOH[AB] - FALSE.

In the next release help(callLOH, package=PSCBS) will clarify this better.

Cheers,

Henrik


 Thank you very much.

 Best,

 Hoon

 On Fri, Jan 13, 2012 at 4:31 PM, Henrik Bengtsson
 henrik.bengts...@aroma-project.org wrote:

 Hi,

 I could reproduce this using the PairedPSCBS object that you sent me;

 library(PSCBS);

 # Load Paired PSCBS results with AB calls already done
 fit - loadObject(HoonKim_20120110.Rbin);
 fitC - callLOH(fit);
 ## Error: muC1atNonAB  muC1atAB is not TRUE

 # The CN data (figure attached) looks good
 toPNG(HoonKim_20120110, tags=PSCBS, width=840, aspectRatio=0.8, {
 plotTracks(fit); })

 What causing the problem is basically that nothing is going on in the
 tumor, that is, it behaves as the germline.  No change points were
 detected on any chromomosome (Chr 1-22, 23 (X), 24 (Y), 25 (M)),
 resulting in one segment per chromosome.  Each such segment/chromosome
 was then correctly called to be in allelic balance (AB) by callAB()
 that you did earlier.  The exception is ChrY, which is non-AB.  The
 total CNs and the BAFs for Chr24 are probably garbage, because this
 sample is an XX person, meaning we only observe noise on ChrY.

 One obvious solution is to exclude ChrY, either already when running
 the segmentation, or immediately afterward, e.g.

 df - subset(df, chromosome %in% 1:23);
 fit   - segmentByPairedPSCBS(df, verbose=verbose);
 fit - callAB(fit);
 fit - callLOH(fit);

 Alternatively, if you wish to keep ChrY, say for consistency with
 other samples, you can do as follows:

 # Estimate the LOH *threshold* from autosomal chromosomes alone
 DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
 # Call LOH
 fitC - callLOH(fit, delta=DeltaLOH);

 In both cases you will get a warning that DeltaLOH was set to -Inf,
 which is alright.  See below.


 DETAILS:
 When calling LOH, we compare (a bootstrap quantile estimate of) the
 minor CN (C1) of each segment toward a threshold DeltaLOH.  See Eqn
 (7) in Olshen et al. (2011) [http://aroma-project.org/publications/]
 for details.  The main problem in calling LOH is to choose a good
 DeltaLOH.  As explained in the paper, this threshold is sample
 specific and requires rather strong assumptions and ad hoc estimators.
  In the Section S1.2 of the Supplementary Materials, we propose on
 approach to estimate DeltaLOH from data.  The idea is to choose
 DeltaLOH as the midpoint between where we believe the true CN=1 and
 true CN=0 is.  We estimate the true CN=1 (mu1) from all segment that
 are in AB and with TCN near two.  That estimate is not that debatable
 and works for data sets.  However, estimating where the true CN=0
 (mu0) is, requires that we have some segments that have a zero minor
 CN.  The method for 

Re: [aroma.affymetrix] An error message from running PSCBS

2012-01-15 Thread Henrik Bengtsson
On Sun, Jan 15, 2012 at 11:20 AM, Henrik Bengtsson
henrik.bengts...@aroma-project.org wrote:
 On Sat, Jan 14, 2012 at 10:02 PM, Hoon Kim wise...@gmail.com wrote:
 Dear Henrik,

 Thank you very much for your time to reply to my questions, and your
 suggestions worked very well. By following your advice, I successfully
 evaluated LOHs for all samples.

 I wonder if I can ask you two more questions, one related to your previous
 email and the other related to the output of PSCBS.

 1.  A question related to your previous email:
 --A part of your previous email
 # Estimate the LOH *threshold* from autosomal chromosomes alone
 DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
 # Call LOH
 fitC - callLOH(fit, delta=DeltaLOH);
 -
 In this case, do I also have to estimate the Alleleic Balance *threshold*
 (deltaAB) from autosomal chromosomes alone in order to do callAB?

 You could do that, but since the AB caller relies on weaker
 assumptions it is more likely to work well anyway.  The advantage of
 estimating the DeltaAB and DeltaLOH threshold from only the autosomal
 chromosomes is that (ceteris paribus) the estimates will be the same
 regardless of the germline being XX or XY; include X and Y in those
 estimates will give a tiny (and negligible) bias between XXs and XYs.


 2. Please, find the attached file that contains the PSCBS output for two
 patients. In Patient 1, most lohCalls are NA rather than False, while
 only a few of NAs are shown in Patient 2.  I would greatly appreciate it
 if you let me know why Patient 1 has many NAs rather than False.
 Unfortunately, I couldn't find a reference where I can find answer myself,
 but I might have missed.

 See argument 'xorCalls' in callLOH(): If @TRUE, a region already
 called AB, will not be called LOH.  The basic idea is that if a
 segment is already called AB, it cannot be LOH.  Thus, if AB = TRUE
 then LOH = FALSE, i.e. (AB,LOH)=(TRUE,FALSE).  However, we still run
 the LOH caller and gets the calls.  Now, to distinguish the cases
 where the AB and LOH agree from when they disagree, we report either
 (AB,LOH)=(TRUE,FALSE) or (TRUE,NA).  The former is reported when they
 agree and the latter when they disagree.  You can always override this
 yourself by the logic: LOH[AB] - FALSE.

Forgot to add that (AB,LOH)=(TRUE,NA) call also be reported when the
LOH caller itself could not decide on either TRUE or FALSE, and hence
return NA, e.g. when there is too little evidence/data.

For your Patient #1, it is likely that the estimate of the DeltaLOH
threshold is not correct/sensible, which would explain your
(TRUE,FALSE) and (TRUE,NA) calls, as well as the (FALSE,FALSE) and
(FALSE,TRUE) calls.

/Henrik


 In the next release help(callLOH, package=PSCBS) will clarify this better.

 Cheers,

 Henrik


 Thank you very much.

 Best,

 Hoon

 On Fri, Jan 13, 2012 at 4:31 PM, Henrik Bengtsson
 henrik.bengts...@aroma-project.org wrote:

 Hi,

 I could reproduce this using the PairedPSCBS object that you sent me;

 library(PSCBS);

 # Load Paired PSCBS results with AB calls already done
 fit - loadObject(HoonKim_20120110.Rbin);
 fitC - callLOH(fit);
 ## Error: muC1atNonAB  muC1atAB is not TRUE

 # The CN data (figure attached) looks good
 toPNG(HoonKim_20120110, tags=PSCBS, width=840, aspectRatio=0.8, {
 plotTracks(fit); })

 What causing the problem is basically that nothing is going on in the
 tumor, that is, it behaves as the germline.  No change points were
 detected on any chromomosome (Chr 1-22, 23 (X), 24 (Y), 25 (M)),
 resulting in one segment per chromosome.  Each such segment/chromosome
 was then correctly called to be in allelic balance (AB) by callAB()
 that you did earlier.  The exception is ChrY, which is non-AB.  The
 total CNs and the BAFs for Chr24 are probably garbage, because this
 sample is an XX person, meaning we only observe noise on ChrY.

 One obvious solution is to exclude ChrY, either already when running
 the segmentation, or immediately afterward, e.g.

 df - subset(df, chromosome %in% 1:23);
 fit   - segmentByPairedPSCBS(df, verbose=verbose);
 fit - callAB(fit);
 fit - callLOH(fit);

 Alternatively, if you wish to keep ChrY, say for consistency with
 other samples, you can do as follows:

 # Estimate the LOH *threshold* from autosomal chromosomes alone
 DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
 # Call LOH
 fitC - callLOH(fit, delta=DeltaLOH);

 In both cases you will get a warning that DeltaLOH was set to -Inf,
 which is alright.  See below.


 DETAILS:
 When calling LOH, we compare (a bootstrap quantile estimate of) the
 minor CN (C1) of each segment toward a threshold DeltaLOH.  See Eqn
 (7) in Olshen et al. (2011) [http://aroma-project.org/publications/]
 for details.  The main problem in calling LOH is to choose a good
 DeltaLOH.  As explained in the paper, this threshold is sample
 specific and requires rather strong 

Re: [aroma.affymetrix] An error message from running PSCBS

2012-01-15 Thread Hoon Kim
Dear Henrik,

Thank you very much for your clear explanations.
Because the estimate of the DeltaLOH
threshold seems not correct for several samples (as you mentioned in the
previous email), I am thinking of only using the *dhMean* column rather
than the *lohCall* for interpreting my data.

Again, thank you very much, and have a good holiday.

Best,

Hoon Kim

On Sun, Jan 15, 2012 at 1:20 PM, Henrik Bengtsson 
henrik.bengts...@aroma-project.org wrote:

 On Sat, Jan 14, 2012 at 10:02 PM, Hoon Kim wise...@gmail.com wrote:
  Dear Henrik,
 
  Thank you very much for your time to reply to my questions, and your
  suggestions worked very well. By following your advice, I successfully
  evaluated LOHs for all samples.
 
  I wonder if I can ask you two more questions, one related to your
 previous
  email and the other related to the output of PSCBS.
 
  1.  A question related to your previous email:
  --A part of your previous email
  # Estimate the LOH *threshold* from autosomal chromosomes alone
  DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
  # Call LOH
  fitC - callLOH(fit, delta=DeltaLOH);
  -
  In this case, do I also have to estimate the Alleleic Balance *threshold*
  (deltaAB) from autosomal chromosomes alone in order to do callAB?

 You could do that, but since the AB caller relies on weaker
 assumptions it is more likely to work well anyway.  The advantage of
 estimating the DeltaAB and DeltaLOH threshold from only the autosomal
 chromosomes is that (ceteris paribus) the estimates will be the same
 regardless of the germline being XX or XY; include X and Y in those
 estimates will give a tiny (and negligible) bias between XXs and XYs.

 
  2. Please, find the attached file that contains the PSCBS output for two
  patients. In Patient 1, most lohCalls are NA rather than False,
 while
  only a few of NAs are shown in Patient 2.  I would greatly appreciate
 it
  if you let me know why Patient 1 has many NAs rather than False.
  Unfortunately, I couldn't find a reference where I can find answer
 myself,
  but I might have missed.

 See argument 'xorCalls' in callLOH(): If @TRUE, a region already
 called AB, will not be called LOH.  The basic idea is that if a
 segment is already called AB, it cannot be LOH.  Thus, if AB = TRUE
 then LOH = FALSE, i.e. (AB,LOH)=(TRUE,FALSE).  However, we still run
 the LOH caller and gets the calls.  Now, to distinguish the cases
 where the AB and LOH agree from when they disagree, we report either
 (AB,LOH)=(TRUE,FALSE) or (TRUE,NA).  The former is reported when they
 agree and the latter when they disagree.  You can always override this
 yourself by the logic: LOH[AB] - FALSE.

 In the next release help(callLOH, package=PSCBS) will clarify this
 better.

 Cheers,

 Henrik

 
  Thank you very much.
 
  Best,
 
  Hoon
 
  On Fri, Jan 13, 2012 at 4:31 PM, Henrik Bengtsson
  henrik.bengts...@aroma-project.org wrote:
 
  Hi,
 
  I could reproduce this using the PairedPSCBS object that you sent me;
 
  library(PSCBS);
 
  # Load Paired PSCBS results with AB calls already done
  fit - loadObject(HoonKim_20120110.Rbin);
  fitC - callLOH(fit);
  ## Error: muC1atNonAB  muC1atAB is not TRUE
 
  # The CN data (figure attached) looks good
  toPNG(HoonKim_20120110, tags=PSCBS, width=840, aspectRatio=0.8, {
  plotTracks(fit); })
 
  What causing the problem is basically that nothing is going on in the
  tumor, that is, it behaves as the germline.  No change points were
  detected on any chromomosome (Chr 1-22, 23 (X), 24 (Y), 25 (M)),
  resulting in one segment per chromosome.  Each such segment/chromosome
  was then correctly called to be in allelic balance (AB) by callAB()
  that you did earlier.  The exception is ChrY, which is non-AB.  The
  total CNs and the BAFs for Chr24 are probably garbage, because this
  sample is an XX person, meaning we only observe noise on ChrY.
 
  One obvious solution is to exclude ChrY, either already when running
  the segmentation, or immediately afterward, e.g.
 
  df - subset(df, chromosome %in% 1:23);
  fit   - segmentByPairedPSCBS(df, verbose=verbose);
  fit - callAB(fit);
  fit - callLOH(fit);
 
  Alternatively, if you wish to keep ChrY, say for consistency with
  other samples, you can do as follows:
 
  # Estimate the LOH *threshold* from autosomal chromosomes alone
  DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
  # Call LOH
  fitC - callLOH(fit, delta=DeltaLOH);
 
  In both cases you will get a warning that DeltaLOH was set to -Inf,
  which is alright.  See below.
 
 
  DETAILS:
  When calling LOH, we compare (a bootstrap quantile estimate of) the
  minor CN (C1) of each segment toward a threshold DeltaLOH.  See Eqn
  (7) in Olshen et al. (2011) [http://aroma-project.org/publications/]
  for details.  The main problem in calling LOH is to choose a good
  DeltaLOH.  As explained in the paper, this threshold is sample
  specific 

Re: [aroma.affymetrix] An error message from running PSCBS

2012-01-15 Thread Henrik Bengtsson
On Sun, Jan 15, 2012 at 11:38 AM, Hoon Kim wise...@gmail.com wrote:
 Dear Henrik,

 Thank you very much for your clear explanations.
 Because the estimate of the DeltaLOH
 threshold seems not correct for several samples (as you mentioned in the
 previous email), I am thinking of only using the *dhMean* column rather than
 the *lohCall* for interpreting my data.

It is probably better using the minor CN (C1) estimates for calling
LOH, and instead of doing that manually, use a preset DeltaLOH
threshold in callLOH().  That is pretty much what you are proposing
while keeping the advantage of the bootstrap estimate of the C1 mean.
See Eqn (6) in Olshen et al. (2011)
[http://aroma-project.org/publications/].  If you're still want to use
DH instead of C1, see the code of callLOH(); there are flavors of
the caller that uses DH instead.  Note, calling LOH is very
complicated, both from a theoretical as well as a practical point of
view.  The complication of LOH is unfortunately rarely appreciated and
often overlooked in most of the literature.

/Henrik


 Again, thank you very much, and have a good holiday.

 Best,

 Hoon Kim

 On Sun, Jan 15, 2012 at 1:20 PM, Henrik Bengtsson
 henrik.bengts...@aroma-project.org wrote:

 On Sat, Jan 14, 2012 at 10:02 PM, Hoon Kim wise...@gmail.com wrote:
  Dear Henrik,
 
  Thank you very much for your time to reply to my questions, and your
  suggestions worked very well. By following your advice, I successfully
  evaluated LOHs for all samples.
 
  I wonder if I can ask you two more questions, one related to your
  previous
  email and the other related to the output of PSCBS.
 
  1.  A question related to your previous email:
  --A part of your previous email
  # Estimate the LOH *threshold* from autosomal chromosomes alone
  DeltaLOH - estimateDeltaLOH(extractChromosomes(fit, 1:22));
  # Call LOH
  fitC - callLOH(fit, delta=DeltaLOH);
  -
  In this case, do I also have to estimate the Alleleic Balance
  *threshold*
  (deltaAB) from autosomal chromosomes alone in order to do callAB?

 You could do that, but since the AB caller relies on weaker
 assumptions it is more likely to work well anyway.  The advantage of
 estimating the DeltaAB and DeltaLOH threshold from only the autosomal
 chromosomes is that (ceteris paribus) the estimates will be the same
 regardless of the germline being XX or XY; include X and Y in those
 estimates will give a tiny (and negligible) bias between XXs and XYs.

 
  2. Please, find the attached file that contains the PSCBS output for two
  patients. In Patient 1, most lohCalls are NA rather than False,
  while
  only a few of NAs are shown in Patient 2.  I would greatly appreciate
  it
  if you let me know why Patient 1 has many NAs rather than False.
  Unfortunately, I couldn't find a reference where I can find answer
  myself,
  but I might have missed.

 See argument 'xorCalls' in callLOH(): If @TRUE, a region already
 called AB, will not be called LOH.  The basic idea is that if a
 segment is already called AB, it cannot be LOH.  Thus, if AB = TRUE
 then LOH = FALSE, i.e. (AB,LOH)=(TRUE,FALSE).  However, we still run
 the LOH caller and gets the calls.  Now, to distinguish the cases
 where the AB and LOH agree from when they disagree, we report either
 (AB,LOH)=(TRUE,FALSE) or (TRUE,NA).  The former is reported when they
 agree and the latter when they disagree.  You can always override this
 yourself by the logic: LOH[AB] - FALSE.

 In the next release help(callLOH, package=PSCBS) will clarify this
 better.

 Cheers,

 Henrik

 
  Thank you very much.
 
  Best,
 
  Hoon
 
  On Fri, Jan 13, 2012 at 4:31 PM, Henrik Bengtsson
  henrik.bengts...@aroma-project.org wrote:
 
  Hi,
 
  I could reproduce this using the PairedPSCBS object that you sent me;
 
  library(PSCBS);
 
  # Load Paired PSCBS results with AB calls already done
  fit - loadObject(HoonKim_20120110.Rbin);
  fitC - callLOH(fit);
  ## Error: muC1atNonAB  muC1atAB is not TRUE
 
  # The CN data (figure attached) looks good
  toPNG(HoonKim_20120110, tags=PSCBS, width=840, aspectRatio=0.8, {
  plotTracks(fit); })
 
  What causing the problem is basically that nothing is going on in the
  tumor, that is, it behaves as the germline.  No change points were
  detected on any chromomosome (Chr 1-22, 23 (X), 24 (Y), 25 (M)),
  resulting in one segment per chromosome.  Each such segment/chromosome
  was then correctly called to be in allelic balance (AB) by callAB()
  that you did earlier.  The exception is ChrY, which is non-AB.  The
  total CNs and the BAFs for Chr24 are probably garbage, because this
  sample is an XX person, meaning we only observe noise on ChrY.
 
  One obvious solution is to exclude ChrY, either already when running
  the segmentation, or immediately afterward, e.g.
 
  df - subset(df, chromosome %in% 1:23);
  fit   - segmentByPairedPSCBS(df, verbose=verbose);
  fit - callAB(fit);
  fit - 

[aroma.affymetrix] An error message from running PSCBS

2012-01-09 Thread wisekh6
Dear Henrik,
I have been doing paired PSCBS analysis on a Genome Wide SNP array 6.0
data set in order to estimate LOH for 20 pairs of samples.
The code on the aroma web site successfully estimated LOH for most
pairs, while it fails to estimate LOH for several bad pairs.
An error message I got from a bad pair is :
fit   - callLOH(fit, verbose=verbose);
Error: muC1atNonAB  muC1atAB is not TRUE

The code, which is not exactly the same as my actual one, that I use
is as follows:
===
library(aroma.affymetrix);
verbose   - Arguments$getVerbose(-10, timestamp=TRUE);
library(PSCBS);

res - doASCRMAv2(csR, verbose=verbose);
data- extractPSCNArray(res$total);
# Total CNs for the tumor relative to the matched normal
CT  - 2 * (data[,total,T] / data[,total,N]);
# Allele B fractions for the tumor
betaT   - data[,fracB,T];
# Allele B fractions for the normal
betaN   - data[,fracB,N];
# Get (chromosome, position) annotation data
ugp - getAromaUgpFile(res$total);
chromosome - ugp[,1,drop=TRUE];
x   - ugp[,2,drop=TRUE];

# Setup data structure for Paired PSCBS
df- data.frame(chromosome=chromosome, x=x, CT=CT, betaT=betaT,
betaN=betaN);

# Run PSCBS
fit   - segmentByPairedPSCBS(df, verbose=verbose);
print(...segmentByPairedPSCBS)

fit   - callAB(fit, verbose=verbose);
print(...callAB)

fit   - callLOH(fit, verbose=verbose);
print(...callLOH)
===

I wonder if you provide me with any suggestion regarding this error
message.

Thank you,

Best,

Hoon

sessionInfo() is as follows:
===
 sessionInfo()

R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats graphics  grDevices utils datasets
methods
[8] base

other attached packages:
 [1] Hmisc_3.8-3survival_2.36-5DNAcopy_1.26.0
 [4] PSCBS_0.17.1   aroma.affymetrix_2.3.0 affxparser_1.24.0
 [7] aroma.apd_0.2.0R.huge_0.3.0   aroma.core_2.3.2
[10] aroma.light_1.22.0 matrixStats_0.4.3  R.rsp_0.7.0
[13] R.cache_0.5.2  R.filesets_1.1.3   digest_0.5.1
[16] R.utils_1.9.6  R.oo_1.8.3 R.methodsS3_1.2.1

loaded via a namespace (and not attached):
[1] cluster_1.13.3  grid_2.13.0 lattice_0.19-23
===

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/