Re: [aroma.affymetrix] Analysis of GenomeWideSNP6.0 data

2010-08-04 Thread Pierre Neuvial

On Tue, Aug 3, 2010 at 8:12 AM, Ajanthah Sangaralingam wrote:
 Thank you for the reply. I actually need to get the log2 copy number ratios
 from the raw .cel files of a GenomeWideSNP6.0 array. I was using CRMA v1 but
 am now repeating the analysis using CRMA v2.
 I am putting all of the different tumour types, whether or not they are matched
 with a germline sample, in the same directory, and all the normal samples in
 another directory.

This is fine, but note that you did not need to put them in separate directories.

 I will then go through the steps of quality assessment, crosstalk
 calibration, normalization for probe-sequence effects, probe summarization,
 and normalization for PCR fragment-length effects.

 Do I need to calculate the raw copy numbers and turn these into log2 copy

I don't understand your question.

 How would I then calculate the copy numbers for
 1. Unpaired tumour samples, which will need to be compared to a pooled
 reference from a particular tumour type
 2. Paired samples?

It's hard to be more specific than what Henrik already said without
more details on your sample names and tumor types, and most
importantly on the design of your study.

I'll try for the unpaired analysis (your 1.)

I am assuming that you have two data sets:
- 'dsT' for the tumor samples,
- 'dsN' for the normal samples.

It seems that your concern is to use *tumor-type specific* sets of
normal samples. Is that correct? See my remark below: I'm not sure
that is what you should do.

If so, then assuming that 'idxT1' contains the indices of all tumor
samples from a particular tumor type in dsT, and 'idxN1' contains the
indices of normal samples from the same tumor type in dsN, you can do

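library("aroma.affymetrix")
log <- Arguments$getVerbose(-8, timestamp=TRUE)  ## verbose logger passed to fit() below

dsN1 <- extract(dsN, idxN1);   ## normal samples of tumor type 1
dsT1 <- extract(dsT, idxT1);   ## tumor samples of tumor type 1

dfR1 <- getAverageFile(dsN1);  ## pool of normal samples of tumor type 1
sm1 <- CbsModel(dsT1, dfR1);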

Then you can do

fit(sm1, chromosome=1, array=1, verbose=log);

to perform CBS segmentation and/or

rawCNs1 <- extractRawCopyNumbers(sm1, array=1, chromosome=1)

to extract and plot raw copy numbers (independently of CBS).

And so on for each tumor type.
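The per-type bookkeeping behind "and so on for each tumor type" can be sketched in base R. The annotation tables, their column names and the sample names below are hypothetical; in practice they would come from your own sample sheet, matched against the sample names of dsT and dsN. The aroma.affymetrix loop at the end is left commented out because it needs real data sets.

```r
# Hypothetical sample annotation (one row per array, in data-set order).
tumorInfo <- data.frame(name = c("T01", "T02", "T03", "T04"),
                        type = c("breast", "lung", "breast", "lung"),
                        stringsAsFactors = FALSE)
normalInfo <- data.frame(name = c("N01", "N02", "N03"),
                         type = c("breast", "lung", "lung"),
                         stringsAsFactors = FALSE)

# For each tumor type, the indices into dsT and dsN, i.e. the
# analogues of 'idxT1' and 'idxN1' above.
types <- sort(unique(tumorInfo$type))
idx <- lapply(types, function(tt) {
  list(idxT = which(tumorInfo$type == tt),
       idxN = which(normalInfo$type == tt))
})
names(idx) <- types

# Every tumor type needs at least one normal to build its pooled reference.
stopifnot(all(vapply(idx, function(x) length(x$idxN) > 0, logical(1))))

# With aroma.affymetrix loaded and dsT/dsN/log set up as above, the loop
# would follow the same pattern as the tumor type 1 example:
# for (tt in types) {
#   dfR <- getAverageFile(extract(dsN, idx[[tt]]$idxN))
#   sm <- CbsModel(extract(dsT, idx[[tt]]$idxT), dfR)
#   fit(sm, chromosome=1, array=1, verbose=log)
# }
```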

This should answer your 1. However, I'm not sure that using
tumor-type specific sets of normal samples will give you better
results. This depends in particular on the following points
in your design:
- Are your normals normal tissue samples or blood samples?
- Were all the tumor and normal microarrays done in the same lab, and
approximately at the same time? If so, combining all the normals
could be better.

One way to know which option is best (tumor-specific reference or
global reference) is to try both and compare the segmentation results
(e.g. using ChromosomeExplorer).
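As a crude numeric complement to the visual comparison in ChromosomeExplorer, one can compute a noise score for the log2 ratios obtained with each candidate reference. The sketch below uses the MAD of successive differences divided by sqrt(2), a common robust noise estimate; it is a swapped-in heuristic, not an aroma.affymetrix function, and the vectors are made-up signals for one tumor along a chromosome.

```r
# Robust noise of a log2-ratio profile ordered by genomic position:
# successive differences cancel most of a piecewise-constant signal,
# and dividing by sqrt(2) rescales to the noise at a single locus.
diffNoise <- function(x) mad(diff(x)) / sqrt(2)

# Made-up signals for one tumor and two candidate references.
tumor     <- c(2.0, 2.2, 1.9, 3.1, 2.8, 3.0, 1.1, 0.9)
refType   <- c(2.0, 2.1, 2.0, 2.1, 1.9, 2.0, 2.1, 1.9)  # tumor-type pool
refGlobal <- c(2.1, 2.0, 1.9, 2.0, 2.0, 2.1, 1.9, 2.0)  # pool of all normals

noise <- c(typeSpecific = diffNoise(log2(tumor / refType)),
           global       = diffNoise(log2(tumor / refGlobal)))
print(noise)
```

Lower means less noisy; on real data this would be compared across many tumors, and alongside the segmentations themselves rather than instead of them.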

For your 2 (paired tumor/normal analysis), I think Henrik already gave all
the necessary information, but assuming that 'idxT2' contains the indices
of all tumor samples from a particular tumor type in dsT that have a
paired normal, and 'idxN2' contains the indices of these paired normal
samples from the same tumor type in dsN, and further assuming that *the
samples are in the same order in the two sets of indices*, you can do

dsN2 <- extract(dsN, idxN2);   ## matched normals, same order as the tumors
dsT2 <- extract(dsT, idxT2);

sm2 <- CbsModel(dsT2, dsN2);   ## paired: one normal per tumor

fit(sm2, chromosome=1, array=1, verbose=log);
rawCNs2 <- extractRawCopyNumbers(sm2, array=1, chromosome=1)
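Because CbsModel(dsT2, dsN2) pairs the i-th tumor with the i-th normal, it is safer to build 'idxT2' and 'idxN2' from a shared patient identifier than by hand. A minimal base-R sketch, assuming (hypothetically) that sample names look like 'P01_T' / 'P01_N'; with aroma.affymetrix the name vectors would typically come from getNames(dsT) and getNames(dsN).

```r
# Hypothetical sample names, in the order they appear in dsT and dsN.
tumorNames  <- c("P03_T", "P01_T", "P09_T", "P07_T")
normalNames <- c("P01_N", "P05_N", "P03_N", "P07_N")

# Patient identifier = sample name without the "_T"/"_N" suffix.
patientOf <- function(x) sub("_[TN]$", "", x)

# For each tumor (in tumor order), the position of its matched normal (NA if none).
m <- match(patientOf(tumorNames), patientOf(normalNames))

idxT2 <- which(!is.na(m))  # tumors that have a matched normal
idxN2 <- m[idxT2]          # their normals, in the same order

# Sanity check before calling CbsModel(): pair i must be the same patient.
stopifnot(identical(patientOf(tumorNames[idxT2]),
                    patientOf(normalNames[idxN2])))
```

Tumors dropped here (P09 in the toy names) are the ones that would go into the unpaired analysis.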

I hope this helps,


Re: [aroma.affymetrix] Analysis of GenomeWideSNP6.0 data

2010-08-03 Thread Ajanthah Sangaralingam
Thank you for the reply. I actually need to get the log2 copy number ratios
from the raw .cel files of a GenomeWideSNP6.0 array. I was using CRMA v1 but
am now repeating the analysis using CRMA v2.
I am putting all of the different tumour types, whether or not they are matched
with a germline sample, in the same directory, and all the normal samples in
another directory.
I will then go through the steps of quality assessment, crosstalk
calibration, normalization for probe-sequence effects, probe summarization,
and normalization for PCR fragment-length effects.

Do I need to calculate the raw copy numbers and turn these into log2 copy
How would I then calculate the copy numbers for
1. Unpaired tumour samples, which will need to be compared to a pooled
reference from a particular tumour type
2. Paired samples?

Many thanks for your help

On 18/07/2010 12:01, Ajanthah Sangaralingam

 Yes this is correct.
 Many thanks
 From: [] On
 Behalf Of Henrik Bengtsson []
 Sent: Sunday, July 18, 2010 11:28 AM
 To: aroma-affymetrix
 Subject: Re: [aroma.affymetrix] Analysis of GenomeWideSNP6.0 data
 On Fri, Jul 16, 2010 at 11:13 AM, Ajanthah Sangaralingam wrote:
 I have been doing some paired total copy number analysis in aroma.affymetrix.
 The dataset I have is complicated: for half of the dataset I have reference
 samples; for the other half I will do an unpaired analysis.
 So, to make sure I don't misunderstand: you have an Affymetrix
 GenomeWideSNP_6 (GWS6) data set that contains tumors, and for some, but
 not all, of them you have matched normal samples, where a matched normal
 means a normal tissue or normal blood extract from the same patient from
 whom the tumor was taken. Is this correct?
 I also have data from many different tumor types, not just one; I do not
 have the same number of samples from each type of tumor.
 My questions are:
 When doing a paired analysis, the normal and tumour data have their own
 directories, and allelic crosstalk calibration, summarization and PCR
 fragment-length normalization are all done separately.
 It is important to know which preprocessing method you are following.
 Since you are working with GWS6 arrays, I recommend that you use the
 CRMAv2 preprocessing method as described in the vignette 'Estimation of
 total copy numbers using the CRMA v2 method (10K-GWS6)':
 Note the function doCRMAv2() which is convenient when you do not want
 to dig into the details.
 Since you are not mentioning probe-sequence normalization, it looks
 like you are indeed using CRMA v1.  If so, I recommend that you use
 CRMA v2 instead.  Using CRMA v2 will be really useful for you, as
 explained below.
 Is this true for the different tumor types as well: should they be treated
 separately for all of these stages, or can all the tumor types be put into
 one tumour directory?
 This is perfectly fine if you are using CRMA v2 (but not CRMA v1).
 As now clarified in the vignette, in addition to the CRMAv2 paper, you
 will get identical results with CRMAv2 regardless of what other samples
 you put in your data set; the CRMAv2 method is truly a single-array
 method. It is only when you get to the step where you calculate copy
 numbers relative to a pool of references that you have to decide
 which pool of reference samples to use.
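A toy base-R illustration of why a single-array method makes the directory layout irrelevant. Quantile normalization serves here only as a familiar example of a multi-array step (it is not part of CRMA v2), and per-array median scaling is a stand-in for a single-array step, not CRMA v2's actual calibration.

```r
# Single-array step (toy): rescale each array by its own median.
singleArrayNorm <- function(Y) apply(Y, 2, function(y) y / median(y))

# Multi-array step (toy): quantile normalization, whose target
# distribution is averaged over all arrays in the set.
quantileNorm <- function(Y) {
  target <- rowMeans(apply(Y, 2, sort))
  apply(Y, 2, function(y) target[rank(y, ties.method = "first")])
}

Y     <- cbind(a1 = c(1, 2, 3, 4), a2 = c(2, 4, 6, 8))
Yplus <- cbind(Y, a3 = c(10, 20, 30, 40))  # same arrays plus one more

# Array 'a1' is unaffected by the extra array under the single-array step...
identical(singleArrayNorm(Y)[, "a1"], singleArrayNorm(Yplus)[, "a1"])  # TRUE
# ...but changes under the multi-array step.
isTRUE(all.equal(quantileNorm(Y)[, "a1"], quantileNorm(Yplus)[, "a1"]))  # FALSE
```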
 Also, I am unable to extract the reference samples that I want after
 normalization to compare to the matching samples in, say, another tumor type.
 Segmentation models cannot be fit unless the number of samples matches exactly.
 It actually can, as explained below.
 Does this mean that I need to do all the stages again for the subsets of
 reference samples that have matching pairs in the other tumor types?
 The segmentation models, for instance CbsModel, segment each tumor
 relative to either (a) a matched normal, or (b) a global reference. When
 you do (a), by definition there has to be an equal number of tumors and
 matched normals, whereas when you do (b), only one reference sample
 can be specified.
 Example of paired tumor-normal segmentation:
 # A set of tumor samples
 dsT <- ...
 # A set of matched normal samples ordered such that they
 # match the ordering in the tumor data set 'dsT'.
 dsN <- ...
 sm <- CbsModel(dsT, dsN);
 Example of tumor-global reference segmentation:
 # A set of tumor samples
 dsT <- ...
 # A set of reference samples (can be normals or everything)
 dsR <- ...
 # Use the pool of all reference samples as the reference
 dfR <- getAverageFile(dsR);
 sm <- CbsModel(dsT, dfR);
 Note that 'dfR' is a single virtual array, not a data set.
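To make the difference between (a) and (b) concrete, here is a base-R toy version of the relative copy numbers that would be segmented in each set-up. The helper log2Ratios() is hypothetical, the pooled reference is taken as the per-locus median of the normals (the estimator actually used by getAverageFile() may differ), and the signals are made-up numbers.

```r
# Hypothetical helper: log2 copy-number ratios for loci (rows) x tumors (columns).
#   (a) 'ref' is a matrix of matched normals, one column per tumor, same order;
#   (b) 'ref' is a single reference profile (vector) shared by all tumors.
log2Ratios <- function(tumors, ref) {
  if (is.matrix(ref)) {
    # (a) paired: exactly one normal per tumor, otherwise stop.
    stopifnot(identical(dim(ref), dim(tumors)))
  } else {
    # (b) global reference: one value per locus, recycled over tumors.
    stopifnot(length(ref) == nrow(tumors))
  }
  log2(tumors / ref)
}

tumors  <- cbind(T1 = c(2, 4, 1), T2 = c(2, 2, 4))
normals <- cbind(N1 = c(2, 2, 2), N2 = c(1, 2, 2))

paired <- log2Ratios(tumors, normals)   # (a) tumor i vs normal i
pooled <- apply(normals, 1, median)     # one "virtual array" of normals
global <- log2Ratios(tumors, pooled)    # (b) every tumor vs the pool
```

Passing a normal matrix with fewer columns than tumors stops with an error, mirroring the requirement that (a) needs exactly as many matched normals as tumors.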
 Does the above make sense?
 Many thanks