Re: [aroma.affymetrix] genotyping crlmm genomewidesnp 6.0

2013-04-10 Thread Carles Hernández
Good afternoon,

First of all, my apologies for the delay in responding.

On Saturday, March 23, 2013 at 22:28:52 UTC+1, Henrik Bengtsson wrote:

 Hi.

 On Sat, Mar 23, 2013 at 4:00 AM, Carles Hernández kurag...@gmail.com wrote:
  Good morning,

  First of all, thanks for answering so fast. It's really helpful to be able
  to talk with the main creator of the library.

  Going back to the topic, sorry I didn't express myself properly. I have no
  idea what the CEL files contain, so the idea is to analyze the microarrays
  using the FreqB, LRR and genotypes. Some of them may be tumoral, but I
  can't know. I will use the genotypes to classify the probes into AA, AB and
  BB in order to study the FreqB compared with LRR, and use an external
  program called MAD.

 But do you agree with me that it does not make sense to classify a SNP 
 into (AA, AB, BB), i.e. call the genotype, if the SNP is for instance 
 A, ABB, AAABB, or even worse a mixture of, say, 10% A, 38.5% ABB and 
 40.1% AAABB and the rest being the normal AB?  So, I still argue that 
 genotypes will only make sense for SNPs that you know are normal.  If 
 you don't know which samples are normal and which are tumors you will 
 never know which SNPs/genotype calls you can trust, which to me makes 
 the (artificial) genotype calls useless.  Although I still haven't seen 
 one, I'm all ears for a good argument for where it makes sense to call 
 genotypes in a tumor.  I'm just trying to save you from wasting your 
 time going down the wrong path. 


Yes, I agree with that. What I actually want is a BAF (B-allele frequency)
estimate. I wanted to use CRLMM for that, since it also predicts the
genotype, but it is not ready for GenomeWideSNP 6. So using the CRMA v2
implementation, which estimates BAF pretty well, may be a solution.

Could you provide a reference to MAD - never heard of it. 


Here you can get some information related to MAD:

 - http://www.biomedcentral.com/1471-2105/12/166
 - http://www.creal.cat/jrgonzalez/software.htm#ancla-MAD 

 
  So, you said CRLMM is not implemented for GenomeWideSNP 6.0. Could I
  contribute by implementing it?

 Certainly, that would be great and most appreciated.  Just a heads up,
 it's more than a standard programming task.  It requires diving into
 the oligo::crlmm() code and its algorithm to find out which modules
 can be reused and which need to be ported.  The two files CrlmmModel.R
 and CrlmmModel.EXT.R in aroma.affymetrix/R/ would serve as a good
 start/template:

 https://r-forge.r-project.org/scm/viewvc.php/pkg/aroma.affymetrix/R/CrlmmModel.R?view=markup&root=aroma-dots
 https://r-forge.r-project.org/scm/viewvc.php/pkg/aroma.affymetrix/R/CrlmmModel.EXT.R?view=markup&root=aroma-dots

 If you look inside oligo::crlmm() you see that it itself takes two
 separate paths depending on whether the chip type is (a)
 Mapping50K_(Hind|Xba)240 and Mapping250K_(Nsp|Sty) [which is ported to
 aroma.affymetrix], or (b) GenomeWideSNP_(5|6) [which is not ported].
 In other words, it's the internal oligo:::genotypeOne() that needs to
 be ported.


Actually, I am battling with crlmm, oligo and oligoClasses to handle my
GenomeWideSNP CEL files. My priority is to finish this analysis, but I may
try my hand at this port afterwards. I am not sure yet, but I have it in mind.
 

  
  Anyway, thank you for sharing the aroma.affymetrix suite with us.

 You're welcome - hopefully it makes everyday science a bit easier. 

 /Henrik 


Many thanks for your answers.

Carles


PS: Has applying CRLMM to Affymetrix Axiom and Affymetrix Axiom Exome
arrays been considered?
 

  
  

Re: [aroma.affymetrix] genotyping crlmm genomewidesnp 6.0

2013-03-23 Thread Carles Hernández
Good morning,

First of all, thanks for answering so fast. It's really helpful to be able
to talk with the main creator of the library.

Going back to the topic, sorry I didn't express myself properly. I have no
idea what the CEL files contain, so the idea is to analyze the microarrays
using the FreqB, LRR and genotypes. Some of them may be tumoral, but I
can't know. I will use the genotypes to classify the probes into AA, AB and
BB in order to study the FreqB compared with LRR, and use an external
program called MAD.

So, you said CRLMM is not implemented for GenomeWideSNP 6.0. Could I
contribute by implementing it?

Anyway, thank you for sharing the aroma.affymetrix suite with us.



Re: [aroma.affymetrix] genotyping crlmm genomewidesnp 6.0

2013-03-23 Thread Henrik Bengtsson
Hi.

On Sat, Mar 23, 2013 at 4:00 AM, Carles Hernández kuragari...@gmail.com wrote:
 Good morning,

 First of all, thanks for answering so fast. It's really helpful to be able
 to talk with the main creator of the library.

 Going back to the topic, sorry I didn't express myself properly. I have no
 idea what the CEL files contain, so the idea is to analyze the microarrays
 using the FreqB, LRR and genotypes. Some of them may be tumoral, but I
 can't know. I will use the genotypes to classify the probes into AA, AB and
 BB in order to study the FreqB compared with LRR, and use an external
 program called MAD.

But do you agree with me that it does not make sense to classify a SNP
into (AA, AB, BB), i.e. call the genotype, if the SNP is for instance
A, ABB, AAABB, or even worse a mixture of, say, 10% A, 38.5% ABB and
40.1% AAABB and the rest being the normal AB?  So, I still argue that
genotypes will only make sense for SNPs that you know are normal.  If
you don't know which samples are normal and which are tumors you will
never know which SNPs/genotype calls you can trust, which to me makes
the (artificial) genotype calls useless.  Although I still haven't seen
one, I'm all ears for a good argument for where it makes sense to call
genotypes in a tumor.  I'm just trying to save you from wasting your
time going down the wrong path.
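
To make the mixture argument concrete, here is a minimal R sketch that takes the hypothetical clone fractions from the paragraph above and computes the bulk total copy number and B-allele fraction one would expect at such a SNP. It assumes, as a simplification that is not from the thread, that the measured signal is proportional to the number of allele copies in each clone. The data frame and its column names are made up for illustration.

```r
# Hypothetical clone mixture at one SNP: allele copies per clone and the
# fraction of cells belonging to each clone.
clones <- data.frame(
  state    = c("A", "ABB", "AAABB", "AB"),
  nA       = c(1, 1, 3, 1),
  nB       = c(0, 2, 2, 1),
  fraction = c(0.100, 0.385, 0.401, NA)
)
# "The rest" of the cells carry the normal AB genotype.
clones$fraction[4] <- 1 - sum(clones$fraction[1:3])

# Expected bulk total copy number and B-allele fraction, assuming signal
# is proportional to the number of allele copies (a simplification).
totalCN <- sum(clones$fraction * (clones$nA + clones$nB))
fracB   <- sum(clones$fraction * clones$nB) / totalCN

cat(sprintf("total CN = %.3f, fracB = %.3f\n", totalCN, fracB))
```

Under these assumptions the total copy number is not an integer, and a single AA/AB/BB label would hide the fact that four different allelic states contribute to the signal.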

Could you provide a reference to MAD - never heard of it.


 So, you said CRLMM is not implemented for GenomeWideSNP 6.0. Could I
 contribute by implementing it?

Certainly, that would be great and most appreciated.  Just a heads up,
it's more than a standard programming task.  It requires diving into
the oligo::crlmm() code and its algorithm to find out which modules
can be reused and which need to be ported.  The two files CrlmmModel.R
and CrlmmModel.EXT.R in aroma.affymetrix/R/ would serve as a good
start/template:

https://r-forge.r-project.org/scm/viewvc.php/pkg/aroma.affymetrix/R/CrlmmModel.R?view=markup&root=aroma-dots
https://r-forge.r-project.org/scm/viewvc.php/pkg/aroma.affymetrix/R/CrlmmModel.EXT.R?view=markup&root=aroma-dots

If you look inside oligo::crlmm() you see that it itself takes two
separate paths depending on whether the chip type is (a)
Mapping50K_(Hind|Xba)240 and Mapping250K_(Nsp|Sty) [which is ported to
aroma.affymetrix], or (b) GenomeWideSNP_(5|6) [which is not ported].
In other words, it's the internal oligo:::genotypeOne() that needs to
be ported.
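
As a rough illustration of the two branches just described, the following R sketch classifies a chip type name by the same patterns. The helper crlmmBranch() is hypothetical; it is not a function in oligo or aroma.affymetrix and does not reflect how oligo::crlmm() is actually written. It also assumes a bare chip type name without tags such as ",Full".

```r
# Hypothetical helper (not part of oligo or aroma.affymetrix): report
# which of the two branches described above a chip type falls into.
crlmmBranch <- function(chipType) {
  if (grepl("^Mapping50K_(Hind|Xba)240$", chipType) ||
      grepl("^Mapping250K_(Nsp|Sty)$", chipType)) {
    "a: ported to aroma.affymetrix"
  } else if (grepl("^GenomeWideSNP_(5|6)$", chipType)) {
    "b: not ported"
  } else {
    "unknown chip type"
  }
}

sapply(c("Mapping250K_Nsp", "GenomeWideSNP_6"), crlmmBranch)
```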


 Anyway, thank you for sharing the aroma.affymetrix suite with us.

You're welcome - hopefully it makes everyday science a bit easier.

/Henrik




Re: [aroma.affymetrix] genotyping crlmm genomewidesnp 6.0

2013-03-22 Thread Henrik Bengtsson
Hi Carles,

the quick answer is that aroma.affymetrix only implements the CRLMM
method for the 100K (Mapping50K_Xba240 and Mapping50K_Hind240) and
500K (Mapping250K_Nsp and Mapping250K_Sty) chip types.   For newer
chip types you need to turn to the Bioconductor 'oligo' package.

However, what are you going to use the genotypes for?  I'm asking
because it is rather common, and in my opinion incorrect, to try to
call genotypes in tumor samples.  Genotypes are really only defined in
normal/germline genomes and most (all?) genotype methods assume that
the samples are such.  Calling genotypes in tumors is rather a
problem of inferring parent-specific CNs (PSCNs) - not at the
SNP-by-SNP level but in segments along the genome.  Contrary to normal
PSCNs (genotypes), tumor PSCNs may not take discrete levels due to
clonality and normal contamination.  In other words, if you do indeed
have tumors, it does not make sense to use CRLMM on them.  Instead you
want to do PSCN segmentation/calling.
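
For readers unfamiliar with PSCNs, here is a small R sketch of one common way to express them for a segment: given the segment's total copy number C and the mean B-allele fraction beta of its heterozygous SNPs, the minor and major copy numbers follow from the decrease in heterozygosity. This formulation is an assumption brought in for illustration, not something stated in this thread, and pscn() is a hypothetical function name.

```r
# Sketch: parent-specific copy numbers (C1 <= C2) for one segment, from
# its total copy number C and the mean B-allele fraction 'beta' of its
# heterozygous SNPs. Function name and inputs are hypothetical.
pscn <- function(C, beta) {
  rho <- 2 * abs(beta - 1/2)   # decrease in heterozygosity
  C1 <- (1 - rho) / 2 * C      # minor copy number
  C2 <- C - C1                 # major copy number
  c(C1 = C1, C2 = C2)
}

pscn(C = 2, beta = 0.5)      # normal diploid segment
pscn(C = 3, beta = 2/3)      # gain of one parental copy
pscn(C = 2.6, beta = 0.62)   # impure tumor: non-integer levels
```

The last call illustrates the point about clonality and normal contamination: the resulting levels need not be integers, so they cannot be mapped onto AA/AB/BB.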

Hope this helps

Henrik


On Fri, Mar 22, 2013 at 7:47 AM, Carles Hernández kuragari...@gmail.com wrote:
 Good afternoon,

 I am trying to analyse a set of CEL files from Affymetrix GenomeWideSNP 6.0
 and get their LRR, FreqB and genotype (for all individuals and for all
 chromosomes).

 I started with the vignettes "CRMA (v1): Total copy number analysis
 using CRMA v1 (10K, 100K, 500K)" and "CRMA (v2): Estimation of total copy
 numbers using the CRMA v2 method (10K-CytoScanHD)", since I am new to the
 world of microarray analysis.

 But since I didn't find any way to retrieve the genotypes there, I moved
 on to "CRLMM genotyping (100K and 500K)".

 So, from both methods I can get the LRR and FreqB with extractCNT or with
 extractTotalAndFreqB, but only with the second one (CRLMM) can I use
 extractGenotypes (because the chip type's CRLMM model is required). On the
 other hand, when I try to create the CRLMM model for GenomeWideSNP 6.0 the
 following error occurs:


 Exception: Cannot fit CRLMM model: Model fitting for this chip type is not
 supported/implemented: GenomeWideSNP_6
   at #02. CrlmmModel(ces, tags = "*,oligo")
   - CrlmmModel() is in environment 'aroma.affymetrix'
   at #01. process_dataset("GenomeWideSNP_6", "gal", verbose = TRUE)
   - process_dataset() is in environment 'R_GlobalEnv'
 Error: Cannot fit CRLMM model: Model fitting for this chip type is not
 supported/implemented: GenomeWideSNP_6


 So... Am I doing something wrong? If not, is there some way to get the full
 set of data I need (sample's name, SNP position, chromosome, LRR, FreqB
 and genotype) using a single method?

 My full code snippet:

 library( 'aroma.affymetrix' )


 write_table <- function( dataset, file_name ) {
   [...]
 }

 process_dataset <- function( dataset_name, chip_type ) {
   # Set up the annotation and the raw CEL set
   cdf <- AffymetrixCdfFile$byChipType( chip_type );
   csR <- AffymetrixCelSet$byName( dataset_name, cdf=cdf );
   # Summarize, then fit the CRLMM model (this is the step that fails)
   ces <- justSNPRMA( csR, normalizeToHapmap=TRUE, returnESet=FALSE );
   crlmm <- CrlmmModel( ces, tags="*,oligo" );
   units <- fit( crlmm, ram="oligo" );
   callSet <- getCallSet( crlmm );

   gi <- getGenomeInformation( cdf );

   # One output table per array, built up chromosome by chromosome
   for( array in 1:length( csR ) ) {
     ds <- NULL;
     ce <- getFile( ces, array );
     for( chr in chr_list ) {
       chrunits <- getUnitsOnChromosome( gi, chromosome=chr );
       chrnames <- getUnitNames( cdf, units=chrunits )
       pos <- getPositions( gi, units=chrunits ); # / 1e6;
       cf <- getFile( callSet, array );
       calls <- extractGenotypes( cf, units=chrunits );
       dta <- extractTotalAndFreqB( ce, units=chrunits );
       theta <- dta[,"total"];

       # Reference signal: average across all arrays
       ceR <- getAverageFile( ces );
       dataR <- extractTotalAndFreqB( ceR, units=chrunits );
       thetaR <- dataR[,"total"];

       l2r <- log2(theta/thetaR);
       ds <- add_to_ds( chrnames, rep( chr, length( chrnames ) ),
                        pos, l2r, dta[,"FreqB"], calls );
     }
     colnames( ds ) <- c( "Name", "Chr", "Position", "Log.R.Ratio",
                          "B.Allele.Freq", "GType" );
     write_table( ds, paste0( getName( ce ), ".txt" ) )
   }
 }

 process_dataset( "GenomeWideSNP_6", "gal" )
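
The arithmetic inside the loop above can be tried without any CEL files. The sketch below uses made-up allele-specific signals for four SNPs. The log-ratio line mirrors log2(theta/thetaR) from the posted code, while the definition of FreqB as thetaB / (thetaA + thetaB) is an assumption about what extractTotalAndFreqB() reports, and all variable names and numbers are illustrative only.

```r
# Toy allele-specific signals for four SNPs (made-up numbers).
thetaA <- c(1200,  600,   50, 900)
thetaB <- c(  40,  650, 1300, 880)
# Made-up reference total signal per SNP, e.g. an average across arrays.
thetaR <- c(1250, 1200, 1300, 1200)

theta <- thetaA + thetaB        # total signal per SNP
freqB <- thetaB / theta         # B-allele frequency (assumed definition)
l2r   <- log2(theta / thetaR)   # log R ratio relative to the reference

data.frame(theta, freqB = round(freqB, 3), l2r = round(l2r, 3))
```

Neither quantity requires a genotype call, which is why LRR and FreqB remain available for GenomeWideSNP_6 even though the CRLMM model cannot be fitted.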
