Re: [aroma.affymetrix] SNP_6 raw intensity Extraction

2013-04-30 Thread jonathan . crowther2
Hi Henrik,

That is exactly what I was looking for. I was missing the step of ordering
by cell indices.
Thank you.

Jonathan


Re: [aroma.affymetrix] SNP_6 raw intensity Extraction

2013-04-30 Thread nawin MOHAMMED


Greetings,

I am a PhD student. I am trying to use the aroma package, but it is not
working for me and I don't know where the error is. I also have a question:
the chipType folder does not exist by default in annotationData, so I
created a folder in annotationData, named it ChipType, and put the CDF file
in it. Is that possible? Please, I need your advice. This is my
program:



> getwd()
[1] "C:/Users/nwayyin/Documents/R/win-library/3.0/aroma.affymetrix/annotationData/ChipType"
> ChipType(HG-U133_Plus_2)
Error: unexpected symbol in "ChipType(HG"
> ChipType("HG-U133_Plus_2")
Error: could not find function "ChipType"
> chipType("HG-U133_Plus_2")
Error: could not find function "chipType"
> library(aroma.affymetrix)
> ChipeType("HG-U133_Plus_2")
Error: could not find function "ChipeType"
> chipType <- "HG-U133_Plus_2"
> cdf <- AffymetrixcdfFile$byChipType("HG-U133_Plus_2")
Error: object 'AffymetrixcdfFile' not found
> cdf <- AffymetrixcdfFile$bychipType("HG-U133_Plus_2")
Error: object 'AffymetrixcdfFile' not found
> cdf <- AffymetrixcdfFile$byChipType("HG-U133_Plus_2")
Error: object 'AffymetrixcdfFile' not found
> cdf <- HG-U133_Plus_2$bychipType("HG-U133_Plus_2")
Error: object 'HG' not found
> cdf <- AffymetrixcdfFile$bychipType("HG-U133_Plus_2", tags="ChipType")
Error: object 'AffymetrixcdfFile' not found
>
 

The error is with the CDF file, which cannot be read.
I downloaded all the packages:

> source("http://bioconductor.org/biocLite.R")

> biocLite("biomaRt")

> hbInstall("aroma.affymetrix")

thank you 
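
For reference, here is a minimal sketch of the directory layout that AffymetrixCdfFile$byChipType() is documented to search: annotationData/chipTypes/<chipType>/ under the current working directory, rather than inside the installed package folder. The project root "myProject" is a hypothetical name used only for illustration, and the aroma.affymetrix calls are left as comments because they need the actual CDF file. Note also that R names are case-sensitive (AffymetrixCdfFile, byChipType) and that the chip type must be a quoted string.

```r
# Hypothetical project root (assumption); aroma.affymetrix resolves
# annotationData/ relative to the current working directory.
root <- file.path(tempdir(), "myProject")
chipType <- "HG-U133_Plus_2"

# Expected location of the CDF: annotationData/chipTypes/<chipType>/
cdfDir <- file.path(root, "annotationData", "chipTypes", chipType)
dir.create(cdfDir, recursive = TRUE, showWarnings = FALSE)

# The CDF file itself is expected to be named <chipType>.cdf
cdfPath <- file.path(cdfDir, paste0(chipType, ".cdf"))
print(cdfPath)

# Once the CDF has been copied to 'cdfPath' (not done in this sketch):
# setwd(root)
# library(aroma.affymetrix)
# cdf <- AffymetrixCdfFile$byChipType("HG-U133_Plus_2")
```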



Re: [aroma.affymetrix] SNP_6 raw intensity Extraction

2013-04-29 Thread Henrik Bengtsson
Hi.

On Mon, Apr 29, 2013 at 4:16 AM,  jonathan.crowth...@mail.dcu.ie wrote:
 Hi Henrik,

 Thank you for the reply.
 This is quite helpful. Using the following code I am able to
 produce a matrix of the intensities for each of the probes (replicates
 included):

 library(aroma.affymetrix)
 cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
 raw_sample <- AffymetrixCelSet$byName("Sample", cdf=cdf)
 # Cell (probe) indices of all units, named by unit/group
 ciS <- getCellIndices(cdf, unlist=TRUE, useNames=TRUE)
 s <- sample(ciS, 250)
 head(ciS)
 # Extract the raw intensities for those cells
 d <- extractMatrix(raw_sample, cells=ciS, verbose=-50)
 write.table(d, file="all_probes.txt", quote=FALSE, sep="\t",
             row.names=FALSE)

 My issue now is that with this matrix I can see the 6+ million probes, but
 the probe IDs are not present. Maybe I am missing something, but if you
 could help me associate each probe intensity value with its probe ID I would
 be very grateful.

What do you mean by 'probe ID'; what do you expect it to be?  Note that
in Affymetrix terms, there are 'unit' and 'group' (probeset)
IDs/names, but probes don't really have IDs other than an (x,y)
location or an index (as you use it above).
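
As an illustration of how the (x,y) location and the cell index relate, here is a minimal base-R sketch. It assumes zero-based x and y, a one-based cell index, and 2680 columns for GenomeWideSNP_6; that column count is an assumption, chosen because it reproduces the x, y and cell values in the str() listing further down in this message. The helper names xyToCell() and cellToXy() are made up for this example.

```r
# Assumed number of columns on the array (assumption, see above)
nbrOfColumns <- 2680L

# (x, y) are zero-based; the cell index is one-based
xyToCell <- function(x, y, ncol) y * ncol + x + 1L

cellToXy <- function(cell, ncol) {
  idx <- cell - 1L
  data.frame(x = idx %% ncol, y = idx %/% ncol)
}

# First four probes from the listing in this message
x <- c(706L, 707L, 2170L, 2171L)
y <- c(201L, 201L, 336L, 336L)

cells <- xyToCell(x, y, nbrOfColumns)
print(cells)
print(cellToXy(cells, nbrOfColumns))
```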

However, you could pull out probe-specific CDF annotation from the CDF
file as follows:

library(aroma.affymetrix);
cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
# Some example units
units <- c(1012:1013, 950123:950125);
# Read CDF info as data.frame
# (will take a very long time if done for many units)
cdfData <- readDataFrame(cdf, units=units);
# Order by cell indices
o <- order(cdfData$cell);
cdfData <- cdfData[o,];
str(cdfData);
'data.frame':   15 obs. of  16 variables:
 $ unit   : int  1012 1012 1013 1013 1012 1012 101..
 $ unitType   : chr  genotyping genotyping genot..
 $ unitType   : chr  genotyping genotyping genotyping ...
 $ unitDirection  : chr  sense sense sense sense ...
 $ unitNbrOfAtoms : int  6 6 6 6 6 6 6 6 1 1 ...
 $ cell   : int  539387 539388 902651 902652 18384..
 $ x  : int  706 707 2170 2171 2630 2631 998 9..
 $ y  : int  201 201 336 336 685 685 1112 1112..
 $ groupNbrOfAtoms: int  3 3 3 3 3 3 3 3 1 1 ...
 $ cell   : int  539387 539388 902651 902652 ...
 $ unit   : int  1012 1012 1013 1013 1012 1012 101..
 $ y  : int  201 201 336 336 685 685 1112 1112 1195 ...
 $ unitName   : chr  SNP_A-2001598 SNP_A-2001598 ...
 $ unitType   : chr  genotyping genotyping genot..
 $ indexPos   : int  2 1 6 5 4 3 4 3 1 1 ...
 $ cell   : int  539387 539388 902651 902652 18384...

# Note that for some chip types some probes (cells) occur in multiple
# probe sets meaning you may have duplicates.  I don't think this is
# the case for SNP chips though.  Sanity check...
stopifnot(!anyDuplicated(cdfData$cell));

# Extract the corresponding probe signals from 'csR' (your AffymetrixCelSet
# of raw data, e.g. 'raw_sample' in your code above)
Y <- extractMatrix(csR, cells=cdfData$cell);

# Merge CDF annotation data with signals
data <- cbind(cdfData, Y);

# Save
# [see help(writeDataFrame.data.frame)]
pathname <- writeDataFrame(data, file="all_probes.txt",
                           header=list(chipType=getChipType(cdf)));
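
The order / sanity-check / merge pattern above can be tried without any CEL or CDF files. The following is a toy base-R sketch only: the annotation values and the 50-cell "array" are made up, a plain matrix indexed by cell stands in for extractMatrix(), and write.table() stands in for writeDataFrame().

```r
# Toy annotation standing in for readDataFrame() output (values made up)
cdfData <- data.frame(
  unitName = c("unitB", "unitA", "unitA", "unitB"),
  cell = c(40L, 12L, 11L, 39L),
  stringsAsFactors = FALSE
)

# Toy signals: one row per cell on a 50-cell "array", two samples
set.seed(1)
signals <- matrix(round(runif(100, 200, 2000)), nrow = 50,
                  dimnames = list(NULL, c("sample1", "sample2")))

# Order the annotation by cell index and check for duplicated cells
cdfData <- cdfData[order(cdfData$cell), ]
stopifnot(!anyDuplicated(cdfData$cell))

# Pull the signals for exactly those cells, in the same order
Y <- signals[cdfData$cell, , drop = FALSE]

# Merge annotation and signals row by row, then save as tab-delimited text
merged <- cbind(cdfData, Y)
outFile <- file.path(tempdir(), "toy_probes.txt")
write.table(merged, file = outFile, quote = FALSE, sep = "\t",
            row.names = FALSE)
print(merged)
```

Because the signals are extracted with the already ordered cell vector, each row of the merged table pairs a probe's annotation with its own intensities.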


Again, not sure what you're going to use this for/where to import it;
you may end up reinventing the wheel.

Hope this helps

/Henrik


 Thanks,
 Jonathan

 On Tuesday, April 23, 2013 2:40:56 AM UTC+2, Henrik Bengtsson wrote:

 Hi Jonathan.

 On Thu, Apr 11, 2013 at 6:38 AM,  jonathan@mail.dcu.ie wrote:
  Hi all,
 
  I suppose this is a simple enough task even for a newbie like me. I have
  found a similar related post, but I have two questions.
 
  My first question: when I use the following commands in R:
 
  library(aroma.affymetrix)
 
  cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
  cs <- AffymetrixCelSet$byName("Arles", cdf=cdf)
 
  unit <- indexOf(cdf, "SNP_A-8656720")
  y <- readUnits(cs, units=unit)
  str(y)
 
  this allows me to gather the raw intensities for a SNP or CN probe, as
  follows:
  $`SNP_A-8656720`$A$intensities
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
  [1,]  786  807 1051  879 1971 1447 1826  236 1249  2335  1140   416  1147  2054
  [2,]  694  823 1027  835 1673 1167 1729  252 1068  2339   982   411   769  1786
  [3,]  752  665  913  820 1621 1356 1555  248 1344  2362  1417   339   991  1835
 
 
  $`SNP_A-8656720`$G
  $`SNP_A-8656720`$G$intensities
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
  [1,]  273 1014  481 1012  402  383  421 1138  321   861   614  1859   687   549
  [2,]  222  528  476  825  602  719  460  912  417   796   650  1617   537   661
  [3,]  259  781  543  754  492  452  550  909  316   743   518  1847   529   651
 
  According to the previous post, these data are supposed to refer to the A
  and B alleles and to the forward and reverse strands. My question is:
  which groups correspond to allele A/B ($A/$G) and which to forward/reverse?
  By that logic, shouldn't there be 4 sets of data?

 Starting with SNP chip GenomeWideSNP_5 (sic!), Affymetrix no longer
 put