Re: [aroma.affymetrix] SNP_6 raw intensity Extraction
Hi Henrik,

That is exactly what I was looking for. I was missing the "Order by cell indices" step. Thank you.

Jonathan

On Monday, April 29, 2013 11:17:57 PM UTC+2, Henrik Bengtsson wrote:
> [...]
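For anyone following the "order by cell indices" step from this thread, here is a minimal base-R sketch that needs no CDF or CEL files. The `unit`, `cell`, `x` and `y` values are copied from the str() output quoted in the thread; the signal values, the array names `arrayA`/`arrayB`, and the figure of 2680 cells per row are assumptions for illustration only (the 2680 figure is merely consistent with the quoted x, y and cell values). It is not the aroma.affymetrix API, just a way to see how annotation and signals can be lined up through the cell index.

```r
# Toy CDF annotation; values copied from the str() output quoted in the thread,
# deliberately listed out of cell order.
cdfData <- data.frame(
  unit = c(1013L, 1012L, 1013L, 1012L),
  cell = c(902652L, 539387L, 902651L, 539388L),
  x    = c(2171L, 706L, 2170L, 707L),
  y    = c(336L, 201L, 336L, 201L)
)

# Assumption: 2680 cells per row and one-based cell indices.
nbrOfColumns <- 2680L
stopifnot(all(cdfData$cell == cdfData$y * nbrOfColumns + cdfData$x + 1L))

# Order the annotation by cell index and reset the row names.
cdfData <- cdfData[order(cdfData$cell), ]
rownames(cdfData) <- NULL

# Made-up signals for two hypothetical arrays, keyed by cell index.
Y <- matrix(c(100, 200, 300, 400, 110, 210, 310, 410), ncol = 2,
            dimnames = list(c("539387", "539388", "902651", "902652"),
                            c("arrayA", "arrayB")))

# Look up rows by cell index so annotation and signals cannot drift apart.
rows <- match(cdfData$cell, as.integer(rownames(Y)))
stopifnot(!anyNA(rows))
merged <- cbind(cdfData, Y[rows, , drop = FALSE])
print(merged)
```

Matching on the cell index, rather than relying on both objects happening to be in the same order, is a defensive choice; with real data the signals would come from extractMatrix() as in Henrik's reply.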
Re: [aroma.affymetrix] SNP_6 raw intensity Extraction
Greetings,

I am a PhD student. I am trying to use the aroma package, but it is not working for me and I don't know where the error is. I also have a question: the chipType folder does not exist by default in annotationData, so I created a folder in annotationData, named it chiptype, and put the CDF file in it. Is that possible? Please, I need your advice. This is my session:

getwd()
[1] "C:/Users/nwayyin/Documents/R/win-library/3.0/aroma.affymetrix/annotationData/ChipType"
ChipType(HG-U133_Plus_2)
Error: unexpected symbol in "ChipType(HG"
ChipType("HG-U133_Plus_2")
Error: could not find function "ChipType"
chipType("HG-U133_Plus_2")
Error: could not find function "chipType"
library(aroma.affymetrix)
ChipeType("HG-U133_Plus_2")
Error: could not find function "ChipeType"
chipType <- "HG-U133_Plus_2"
cdf <- AffymetrixcdfFile$byChipType("HG-U133_Plus_2")
Error: object 'AffymetrixcdfFile' not found
cdf <- AffymetrixcdfFile$bychipType("HG-U133_Plus_2")
Error: object 'AffymetrixcdfFile' not found
cdf <- AffymetrixcdfFile$byChipType("HG-U133_Plus_2")
Error: object 'AffymetrixcdfFile' not found
cdf <- HG-U133_Plus_2$bychipType("HG-U133_Plus_2")
Error: object 'HG' not found
cdf <- AffymetrixcdfFile$bychipType("HG-U133_Plus_2", tags="ChipType")
Error: object 'AffymetrixcdfFile' not found

The error is with the CDF file, which cannot be read. I downloaded all the packages:

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
hbInstall("aroma.affymetrix")

Thank you

On Tuesday, April 30, 2013 8:00:45 AM UTC+1, jonathan@mail.dcu.ie wrote:
> Hi Henrik,
>
> That is exactly what I was looking for. I was missing the Order by Cell Indices. Thank you.
>
> Jonathan
>
> On Monday, April 29, 2013 11:17:57 PM UTC+2, Henrik Bengtsson wrote:
>> [...]
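Regarding the HG-U133_Plus_2 session in the message above, two things may be worth checking; the following is only a hedged base-R sketch, not a confirmed diagnosis. First, R is case-sensitive (the class used elsewhere in this thread is spelled AffymetrixCdfFile), and an unquoted HG-U133_Plus_2 is parsed as a subtraction, which is where "object 'HG' not found" comes from. Second, as far as I understand the aroma setup, annotation files are looked up relative to the current working directory under annotationData/chipTypes/<chipType>/, not inside the installed package folder; treat that layout as an assumption to verify against the aroma documentation. The folder name `aromaDemo` is hypothetical.

```r
# The chip type must be a quoted string; unquoted, R reads it as HG - U133_Plus_2.
chipType <- "HG-U133_Plus_2"
msg <- tryCatch(eval(parse(text = "HG-U133_Plus_2")),
                error = function(e) conditionMessage(e))
print(msg)

# Assumed layout: <working dir>/annotationData/chipTypes/<chipType>/<chipType>.cdf
root <- file.path(tempdir(), "aromaDemo")   # hypothetical project folder
cdfDir <- file.path(root, "annotationData", "chipTypes", chipType)
dir.create(cdfDir, recursive = TRUE, showWarnings = FALSE)

# Path where the CDF file would be copied to under this assumed layout.
cdfPath <- file.path(cdfDir, paste0(chipType, ".cdf"))
print(cdfPath)
print(dir.exists(cdfDir))
```

With such a folder in place and R started in the project folder, the call from earlier in the thread would read AffymetrixCdfFile$byChipType("HG-U133_Plus_2"), with a capital C in Cdf.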
Re: [aroma.affymetrix] SNP_6 raw intensity Extraction
Hi.

On Mon, Apr 29, 2013 at 4:16 AM, jonathan.crowth...@mail.dcu.ie wrote:
> Hi Henrik,
>
> Thank you for the reply. This is quite helpful. Using the following code I am able to produce a matrix of the intensities for each of the probes (replicates included):
>
> library(aroma.affymetrix)
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
> raw_sample <- AffymetrixCelSet$byName("Sample", cdf=cdf)
> ciS <- getCellIndices(cdf, unlist=TRUE, useNames=TRUE)
> s <- sample(ciS, 250)
> head(ciS)
> d <- extractMatrix(raw_sample, cells=ciS, verbose=-50)
> write.table(file="all_probes.txt", d, quote=FALSE, sep="\t", row.names=FALSE)
>
> My issue now is that with this matrix I can see the 6+ million probes, but the probe IDs are not present. Maybe I am missing something, but if you could help me associate each probe intensity value with its probe ID I would be very grateful.

What do you mean by 'probe ID'; what do you expect it to be? Note that in Affymetrix terms, there are 'unit' and 'group' (probeset) IDs/names, but probes don't really have IDs other than an (x,y) location or an index (as you use it above). However, you could pull out probe-specific CDF annotation from the CDF file as follows:

library(aroma.affymetrix);
cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");

# Some example units
units <- c(1012:1013, 950123:950125);

# Read CDF info as data.frame
# (will take a very long time if done for many units)
cdfData <- readDataFrame(cdf, units=units);

# Order by cell indices
o <- order(cdfData$cell);
cdfData <- cdfData[o,];

'data.frame': 15 obs. of 16 variables:
 $ unit           : int 1012 1012 1013 1013 1012 1012 101..
 $ unitType       : chr "genotyping" "genotyping" "genotyping" ...
 $ unitDirection  : chr "sense" "sense" "sense" "sense" ...
 $ unitNbrOfAtoms : int 6 6 6 6 6 6 6 6 1 1 ...
 $ cell           : int 539387 539388 902651 902652 18384...
 $ x              : int 706 707 2170 2171 2630 2631 998 9..
 $ y              : int 201 201 336 336 685 685 1112 1112 1195 ...
 $ groupNbrOfAtoms: int 3 3 3 3 3 3 3 3 1 1 ...
 $ unitName       : chr "SNP_A-2001598" "SNP_A-2001598" ...
 $ indexPos       : int 2 1 6 5 4 3 4 3 1 1 ...

# Note that for some chip types some probes (cells) occur in multiple
# probe sets, meaning you may have duplicates. I don't think this is
# the case for SNP chips though. Sanity check...
stopifnot(!anyDuplicated(cdfData$cell));

# Extract the corresponding probe signals from 'csR' (AffymetrixCelSet)
Y <- extractMatrix(csR, cells=cdfData$cell);

# Merge CDF annotation data with signals
data <- cbind(cdfData, Y);

# Save
# [see help(writeDataFrame.data.frame)]
pathname <- writeDataFrame(data, file="all_probes.txt", header=list(chipType=getChipType(cdf)));

Again, not sure what you're going to use this for/where to import it; you may end up reinventing the wheel.

Hope this helps

/Henrik

> Thanks,
> Jonathan
>
> On Tuesday, April 23, 2013 2:40:56 AM UTC+2, Henrik Bengtsson wrote:
>> Hi Jonathan.
>>
>> On Thu, Apr 11, 2013 at 6:38 AM, jonathan@mail.dcu.ie wrote:
>>> Hi all,
>>>
>>> I suppose this is a simple enough task even for a newbie like me. I have found a similar related post, but I have two questions.
>>>
>>> My first question: when I use the following commands in R:
>>>
>>> library(aroma.affymetrix)
>>> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
>>> cs <- AffymetrixCelSet$byName("Arles", cdf=cdf)
>>> unit <- indexOf(cdf, "SNP_A-8656720")
>>> y <- readUnits(cs, units=unit)
>>> str(y)
>>>
>>> This allows me to gather the raw intensities for a SNP or CN probe, as follows:
>>>
>>> $`SNP_A-8656720`$A$intensities
>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
>>> [1,]  786  807 1051  879 1971 1447 1826  236 1249  2335  1140   416  1147  2054
>>> [2,]  694  823 1027  835 1673 1167 1729  252 1068  2339   982   411   769  1786
>>> [3,]  752  665  913  820 1621 1356 1555  248 1344  2362  1417   339   991  1835
>>>
>>> $`SNP_A-8656720`$G
>>> $`SNP_A-8656720`$G$intensities
>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
>>> [1,]  273 1014  481 1012  402  383  421 1138  321   861   614  1859   687   549
>>> [2,]  222  528  476  825  602  719  460  912  417   796   650  1617   537   661
>>> [3,]  259  781  543  754  492  452  550  909  316   743   518  1847   529   651
>>>
>>> From the previous post, this data is supposed to refer to the A and B alleles and to the forward and reverse strands. My question is: which part refers to allele A/B ($A/$G) and which to forward/reverse? Also, by that logic, shouldn't there be 4 sets of data?
>>
>> Starting with SNP chip GenomeWideSNP_5 (sic!), Affymetrix no longer put