[aroma.affymetrix] How to cite CRMA v2
Hi there,

thanks all for citing our publications directly or indirectly related to the aroma project framework. Since I noticed that the original CRMA paper often gets cited even when the CRMA v2 method is used/meant, I would like to clarify to the list that CRMA v2 is preferably referenced as:

H. Bengtsson, P. Wirapati & T.P. Speed, A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6, Bioinformatics, 2009. [PMID: 19535535]

You can find all our references at http://www.aroma-project.org/publications/ each with a small "Cite this for:" note indicating for which method it should be cited. If you are uncertain, just drop us an email and we'll clarify.

Cheers,
Henrik

--
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.

You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/
Re: [aroma.affymetrix] Re: CbsModel parameters
Are you sure you are not picking up old results? That is, did you use fit(cbs, ..., force=TRUE), or did you remove the previous segmentation results in cbsData/?

You can troubleshoot with one array and one chromosome, e.g.

  fit(cbs, arrays=6, chromosomes=16, min.width=5, undo.splits="sdundo", undo.SD=1, force=TRUE, verbose=-10);

/Henrik

On Wed, Oct 27, 2010 at 11:20 AM, Kai <wangz...@gmail.com> wrote:
> Hi Henrik,
>
> Thank you for your reply. I followed your instructions but still got segments with only 2 markers. This is the code I ran:
>
>   cbs = CbsModel(ds);
>   cbs$.calculateRatios = FALSE;
>   fit(cbs, chromosomes=c(1:23), min.width=5, undo.splits="sdundo", undo.SD=1, verbose=-10);
>   ce = ChromosomeExplorer(cbs);
>   process(ce, chromosomes=c(1:23));
>
> This is what I found in the results (there are 4 samples in total):
>
>   min(getRegions(cbs)[[1]][,5])
>   [1] 5
>   min(getRegions(cbs)[[2]][,5])
>   [1] 2
>   min(getRegions(cbs)[[3]][,5])
>   [1] 2
>   min(getRegions(cbs)[[4]][,5])
>   [1] 2
>   which(getRegions(cbs)[[4]][,5]==2)
>   [1]  52 139
>   getRegions(cbs)[[4]][139,1:5]
>       chromosome    start     stop   mean count
>   139         16 45057510 45057696 -1.427     2
>
> It seems to me that min.width=5 worked only in the first sample. Do you have any idea why?
>
> Thanks!
>
> Best,
> Kai
>
> On Oct 26, 9:09 pm, Henrik Bengtsson <henrik.bengts...@aroma-project.org> wrote:
>> I forgot to say that in the next release of the aroma.core package, you will be able to specify additional arguments when you set up the CBS model:
>>
>>   cbs <- CbsModel(ds, min.width=5);
>>
>> ...but until then you have to stick with the workaround below.
>>
>> /Henrik
>>
>> On Tue, Oct 26, 2010 at 9:07 PM, hb <h...@biostat.ucsf.edu> wrote:
>>> Hi, sorry, my mistake. I meant to write that you should pass the additional arguments to fit() for the CbsModel (not process()), e.g.
>>>
>>>   cbs <- CbsModel(ds);
>>>   cbs$.calculateRatios <- FALSE;
>>>   fit(cbs, chromosomes=1:23, min.width=5, verbose=-10);
>>>
>>> This will (explicitly) fit the segmentation model. Have a look at the verbose output; you'll see that min.width should show up in the output just before the DNAcopy segment() is called.
>>>
>>> After you've done the segmentation for all of your arrays and chromosomes, you can have the ChromosomeExplorer generate the report for you as usual, i.e.
>>>
>>>   ce <- ChromosomeExplorer(cbs);
>>>   process(ce, chromosomes=1:23);
>>>
>>> Note that in your case you have to either delete the already generated CBS results, or use fit(..., force=TRUE), in order for aroma.* not to pick up the old segmentation. You also need to delete the already generated PNG files for the ChromosomeExplorer under reports/...
>>>
>>> On Tue, Oct 26, 2010 at 4:43 PM, Kai <wangz...@gmail.com> wrote:
>>>> Hi Henrik,
>>>>
>>>> Thank you very much for your response. However, I tried the following code to set the minimal number of markers to 5, but the results I got still contain segments with only 2 markers ...
>>>>
>>>>   cbs = CbsModel(ds);
>>>>   cbs$.calculateRatios = FALSE;
>>>>   ce = ChromosomeExplorer(cbs);
>>>>   process(ce, chromosomes=c(1:23), min.width=5);
>>>>
>>>> It is not clear to me where I should put min.width=5. If I do process(cbs, min.width=5) first, how can I send the results to be displayed by ChromosomeExplorer?
>>>>
>>>> Thanks again for your help. I look forward to hearing from you soon.
>>>>
>>>> Best,
>>>> Kai
>>>>
>>>> On Sep 27, 9:47 pm, Henrik Bengtsson <henrik.bengts...@gmail.com> wrote:
>>>>> Hi.
>>>>>
>>>>> On Mon, Sep 27, 2010 at 4:51 PM, Kai <wangz...@gmail.com> wrote:
>>>>>> Hi Henrik,
>>>>>>
>>>>>> I was wondering whether there is a way I can fine-tune the behavior of CbsModel. Sometimes the default algorithm produces too many small fragments right next to each other without much separation in mean copy numbers. Is there a way to control how smooth the segmentation results are?
>>>>>
>>>>> Any additional arguments (in ...) that you pass to process(cbs, ...) will be passed down to DNAcopy::segment(), which is the function doing the actual segmentation. For more details on how to fine-tune the CBS algorithm, see help(segment, package=DNAcopy). You may also want to contact the authors of that method/package.
>>>>>
>>>>> /Henrik
>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>> Best,
>>>>>> Kai
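
The check Kai runs in this thread (looking at the fifth column of each getRegions() table) can be wrapped in a small helper. This is a minimal sketch in base R, assuming each regions table is a data frame with the columns printed above (chromosome, start, stop, mean, count). The helper name findSmallSegments() and the toy data are made up for illustration and are not part of aroma.affymetrix.

```r
# Hypothetical helper (not part of aroma.affymetrix): return the rows of a
# regions table whose marker count is below the requested minimum width.
findSmallSegments <- function(regions, minWidth = 5) {
  stopifnot(is.data.frame(regions), "count" %in% names(regions))
  regions[regions$count < minWidth, , drop = FALSE]
}

# Toy table mimicking the columns printed in the thread. The first two rows
# are invented; the third is the 2-marker segment Kai reported.
regions <- data.frame(
  chromosome = c(16, 16, 16),
  start      = c(100000, 30000000, 45057510),
  stop       = c(29999000, 45057000, 45057696),
  mean       = c(0.02, -0.35, -1.427),
  count      = c(5200, 2100, 2)
)

small <- findSmallSegments(regions, minWidth = 5)
print(small)
```

With a fitted model, something like lapply(getRegions(cbs), findSmallSegments) should list the offending segments per sample, assuming getRegions() returns one such table per array as the output above suggests.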
Re: [aroma.affymetrix] Re: CbsModel parameters
Hi, sorry, my mistake. I meant to write that you should pass the additional arguments to fit() for the CbsModel (not process()), e.g.

  cbs <- CbsModel(ds);
  cbs$.calculateRatios <- FALSE;
  fit(cbs, chromosomes=1:23, min.width=5, verbose=-10);

This will (explicitly) fit the segmentation model. Have a look at the verbose output; you'll see that min.width should show up in the output just before the DNAcopy segment() is called.

After you've done the segmentation for all of your arrays and chromosomes, you can have the ChromosomeExplorer generate the report for you as usual, i.e.

  ce <- ChromosomeExplorer(cbs);
  process(ce, chromosomes=1:23);

Note that in your case you have to either delete the already generated CBS results, or use fit(..., force=TRUE), in order for aroma.* not to pick up the old segmentation. You also need to delete the already generated PNG files for the ChromosomeExplorer under reports/...

On Tue, Oct 26, 2010 at 4:43 PM, Kai <wangz...@gmail.com> wrote:
> Hi Henrik,
>
> Thank you very much for your response. However, I tried the following code to set the minimal number of markers to 5, but the results I got still contain segments with only 2 markers ...
>
>   cbs = CbsModel(ds);
>   cbs$.calculateRatios = FALSE;
>   ce = ChromosomeExplorer(cbs);
>   process(ce, chromosomes=c(1:23), min.width=5);
>
> It is not clear to me where I should put min.width=5. If I do process(cbs, min.width=5) first, how can I send the results to be displayed by ChromosomeExplorer?
>
> Thanks again for your help. I look forward to hearing from you soon.
>
> Best,
> Kai
>
> On Sep 27, 9:47 pm, Henrik Bengtsson <henrik.bengts...@gmail.com> wrote:
>> Hi.
>>
>> On Mon, Sep 27, 2010 at 4:51 PM, Kai <wangz...@gmail.com> wrote:
>>> Hi Henrik,
>>>
>>> I was wondering whether there is a way I can fine-tune the behavior of CbsModel. Sometimes the default algorithm produces too many small fragments right next to each other without much separation in mean copy numbers. Is there a way to control how smooth the segmentation results are?
>>
>> Any additional arguments (in ...) that you pass to process(cbs, ...) will be passed down to DNAcopy::segment(), which is the function doing the actual segmentation. For more details on how to fine-tune the CBS algorithm, see help(segment, package=DNAcopy). You may also want to contact the authors of that method/package.
>>
>> /Henrik
>>
>>> Thanks a lot!
>>>
>>> Best,
>>> Kai
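
Why old results keep showing up can be illustrated with a toy file cache. The sketch below is not aroma's actual implementation: cachedFit() and its file layout are hypothetical and only mimic the behaviour described in this thread, where a result already saved on disk is reused unless force=TRUE is given.

```r
# Toy illustration only: reuse a result stored on disk unless force = TRUE.
cachedFit <- function(key, fitFcn, cacheDir = tempdir(), force = FALSE) {
  pathname <- file.path(cacheDir, paste0(key, ".rds"))
  if (!force && file.exists(pathname)) {
    # An old result exists, so it is returned and fitFcn is never called.
    return(readRDS(pathname))
  }
  res <- fitFcn()
  saveRDS(res, pathname)
  res
}

cacheDir <- file.path(tempdir(), "cbsDataDemo")
dir.create(cacheDir, showWarnings = FALSE)
unlink(file.path(cacheDir, "array1,chr16.rds"))  # start from a clean slate

first  <- cachedFit("array1,chr16", function() list(min.width = 2), cacheDir)
stale  <- cachedFit("array1,chr16", function() list(min.width = 5), cacheDir)
forced <- cachedFit("array1,chr16", function() list(min.width = 5), cacheDir,
                    force = TRUE)

print(c(first = first$min.width, stale = stale$min.width,
        forced = forced$min.width))
```

The second call asks for min.width = 5 but gets the stored min.width = 2 result back, which is the situation the force=TRUE advice (or deleting the old results) is meant to avoid.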
Re: [aroma.affymetrix] Can aroma.affymetrix handle the data of agilent chip?
Hi.

On Wed, Oct 13, 2010 at 5:24 PM, Yue Hu <yuehu.m...@gmail.com> wrote:
> Hi,
>
> I recently shifted from Affymetrix to Agilent, and since I prefer the plots generated by aroma.affymetrix I am wondering if aroma.affymetrix is able to handle Agilent chip data in some way.

When you say "plot generated by aroma.affymetrix", are you thinking of the copy-number image files generated by the ChromosomeExplorer? If so, yes, there is *some* support for using data from other platforms, but it is less documented. The main hurdle is that there are no automated ways to import data (other than Affymetrix), but on the other hand in most cases it is not really harder than using read.table(). See section 'Generalization to other technologies than Affymetrix' on the 'Future directions' page [http://aroma-project.org/features/future/] for more information.

/Henrik

> best,
> Yue
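
As a rough illustration of the read.table() remark, the sketch below reads a tab-delimited export into a data frame and orders it by genomic position. The column names (chromosome, position, logRatio) and the inline sample data are assumptions made for the example; a real Agilent export will have a different layout and typically header lines to skip.

```r
# Invented sample data standing in for a tab-delimited export from another
# platform; real files will use different column names.
txt <- "chromosome\tposition\tlogRatio
1\t2500000\t0.12
1\t1500000\t-0.40
2\t800000\t0.05"

data <- read.table(text = txt, header = TRUE, sep = "\t",
                   colClasses = c("integer", "integer", "numeric"))

# Order the loci by chromosome and position.
data <- data[order(data$chromosome, data$position), ]
rownames(data) <- NULL
print(data)
```

For a file on disk, the text= argument would be replaced by the pathname, possibly with skip= set to jump over any preamble lines.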
Re: [aroma.affymetrix] Exception: None of the data directories exist
Hi.

On Thu, Oct 14, 2010 at 4:38 AM, allab <asphod...@googlemail.com> wrote:
> Dear aroma users/authors,
>
> I am now doing Affy 6.0 SNP data analysis and my goal is to get BAF values so that I can further use them with the method SOMATICS (Assie'08). I did not use the wrapper
>
>   ds <- doASCRMAv2("TumorProjekt", chipType="GenomeWideSNP_6");
>
> from the very beginning, but did all analysis steps explicitly, actually twice: once with argument combineAlleles=FALSE and once with argument combineAlleles=TRUE.

If you wish to get allele B fractions (BAFs) you need combineAlleles=FALSE, which is the default of doASCRMAv2().

> As I understand it, if I had the object ds I could use ds$fracB to get the BAFs.

Correct. The 'ds' object is actually an R list containing two data set elements: 'total' and 'fracB'. You are interested in the latter here.

> I did not want to recalculate everything and started with the following:

FYI, the key thing with the aroma framework is that it will *not* recalculate already processed data; your results are persistent across R sessions since they are stored on the file system. Sure, if you redo doASCRMAv2() there will be some overhead, but most steps are skipped. However, it is true that you can also load the data sets as you try next:

>   dataSet <- "TumorProjekt";
>   tags <- "ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY";
>   chipType <- "GenomeWideSNP_6";
>   ds <- AromaUnitFracBCnBinarySet$byName(dataSet, tags=tags, chipType=chipType);
>   dfTxt <- writeDataFrame(ds, columns=c("unitName", "chromosome", "position", "*"));
>
> but I get
>
>   Exception: None of the data directories exist: totalAndFracBData, rawCnData, cnData, smoothCnData
>
> What could be the reason for this?

So the error occurs in the ds <- AromaUnitFracBCnBinarySet$byName(...) step. The writeDataFrame() step is not part of this. (Please try to paste the error where it belongs/as it occurs.)

The error says that it cannot find the data set you are asking it to load in any of the so-called root directories totalAndFracBData/, rawCnData/, cnData/, smoothCnData/ in the current directory. The key to getting this right is that you are in the same working directory as you were when you did doASCRMAv2(); from the error message it looks like you are in a different working directory, because it cannot find any of the reported directories and it should find totalAndFracBData/.

You also have to make sure you are using exactly the same data set name and tags. They should match what

  dsList <- doASCRMAv2("TumorProjekt", chipType="GenomeWideSNP_6");
  print(dsList);

outputs, especially what print(dsList$fracB) reports.

Hope this helps

/Henrik
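
A quick way to check the two things Henrik mentions (working directory and exact data set name/tags) is to look for the expected directory by hand. The sketch below assumes the <root>/<dataSet>,<tags>/<chipType>/ layout that the paths quoted on this list follow. findDataSetPath() is a hypothetical helper, not an aroma function, and the demo builds a throwaway directory tree under tempdir() rather than touching real data.

```r
# Hypothetical helper: look for <root>/<dataSet>,<tags>/<chipType>/ under
# each candidate root directory and return the paths that exist.
findDataSetPath <- function(dataSet, tags, chipType,
                            roots = c("totalAndFracBData", "rawCnData",
                                      "cnData", "smoothCnData"),
                            wd = getwd()) {
  fullname <- paste(c(dataSet, tags), collapse = ",")
  paths <- file.path(wd, roots, fullname, chipType)
  paths[dir.exists(paths)]
}

# Demo in a temporary "project" directory (assumed layout, see lead-in).
wd <- file.path(tempdir(), "aromaDemo")
dir.create(file.path(wd, "totalAndFracBData",
                     "TumorProjekt,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY",
                     "GenomeWideSNP_6"),
           recursive = TRUE, showWarnings = FALSE)

tags <- "ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY"
found    <- findDataSetPath("TumorProjekt", tags, "GenomeWideSNP_6", wd = wd)
notFound <- findDataSetPath("TumorProjekt", "ACC", "GenomeWideSNP_6", wd = wd)
print(found)
print(length(notFound))
```

An empty result from the real working directory would point to either a wrong working directory or a mismatch in the data set name or tags.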
Re: [aroma.affymetrix] Re: Trying to create a CDF file from an R package/environment problems
Thanks for following up/reporting back to the list.

/Henrik

On Thu, Oct 14, 2010 at 12:03 PM, Fong <fongchunc...@gmail.com> wrote:
> For those interested: I got a reply from the makers of the CDF files, and apparently it is an issue on their end. Here is their reply:
>
> "We have been studying the problem and we have discovered a bug in perfect match and mismatch probes annotation that makes Env2Cdf function unable to use packages from GATExplorer. We send you GeneMapper and TranscriptMapper for HG_U133_Plus2 with the bug corrected. In a couple of days we will upload all the packages to GATExplorer website."
>
> On Oct 12, 12:48 pm, Fong <fongchunc...@gmail.com> wrote:
>> Hi,
>>
>> I've found a set of R packages (CDF files) from a service called GATExplorer (http://bioinfow.dep.usal.es/xgate/mapping/mapping.php?content=rprogram) and I am trying to create a CDF file from the R packages. I've followed the instructions found at http://www.aroma-project.org/node/41 but I am running into errors. This is what happens:
>>
>>   Env2Cdf("genemapperhgu133plus2cdf", "u1332plus_ivt_breast_A.CEL", overwrite=TRUE)
>>   Loading required package: affxparser
>>   Reading environment: genemapperhgu133plus2cdf.
>>   Reading CEL file header.
>>   Creating CDF list for 20172 units.
>>   Error in FUN(X[[1L]], ...) : no 'dimnames' attribute for array
>>
>> I am not too familiar with how CDF R packages work. Does anyone have any advice on what I could do?
>>
>> Thanks,
>> Fong
Re: [aroma.affymetrix] Help needed regarding GCRMA normalization of exon arrays using aroma.affymetrix.....
> to replace the NaN and missing values to some other value (like the mean or median or some small number close to 0)?

It is good that we made progress. The new error looks like a bug in the sense that that piece of code does not expect NAs to appear. I have to look into this and figure out whether NAs can indeed be expected or whether they are incorrectly introduced earlier in the pipeline. I will also go back to our redundancy tests and check, because we do not detect this problem there. If NAs should be allowed, the fix should be simple, but it has to be done by me updating the code. I'll get back to you when I've done this.

> Also, I am getting some output in the probeData folder: probeData/Exon Data,OBC/MoEx-1_0-st-v1/(all the *.cel files) and probeData/Exon Data,GRBC/MoEx-1_0-st-v1/MoEx-1_0-st-v1-affinities.apa. What do these outputs correspond to?

That file contains the GCRMA probe affinities computed from the CDF and the probe-sequence file. Consider it an internal file that is saved to disk so that the next time you run the pipeline, if redone, it will be found and the processing will be much faster.

Sorry about all these issues. As I write in almost every correspondence related to gcRMA processing: the inner code was written in the early days and specifically for a few chip types. After that, new chip types came around and things became a bit shaky. However, and although not really visible to the end user, we are slowly updating the code and moving to a more robust and generic solution. What the end user probably sees is more and more informative error messages. So, bear with us.

Cheers,
Henrik

> Again, thank you for the help.
>
> Prithish Banerjee,
> Graduate Research Assistant,
> Department of Statistics,
> West Virginia University.
>
> On Tue, Sep 28, 2010 at 1:47 AM, Henrik Bengtsson <h...@aroma-project.org> wrote:
>> Hi, sorry, my mistake. I missed that you already did this.
>>
>> You are missing the 'MoEx-1_0-st-v1.probe.tab' annotation data file. You can download it from Affymetrix and you'll find a link to their support page via http://www.aroma-project.org/chipTypes/MoEx-1_0-st-v1 . Download it (something like "MoEx-1_0-st-v1 Probe Sequences, tabular format (130 MB, 3/19/08)") and place it in annotationData/chipTypes/MoEx-1_0-st-v1/. You can verify that it is correct by trying the following:
>>
>>   library(aroma.affymetrix);
>>   ptf <- AffymetrixProbeTabFile$byChipType("MoEx-1_0-st-v1");
>>   ptf
>>     AffymetrixProbeTabFile:
>>     Name: MoEx-1_0-st-v1
>>     Tags:
>>     Full name: MoEx-1_0-st-v1
>>     Pathname: annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1.probe.tab
>>     File size: 460.47 MB (482839635 bytes)
>>     RAM: 0.01 MB
>>     Number of data rows: NA
>>     Columns [12]: 'probeID', 'probeSetID', 'probeXPos', 'probeYPos', 'assembly', 'seqname', 'start', 'stop', 'strand', 'probeSequence', 'targetStrandedness', 'category'
>>     Number of text lines: NA
>>     AffymetrixCdfFile:
>>     Path: annotationData/chipTypes/MoEx-1_0-st-v1
>>     Filename: MoEx-1_0-st-v1.cdf
>>     Filesize: 274.30MB
>>     Chip type: MoEx-1_0-st-v1
>>     RAM: 0.00MB
>>     File format: v4 (binary; XDA)
>>     Dimension: 2560x2560
>>     Number of cells: 6553600
>>     Number of units: 1257006
>>     Cells per unit: 5.21
>>     Number of QC units: 0
>>
>> If you get that to work, then your script should work. Let me know if this solved your problem.
>>
>> /Henrik
>>
>> On Mon, Sep 27, 2010 at 1:58 PM, Prithish Banerjee <prithish.baner...@gmail.com> wrote:
>>> Respected Dr Bengtsson,
>>>
>>> My code and outputs are as follows:
>>>
>>>   source("http://aroma-project.org/hbLite.R");
>>>   hbInstall("aroma.affymetrix")
>>>   source("http://aroma-project.org/hbLite.R");
>>>   hbInstall("aroma.cn")
>>>   verbose <- Arguments$getVerbose(-10, timestamp=TRUE);
>>>   dataSet <- "Exon Data"
>>>   chipType <- "MoEx-1_0-st-v1"
>>>   cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR")
>>>   print(cdf)
>>>     AffymetrixCdfFile:
>>>     Path: annotationData/chipTypes/MoEx-1_0-st-v1
>>>     Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf
>>>     Filesize: 30.53MB
>>>     Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
>>>     RAM: 0.00MB
>>>     File format: v4 (binary; XDA)
>>>     Dimension: 2560x2560
>>>     Number of cells: 6553600
>>>     Number of units: 17831
>>>     Cells per unit: 367.54
>>>     Number of QC units: 1
>>>   csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType)
>>>   print(csR)
>>>     AffymetrixCelSet:
>>>     Name: Exon Data
>>>     Tags:
>>>     Path: rawData/Exon Data/MoEx-1_0-st-v1
>>>     Platform: Affymetrix
>>>     Chip type: MoEx-1_0-st-v1
>>>     Number of arrays: 7
>>>     Names: DK Litter D15 P1_(MoEx-1_0-st-v1), DK Litter D15 P14_(MoEx-1_0-st-v1), ..., DK Litter D15 P6_(MoEx-1_0-st-v1)
>>>     Time period: 2009-06-18 13:22:04 -- 2009-06-30 15:13:54
>>>     Total file size: 440.55MB
>>>     RAM: 0.01MB
>>>   cdf <- getCdf(csR)
>>>   cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE))
>>>   setCdf(csR, cdfS)
>>>   bc <- GcRmaBackgroundCorrection(csR, type="affinities")
>>>   print(bc)
>>>     GcRmaBackgroundCorrection:
>>>     Data set: Exon Data
>>>     Input tags:
>>>     User tags: *
>>>     Asterisk ('*') tags: GRBC
>>>     Output tags: GRBC
>>>     Number of files: 7 (440.55MB)
>>>     Platform: Affymetrix
>>>     Chip type: MoEx-1_0-st-v1
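
Before rerunning the pipeline, the file placement described in this thread can be checked with plain base R. This is a small sketch: probeTabPathname() is a hypothetical helper (not an aroma function) that only rebuilds the pathname shown in the AffymetrixProbeTabFile output above, relative to the current working directory.

```r
# Hypothetical helper: build the expected pathname of the probe-tab file,
# following annotationData/chipTypes/<chipType>/<chipType>.probe.tab.
probeTabPathname <- function(chipType, root = "annotationData") {
  file.path(root, "chipTypes", chipType, paste0(chipType, ".probe.tab"))
}

pathname <- probeTabPathname("MoEx-1_0-st-v1")
print(pathname)
if (!file.exists(pathname)) {
  message("Not found (relative to ", getwd(), "): ", pathname)
}
```

If the message is printed, either the file has not been placed yet or R is running in a different working directory than the one holding annotationData/.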
Re: [aroma.affymetrix] Parameters Sent To DNAcopy Functions
Hi.

On Wed, Sep 29, 2010 at 8:00 PM, Dario Strbenac <d.strbe...@garvan.org.au> wrote:
> Hello,
>
> I remember reading a while ago that you can pass additional parameters to CbsModel, and they will get passed on to DNAcopy functions. However, it doesn't seem to be working for me. I don't want any segments less than 5 probes wide, but the 8th segment is only 2 wide.
>
>   model <- CbsModel(extract(normalisedCels, 1), extract(normalisedCels, 2), min.width = 5)
>   fit(model, force = TRUE)

Your expectation that you should specify the extra parameters in the setup of the CbsModel object follows the overall style of the aroma framework. In this particular case, however, we haven't implemented passing parameters that way. A workaround is to do it via the fit() call instead. In your case, you would do:

  model <- CbsModel(extract(normalisedCels, 1), extract(normalisedCels, 2));
  fit(model, min.width=5, force=TRUE);

Hope this helps

Henrik

>   There were 50 or more warnings (use warnings() to see the first 50)
>   foldChangeTable <- getRegions(model)[[1]]
>   foldChangeTable[1:10,]
>      chromosome     start      stop    mean count url
>   1           1     51599  14941584  0.1975  7929 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A0-16430583
>   2           1  14944039  16878322 -0.3535  1163 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A14750611-17071750
>   3           1  16878364  17215511 -0.0717   163 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A16844649-17249226
>   4           1  17217671  26830062 -0.3424  6070 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A16256432-27791301
>   5           1  26830481  72541505  0.1856 28889 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A22259379-77112607
>   6           1  72541525  72583737  2.0660    45 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A72537304-72587958
>   7           1  72584492 101046159  0.1900 18772 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A69738325-103892326
>   8           1 101046857 101047369 -2.8866     2 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A101046806-101047420
>   9           1 101047606 150822152  0.1841 15874 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A96070151-155799607
>   10          1 150822331 150852863  2.6550    31 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A150819278-150855916
>
> I don't think the warnings are related to my question, but here they are, anyway:
>
>   warnings()
>   Warning messages:
>   1: In log(M, base = 2) : NaNs produced
>   2: In log(A, base = 2) : NaNs produced
>   3: In DNAcopy::CNA(genomdat = data$y, chrom = data$chromosome, ... : array has repeated maploc positions
>   4: In log(M, base = 2) : NaNs produced
>   5: In log(A, base = 2) : NaNs produced
>   6: In DNAcopy::CNA(genomdat = data$y, chrom = data$chromosome, ... : array has repeated maploc positions
>   ... ... ...
>
> I'm using aroma.affymetrix 1.7.0 on R 2.12.0 alpha.
>
> --
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
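
For reference, this is roughly what the min.width argument does once it reaches DNAcopy. The sketch calls DNAcopy directly on simulated log-ratios, outside aroma.affymetrix; it assumes the Bioconductor package DNAcopy is installed and is skipped otherwise. The simulated data and the chosen parameter values are illustrative only.

```r
# Simulated log-ratios: 200 markers with a short 8-marker gain in the middle.
set.seed(1)
y <- rnorm(200, mean = 0, sd = 0.2)
y[101:108] <- y[101:108] + 1.5
pos <- seq(from = 1e6, by = 5000, length.out = 200)

fit <- NULL
if (requireNamespace("DNAcopy", quietly = TRUE)) {
  cna <- DNAcopy::CNA(genomdat = y, chrom = rep(1, 200), maploc = pos,
                      data.type = "logratio", sampleid = "sim")
  # min.width is the minimum number of markers for a changed segment
  # (DNAcopy accepts 2 to 5; the default is 2).
  fit <- DNAcopy::segment(cna, min.width = 5, undo.splits = "sdundo",
                          undo.SD = 1, verbose = 0)
  print(fit$output[, c("loc.start", "loc.end", "num.mark", "seg.mean")])
}
```

The num.mark column of the output plays the same role as the count column in the getRegions() table quoted above.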
Re: [aroma.affymetrix] unusual copy number analysis result - split copy numbers
Hi.

On Tue, Sep 28, 2010 at 11:54 AM, Patrick Danaher <patrickjdana...@gmail.com> wrote:
> Hi Henrik,
>
> Thanks for your response. The thread you suggested ( http://goo.gl/FGVe ) describes my problem well: I'm getting a very similar intensity profile for some chromosomes in some samples. The attached png shows the problem (red dots are intensities; the black dots are from a copy number calling problem and can be ignored). The second figure plots the called intensities against normal reference intensities for the same loci.
>
> As for your specific questions: I've never used CRMAv2 before, my dataset isn't public, and it's an affy 6.0 chip.

What's your chip type? The other thread reported problems on Mapping250K_Sty, though labelled as "Sty 2", and I don't know what the "2" means there.

> Do you think annotation issues would cause this problem only in a small subset of my samples?

When I say annotation issues, I really mean that if the CDF for the chip type is not the correct one, you might pick up the wrong probe signals, especially for SNPs, e.g. PM_A may get the value of a total CN probe once in a while, say. It could be a software/annotation bug in the Affymetrix DAT to CEL file conversion and so on. That's why it is crucial to know more about the chip used. I also recommend that you try dChip and/or Affymetrix GTC, if possible.

/Henrik

> Thanks,
> Patrick
>
> On Sun, Sep 26, 2010 at 1:36 PM, Henrik Bengtsson <h...@aroma-project.org> wrote:
>> Hi.
>>
>> On Mon, Sep 13, 2010 at 4:19 PM, Patrick <patrickjdana...@gmail.com> wrote:
>>> Hi everyone,
>>>
>>> I'm using AROMA's implementation of the CRMA v2 method to get copy number estimates for cancer samples, and I'm getting a very unusual result. Many of the samples have a chromosome where AROMA has called primarily copy number gains or losses, and the gains and losses are mixed in with each other. That is, if you plot the probes' intensities by their positions on the chromosome, you see large stretches (~10,000 probes) where there are no intensities in the normal range (corresponding to no gain or loss), and there are intensities both above and below the normal range, mixed in with each other along the chromosome. It is as if the plots for a chromosome with a long deletion and a chromosome with a long addition were laid atop each other.
>>
>> It is not 100% clear from your description what you are observing. Note that it is possible to attach PNGs to messages sent to this mailing list as long as you send it as an email (not via the web interface). What chip type are you working on, and are you looking at a public data set? Have you used CRMAv2 on other data sets without a problem?
>>
>> FYI, Johan Staaf reported odd-looking copy number results that are reproducible. See thread 'Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis' on June 23-August 5, 2010, cf. http://goo.gl/FGVe. From the discussion in that thread, it seems to have something to do with annotation issues, but it is still to be solved. Is that what you are experiencing?
>>
>>> It seems implausible that a cancer sample would have copy number gains and losses mixed in with each other in such small intervals, over such large stretches of chromosome, without any loci having the usual 2 copies, so I suspect the normalization or the affy array is the source of this phenomenon. I looked at the data without using AROMA, and the phenomenon was not evident. I re-normalized the data 3 times, each time using only one step of the AROMA normalization in isolation. The base position normalization step produced the phenomenon, and the allele crosstalk calibration and the fragment length normalization steps did not.
>>
>> What would help troubleshooting is if you could check whether other software such as dChip or Affymetrix GTC produces the same oddities. If they do, we know for sure it's something odd with the annotation.
>>
>> /Henrik
>>
>>> Any thoughts on what I'm seeing and on how the base pair normalization could cause it would be very appreciated.
>>>
>>> Thanks,
>>> Patrick
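
One crude way to put a number on the pattern Patrick describes is the fraction of loci whose log2 ratio lies near zero within consecutive windows. This is only an illustrative sketch on simulated data: the function name, the window size and the 0.15 threshold are arbitrary assumptions and not part of CRMA v2 or aroma.

```r
# Hypothetical diagnostic: for each window of `size` consecutive loci, the
# fraction of log2 ratios within +/- `tol` of zero.
fracNearZero <- function(M, size = 1000, tol = 0.15) {
  win <- ceiling(seq_along(M) / size)
  tapply(abs(M) <= tol, win, mean)
}

set.seed(42)
# An ordinary region centred on zero ...
normalRegion <- rnorm(1000, mean = 0, sd = 0.05)
# ... followed by a "split" region with values only above and below zero.
splitRegion <- sample(c(-0.5, 0.5), 1000, replace = TRUE) +
  rnorm(1000, mean = 0, sd = 0.05)

frac <- fracNearZero(c(normalRegion, splitRegion))
print(round(frac, 2))
```

Windows where the fraction drops close to zero over thousands of loci would be candidates for the "split" profile, which could then be compared across normalization steps or software.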
Re: [aroma.affymetrix] unusual copy number analysis result - split copy numbers
On Tue, Sep 28, 2010 at 12:03 PM, hb h...@biostat.ucsf.edu wrote: Hi. On Tue, Sep 28, 2010 at 11:54 AM, Patrick Danaher patrickjdana...@gmail.com wrote: Hi Henrik, Thanks for your response. The thread you suggested ( http://goo.gl/FGVe ) describes my problem well - I'm getting a very similar intensity profile for some chromosomes in some samples. The attached png shows the problem (red dots are intensities; the black dots are from a copy number calling problem and can be ignored). The second figure plots the called intensities against normal reference intensities for the same loci. As for your specific questions: I've never used CRMAv2 before, my dataset isn't public, and it's an affy 6.0 chip. What's your chip type? The other thread reported problems on Mapping250K_Sty though labelled as Sty 2 and I don't know what the 2 means there. Woops, I read it as it was *not* a GenomeWideSNP_6 chip in your ...and it's an affy 6.0 chip note. So, it is GenomeWideSNP_6. Do you think annotation issues would cause this problem only in a small subset of my samples? When I say annotation issues, I really mean that if the CDF for the chip type is not the correct one, you might pick up the wrong probe signals, especially for SNPs, e.g. PM_A may get the value of a total CN probe once in a while, say. It could be a software/annotation bug in the Affymetrix DAT to CEL file conversion and so on. That's why it is crucial to know more about the chip used. I also recommend that you try dChip and/or Affymetrix GTC, if possible. Since it is GenomeWideSNP_6, you should be able to try it on Affymetrix GTC. /Henrik /Henrik Thanks, Patrick On Sun, Sep 26, 2010 at 1:36 PM, Henrik Bengtsson h...@aroma-project.org wrote: Hi. On Mon, Sep 13, 2010 at 4:19 PM, Patrick patrickjdana...@gmail.com wrote: Hi everyone, I'm using AROMA's implementation of the CRMA v2 method to get copy number estimates for cancer samples, and I'm getting a very unusual result. 
Many of the samples have a chromosome where AROMA has called primarily copy number gains or losses, and the losses are mixed in with each other. That is, if you plot the probes' intensities by their positions on the chromosome, you see large stretches (~10,000 probes) where there are no intensities in the normal range (corresponding to no gain or loss), and there are intensities both above and below the normal range, mixed in with each other along the chromosome. It is as if the plots for a chromosome with a long deletion and a chromosome with a long addition were laid atop each other. It is not 100% clear from your description what you are observing. Note that it is possible to attach PNGs to messages sent to this mailing list as long as you send it as an email (not via the web interface). What chip type are you working on and do you look at a public data set? Have you used CRMAv2 on other data sets without a problem? FYI, Johan Staaf reported odd looking copy number results that are reproducible and very odd. See thread 'Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis' on June 23-August 5, 2010, cf. http://goo.gl/FGVe. From the discussion in that thread, it seems to have something to do with annotation issues, but it is still to be solved. Is that what you are experiencing? It seems implausible that a cancer sample would have copy number gains and losses mixed in with each other in such small intervals, over such large stretches of chromosome, without any loci having the usual 2 copies, so I suspect the normalization or the affy array is the source of this phenomenon. I looked at the data without using AROMA, and the phenomenon was not evident. I re-normalized the data 3 times, each time using only one step of the AROMA normalization in isolation. The base position normalization step produced the phenomenon, and the allele crosstalk calibration and the fragment length normalization steps did not. 
What would help troubleshooting is if you could check whether other software, such as dChip or Affymetrix GTC, produces the same oddities. If it does, we know for sure it's something odd with the annotation. /Henrik Any thoughts on what I'm seeing and on how the base pair normalization could cause it would be very appreciated. Thanks, Patrick
Re: [aroma.affymetrix] CbsModel parameters
Hi.

On Mon, Sep 27, 2010 at 4:51 PM, Kai wangz...@gmail.com wrote:

Hi Henrik, I was wondering whether there is a way I can fine-tune the behavior of CbsModel. Sometimes the default algorithm produces too many small fragments right next to each other without much separation in mean copy numbers. Is there a way to control how smooth the segmentation results are?

Any additional arguments (in ...) that you pass to process(cbs, ...) will be passed down to DNAcopy::segment(), which is the function doing the actual segmentation. For more details on how to fine-tune the CBS algorithm, see help("segment", package="DNAcopy"). You may also want to contact the authors of that method/package.

/Henrik

Thanks a lot! Best, Kai
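A later message in this thread corrects the advice above: the extra arguments should be passed to fit() for the CbsModel, not to process(). As a minimal sketch of what such tuning could look like, the argument names below are parameters of DNAcopy::segment(), and the values are the ones tried later in the thread, not recommendations. The helper fitWithArgs() is hypothetical (it is not part of aroma.core), and 'cbs' is assumed to be an existing CbsModel.

```r
# Arguments understood by DNAcopy::segment();
# see help("segment", package="DNAcopy"). Values are illustrative only.
segArgs <- list(
  min.width   = 5,         # minimum number of markers per segment (2 to 5)
  undo.splits = "sdundo",  # undo change points whose segment means are close
  undo.SD     = 1          # ...closeness threshold, in standard deviations
)

# Hypothetical convenience wrapper: forwards a list of segmentation
# arguments, plus anything else in '...', to fit().
fitWithArgs <- function(cbs, args, ...) {
  do.call(fit, c(list(cbs), args, list(...)))
}

# Assumes 'cbs' was set up as elsewhere in this thread:
# fitWithArgs(cbs, segArgs, chromosomes=1:23, force=TRUE, verbose=-10)
```

Keeping the arguments in one list makes it easy to rerun the fit with force=TRUE after changing a value, so that old results in cbsData/ are not picked up by mistake.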
Re: [aroma.affymetrix] Help needed regarding GCRMA normalization of exon arrays using aroma.affymetrix.....
) at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., ver at computeAffinities(cdf, paths = probePath, ..., verbose = less(verbos at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/Exon Data,GRBC/ at bgAdjustGcrma(NA, path = probeData/Exon Data,GRBC/MoEx-1_0-st-v1, at do.call(bgAdjustGcrma, args = args) at process.GcRmaBackgroundCorrection( 20100927 16:33:30| Locating probe-tab file...done 20100927 16:33:30| Retrieving probe-sequence data...done 20100927 16:33:30| Reading probe-sequence data...done 20100927 16:33:30| Computing GCRMA probe affinities for 1257006 units...done 20100927 16:33:30| Computing probe affinities...done 20100927 16:33:30|Background correcting data set...done traceback() 17: throw.Exception(Exception(...)) 16: throw(Exception(...)) 15: throw.default(Found probe-tab file only by means of deprectated (v1) search rules: , pathname) 14: throw(Found probe-tab file only by means of deprectated (v1) search rules: , pathname) 13: method(static, ...) 12: AffymetrixProbeTabFile$findByChipType(chipType, what = what, ...) 11: method(static, ...) 
10: AffymetrixProbeTabFile$byChipType(chipType = chipType, verbose = less(verbose, 100)) 9: getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose) 8: getProbeSequenceData(this, safe = safe, verbose = verbose) 7: computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose = less(verbose)) 6: computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose)) 5: bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/Exon Data,GRBC/MoEx-1_0-st-v1, verbose = TRUE, overwrite = FALSE, subsetToUpdate = NULL, typesToUpdate = pm, indicesNegativeControl = NULL, affinities = NULL, type = affinities, opticalAdjust = TRUE, gsbAdjust = TRUE, gsbParameters = NULL, .deprecated = FALSE) 4: bgAdjustGcrma(NA, path = probeData/Exon Data,GRBC/MoEx-1_0-st-v1, verbose = TRUE, overwrite = FALSE, subsetToUpdate = NULL, typesToUpdate = pm, indicesNegativeControl = NULL, affinities = NULL, type = affinities, opticalAdjust = TRUE, gsbAdjust = TRUE, gsbParameters = NULL, .deprecated = FALSE) 3: do.call(bgAdjustGcrma, args = args) 2: process.GcRmaBackgroundCorrection(bc, verbose = verbose) 1: process(bc, verbose = verbose) sessionInfo() R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] aroma.affymetrix_1.7.0 aroma.apd_0.1.7 affxparser_1.20.0 [4] R.huge_0.2.0 aroma.core_1.7.0 aroma.light_1.16.1 [7] matrixStats_0.2.1 R.rsp_0.4.0 R.filesets_0.9.0 [10] digest_0.4.2 R.cache_0.3.0 R.utils_1.5.2 [13] R.oo_1.7.4 R.methodsS3_1.2.1 loaded via a namespace (and not attached): [1] tools_2.11.1 The working directory is desktop and the path for the cdf file and the raw data is as follows: /Users/prithish/Desktop/annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf ( I have several other cdf files like 
MoEx-1_0-st-v1,extendedR1,A20080718,MR.cdf, MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf and MoEx-1_0-st-v1.cdf in the same directory.)

/Users/prithish/Desktop/rawData/Exon Data/MoEx-1_0-st-v1/
  DK Litter D15 P1_(MoEx-1_0-st-v1).CEL
  DK Litter D15 P2_(MoEx-1_0-st-v1).CEL
  DK Litter D15 P3 #2_(MoEx-1_0-st-v1).CEL
  DK Litter D15 P3_(MoEx-1_0-st-v1).CEL
  DK Litter D15 P6_(MoEx-1_0-st-v1).CEL
  DK Litter D15 P14_(MoEx-1_0-st-v1).CEL

Moreover I am following the thread and implementing the code you suggested there but it is not working with my dataset. Please help.

Thank you, Prithish Banerjee, Graduate Research Assistant, Department of Statistics, West Virginia University.

On Sun, Sep 26, 2010 at 4:23 PM, Henrik Bengtsson h...@aroma-project.org wrote:

Hi, first of all, for this chip type you need to specify:

  bc <- GcRmaBackgroundCorrection(csR, type="affinities");

Moreover, you cannot use the custom CDF in the GcRmaBackgroundCorrection step, and have to do the follow workaround illustrated in the below example:

  library(aroma.affymetrix);
  verbose <- Arguments$getVerbose(-10, timestamp=TRUE);

  dataSet <- "Affymetrix-Tissues";
  chipType <- "MoEx-1_0-st-v1";

  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # Setup data set
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR");
  print(cdf);

  csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType);
  print(csR);

  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # gcRMA-style background correction
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # Currently, you must use the standard CDF file.
  cdf <- getCdf(csR
Re: [aroma.affymetrix] Multiple NUSE and RLE Plots?
Hi,

On Fri, Sep 24, 2010 at 1:27 PM, Vonn vwal...@email.unc.edu wrote:

Hi All, I'm using aroma to analyze CEL files from 141 SNP 6.0 arrays. I fit the quality assessment model as follows:

  plm = RmaPlm(csR)
  fit(plm, verbose = log)
  qam = QualityAssessmentModel(plm)

Then I'd like to produce NUSE and RLE plots for 10 arrays at a time. Can someone please tell me how to do this?

plotNuse() and plotRle() for QualityAssessmentModel take an argument 'arrays', e.g.

  plotNuse(qam, arrays=1:10);
  plotNuse(qam, arrays=11:20);
  ...

Note that the NUSE and RLE estimates are, as wanted, calculated using the complete data set; that is, the 'arrays' argument is only applied to the plotting part.

/Henrik

Thanks in advance for your response, Vonn
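For 141 arrays, typing out each range by hand gets tedious. As a small sketch, the index batches can be generated in base R and looped over. The helper arrayBatches() is hypothetical (not part of aroma.affymetrix), and the commented-out loop assumes 'qam' is the QualityAssessmentModel from the message above.

```r
# Split array indices 1..nbrOfArrays into consecutive batches (base R only).
arrayBatches <- function(nbrOfArrays, batchSize = 10L) {
  idxs <- seq_len(nbrOfArrays)
  unname(split(idxs, ceiling(idxs / batchSize)))
}

batches <- arrayBatches(141L)
length(batches)   # 15 batches: 14 of size 10 and a final one holding array 141

# Assumes 'qam' exists; one NUSE and one RLE plot per batch.
# for (arrays in batches) {
#   plotNuse(qam, arrays = arrays)
#   plotRle(qam, arrays = arrays)
# }
```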
Re: [aroma.affymetrix] Help needed regarding GCRMA normalization of exon arrays using aroma.affymetrix.....
Hi, first of all, for this chip type you need to specify:

  bc <- GcRmaBackgroundCorrection(csR, type="affinities");

Moreover, you cannot use the custom CDF in the GcRmaBackgroundCorrection step, and have to use the following workaround, illustrated in the example below:

  library(aroma.affymetrix);
  verbose <- Arguments$getVerbose(-10, timestamp=TRUE);

  dataSet <- "Affymetrix-Tissues";
  chipType <- "MoEx-1_0-st-v1";

  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # Setup data set
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR");
  print(cdf);

  csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType);
  print(csR);

  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # gcRMA-style background correction
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # Currently, you must use the standard CDF file.
  cdf <- getCdf(csR);
  cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE));
  setCdf(csR, cdfS);
  bc <- GcRmaBackgroundCorrection(csR, type="affinities");
  print(bc);
  csB <- process(bc, verbose=verbose);
  print(csB);

  # Now, use the custom CDF in what follows
  setCdf(csB, cdf);
  print(csB);

Yes, those last steps are rather confusing - we're working on updating the code so you don't have to do that yourself.

FYI, the above solution/workaround was worked out in thread 'GCRMA normalization with MoEx-1_0-st-v1' of March 24-April 8, 2010, cf. http://goo.gl/cniq.

Hope this helps

/Henrik

On Fri, Sep 24, 2010 at 2:30 PM, Prithish Banerjee prithish.baner...@gmail.com wrote:

Hi All, I am trying to normalize a mouse exon array dataset using the GCRMA normalization technique. I have exactly followed all the necessary steps for storing the dataset and the cdf file.
the code and the output I am using are as follows: source(http://aroma-project.org/hbLite.R;); hbInstall(aroma.affymetrix) source(http://aroma-project.org/hbLite.R;); hbInstall(aroma.cn) verbose - Arguments$getVerbose(-10, timestamp=TRUE); dataSet - Exon Data [the path in the working directory is rawData/Exon Data/MoEx-1_0-st-v1/*.CEL files] chipType - MoEx-1_0-st-v1 [the path in the working directory is annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf] cdf - AffymetrixCdfFile$byChipType(chipType,tags=coreR1,A20080718,MR) [converted to binary using convertCdf command] print(cdf) AffymetrixCdfFile: Path: annotationData/chipTypes/MoEx-1_0-st-v1 Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf Filesize: 30.53MB Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 2560x2560 Number of cells: 6553600 Number of units: 17831 Cells per unit: 367.54 Number of QC units: 1 csR - AffymetrixCelSet$byName(dataSet, chipType=chipType) print(csR) AffymetrixCelSet: Name: Exon Data Tags: Path: rawData/Exon Data/MoEx-1_0-st-v1 Platform: Affymetrix Chip type: MoEx-1_0-st-v1 Number of arrays: 7 Names: DK Litter D15 P1_(MoEx-1_0-st-v1), DK Litter D15 P14_(MoEx-1_0-st-v1), ..., DK Litter D15 P6_(MoEx-1_0-st-v1) Time period: 2009-06-18 13:22:04 -- 2009-06-30 15:13:54 Total file size: 440.55MB RAM: 0.01MB cdf - getCdf(csR) cdfS - AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE)) setCdf(csR, cdfS) bc - GcRmaBackgroundCorrection(csR, type=affinities) print(bc) GcRmaBackgroundCorrection: Data set: Exon Data Input tags: User tags: * Asterisk ('*') tags: GRBC Output tags: GRBC Number of files: 7 (440.55MB) Platform: Affymetrix Chip type: MoEx-1_0-st-v1 Algorithm parameters: (subsetToUpdate: NULL, typesToUpdate: chr pm, indicesNegativeControl: NULL, affinities: NULL, type: chr affinities, opticalAdjust: logi TRUE, gsbAdjust: logi TRUE, gsbParameters: NULL) Output path: probeData/Exon Data,GRBC/MoEx-1_0-st-v1 
Is done: FALSE RAM: 0.00MB csB - process(bc, verbose=verbose) 20100923 13:24:12|Background correcting data set... 20100923 13:24:12| Computing probe affinities... 20100923 13:24:12| Computing GCRMA probe affinities for 1257006 units... 20100923 13:24:12| Identify PMs and MMs among the CDF cell indices... logi [1:5266159] TRUE TRUE TRUE TRUE TRUE TRUE ... Mode FALSE TRUE NA's logical 334476 4931683 0 20100923 13:25:57| MMs are defined as non-PMs 20100923 13:25:57| Number of PMs: 4931683 20100923 13:25:57| Number of MMs: 334476 20100923 13:25:57| Identify PMs and MMs among the CDF cell indices...done 20100923 13:25:57| Reading probe-sequence data... 20100923 13:25:57| Retrieving probe-sequence data... 20100923 13:25:57| Chip type (full): MoEx-1_0-st-v1 20100923 13:25:57| Locating probe-tab file... 20100923 13:25:57| Chip type: MoEx-1_0-st-v1 Error in list(`process(bc, verbose = verbose)` = environment, `process.GcRmaBackgroundCorrection(bc, verbose = verbose)` = environment, : [2010-09-23
Re: [aroma.affymetrix] unusual copy number analysis result - split copy numbers
Hi. On Mon, Sep 13, 2010 at 4:19 PM, Patrick patrickjdana...@gmail.com wrote: Hi everyone, I'm using AROMA's implementation of the CRMA v2 method to get copy number estimates for cancer samples, and I'm getting a very unusual result. Many of the samples have a chromosome where AROMA has called primarily copy number gains or losses, and the losses are mixed in with each other. That is, if you plot the probes' intensities by their positions on the chromosome, you see large stretches (~10,000 probes) where there are no intensities in the normal range (corresponding to no gain or loss), and there are intensities both above and below the normal range, mixed in with each other along the chromosome. It is as if the plots for a chromosome with a long deletion and a chromosome with a long addition were laid atop each other. It is not 100% clear from your description what you are observing. Note that it is possible to attach PNGs to messages sent to this mailing list as long as you send it as an email (not via the web interface). What chip type are you working on and do you look at a public data set? Have you used CRMAv2 on other data sets without a problem? FYI, Johan Staaf reported odd looking copy number results that are reproducible and very odd. See thread 'Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis' on June 23-August 5, 2010, cf. http://goo.gl/FGVe. From the discussion in that thread, it seems to have something to do with annotation issues, but it is still to be solved. Is that what you are experiencing? It seems implausible that a cancer sample would have copy number gains and losses mixed in with each other in such small intervals, over such large stretches of chromosome, without any loci having the usual 2 copies, so I suspect the normalization or the affy array is the source of this phenomenon. I looked at the data without using AROMA, and the phenomenon was not evident. 
I re-normalized the data 3 times, each time using only one step of the AROMA normalization in isolation. The base position normalization step produced the phenomenon, and the allele crosstalk calibration and the fragment length normalization steps did not.

What would help troubleshooting is if you could check whether other software, such as dChip or Affymetrix GTC, produces the same oddities. If it does, we know for sure it's something odd with the annotation.

/Henrik

Any thoughts on what I'm seeing and on how the base pair normalization could cause it would be very appreciated. Thanks, Patrick
[aroma.affymetrix] Re: Alternatives way to access the mailing list archive
Hi, as a follow-up on this one: does anyone know of alternative websites that archive our mailing list? We currently have:

http://groups.google.com/group/aroma-affymetrix/topics/
http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/maillist.html

More specifically, I'm looking for alternatives that are accessible from within China, and the above seem not to be. It would be great to solve this, because the archive is a very useful resource.

Thanks

Henrik

On Tue, Sep 21, 2010 at 9:32 PM, Henrik Bengtsson h...@aroma-project.org wrote:

Hi, it has been brought to my attention that the Google Group site, which provides our mailing list and its archive:

http://groups.google.com/group/aroma-affymetrix/topics/

is not accessible from/blocked by certain countries. Luckily there are some alternatives by other services providing archives of the mailing list, such as:

http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/maillist.html

I have added a link to the latter on http://aroma-project.org/forum/

/Henrik
Re: [aroma.affymetrix] Re: CbsModel
Hi.

On Wed, Sep 22, 2010 at 9:35 AM, Kai wangz...@gmail.com wrote:

Hi Henrik, Thank you for letting me know the hidden trick. It seems to work. I have another related question regarding the CbsModel function. You've mentioned in addressing another post that for non-affy platforms, once one has set up a platform-independent data set (*.asb files) as in Vignette: 'Creating binary data files containing copy number estimates', e.g.

  ds <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, chipType="*");

one can then pass this to CbsModel just as one passes a CnChipEffectSet 'ces' in other vignettes for Affymetrix genotyping platforms. However, it seems to me that what is stored in the CnChipEffectSet are raw CN estimates, whereas the AromaUnitTotalCnBinaryFile objects contain log2 CN ratios. If this is correct, my question is how CbsModel can tell whether the input data are in log2 scale or not, or whether the input data are ratios or not? Thank you very much for your help on this.

It does this by looking for special tags of the *.asb file. More precisely, if the filename has a log2ratio tag, then its content is assumed to be log2 ratios. Likewise, if there is a log10ratio tag, its content is assumed to be log10 ratios. For historical reasons, a logRatio tag is interpreted as log10ratio. If none of these tags exist, the content is assumed to be on the non-logarithmic scale. I recommend using non-logged storage, because that is well defined also for non-positive values.

DETAILS: The above is taken care of by the AromaUnitTotalCnBinaryFile class and more precisely the internal/private getAM() method. CbsModel and similar models do not know about this layer and simply receive log2 ratios regardless of what is stored on file.

Hope this helps

/Henrik

Best, Kai

On Sep 20, 1:06 pm, Henrik Bengtsson h...@aroma-project.org wrote:

Hi Kai. I am aware of the issue, and it is on the todo list to add an argument to specify that you don't want ratios to be calculated.
There is currently a secret workaround for this that should not be read as an official documented feature [that's a warning for users reading this thread in the future], but it should solve your immediate needs.

  cbs <- CbsModel(ds);
  cbs$.calculateRatios <- FALSE;

See if that does it for you.

/Henrik

On Wed, Sep 15, 2010 at 10:14 PM, Kai wangz...@gmail.com wrote:

Dear Henrik, I was trying to run CBS model on a set of paired CN estimates. The data were generated using an Illumina platform, so I have followed Vignette: 'Creating binary data files containing copy number estimates' to create the log2ratio CN estimates between a tumor sample and its matched normal. I have loaded the data with the following code:

  dataSet = "Dataset,tagA,tagB";
  chipType = "HumanOmni1-Quad";
  ds = AromaUnitTotalCnBinarySet$byName(dataSet, chipType=chipType);
  cbs = CbsModel(ds);

However, when I looked at how the CBS model was set up, it says:

  cbs
  CbsModel:
  Name: Dataset
  Tags: tagA,tagB
  Chip type (virtual): HumanOmni1-Quad
  Path: cbsData/Dataset,tagA,tagB/HumanOmni1-Quad
  Number of chip types: 1
  Sample reference file pairs:
  Chip type #1 of 1 ('HumanOmni1-Quad'):
  Sample data set:
  AromaUnitTotalCnBinarySet:
  Name: Dataset
  Tags: tagA,tagB
  Full name: Dataset,tagA,tagB
  Number of files: 10
  Names: sample1, sample2, ..., sample10 [10]
  Path (to the first file): rawCnData/Dataset,tagA,tagB/HumanOmni1-Quad
  Total file size: 43.46 MB
  RAM: 0.02MB
  Reference data set/file:
  average across arrays
  RAM: 0.00MB

It seems to me that the CBS model is using average across arrays as reference, which would not be what I want, since my CN estimates have already been referenced. So my questions are: 1. Is this how CBS will behave? 2. Is there a way to let CBS take the CN estimates as is, without contrasting to any reference? Thank you very much for your help on this.
Best, Kai
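The tag rule Henrik describes in this thread can be sketched in a few lines of base R. This is only an illustration of the rule as stated, not the actual getAM() implementation: the function inferCnScale() and the example full names are made up for this sketch.

```r
# Hypothetical helper (not part of aroma.core) mirroring the rule described
# above: the tags in a file's full name decide how its content is interpreted.
inferCnScale <- function(fullname) {
  # A full name is "<name>,<tag1>,<tag2>,..."; drop the name, keep the tags.
  tags <- strsplit(fullname, split = ",", fixed = TRUE)[[1]][-1]
  if ("log2ratio" %in% tags) {
    "log2"
  } else if (any(c("log10ratio", "logRatio") %in% tags)) {
    "log10"    # 'logRatio' is read as log10 for historical reasons
  } else {
    "non-log"  # no scale tag: content is on the non-logarithmic scale
  }
}

inferCnScale("sample1,log2ratio,total")  # "log2"
inferCnScale("sample1,logRatio,total")   # "log10"
inferCnScale("sample1,total")            # "non-log"
```

The recommendation to store non-logged values follows from the fact that a logarithm is undefined for zero or negative estimates, whereas the raw values can always be written to file.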
Re: [aroma.affymetrix] Relative Copy Number Analysis
Hi Dario, Pierre Neuvial has kindly provided a more up-to-date vignette for doing paired total copy number analysis. You find it at: http://aroma-project.org/vignettes/pairedTotalCopyNumberAnalysis See if that helps /Henrik On Wed, Sep 22, 2010 at 5:05 PM, Dario Strbenac d.strbe...@garvan.org.au wrote: Hello, I see the vignette for absolute copy number analysis, where you compare to a HapMap sample pool, but I'm not sure how to do a control / treatment analysis. I have 1 Affymetrix SNP6 .CEL of a cancer sample and 1 of a normal sample. The documentation is brief or non-existent for most of the functions that appear in the total copy number vignette. Can anyone share a workflow for a relative analysis ? -- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
Re: [aroma.affymetrix] CbsModel
Hi Kai. I am aware of the issue, and it is on the todo list to add an argument to specify that you don't want ratios to be calculated. There is currently a secret workaround for this that should not be read as an official documented feature [that's a warning for users reading this thread in the future], but it should solve your immediate needs.

  cbs <- CbsModel(ds);
  cbs$.calculateRatios <- FALSE;

See if that does it for you.

/Henrik

On Wed, Sep 15, 2010 at 10:14 PM, Kai wangz...@gmail.com wrote:

Dear Henrik, I was trying to run CBS model on a set of paired CN estimates. The data were generated using an Illumina platform, so I have followed Vignette: 'Creating binary data files containing copy number estimates' to create the log2ratio CN estimates between a tumor sample and its matched normal. I have loaded the data with the following code:

  dataSet = "Dataset,tagA,tagB";
  chipType = "HumanOmni1-Quad";
  ds = AromaUnitTotalCnBinarySet$byName(dataSet, chipType=chipType);
  cbs = CbsModel(ds);

However, when I looked at how the CBS model was set up, it says:

  cbs
  CbsModel:
  Name: Dataset
  Tags: tagA,tagB
  Chip type (virtual): HumanOmni1-Quad
  Path: cbsData/Dataset,tagA,tagB/HumanOmni1-Quad
  Number of chip types: 1
  Sample reference file pairs:
  Chip type #1 of 1 ('HumanOmni1-Quad'):
  Sample data set:
  AromaUnitTotalCnBinarySet:
  Name: Dataset
  Tags: tagA,tagB
  Full name: Dataset,tagA,tagB
  Number of files: 10
  Names: sample1, sample2, ..., sample10 [10]
  Path (to the first file): rawCnData/Dataset,tagA,tagB/HumanOmni1-Quad
  Total file size: 43.46 MB
  RAM: 0.02MB
  Reference data set/file:
  average across arrays
  RAM: 0.00MB

It seems to me that the CBS model is using average across arrays as reference, which would not be what I want, since my CN estimates have already been referenced. So my questions are: 1. Is this how CBS will behave? 2. Is there a way to let CBS take the CN estimates as is, without contrasting to any reference? Thank you very much for your help on this.
Best, Kai
Re: [aroma.affymetrix] How long should it take to run CRMAv2 on 270 samples for Affymetrix SNP 6.0 arrays
Hi.

On Fri, Sep 17, 2010 at 12:58 PM, Matt matt.kowg...@gmail.com wrote:

Hi Henrik, I am processing the data from the 270 HapMap samples on the SNP 6.0 arrays using the CRMAv2 method. I wrote a script to follow the steps, minus the plotting, outlined on http://www.aroma-project.org/vignettes/CRMAv2 It has been running for a week now without error. I have checked the .Rout file and it is still running, but it says it is doing chunk #622 of 1252, so only half-way. I wonder if there is a way I can run this faster? Is it possible to break the 270 samples up?

Have a look at the how-to page on 'Improve processing time': http://www.aroma-project.org/howtos/ImproveProcessingTime

Obviously it depends on your computer, but on a decent machine I would expect something like 5-10 minutes per array. If you see more than, say, 15 minutes per array, you should definitely look into the above how-to page.

The CRMAv2 algorithm was designed to be a truly single-array statistical method, meaning it gives identical results whether you process each array independently and then merge, or merge first and then process everything in one batch. This is neat, because you can process samples as they are added to an experiment/project. Because of this you can also run CRMAv2 on multiple machines in parallel. Note that this is not the case with CRMA (v1) or any other CN preprocessing method out there.

If you wish to take this approach, you might find the doCRMAv2() function useful. See page 'Block: doCRMAv2() / doASCRMAv2()': http://www.aroma-project.org/blocks/doCRMAv2

Hope this helps

Henrik

Thanks for your help. Matt
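Because CRMAv2 is a single-array method, the 270 arrays can be divided among independent R sessions. The following base-R sketch shows one way to split array indices into near-equal chunks and to estimate the serial run time from the 5-10 minutes per array quoted above. The number of workers (`nbrOfWorkers`) is an assumption made for illustration, and the sketch deliberately does not call any aroma.affymetrix function; how each chunk is then passed to the preprocessing steps depends on the package version.

```r
# Hypothetical array indices for a 270-sample data set.
arrays <- seq_len(270)
nbrOfWorkers <- 4   # assumption: number of independent R sessions or machines

# Assign arrays to workers in contiguous blocks of (almost) equal size.
chunkSize <- ceiling(length(arrays) / nbrOfWorkers)
chunks <- split(arrays, ceiling(seq_along(arrays) / chunkSize))
print(lengths(chunks))

# Rough serial run time in hours at 5 and 10 minutes per array.
serialHours <- length(arrays) * c(5, 10) / 60
print(serialHours)
```

Each worker would then process only the arrays in its own chunk, and the results can be merged afterwards, since per-array results do not depend on which other arrays were processed alongside.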
Re: [aroma.affymetrix] Re: Custom Canine SNP (DogSty06m520431); problem with chr24-39
Hi,

On Tue, Aug 24, 2010 at 3:01 AM, Denis amer.ak...@rub.de wrote:

Hi Henrik, Sorry for the delay, I had some difficulties in getting GLAD started (including gsl ...). What else should I say than that you're the best, and thank you very much for your help. I finally got it. I would like to provide you with a finalized version of the Canine,chromosomes.txt including the band pattern for your aroma-project. If you could give me a hint on how to manage this (with respect to what information and column headings are necessary) I would start asap and hopefully attach it to the next reply :).

I missed that you were asking for more help here. If you could send me a tab-delimited text file with two columns, 'chromosome' and 'nbrOfBases', where the chromosome column contains chromosome indices (integers) and the nbrOfBases column the number of nucleotides/length of that chromosome (integer), that would be great and I will add it to the package. Here is an example of what it should look like:

    chromosome  nbrOfBases
    1           12548891
    2           88182672
    3           94659212
    4           91429679
    5           91969480
    6           80531300
    7           83918830
    8           76319553
    9           64388646
    10          72471775
    11          77415697
    12          75458181
    13          66171193
    14          63877078
    15          67136955
    16          62534543
    17          67108084
    18          58768331
    19          56709702
    20          61246656
    21          53889065
    22          64179564
    23          55386667
    24          50674002
    25          54562819
    26          42004589
    27          48236582
    28          44161646
    29          44727140
    30          43154895
    31          41581023
    32          41575543
    33          34226766
    34          45065038
    35          29486428
    36          33827475
    37          33893929
    38          26869798
    39          125840674

/Henrik

Cheers Denis
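A file in the requested layout can be produced with base R alone. The sketch below writes a tab-delimited file with the two columns 'chromosome' and 'nbrOfBases' and reads it back as a check. The three rows are copied from the example in this thread and are placeholders, not verified chromosome lengths. The file is written under a temporary directory here; the thread indicates that in a real project it would live under annotationData/genomes/Canine/.

```r
# Placeholder rows copied from the example in the thread (first three chromosomes).
genome <- data.frame(
  chromosome = 1:3,
  nbrOfBases = c(12548891L, 88182672L, 94659212L)
)

# Write under a temporary directory that mimics the annotationData layout.
path <- file.path(tempdir(), "annotationData", "genomes", "Canine")
dir.create(path, recursive = TRUE, showWarnings = FALSE)
pathname <- file.path(path, "Canine,chromosomes.txt")
write.table(genome, file = pathname, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and confirm the columns and their types.
check <- read.table(pathname, header = TRUE, sep = "\t")
print(check)
```

Using `quote = FALSE` and `row.names = FALSE` keeps the file to exactly the two columns with a plain header line.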
Re: [aroma.affymetrix] Problems updating aroma.affymetrix
Hi,

if you are following Pierre's advice and still getting *that* error message, my best guess is that you are getting an error while installing one of the packages that aroma.affymetrix depends on, which in turn probably makes the installation of aroma.affymetrix itself fail. The reason why one of the required packages didn't install may be that your R version is far too old. So, update R, then try again.

More details: The installation code - hbInstall() - assumes that the installation went well and at the end tries to download and install patches. Since it did not install well, you get that error about a too old version. I've updated that piece of the code so that it does not try to patch if the installation failed.

Hope this helps

Henrik

On Wed, Sep 1, 2010 at 10:03 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Matt. Another point to mention is that you should update your version of R ... 2.7.x is 2.5 years old, which is a long time in R. I'd recommend 2.11.x.

Cheers, Mark

On 2010-09-02, at 2:51 PM, Pierre Neuvial wrote:

[Forwarding this to the list so that others can read this thread]

Pierre

On Wed, Sep 1, 2010 at 12:57 PM, Matt matt.kowg...@gmail.com wrote:

Hi Pierre, Thanks for the reply. I guess it's a problem with the source because what you say is exactly what I did and I get the error message shown.

Best, Matt

On Wed, Sep 1, 2010 at 3:20 PM, Pierre Neuvial pie...@stat.berkeley.edu wrote:

Hi Matt, You want to *update* the package, not *patch* it: the difference between updates and patches is explained at http://aroma-project.org/howtos/updateOrPatch. So, to update aroma.affymetrix, do:

    source("http://aroma-project.org/hbLite.R");
    hbInstall("aroma.affymetrix");

as explained at http://aroma-project.org/install

Hope this helps, Pierre.

On Wed, Sep 1, 2010 at 10:08 AM, Matt matt.kowg...@gmail.com wrote:

Hi there, I'd like to update my version of aroma.affymetrix, current version in use 0.9.1, so that I can utilize the new CN processing method.
I followed the instructions on the site but I get the following error message:

    Patching /home/matthew/.Rpatches/aroma.affymetrix/20080508/WeightsFile.R
    Failed to source: http://www.braju.com/R//patches/aroma.affymetrix/download.R
    Error in stop(ex$message) : Your version (0.9.1) of aroma.affymetrix is out of date. Please update.
    In addition: There were 11 warnings (use warnings() to see them)

    sessionInfo()
    R version 2.7.0 Under development (unstable) (2008-01-21 r44087)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] tools stats graphics grDevices datasets utils methods
    [8] base

    other attached packages:
    [1] Biobase_2.0.1 aroma.affymetrix_0.9.1 aroma.apd_0.1.7
    [4] R.huge_0.2.0 digest_0.4.2 aroma.light_1.16.1
    [7] affxparser_1.12.2 R.rsp_0.3.6 R.cache_0.3.0
    [10] R.utils_1.5.0 R.oo_1.7.3 R.methodsS3_1.2.0

How can I fix this? Thanks for any help.

Matt
-- Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852 --
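Since the advice in this thread comes down to "check your R version before updating", a quick check can be scripted in base R. The minimum version below is an assumption derived from Mark's recommendation of 2.11.x at the time; it is not a requirement stated by the package itself.

```r
# Assumption: 2.11.0 stands in for "recent enough", following the advice in the thread.
minimumVersion <- "2.11.0"

# getRversion() returns a version object that compares correctly with strings.
tooOld <- getRversion() < minimumVersion
if (tooOld) {
  message("R ", getRversion(), " is older than ", minimumVersion,
          "; update R before installing packages.")
} else {
  message("R ", getRversion(), " is at least ", minimumVersion, ".")
}

# The version reported in the original post would have failed this check.
reportedTooOld <- package_version("2.7.0") < minimumVersion
print(reportedTooOld)
```

Comparing version objects rather than plain strings avoids the pitfall that "2.7.0" sorts after "2.11.0" alphabetically.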
Re: [aroma.affymetrix] Custom Canine SNP (DogSty06m520431); problem with chr24-39
Hi Denis,

you refer to the thread 'Custom Canine SNP' started on July 18, 2008. In KD's message on August 14, 2008 you can see how he explicitly sets the argument genome="Canine" when he sets up the GLAD model. From the verbose output I can see you are using CBS, but it is not clear how you set it up. Are you doing:

    cbs <- CbsModel(ces, genome="Canine")

? If not, do that. Then (for troubleshooting purposes only) try

    df <- getGenomeData(cbs, verbose=verbose);
    print(df);

This latter step will try to load the tab-delimited file containing the information about the number of bases per chromosome. Since you specify "Canine" above, it will try to locate and read the file:

    annotationData/genomes/Canine/Canine,chromosomes.txt

or any file with additional tags, e.g.

    annotationData/genomes/Canine/Canine,chromosomes,UGP,HB20100822.txt

It needs to contain (at least) the two columns chromosome and nbrOfBases. I don't have the exact numbers for the Canine genome, but see the attached file for an example. Feel free to forward the data to me, and I'll add this Canine annotation data so it's built in to the aroma framework. If you get the above working once, then process(ce) should work too.

Hope this helps

Henrik

On Mon, Aug 23, 2010 at 4:28 AM, Denis amer.ak...@rub.de wrote:

Hi there, I hope you can help me with my problem. I have followed the kind help given for a similar problem on the Google aroma.affymetrix forums, yet I am missing the last bit of information I would need to succeed: http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/96676bd38d64e884/e01329e5f44ba42b?lnk=gstq=building+ufl+file+for+DogSty06m520431#e01329e5f44ba42b

We are processing the DogSty06m520431 chips and my Prof. wants me to generate CNV calculations for the corresponding samples. So far I could stick to the helpful guide on the aroma homepage for “Total Copy Number Analysis (GWS5 GWS6)”. Yet I am now faced with 2 problems with which I hope you could give me a hand.
First one is that when executing the command: process(ce, chromosomes=c(38), verbose=verbose) for chromosomes 24 the analyses aborts with following prompting for e.g. chr25:

    process(ce, chromosomes=c(25), verbose=verbose)
    20100823 13:05:15|Generating ChromosomeExplorer report...
    20100823 13:05:15| Setting up ChromosomeExplorer report files...
    20100823 13:05:15| Copying template files...
    20100823 13:05:15| Source path: C:/Programme/R/R-2.11.1/library/aroma.core/reports/includes
    20100823 13:05:15| Destination path: reports/includes
    20100823 13:05:23| Copying template files...done
    20100823 13:05:23| Setting up ChromosomeExplorer report files...done
    20100823 13:05:23| Explorer output version: 3
    20100823 13:05:23| Compiling ChromosomeExplorer.onLoad.js.rsp...
    20100823 13:05:23| Source: C:/Programme/R/R-2.11.1/library/aroma.core/reports/templates/rsp/ChromosomeExplorer3/ChromosomeExplorer.onLoad.js.rsp
    20100823 13:05:23| Output path: reports/Weim/ACC,-XY,AVG,+300,A+B
    20100823 13:05:23| Scanning directories for available chip types...
    20100823 13:05:23| Detected chip types: DogSty06m520431
    20100823 13:05:23| Scanning directories for available chip types...done
    20100823 13:05:23| Scanning image files for available zooms...
    20100823 13:05:24| Detected (or default) zooms: 1, 2, 4, 8, 16, 32, 64
    20100823 13:05:24| Scanning image files for available zooms...done
    20100823 13:05:24| Scanning directory for subdirectories...
    20100823 13:05:24| Detected (or default) sets: cbs
    20100823 13:05:24| Scanning directory for subdirectories...done
    20100823 13:05:24| Compiling RSP...
      member       data.class dimension objectSize
    1 chipTypes    character  1         72
    2 chrLayers    character  0         24
    3 sampleLabels character  4         264
    4 sampleLayers character  0         24
    5 samples      character  4         264
    6 sets         character  1         64
    7 zooms        numeric    7         56
    20100823 13:05:28| Sample names:
    [1] W24a_(DogSty06m520431) W469_(DogSty06m520431) W511_(DogSty06m520431)
    [4] W513_(DogSty06m520431)
    20100823 13:05:28| Full sample names:
    [1] W24a_(DogSty06m520431) W469_(DogSty06m520431) W511_(DogSty06m520431)
    [4] W513_(DogSty06m520431)
    20100823 13:05:28| Compiling RSP...done
    20100823 13:05:29| Compiling ChromosomeExplorer.onLoad.js.rsp...done
    Loading required package: RColorBrewer
    Loading required package: Cairo
    20100823 13:05:40| Building tuple of reference sets...
    20100823 13:05:40| No reference available.
    20100823 13:05:40| Calculating average copy-number signals...
    20100823 13:05:40| Retrieving average cell signals across 4 arrays...
    CnChipEffectFile:
    Name: .average-intensities-median-mad
    Tags: f1b4541a56b9bb2404325d6053edc91e
    Full name: .average-intensities-median-mad,f1b4541a56b9bb2404325d6053edc91e
    Pathname: plmData/Weim,ACC,-XY,AVG,+300,A+B/
[aroma.affymetrix] Re: Problem with GLAD on linux cluster
Hi Christian,

On Wed, Aug 4, 2010 at 9:04 AM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote:

Dear Henrik, Thank you for your suggestion to use ceRef directly. Regarding your explanation of getAverageFile(), the question is where the generated output will be saved. As I have mentioned, each node first creates a plmData subdirectory, e.g. Prostate/Prostate21/plmData, and makes symbolic links to the normalized CEL files located in Prostate/plmData. Thus the output of getAverageFile() should be stored for each node separately.

Ah, now I see; I've been reading it as if you were linking the directories, not the individual CEL files.

This seems indeed to be the case, since e.g. the subdirectory Prostate/Prostate21/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp contains the file .average-intensities-median-mad,a1c33926939ee43fbed83ae69301d215.CEL created at a certain time, while subdirectory Prostate/Prostate8/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp contains a file with the same name, i.e. .average-intensities-median-mad,a1c33926939ee43fbed83ae69301d215.CEL, created at a different time.

Yes. As I understand it now, you preprocess all of the data, and wait for everything to be done (all *,chipEffects.CEL files to be generated) before continuing with the above, correct? If so, I'd suggest that you also wait for getAverageFile() to finish first. Then that average/result file will be available to all your cluster nodes as well. I even think you don't have to link each CEL file separately, because nothing else should be written back to the data set. It should be enough to link each data set directory, or even just plmData/ itself (I am not even sure there is a need to split it up anymore).

As far as I understand these are the files created by getAverageFile() and thus each node creates its own file saved in its own subdirectory, so there will be no problem.

Yes. Now I agree with you.
It seems that the problem was indeed the result of saveObject() stored in .Rcache, which caused the race conditions. Since the removal of saveObject() I have until now experienced no problems.

Yes. You are correct. Since caching is mainly done for memoization purposes, that is, to load from file already calculated results that are computationally expensive to obtain, it is recommended to store the cache in a fast place. In other words, it is better if the .Rcache directory is on the local drive of the machine, rather than on a shared file system. If you had done that, then each machine would have had to do those calculations by itself once, but when done the memoization would be faster and you would not have had any race conditions when accessing the memoized results. The default ~/.Rcache/ can be changed, cf. http://www.aroma-project.org/archive/GoogleGroups/web/caching.

This was a useful conversation to me; it made me see other ways for (unnecessary) race conditions to occur, and reminded me how important it is not to overlook the smallest details in scientific communication, since they can make big differences.

Cheers, Henrik

Thank you for your help. Best regards Christian

On Aug 2, 2:54 pm, Henrik Bengtsson h...@stat.berkeley.edu wrote:

Hi.

On Mon, Jul 26, 2010 at 12:00 PM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote:

Dear Henrik, Maybe my explanation was not clear enough: I have created my own package based on S4 classes, where one subclass is AromaSNP with slots celset, normset, plmset, effectset as lists, and methods readSNPData(), normalizeSNPData(), computeCN(), computeRawCN(), among others. Furthermore, the package includes scripts batch.aroma.norm.R, batch.aroma.model.R, batch.aroma.combine.R, and a perl script which distributes these scripts to the different cluster nodes.

1. Normalization: Script batch.aroma.norm.R first creates the subdirectory structure which I have already described, and then does the normalization.
All normalization steps run on one server and the results are saved as AromaSNP object aroma in Prostate/Prostate.Rdata. Furthermore, subdirectories Prostate/probeData and Prostate/plmData are created.

2. GLAD: Script batch.aroma.norm.R is called from each node separately. For each node it first creates a plmData subdirectory, e.g. Prostate/Prostate21/plmData, and makes symbolic links to the normalized CEL files located in Prostate/plmData. Then it loads object aroma from Prostate/Prostate.Rdata, whereby each node has a separate RAM of 2GB. Slot aroma@effectset contains the normalized data and is called from computeCN() as cesList <- aroma@effectset. This cesList (which is in the RAM of each node) is passed to model <- GladModel(cesList, refList), and is thus used to compute getAverageFile() if refList=NULL (which is the default). Since each node calls the same cesList object, function saveObject() writes
Re: [aroma.affymetrix] Re: Problem with GLAD on linux cluster
available. The first time one of your processes completes a getAverageFile() call, a new file will be created and stored on your file system. Its name will be an md5 checksum that is generated from the names of the arrays in the set that you call getAverageFile() on. If you do it twice for the same set of arrays, the second time you will get the results stored on file, because they have already been calculated.

So far so good. The race condition occurs when you have two processes A and B that operate on the same data set 'cesList'. Process A runs the script, requests the reference, which is missing, and starts running getAverageFile(cesList[[1]]). While this is being done, Process B starts doing the same thing, and since the *result file* of getAverageFile(cesList[[1]]) is not yet available, it starts the same calculation. Now Process A finishes and writes its result file. Later Process B writes its results to the same result file, because they process the same data set; more precisely, getNames(cesList[[1]]) are the same. If Process B starts writing at the same time as Process A writes, there is a potential problem.

From my troubleshooting, as far as I understand it, the only way you could have gotten that error message was when two or more processes did getAverageFile(cesList[[1]]) where getNames(cesList[[1]]) were identical. Are you 100% sure that is not the case? Are you saying that is not the case? If not, I am really puzzled how there could be a clash in the first place.

Thus, the key point is to make sure that multiple processes are not trying to calculate getAverageFile() on the same array set at the same time.

/Henrik

I hope that this explanation could explain better what the different steps are. Best regards Christian

On Jul 23, 4:35 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote:

Hi.
On Jul 22, 10:24 am, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote:

Dear Henrik, Thank you very much for changing the code for getAverageFile(), I will try it and let you know. Thank you also for the explanation of writing to a temporary file, now I understand your intention. Regarding race conditions: No, I do not assume that aroma.* takes care of potential race conditions. Here is what I do: Assume that I have downloaded from GEO a prostate cancer dataset consisting of 40 CEL files. Then I create a directory Prostate and subdirectories Prostate/annotationData and Prostate/rawData following your required file structure. However, starting with the 2nd CEL file I create subdirectories Prostate/Prostate2,...,Prostate/Prostate40, each containing a symbolic link to ../annotationData and ../rawData from Prostate.

Do I understand you correctly that you use a separate project directory for each CEL file, so that when you process the data you get separate subdirectories probeData/ and plmData/ in each of these project directories?

Thus when running GLAD each cluster node has its own directory to write to, e.g. Prostate/Prostate21/reports for creating the images.

This is where I get lost. In order to do CN segmentation (here GLAD), you need to calculate CN ratios relative to a reference. Looking at your error message, that reference is calculated from the pool of samples, i.e. getAverageFile() is done on the pool of references. Thus, for this to make sense you need a *pool of samples*, but if I understood you correctly above, you don't have that, but only one array per project directory. I guess I misunderstood you, because your error indicates something else. The only way the error you got could occur was if multiple R sessions tried to run getAverageFile(ces) on data sets that contain arrays with the same names and in the same order (more precisely, the same getNames(ces)).
If they contained different array names, there would be no clash, because that saveObject() statement (that I just removed) would write to different filenames. This makes me suspect that you indeed use the same pool of reference samples.

Only after all nodes have finished their computations do I move the relevant files to the main directory, e.g. all images are moved to Prostate/reports. Afterwards I delete the subdirectories Prostate2,...,Prostate40 and their contents. As you can see, using this setup there should not be any race conditions. The only remaining problem is the temporary files which you store in .Rcache in my home directory.

So, there is something I don't understand above. Can you post your full script, because that would certainly remove some of the ambiguities. Also, it helps if you change your script to be explicit about the getAverageFile() calculation, i.e.

    print(cesN);
    ceR <- getAverageFile(cesN);
    print(ceR);
    seg <- GladModel(cesN, ceR);
    print(seg);

instead of letting GladModel() do it implicitly:

    seg <- GladModel(cesN);
    print(seg);

As explained above, if your parallelized R sessions calculate ceR
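The "write to a temporary file, then rename" idea mentioned in this thread can be sketched in base R as follows. This is a generic illustration and not the code used inside aroma.*; the function name `saveAtomically()` is hypothetical. It assumes that a rename within one directory replaces the target in a single step, which holds on typical local POSIX file systems but is not guaranteed on every shared or network file system. It also only protects readers from seeing half-written files; it does not prevent two processes from doing the same calculation twice.

```r
# Hypothetical helper: save an object so that readers never see a partial file.
saveAtomically <- function(object, pathname) {
  # Write to a uniquely named temporary file in the same directory as the target,
  # so that the later rename stays within one file system.
  tmp <- tempfile(pattern = paste0(basename(pathname), "."),
                  tmpdir = dirname(pathname), fileext = ".tmp")
  saveRDS(object, file = tmp)
  # Move the finished file into place in one step.
  ok <- file.rename(tmp, pathname)
  if (!ok) {
    unlink(tmp)
    stop("Failed to rename ", tmp, " to ", pathname)
  }
  invisible(pathname)
}

# Usage: two "processes" writing the same result file one after the other.
target <- file.path(tempdir(), "average-result.rds")
saveAtomically(c(a = 1, b = 2), target)
saveAtomically(c(a = 1, b = 2), target)
result <- readRDS(target)
print(result)
```

Because each writer uses its own temporary file name, two writers never share a partially written file; whichever rename happens last simply wins, which is harmless when both computed the same result.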
Re: [aroma.affymetrix] medpolish in analyzing HuGene arrays
Hi Steven.

On Mon, Jul 26, 2010 at 9:00 PM, Steven Bosinger steven.bosin...@gmail.com wrote:

Hi, I'm new to aroma and BioC in general, so these are probably very straightforward questions: I am using aroma to get QC on some Human Affymetrix Gene arrays.

1. To keep it consistent with previous analyses using RMA pre-processing (BKG subtraction, quantile normalization and median polish summarization), can I use the medpolish function instead of RmaPlm?

Not sure what implementation you used before, but the RMA summarization step in the oligo package uses median polish. RmaPlm does not do the low-level calculation itself, but relies on existing code/packages for this. By default it is using affyPLM/preprocessCore. You can tell RmaPlm to use that of oligo as:

    plm <- RmaPlm(..., flavor="oligo");

There are some more comments in help(RmaPlm).

Note that even if two different implementations/software say they are using median polish, they may not be numerically reproducible. How median polish is started, how many iterations it runs etc. may give you different results. If I remember correctly, it is also known for not always converging, i.e. it can oscillate between two results.

Note that median polish and rlm (robust linear modelling) are both estimators for the same log-additive probe-level model, i.e. they try to estimate the same parameters but in different ways. Some people/software documentation are sloppy and say they use median polish, when in reality they might actually have used rlm. I would recommend using rlm, if possible. You can always run both variants and see how much the results differ.

2. I read in the forum that NUSE plots aren't available when you summarize using medpolish, is this the case?

Good catch. Could you provide a link to where you found that? In order to calculate NUSE (Normalized Unscaled Standard Errors), you need standard deviations of the parameter estimates.
The median polish estimator [see help(oligo::basicRMA)] does *not* give/return standard deviations/errors of the parameter estimates. Internally, we fix the stddev to 1 (one), so if you try to calculate NUSE, you'll get nothing useful or even an error.

3. Is there a vignette/pdf file similar to BioC that lists all the available functions for aroma?

The website is the main source of documentation.

4. How can I export the RMA pre-processed data matrix to another 3rd party software?

A good start is probably to use extractDataFrame() and write its content in a format you like. See the how-to page 'Extract probeset summaries (chip effects) as a data frame': http://aroma-project.org/howtos/extractDataFrame

5. Is there a function for MvA plots?

It's not clear to me *what* you want to plot, but basically:

    plotMvsA(cf, reference=cfR);

where 'cf' and 'cfR' are two AffymetrixCelFile:s, e.g.

    cf <- getFile(cs, 1);
    cfR <- getAverageFile(cs);

You can do the same by replacing the AffymetrixCelSet 'cs' with a ChipEffectSet 'ces'.

6. How do I format plots? ie alter range, color etc

The same way as you usually do - what have you tried and what didn't work?

Sorry for these newbie questions...

No worries. Though, next time, please try to post one question/topic per message, and try to be more precise in what you are asking/have tried. Then it is easier to help and quicker to reply to.

Cheers, Henrik

Steve.
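As a small illustration of the log-additive probe-level model discussed in this thread, the base-R function `medpolish()` (package stats) can be applied to a toy probes-by-arrays matrix. The numbers below are made up and follow the model exactly, so median polish recovers the chip effects without residuals. This is only a sketch of the estimator itself; it is not how RmaPlm, affyPLM, preprocessCore or oligo implement the summarization, and their results may differ numerically as noted above.

```r
# Hypothetical log2 probe signals for one probeset: 5 probes (rows) x 5 arrays (columns),
# constructed so that y[k, i] = probe[k] + chip[i] holds exactly.
probe <- c(-1, -0.5, 0, 0.5, 1)   # probe affinities (median zero)
chip  <- c(8, 9, 10, 11, 12)      # chip effects
y <- outer(probe, chip, "+")

# Fit by median polish: alternately sweep out row medians and column medians.
fit <- medpolish(y, trace.iter = FALSE)

# The chip-effect estimates are the overall term plus the column effects.
chipEffects <- fit$overall + fit$col
print(chipEffects)

# The fit contains estimates and residuals, but no standard errors.
print(names(fit))
```

The last line shows why NUSE cannot be derived from a plain median polish fit: the returned components are the overall, row and column effects plus residuals, with nothing that quantifies the uncertainty of the estimates.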
Re: [aroma.affymetrix] Batch Adjusted RMA data
Hi Fong(?).

On Sat, Jul 24, 2010 at 2:10 AM, Fong fongchunc...@gmail.com wrote:

Hi, I've used aroma.affymetrix to generate and extract the probeset summaries (chip effects) from a set of Human Exon array samples I have, and then performed batch adjustment on these probeset summaries using another R script (ComBat.R). Now I have the adjusted probeset values and I was wondering whether it is possible to feed these into FIRMA again? I can't figure out how to load external RMA data into the aroma.affymetrix package.

Unfortunately, there is no such option available in aroma.affymetrix. It can be done, but you really have to dig into the low-level parts, which requires lots of knowledge that only a few developers have.

What adds to the complications is that the FIRMA model relies on the residuals of the probe-level modelling (PLM), see the bottom equation in column 2 on page 2 of Purdom et al. (2008):

    r_ijk = y_ijk - c_i - p_k

where {y} are the probe signals, {c} are the estimated chip effects and {p} the estimated probe affinities. At the risk of making a fool of myself, I think ComBat is correcting only the chip effects {c}. This means that when calculating the above residuals you would use ComBat-normalized chip effects but the default RMA probe affinities. I don't think the FIRMA algorithm was designed for this. Have you thought about this?

/Henrik

Any help would be greatly appreciated.
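The concern about mixing adjusted chip effects with unadjusted probe affinities can be made concrete with a toy calculation in base R. The sketch assumes that residuals are the probe signals minus the fitted chip effects and probe affinities, uses made-up numbers for a single probeset, and mimics an external batch adjustment simply by shifting the chip effects of two arrays. It does not run ComBat or FIRMA; all variable names are hypothetical.

```r
# Hypothetical single probeset: 4 arrays (rows) x 3 probes (columns), log2 scale.
chipEffect    <- c(8, 9, 10, 11)   # c_i, one per array
probeAffinity <- c(-1, 0, 1)       # p_k, one per probe
y <- outer(chipEffect, probeAffinity, "+")
y[2, 3] <- y[2, 3] + 0.8           # one made-up probe-level deviation on array 2

# Residuals from the original fit: r = y - c - p.
residuals0 <- y - outer(chipEffect, probeAffinity, "+")

# Mimic an external adjustment that shifts the chip effects of arrays 3 and 4,
# while the probe signals and probe affinities stay as they were.
chipEffectAdj <- chipEffect + c(0, 0, -0.5, -0.5)
residuals1 <- y - outer(chipEffectAdj, probeAffinity, "+")

# Every residual on the adjusted arrays moves by the size of the shift.
print(residuals1 - residuals0)
```

In this toy setting the adjustment pushes all residuals of arrays 3 and 4 away from zero by the same amount, so a residual-based score would then reflect the batch shift rather than probe-level deviations such as the one on array 2.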
Re: [aroma.affymetrix] Re: Mo-Ex 1.0st array Analysis using aroma.affymetrix and FIRMA model
Hi Sundar,

I've been leaving your messages to the FIRMA experts, because they can better answer your questions. However, I'll give a quick reply to the things I can answer.

On Mon, Aug 2, 2010 at 6:59 PM, Sundar sundar...@gmail.com wrote:

Hi, I am trying to analyze Mouse Exon 1.0 st array data using aroma.affymetrix and the FIRMA model to find the splicing variants.

CDF file: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf (downloaded from the aroma.affymetrix website)
CEL files: 12 Mouse Exon 1.0 st arrays (6 arrays for each strain A and B, and within A and B, I have 3 arrays each of two experimental conditions)

1) I am not sure how the RMA normalization in aroma.affymetrix performs, is the normalization performed within array or between the arrays?

The aroma.affymetrix package implements a standard RMA normalization, that is, it reproduces it very well. I recommend that you read up on the RMA model:

Bolstad, B.M., Irizarry R. A., Astrand, M., and Speed, T.P. (2003), A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2):185-193

Rafael. A. Irizarry, Benjamin M. Bolstad, Francois Collin, Leslie M. Cope, Bridget Hobbs and Terence P. Speed (2003), Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4):e15

Irizarry, RA, Hobbs, B, Collin, F, Beazer-Barclay, YD, Antonellis, KJ, Scherf, U, Speed, TP (2002) Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Accepted for publication in Biostatistics.

Then you'll see that the RMA pipeline is a multi-array method. For more questions on this, I recommend that you use the larger Bioconductor mailing list, because this one is used for aroma.* specific questions.

How do I pass arguments to the function to control the process of normalization?

What do you want to control?
2) What are the .js and .css files generated at the end of the analysis described in the vignette (Human Exon array)? Is there any third-party software that could be used to analyze this output? Those are JavaScript and CSS files used by the ArrayExplorer HTML reports. They do not contain any kind of data. They cannot be used by other software. 3) How do I convert the probe set ID or transcript ID into a gene name? Can I calculate the fold change in aroma.affymetrix? I leave this one to the FIRMA experts. Make sure to search/go through the aroma.affymetrix mailing archives - I think the question has been asked and answered before. More below. Thank you, Sundar On Jul 27, 11:52 am, Sundar sundar...@gmail.com wrote: Hi, I am new to the Exon array concept. I am trying to analyze Mouse Exon 1.0 ST array data using aroma.affymetrix and the FIRMA model. The few questions I have are below. 1) To get started I have just run the code described in the vignette FIRMA: Human exon array analysis. Certain .CEL files and files with other extensions are generated. It is not clear which particular CEL files you are referring to. The ones under probeData/ contain normalized/calibrated probe signals in CEL files of the same format/layout as the ones in rawData/. CEL files contain probe intensities, probe standard deviations, and the number of pixels per probe. The ones in plmData/ are special aroma-specific CEL files that should be treated as internal files only; in particular, they cannot be read by other software. These files contain chip-effect estimates and the standard deviations of those. There is also one CEL file containing probe-affinity estimates. This is if you use the RMA-style probe-level modelling. How can I read them into R? What do you want to do with the data that aroma doesn't do? How do I know what information they carry? Is there any other software I can load these files into in order to read them? Answered above.
How do I control the analysis when applying the extractMatrix(), extractDataFrame() or readUnits() functions? Again, it is not clear what you want to do, but maybe the How-to page 'Extract probeset summaries (chip effects) as a data frame' [http://aroma-project.org/howtos/extractDataFrame] illustrates your options. I recommend that you use that instead of extractDataFrame(). Don't use readUnits(), unless you're an aroma.affymetrix developer. 2) I am unable to get a clear understanding of what background calculations take place in the functions used to generate the files in Q.1, except that these functions perform QC, normalization, summarization etc. I am lacking a clear understanding of how those methods are implemented on exon arrays compared to 3' arrays. For instance, in 3'-IVT arrays I have control over normalizing within the group or across the group, whereas in this exon array analysis I'm not sure what the function does. I believe a detailed understanding of the FIRMA paper will help here and help you be
Re: [aroma.affymetrix] non finite values in FIRMA results
Hi, I'll leave the details to the FIRMA experts, but you are using really large values of argument 'ram'. It might be that you ran out of memory (then you would have got an error message). If you used cut'n'paste, instead of source(), to do the analysis, it might be that one of the fit() methods terminated prematurely. If so, not all units have been fitted. Rerun with ram=1 to see if you get a different result; the units already fitted will be skipped. I'm also not sure if 431210/(1190297*107) [1] 0.003385710 is an exceptionally large fraction. Note also that 431210/107 is exactly 4030; it could be that it is the exact same 4030 units that are NA in all samples. That could be explained by some units not being fitted or being Affymetrix control units. BTW, not all methods take argument 'ram'. Instead, use the global aroma settings for achieving the same, e.g. setOption(aromaSettings, "memory/ram", 50). More info at http://aroma-project.org/settings. This way your scripts are clean and can be run as-is on other machines with less memory. Maybe this helps(?) /Henrik On Mon, Aug 2, 2010 at 7:24 PM, Adi Tarca ata...@med.wayne.edu wrote: Hi all, I have a batch of 107 mouse exon arrays for which I computed FIRMA scores and I got many NaN, Inf and 0 values, which prevent further analysis based on log FIRMA values for some probesets. I was wondering if this is a known issue or if I am the only one to get these results.
Here is the code I use to get the FIRMA scores:
library(aroma.affymetrix)
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
chipType <- "MoEx-1_0-st-v1"
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="fullR1,A20080718,MR")
cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
bc <- RmaBackgroundCorrection(cs, tag="fullR1,A20080718,MR")
csBC <- process(bc, verbose=verbose, ram=500)
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose, ram=500)
plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
fit(plmTr, verbose=verbose, ram=500)
firma <- FirmaModel(plmTr)
fit(firma, verbose=verbose, ram=500)
fs <- getFirmaScores(firma)
myres2=extractDataFrame(fs,addNames=FALSE)
myres=as.matrix(myres2[,-(1:3)])
Here are the counts of NaN, Inf and 0 values:
dim(myres)
[1] 1190297 107
sum(is.nan(myres))
[1] 431210
sum(is.infinite(myres),na.rm=TRUE)
[1] 855
sum(myres==0,na.rm=TRUE)
[1] 214
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.20.0
[4] R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.16.0
[7] matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0
[10] R.filesets_0.8.1 digest_0.4.2 R.utils_1.4.0
[13] R.oo_1.7.2 R.methodsS3_1.2.0
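The arithmetic in the reply can be checked in base R. What follows is only a sketch on a small simulated matrix; the names `scores`, `nUnits`, `nArrays` and `missingUnits` are made up for illustration and are not part of aroma.affymetrix. It shows one way to test whether the NaN values sit on the exact same units in every array, which is what a NaN count that is an exact multiple of the number of arrays hints at.

```r
# Numbers reported in the thread
unitsPerArray <- 431210 / 107              # NaN count divided by number of arrays
overallFraction <- 431210 / (1190297 * 107)

# Toy stand-in for the 1190297 x 107 matrix of FIRMA scores (assumed data)
set.seed(1)
nUnits <- 1000
nArrays <- 7
scores <- matrix(rnorm(nUnits * nArrays), nrow = nUnits, ncol = nArrays)
missingUnits <- sample(nUnits, size = 40)  # units that were never fitted
scores[missingUnits, ] <- NaN

# Count NaN values overall, per array and per unit
nanCount <- sum(is.nan(scores))
nanPerArray <- colSums(is.nan(scores))
nanPerUnit <- rowSums(is.nan(scores))

# TRUE if every unit is either NaN in all arrays or in none of them
sameUnits <- all(nanPerUnit %in% c(0, nArrays))
allNaN <- (nanPerUnit == nArrays)
```

If `sameUnits` is TRUE on the real data, the rows flagged by `allNaN` are the units to inspect, for instance to see whether they are control units.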
[aroma.affymetrix] aroma.affymetrix v1.7.0 released
Hi all, new versions of aroma.affymetrix and friends have been released. It is highly recommended to update: source("http://aroma-project.org/hbLite.R"); hbInstall("aroma.affymetrix"); In addition to some added features, there were also a few bugs fixed in this release. Thanks for the reports! All details on what's new can be found below. Cheers, Henrik & co-developers
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to aroma.affymetrix
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 1.7.0 [2010-07-26]
o Committed to CRAN. No updates.
Version: 1.6.8 [2010-07-21]
o CLEAN UP: Now getAverageFile() for AffymetrixCelSet no longer writes debug information to ${Rcache}/aroma.affymetrix/idChecks/.
Version: 1.6.7 [2010-07-19]
o Now byPath(..., cdf) for ChipEffectSet will silently try to retrieve the monocell CDF if argument 'cdf' is the main CDF. If it fails, an error is thrown. This makes it possible to specify the main/regular CDF (or chip type), instead of the monocell CDF, when retrieving a chip-effect data set.
Version: 1.6.6 [2010-07-02]
o Now AffymetrixCelSet$byName(..., chipType="GenomeWideSNP_6,Full") will work (before, chip types with tags would give an error). This is now done by first locating the CDF for the chip type (with tags).
o Added doASCRMAv1() and doASCRMAv2() as convenient allele-specific doCRMAv1() and doCRMAv2() wrappers.
o CLEAN UP: Dropped argument 'transforms' from getImage() for AffymetrixCdfFile.
Version: 1.6.5 [2010-06-16]
o Added doRMA() for AffymetrixCelSet and data-set names. doRMA() runs in bounded memory and replicates the results of fitPLM() in the affyPLM package with great precision.
Version: 1.6.4 [2010-06-07]
o BUG FIX: Added argument shift=+300 to doCRMAv1().
Version: 1.6.3 [2010-05-30]
o Now translateFullName() of AffymetrixProbeTabFile translates 'PROBE_STRAND' to 'targetStrandedness'.
Version: 1.6.2 [2010-05-26]
o Started to add scripts for downloading example data.
Version: 1.6.1 [2010-05-19]
o CORRECTION: doCRMAv1() did not shift the signals by +300 before doing the probe-level summarization.
o BUG FIX: Fixed a bug in PdInfo2Cdf(). Thanks Kasper Daniel Hansen for reporting this.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to aroma.core
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 1.7.0 [2010-07-26]
o Committed to CRAN. No updates.
Version: 1.6.8 [2010-07-24]
o Added several methods for CopyNumberRegions, e.g. xRange(), prune(), simulateRawCopyNumbers(), +(), -() and *().
Version: 1.6.7 [2010-07-20]
o Added writeDataFrame() for AromaUnitTotalCnBinarySet and AromaUnitFracBCnBinarySet to get the correct filename extension. Thanks Nicolas Vergne at the Curie Institute for reporting this.
Version: 1.6.6 [2010-07-19]
o Added subset() for CopyNumberRegions.
o Now extractRegion() for RawGenomicSignals also accepts a CopyNumberRegions object for argument 'regions'.
o Added extractRegions() for RawGenomicSignals.
Version: 1.6.5 [2010-07-08]
o BUG FIX: writeDataFrame() for AromaUnitSignalBinarySet would write the same data chunk over and over.
Version: 1.6.4 [2010-07-06]
o BUG FIX: indexOf() for ChromosomalModel would return NA if a search pattern contained parentheses '(' and ')'. There was a similar issue in indexOf() for GenericDataFileSet/List in R.filesets, which was solved in R.filesets 0.8.3. Now indexOf() for ChromosomalModel utilizes ditto for GenericDataFileSet for its solution.
Version: 1.6.3 [2010-06-22]
o BUG FIX: as.GrayscaleImage(..., transforms=NULL) for 'matrix' would throw Exception: Argument 'transforms' contains a non-function: NULL.
Version: 1.6.2 [2010-06-02]
o BUG FIX: updateDataColumn() of AromaTabularBinaryFile would censor *signed integers* incorrectly; it should censor at/to [-(n+1),n], but did it at [-n,(n+1)] (two's complement). This caused it to write too-large values as n+1, which then would be read as -(n+1), e.g. writing 130 would be censored to 128 (should be 127), which then would be read as -128. Added more detailed information on how many values were censored. Thanks Robert Ivanek for reporting this.
Version: 1.6.1 [2010-05-27]
o Added trial version of fullname translator files.
o doCBS() for character:s support data set tuples.
o Added doCBS() for CopyNumberDataSetTuple:s.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to R.filesets
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 0.8.3 [2010-07-06]
o BUG FIX: indexOf() for GenericDataFileSet/List would return NA if the search pattern/string contained parentheses. The reason is that such characters have a special meaning in regular expressions. Now indexOf() first searches by regular expression patterns, then by fixed strings. Thanks Johan Staaf at Lund University and Larry(?) for reporting this issue.
Version: 0.8.2 [2010-05-26]
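The indexOf() fix in R.filesets 0.8.3 above is easy to reproduce with base R's grep(). The helper `findName()` below is hypothetical and is not the actual indexOf() implementation; it only sketches the "regular expression first, then fixed string" strategy described in the changelog, using made-up array names.

```r
# Hypothetical helper: try the pattern as a regular expression first,
# and fall back to a literal (fixed-string) search if nothing matches.
findName <- function(pattern, names) {
  idx <- grep(pattern, names)
  if (length(idx) == 0L) {
    idx <- grep(pattern, names, fixed = TRUE)
  }
  if (length(idx) == 0L) NA_integer_ else idx[1L]
}

arrayNames <- c("sampleA", "sample(B)", "sampleC")

# As a regular expression, "(B)" is a group, so this searches for "sampleB"
regexOnly <- grep("sample(B)", arrayNames)

hitLiteral <- findName("sample(B)", arrayNames)  # found via the fixed-string fallback
hitRegex <- findName("^sampleC$", arrayNames)    # found via the regular expression
hitNone <- findName("missing", arrayNames)       # NA: no match either way
```

Searching by regular expression first keeps existing pattern-based calls working, while the fallback covers names that happen to contain regex metacharacters.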
Re: [aroma.affymetrix] peculiar array quality
Hi, it's hard to say what is causing this, but if you see it in several samples at the same location, then my immediate thought is that your reference signal may carry it. Are you using the average of the pool of all samples as a reference, or how do you calculate it? How many samples do you have in your reference pool? CN polymorphic regions that are frequent enough in your population could cause this, but then it should be a real biological signal, which you say it isn't. Are you using the full or the default GenomeWideSNP_6 CDF? Affymetrix removed several CN loci from the former to make the latter - CN loci that they found to be poor for CN analysis. This could also be a reason, though those loci should be scattered fairly randomly along the genome. You could also check if there is a difference between the signals from SNPs and CN loci. If there is, that would indicate that there are some artifacts on the arrays. Also, are you really sure you are using the correct annotation data? For instance, if you use the full CDF to generate the data, but only the default for extracting genome locations (assuming the same ordering of row indices), such weird things may show up. If you plot your data using the ChromosomeExplorer, this should be taken care of automatically. Also, do some QC plots using ArrayExplorer; there might be spatial artifacts, although it sounds unlikely. Sorry, not much help, but at least some directions for troubleshooting. /Henrik On Wed, Jul 21, 2010 at 5:34 PM, Matt Wilkerson mdwilk...@gmail.com wrote: Hello, I have detected what I think is an array quality issue and wanted to get others' opinions about this phenomenon. I observed this issue on chromosome views of CN from SNP6 arrays. It looks like a smearing effect where CN has irregular values and a range of large negative numbers to zero within specific regions. The regions at which this happens are identical in affected samples and occur on basically all chromosomes.
This smearing is not cancer DNA segment loss, where probes belonging to a segment have similar CN values. In a group of about 70 arrays, 1/3 of the arrays have this issue and the others have the expected segments of discrete amplifications/deletions. I have compared specimen, technical, and array characteristics to try to find a batch or quality issue, but so far the effect appears to occur randomly. I put an example at: http://www.unc.edu/~mwilkers/artifact.png In the plot, the points are probes. Axes are base position and log2 median-centered copy number. The lines are segments overlaid. The colors are not important. I don't think this is an aroma issue - I detect the phenomenon using apt-copynumber-workflow also. The only Affymetrix summary option that was associated with the artifact samples was the allele summarization mean. The artifact arrays had lower values. Also, I have often used aroma successfully with 250K_Sty arrays and never seen this phenomenon. My question: Has anyone seen this phenomenon before? Does anyone have an explanation or suggestion? Thank you, Matt Wilkerson Lineberger Comprehensive Cancer Center University of North Carolina at Chapel Hill
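The first suggestion in the reply, that the reference signal itself may carry the pattern, can be illustrated with simulated data. This is a base-R sketch under made-up assumptions (the matrix `theta`, the carrier samples and the affected `region` are all invented); it is not how aroma.affymetrix computes its reference, only a toy version of "log2 ratio relative to the robust average of the pool".

```r
set.seed(7)
nLoci <- 200
nSamples <- 21
region <- 81:120   # assumed region that is CN polymorphic in the pool
carriers <- 1:15   # assumed: most of the pool carries a gain here

# True copy numbers: 2 everywhere, 3 in 'region' for the carriers
cn <- matrix(2, nrow = nLoci, ncol = nSamples)
cn[region, carriers] <- 3

# Noisy locus-level signals proportional to copy number
theta <- cn * exp(rnorm(nLoci * nSamples, sd = 0.05))

# Reference = robust average (median) across the pool, one value per locus
thetaR <- apply(theta, 1, median)

# Log2 ratios relative to the pooled reference
M <- log2(theta / thetaR)

# Sample 21 is CN = 2 everywhere, yet it shows an apparent loss in 'region'
meanInside <- mean(M[region, 21])
meanOutside <- mean(M[-region, 21])
```

Here `meanInside` is clearly negative while `meanOutside` stays near zero, even though sample 21 has no true loss: the shift comes entirely from the reference.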
Re: [aroma.affymetrix] Re: Problem with GLAD on linux cluster
Hi. On Thu, Jul 22, 2010 at 10:24 AM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote: Dear Henrik, Thank you very much for changing the code for getAverageFile(), I will try it and let you know. Thank you also for the explanation of writing to a temporary file, now I understand your intention. Regarding race conditions: No, I do not assume that aroma.* takes care of potential race conditions. Here is what I do: Assume that I have downloaded from GEO a prostate cancer dataset consisting of 40 CEL-files. Then I create a directory Prostate and subdirectories Prostate/annotationData and Prostate/rawData following your required file structure. However, starting with the 2nd CEL-file I create subdirectories Prostate/Prostate2,...,Prostate/Prostate40, each containing a symbolic link to ../annotationData and ../rawData from Prostate. Do I understand you correctly that you use a separate project directory for each CEL file, so that when you process the data you get separate subdirectories probeData/ and plmData/ in each of these project directories? Thus when running GLAD each cluster node has its own directory to write to, e.g. Prostate/Prostate21/reports for creating the images. This is where I get lost. In order to do CN segmentation (here GLAD), you need to calculate CN ratios relative to a reference. Looking at your error message, that reference is calculated from the pool of samples, i.e. getAverageFile() is done on the pool of references. Thus, for this to make sense you need a *pool of samples*, but if I understood you correctly above, you don't have that, but only one array per project directory. I guess I misunderstood you, because your error indicates something else. The only way the error you got could occur is if multiple R sessions tried to run getAverageFile(ces) on data sets that contain arrays with the same names and in the same order (more precisely getNames(ces)).
If they contained different array names, there would be no clash, because that saveObject() statement (that I just removed) would write to different filenames. This makes me suspect that you indeed use the same pool of reference samples. Only after all nodes have finished their computations, then I move the relevant files to the main directory, e.g. all images are moved to Prostate/reports. Afterwards I delete the subdirectories Prostate2,...,Prostate40 and their contents. As you can see, using this setup there should not be any race conditions. The only remaining problem is the temporary files which you store in .Rcache in my home directory. So, there is something I don't understand above. Can you post your full script, because that would certainly remove some of the ambiguities. Also, it helps if you change your script to be explicit about the getAverageFile() calculation, i.e. print(cesN); ceR <- getAverageFile(cesN); print(ceR); seg <- GladModel(cesN, ceR); print(seg); instead of letting GladModel() do it implicitly: seg <- GladModel(cesN); print(seg); As explained above, if your parallelized R sessions calculate ceR <- getAverageFile(cesN) on the same 'cesN' data set they will try to generate the same 'ceR' result file, and you have a race condition. I know that you store the monocell files in .Rcache/aroma.affymetrix, so that the monocell files have to be created only once. Actually, the monocell *CDF* is stored in the corresponding annotationData/chipTypes/chipType/ directory. What is stored in .Rcache/ is mainly for performance purposes, i.e. we use it for memoization [http://en.wikipedia.org/wiki/Memoization]. Moreover, we mostly use it for memoization of annotation data, because that type of information is likely to be requested multiple times for the same chip types regardless of data set. In order for memoization to work well across R sessions and hosts, the .Rcache/ directory needs to be accessible globally.
We rarely use memoization for experimental data, because that is typically only requested once (in the data set's lifetime). However, for the temporary files please allow me to suggest that you create a temporary directory in your file structure, e.g. Prostate/tmp, where these files are stored. In my case this would definitely solve my problem since each subdirectory would contain its own temporary directory, e.g. Prostate/Prostate21/tmp. I do not know if this change would break any code or cause any problems, it is only a naive suggestion. What is your opinion? Your suggestion makes sense for dataset-specific temporary files etc., but again, I don't think that is the case here. Instead I think we are misunderstanding each other. Your script will help. /Henrik Best regards Christian On Jul 21, 6:46 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi Christian. On Wed, Jul 21, 2010 at 2:59 PM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote: Dear Henrik, Thank you for this extensive explanation
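For readers unfamiliar with the memoization mentioned in the reply, here is a minimal file-based sketch in base R. It is not the R.cache implementation: the cache directory, the key scheme and the functions `slowLookup()` and `memoizedLookup()` are all assumptions made for illustration. The point is only that a result keyed by chip type is computed once and then read back from disk.

```r
# Stand-in for a shared cache directory such as ~/.Rcache/ (assumed location)
cacheRoot <- file.path(tempdir(), "Rcache-sketch")
unlink(cacheRoot, recursive = TRUE)
dir.create(cacheRoot, showWarnings = FALSE, recursive = TRUE)

# Hypothetical expensive annotation lookup; counts how often it really runs
nComputed <- 0L
slowLookup <- function(chipType) {
  nComputed <<- nComputed + 1L
  nchar(chipType)
}

# Memoized wrapper: results are stored on file, keyed by the chip type
memoizedLookup <- function(chipType) {
  key <- gsub("[^A-Za-z0-9_]", "_", chipType)  # simplistic key (assumption)
  pathname <- file.path(cacheRoot, sprintf("%s.rds", key))
  if (file.exists(pathname)) {
    return(readRDS(pathname))                  # cache hit: skip the computation
  }
  value <- slowLookup(chipType)
  saveRDS(value, file = pathname)
  value
}

first <- memoizedLookup("GenomeWideSNP_6")   # computed and cached
second <- memoizedLookup("GenomeWideSNP_6")  # read back from the cache
```

Because the cached value depends only on the chip type, any R session that can reach the same directory can reuse it, which is why the cache location needs to be shared.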
[aroma.affymetrix] Re: Reference dataset for ACNE
[sorry my repost did not contain my full reply due to a cut'n'paste error.] Hi Nicolas. On Tue, Jul 20, 2010 at 11:42 AM, Nicolas Vergne nicolas.vergne@gmail.com wrote: Hi everybody, I use ACNE for the normalization of SNP6.0 chip arrays. As ACNE is a multi-array method, I would like to know if there is an option to specify the reference data set in the doACNE function? You may be asking one of two things. Either you want to be able (a) to specify the subset of the arrays that you trust and on which you wish to base the ACNE model parameter estimates, or you wish (b) to estimate them from a separate reference (training) set. The ACNE package unfortunately does not support either of these yet. For (a), I can only say that you have to rely on the robust estimators of ACNE and the assumption that most arrays behave as normals at any given SNP (it can be a different set of samples for each SNP). For (b), the best you can do for now is to include your training data set when you fit ACNE. If it is large enough it will dominate the estimates. As long as you do ACNE manually (i.e. not doACNE()): http://aroma-project.org/vignettes/ACNE you can still do the CRMAv2 preprocessing part of ACNE separately for the training data set. It is only when you get to the NmfSnpPlm step that you have to merge your test and training data sets, e.g.
csNRef <- ... # Probe-normalized training data set
csN <- ... # Probe-normalized test data set
# Append the training (reference) set to the test data set
csN <- append(csN, csNRef);
# And fit the ACNE probe summarization for the lot
plm <- NmfSnpPlm(csN, mergeStrands=TRUE);
... and so on. DETAILS: In order to truly use external parameter estimates (priors), we need to be able to specify that in the NmfSnpPlm setup. Part of this mechanism is already in place (generically in the aroma.affymetrix framework), but not fully. What is mainly missing is that the internal low-level fitSnpNmf() of ACNE still doesn't recognize/utilize such prior estimates.
I cannot predict when this can be done by me. You may want to look at it yourself; I recommend getting it working with fitSnpNmf(). There is an example in help(fitSnpNmfArray) that could be adjusted for testing it. When that is in place, it shouldn't be that hard for me to update NmfSnpPlm and the wrapper doACNE() accordingly. That is for alternative (b), though alternative (a) also needs to be implemented in fitSnpNmf(). I would like to use the same sample for each new chip normalization. And I wouldn't like to use the dataset that I want to normalize. Is it a good way? My problem is to not reproduce the analysis for each new chip in the project. This sounds like the (b) alternative: It is rather well known that there are large lab and batch effects in Affymetrix data. Preprocessing removes some of this but certainly not everything. Others have observed this over and over. Because of this, estimating the ACNE model parameters on one data set from a different batch/lab and using them to normalize another data set will work less well than if the parameters were estimated from samples from the same batch. Hope this helps (a bit). /Henrik On Jul 20, 11:42 am, Nicolas Vergne nicolas.vergne@gmail.com wrote: Hi everybody, I use ACNE for the normalization of SNP6.0 chip arrays. As ACNE is a multi-array method, I would like to know if there is an option to specify the reference data set in the doACNE function? I would like to use the same sample for each new chip normalization. And I wouldn't like to use the dataset that I want to normalize. Is it a good way? My problem is to not reproduce the analysis for each new chip in the project. Thanks in advance for your answers, Nicolas
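The remark that a large enough training set "will dominate the estimates" when it is appended to the test set can be pictured with a toy example. This is not the ACNE model: the median below is only a stand-in for a robust multi-array estimator, and the data and variable names are simulated assumptions.

```r
set.seed(42)

# Simulated signals at one SNP: a large training set and a small test set
# whose level differs (e.g. because of a batch effect)
training <- rnorm(500, mean = 0, sd = 1)
test <- rnorm(10, mean = 3, sd = 1)

# Robust estimates from each set alone, and from the appended (pooled) set
estTraining <- median(training)
estTest <- median(test)
estPooled <- median(c(test, training))

# Distance of the pooled estimate to each single-set estimate
distToTraining <- abs(estPooled - estTraining)
distToTest <- abs(estPooled - estTest)
```

With 500 training values against 10 test values, `estPooled` stays very close to `estTraining`. The same reasoning explains the caveat in the reply: if the training set comes from another batch, the pooled estimate reflects that batch rather than the test data.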
Re: [aroma.affymetrix] Re: Problem with GLAD on linux cluster
Hi Christian. On Wed, Jul 21, 2010 at 2:59 PM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote: Dear Henrik, Thank you for this extensive explanation and sorry for the late reply but I was pretty busy. Yes, it did work before! As I mentioned, with versions aroma.affymetrix_1.1.0 and earlier I have never had a problem doing the analyses on cluster nodes. Looking at the source code of different versions of saveObject() I realize that using saveObject(..,safe=FALSE) would be the same as using saveObject() from R.utils_0.9.1. Thus in principle this could solve my problem. Is this correct? Sadly, method AffymetrixCelSet::getAverageFile() in aroma.affymetrix_1.6.2 does not allow passing parameter safe=FALSE to saveObject(). Is it possible for you to change it? I have decided to remove the debug code that calls saveObject(), because it is obsolete and not really needed anymore. The intention of that code snippet in getAverageFile() was never to protect against race conditions (it was just an unplanned side effect). Until the next release, you can get a patched version as: library(aroma.affymetrix); downloadPackagePatch("aroma.affymetrix"); Note, as I said in my previous reply, by processing (=here calling getAverageFile() on) the same data set on multiple hosts, you are potentially running into race conditions resulting in corrupt data. You should at least be aware of it and understand why this is the case. It is still not clear to me why you first create a temporary file which you then rename (although you mention power failures etc). However, would it be possible to add a random number to the temporary filename, e.g. *.tmp.1948234, so that the problem with the existing temporary file could be avoided? The main purpose of writing to a temporary file and then renaming is to make sure that the file is complete. If something happens while writing the temporary file, the final file will not exist/be created.
If one would write to the final file from the beginning, there is no way for us to know if the file was correctly created or not. So, writing via a temporary file, we effectively have a way of creating files in one atomic action. Probably you only need to change line 59 to: pathnameT - sprintf(%s.tmp.%i, pathname, as.integer(runif(1,1,))) In order not to corrupt the temporary file, we check if it already exist as a protection for being overwritten/added to by another process. Yes, you could randomize the name of the temporary file, lowering the risk of two hosts writing to the same temporary file. However, when done, both hosts will try to rename their temporary files to the same pathname. If done at the same time, we still may have problems. Regarding your suggestion to wrap getAverageFile() in Mutex calls I have no idea if there exists an R-package for this purpose. Neither Rmpi nor snow seem to be suitable for this purpose (at least not without a complete re-write of my package). Yes, I neither know of a functional mutex implementation in R. You can achieve some by utilizing the lock mechanisms of data base servers (not SqlLite), but nothing ready is available to my knowledge. Again, you seem to assume that aroma.* takes care of potential race conditions for you - it does not. It only tries to detect them without warranty - and indeed, the reason why got the error in the first place indicates that you are pushing the system and that race conditions may very well happen. If you run things in parallel and you are updating/writing the *same data resource*, you should really have protection against race conditions. This is a generic problem unrelated to aroma.*. /Henrik One other question: Is it allowed to delete the contents of directory .Rcache/ aroma.affymetrix/idChecks? Yes, it should be safe to delete any .Rcache/ as long as no R session is in the process of writing to it. It's a cache containing redundant information. 
Best regards Christian On Jul 2, 12:47 am, Henrik Bengtsson h...@stat.berkeley.edu wrote: Hi Christian. On Tue, Jun 29, 2010 at 3:39 PM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote: Dear Henrik, Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could run my analysis on our sge-cluster w/o any problems. Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and are curently testing with 8 chips whether my package based on aroma.affymetrix still works on the cluster. The normalization step on a server did run fine, howeever, distributing the 8 samples on the cluster to run GladModel() resulted in the problem that 3 of 8 cluster nodes did stop with the following error message: Loading required package: GLAD ... Loading required package: RColorBrewer Loading required package: Cairo Error in list(`computeCN(aroma, model = model, arrays = arrays[i], chromosomes = 1:23, ref
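
The "write to a temporary file, then rename" idea discussed in this thread can be sketched in base R as follows. This is only an illustration under stated assumptions: saveAtomically() is a hypothetical helper name, it uses saveRDS() rather than whatever saveObject() does internally, and it is not the R.utils implementation.

```r
# Hypothetical helper illustrating the temp-file-then-rename pattern.
# Not the R.utils::saveObject() implementation.
saveAtomically <- function(object, pathname) {
  pathnameT <- sprintf("%s.tmp", pathname)

  # If the temporary file exists, another process may be writing to it
  if (file.exists(pathnameT)) {
    stop("Cannot save to file. Temporary file already exists: ", pathnameT)
  }

  # Write everything to the temporary file first
  saveRDS(object, file = pathnameT)

  # The final file only appears once the write has completed
  if (!file.rename(pathnameT, pathname)) {
    stop("Failed to rename ", pathnameT, " to ", pathname)
  }
  invisible(pathname)
}

pathname <- file.path(tempdir(), "average.rds")
saveAtomically(list(id = 1L), pathname)
res <- readRDS(pathname)
```

Note that the file.exists() check followed by the write is itself a check-then-act sequence, so this sketch detects many collisions but does not rule out a race between two processes. That matches the point made above that detection is not the same as protection.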
Re: [aroma.affymetrix] writeDataFrame in CRMAv2 and ACNE
[reposting; the forum has hiccups and does not put my replies in the archives or deliver them to everyone.]

Hi Nicolas,

thanks for reporting this unwanted feature. I've fixed it so that the default filenames are *,total.txt and *,fracB.txt, respectively. Until the next release is available, you can install a patch as:

  library(aroma.affymetrix);
  downloadPackagePatch("aroma.core");

FYI, the writeDataFrame() methods take the argument 'filename' (and 'path'), allowing you to name the files whatever you wish.

/Henrik

On Tue, Jul 20, 2010 at 11:45 AM, Nicolas Vergne <nicolas.vergne@gmail.com> wrote:
> Hi everybody,
>
> Just a little remark. I cannot use writeDataFrame() for CN and for BAF in the same script, because the txt file already exists. So I have to create two directories (or delete the txt directory). Is there another solution?
>
>   tags <- "ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY";
>   chipType <- "GenomeWideSNP_6";
>   ds1 <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, chipType=chipType);
>   dfTxt1 <- writeDataFrame(ds1, columns=c("unitName", "chromosome", "position", "*"));
>
>   tags <- "ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY";
>   chipType <- "GenomeWideSNP_6";
>   ds2 <- AromaUnitFracBCnBinarySet$byName(dataSet, tags=tags, chipType=chipType);
>   dfTxt2 <- writeDataFrame(ds2, columns=c("unitName", "chromosome", "position", "*"));
>
> Thank you in advance for your answers,
> Nicolas
[aroma.affymetrix] IGNORE: Mail test #1
Hi, please ignore this message. I am trying to figure out why messages that I (the group owner) sent yesterday have not been delivered to the group and the mailing list archive.

/Henrik
Re: Exporting summarized signals to be used by Affymetrix GTC? (Was: Re: [aroma.affymetrix] Re: CRMA v2 errors)
Hi Markus,

sorry, but this one slipped through my net.

On Thu, Jun 17, 2010 at 5:27 PM, Smaug72 <leber.mar...@gmx.de> wrote:
> Dear Henrik,
>
> unfortunately we are faced with another problem. We processed several CEL files with CRMAv2 as described in the vignette:
> http://aroma-project.org/vignettes/CRMAv2
> We received no errors during the run. The problem is that we cannot process the CEL files that are generated.
>
> The CEL files generated in step 2 (Normalization for nucleotide-position probe sequence effects) have a size of about 65,9 MB. These files can be loaded by GTC Software version 4.0, but they cannot be processed by further examination (e.g. QC-matrix calculation).

Yes, all *probe-level* preprocessing methods generate CEL files of the same format/layout, which you can treat as if they were raw CEL files. You should be able to use these CEL files in, for instance, the Affymetrix GTC software, dChip and so on.

> The CEL files generated in step 3 (Probe summarization) have a size of about 26,9 MB (a decrease in size of about 39 MB). These CEL files cannot be loaded by GTC Software version 4.0.

No/correct, because those CEL files are so-called chip-effect CEL files (*,chipEffects.CEL), which are custom-made for (and only recognized by) the aroma.affymetrix software. They cannot be read by other software (and you should not try to do so either).

It sounds like you wish to export the summarized CN signals from aroma.affymetrix into the Affymetrix GTC software. I don't know what kind of data GTC can import, but you can export/write CN signals to tab-delimited text files by:

  # CRMA v2 vignette
  cesN <- ... # from the PCR fragment-length normalization

  # Generate platform-independent data sets
  dsNList <- exportTotalAndFracB(cesN, drop=FALSE);

Then you can do:

  writeDataFrame(dsNList$total, ...);

and (only if you used combineAlleles=FALSE):

  writeDataFrame(dsNList$fracB, ...);

For more details on writeDataFrame() and what is written, see http://aroma-project.org/howtos/writeDataFrame

The data is (total,fracB) = (total signal, allele B fractions). If you want (thetaA,thetaB) or similar, you need to compute that yourself afterward. Again, I don't know what kind of data Affymetrix GTC can import, if any.

BTW, if you use doCRMAv2() it will do this and return 'dsNList' for you, cf. http://aroma-project.org/blocks/doCRMAv2. That may be more convenient for you.

Hope this helps

Henrik

> Do you have experience with this error, or any explanations?
>
> Thank you and best regards,
> Markus
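
For readers who need (thetaA,thetaB) rather than (total,fracB), the conversion mentioned above might look like the following. This is a sketch that assumes total = thetaA + thetaB and fracB = thetaB / total; those definitions are not spelled out in this thread, so verify them against the writeDataFrame how-to before relying on this. The numbers are made up for illustration.

```r
# Made-up example values, as they might be read from the exported text files
total <- c(2000, 1500, 1800)   # total signal per unit
fracB <- c(0.0, 0.5, 1.0)      # allele B fraction per unit

# Assumption: total = thetaA + thetaB and fracB = thetaB / total
thetaB <- fracB * total
thetaA <- total - thetaB

print(data.frame(total, fracB, thetaA, thetaB))
```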
Re: [aroma.affymetrix] problem with CDF file
Hi,

before continuing, do you have the latest version of aroma.affymetrix (v1.6.0) installed, e.g. what does

  library(aroma.affymetrix);
  print(sessionInfo());

report? You probably also want to update to R v2.11.1 (R v2.9.0 is rather old).

Second, the annotationData/ directory should be located in your working directory, i.e.

  print(getwd());

I doubt that 'C:/Program Files/R/R-2.9.0/library/aroma.affymetrix/' is your working directory.

See the thread 'Could not locate a file for this chip type (Was: ...)' from August 27, 2009, http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/c18f714638a6eb24/9b34427b16128ef3 for more troubleshooting tricks.

/Henrik

On Tue, Jul 13, 2010 at 12:09 AM, Zsuzsa <zsu...@gmail.com> wrote:
> Hello Henrik,
>
> I am trying to use the aroma.affymetrix package to check the quality of some mouse GeneST arrays. I got stuck with the CDF file. I downloaded the unsupported CDF file from Affymetrix, placed it in the annotationData/chipTypes/MoGene-1_0-st-v1 folder and then ran the following commands:
>
>   library(affxparser)
>   convertCdf(filename = "C:/Program Files/R/R-2.9.0/library/aroma.affymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1.r3.cdf", outFilename = "C:/Program Files/R/R-2.9.0/library/aroma.affymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf")
>   library(aroma.affymetrix)
>   verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>   chipType <- "MoGene-1_0-st-v1"
>   cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
>
> The problem is that I am getting the following error message:
>
> Error in list(`AffymetrixCdfFile$byChipType(chipType, tags = "r3")` = <environment>, :
> [2010-07-12 12:42:56] Exception: Could not locate a file for this chip type: MoGene-1_0-st-v1,r3
>   at throw(Exception(...))
>   at throw.default("Could not locate a file for this chip type: ", paste(c(chipType, tags), collapse = ","))
>   at throw("Could not locate a file for this chip type: ", paste(c(chipType, tags), collapse = ","))
>   at method(static, ...)
>   at AffymetrixCdfFile$byChipType(chipType, tags = "r3")
>
> I think the function is looking for the file someplace else, but I don't know where. Would you be able to help me out with this?
>
> Thank you.
> Zsuzsa
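
A quick way to check the point about the working directory is to look for the CDF file relative to getwd() using base R only. The helper below is a hypothetical sketch (findCdfFiles() is not an aroma.affymetrix function); it simply lists *.cdf files under annotationData/chipTypes/<chipType>/ below a given root directory.

```r
# Hypothetical helper: list CDF files where the annotationData/ layout
# described above expects them, relative to 'root' (default: getwd()).
findCdfFiles <- function(chipType, root = getwd()) {
  path <- file.path(root, "annotationData", "chipTypes", chipType)
  # file.info() gives NA for a missing path, so isTRUE() yields FALSE
  if (!isTRUE(file.info(path)$isdir)) {
    return(character(0))
  }
  list.files(path, pattern = "[.]cdf$", ignore.case = TRUE)
}

print(getwd())
print(findCdfFiles("MoGene-1_0-st-v1"))
```

An empty result means that no such directory, or no CDF file in it, exists relative to the current working directory, which would be consistent with the "Could not locate a file for this chip type" exception.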
Re: [aroma.affymetrix] ArrayExplorer: Error in readCelHeader(pathname)
Hi Richard,

sorry for the delay - this one slipped through and I simply missed your message, but I caught it while troubleshooting the same problem reported in the thread 'ArrayExplorer issue' started on 2010-07-02.

The reason for your problems is a bug in aroma.core/R.filesets that causes hiccups when there are parentheses in the file/array names. This will be solved in the next release of aroma.core and R.filesets. Until that is available, please use the provided patches:

  library(aroma.affymetrix);
  downloadPackagePatch("R.filesets");
  downloadPackagePatch("aroma.core");
  downloadPackagePatch("aroma.affymetrix");

Let me know if this helps.

/Henrik

On Fri, May 14, 2010 at 6:17 PM, Richard Beyer <rpbe...@gmail.com> wrote:
> Hi All,
>
> I am having a new problem with code I've run many times in the past. I guess something changed, and I was hoping someone else has seen something similar and could point me in the right direction.
>
> I have Affy rat ST chips (I get the same error with mouse ST). I do the usual preprocessing (using Mark Robinson's doEverything script). All is well as far as getting results goes; plotRle and plotNuse work fine. I have also executed the commands in doEverything one at a time and seen no errors. The problem appears here:
>
>   e.AndersonRat1 <- doEverything("AndersonRatST_10.03.12.all_probes_bg_qn", "RaGene-1_0-st-v1", getExpression=TRUE, doNorm=FALSE, doResiduals=TRUE)
>   . . . .
>   Calculating PLM residuals...done
>   Warning message:
>   In fitfcn(y) : Ignoring a unit group when fitting probe-level model, because it has a ridiculously large number of data points: 6515x50 5000x1
>
>   plotRle(e.AndersonRat1$qam, main="RLE e.AndersonRat1$qam probe level QN")
>   rs <- e.AndersonRat1$res
>   ae <- ArrayExplorer(rs)
>   setColorMaps(ae, c(log2,log2pos,rainbow))
>   process(ae, interleaved="auto")
>   Error in readCelHeader(pathname) : Cannot read CEL file header. File not found: NA/NA
>   In addition: Warning messages:
>   1: In min(x) : no non-missing arguments to min; returning Inf
>   2: In max(x) : no non-missing arguments to max; returning -Inf
>
> R version 2.11.0 (2010-04-22)
> x86_64-redhat-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=C
>  [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
>  [1] preprocessCore_1.10.0 affyio_1.16.0 Biobase_2.8.0 aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.20.0
>  [7] R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.15.1 matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0
> [13] R.filesets_0.8.1 digest_0.4.2 R.utils_1.4.0 R.oo_1.7.2 affy_1.24.2 R.methodsS3_1.2.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> All of this code was working with R 2.10.0. There seem to be lots of CEL files in the right places. For example:
>
> Pathname: plmData/AndersonRatST_10.03.12.all_probes_bg_noqn,RBC,RMA/RaGene-1_0-st-v1/Anderson_PG50_042210_(RaGene-1_0-st-v1),residuals.CEL
>
> I'm not sure how best to track this down. If anyone has a suggestion or pointer, I'd be very grateful.
>
> Thanks much,
> Dick
>
> ***
> Richard P. Beyer, Ph.D.
> University of Washington           Tel.: (206) 616 7378
> Env. Occ. Health Sci., Box 354695  Fax: (206) 685 4696
> 4225 Roosevelt Way NE, # 100
> Seattle, WA 98105-6099
> http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
> http://staff.washington.edu/~dbeyer
> ***
Re: [aroma.affymetrix] Error in using the extract() function
Hi Johan,

On Wed, Jun 30, 2010 at 9:13 AM, Johan Staaf <johan.st...@med.lu.se> wrote:
> Dear Henrik,
>
> I get an error when trying to extract CRMAv2-processed data using the extract() function as below:
>
>   cesSamples <- extract(cesNList[[chipType]], assay.vector)
>
> The error occurs when the assay vector contains sample names with parentheses in them, like:
>
>   WHOOP_p_STY30_(CO-108057)_Mapping250K_Sty_H01_107610
>
> However, there are no errors in the actual processing of the data, meaning that I get the file:
>
>   WHOOP_p_STY30_(CO-108057)_Mapping250K_Sty_H01_107610,chipEffects.CEL

Correct. This is because when you use extract() to pull out a subset of the arrays, extract() calls indexOf(), and it is in the latter that there is a bug. When you process a data set, indexOf() is not used in your case, which is why there is no issue/error.

As you might have noticed from the discussion on the mailing list, this problem is related to recent reports by others who also use parentheses in their filenames. I've fixed the bug for the next release of aroma.core and R.filesets. Until that is available, please use the provided patches:

  library(aroma.affymetrix);
  downloadPackagePatch("R.filesets");
  downloadPackagePatch("aroma.core");
  downloadPackagePatch("aroma.affymetrix");

Let me know if this helps.

/Henrik

> Best regards
> Johan
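
The thread above only states that indexOf() fails for names containing parentheses; it does not say why. One plausible mechanism (an assumption here, not something confirmed by the messages) is that the name is used as a regular expression, where parentheses denote a group instead of literal characters. The base R snippet below shows that general effect with the sample name from the report:

```r
name <- "WHOOP_p_STY30_(CO-108057)_Mapping250K_Sty_H01_107610"
arrayNames <- c("SomeOtherArray", name)

# As a regular expression, "(" and ")" form a group, so the pattern
# does not match the literal parentheses in the name.
asRegex <- grep(name, arrayNames)

# With fixed = TRUE the pattern is matched literally.
asFixed <- grep(name, arrayNames, fixed = TRUE)

print(asRegex)
print(asFixed)
```

If you cannot install the patches, renaming the files so that the names contain no parentheses would sidestep this kind of problem, although any previously generated output files would then no longer match the new names.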
Re: [aroma.affymetrix] Microsoft Visual C++ Runtime Library
It sounds like one of your CEL files is corrupt. Start out with the 5 CEL files that work, and add the other CEL files to the directory one by one to figure out which work and which do not. Also, the CEL files should be of roughly the same file size; if one differs a lot, that is a likely clue that it may be corrupt.

Details: Ideally a corrupt CEL file should not crash R, but rather generate a nice error message. Unfortunately, it is the low-level Affymetrix Fusion SDK code that causes the crash, which is beyond the control of R and aroma.affymetrix.

/Henrik

On Thu, Jul 1, 2010 at 11:36 PM, Liang Cheng <vikingch...@gmail.com> wrote:
> Thank you, Pierre,
>
> it happens in the beginning:
>
>   cs <- AffymetrixCelSet$byName(1, chipType="Mapping250K_Nsp")
>
> If I try to put 10 CEL files in, a window pops up saying: "the application has requested the runtime to terminate it in an unusual way". But if there are 5 CEL files, it works well.
>
> I appreciate your help,
> Liang
>
> 2010/7/1 Pierre Neuvial <pie...@stat.berkeley.edu>
> > Thanks, and when do you get the error? Can you paste the error message and the output of traceback()?
> >
> > Pierre
> >
> > On Thu, Jul 1, 2010 at 10:26 AM, Liang Cheng <vikingch...@gmail.com> wrote:
> > > Thank you, Pierre, the following is the sessionInfo and code:
> > >
> > > R version 2.11.0 (2010-04-22)
> > > i386-pc-mingw32
> > >
> > > locale:
> > > [1] LC_COLLATE=English_United States.1252
> > > [2] LC_CTYPE=English_United States.1252
> > > [3] LC_MONETARY=English_United States.1252
> > > [4] LC_NUMERIC=C
> > > [5] LC_TIME=English_United States.1252
> > >
> > > attached base packages:
> > > [1] stats graphics grDevices utils datasets methods base
> > >
> > > other attached packages:
> > >  [1] preprocessCore_1.10.0 aroma.affymetrix_1.6.0 aroma.apd_0.1.7
> > >  [4] affxparser_1.20.0 R.huge_0.2.0 aroma.core_1.6.0
> > >  [7] aroma.light_1.16.0 matrixStats_0.2.1 R.rsp_0.3.6
> > > [10] R.cache_0.3.0 R.filesets_0.8.2 digest_0.4.2
> > > [13] R.utils_1.4.2 R.oo_1.7.3 R.methodsS3_1.2.0
> > >
> > > Code:
> > >
> > >   library(aroma.affymetrix)
> > >   cs <- AffymetrixCelSet$byName(1, chipType="Mapping250K_Nsp")
> > >   qn <- QuantileNormalization(cs)
> > >   csQN <- process(qn, verbose=TRUE)
> > >   plm <- RmaCnPlm(csQN, combineAlleles=TRUE, mergeStrands=TRUE)
> > >   fit(plm, verbose=TRUE)
> > >   ces <- getChipEffectSet(plm)
> > >   exData <- extractDataFrame(ces, units=NULL, addNames=TRUE)
> > >   write.table(exData, file="fileName.txt", row.names=FALSE)
> > >
> > > thank you very much,
> > > Liang
> > >
> > > 2010/7/1 Pierre Neuvial <pie...@stat.berkeley.edu>
> > > > Hi,
> > > >
> > > > Could you please report the output of sessionInfo() and traceback(), and post a complete code example?
> > > >
> > > > Pierre
> > > >
> > > > On Tue, Jun 29, 2010 at 10:09 AM, Liang Cheng <vikingch...@gmail.com> wrote:
> > > > > Hello everyone,
> > > > >
> > > > > I get this error when I try to read 10 CEL files using AffymetrixCelSet: "the application has requested the runtime to terminate it in an unusual way". Can someone help me?
> > > > >
> > > > > thanks a lot,
> > > > > Liang
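
The file-size check suggested at the top of this thread can be done with base R before any CEL file is parsed. The helper below is a hypothetical sketch (flagOddFileSizes() and its 10% tolerance are illustrative choices, not part of aroma.affymetrix), and the example path assumes the usual rawData/<data set>/<chip type>/ layout for the data set in the report.

```r
# Hypothetical helper: flag files whose size deviates from the median
# size by more than 'tolerance' (as a fraction of the median).
flagOddFileSizes <- function(path, pattern = "[.]cel$", tolerance = 0.10) {
  pathnames <- list.files(path, pattern = pattern, ignore.case = TRUE,
                          full.names = TRUE)
  sizes <- file.info(pathnames)$size
  relDiff <- abs(sizes - median(sizes)) / median(sizes)
  data.frame(file = basename(pathnames), bytes = sizes,
             suspicious = relDiff > tolerance)
}

# Assumed location of the CEL files for data set '1' in this thread
print(flagOddFileSizes("rawData/1/Mapping250K_Nsp"))
```

A flagged file is only a candidate: a size difference is a hint, not proof, of corruption, so adding files one by one as described above remains the more reliable test.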
Re: Where can I download the CDF file? (Was: Re: [aroma.affymetrix] problem with Arguments$getInstanceOf(dataSet, SnpChipEffectSet))
Please see FAQ. 2007-05-24 on http://aroma-project.org/FAQ

/Henrik

On Fri, Jun 25, 2010 at 10:49 AM, Liang Cheng <vikingch...@gmail.com> wrote:
> Henrik,
>
> I found that if I want to read CEL files, I have to get the CDF files. Where can I get the ones mentioned in your slides, especially the 250K ones?
>
> thanks a lot,
> Viking

[snip]
Re: [aroma.affymetrix] Problem with GLAD on linux cluster
Hi Christian. On Tue, Jun 29, 2010 at 3:39 PM, cstratowa christian.strat...@vie.boehringer-ingelheim.com wrote: Dear Henrik, Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could run my analysis on our sge-cluster w/o any problems. Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and are curently testing with 8 chips whether my package based on aroma.affymetrix still works on the cluster. The normalization step on a server did run fine, howeever, distributing the 8 samples on the cluster to run GladModel() resulted in the problem that 3 of 8 cluster nodes did stop with the following error message: Loading required package: GLAD ... Loading required package: RColorBrewer Loading required package: Cairo Error in list(`computeCN(aroma, model = model, arrays = arrays[i], chromosomes = 1:23, ref` = environment, : [2010-06-29 15:08:49] Exception: Cannot save to file. Temporary file already exists: ~/.Rcache/aroma.affymetrix/idChecks/ a1c33926939ee43fbed83ae69301d215.tmp at throw(Exception(...)) at throw.default(Cannot save to file. Temporary file already exists: , pathn at throw(Cannot save to file. Temporary file already exists: , pathnameT) at saveObject.default(list(key = key, keyIds = lapply(key, digest2), id = id), at saveObject(list(key = key, keyIds = lapply(key, digest2), id = id), idPathn at getAverageFile.AffymetrixCelSet(ces, force = force, verbose = less(verbose) at NextMethod(generic = getAverageFile, object = this, indices = indices, .. at getAverageFile.ChipEffectSet(ces, force = force, verbose = less(verbose)) at NextMethod(generic = getAverageFile, object = this, ...) at getAverageFile.SnpChipEffectSet(ces, force = force, verbose = less(verbose) at NextMethod(generic = getAverageFile, object = this, ...) at getAverageFile.CnChipEffectS Calls: computeCN ... saveObject.default - throw - throw.default - throw - throw.Exception Execution halted Interestingly, on the other 5 nodes GladModel() seems to run fine. 
Do you have any idea what the reason for this problem might be?

This seems to be due to a race condition, because several processes call getAverageFile() on the same data set (set of data files). It has nothing to do with GladModel - that only calls getAverageFile() in order to calculate the average signal across all samples in the data set. More precisely, in this particular case it is saveObject() of R.utils that detects that there already exists a temporary file (with the added filename extension *.tmp) that is currently being created and written to by another process. This temporary file is renamed to its final name when done. The reason why you didn't observe it before is most likely that this feature was added to saveObject() in R.utils v1.2.4:

  Version: 1.2.4 [2009-10-30]
  o ROBUSTIFICATION: Lowered the risk for saveObject() to leave an incomplete file due to say power failures etc. This is done by first writing to a temporary file, which is then renamed. If the temporary file already exists, an exception is thrown.

OK, those are the details explaining the error message and the traceback you report.

So, did it work before? Did you get valid estimates? Probably, because the way getAverageFile() is written it is unlikely that a corrupt result file is created. What is certain is that the calculations were done multiple times if there were race conditions.

I'd like to put out a little disclaimer: I try to write methods so that they work even when there are race conditions. However, as you've noticed, I am also very conservative, that is, I'd rather detect the race condition and throw an exception than silently ignore it. The plan is to loosen this up in the future. I just want to say this here so that you understand my current design decisions/plans. I have to think about this particular case, because I could loosen up getAverageFile() a bit, I think. However, at the moment it is better if you take care of the race conditions yourself.
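The write-to-a-temporary-file-then-rename behaviour described above can be sketched in base R roughly as follows. This is only an illustration of the idea, not the R.utils code: saveAtomically() is a hypothetical helper, and the check for an existing *.tmp file followed by the write is itself not an atomic operation.

```r
# Sketch only: saveAtomically() is a hypothetical helper that mimics the
# behaviour described for saveObject(); it is NOT the R.utils implementation.
saveAtomically <- function(object, pathname) {
  pathnameT <- paste0(pathname, ".tmp")
  # Another process may be writing the same file right now: refuse to continue
  if (file.exists(pathnameT)) {
    stop("Cannot save to file. Temporary file already exists: ", pathnameT)
  }
  # Write to the temporary file first ...
  saveRDS(object, file = pathnameT)
  # ... and only then give it its final name
  if (!file.rename(pathnameT, pathname)) {
    stop("Failed to rename temporary file: ", pathnameT)
  }
  invisible(pathname)
}

pathname <- file.path(tempdir(), "idCheck.rds")
saveAtomically(list(id = "a1c3"), pathname)
```

With this design a reader never sees a half-written result file, but two processes that start at about the same time will still collide on the *.tmp file, which is the exception reported in this thread.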
Assume your current code looks something like this:

  fln <- FragmentLengthNormalization(ces);
  cesN <- process(fln);
  seg <- GladModel(cesN);
  process(seg);

Then first you should know that the latter two lines are computationally identical to [it is only slightly more complicated if you use chip type pairs]:

  ceR <- getAverageFile(cesN);
  seg <- GladModel(cesN, ceR);
  process(seg);

So, if you can synchronize the averaging by (conceptually only):

  mutex <- waitForMutex(foo);
  ceR <- getAverageFile(cesN);
  releaseMutex(mutex);

then it should all be fine. Replace waitForMutex()/releaseMutex() with your favorite synchronization mechanism. FYI, if there were a cross-platform, bulletproof and generic synchronization mechanism in R, I would add synchronization internally to lots of methods.

Hope this helps(?)

Henrik

sessionInfo() R version 2.11.1 (2010-05-31) x86_64-unknown-linux-gnu locale: [1] C attached base packages: [1] stats
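One possible stand-in for the conceptual waitForMutex()/releaseMutex() pair is a lock directory on a file system shared by the cluster nodes. The sketch below assumes that dir.create() fails when the directory already exists, which is usually but not always atomic (some network file systems are known to be unreliable here). withDirLock() and the lock directory name are hypothetical and not part of aroma.affymetrix.

```r
# Sketch only: withDirLock() is a hypothetical helper, not an aroma.* function.
# It polls until it manages to create 'lockDir', runs 'fcn', and then removes
# the lock directory again, also if 'fcn' throws an error.
withDirLock <- function(lockDir, fcn, timeout = 60, interval = 0.5) {
  t0 <- Sys.time()
  while (!dir.create(lockDir, showWarnings = FALSE)) {
    waited <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
    if (waited > timeout) {
      stop("Timed out waiting for lock: ", lockDir)
    }
    Sys.sleep(interval)
  }
  on.exit(unlink(lockDir, recursive = TRUE), add = TRUE)
  fcn()
}

lockDir <- file.path(tempdir(), "getAverageFile.lock")
# In a real script the function body would be: getAverageFile(cesN)
res <- withDirLock(lockDir, function() "averaged")
```

A stale lock directory left behind by a crashed job has to be removed by hand, so the timeout should be chosen longer than the averaging is expected to take.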
Re: [aroma.affymetrix] Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis
Hi Johan,

this certainly looks like a computational hiccup. I have never seen it, though I can imagine various ways it could happen. Instead of guessing: do you have a complete script that you ran, or did you type the commands on the command line one by one? You say you did the analysis according to the tutorial for 10-500K analysis (CRMAv1), but the steps you describe are from the CRMAv2 method. And most importantly, because I think the answer is in here: how did you extract the summarized data and how did you generate the plot? How did you calculate your reference signals?

One of my guesses is that the upper CN band, which looks to be correct, is for the non-polymorphic CN loci, whereas the lower one, which is shifted, is from SNP signals. It could be that you are somehow only plotting allele-specific CA and/or CB signals and not C=CA+CB for SNPs.

So, if you send all the commands verbatim, I can let you know what needs to be changed.

/Henrik

On Wed, Jun 23, 2010 at 9:02 AM, Johan Staaf johan.st...@med.lu.se wrote:

Hi Henrik, I have a question about strange-looking genomic profiles for Affymetrix 250K Sty2 chips from GSE14994 after CRMAv2 analysis (please see attached png figures). Processing was done using calibration for allelic crosstalk, normalization for nucleotide-position probe sequence effects, probe summarization, and PCR fragment normalization according to the tutorial for 10-500K analysis. CN was obtained by comparison to normal samples also obtained from public repositories and processed simultaneously. Do you know the cause of this, and how it could be corrected?

Best regards Johan
Re: [aroma.affymetrix] problem with Arguments$getInstanceOf(dataSet, SnpChipEffectSet)
Hi.

On Wed, Jun 23, 2010 at 11:04 AM, mortiz mortiz...@gmail.com wrote:

hi everyone, I'm trying to develop a function based on FragmentLengthNormalization, but when I try to execute my new function it gives me the following error message:

  Error in process.NormalRegions(normalReg, verbose = verbose) : attempt to apply non-function

  sessionInfo()
  R version 2.9.2 (2009-08-24)
  i386-pc-mingw32

It's really time to update your R installation.

  locale:
  LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] sfit_0.1.8 setRNG_2009.11-1 MASS_7.2-48 aroma.affymetrix_1.3.0 aroma.apd_0.1.7 affxparser_1.16.0 R.huge_0.2.0 aroma.core_1.3.1
  [9] aroma.light_1.15.1 matrixStats_0.1.8 R.rsp_0.3.6 R.filesets_0.6.5 digest_0.4.1 R.cache_0.2.0 R.utils_1.2.4 R.oo_1.6.7
  [17] R.methodsS3_1.0.3

  loaded via a namespace (and not attached):
  [1] tools_2.9.2

  traceback()
  2: process.NormalRegions(prueba, verbose = verbose)
  1: process(prueba, verbose = verbose)

the thing is that if I try FragmentLengthNormalization it does not give any problem, but if I do source("I:/aroma/FragmentLengthNormalization.R") and then execute FragmentLengthNormalization on my ces variable, it gives the same error

  str(ces)
  Classes 'CnChipEffectSet', 'SnpChipEffectSet', 'ChipEffectSet', 'ParameterCelSet', 'AffymetrixCelSet', 'AffymetrixFileSet', 'AromaPlatformInterface', 'AromaMicroarrayDataSet', 'GenericDataFileSet', 'FullNameInterface', 'Object' atomic [1:1] NA
  ..- attr(*, .env)=environment: 0x0419c000
  ..- attr(*, ...instantiationTime)= POSIXct[1:1], format: 2010-06-23 10:28:05

can anyone help me with this???

I cannot see how you would get an error because you source() the FragmentLengthNormalization.R source file - that is really weird, and in such cases there is often a very simple explanation.
If you still think you can reproduce this error by sourcing FragmentLengthNormalization.R, then let's focus on that first and forget about your new code. Please provide a complete script showing what you are doing. Also, you can always do:

  debug(process.FragmentLengthNormalization);
  cesN <- process(fln, verbose=-50);

so that you can step through it to figure out exactly at which statement the error occurs. Make sure to start out from a fresh R session. If FLN doesn't give an error, you can move on to debug() your own method(s).

Hope this helps

/Henrik

thanks!! :) maria o.
Re: [aroma.affymetrix] Residual plot vertical separation
Hi Mamun,

On Fri, Jun 18, 2010 at 12:25 PM, Mamun Rashid mamunbabu2...@gmail.com wrote:

Hi everyone, I am analysing some Affymetrix exon array data. I have been performing some quality checking of the data. I have plotted the residuals from the PLM fit of the raw intensity data. I am using the core CDF file HuEx-1_0-st-v2,core,A20071112,EP

  chipType <- "HuEx-1_0-st-v2"
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="core,A20071112,EP")

I'm not sure where/when you downloaded this CDF file, because its name has the 'core' tag rather than the 'coreR3' tag of the file that is available at http://aroma-project.org/chipTypes/HuEx-1_0-st-v2/transcriptClustersCDFs You probably have the same file as HuEx-1_0-st-v2,coreR3,A20071112,EP.CDF. Compare the checksum you get with the one below.

  cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2,coreR3,A20071112,EP");
  cdf
  AffymetrixCdfFile:
  Path: annotationData/chipTypes/HuEx-1_0-st-v2
  Filename: HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf
  Filesize: 38.25MB
  Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP
  RAM: 0.00MB
  File format: v4 (binary; XDA)
  Dimension: 2560x2560
  Number of cells: 6553600
  Number of units: 18708
  Cells per unit: 350.31
  Number of QC units: 1

  getChecksum(cdf);
  [1] e7b0bacd27699534d125b16266d7cc09

If the checksums are identical, the file content is identical.

  exp_name <- "Affy-Exon"
  cs <- AffymetrixCelSet$byName(exp_name, cdf=cdf)

  ## *** Background Correction *** ##
  bc <- RmaBackgroundCorrection(cs, tag="core")
  csBC <- process(bc, verbose=verbose)  ## Background corrected Raw Data

  ## *** summarization with PLM *** ##
  plmTr <- ExonRmaPlm(csBC, mergeGroups=TRUE)
  fit(plmTr, verbose=verbose)

  ## *** Residual calculation of PLM fit *** ##
  rs <- calculateResiduals(plmTr, verbose=verbose)

  # To browse spatial false-colored images of the residuals
  ae <- ArrayExplorer(rs)
  setColorMaps(ae, c("log2,log2neg,rainbow", "log2,log2pos,rainbow"))
  process(ae, interleaved="auto", verbose=verbose)
  display(ae)

Now I see a clear vertical separation between most of the residual plots.
I am not 100% sure what you mean by a clear vertical separation, but I guess you mean that there is a narrow white-ish band near the left-right center of the array. This is the likely reason: The PLM

  plmTr <- ExonRmaPlm(csBC, mergeGroups=TRUE);

is fitted using only the PM probes, and residuals are therefore only defined for those probes. For all other probes (e.g. MMs but also all the PMs not included in the CDF) there are no residuals defined (which show up as white in your plots). Next, if you look at the distribution of the PMs *as defined by* the CDF (HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf), there are fewer PMs in that center band than on the rest of the array, which means you will see fewer residuals in that band as well. There are also fewer PMs in the left part compared with the right part of the array, which is probably the reason why the right part seems to be darker than the left part (when you look at the residual plots).

The spatial distribution of PMs *according to the CDF* can be studied as follows:

  library(aroma.affymetrix);
  downloadPackagePatch("aroma.core");
  verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP");

  # Get a spatial image showing the PMs (as defined by the CDF)
  # NOTE: The first time you do this for a new CDF, this will be very slow (~20 mins)
  # because internally extractDataFrame(cdf) is called. After that, it'll be fast.
  img <- getImage(cdf, field="isPm", verbose=verbose);
  img <- 1-img;  # PM=1 (white) -> PM=0 (black).

  # Write to file...
  pathname <- sprintf("%s,isPm.png", getFilename(cdf));
  EBImage::writeImage(img, file=pathname);

  # This image is available at:
  # http://www.aroma-project.org/images/public/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf,isPm.png
  # By just looking at it, you can see it is darker (more PMs) in
  # the right part than the left. You also see the much lower
  # density of PMs in the narrow band in the middle.
  # Average density of PMs in left and right parts
  cols <- 1:(nbrOfColumns(cdf)/2);
  imgL <- img[cols,];   # sic! - the 'img' is rotated
  imgR <- img[-cols,];
  print(mean(imgL == 0));
  ## [1] 0.1414371
  print(mean(imgR == 0));
  ## [1] 0.106
  print(mean(imgR == 0) / mean(imgL == 0));
  ## [1] 1.335439

Thus, about 19% of the cells in the right part are PMs versus 14% in the left part (as defined by HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf), which means there are 33.5% more PMs in the right part compared with the left part.

Some of the plots revealed some artefacts and scratches which might have occurred due to hybridization and scanning problems.

Almost all artifacts show up in the residual plots, because an artifact often affects only one of the probes in a probeset (Affymetrix designed the arrays so that the probes in a probeset are spread out on the array to avoid all of them being affected). Since it only affects one of the probes its residual will be an outlier
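The percentages quoted above can be re-derived from the printed left-half proportion and the printed right/left ratio. This is a small arithmetic sketch only; the variable names are made up here, and the right-half proportion is inferred from the other two numbers because the value printed for it in the archived output looks truncated.

```r
# Numbers copied from the output above
pL <- 0.1414371     # proportion of PM cells in the left half
ratio <- 1.335439   # right-half proportion divided by left-half proportion

# Inferred (not printed in full above): proportion of PM cells in the right half
pR <- pL * ratio

# Percentages of PMs in each half, rounded to whole percent
print(round(100 * c(left = pL, right = pR)))   # left 14, right 19

# Relative excess of PMs in the right half, in percent
print(round(100 * (ratio - 1), 1))             # 33.5
```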
Re: [aroma.affymetrix] How to convert a xxx.CEL file to a yyy.txt file?
On Wed, Jun 23, 2010 at 7:29 AM, Liang Cheng vikingch...@gmail.com wrote:

I want to read the data in it and then process it. so: the method to read it; all kinds of methods to process it

I am sorry, but that is still an extremely vague specification of what you are going to do. The best I can tell you is to have a look at the various vignettes online - http://www.aroma-project.org/ - to get a feel for the vast number of alternatives you have. You probably also want to look at the various Bioconductor packages supporting Affymetrix data - http://www.bioconductor.org/.

/Henrik

thank you

2010/6/23 Henrik Bengtsson h...@stat.berkeley.edu On Wed, Jun 23, 2010 at 7:00 AM, Liang Cheng vikingch...@gmail.com wrote:

Thank you, Henrik. So there is no function in the aroma package that can deal with the xx.cel file?

Please explain what you mean by "deal with". There are hundreds of methods in the aroma.affymetrix package that process CEL files in various ways.

/Henrik

Viking

2010/6/22 Henrik Bengtsson h...@stat.berkeley.edu

Hi Viking,

On Tue, Jun 22, 2010 at 6:28 PM, Viking vikingch...@gmail.com wrote:

I am new to aroma-project. I didn't find any material to learn how to do it. Can somebody please help me? Thanks a lot.

although you are new here, please allow me to use a bit of sarcasm, because then I can award you the prize for 'Posting The Least Precise Question' on the list during the last four years ;)

More seriously, what are you trying to do? A CEL file contains (avg. intensity, std.dev. of intensity, nbr of pixels) for each probe. In addition to this there is some information about probes flagged as outliers by the Affymetrix scanner/image analysis software. You can find more information in the help pages of the affxparser package. There is no information about probesets, genes etc. Just so you know. The easiest way to access all the information in a CEL file is probably to use the low-level affxparser package and its readCel() function, e.g.
  pathname <- "0001-7,10K,15-08-2006.CEL";
  data <- readCel(pathname, readXY=TRUE, readIntensities=TRUE, readStdvs=TRUE, readPixels=TRUE);
  str(data);
  List of 8
   $ header :List of 14
    ..$ filename : chr C:/Users/hb/Documents/My Data/rawData/Jeremy_2007-10k/Mapping10K_Xba142/0001-7,10K,15-08-2006.CEL
    ..$ version : int 4
    ..$ cols : int 658
    ..$ rows : int 658
    ..$ total : int 432964
    ..$ algorithm : chr Percentile
    ..$ parameters : chr Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:5;FullFeatureH| __truncated__
    ..$ chiptype : chr Mapping10K_Xba142
    ..$ header : chr Cols=658\nRows=658\nTotalX=658\nTotalY=658\nOffsetX=0\nOffsetY=0\nGridCornerUL=235 130\nGridCornerUR=3603 136\nGridCornerLR=359| __truncated__
    ..$ datheader : chr [25..29720] 0001-7 10K 15-08-2006:CLS=3715 RWS=3715 XIN=1 YIN=1 VE=30 2.0 08/15/06 10:02:13 50206820 M10 \024 \02| __truncated__
    ..$ librarypackage: chr
    ..$ cellmargin : int 2
    ..$ noutliers : int 1527
    ..$ nmasked : int 0
   $ x : int [1:432964] 0 1 2 3 4 5 6 7 8 9 ...
   $ y : int [1:432964] 0 0 0 0 0 0 0 0 0 0 ...
   $ intensities: num [1:432964] 271 16038 282 17471 138 ...
   $ stdvs : num [1:432964] 34.9 2321.7 36 3107.4 16.9 ...
   $ pixels : int [1:432964] 9 9 9 9 9 9 9 9 9 9 ...
   $ outliers : int [1:1527] 272 307 345 360 486 624 952 1019 1037 1155 ...
   $ masked : NULL

See help(readCel, package="affxparser") for more information. Then you can write whichever fields you like to file.

Hope this helps

Henrik

Viking
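As a rough sketch of the last step (writing selected fields to a text file), the per-cell fields can be put in a data frame and saved as tab-delimited text. To keep the example self-contained, 'data' below is a tiny mock with the same field names as the readCel() output shown above; with affxparser installed it would come from readCel() instead. The output file name is just a placeholder.

```r
# Sketch only: 'data' is a mock with the same field names as the readCel()
# output above; in practice it would be the list returned by
# affxparser::readCel(pathname, readXY=TRUE, readIntensities=TRUE, ...).
data <- list(
  x = c(0L, 1L, 2L),
  y = c(0L, 0L, 0L),
  intensities = c(271, 16038, 282),
  stdvs = c(34.9, 2321.7, 36),
  pixels = c(9L, 9L, 9L)
)

# Keep only the per-cell fields (they all have one value per cell)
fields <- c("x", "y", "intensities", "stdvs", "pixels")
df <- as.data.frame(data[fields])

# Write one row per cell to a tab-delimited text file (placeholder name)
pathnameTxt <- file.path(tempdir(), "yyy.txt")
write.table(df, file = pathnameTxt, sep = "\t", quote = FALSE, row.names = FALSE)
```

The 'header', 'outliers' and 'masked' elements have different lengths than the per-cell fields, which is why they are left out of the data frame here.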
Re: [aroma.affymetrix] an error on locating cdf
Hi.

On Mon, Jun 21, 2010 at 2:26 PM, Albyn albyn.dhun...@gmail.com wrote:

Dear all, I am new to both R and aroma.affymetrix. I have 25 SNP6.0 CEL files and I have to find LOH and UPD. I would like to use aroma.affymetrix; could anybody tell me whether it is a good idea to use this package?

You can do lots of different types of preprocessing in aroma.affymetrix, particularly CRMAv2. You can do total CN segmentation, e.g. CBS. To do LOH and UPD analysis, you'll need parental-specific CN (PSCN) analysis, e.g. PSCN segmentation. We are still working on getting PSCN analysis/segmentation into a standard pipeline form, and that will take time. However, you can still generate raw PSCN signals and use those to look for/confirm LOH regions, e.g. by plotting the allele B fraction along the genome. See for instance http://aroma-project.org/vignettes/tumorboost-highlevel I haven't used it in a while, but some people also use dChip for this.

I am trying to install aroma.affymetrix, following the steps given in the vignette.

Could you please be more precise about which vignette you are looking at?

I got stuck on my second step.

  cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
  Error in list(`AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags = Full)` = environment, :
  [2010-06-21 14:12:57] Exception: Could not locate a file for this chip type: GenomeWideSNP_6,Full
    at throw(Exception(...))
    at throw.default(Could not locate a file for this chip type: , paste(c(chipType, tags), collapse = ,))
    at throw(Could not locate a file for this chip type: , paste(c(chipType, tags), collapse = ,))
    at method(static, ...)
    at AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags = Full)

Could anyone tell me what might have gone wrong here?

Please read the Setup instructions at http://aroma-project.org/setup/annotationData The CRMAv2 vignette for processing GenomeWideSNP_6 arrays should also be rather clear about this; http://aroma-project.org/vignettes/CRMAv2 Does this help?
/Henrik

/yogesh

Here is my session Info:

  sessionInfo()
  R version 2.11.0 (2010-04-22)
  i386-apple-darwin9.8.0

  locale:
  [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] genomewidesnp6Crlmm_1.0.2 aroma.affymetrix_1.6.0 aroma.apd_0.1.7 affxparser_1.20.0 R.huge_0.2.0 aroma.core_1.6.0
  [7] aroma.light_1.16.0 matrixStats_0.2.1 R.rsp_0.3.6 R.filesets_0.8.2 digest_0.4.2 R.cache_0.3.0
  [13] R.utils_1.4.2 R.oo_1.7.3 R.methodsS3_1.2.0

  loaded via a namespace (and not attached):
  [1] tools_2.11.0
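A quick way to check whether a CDF is where it is expected is to look for the file by hand. The sketch below assumes the layout suggested by the paths printed elsewhere in these threads, i.e. annotationData/chipTypes/<chipType>/<chipType>[,<tags>].cdf relative to the working directory; the setup page linked above is the authoritative description. findCdf() is a hypothetical helper and not part of aroma.affymetrix.

```r
# Sketch only: findCdf() is a hypothetical helper. It assumes the directory
# layout annotationData/chipTypes/<chipType>/<chipType>[,<tags>].cdf.
findCdf <- function(chipType, tags = NULL, rootPath = "annotationData") {
  fullname <- paste(c(chipType, tags), collapse = ",")
  path <- file.path(rootPath, "chipTypes", chipType)
  files <- list.files(path)
  # Accept both *.cdf and *.CDF
  keep <- tolower(files) == tolower(paste0(fullname, ".cdf"))
  file.path(path, files[keep])
}

# Toy demonstration in a temporary directory (an empty placeholder file)
rootPath <- file.path(tempdir(), "annotationData")
path <- file.path(rootPath, "chipTypes", "GenomeWideSNP_6")
dir.create(path, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(path, "GenomeWideSNP_6,Full.cdf"))

found <- findCdf("GenomeWideSNP_6", tags = "Full", rootPath = rootPath)
missing <- findCdf("GenomeWideSNP_6", rootPath = rootPath)
```

An empty result, as for 'missing' here, corresponds to the situation in which byChipType() reports that it could not locate a file for the chip type.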
Re: [aroma.affymetrix] Problem with Total Copy Number Vignette
Hi Jack.

On Mon, Jun 7, 2010 at 3:36 PM, Jack Yu j.yu...@gmail.com wrote:

Hello, I sent an e-mail earlier regarding errors in running the total copy number vignette, but please disregard that as it turns out it was just an issue with the annotation files. Sorry for the inconvenience.

Good. I've just sent a message to that thread/discussion ('Error during Total copy number analysis using CRMA v1', June 4, 2010) closing it. That way anyone reading the forum archives can see it was solved. In the future, please always try to reply to the original thread saying whether it's been solved or not.

However, I've encountered another problem that I'm hoping someone could help me with. After running the normalization of the chip effects using:

  cesNList[[chipType]] <- process(fln, verbose=verbose)

I encountered the error:

  Error in list(`process(fln, verbose = verbose)` = environment, `process.FragmentLengthNormalization(fln, verbose = verbose)` = environment, :
  [2010-06-07 09:34:07] Exception: Cannot fit target function to enzyme, because there are no (finite) data points that are unique to this enzyme: 1

Are you following the same vignette ('Total copy number analysis using CRMA v1 (10K, 100K, 500K)') as in your previous thread? Then my best guess is that you forgot to do:

  fit(plm, verbose=verbose);

before moving on to the PCR fragment length normalization. If that does not help, let me know the output of:

  ces <- getChipEffectSet(plm);
  print(ces);

More importantly, is there a reason why you want to use CRMAv1 and not CRMAv2? Note that the latter is recommended for GenomeWideSNP_6 data sets. To use CRMAv2, see vignette 'Estimation of total copy numbers using the CRMA v2 method (10K-GWS6)' [http://aroma-project.org/vignettes/CRMAv2]. ...or even easier, just use the new doCRMAv1() or doCRMAv2(), cf. http://aroma-project.org/blocks

/Henrik

  sessionInfo()
  R version 2.11.1 (2010-05-31)
  powerpc-apple-darwin9.8.0

  locale:
  [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] aroma.affymetrix_1.6.0 aroma.apd_0.1.7 affxparser_1.20.0 R.huge_0.2.0 aroma.core_1.6.0
  [6] aroma.light_1.16.0 matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0 R.filesets_0.8.1
  [11] digest_0.4.2 R.utils_1.4.0 R.oo_1.7.2 R.methodsS3_1.2.0

  traceback()
  8: throw.Exception(Exception(...))
  7: throw(Exception(...))
  6: throw.default(Cannot fit target function to enzyme, because there are no (finite) data points that are unique to this enzyme: , ee)
  5: throw(Cannot fit target function to enzyme, because there are no (finite) data points that are unique to this enzyme: , ee)
  4: getTargetFunctions.FragmentLengthNormalization(this, verbose = less(verbose))
  3: getTargetFunctions(this, verbose = less(verbose))
  2: process.FragmentLengthNormalization(fln, verbose = verbose)
  1: process(fln, verbose = verbose)

Thanks in advance, Jack

-- Jack Y. Yu Washington University in St.Louis
Re: [aroma.affymetrix] re: base pair normalization in CRMAv2
On Mon, Jun 7, 2010 at 4:45 PM, seth redmond seth.redm...@imperial.ac.uk wrote:

sadly it appears my CDF is square:

  print(cdf)
  AffymetrixCdfFile:
  Path: annotationData/chipTypes/Ag_SNP_1m520721
  Filename: Ag_SNP_1m520721.CDF
  Filesize: 243.65MB
  Chip type: Ag_SNP_1m520721
  RAM: 0.00MB
  File format: v4 (binary; XDA)
  Dimension: 2560x2560
  Number of cells: 6553600
  Number of units: 404170
  Cells per unit: 16.21
  Number of QC units: 4

Ok, then that is ruled out.

So the script to construct the ACS just fills in the sequences based on position, it doesn't map to probe names?

There is no such thing as probe names, only probeset/unit names. The only unique/safe way to refer to a cell (probe) on an array is by its (x,y) coordinate. Which cells (probes) belong to which units (probesets) is defined in the CDF file. The CDF may get updated over time or custom CDFs may be used. Thus cells may belong to different units, depending on the CDF used. In contrast, the cells, the cell sequences, and their (x,y) locations never change. This is the reason why one only wants to use (x,y) coordinates to infer the probe sequences.

I guess it could be worth hand-checking a few to see if they match up.

Yes, you want to do something like this:

  library(aroma.affymetrix);
  chipType <- "GenomeWideSNP_6";
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="Full");
  acs <- AromaCellSequenceFile$byChipType(chipType);
  ugc <- getUnitGroupCellMap(cdf, units=1002:1003);
  str(ugc);
  Classes 'UnitGroupCellMap' and 'data.frame':12 obs. of 3 variables:
   $ unit : int 1002 1002 1002 1002 1002 1002 1003 1003 1003 1003 ...
   $ group: int 1 1 1 2 2 2 1 1 1 2 ...
   $ cell : int 640620 52 6150942 640619 51 6150941 ...
  seqs <- readSequences(acs, cells=ugc$cell);
  str(seqs);
   chr [1:12] AAGCCTTTCTTACCTCCAAATGTTG ...
  ugcs <- cbind(ugc, sequence=seqs);
  print(ugcs);
     unit group    cell                  sequence
  1  1002     1  640620 AAGCCTTTCTTACCTCCAAATGTTG
  2  1002     1      52 AAGCCTTTCTTACCTCCAAATGTTG
  3  1002     1 6150942 AAGCCTTTCTTACCTCCAAATGTTG
  4  1002     2  640619 AAGCCTTTCTTACCTCTAAATGTTG
  5  1002     2      51 AAGCCTTTCTTACCTCTAAATGTTG
  6  1002     2 6150941 AAGCCTTTCTTACCTCTAAATGTTG
  7  1003     1 6212010 ATTCAGTAGGTCTGGTGAAATCTCA
  8  1003     1 1346300 ATTCAGTAGGTCTGGTGAAATCTCA
  9  1003     1 2406114 ATTCAGTAGGTCTGGTGAAATCTCA
  10 1003     2 6212009 ATTCAGTAGGTCTAGTGAAATCTCA
  11 1003     2 1346299 ATTCAGTAGGTCTAGTGAAATCTCA
  12 1003     2 2406113 ATTCAGTAGGTCTAGTGAAATCTCA

I'd still like to return to what I said in my first reply: It is hard to tell what is happening, and even whether something goes wrong - try to zoom out a bit so that you see most of the data cloud when plotting the signals after ACC. It looks like the plot is zoomed in on the lower quantiles. So, if you redo those plots after ACC with a greater xlim, e.g. xlim=5*xlim, it might not look that bad after all.

/Henrik

On 5 Jun 2010, at 16:42, Henrik Bengtsson wrote:

Hi.

On Thu, Jun 3, 2010 at 4:59 PM, seth redmond seth.redm...@imperial.ac.uk wrote:

yes, this is a custom chip. The code used to create the ACS file [...]

See comments below; if your chip type does not have the same number of probe rows as probe columns, there is an error causing you to get incorrect sequences. Is your CDF square or rectangular? FYI, it helps me/us help you if you report as much as possible when you give issue reports, e.g. print(cdf).

and run the acc is below, as far as I remember it's pretty standard.

Yes, running the ACC is standard, but it requires the correct probe-sequence files, since that is what is used to infer the probe pairs for allele pairs.

Head of the input file is also included.

You seem to have forgotten to send/paste this one.

So even if the seq files were completely wrong I wouldn't necessarily expect to see this degree of wrongness?
Again, from the plots alone I was convinced something was wrong, but only from t

Is there any way to skip the ACC step altogether?

Yes, the input and the output of ACC are standard CEL sets. That is, you can just pass the ACC's input set to the downstream step instead of the output set, e.g. csC <- csR (replacing the ACC step).

  db <- TabularTextFile("Ag_SNP_1m520721.ACS_input_file.txt", path=path);
  print(db);
  colClassPattern <- c("^Probe (X|Y)"="integer", "^(Probe Sequence|Target Strandedness)$"="character");
  df <- readDataFrame(db, colClassPattern=colClassPattern);
  cells <- affy::xy2indices(x=df[["Probe X"]], y=df[["Probe Y"]], nr=nbrOfRows(cdf));

Whoops, my bad. The example on the how-to page 'How to: Create an Aroma Cell Sequence (ACS) file' [http://aroma-project.org/node/100] should read:

  cells <- affy::xy2indices(x=df[["Probe X"]], y=df[["Probe Y"]], nr=nbrOfColumns(cdf));

The example was only correct for chip types with square dimensions, i.e. nbrOfRows(cdf) == nbrOfColumns(cdf). When I wrote the example, I was misled into believing that 'nr' was the number of rows. This is the only place where it was wrong
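Why the correction only matters for non-square chip types can be seen on a toy layout. The sketch below assumes the usual convention that (x,y) are zero-based, that cells are numbered row by row starting at 1, and that the index is therefore y*(number of columns) + x + 1; cellIndex() is a hypothetical stand-in written for this illustration and is not the affy function itself.

```r
# Sketch only: cellIndex() is a hypothetical stand-in for the (x,y)-to-cell
# mapping; (x, y) are zero-based and cell indices are one-based.
cellIndex <- function(x, y, ncol) y * ncol + x + 1L

# Toy rectangular layout: 3 columns by 2 rows, i.e. 6 cells
nbrOfColumns <- 3L
nbrOfRows <- 2L
xy <- expand.grid(x = 0:(nbrOfColumns - 1L), y = 0:(nbrOfRows - 1L))

# Using the number of columns gives each cell its own index ...
correct <- cellIndex(xy$x, xy$y, ncol = nbrOfColumns)
print(correct)   # 1 2 3 4 5 6

# ... whereas using the number of rows makes different cells collide
wrong <- cellIndex(xy$x, xy$y, ncol = nbrOfRows)
print(wrong)     # 1 2 3 3 4 5
```

For a square layout such as the 2560x2560 chip in this thread the two calls agree, which is why the mistake in the how-to example could not explain the problem here.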
Re: [aroma.affymetrix] Error in sort.list(pairsToBuild) during AllelicCrosstalkCalibration
Hi.

On Mon, Jun 7, 2010 at 4:41 PM, k o ott4...@gmail.com wrote:

Dear Usergroup, while processing data from the 5000k Sty array (but not the Nsp set), I receive the following error:

  acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")
  csC <- process(acc)
  Error in sort.list(pairsToBuild) : 'x' must be atomic for 'sort.list'
  Have you called 'sort' on a list?
  Calls: process ... groupBySnpNucleotides.AromaCellSequenceFile -> sort -> sort.list
  In addition: Warning messages:
  1: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) : object 'idxs' not found
  2: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) : object 'seqsPP' not found
  3: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) : object 'positions' not found
  4: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) : object 'cellsPP' not found
  5: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) : object 'snpPosition' not found
  Execution halted

Any ideas what could be the cause?

Yes, it looks like you are not using a correct ACS file (Mapping250K_Sty,.acs / 170,393,859 bytes). Where did you get yours from? The one (Mapping250K_Sty,HB20080710.acs) you can download from http://aroma-project.org/chipTypes/Mapping250K_Nsp-and-Mapping250K_Sty is of size 170,394,014 bytes (after gunzip).

Also, are you following one of the vignettes online, or some other documentation? The reason why I ask is that you are using dChip annotation data files, i.e.

  DChipGenomeInformation: Pathname: annotationData/chipTypes/Mapping250K_Sty/Mapping500K genome info hg17.txt
  DChipSnpInformation: Pathname: annotationData/chipTypes/Mapping250K_Sty/Mapping250K_Sty snp info.txt

It's been several years since we updated the documentation and moved to so-called UGP and UFL annotation data files instead. We are no longer using dChip annotation files in the aroma project.
You can download the UGP and UFL files for your chip type from: http://aroma-project.org/chipTypes/Mapping250K_Nsp-and-Mapping250K_Sty

You certainly want to get the above correct before you start analyzing your 5,000 arrays. Also, I strongly recommend that you update to aroma.affymetrix v1.6.0. Just reinstall according to http://aroma-project.org/install (already installed/up-to-date packages will be skipped). You may also want to update to R v2.11.1; your R v2.10.1 is 6 months old (and no longer supported by R/Bioconductor officials, though aroma.affymetrix works with it).

Hope this helps

/Henrik

Thank you
Karl-Heinz

Session log:

  R version 2.10.1 (2009-12-14)
  Copyright (C) 2009 The R Foundation for Statistical Computing
  ISBN 3-900051-07-0

  R is free software and comes with ABSOLUTELY NO WARRANTY.
  You are welcome to redistribute it under certain conditions.
  Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

  R is a collaborative project with many contributors.
  Type 'contributors()' for more information and
  'citation()' on how to cite R or R packages in publications.

  Type 'demo()' for some demos, 'help()' for on-line help, or
  'help.start()' for an HTML browser interface to help.
  Type 'q()' to quit R.

  Attempting to load the environment 'package:R.utils'
  Loading required package: R.oo
  Loading required package: R.methodsS3
  R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3 for help.
  Loading required package: utils
  R.oo v1.7.2 (2010-04-13) successfully loaded. See ?R.oo for help.
  R.utils v1.4.0 (2010-03-24) successfully loaded. See ?R.utils for help.
  [Previously saved workspace restored]

  library(aroma.affymetrix)
  Loading required package: R.filesets
  Loading required package: digest
  R.filesets v0.8.1 (2010-04-22) successfully loaded. See ?R.filesets for help.
  Loading required package: aroma.core
  Loading required package: R.cache
  R.cache v0.3.0 (2010-03-13) successfully loaded.
  See ?R.cache for help.
  Loading required package: R.rsp
  R.rsp v0.3.6 (2009-09-16) successfully loaded. See ?R.rsp for help.
  Type browseRsp() to open the RSP main menu in your browser.
  Loading required package: matrixStats
  matrixStats v0.2.1 (2010-04-05) successfully loaded. See ?matrixStats for help.
  Loading required package: aroma.light
  aroma.light v1.15.1 (2009-11-01) successfully loaded. See ?aroma.light for help.
  aroma.core v1.5.0 (2010-02-22) successfully loaded. See ?aroma.core for help.
  Loading required package: aroma.apd
  Loading required package: R.huge
  R.huge v0.2.0 (2009-10-16) successfully loaded. See ?R.huge for help.
  Loading required package: affxparser
  aroma.apd v0.1.7 (2009-10-16) successfully loaded. See ?aroma.apd for help.
  aroma.affymetrix v1.5.0 (2010-02-22) successfully loaded. See ?aroma.affymetrix for help.

  log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
  options(digits=4) # Don't display too many decimals.
  chiptype='Mapping250K_Sty'
  project <- "haprefTrio"
  cdf <-
Re: [aroma.affymetrix] re: base pair normalization in CRMAv2
Hi.

On Thu, Jun 3, 2010 at 4:59 PM, seth redmond seth.redm...@imperial.ac.uk wrote:

yes, this is a custom chip. The code used to create the ACS file [...]

See comments below; if your chip type does not have the same number of probe rows as probe columns, there is an error causing you to get incorrect sequences. Is your CDF square or rectangular? FYI, it helps me/us help you if you report as much as possible when you give issue reports, e.g. print(cdf).

and run the acc is below, as far as I remember it's pretty standard.

Yes, running the ACC is standard, but it requires the correct probe-sequence files, since they are what is used to infer the probe pairs for allele pairs.

Head of the input file is also included.

You seem to have forgotten to send/paste this one.

So even if the seq files were completely wrong I wouldn't necessarily expect to see this degree of wrongness?

Again, from the plots alone I was convinced something was wrong, but only from t

Is there any way to skip the ACC step altogether?

Yes, the input and the output of ACC are standard CEL sets. That is, you can just pass the ACC's input set to the downstream step instead of the output set, e.g. csC <- csR (replacing the ACC step).

  db <- TabularTextFile("Ag_SNP_1m520721.ACS_input_file.txt", path=path);
  print(db);
  colClassPattern <- c("^Probe (X|Y)"="integer", "^(Probe Sequence|Target Strandedness)$"="character");
  df <- readDataFrame(db, colClassPattern=colClassPattern);
  cells <- affy::xy2indices(x=df[["Probe X"]], y=df[["Probe Y"]], nr=nbrOfRows(cdf));

Whoops, my bad. The example on the how-to page 'How to: Create an Aroma Cell Sequence (ACS) file' [http://aroma-project.org/node/100] should read:

  cells <- affy::xy2indices(x=df[["Probe X"]], y=df[["Probe Y"]], nr=nbrOfColumns(cdf));

The example was only correct for chip types with square dimensions, i.e. nbrOfRows(cdf) == nbrOfColumns(cdf). When I wrote the example, I was misled into believing that 'nr' was the number of rows.
This is the only place where it was wrong; all internal code of aroma.affymetrix uses:

  x <- df[["Probe X"]];
  y <- df[["Probe Y"]];
  cells <- nbrOfColumns(cdf) * y + x + 1L;

I have updated the online example to use the above code instead. (If you insist on using affy::xy2indices() you should also add an explicit xy.offset=0, because the default is not safe; see the code of affy::xy2indices.) In conclusion, if your CDF is rectangular, then you need to recreate your ACS file.

  seqs <- df[["Probe Sequence"]];
  strands <- df[["Target Strandedness"]];
  rm(df);
  acs <- AromaCellSequenceFile$allocateFromCdf(cdf);
  updateSequences(acs, cells=cells, seqs=seqs, verbose=-10);
  updateTargetStrands(acs, cells=cells, strands=strands, verbose=-10);
  footer <- readFooter(acs);
  footer$srcFile <- list(filename=getFilename(db), checksum=getChecksum(db));
  footer$createdBy <- list(name="Seth Redmond", email="seth.redm...@imperial.ac.uk");
  writeFooter(acs, footer);

Other than the above calculation of 'cells', this all looks correct.

Hope this helps

/Henrik

  ...
  acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2");
  csC <- process(acc, verbose=-10);
  plotAllelePairs(acc, array=array, pairs=1:6, what="input", xlim=1.5*xlim);

On 28 May 2010, at 17:59, Henrik Bengtsson wrote:

Hi.

On Thu, May 27, 2010 at 6:17 PM, seth redmond seth.redm...@imperial.ac.uk wrote:

I've been working through the CRMAv2 vignette here: http://www.aroma-project.org/vignettes/CRMAv2

Have you been following it exactly, or have you made modifications? It always helps to show the code you are running.

And though I am getting CNV calls that make some kind of sense, the crosstalk calibration looks quite amazingly far from OK (before and after graphs attached). Clearly I have a problem here, but it's hard to start figuring out where. The probe sequence files were constructed from some config files, so there may be missing tag sequences or similar, but as far as I can see the sequences do seem to be matching up to the correct probes.
So is there anywhere else I could look?

What chip type is this? Is it a custom SNP chip? Are all arrays like this, or is this an exceptionally bad one? Is there any one you are satisfied with? It is hard to tell what is happening, or even whether something is going wrong. Try to zoom out a bit so that you see most of the data cloud when plotting the signals after ACC. It looks like the data is zoomed in to the lower quantiles. Also, ACC only corrects for global offset and global crosstalk (for each of the six possible nucleotide pairs); it will not magically give cleaner genotype clouds/arms. Some of the offset is definitely corrected for. If it is a custom chip type and you get the probe sequences wrong, then the six groups of nucleotide pairs will be wrong, which will give a suboptimal correction, but probably not a totally wrong one.

/Henrik

-s
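The difference between multiplying by the number of columns and by the number of rows, discussed in this thread, can be checked on a toy layout. The following is only a sketch in base R: the 4-by-6 grid and the variable names are made up for illustration and are not taken from any real CDF.

```r
# Hypothetical rectangular chip layout (not from any real CDF).
nbr_of_rows <- 4L
nbr_of_columns <- 6L

# Zero-based (x, y) probe coordinates: x is the column offset, y the row offset.
xy <- expand.grid(x = 0:(nbr_of_columns - 1L), y = 0:(nbr_of_rows - 1L))

# Formula quoted in the thread: cells are numbered row by row, starting at 1.
cells_ok <- nbr_of_columns * xy$y + xy$x + 1L

# Faulty variant: multiplies by the number of rows instead.
cells_bad <- nbr_of_rows * xy$y + xy$x + 1L

# The correct formula enumerates every cell exactly once.
stopifnot(identical(cells_ok, seq_len(nbr_of_rows * nbr_of_columns)))

# The faulty one maps several (x, y) pairs to the same cell index.
print(anyDuplicated(cells_bad) > 0L)
```

On a square layout both variants agree, which is consistent with the remark that the original example was only correct for square chip types.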
Re: [aroma.affymetrix] Re: Mouse diversity array --building the required files for aroma.affymetrix UGP, UFL
On Thu, Jun 3, 2010 at 12:21 PM, Ivanek, Robert robert.iva...@fmi.ch wrote:

Hi Henrik, I think you are right, the fragment sizes are theoretical ones. I would guess that the reason why the long fragments are also reported is that the same SNP is present in a short fragment produced by the other enzyme. Thank you very much for the patch. Would you mind updating the MOUSEDIVm520650 chipType page and adding the UGP and UFL files there?

Ideally users contribute UGP and UFL files too, though this time I've done it since I had already done most of the work. Please compare to what you got when you did it.

/Henrik

Best Regards
Robert

On Jun 2, 6:47 pm, Henrik Bengtsson h...@stat.berkeley.edu wrote:

Hi.

On Wed, Jun 2, 2010 at 11:16 AM, Ivanek, Robert robert.iva...@fmi.ch wrote:

Hi Henrik, I investigated the error a bit and found that some of the fragments reported in the NetAffx files are really long. Why did they get a negative value of -32768 and not a positive one?

Thanks for reporting. It turns out to be a bug in aroma.core causing it to censor values into [-32767,32768], whereas it should have been [-32768,32767]. Thus, the fragment lengths that were too large were written as 32768, which when read back became -32768 (that is how signed integers wrap around when out of range). They should have been written as 32767. I have fixed this in the next release of aroma.core. Until that is released, you can install a patch as explained in: http://aroma-project.org/howtos/updateOrPatch

With the patch, you will get correct censoring and more informative warnings, e.g.

  Warning messages:
  1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : 33 values to be assigned were out of range [-32768,32767] and therefore censored to fit the range. Of these, 33 values in [35102,655381] were too large.
  2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : 21 values to be assigned were out of range [-32768,32767] and therefore censored to fit the range. Of these, 21 values in [50496,56758] were too large.

About the very large fragment lengths: My guess is that they are theoretical fragment lengths. After running the PCR in the assay, very long fragments are not amplified and hence filtered out. For the specific enzyme, you should not get any hybridization signal for very long fragments. It is possible that you have signal from the cuts of the other enzyme. Maybe someone else has a better explanation of why they are so long and still on the array? You could also drop a message on the Affymetrix forums and ask.

/Henrik

Robert

On Jun 1, 7:16 pm, Ivanek, Robert robert.iva...@fmi.ch wrote:

Hi Henrik, Thanks for the answer and also the ACS file. I have one more question regarding the UFL file generation. I tried it using the NetAffx files and got the following error:

  R> ufl <- AromaUflFile$allocateFromCdf(cdf, nbrOfEnzymes=2, tags=c("na30", "RI20100601"))
  R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".na30");
  R> units <- importFrom(ufl, csv);
  Warning messages:
  1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.
  2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.
  R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".cn.na30");
  R> units <- importFrom(ufl, csv);
  Warning messages:
  1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.
  2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.

And the summary produces the following:

  R> summary(ufl)
       length           length.02
   Min.   :-32768   Min.   :-32768
   1st Qu.:   614   1st Qu.:   541
   Median :  1146   Median :   997
   Mean   :  1601   Mean   :  1466
   3rd Qu.:  2195   3rd Qu.:  2000
   Max.   : 22095   Max.   : 30002
   NA's   :230775   NA's   :230775

Would you be so kind as to build the UFL and UGP files as well?

Best Regards
Robert

On May 30, 7:27 pm, Henrik Bengtsson h...@stat.berkeley.edu wrote:

Hi.

On Wed, May 26, 2010 at 3:24 PM, Ivanek, Robert robert.iva...@fmi.ch wrote:

Dear Sir or Madam, I would like to analyse the copy number variation data from the Affymetrix Mouse Diversity Array. I have not found any information on your website about this particular array.

I have created a page for this: http://aroma-project.org/chipTypes/MOUSEDIVm520650

I have tried to build the annotation files which are required by aroma
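The off-by-one in the censoring range described in this thread can be illustrated with plain arithmetic. This is a sketch in base R, not the aroma.core implementation: the helper names censor_int16() and wrap_int16() are hypothetical, and the wrap-around is emulated numerically rather than by writing a binary file.

```r
# Valid range of a signed 16-bit integer.
int16_range <- c(-32768, 32767)

# Censor (clamp) values into the valid range before they are stored.
censor_int16 <- function(x) {
  pmin(pmax(x, int16_range[1]), int16_range[2])
}

# Emulate two's-complement wrap-around for a value stored in 16 bits.
wrap_int16 <- function(x) {
  ((x + 32768) %% 65536) - 32768
}

# Example fragment lengths; the last two exceed the valid maximum.
frag_lengths <- c(614, 22095, 32768, 35102)

# Censoring to the wrong upper bound (32768) leaves a value that wraps around.
print(wrap_int16(32768))

# Censoring to the correct upper bound keeps every value representable.
print(censor_int16(frag_lengths))
```

This matches the summary shown earlier in the thread, where the minimum of both length columns is -32768 even though fragment lengths cannot be negative.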
Re: [aroma.affymetrix] Re: Mouse diversity array --building the required files for aroma.affymetrix UGP, UFL
Hi.

On Wed, Jun 2, 2010 at 11:16 AM, Ivanek, Robert robert.iva...@fmi.ch wrote:

Hi Henrik, I investigated the error a bit and found that some of the fragments reported in the NetAffx files are really long. Why did they get a negative value of -32768 and not a positive one?

Thanks for reporting. It turns out to be a bug in aroma.core causing it to censor values into [-32767,32768], whereas it should have been [-32768,32767]. Thus, the fragment lengths that were too large were written as 32768, which when read back became -32768 (that is how signed integers wrap around when out of range). They should have been written as 32767. I have fixed this in the next release of aroma.core. Until that is released, you can install a patch as explained in: http://aroma-project.org/howtos/updateOrPatch

With the patch, you will get correct censoring and more informative warnings, e.g.

  Warning messages:
  1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : 33 values to be assigned were out of range [-32768,32767] and therefore censored to fit the range. Of these, 33 values in [35102,655381] were too large.
  2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : 21 values to be assigned were out of range [-32768,32767] and therefore censored to fit the range. Of these, 21 values in [50496,56758] were too large.

About the very large fragment lengths: My guess is that they are theoretical fragment lengths. After running the PCR in the assay, very long fragments are not amplified and hence filtered out. For the specific enzyme, you should not get any hybridization signal for very long fragments. It is possible that you have signal from the cuts of the other enzyme. Maybe someone else has a better explanation of why they are so long and still on the array? You could also drop a message on the Affymetrix forums and ask.
/Henrik

Robert

On Jun 1, 7:16 pm, Ivanek, Robert robert.iva...@fmi.ch wrote:

Hi Henrik, Thanks for the answer and also the ACS file. I have one more question regarding the UFL file generation. I tried it using the NetAffx files and got the following error:

  R> ufl <- AromaUflFile$allocateFromCdf(cdf, nbrOfEnzymes=2, tags=c("na30", "RI20100601"))
  R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".na30");
  R> units <- importFrom(ufl, csv);
  Warning messages:
  1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.
  2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.
  R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".cn.na30");
  R> units <- importFrom(ufl, csv);
  Warning messages:
  1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.
  2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr, : Values to be assigned were out of range [-32767,32768] and therefore censored to fit the range.

And the summary produces the following:

  R> summary(ufl)
       length           length.02
   Min.   :-32768   Min.   :-32768
   1st Qu.:   614   1st Qu.:   541
   Median :  1146   Median :   997
   Mean   :  1601   Mean   :  1466
   3rd Qu.:  2195   3rd Qu.:  2000
   Max.   : 22095   Max.   : 30002
   NA's   :230775   NA's   :230775

Would you be so kind as to build the UFL and UGP files as well?

Best Regards
Robert

On May 30, 7:27 pm, Henrik Bengtsson h...@stat.berkeley.edu wrote:

Hi.

On Wed, May 26, 2010 at 3:24 PM, Ivanek, Robert robert.iva...@fmi.ch wrote:

Dear Sir or Madam, I would like to analyse the copy number variation data from the Affymetrix Mouse Diversity Array. I have not found any information on your website about this particular array.
I have created a page for this: http://aroma-project.org/chipTypes/MOUSEDIVm520650

I have tried to build the annotation files which are required by aroma but without success. I have a few questions regarding that:

1: Is aroma.affymetrix able to analyse the Mouse Diversity Array?

Yes, there should be no reason why it shouldn't; it uses a standard CDF etc. As you've noted, UGP (and UFL) files have not been created by anyone yet. For CN analysis, at least the UGP (genome positions) annotation data file needs to be there.

2: I tried to build the UGP file directly from the NetAffx annotation files using the code on your website, however I am getting the following error.

  ##
  library(aroma.affymetrix)
  ##
  ## create UGP from NetAffx files
  cdf <- AffymetrixCdfFile$byChipType("MOUSEDIVm520650")
  ##
  ## Creates an empty UGP file for the CDF, if missing.
  ugp <- AromaUgpFile$allocateFromCdf(cdf, tags=c(na30, RI20100526
Re: [aroma.affymetrix] Mouse diversity array --building the required files for aroma.affymetrix UGP, UFL
Hi.

On Wed, May 26, 2010 at 3:24 PM, Ivanek, Robert robert.iva...@fmi.ch wrote:

Dear Sir or Madam, I would like to analyse the copy number variation data from the Affymetrix Mouse Diversity Array. I have not found any information on your website about this particular array.

I have created a page for this: http://aroma-project.org/chipTypes/MOUSEDIVm520650

I have tried to build the annotation files which are required by aroma but without success. I have a few questions regarding that:

1: Is aroma.affymetrix able to analyse the Mouse Diversity Array?

Yes, there should be no reason why it shouldn't; it uses a standard CDF etc. As you've noted, UGP (and UFL) files have not been created by anyone yet. For CN analysis, at least the UGP (genome positions) annotation data file needs to be there.

2: I tried to build the UGP file directly from the NetAffx annotation files using the code on your website, however I am getting the following error.

  ##
  library(aroma.affymetrix)
  ##
  ## create UGP from NetAffx files
  cdf <- AffymetrixCdfFile$byChipType("MOUSEDIVm520650")
  ##
  ## Creates an empty UGP file for the CDF, if missing.
  ugp <- AromaUgpFile$allocateFromCdf(cdf, tags=c("na30", "RI20100526"))
  ##
  ## Import NetAffx unit position data
  csv <- AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650", tags=".na30")
  Error in list(`AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650", tags = ".na30")` = environment, :
  [2010-05-26 15:11:00] Exception: File format error of the tabular file ('annotationData/chipTypes/MOUSEDIVm520650/NetAffx/MOUSEDIVm520650.na30.annot.csv'): \ line 1 did not have 12 elements
    at throw(Exception(...))
    at throw.default("File format error of the tabular file ('", getPathname(this), "'): ", ex$message)
    at throw("File format error of the tabular file ('", getPathname(this), "'): ", ex$message)
    at value[[3]](cond)
    at tryCatchOne(expr, names, parentenv, handlers[[1]])
    at tryCatchList(expr, classes, parentenv, handlers)
    at tryCatch({
    at verify.TabularTextFile(this, ...)
    at verify(this, ...)
    at this(...)
    at newInstance.Class(clazz, ...)
    at newInstance(clazz, ...)
    at newInstance.Object(static, pathname)
    at newInstance(static, pathname)
    at method(static, ...)
    at AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650", tags = ".na30")
  In addition: Warning message:
  In read.table(3L, header = TRUE, colClasses = c(NA_character_, NA_character_, : not all columns named in 'colClasses' exist

I had a look at the MOUSEDIVm520650.na30.annot.csv file. The line containing the column names, that is:

  Probe Set ID,dbSNP RS ID,Chromosome,Physical Position,Strand,Cytoband,Allele A,Allele B,Associated Gene,Genetic Map,Fragment Enzyme Type Length Start Stop,

contains a trailing comma (,) that shouldn't be there (a file format error). This causes R to think there should be 12 and not 11 columns in the data set. Open the file in an editor and remove that trailing comma and any whitespace after "Fragment Enzyme Type Length Start Stop". Then save the file. That should solve the problem. The other CSV file, MOUSEDIVm520650.cn.na30.annot.csv, does not have this problem.

3. I also tried the manual approach using the tab-delimited file, however it seems to me that the mitochondria probes are skipped (NA values in ugp[,1] but valid values in ugp[,2]).

The Affymetrix NetAffx CSV files use 'M' for the mitochondrial chromosome. In aroma we encode this by the integer 25.

Another problem is that some positions for other chromosomes are not loaded properly (valid values in ugp[,1] but NA values in ugp[,2]).

You don't show how you read the data manually, so it is hard to say what you are doing wrong here. But note that there are quite a few arguments in read.table() that you need to set correctly in order to read Affymetrix NetAffx CSV files (it doesn't make it easier that Affymetrix changes the file format once in a while and that the files have stray erroneous symbols such as the above comma).
Also, search our forum for 'MOUSEDIVm520650', because about a year ago David Rosenberg discussed this chip type and I think he did create various annotation data files for it. This was before the chip type was publicly announced by Affymetrix.

/Henrik

Here is the sessionInfo:

  R version 2.11.0 (2010-04-22)
  x86_64-unknown-linux-gnu

  locale:
   [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=C
   [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] aroma.affymetrix_1.6.0 aroma.apd_0.1.7 affxparser_1.20.0 R.huge_0.2.0 aroma.core_1.6.0 aroma.light_1.16.0
  [7] matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0 R.filesets_0.8.1 digest_0.4.2
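The two fixes suggested in this reply (removing the stray trailing comma from the header line, and encoding chromosome 'M' as 25) can also be done programmatically. The following is a minimal sketch in base R on a made-up three-column stand-in for a NetAffx CSV file; the real file has more columns and comment lines. Only M = 25 is stated in the thread; the codes X = 23 and Y = 24 are an assumption made here for illustration.

```r
# Toy stand-in for a NetAffx-style CSV; the header ends with a stray comma.
csv_lines <- c(
  "Probe Set ID,Chromosome,Physical Position,",
  "SNP_A,1,3000123",
  "SNP_B,X,5000456",
  "SNP_C,M,1234"
)

# Drop the trailing comma (and any trailing whitespace) from the header only.
csv_lines[1] <- sub(",\\s*$", "", csv_lines[1])

# Read everything as character first; convert explicitly afterwards.
df <- read.csv(text = csv_lines, colClasses = "character", check.names = FALSE)

# Map chromosome labels to integers.
# M = 25 is from the thread; X = 23 and Y = 24 are assumed here.
chr_map <- c(setNames(1:22, 1:22), X = 23L, Y = 24L, M = 25L)
df$chromosome <- unname(chr_map[df$Chromosome])
df$position <- as.integer(df[["Physical Position"]])

print(df[, c("Probe Set ID", "chromosome", "position")])
```

Labels that are missing from the lookup vector come out as NA, which is one way the NA values in the first UGP column described above could arise.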
Re: [aroma.affymetrix] Suggestions for multiple processing jobs of the same platform (Affymetrix SNP chips)
Hi.

On Tue, May 18, 2010 at 5:39 AM, Tae-Hoon Chung hoontaech...@gmail.com wrote:

Hi, All; I have a simple question: What's the best way of performing multiple processing jobs of the same platform (Affymetrix SNP chips)? My concerns are as follows:

(1) Many of the jobs involving Affymetrix SNP chips may access files in annotationData, and this may result in conflicts due to multiple jobs trying to access the same files in annotationData at the same time. Is this the case and is there any safeguard for this? If this is a real possibility, then what is the best way of avoiding this kind of trouble?

These are definitely valid concerns. The quick answer is that the aroma framework tries very hard to protect you against potential conflicts and to minimize the risk of getting invalid results from running parallel analyses on the same data set.

The data under annotationData/ is basically only read, which means any number of R sessions can access those files without conflicts. The only exception is when a so-called monocell CDF is created for a new CDF. This is only done once per CDF lifetime, so the risk of having two processes trying to create the same monocell CDF is very small. Still, there is a risk (some monocell CDFs take several minutes to generate), and in order to protect ourselves against corrupt monocell CDFs, they are created/written atomically (this is done by first writing to a temporary file, which is then renamed). For tiling array analysis, so-called unique CDFs are created in a similar fashion.

Likewise, any data sets under rawData/ should/can be considered read-only, meaning any number of R sessions can access them without conflicts. Again, there are exceptions, namely when the average signals across arrays are calculated (via getAverageFile()) or when the target distribution is calculated for quantile normalization; those kinds of result files are stored where the data set is located (which can be rawData/).
As above, all data files created in the aroma framework are generated/written in an atomic fashion, decreasing the risk of conflicts (and if they occur they are very likely to be detected).

In order to be completely protected against multiple (write) access to the same data files, there is a need for a formal synchronization mechanism. This turns out to be a very hard problem, especially if we want to support it on all operating and file systems out there. But for your information, we are working toward it and we take nothing for granted. See also the page on 'Future directions' [http://aroma-project.org/features/future/].

Finally, as long as you only analyze different data sets (or apply different methods to the same data set) you will be fine.

(2) Many (or most) of the jobs produce lots of intermediate files in the probeData/plmData folders, requiring a lot of disk access, and it seems like this takes up a lot of the computational resources of the machine, slowing down other jobs. Is this just my impression or is it what's really going on?

Yes, all intermediate results are stored in persistent memory, i.e. on the file system. The overhead from the actual I/O is not that big, but sure, it is significant. Note that in any analysis you have to read the data once and often write the results at least once. To this, the aroma framework adds the I/O for doing the same for the intermediate results.

One major bottleneck is when you fit the probe-level models (probe summarization), and it is mostly because data from multiple arrays are read and restructured into a list reflecting the structure of the CDF, then fitted, and finally unstructured to be written to separate files. The wrapping and unwrapping into nested CDF list structures is what takes time. If you look at the verbose output from fit() of a PLM, you can see that most of the writing time is spent on unwrapping/encoding the estimates.

For the more recent SNP CN chip types (GWS5, GWS6, ...)
that also have non-polymorphic CN units, we can speed up the fitting of those CN units a lot by fitting the PLM as:

  if (length(findUnitsTodo(plm)) > 0) {
    # Fit CN probes quickly (~5-10s/array + some overhead)
    units <- fitCnProbes(plm, verbose=verbose);
    str(units);
    # int [1:945826] 935590 935591 935592 935593 935594 935595 ...

    # Fit remaining units, i.e. SNPs (~5-10min/array)
    units <- fit(plm, verbose=verbose);
    str(units);
  }

(I noticed from your other message that you were looking at ACNE; I've updated the ACNE vignette to reflect the above, which should speed things up a lot.)

FYI, fitCnProbes() utilizes the knowledge that (many) CN units are single probes, which allows us to quickly fit those units without having to go through the wrapping/unwrapping into a CDF list structure. Analogously, one can optimize the processing of other common dimensions of SNP/CN units; I am slowly preparing for such a move, but it takes time because any algorithm/code has to be able to work with any existing and future CDF. If this is
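The write-to-a-temporary-file-then-rename pattern mentioned in this reply can be sketched in a few lines of base R. This is an illustration of the general technique only, not the aroma framework's own code; the function write_atomically() and the ".tmp" suffix are hypothetical names chosen for the example.

```r
# Write a file "atomically": other processes never see a half-written target.
write_atomically <- function(pathname, write_fun) {
  # Write to a temporary file next to the target (same file system).
  tmp <- paste0(pathname, ".tmp")
  if (file.exists(tmp)) stop("Temporary file already exists: ", tmp)
  write_fun(tmp)
  # Rename only after writing has completed successfully.
  if (!file.rename(tmp, pathname)) stop("Could not rename ", tmp)
  invisible(pathname)
}

# Usage: the target appears only once its content is complete.
target <- file.path(tempdir(), "example-result.txt")
unlink(target)
write_atomically(target, function(f) writeLines(c("a", "b"), f))
print(readLines(target))
```

If the R session is interrupted while write_fun() is running, only the temporary file is left behind, so a later run can detect the leftover instead of silently reading a truncated result.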
Re: [aroma.affymetrix] removal of bad quality chips from a big dataset
Hi.

On Sat, May 15, 2010 at 6:10 PM, Gabriele Zoppoli zopp...@gmail.com wrote:

Hi, I'm new here, and I'm sorry if I post obvious questions. I looked throughout the newsgroup and on the aroma.affymetrix web page, but I couldn't find the answer to my question, so here it is: I'm trying to analyze a 950-chip dataset from Wooster et al (318 cancer cell lines in triplicate, on average). So I followed the steps as on the web page, and arrived at the plotNuse and plotRle part. Nothing wrong so far, and I can clearly see that some arrays are outliers and possibly have poor quality, so I would like to remove them from further analyses. The issue is, I don't have a clue how to know which ones they are and how to remove them, because the plots are too crowded and I don't know how to see what is what and how to take it out.

The plotRle() and plotNuse() methods both take the argument 'arrays', which allows you to specify *which* arrays to display. This allows you to plot a smaller number of arrays per plot, which should make it possible for you to narrow down the arrays of interest. For example,

  plotRle(qam, arrays=1:50);
  plotNuse(qam, arrays=c(54,80:90,130:144));

Note also that if you are plotting to an image file, you can make it really wide to fit almost any number of arrays, and then use an image browser to scroll it:

  filename <- sprintf("%s,plotRle.png", getName(qam));
  arrays <- 1:200;
  png(filename, width=100+20*length(arrays), height=400);
  plotRle(qam, arrays=arrays);
  dev.off();

This way you should be able to graphically identify which arrays are bad. You can then build an index vector of all the arrays you wish to exclude, e.g.

  exclArrays <- c(4,54,57,98);

Then you can drop these from the data set as:

  ces <- extract(ces, -exclArrays);

The new 'ces' object will contain all but the excluded arrays.

FYI, there are also ways to grab the RLE statistics and identify bad arrays using them. The easiest way to do this is to get what plotRle()/plotNuse() returns, e.g.
  stats <- plotRle(qam, arrays=1:50);

where 'stats' will be a list of length length(arrays) containing boxplot statistics. See str(stats) for the output.

Hope this helps

Henrik

Some information:

  sessionInfo()
  R version 2.10.1 (2009-12-14)
  i386-pc-mingw32

  locale:
  [1] LC_COLLATE=English_United States.1252
  [2] LC_CTYPE=English_United States.1252
  [3] LC_MONETARY=English_United States.1252
  [4] LC_NUMERIC=C
  [5] LC_TIME=English_United States.1252

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
   [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.18.0
   [4] R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.16.0
   [7] matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0
  [10] R.filesets_0.8.1 digest_0.4.2 R.utils_1.4.0
  [13] R.oo_1.7.2 R.methodsS3_1.2.0

  loaded via a namespace (and not attached):
  [1] tools_2.10.1

About my data:

  print(qam)
  QualityAssessmentModel:
  Name: Dataset Wooster
  Tags: RBC,QN,RMA,QC
  Path: qcData/Dataset Wooster,RBC,QN,RMA,QC/HG-U133_Plus_2
  Chip-effect set:
  ChipEffectSet:
  Name: Dataset Wooster
  Tags: RBC,QN,RMA
  Path: plmData/Dataset Wooster,RBC,QN,RMA/HG-U133_Plus_2
  Platform: Affymetrix
  Chip type: HG-U133_Plus_2,monocell
  Number of arrays: 950
  Names: 1A2 _SS392785_HG-U133_Plus_2_HCHP-186915_, 1A2 _SS392786_HG-U133_Plus_2_HCHP-186916_, ..., YAPC_SS331347_HG-U133_Plus_2_HCHP-182915_
  Time period: [not reported if more than 500 arrays]
  Total file size: 546.67MB
  RAM: 0.84MB
  Parameters: (probeModel: chr pm)
  RAM: 0.00MB

And a final question (a very stupid one, I'm sure): once I have finished my quality controls, how do I average technical replicates?

Thank you, and I beg your pardon for any silly question

Gabriele
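As a rough illustration of using boxplot statistics to flag arrays numerically rather than by eye, here is a sketch in base R on simulated values. It assumes that each element of the returned list resembles the output of boxplot.stats(), i.e. has a five-number summary in its 'stats' field; check str(stats) on the real object, since that structure is an assumption here. The cutoff of twice the median spread is arbitrary.

```r
set.seed(1)

# Simulated RLE-like values for 6 hypothetical arrays; array 4 is much noisier.
rle_values <- lapply(1:6, function(i) {
  rnorm(500, mean = 0, sd = if (i == 4) 0.6 else 0.1)
})

# One boxplot summary per array (assumed to resemble what plotRle() returns).
stats <- lapply(rle_values, boxplot.stats)

# Five-number summary: elements 2 and 4 are the hinges, element 3 the median.
spread <- sapply(stats, function(s) s$stats[4] - s$stats[2])

# Flag arrays whose spread is far above the typical spread (arbitrary cutoff).
exclArrays <- which(spread > 2 * median(spread))
print(exclArrays)
```

The resulting index vector can then be used in the same way as the hand-picked exclArrays in the reply above.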
[aroma.affymetrix] aroma.affymetrix v1.6.0 released
Hi all,

aroma.affymetrix and friends have been updated and are now being rolled out to the CRAN servers. It is highly recommended to update:

  source("http://aroma-project.org/hbLite.R");
  hbInstall("aroma.affymetrix");

This update follows the April releases of R v2.11.0 and Bioconductor v2.6, which we also recommend using with the aroma framework.

In this release we have added further protection against ending up with partially written data files due to an abruptly terminated R session. There were also some bug fixes, mainly due to changes in the new release of Bioconductor that broke some existing methods, e.g. SNPRMA and CRLMM. Thanks to all users for reporting bugs and other potential issues. In addition, we have better (although not perfect) support for gcRMA on more chip types. Affymetrix's recent SNP & CN chip type Cytogenetics_Array is also better supported. For other updates and more details, see the end of this message.

Documentation keeps getting added to the http://www.aroma-project.org/ website. As before, any kind of contribution to it is greatly appreciated.

Cheers,

Henrik & co-developers

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to aroma.affymetrix
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Version: 1.6.0 [2010-05-14]
o Package submitted to CRAN.
o Package passes system and redundancy tests.
o Package passes R CMD check on R v2.11.0 and v2.12.0 devel.

Version: 1.5.9 [2010-05-13]
o SPEED UP: Now the constructor AllelicCrosstalkCalibration() is set to recognize the Cytogenetics_Array chip type. This avoids having to scan the CDF for unit types and check for SNPs, which is slow and not really wanted for a constructor function.
o ROBUSTNESS: Added a redundancy test of CRMA v1.5 for the Cytogenetics_Array chip type.
o ROBUSTNESS: Now fromDataFile() of ChipEffectFile and FirmaFile, as well as convertToUnique() of AffymetrixCelSet, allocate/create data files atomically. As elsewhere, this is done by first creating and writing to a temporary file, which is renamed once it is complete. This lowers the risk of generating incomplete files.
o CLEAN UP/DEPRECATED: AffymetrixCelSet$createBlankSet() was removed, because it has not been used anywhere since 2007.
o BUG FIX: convertToUnique() for AffymetrixCelSet would not recognize Windows Shortcut links.

Version: 1.5.8 [2010-05-09]
o Made justSNPRMA(..., normalizeSNPsOnly="auto") for AffymetrixCelSet the default.
o Now all findUnitsTodo() for data sets check the data file that comes last in a lexicographic ordering. This is now consistent with how the summarization methods update the files. Before, it used to be the one that is last in the data set.
o Now all updateUnits() for data sets update the data files in lexicographic order.
o Now CrlmmModel(..., recalibrate=TRUE) is the default.
o Now justSNPRMA(..., returnESet=TRUE) for AffymetrixCelSet returns an AlleleSet due to updates in oligo v1.12.0.
o Added extractAlleleSet() to SnpChipEffectSet. Replaces extractSnpQSet(), because the SnpQSet class was dropped in oligo v1.12.0 and replaced by the AlleleSet class.
o BUG FIX: fit() of CrlmmModel would not work with oligo v1.12.0 and newer.
o BUG FIX: getCallSet() and getCrlmmParametersSet() of CrlmmModel used the non-existing verbose object 'log' instead of 'verbose'.

Version: 1.5.7 [2010-04-22]
o Added groupUnitsByDimension() to AffymetrixCdfFile.
o ROBUSTNESS: Added redundancy tests for doCRMAv2() and writeDataFrame().
o BUG FIX: doCRMAv1() for AffymetrixCelSet used undefined 'csN' internally instead of 'csC'.

Version: 1.5.6 [2010-04-15]
o BUG FIX: computeAffinities(..., verbose=FALSE) of AffymetrixCdfFile would throw "Error in reset(pb) : object 'pb' not found". Thanks Stephen ? at Mnemosyne BioSciences, Finland, for this report.

Version: 1.5.5 [2010-04-07]
o ROBUSTNESS: Added a test script for gcRMA background correction on the MoEx-1_0-st-v1 chip type.

Version: 1.5.4 [2010-04-06]
o Added an internal version of doCRMAv1().
o Added argument 'plm' to existing doCRMAv2().

Version: 1.5.3 [2010-03-31]
o Updated getProbeSequenceData() for AffymetrixCdfFile to recognize more NetAffx probe-tab files, e.g. MoEx-1_0-st-v1.probe.tab.
o KNOWN ISSUES: getProbeSequenceData() for AffymetrixCdfFile requires that the unit names in the probe-tab file match the ones in the CDF. This may cause issues if custom CDFs with custom unit names are used. This is another reason why we should move away from probe-tab files and instead use aroma binary cell sequence files.

Version: 1.5.2 [2010-03-26]
o Added argument 'defValue' to createFrom() for AffymetrixCelFile so that one can specify the default value for cleared elements.

Version: 1.5.1 [2010-03-14]
o BUG FIX: allocateFromCdf() of AromaCellCpgFile, AromaCellPositionFile, and AromaCellMatchScoreFile would drop all but the first tag.

- - - - - - - - - - - - - - - - - - - - - - -
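The atomic file creation mentioned in the v1.5.9 notes (write to a temporary file, rename it when complete) can be sketched in base R roughly as follows. This only illustrates the idea and is not the package's implementation; the helper name writeAtomically() and the way the ".tmp" suffix is handled are assumptions made for this example.

```r
# Hypothetical helper (not part of aroma.affymetrix): write 'lines' to
# 'pathname' so that a half-written file never shows up under the final name.
writeAtomically <- function(lines, pathname) {
  pathnameT <- paste0(pathname, ".tmp")

  # Refuse to continue if a leftover temporary file already exists
  if (file.exists(pathnameT)) {
    stop("Temporary file already exists: ", pathnameT)
  }

  # Write everything to the temporary file first
  writeLines(lines, con = pathnameT)

  # Rename only after the writing has completed
  if (!file.rename(pathnameT, pathname)) {
    stop("Failed to rename temporary file: ", pathnameT)
  }

  invisible(pathname)
}

# Example usage with a file in the session's temporary directory
pathname <- file.path(tempdir(), "example-atomic.txt")
writeAtomically(c("line 1", "line 2"), pathname)
```

If the R session dies while writing, only the *.tmp file is left behind, which is easy to detect and remove before rerunning.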
Re: [aroma.affymetrix] Re: CRMA v2 errors
Hi.

On Fri, May 14, 2010 at 10:36 AM, Markus Leber leber.mar...@gmx.de wrote:
> Dear Henrik, thank you very much for your support. You are right. I am sorry that I didn't notice this problem. The step "Calibration for crosstalk between allele probe pairs" works without problems now. Unfortunately, at the beginning of the step "Normalization for nucleotide-position probe sequence effects" I noticed another error. First I initialize with:
>
>   bpn <- BasePositionNormalization(csC, target="zero")
>   print(bpn)
>
> Afterwards I call:
>
>   csN <- process(bpn, verbose=verbose)
>
> Within this procedure I noticed this error:
>
> ...
> 20100513 23:03:38| Storing normalized data...
> 20100513 23:03:38| Temporary pathname: probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
> 20100513 23:03:38| Creating CEL file for results, if missing...
> 20100513 23:03:38| Creating CEL file...
> 20100513 23:03:38| Chip type: GenomeWideSNP_6,Full
> 20100513 23:03:38| Pathname: probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
> 20100513 23:03:38| Method 'copy'...
> 20100513 23:03:38| Copying file...
> Error in list(`process(bpn, verbose = verbose)` = environment, `process.AbstractProbeSequenceNormalization(bpn, verbose = verbose)` = environment, :
>
> I printed the workflow output of this session in the file CRMAv2_Error.txt, which is attached. Do you have experience with this error, or do you have an idea about the reason for this problem?

It's good that you attach logs, especially when they are really long. However, there is an advantage to pasting the error message and traceback into the message itself, because then it will be found when others search the archives for similar messages. In your case, if you had included a little bit more of the error message, that would have been enough for most of us to immediately spot what is going on.
From the end of your CRMAv2_Error.txt log:

20100513 23:03:38|Storing normalized data...
20100513 23:03:38| Temporary pathname: probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
20100513 23:03:38| Creating CEL file for results, if missing...
20100513 23:03:38| Creating CEL file...
20100513 23:03:38| Chip type: GenomeWideSNP_6,Full
20100513 23:03:38| Pathname: probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
20100513 23:03:38| Method 'copy'...
20100513 23:03:38|Copying file...
Error in list(`process(bpn, verbose = verbose)` = environment, `process.AbstractProbeSequenceNormalization(bpn, verbose = verbose)` = environment, :
[2010-05-13 23:03:38] Exception: Failed to copy file. Temporary copy file exists: probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp.tmp.tmp
 at throw(Exception(...))
 at throw.default(Failed to copy file. Temporary copy file exists: , tmpPathn
 at throw(Failed to copy file. Temporary copy file exists: , tmpPathname)
 at copyFile.default(getPathname(this), pathname, overwrite = overwrite, verbos
 at copyFile(getPathname(this), pathname, overwrite = overwrite, verbose = less
 at copyTo.GenericDataFile(this, filename = tmpPathname, path = NULL, verbose =
 at copyTo(this, filename = tmpPathname, path = NULL, verbose = less(verbose))
 at createFrom.AffymetrixCelFile(df, filename = pathnameT, path = NULL, verbose
 at createFrom(df, filename = pathnameT, path = NULL, verbose = less(verbose))
 at process.AbstractProbeSequenceNormalization(bpn, verbose = verbose)
 at process(bpn, verbose
In addition: Warning message:
In log2(y) : NaNs produced
20100513 23:03:39|Copying file...done
[...]

See that traceback? That is really useful, because it tells us which commands have been called internally and in which function the error occurs.
EXPLANATION: The error message tries to be as clear as possible about what the problem is, even though it does not suggest how to solve it (that is a long-term wish I have for the aroma framework). This error is thrown in order to protect you, by telling you that there seems to be an existing temporary file that was created but never completed. It can come either from running the same script simultaneously/in parallel on a different machine with access to the same file directory, or from having interrupted a previous run, leaving a half-written file. I suspect the latter is the case for you; did you run it before and then interrupt it and restart it?

SOLUTION: Go to the probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/ directory and delete any files with file extension *.tmp (or *.tmp.tmp, *.tmp.tmp.tmp and so on). Then restart the script. You can keep
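The clean-up described under SOLUTION can also be done from within R. Here is a minimal sketch using only base R; the helper name removeTmpFiles() is made up for this example and is not part of the aroma framework. It is probably wise to look at the list of matched files before deleting anything.

```r
# Hypothetical helper: list (and optionally delete) leftover temporary files,
# i.e. files whose names end in .tmp, .tmp.tmp, .tmp.tmp.tmp and so on.
removeTmpFiles <- function(path, dryRun = TRUE) {
  pathnames <- list.files(path, pattern = "[.]tmp$", full.names = TRUE)
  if (!dryRun) {
    file.remove(pathnames)
  }
  pathnames
}

# Usage with the directory from the error message above:
# path <- "probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6"
# print(removeTmpFiles(path))            # inspect what would be removed
# removeTmpFiles(path, dryRun = FALSE)   # actually remove the files
```

The default dryRun=TRUE is a deliberate choice for the sketch, so that nothing is removed until the list has been checked.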
Re: [aroma.affymetrix] Re: CRMA v2 errors
Hi.

On Wed, May 12, 2010 at 12:23 PM, Smaug72 leber.mar...@gmx.de wrote:
> Dear Henrik, thank you very much for your reply. Again I installed R and CRMA v2 on a new virtual machine (Suse 11.2), so that I can step back if necessary.

To get the terms correct: you installed the aroma.affymetrix package. CRMAv2 is a statistical method, not a piece of software.

> This time I received 21 warnings after the installation process. But no error was detected. Nevertheless the same error (unexpected symbol in "array <- 1xlim") occurred.

Did you read my reply? The statement that gives the error should be two statements on two different lines of code. Again, the *only* thing you have to fix is:

  array <- 1
  xlim <- c(-500,15000)

> You propose to test the 6 CEL files mentioned in the vignette. But it seems the six CEL files (NA06985.CEL, ..., NA07019.CEL) can't be downloaded from the web, or?

No, because I/we do not have the right to redistribute those data files.

> It seems that they belong to the Genome-Wide Human SNP Array 6.0 Sample Data Set, which consists of 3 DVDs and must be ordered from Affymetrix?

That is one of many possible sources. The example data set is using HapMap samples. There are a few data sets for this out there. As long as you get the CEL files, you should be fine. Under http://aroma-project.org/node/51 you find links to the HapMap Consortium, which also provides (individual) CEL files for download.

> So far I use a dataset from the Broad Institute: http://www.broadinstitute.org/mpg/birdsuite/download.html - birdsuite_test_inputs_1.5.3.tgz
> Do you think the reason for the error can be a false integration of the CEL files?

No; please read my previous reply - again, the error has nothing to do with your CEL files; it is only those two lines of code that you have to fix.

> I thought the annotation data and raw data are checked within the previous procedure?

Not sure what the previous procedure is, but if you mean the part of the script above:

  array <- 1
  xlim <- c(-500,15000)

then yes. You seem to have got it right.

/Henrik

> Thank you,
> Markus
Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm
(units)
+
+ ## Fit remaining units, i.e. SNPs (~5-10min/array)
+ units <- fit(plm, verbose=verbose)
+ str(units)
+ }

 *** caught segfault ***
address 0x10ae8d020, cause 'memory not mapped'

Traceback:
 1: .Call(R_affx_get_cel_file, filename, readHeader, readIntensities, readXY, readXY, readPixels, readStdvs, readOutliers, readMasked, indices, as.integer(verbose), PACKAGE = affxparser)
 2: readCel(getPathname(this), indices = idxs, readIntensities = FALSE, readStdvs = TRUE, readPixels = FALSE)
 3: findUnitsTodo.ChipEffectFile(ce, ...)
 4: findUnitsTodo(ce, ...)
 5: findUnitsTodo.ChipEffectSet(ces, verbose = verbose, ...)
 6: findUnitsTodo(ces, verbose = verbose, ...)
 7: findUnitsTodo.ProbeLevelModel(plm)
 8: findUnitsTodo(plm)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

The session info is as follows:

print(sessionInfo())
R version 2.11.0 (2010-04-22)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.20.0
[4] R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.16.0
[7] matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0
[10] R.filesets_0.8.1 digest_0.4.2 R.utils_1.4.0
[13] R.oo_1.7.2 R.methodsS3_1.2.0

loaded via a namespace (and not attached):
[1] tools_2.11.0

Warning message:
'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible

TH

2010/5/10 Henrik Bengtsson h...@stat.berkeley.edu
> Ok. This could be an issue with affxparser and 64-bit OSX; recent problem reports with affxparser have been with 64-bit OSX.
> [...]
Re: [aroma.affymetrix] CRMA v2 errors
Hi.

On Tue, May 11, 2010 at 2:42 PM, Smaug72 leber.mar...@gmx.de wrote:
> Dear Henrik Bengtsson, we would like to use CRMA v2 for our work. Unfortunately we receive errors. First, we use a Linux system (Suse 11.2). I installed R (version 2.11.0) without any problems. Afterwards I followed the instructions on your webpage (http://aroma-project.org/install) to install CRMA v2. After the installation process I received this warning:
>
> In packageDescription(pkg) : no package 'DNAcopy' was found

The DNAcopy package is not needed until you do segmentation.

> Nevertheless the program works fine at the beginning. I followed your instructions on this page: http://aroma-project.org/vignettes/CRMAv2
> The analysis startup and the declaration of the raw data set work without errors. The section "Step 1 - Calibration for crosstalk between allele probe pairs" also works until I come to the command:
>
> array <- 1xlim <- c(-500,15000)
> Error: unexpected symbol in "array <- 1xlim"

That is a cut'n'paste error where a newline went missing; that web page now reads:

  array <- 1
  xlim <- c(-500,15000)

just as it does a few lines down. This is the only cause of your problems.

> plotAllelePairs(acc, array=array, pairs=1:6, what="input", xlim=xlim/3)
> Error in list(`plotAllelePairs(acc, array = array, pairs = 1:6, what = input, xlim = xli` = environment, :
> [2010-05-11 10:40:58] Exception: Argument 'array' is not a vector: function
>  at throw(Exception(...))
>  at throw.default(sprintf(Argument '%s' is not a vector: %s, .name, storage.m
>  at throw(sprintf(Argument '%s' is not a vector: %s, .name, storage.mode(x)))
>  at getVector.Arguments(static, x, ..., .name = .name)
>  at getVector(static, x, ..., .name = .name)
>  at getNumerics.Arguments(static, ..., asMode = integer, disallow = disallow)
>  at getNumerics(static, ..., asMode = integer, disallow = disallow)
>  at getIntegers.Arguments(static, x, ..., range = range, .name = .name)
>  at getIntegers(static, x, ..., range = range, .name = .name)
>  at getIndices.Arguments(static, ..., length = length)
>  at getIndices(static, ..., length = length)
>  at method(static, ...)
>  at Arguments$getIndex(array, range = c(1, Inf))
>  at plotAllelePairs.AllelicCrosstalkCalibration(acc, array = array, pairs = 1:6
>  at plotAllelePairs(acc, array = array, pairs = 1:6, what = input, xlim
> ..
>
> First I receive the error: unexpected symbol in "array <- 1xlim". Do you have experience with this error? I don't know whether the following error, "Argument 'array' is not a vector", occurs as a consequence of the first error. I removed the ^M symbols in the CEL files. But this error occurs with ^M and without it.

Not necessary - a good rule of thumb is that if you have to mess with your raw data files, you are probably doing something wrong.

> Do you know an accurate dataset which we can use as a basis to test the software on a unix system?

Any Affymetrix data set will work the same regardless of operating system. The example HapMap data set used in the vignette should work.

Hope this helps

Henrik

> I hope you find some time to answer my request. Thanks in advance.
>
> Cheers
> Markus
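The question of whether the second error is a consequence of the first can be checked with a few lines of base R. This is only an illustrative sketch (the variable names 'fused' and 'res' are made up here): the fused line does not parse, so 'array' is never assigned, and the name then resolves to R's built-in function array(), which matches the "is not a vector: function" part of the message.

```r
# The line as pasted, with the newline missing: it does not parse
fused <- "array <- 1xlim <- c(-500,15000)"
res <- tryCatch(parse(text = fused), error = function(e) "parse error")
print(res)

# Nothing was assigned, so 'array' still refers to the base function
print(is.function(array))

# The two statements on separate lines parse and run fine
array <- 1
xlim <- c(-500, 15000)
print(is.function(array))
```

After the two assignments, 'array' is an ordinary numeric value and can be passed as array=array.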
Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm
What does

  print(sessionInfo());

report after doing library(aroma.affymetrix)?

/Henrik

On Mon, May 10, 2010 at 4:47 AM, Chung Tae-Hoon hoontaech...@gmail.com wrote:
> Hi, All;
>
> I was trying to process Affymetrix 250K Sty SNP chip data from the HapMap project using the CRMAv2 algorithm. I was following the vignette on the web. It worked smoothly until I got a segfault error while fitting the PLM, as follows:
>
> ## All annotation data file verification worked out fine!
> cdf <- AffymetrixCdfFile$byChipType("Mapping250K_Sty")
> print(cdf) ## worked fine.
> gi <- getGenomeInformation(cdf)
> print(gi) ## worked fine.
> si <- getSnpInformation(cdf)
> print(si) ## worked fine.
> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf))
> print(acs) ## worked fine.
>
> ## step 1. declaring raw data set
> csR <- AffymetrixCelSet$byName("HapMap500K,Sty", cdf=cdf)
> ## print (csR) ## worked fine
> ## AffymetrixCelSet:
> ## Name: HapMap500K
> ## Tags: Sty
> ## Path: rawData/HapMap500K,Sty/Mapping250K_Sty
> ## Platform: Affymetrix
> ## Chip type: Mapping250K_Sty
> ## Number of arrays: 270
> ## Names: NA06985, NA06991, ..., NA19240
> ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56
> ## Total file size: 16918.81MB
> ## RAM: 0.35MB
>
> ## step 2. processing data
> ##--- Processing step 1. calibration for crosstalk between allele probe pairs
> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")
> ## print (acc)
> ## AllelicCrosstalkCalibration:
> ## Data set: HapMap500K
> ## Input tags: Sty
> ## User tags: *
> ## Asterisk ('*') tags: ACC,-XY
> ## Output tags: Sty,ACC,-XY
> ## Number of files: 270 (16918.81MB)
> ## Platform: Affymetrix
> ## Chip type: Mapping250K_Sty
> ## Algorithm parameters: (rescaleBy: chr groups, targetAvg: num [1:2] 2200 2200, subsetToAvg: chr -XY, mergeShifts: logi TRUE, B: int 1, flavor: chr sfit, algorithmParameters:List of 3, ..$ alpha: num [1:8] 0.1 0.075 0.05 0.03 0.01 0.0025 0.001 0.0001, ..$ q: num 2, ..$ Q: num 98)
> ## Output path: probeData/HapMap500K,Sty,ACC,-XY/Mapping250K_Sty
> ## Is done: FALSE
> ## RAM: 0.01MB
> csC <- process(acc, verbose=verbose)
> ## print(csC)
> ## AffymetrixCelSet:
> ## Name: HapMap500K
> ## Tags: Sty,ACC,-XY
> ## Path: probeData/HapMap500K,Sty,ACC,-XY/Mapping250K_Sty
> ## Platform: Affymetrix
> ## Chip type: Mapping250K_Sty
> ## Number of arrays: 270
> ## Names: NA06985, NA06991, ..., NA19240
> ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56
> ## Total file size: 16918.81MB
> ## RAM: 0.35MB
>
> ##--- Processing step 2. Normalization for nucleotide-position probe sequence effects
> bpn <- BasePositionNormalization(csC, target="zero")
> ## print (bpn)
> ## BasePositionNormalization:
> ## Data set: HapMap500K
> ## Input tags: Sty,ACC,-XY
> ## User tags: *
> ## Asterisk ('*') tags: BPN,-XY
> ## Output tags: Sty,ACC,-XY,BPN,-XY
> ## Number of files: 270 (16918.81MB)
> ## Platform: Affymetrix
> ## Chip type: Mapping250K_Sty
> ## Algorithm parameters: (unitsToFit: chr -XY, typesToFit: chr pm, unitsToUpdate: NULL, typesToUpdate: chr pm, shift: num 0, target: chr zero, model: chr smooth.spline, df: int 5)
> ## Output path: probeData/HapMap500K,Sty,ACC,-XY,BPN,-XY/Mapping250K_Sty
> ## Is done: FALSE
> ## RAM: 0.01MB
> csN <- process(bpn, verbose=verbose)
> ## print (csN)
> ## AffymetrixCelSet:
> ## Name: HapMap500K
> ## Tags: Sty,ACC,-XY,BPN,-XY
> ## Path: probeData/HapMap500K,Sty,ACC,-XY,BPN,-XY/Mapping250K_Sty
> ## Platform: Affymetrix
> ## Chip type: Mapping250K_Sty
> ## Number of arrays: 270
> ## Names: NA06985, NA06991, ..., NA19240
> ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56
> ## Total file size: 16918.81MB
> ## RAM: 0.35MB
>
> ##--- Processing step 3. Probe summarization
> plm <- RmaCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
> ## print (plm)
> ## RmaCnPlm:
> ## Data set: HapMap500K
> ## Chip type: Mapping250K_Sty
> ## Input tags: Sty,ACC,-XY,BPN,-XY
> ## Output tags: Sty,ACC,-XY,BPN,-XY,RMA,A+B
> ## Parameters: (probeModel: chr pm; shift: num 0; flavor: chr affyPLM; treatNAsAs: chr weights; mergeStrands: logi TRUE; combineAlleles: logi TRUE).
> ## Path: plmData/HapMap500K,Sty,ACC,-XY,BPN,-XY,RMA,A+B/Mapping250K_Sty
> ## RAM: 0.00MB
> if (length(findUnitsTodo(plm)) > 0) {
>   ## Fit CN probes quickly (~5-10s/array + some overhead)
>   units <- fitCnProbes(plm, verbose=verbose)
>   str(units)
>   ## Fit remaining units, i.e. SNPs (~5-10min/array)
>   units <- fit(plm, verbose=verbose)
>   str(units)
> }
>
>  *** caught segfault ***
> address 0x104fd2020, cause 'memory not mapped'
>
> Traceback:
>  1: .Call(R_affx_get_cel_file, filename, readHeader, readIntensities, readXY, readXY, readPixels, readStdvs, readOutliers, readMasked, indices, as.integer(verbose), PACKAGE = affxparser)
>  2: readCel(getPathname(this), indices = idxs, readIntensities = FALSE, readStdvs = TRUE,
Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm
Ok. This could be an issue with affxparser and 64-bit OSX; recent problem reports with affxparser have been with 64-bit OSX.

You are running R v2.10.x, which is outdated. The new stable release of R is R v2.11.x. I recommend that you update, because the rest of the community has already moved on and any bug fixes to R and packages will be for R v2.11.x. Updating will give you access to newer versions of packages, including affxparser v1.20.0. There have been some fixes to affxparser, and with some luck they solve your problem.

If you update R, then just rerun the aroma installation:

  source("http://aroma-project.org/hbLite.R");
  hbInstall("aroma.affymetrix");

and several other packages will also be updated.

[ If you're really stuck with R v2.10.x, you could try installing affxparser v1.20.0 as:

  source("http://aroma-project.org/hbLite.R");
  biocLite("affxparser", rver="2.11.0");

but I really recommend updating R. ]

Let's see if that solves your problem. If not, we have to do some more troubleshooting...

/Henrik

On Mon, May 10, 2010 at 9:37 AM, Chung Tae-Hoon hoontaech...@gmail.com wrote:
> I'm sorry I forgot to provide the necessary information.
>
> .libPaths("/Library/Frameworks/R.framework/Versions/2.10/Resources/library64")
> library(aroma.affymetrix)
> Loading required package: R.utils
> Loading required package: R.oo
> Loading required package: R.methodsS3
> R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3 for help.
> R.oo v1.7.1 (2010-03-17) successfully loaded. See ?R.oo for help.
> R.utils v1.4.0 (2010-03-24) successfully loaded. See ?R.utils for help.
> Loading required package: R.filesets
> Loading required package: digest
> R.filesets v0.8.0 (2010-02-22) successfully loaded. See ?R.filesets for help.
> Loading required package: aroma.core
> Loading required package: R.cache
> R.cache v0.3.0 (2010-03-13) successfully loaded. See ?R.cache for help.
> Loading required package: R.rsp
> R.rsp v0.3.6 (2009-09-16) successfully loaded. See ?R.rsp for help.
> Type browseRsp() to open the RSP main menu in your browser.
> Loading required package: matrixStats
> matrixStats v0.2.1 (2010-04-05) successfully loaded. See ?matrixStats for help.
> Loading required package: aroma.light
> aroma.light v1.15.1 (2009-11-01) successfully loaded. See ?aroma.light for help.
> aroma.core v1.5.0 (2010-02-22) successfully loaded. See ?aroma.core for help.
> Loading required package: aroma.apd
> Loading required package: R.huge
> R.huge v0.2.0 (2009-10-16) successfully loaded. See ?R.huge for help.
> Loading required package: affxparser
> aroma.apd v0.1.7 (2009-10-16) successfully loaded. See ?aroma.apd for help.
> aroma.affymetrix v1.5.0 (2010-02-22) successfully loaded. See ?aroma.affymetrix for help.
> Patching /Users/thchung/.Rpatches/aroma.affymetrix/20100331/AffymetrixCdfFile.getProbeSequenceData.R
>
> print(sessionInfo())
> R version 2.10.1 Patched (2010-02-01 r51089)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.18.0
> [4] R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.15.1
> [7] matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0
> [10] R.filesets_0.8.0 digest_0.4.2 R.utils_1.4.0
> [13] R.oo_1.7.1 R.methodsS3_1.2.0
>
> I am using 64-bit R-2.10.1 on Mac OS X.
>
> TH
>
> -Original Message-
> From: aroma-affymetrix@googlegroups.com [mailto:aroma-affymet...@googlegroups.com] On Behalf Of Henrik Bengtsson
> Sent: Monday, May 10, 2010 3:08 PM
> To: aroma-affymetrix
> Subject: Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm
>
> What does print(sessionInfo()); report after doing library(aroma.affymetrix)?
>
> /Henrik
>
> On Mon, May 10, 2010 at 4:47 AM, Chung Tae-Hoon hoontaech...@gmail.com wrote:
> > Hi, All; I was trying to process Affymetrix 250K Sty SNP Chip of HapMap project using CRMAv2 algorithm.
> > [...]
Re: [aroma.affymetrix] error with extractSnpQSet
Hi,

before doing anything else, please provide what print(sessionInfo()) reports.

/Henrik

On Wed, Apr 28, 2010 at 12:43 PM, Nolwenn Le Meur nlem...@gmail.com wrote:
> Hi everyone,
>
> I am trying to analyze pooling-based GWAS data (I am used to expression data but new to the GWAS field). I have 2 datasets, from Illumina 610SNP and Affymetrix 250K_Nsp. I have started with the Affy one, but I am not sure my preprocessing is valid. I followed the exchange between Marco and Henrik for a start, and now I would like to compute genotype calls using the CRLMM model, but I can't make it run. Here is my script and the errors:
>
> library(aroma.affymetrix)
> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> name <- "moins60-1"
> chipType <- c("Mapping250K_Nsp")
> ## verify cdf
> cdf <- AffymetrixCdfFile$byChipType(chipType)
> ## sequence
> acs <- AromaCellSequenceFile$byChipType(chipType)
> ## read in cel
> cs <- AffymetrixCelSet$byName(name, chipType=chipType)
> ## normalization (note: should do something specific because pooled data?)
> cn <- justSNPRMA.AffymetrixCelSet(cs, normalizeToHapmap=TRUE, returnESet=FALSE, verbose=log)
> ## Genotype call (does not seem to work)
> crlmm <- CrlmmModel(cn, tags="*,oligo")
> ## ... (I did not copy the whole log because of its length)
> ..$ : chr [1:18] m601-1 m601-2 m602-1 m602-2 ...
> 20100428 12:28:09| Extracting data...done
> 20100428 12:28:09| Ordering unit groups to be (sense, antisense)...
> 20100428 12:28:09| Swapping elements: int [1:1918] 20 58 59 100 109 178 183 207 227 237 ...
> 20100428 12:28:09| Ordering unit groups to be (sense, antisense)...done
> 20100428 12:28:09| Allocate and populate SnpQSet...
> Error in getClass(Class, where = topenv(parent.frame())) : SnpQSet is not a defined class
> 20100428 12:28:09| Allocate and populate SnpQSet...done
> 20100428 12:28:09| Extracting data...done
> 20100428 12:28:09| Chunk #1 of 7...done
> 20100428 12:28:09|Calling genotypes by CRLMM...done
>
> units3 <- fit(crlmm, ram="oligo", verbose=log)
> ## ... same
> 20100428 12:35:35| Swapping elements: int [1:1918] 20 58 59 100 109 178 183 207 227 237 ...
> 20100428 12:35:35| Ordering unit groups to be (sense, antisense)...done
> 20100428 12:35:35| Allocate and populate SnpQSet...
> Error in getClass(Class, where = topenv(parent.frame())) : SnpQSet is not a defined class
> 20100428 12:35:35| Allocate and populate SnpQSet...done
> 20100428 12:35:35| Extracting data...done
> 20100428 12:35:35| Chunk #1 of 7...done
> 20100428 12:35:35|Calling genotypes by CRLMM...done
>
> str(units3)
> Error in str(units3) : object 'units3' not found
>
> and all calls are NA. I also tried extractSnpQSet() directly, with the same result:
>
> ## Extract SNPQSet require oligo
> snpqset <- extractSnpQSet(cn)
> Error in getClass(Class, where = topenv(parent.frame())) : SnpQSet is not a defined class
>
> Any help or suggestion is appreciated.
>
> Nolwenn
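[Editor's note] The thread does not settle why "SnpQSet is not a defined class" is raised. As a generic diagnostic only, the following base-R sketch shows how one can check whether a formal (S4) class is currently defined before trying to instantiate it. The class names and the helper functions hasS4Class() and makeIfDefined() are hypothetical and are not part of aroma.affymetrix or oligo.

```r
library(methods)

# TRUE if a formal (S4) class with this name is currently defined.
# hasS4Class() is a hypothetical helper wrapping methods::isClass().
hasS4Class <- function(name) isClass(name)

# Illustration with a throwaway class (hypothetical, not from oligo).
setClass("DemoQSet", representation(values = "numeric"))
hasS4Class("DemoQSet")       # defined just above
hasS4Class("NoSuchQSetXyz")  # not defined anywhere

# Guarding the constructor call gives a more actionable message than the
# bare error from getClass().
makeIfDefined <- function(name, ...) {
  if (!hasS4Class(name)) {
    stop("S4 class '", name, "' is not defined; ",
         "is the package that provides it installed and loaded?")
  }
  new(name, ...)
}

obj <- makeIfDefined("DemoQSet", values = c(1, 2, 3))
```

In a real session one would call hasS4Class("SnpQSet") after loading the packages involved, and include the result alongside sessionInfo() in the report.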
Re: [aroma.affymetrix] quantile normalisation - what to expect?
Hi.

On Thu, Apr 22, 2010 at 9:26 PM, mike dewar mikede...@gmail.com wrote:
> Hi,
>
> I'm trying to normalize data generated by the immunological genome project (immgen.org). They have released raw data for 128 arrays and I would like to preprocess their data. I'm very new to this, so apologies for any obvious gaffes in what I'm about to show you.

No need to apologize; we're all learning new things all the time.

> I have been using aroma.affymetrix to preprocess the data, and the whole process occurs without error. However, when it comes time to look at differential expression, I'm finding that nearly /everything/ is differentially expressed, leading me to suspect that I'm doing some preprocessing wrong.
>
> My question is this: after having preprocessed the data, should each of my arrays be similarly distributed? For example, if I plot my data on a QQ-plot, should it lie along the line y=x?

Yes, that's a correct expectation. Depending on exactly how the quantiles are normalized, you expect them to either be exactly on y = x, or scattered around it (with or without tails behaving slightly off the line).

> The code I'm using for the preprocessing is (pretty much copied verbatim from the aroma website):
>
> cdf <- AffymetrixCdfFile$byChipType('MoGene-1_0-st-v1',tags='r3')
> cs <- AffymetrixCelSet$byName(GEOid,cdf=cdf)
> # background correction
> bc <- RmaBackgroundCorrection(cs)
> csBC <- process(bc,verbose=verbose)
> # normalise
> qn <- QuantileNormalization(csBC)
> csN <- process(qn, verbose=verbose)

First, here you are using the default settings, which means that *all* probes on the array are used in the estimation and normalization. You can also tell it to normalize PMs only etc. In your case you probably want to use:

qn <- QuantileNormalization(csBC, typesToUpdate="pm");

as also suggested in Vignette 'Gene 1.0 ST array analysis' [http://aroma-project.org/node/38]. It makes a difference, which is illustrated in Vignette 'Empirical probe-signal densities and rank-based quantile normalization' [http://aroma-project.org/node/141].

Also, when you want to validate the QN output, you should first do it on the probe signals, because that is what is normalized here. So, plot the probe-signal densities before and after QN as done in the latter vignette.

> # probe level model
> plm <- RmaPlm(csN)
> fit(plm,verbose=verbose)

FYI, the default behavior is that the probe summary is done on PM probes only, that is, the explicit equivalent to the above is:

plm <- RmaPlm(csN, probeModel="pm")

(this is why you want to also do QN on PM only).

> # extract data from the probe level model
> ces <- getChipEffectSet(plm)

Note that here you are working with probe summaries, so you would not expect perfect agreement on the empirical densities (because the QN was done on the signals before summarization). However, they will probably agree well. Try:

plotDensity(ces);

> gene_summary <- extractMatrix(ces,returnUgcMap=TRUE)
> # transform to a log scale
> gene_summary <- log2(gene_summary)
>
> which all runs without error. However, when I look at a few columns of the data, for the first 1000 genes using
>
> qqnorm(gene_summary[1:1000,1:3])

Note the difference between qqnorm() and qqplot()! You want to use qqplot() to compare your densities to each other, not to the normal distribution.

/Henrik

> I get a rather curvy line that's nowhere near the line y=x. This doesn't agree with my (admittedly rather limited) understanding of what quantile normalisation is supposed to do.
>
> Can anyone advise? Should I be worried that I don't have a qqnorm plot that lies along y=x? Is it the normalisation that I should be worried about? Is my naivety leading me down the wrong path when it comes to preprocessing?
>
> Thanks in advance,
> Mike Dewar
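[Editor's note] The distinction between qqplot() and qqnorm() discussed above can be tried out on simulated data. This is a minimal base-R sketch of generic rank-based quantile normalization, assuming a complete numeric matrix with probes in rows and arrays in columns. It is not the QuantileNormalization implementation in aroma.affymetrix, and quantileNormalize() is a hypothetical helper name.

```r
set.seed(1)

# Three simulated "arrays" with different location and spread; rows are probes.
X <- cbind(a = rlnorm(1000, meanlog = 7, sdlog = 1.0),
           b = rlnorm(1000, meanlog = 8, sdlog = 0.5),
           c = rlnorm(1000, meanlog = 6, sdlog = 1.5))

quantileNormalize <- function(X) {
  # Target distribution: average of the sorted columns.
  target <- rowMeans(apply(X, 2, sort))
  # Replace each value by the target quantile that matches its within-column rank.
  Xn <- apply(X, 2, function(x) target[rank(x, ties.method = "first")])
  dimnames(Xn) <- dimnames(X)
  Xn
}

Xn <- quantileNormalize(X)

# qqplot() compares two samples with each other. After normalization the
# sorted values of any two arrays coincide, i.e. the points lie on y = x.
# (plot.it = FALSE returns the coordinates instead of drawing them.)
qq <- qqplot(log2(Xn[, "a"]), log2(Xn[, "b"]), plot.it = FALSE)
print(all.equal(qq$x, qq$y))

# qqnorm() compares one sample against the normal distribution, which is a
# different question and need not give a straight line for array data.
qn <- qqnorm(log2(Xn[, "a"]), plot.it = FALSE)
```

Dropping plot.it = FALSE draws the plots. As noted in the reply, gene-level summaries computed after normalization are only expected to agree approximately, not exactly as in this probe-level toy example.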
Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11
On Fri, Apr 23, 2010 at 2:20 AM, Karl Kornacker kornac...@midohio.twcbc.com wrote:
> I'm running R-2.11 on 64-bit Windows 7. R-Forge does not show a Windows x86_64 version of sfit.

I've emailed the R-Forge team asking about the Win64 plans: https://r-forge.r-project.org/forum/forum.php?thread_id=2420&forum_id=77

More importantly, could you please let me know what happens if you do:

source("http://aroma-project.org/hbLite.R")
hbInstall("aroma.affymetrix")

Do you get an error message? What happens if you do:

install.packages("sfit", repos="http://R-Forge.R-project.org")

I don't have access to 64-bit Windows, so I cannot test this myself. The reason I ask is that I believe that, in the special case of 'sfit', the 32-bit build will still work on a 64-bit system. This is because sfit actually contains an executable (bin/cfit.exe), and isn't it the case that Win64 can run Win32 executables? The R binary is just a dummy (libs/dummy.dll) that is never loaded by R. The reason for this rather special setup is historical and applies only to the sfit package.

Could you please provide me with the above details? Then I can decide what actions to take next, e.g. whether we need to build a specific Win64 version now, or can wait for R-Forge to do it for us, and so on.

/Henrik

PS. The decision to (not) put 'sfit' on CRAN is not mine; the original author (Pratyaksha Wirapati) wishes to migrate to the 'expectile' package and instead put that on CRAN. In order to make that move, we have to make sure we get fully sfit-reproducible results using expectile, and we still haven't tested it well enough. We have also identified convergence issues with the expectile code, causing the cross-talk calibration to fail in very rare cases when using expectile. I need to find enough problematic real-world cases for Pratyaksha to be able to troubleshoot it. The latter delay is due to me. Since sfit has way more CPU mileage, and there are no reported problems with it, we will use that as the default in aroma.affymetrix.

> -kk
>
> -----Original Message-----
> From: aroma-affymetrix@googlegroups.com [mailto:aroma-affymet...@googlegroups.com] On Behalf Of Henrik Bengtsson
> Sent: Thursday, April 22, 2010 7:46 PM
> To: aroma-affymetrix
> Subject: Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11
>
> Hi.
>
> On Thu, Apr 22, 2010 at 11:13 PM, Karl Kornacker kornac...@midohio.twcbc.com wrote:
> > The key function AllelicCrosstalkCalibration has stealth dependencies on additional packages (sfit and/or expectile) which are currently unavailable for R-2.11.
> >
> > When might updated versions of these packages for R-2.11 become available?
>
> What have you tried? What platform are you using?
>
> If you install aroma.affymetrix as explained on http://www.aroma-project.org/install you should get 'sfit'. If that for some reason should not work, 'sfit' can be installed manually from r-forge, cf. http://r-forge.r-project.org/R/?group_id=349
>
> Hope this helps
>
> /Henrik
>
> > Karl Kornacker
Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11
Hi.

On Fri, Apr 23, 2010 at 1:24 PM, Karl Kornacker kornac...@midohio.twcbc.com wrote:
> Henrik,
>
> Here are the error messages when attempting to load 32-bit versions of sfit and expectile under R-2.11:
>
> library(sfit)
> Error: package 'sfit' was built before R 2.10.0: please re-install it

Q1. That doesn't look correct to me; it looks like you have an old version installed from somewhere else, and not from following one of the two installation options. Is that correct?

Q2. What do you get if you do:

packageDescription("sfit")

That is probably pointing to a previously installed version?!

Now, if I try (on my Win32 system, which should be the default on your Win64 system):

install.packages("sfit", repos="http://R-Forge.R-project.org", type="win64.binary");

I get:

Warning in install.packages("sfit", repos = "http://R-Forge.R-project.org", :
  argument 'lib' is missing: using 'C:\Users\hb/R/win-library/2.11'
Warning: unable to access index for repository http://R-Forge.R-project.org/bin/windows64/contrib/2.11
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'sfit' is not available

Q3. In other words, nothing gets installed. Is that also what you get?

Q4. If so, please try

install.packages("sfit", repos="http://R-Forge.R-project.org", type="win.binary");

Does it install now?

> library(expectile)
> Error: package 'expectile' was built before R 2.10.0: please re-install it

Don't worry about 'expectile'; you will not need it.

> This stealth dependency of Aroma.Affymetrix on unavailable packages remains hidden until a call to AllelicCrosstalkCalibration attempts to load the package specified by the flavor parameter.

That is a design decision. Some packages are only loaded/required at the point when it is known that the user really needs the feature. I could set up aroma.affymetrix to require that all packages be available upon install. However, many packages are optional (formally listed under 'Suggests' in DESCRIPTION), because they are rarely used, or only used by some people in some studies. It would be annoying for everyone else to have to install packages they don't need. The fewer packages required, the fewer potential issues you will have. The only real alternative is to provide a validate function that a user can use to assert that all packages, even optional ones, are installed. That seems to be a feature R should provide and not aroma.* per se. I go halfway, and have hbInstall("aroma.affymetrix") install some of the optional packages (including sfit), but if they are not installed, 99.9% of aroma.affymetrix will still work.

Cheers,

/Henrik

> Karl
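[Editor's note] The "validate function" idea mentioned in the reply above can be approximated in user code. The following is a minimal base-R sketch, assuming the caller supplies the list of optional package names. checkOptionalPackages() is a hypothetical helper and not part of aroma.affymetrix.

```r
# Report which optional packages can be loaded, without attaching them.
# checkOptionalPackages() is a hypothetical helper name.
checkOptionalPackages <- function(pkgs) {
  ok <- vapply(pkgs,
               function(pkg) requireNamespace(pkg, quietly = TRUE),
               FUN.VALUE = logical(1))
  if (!all(ok)) {
    message("Optional packages not available: ",
            paste(pkgs[!ok], collapse = ", "))
  }
  invisible(ok)
}

# 'stats' ships with R; 'sfit' and 'expectile' are the optional packages
# discussed in this thread and may or may not be installed.
status <- checkOptionalPackages(c("stats", "sfit", "expectile"))
print(status)
```

Running such a check right after installation surfaces a missing optional package early, rather than at the point where a method such as AllelicCrosstalkCalibration first tries to load it.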
Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11
Hi.

On Thu, Apr 22, 2010 at 11:13 PM, Karl Kornacker kornac...@midohio.twcbc.com wrote:
> The key function AllelicCrosstalkCalibration has stealth dependencies on additional packages (sfit and/or expectile) which are currently unavailable for R-2.11.
>
> When might updated versions of these packages for R-2.11 become available?

What have you tried? What platform are you using?

If you install aroma.affymetrix as explained on http://www.aroma-project.org/install you should get 'sfit'. If that for some reason should not work, 'sfit' can be installed manually from r-forge, cf. http://r-forge.r-project.org/R/?group_id=349

Hope this helps

/Henrik

> Karl Kornacker
Re: [aroma.affymetrix] Error ExtractDataFrame
Please report your sessionInfo().

/Henrik

On Wed, Apr 21, 2010 at 11:51 AM, elodie elodie.chapeaubl...@gmail.com wrote:
> Hi,
>
> I am trying to use aroma.affymetrix for Human Exon chips with a custom CDF. I tested several BrainArray custom CDFs (refseq, ense, vegae). Beforehand, I used convertCdf() to convert the CDFs to the right format. With the ense or vegae custom CDF, I get an error with the extractDataFrame() method. I tested this code with only two HuEx-1_0-st-v2 chips before running an analysis on all samples (230 samples).
>
> My R code:
>
> library(aroma.affymetrix)
> library(affxparser)
> cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2")
> cs <- AffymetrixCelSet$byName("vessie", cdf=cdf)
> setCdf(cs,cdf)
> bc <- RmaBackgroundCorrection(cs, tag="coreR2")
> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> csBC <- process(bc,verbose=verbose)
> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
> csN <- process(qn, verbose=verbose)
> getCdf(csN)
> plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
> fit(plmEx, verbose=verbose, force=TRUE) # change
> cesEx <- getChipEffectSet(plmEx)
> ExFitdf <- extractDataFrame(cesEx, units=NULL, addNames=TRUE)
>
> The last line returns this error:
>
> Error in list(`extractDataFrame(cesEx, units = NULL, addNames = TRUE)` = environment, :
> [2010-04-21 11:43:33] Exception: Range of argument 'indices' is out of range [1,262144]: [1,304497]
> at throw(Exception(...))
> at throw.default(sprintf(Range of argument '%s' is out of range [%s, %s]: [%s,
> at throw(sprintf(Range of argument '%s' is out of range [%s,%s]: [%s,%s], .n
> at getNumerics.Arguments(static, ..., asMode = integer, disallow = disallow)
> at getNumerics(static, ..., asMode = integer, disallow = disallow)
> at getIntegers.Arguments(static, x, ..., range = range, .name = .name)
> at getIntegers(static, x, ..., range = range, .name = .name)
> at method(static, ...)
> at Arguments$getIndices(indices, max = nbrOfCells, disallow = NaN)
> at readRawData.AffymetrixCelFile(this, ...)
> at readRawData(this, ...)
> at getData.AffymetrixCelFile(this, indices = map[, cell], fields = celFields
> at getData(this, indices = map[, cell], fields = celFields[fields])
> at withCallingHandlers(expr, warning = function(w) invokeRestart(muffleWarnin
> at
>
> Can you help me identify the problem and find a solution?
>
> Thanks,
Re: [aroma.affymetrix] time to read the sequences of one chromosome in GWS6.0
Hi.

On Tue, Apr 20, 2010 at 10:16 AM, mortiz mortiz...@gmail.com wrote:
> Hi everyone,
>
> I need to read the sequences of the probes from the GWS6.0 chip, and it has taken me more than 12 hours to do it for only 2 chromosomes. I'm guessing I'm doing something wrong, because the base-pair normalization has to do the same and it doesn't take this long. This is what I'm doing:
>
> sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] MASS_7.3-4 aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.18.0 R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.15.1
> [8] matrixStats_0.1.9 R.rsp_0.3.6 R.cache_0.2.0 R.filesets_0.8.0 digest_0.4.2 R.utils_1.3.3 R.oo_1.6.7
> [15] R.methodsS3_1.1.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
>
> for (ii in 1:22) {
>   units <- getUnitsOnChromosome(gi, ii);
>   cells <- getCellIndices(cdf, units=units)
>   auxSeqs <- applyCdfGroups(cells, function(groups) lapply(groups, function(group) {
>     readSequenceMatrix(acs, cells=group$indices)
>   }))
> }

A good rule of thumb in R is that whenever you use an apply function over a large number of elements, you are most likely doing something very slow. More importantly (I think), in the above code you are accessing the probe sequence file for every single unit group, and on SNP6 there are approximately 900,000*2+900,000 = 2,700,000 unit groups. Reading from file has some overhead, so you want to do as few requests as possible. Instead, read all of the sequence matrix first and then do your subsetting in memory. (In aroma.* we do this in chunks, but the idea is the same. We never read it one unit at a time.) Thus, the first speed-up would be to do:

acsData <- readSequenceMatrix(acs); # One request instead of 2.7 million.
for (ii in 1:22) {
  units <- getUnitsOnChromosome(gi, chromosome=ii);
  cells <- getCellIndices(cdf, units=units);
  auxSeqs <- applyCdfGroups(cells, function(groups) {
    lapply(groups, function(group) {
      cells <- group$indices;
      acsData[cells,,drop=FALSE];
    });
  });
} # for (ii ...)

That should speed things up. You still have two levels of apply calls that slow things down. What are you trying to do? Do you need the data in such a nested list structure? If this is specific to SNP and CN chip types, you can write your algorithm/code to deal with SNPs and (single-cell) CN loci separately.

/Henrik

> If anyone knows a faster way to do it, please let me know :)
>
> thanks ;)
> maria
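[Editor's note] The "read once, then subset in memory" advice can be illustrated without any annotation files. In this base-R sketch the matrix seqs merely stands in for the result of a single readSequenceMatrix(acs) call, and the chromosome vector is a made-up cell-to-chromosome map. Both are assumptions for illustration, not aroma.affymetrix output.

```r
set.seed(42)

# Stand-in for the full probe-sequence matrix: one row per cell,
# one column per nucleotide position (25-mers).
nCells <- 1000L
seqs <- matrix(sample(c("A", "C", "G", "T"), nCells * 25L, replace = TRUE),
               nrow = nCells, ncol = 25L)

# Hypothetical cell-to-chromosome assignment. In a real analysis this would
# be derived from the CDF and the genome information file.
chromosome <- sample(1:22, nCells, replace = TRUE)

# Group the cell indices by chromosome in one pass ...
cellsByChr <- split(seq_len(nCells), chromosome)

# ... and subset the in-memory matrix once per chromosome, instead of
# issuing one file request per unit group.
seqsByChr <- lapply(cellsByChr, function(cells) seqs[cells, , drop = FALSE])

str(seqsByChr[["1"]])
```

The result is a flat list with one matrix per chromosome. Whether a deeper per-unit nesting is needed depends on the downstream algorithm, which is the question raised in the reply.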
Re: [aroma.affymetrix] problem with gcrma using HG-U133_Plus_2 CDF from affymetrix
Hi.

On Thu, Apr 15, 2010 at 10:56 AM, step...@mnemosyne.co.uk step...@mnemosyne.co.uk wrote:
> Dear aroma users,
>
> I am trying to run gcrma across a collection of human breast cell line CEL files without success. The code has worked in previous aroma versions, and still matches the documented instructions at http://www.aroma-project.org . I start from a fresh R instance, with a binary-formatted CDF file straight from Affymetrix. The normalisation process is fine with e.g. RMA, but for consistency with a related project I would prefer the results using gcrma. The run returns a simple missing object 'pb' error.
>
> library(aroma.affymetrix)
> cs <- AffymetrixCelSet$byName("Breast", chipType="HG-U133_Plus_2");
> bc <- GcRmaBackgroundCorrection(cs);
> csB <- process(bc);
> Error in reset(pb) : object 'pb' not found

Setting the 'verbose' argument to anything but FALSE should work around this bug, e.g.

csB <- process(bc, verbose=0);

will give minimal output information (only a progress bar). If you don't mind the verbose output, use verbose=TRUE or similar.

Details: The bug is in computeAffinities() for AffymetrixCdfFile. This bug has probably been around for a very long time, so I suspect that when you say the code has worked in previous aroma versions, it could be that you did set the 'verbose' argument before [though I've been wrong before]. Almost all our redundancy tests turn on the verbose output, which is why this passed unnoticed.

BTW, for next time, make sure to also report traceback() after getting an error. That helps narrow down the issue.

This bug will be fixed in the next release; then you can skip 'verbose' and not even the progress bar will be output.

Hope this helps

/Henrik

> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7 affxparser_1.18.0
> [4] R.huge_0.2.0 aroma.core_1.5.0 aroma.light_1.15.2
> [7] matrixStats_0.1.9 R.rsp_0.3.6 R.cache_0.3.0
> [10] R.filesets_0.8.0 digest_0.4.2 R.utils_1.3.3
> [13] R.oo_1.7.1 R.methodsS3_1.2.0
>
> loaded via a namespace (and not attached):
> [1] splines_2.10.1 tools_2.10.1
>
> bc
> GcRmaBackgroundCorrection:
> Data set: Breast
> Input tags:
> User tags: *
> Asterisk ('*') tags: GRBC
> Output tags: GRBC
> Number of files: 84 (1085.65MB)
> Platform: Affymetrix
> Chip type: HG-U133_Plus_2
> Algorithm parameters: (subsetToUpdate: NULL, typesToUpdate: chr "pm", indicesNegativeControl: NULL, affinities: NULL, type: chr "fullmodel", opticalAdjust: logi TRUE, gsbAdjust: logi TRUE, gsbParameters: NULL)
> Output path: probeData/Breast,GRBC/HG-U133_Plus_2
> Is done: FALSE
> RAM: 0.00MB
>
> Is there any obvious reason for this FUBAR - do I wait for the next release of aroma?
>
> Cheers - greetings from Sunny Finland
>
> Stephen
Re: [aroma.affymetrix] Re: GCRMA normalization with MoEx-1_0-st-v1
Hi,

sorry for not being clear; I never made the fix available, because I thought it would not help anyway, since you don't want to use the standard CDF for this chip type either way. However, I realized that you can of course use it for the gcRMA background correction step, and from there use one of the custom CDFs. For example:

library(aroma.affymetrix);
verbose <- Arguments$getVerbose(-10, timestamp=TRUE);

dataSet <- "Affymetrix-Tissues";
chipType <- "MoEx-1_0-st-v1";

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Setup data set
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR");
print(cdf);
csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType);
print(csR);

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# gcRMA-style background correction
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Currently, you must use the standard CDF file.
cdf <- getCdf(csR);
cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE));
setCdf(csR, cdfS);
bc <- GcRmaBackgroundCorrection(csR, type="affinities");
print(bc);
csB <- process(bc, verbose=verbose);
print(csB);

# Now, use the custom CDF in what follows
setCdf(csB, cdf);
print(csB);

(The above is now part of the redundancy test suite of aroma.affymetrix.)

In order to install the patch, follow the instructions on http://aroma-project.org/howtos/updateOrPatch

/Henrik

On Wed, Mar 31, 2010 at 4:37 PM, Gil Tomás gil@gmail.com wrote:
> Thanks for your reply. I've downloaded the MoEx-1_0-st-v1.cdf (binary version of the unsupported CDF file from Affymetrix) from http://www.aroma-project.org/node/31. I used it to run the analysis and here's what I got:
>
> ** R
> ## * gcrma normalization
> R> cdf <- AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1") # taken from http://www.aroma-project.org/node/31
> R> cs <- AffymetrixCelSet$byName("affy-brain-dev", cdf = cdf) # produces cell set class object
> R> bc <- GcRmaBackgroundCorrection(cs)
> R> csB <- process(bc, verbose = -10) # as suggested by Henrik Bengtsson
> Background correcting data set...
> Computing probe affinities...
> Computing GCRMA probe affinities for 1257006 units...
> Identify PMs and MMs among the CDF cell indices...
> logi [1:5266159] TRUE TRUE TRUE TRUE TRUE TRUE ...
>    Mode   FALSE    TRUE    NA's
> logical  334476 4931683       0
> MMs are defined as non-PMs
> Number of PMs: 4931683
> Number of MMs: 334476
> Identify PMs and MMs among the CDF cell indices...done
> Reading probe-sequence data...
> Retrieving probe-sequence data...
> Chip type (full): MoEx-1_0-st-v1
> Locating probe-tab file...
> Chip type: MoEx-1_0-st-v1
> AffymetrixProbeTabFile:
> Name: MoEx-1_0-st-v1
> Tags:
> Full name: MoEx-1_0-st-v1
> Pathname: annotationData/chipTypes/MoEx-1_0-st-v1/NetAffx/MoEx-1_0-st-v1.probe.tab
> File size: 460.47 MB (482839635 bytes)
> RAM: 0.01 MB
> Number of data rows: NA
> Columns [12]: 'probeID', 'probeSetID', 'probeXPos', 'probeYPos', 'assembly', 'seqname', 'start', 'stop', 'strand', 'probeSequence', 'targetStrandedness', 'category'
> Number of text lines: NA
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1
> Filename: MoEx-1_0-st-v1.cdf
> Filesize: 274.30MB
> Chip type: MoEx-1_0-st-v1
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 2560x2560
> Number of cells: 6553600
> Number of units: 1257006
> Cells per unit: 5.21
> Number of QC units: 0
> Locating probe-tab file...done
> Validating probe-tab file against CDF...
> chr Unit name:
> Error in list(`process(bc, verbose = -10)` = environment, `process.GcRmaBackgroundCorrection(bc, verbose = -10)` = environment, :
> [2010-03-31 16:27:37] Exception: Either argument 'names' or 'pattern' must be specified.
> at throw(Exception(...))
> at throw.default(Either argument 'names' or 'pattern' must be specified.)
> at throw(Either argument 'names' or 'pattern' must be specified.)
> at indexOf.UnitNamesFile(this, names = unitName)
> at indexOf(this, names = unitName)
> at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose
> at getProbeSequenceData(this, safe = safe, verbose = verbose)
> at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose =
> at computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose))
> at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-dev,GRBC/Mo
> at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-st-v1, ve
> at do.call(bgAdjustGcrma, args = args)
> at process.GcRmaBackgroundCorrection(bc, verbose = -10)
> at process(bc, verbose = -10)
> In addition: Warning message:
> In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName$` = character), :
> Argument 'rows' was out of range [1,0]. Ignored rows beyond this range
Re: [aroma.affymetrix] Re: GCRMA normalization with MoEx-1_0-st-v1
Thanks. I've located and identified the problem. I fixed it for the case when you use the default CDF from Affymetrix. Unfortunately, it won't work in the case you use custom CDF, as you do. To solve that, we need to make more updates and I've already taken some actions for that, but this will take weeks before it's ready. Maybe it will be ready for the next big release of aroma.affymetrix. /Henrik On Tue, Mar 30, 2010 at 3:10 PM, Gil Tomás gil@gmail.com wrote: Sorry for the delayed reply: ** R ## * gcrma normalization R cdf - AffymetrixCdfFile$byChipType (MoEx-1_0-st- v1,fullR1,A20080718,MR) # taken from http://www.aroma-project.org/node/31 R cs - AffymetrixCelSet$byName (affy-brain-dev, cdf = cdf) # produces cell set class object R R bc - GcRmaBackgroundCorrection (cs) R csB - process (bc, verbose = -10) # as suggested by Henrik Bengtsson Background correcting data set... Computing probe affinities... Computing GCRMA probe affinities for 265508 units... Identify PMs and MMs among the CDF cell indices... logi [1:4565541] TRUE TRUE TRUE TRUE TRUE TRUE ... Mode TRUE NA's logical 4565541 0 MMs are defined as non-PMs Number of PMs: 4565541 Number of MMs: 0 Identify PMs and MMs among the CDF cell indices...done Reading probe-sequence data... Retrieving probe-sequence data... Chip type (full): MoEx-1_0-st-v1,fullR1,A20080718,MR Locating probe-tab file... 
Chip type: MoEx-1_0-st-v1 AffymetrixProbeTabFile: Name: MoEx-1_0-st-v1 Tags: Full name: MoEx-1_0-st-v1 Pathname: annotationData/chipTypes/MoEx-1_0-st-v1/NetAffx/ MoEx-1_0-st-v1.probe.tab File size: 460.47 MB (482839635 bytes) RAM: 0.01 MB Number of data rows: NA Columns [12]: 'probeID', 'probeSetID', 'probeXPos', 'probeYPos', 'assembly', 'seqname', 'start', 'stop', 'strand', 'probeSequence', 'targetStrandedness', 'category' Number of text lines: NA AffymetrixCdfFile: Path: annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st- v1,fullR1,A20080718,MR.cdf Filename: MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf Filesize: 176.32MB Chip type: MoEx-1_0-st-v1,fullR1,A20080718,MR RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 2560x2560 Number of cells: 6553600 Number of units: 265508 Cells per unit: 24.68 Number of QC units: 1 Locating probe-tab file...done Validating probe-tab file against CDF... chr Unit name: Error in list(`process(bc, verbose = -10)` = environment, `process.GcRmaBackgroundCorrection(bc, verbose = -10)` = environment, : [2010-03-30 15:07:34] Exception: Either argument 'names' or 'pattern' must be specified. at throw(Exception(...)) at throw.default(Either argument 'names' or 'pattern' must be specified.) at throw(Either argument 'names' or 'pattern' must be specified.) 
at indexOf.UnitNamesFile(this, names = unitName) at indexOf(this, names = unitName) at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose at getProbeSequenceData(this, safe = safe, verbose = verbose) at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose = at computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose)) at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain- dev,GRBC/Mo at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0- st-v1, ve at do.call(bgAdjustGcrma, args = args) at process.GcRmaBackgroundCorrection(bc, verbose = -10) at process(bc, verbose = -10) In addition: Warning message: In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName $` = character), : Argument 'rows' was out of range [1,0]. Ignored rows beyond this range. Validating probe-tab file against CDF...done Retrieving probe-sequence data...done Reading probe-sequence data...done Computing GCRMA probe affinities for 265508 units...done Computing probe affinities...done Background correcting data set...done ** On Mar 25, 7:17 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote: What's the verbose output, if you do: csB - process(bc, verbose=-10) /H On Thu, Mar 25, 2010 at 4:34 PM, Gil Tomás gil@gmail.com wrote: Thank you much for your reply Henrik. Each of your comments were very enlightening. First of all, the documentation source I'm following is that from the official aroma.affymetrix site, particularly the section documenting reproducible research for the gcRMA code (http://aroma- project.org/replication/gcRMA). I then redefined the filesystem of the project according to your instructions (keeping the cdf file with the tag comma separated nomenclature) and reran the code: ** R cdf - AffymetrixCdfFile$byChipType (MoEx-1_0-st- v1,fullR1,A20080718,MR) # taken fromhttp://www.aroma-project.org/node/31 R cs - AffymetrixCelSet$byName (affy-brain-dev
[aroma.affymetrix] FYI: If the aroma.affymetrix mailing list goes down...
Hi, in a few moments, I will update a few pages on the aroma.affymetrix Google Group so that they point to the new website http://www.aroma-project.org/. This may cause the aroma.affymetrix mailing list/forum to go down, meaning that if you try to post a message, it will bounce back to you with an error message. If this happens, we will try to fix it asap, but we will be in the hands of Google to solve it. Make sure to follow updates on: http://www.aroma-project.org/ For background and the reasons for previous forum hiccups, see http://aroma-project.org/forum/GoogleGroup/KnownIssues Hopefully nothing goes wrong... /Henrik
Re: [aroma.affymetrix] saturated Affy 500K SNP array signals?
Hi.

On Fri, Mar 26, 2010 at 9:17 PM, Louie van de Lagemaat louie...@gmail.com wrote: Hi Henrik et al, I have been reanalyzing an older dataset of Mapping500K (Nsp+Sty) arrays for CNVs using aroma, and this works in general really well. However, I have noticed that overall many individuals in this particular dataset appear to have CNVs highly skewed toward deletions. Only a few individuals seem to have the expected balance of insertions and deletions.

It is not clear from your description what the problem is. What do you mean by "skewed toward deletions"? Do you mean there is a higher number of deleted regions, or do you mean that the CN mean levels are shifted down away from CN=2, or something else?

Is it possible that the samples that show almost exclusively deletions are saturated? If the arrays are indeed saturated, is there a way around this that is implemented in aroma? Or is there a better explanation? Some signal intensity distributions are in the png attached.

To me those distributions look alright; could you explain which one of those plots looks funny to you and why you think so, and I might be able to clarify further.
/Henrik

Thanks in advance for any help or ideas you can offer, Louie van de Lagemaat Sanger Fellow Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA

---
# PS, here's the script I use:
library(aroma.affymetrix)
library(aroma.cn)
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
setOption(aromaSettings, "memory/ram", 50)
setOption(aromaSettings, "memory/gcArrayFrequency", 20)

cs <- AffymetrixCelSet$byName(ProjName, chipType="Mapping250K_Nsp")
IntermediateResults[["csNsp"]] <- extract(cs, !isDuplicated(cs))
cs <- AffymetrixCelSet$byName(ProjName, chipType="Mapping250K_Sty")
IntermediateResults[["csSty"]] <- extract(cs, !isDuplicated(cs))

# ACC
acc <- AllelicCrosstalkCalibration(IntermediateResults[["csNsp"]], model="CRMAv2")
IntermediateResults[["csCNsp"]] <- process(acc, verbose=verbose)
acc <- AllelicCrosstalkCalibration(IntermediateResults[["csSty"]], model="CRMAv2")
IntermediateResults[["csCSty"]] <- process(acc, verbose=verbose)

# BPN
bpn <- BasePositionNormalization(IntermediateResults[["csCNsp"]], target="zero")
IntermediateResults[["csNNsp"]] <- process(bpn, verbose=verbose)
bpn <- BasePositionNormalization(IntermediateResults[["csCSty"]], target="zero")
IntermediateResults[["csNSty"]] <- process(bpn, verbose=verbose)

# QN
qn <- QuantileNormalization(IntermediateResults[["csNNsp"]])
IntermediateResults[["csQNsp"]] <- process(qn, verbose=verbose)
qn <- QuantileNormalization(IntermediateResults[["csNSty"]])
IntermediateResults[["csQSty"]] <- process(qn, verbose=verbose)

# probe level model
plm <- RmaCnPlm(IntermediateResults[["csQNsp"]], combineAlleles=TRUE, mergeStrands=TRUE)
fit(plm, verbose=verbose)
IntermediateResults[["plmNsp"]] = plm
plm <- RmaCnPlm(IntermediateResults[["csQSty"]], combineAlleles=TRUE, mergeStrands=TRUE)
fit(plm, verbose=verbose)
IntermediateResults[["plmSty"]] = plm

# fragment length normalization
cesNList <- list()
ces <- getChipEffectSet(IntermediateResults[["plmNsp"]])
fln <- FragmentLengthNormalization(ces, target="zero")
cesNList[["Mapping250K_Nsp"]] <- process(fln, verbose=verbose)
ces <- getChipEffectSet(IntermediateResults[["plmSty"]])
fln <- FragmentLengthNormalization(ces, target="zero")
cesNList[["Mapping250K_Sty"]] <- process(fln, verbose=verbose)

# get male reference - an all-male sample
IntermediateResults[["ceRefNspM"]] <- calculateBaseline(cesNList[["Mapping250K_Nsp"]], chromosomes=1:22, ploidy=2, defaultPloidy=2, verbose=verbose)
IntermediateResults[["ceRefStyM"]] <- calculateBaseline(cesNList[["Mapping250K_Sty"]], chromosomes=1:22, ploidy=2, defaultPloidy=2, verbose=verbose)
IntermediateResults[["ceRefNspM"]] <- calculateBaseline(cesNList[["Mapping250K_Nsp"]], chromosomes=23, ploidy=1, defaultPloidy=1, verbose=verbose)
IntermediateResults[["ceRefStyM"]] <- calculateBaseline(cesNList[["Mapping250K_Sty"]], chromosomes=23, ploidy=1, defaultPloidy=1, verbose=verbose)

# call CNVs using both CBS and GLAD, for comparison
CbsSegM <- CbsModel(cesNList, list(Mapping250K_Nsp = IntermediateResults[["ceRefNspM"]], Mapping250K_Sty = IntermediateResults[["ceRefStyM"]]))
writeRegions(CbsSegM, chromosomes = 1:23, verbose = verbose)
GladSegM <- GladModel(cesNList, list(Mapping250K_Nsp = IntermediateResults[["ceRefNspM"]], Mapping250K_Sty = IntermediateResults[["ceRefStyM"]]))
writeRegions(GladSegM, chromosomes = 1:23, verbose = verbose)
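To make the distinction Henrik asks about concrete (more deleted regions versus a global downward shift of the CN levels), here is a minimal base-R sketch on simulated numbers. It does not use aroma.affymetrix; the function name toCopyNumber() and the simulated vectors theta and thetaR are assumptions made only for illustration. It converts sample and reference signals to log-ratios M and to copy numbers relative to a diploid reference, and then looks at the median of M: a median near zero with a local drop suggests real deletions, whereas a median clearly below zero would point to a global shift.

```r
# Hypothetical helper (not part of aroma.affymetrix): log-ratios and
# copy numbers relative to a reference with the given ploidy.
toCopyNumber <- function(theta, thetaR, ploidy = 2) {
  M <- log2(theta / thetaR)        # log2 ratio, sample vs reference
  list(M = M, C = ploidy * 2^M)    # C = 2 when sample equals reference
}

# Simulated signals (assumption): 5000 loci, the first 200 of which
# carry a one-copy loss in the sample.
set.seed(42)
thetaR <- 2^rnorm(5000, mean = 10, sd = 1)
theta  <- thetaR * 2^rnorm(5000, mean = 0, sd = 0.2)
theta[1:200] <- theta[1:200] / 2

cn <- toCopyNumber(theta, thetaR)

# Genome-wide median stays close to zero; the deleted stretch sits near CN = 1.
print(median(cn$M))
print(median(cn$C[1:200]))
```

If the genome-wide median of M is itself well below zero for the affected samples, the reference or the normalization is the more likely culprit than saturation.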
Re: [aroma.affymetrix] Alternatives to quantile normalization? (Was: Re: [aroma.affymetrix] ProbeLevelTransform subclasses: how to use)
Hi, can you post your complete script where you go from CEL files to how you generate those MA plots? /Henrik

On Sat, Mar 27, 2010 at 12:57 AM, Richard Beyer rpbe...@gmail.com wrote: Hi Henrik, I have attached 3 png files. Two density plots of raw intensities: for all probes and for pm probes:

name = "AndersonRatST_10.03.12"
chip = "RaGene-1_0-st-v1"
checkChipType = FALSE
cs <- AffymetrixCelSet$byName(name, chipType=chip, checkChipType=checkChipType)
graphics.off()
png(file="Anderson rat ST raw intensity all probes 26mar10.png", width = 1240, height = 1240, units = "px", pointsize = 16, bg = "white", res = NA)
plotDensity(cs, col=cols1, ylim=c(0,0.6), xlim=c(4,15), main="raw intensity all probes")
legend(10, 0.6, legend=as.character(paste(grps, 1:35)), fill=cols1, ncol=3)
graphics.off()
png(file="Anderson rat ST raw intensity pm probes 26mar10.png", width = 1240, height = 1240, units = "px", pointsize = 16, bg = "white", res = NA)
plotDensity(cs, col=cols1, ylim=c(0,0.4), types="pm", xlim=c(4,12), main="raw intensity pm probes")
legend(10, 0.4, legend=as.character(paste(grps, 1:35)), fill=cols1, ncol=3)
graphics.off()

I also attached a png file that is an MA plot for the output of the limma analysis. In this experiment there are 7 groups: 2A 2B 2C 4A 4B 4C Sham. (I didn't use your more elegant way of generating the plots, I just plotted the topTable results from limma.) The group names for the various contrasts are also shown in the density plots. There are 5 arrays for each group. The weird MA plots are the middle two in the bottom row of the MA plots png file. They are for the contrasts: 4A-4B and 4A-4C. Actually all 12 MA plots look a bit weird, meaning the lens-shaped cloud of points is not centered about the zero line. We have exhaustively checked the wet lab QC and everything looks good. I appreciate you having a look at these figures. Please let me know if you can suggest some further analysis. I was even wondering about doing quantile normalization at the probeset level, rather than the probe level.
I am puzzled. Thanks very much, Dick On Fri, Mar 26, 2010 at 2:18 AM, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi, On Thu, Mar 25, 2010 at 7:20 PM, Richard Beyer rpbe...@gmail.com wrote: Hi Henrik, I'm quite enjoying aroma-project.org. Thanks for your detailed help. I am making some progress now. I think my dataset has some issues that are becoming clearer. I'm trying QuantileNormalization(csBC, typesToUpdate=pm, tags=c(*, type)), using just the pm probes. Maybe this will help. so when doing that, what does the density plots looks afterward, and more interestingly, what does the M vs A plots for the PM signals look like? You can get hold of the index vector for the PM probes by: cdf - getCdf(cs); cells - getCellIndices(cdf, stratifyBy=pm, unlist=TRUE, useNames=FALSE); Then you can plot the M vs A for any pair of CEL files as: cfT - getFile(cs, 1); cfR - getFile(cs, 2); smoothScatterMvsA(cfT, cfR, indices=cells); (or plotMvsA(...) for a scatter plot). If you want to compare to the pool of all arrays, do: cfR - getAverageFile(cs); /Henrik PS. Please post PNGs instead of PDFs, because they are smaller. I've attached a pdf file that shows the raw intensities with and without the control spots: plotDensity(cs,col=rainbow(35),ylim=c(0,0.7)) legend(0,0.7,legend= as.character(1:35),fill=rainbow(35)) plotDensity(cs,col=rainbow(35),ylim=c(0,0.7),types=pm) legend(0,0.7,legend= as.character(1:35),fill=rainbow(35)) Cheers, Dick On Thu, Mar 25, 2010 at 10:06 AM, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi. On Thu, Mar 25, 2010 at 5:02 PM, Richard Beyer rpbe...@gmail.com wrote: Hi Henrik, Thanks very much for your help. I saw your post about the new documentation link after I wrote my question. So I will look through that. What you are saying is very helpful even though I wasn't thinking quite so large scale or ambitious. Ok. I have an immediate problem with seeing very weird results on a dataset of 35 rat ST arrays. 
When I run the RmaBackgroundCorrection(), QuantileNormalization() normalized data through limma and do MA plots of one group against another, the main cloud of data points is not the expected lens shape centered about the origin. The shape is more like a s-wave with part of the cloud of points above and part below the zero line. All the arrays pass Affy QC as done by Expression Console and they seem fine when I plot NUSE and RLE. Judging from the shapes seen in the MA plots, my first reaction is that the assumption of most-probesets-are-unchanged is not being enforced by the quantile normalization step. So, I wanted to just try a few reasonable alternatives to the quantile normalization step. In addition, I think I've seen less pronounced versions of these s-wave shapes in MA plots from ST arrays in other data sets, but not nearly so pronounced
Re: [aroma.affymetrix] Alternatives to quantile normalization? (Was: Re: [aroma.affymetrix] ProbeLevelTransform subclasses: how to use)
Hi,

On Thu, Mar 25, 2010 at 7:20 PM, Richard Beyer rpbe...@gmail.com wrote: Hi Henrik, I'm quite enjoying aroma-project.org. Thanks for your detailed help. I am making some progress now. I think my dataset has some issues that are becoming clearer. I'm trying QuantileNormalization(csBC, typesToUpdate="pm", tags=c("*", type)), using just the pm probes. Maybe this will help.

So when doing that, what do the density plots look like afterward, and more interestingly, what do the M vs A plots for the PM signals look like? You can get hold of the index vector for the PM probes by:

cdf <- getCdf(cs);
cells <- getCellIndices(cdf, stratifyBy="pm", unlist=TRUE, useNames=FALSE);

Then you can plot the M vs A for any pair of CEL files as:

cfT <- getFile(cs, 1);
cfR <- getFile(cs, 2);
smoothScatterMvsA(cfT, cfR, indices=cells);

(or plotMvsA(...) for a scatter plot). If you want to compare to the pool of all arrays, do:

cfR <- getAverageFile(cs);

/Henrik

PS. Please post PNGs instead of PDFs, because they are smaller.

I've attached a pdf file that shows the raw intensities with and without the control spots:

plotDensity(cs, col=rainbow(35), ylim=c(0,0.7))
legend(0, 0.7, legend=as.character(1:35), fill=rainbow(35))
plotDensity(cs, col=rainbow(35), ylim=c(0,0.7), types="pm")
legend(0, 0.7, legend=as.character(1:35), fill=rainbow(35))

Cheers, Dick

On Thu, Mar 25, 2010 at 10:06 AM, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi. On Thu, Mar 25, 2010 at 5:02 PM, Richard Beyer rpbe...@gmail.com wrote: Hi Henrik, Thanks very much for your help. I saw your post about the new documentation link after I wrote my question. So I will look through that. What you are saying is very helpful even though I wasn't thinking quite so large scale or ambitious. Ok. I have an immediate problem with seeing very weird results on a dataset of 35 rat ST arrays.
When I run the RmaBackgroundCorrection(), QuantileNormalization() normalized data through limma and do MA plots of one group against another, the main cloud of data points is not the expected lens shape centered about the origin. The shape is more like a s-wave with part of the cloud of points above and part below the zero line. All the arrays pass Affy QC as done by Expression Console and they seem fine when I plot NUSE and RLE. Judging from the shapes seen in the MA plots, my first reaction is that the assumption of most-probesets-are-unchanged is not being enforced by the quantile normalization step. So, I wanted to just try a few reasonable alternatives to the quantile normalization step. In addition, I think I've seen less pronounced versions of these s-wave shapes in MA plots from ST arrays in other data sets, but not nearly so pronounced as this one. The end result is I'm stuck and puzzled. In theory quantile normalization should do a decent job of making the log-ratios independent of the log-intensities, that is, the cloud in an M vs. A scatter plot should be fairly straight (with the possible exception at very weak signals or very large signals). If you really want to dive into the arguments, see: H. Bengtsson O. Hössjer, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method. BMC Bioinformatics, 2006. [http://www.aroma-project.org/publications] Since you don't get this, I would first make sure that you are plotting the same data points that you are normalizing. Note that quantile normalization can be done on PMs only, on all probes etc. See page 'Empirical probe-signal densities and rank-based quantile normalization' for how different settings give different normalization outputs. Hope this helps /Henrik PS. It is possible to attached PNGs to emails to this list; you may want to share your figures. 
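As a rough illustration of the point about quantile normalization and M vs A plots, here is a minimal base-R sketch. It is not the aroma.affymetrix implementation; the function names quantileNormalize() and mvsa() and the simulated two-array matrix are assumptions for illustration only. It forces every array onto the same empirical distribution (the mean of the sorted columns) and then computes M (log-ratio) and A (average log-intensity) for a pair of arrays.

```r
# Rank-based quantile normalization (illustrative sketch).
# X: numeric matrix, rows = probes, columns = arrays, no missing values.
quantileNormalize <- function(X) {
  Xs <- apply(X, MARGIN = 2, FUN = sort)   # sort each array separately
  target <- rowMeans(Xs)                   # average empirical distribution
  Xn <- apply(X, MARGIN = 2, FUN = function(x) {
    target[rank(x, ties.method = "first")] # map ranks onto the target
  })
  dimnames(Xn) <- dimnames(X)
  Xn
}

# M (log2 ratio) and A (mean log2 intensity) for a test/reference pair.
mvsa <- function(xT, xR) {
  list(M = log2(xT) - log2(xR), A = (log2(xT) + log2(xR)) / 2)
}

# Simulated data (assumption): the second array has a scale and offset bias.
set.seed(1)
X <- cbind(a1 = 2^rnorm(1000, mean = 8),
           a2 = 1.5 * 2^rnorm(1000, mean = 8) + 30)

Xn <- quantileNormalize(X)
ma <- mvsa(Xn[, "a1"], Xn[, "a2"])

# After normalization both arrays share one distribution, so the mean of M
# is zero up to rounding; plot(ma$A, ma$M) shows the cloud around M = 0.
print(mean(ma$M))
```

Note that this only equalizes the marginal distributions of the probes passed in. If the plotted data points are a different subset (for example probeset summaries rather than PM probes), the cloud need not be centered, which is why checking that the plotted and normalized points coincide is the first step.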
Thanks again, Dick On Thu, Mar 25, 2010 at 8:26 AM, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi. On Thu, Mar 25, 2010 at 5:30 AM, dbe...@u.washington.edu rpbe...@gmail.com wrote: Hello, I would like to get more info, perhaps example calls, on the various subclasses of ProbeLevelTransform. I see from a previous post by Mark Robinson, I have the examples: if(doNorm){ bc - RmaBackgroundCorrection(cs) csBC - process(bc,verbose=verbose,force=force) setCdf(csBC, cdf) qn - QuantileNormalization(csBC, typesToUpdate=pm) csN - process(qn, verbose=verbose,force=force) #time required setCdf(csN, cdf) } What I'd like to be able to do is something akin to what I used to be able to do with the affy expresso call. That is, specify different background methods, different normalization methods, such as invariant set, rma, constant, etc. It sounds like you wish to setup a high-level API providing wrappers for common preprocessing sequences/pipelines. There has been some independent attempts by us doing this, but we haven't done a serious attempt
Re: [aroma.affymetrix] GCRMA normalization with MoEx-1_0-st-v1
Hi,

GCRMA is not fully supported for all chip types, and I haven't checked if MoEx-1_0-st-v1 is one. But, first, let's fix some other mistakes you're making.

On Wed, Mar 24, 2010 at 4:48 PM, Gil Tomás gil@gmail.com wrote: Dear all, I am trying to normalize a dataset hybridized with MoEx-1_0-st-v1 with GCRMA on aroma.affymetrix. Here's the code I'm using to do so:

I assume you have put this together from what you have found online; if there is a particular source/webpage/manual where you've found this, please let me know so we can make sure it is corrected.

**
R> prj.dir <- "/Users/giltomas/projects/brain-dev/raw-data/aroma-affymetrix" # sets up the project directory
R> setwd (prj.dir)
R> library (aroma.affymetrix)
R> ## * gcrma normalization
R> cdf <- AffymetrixCdfFile$byChipType ("MoEx-1_0-st-v1.fullR1.A20080718.MR.bin") # taken from http://www.aroma-project.org/node/31 and convert to binary with convertCdf

Here you are trying to load a CDF for a chip type named 'MoEx-1_0-st-v1.fullR1.A20080718.MR.bin'. Affymetrix does not produce such a *chip type*, though they do produce a chip type named 'MoEx-1_0-st-v1'. There are three things you've probably misunderstood above:

(1) The terms *chip type* and *CDF* are not the same, cf. page 'Differences between chip type and chip definition file (CDF)': http://aroma-project.org/definitions/chipTypesAndCDFs

(2) You have a CDF file named 'MoEx-1_0-st-v1.fullR1.A20080718.MR.bin.cdf'. On page 'Chip type: MoEx-1_0-st-v1': http://aroma-project.org/chipTypes/MoEx-1_0-st-v1 (same as your URL) there is a MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf file. It looks like you have:

(2a) Replaced the commas with periods - *do not do that*. The aroma framework uses well-structured comma-separated filenames, cf. page 'Definition: Fullnames, names and tags of directories and files': http://aroma-project.org/node/77

(2b) Converted a CDF that is already in a binary format using convertCdf().
FYI, all CDFs created/provided by us are already in a binary format; it is only a few unofficial CDFs provided by Affymetrix that come in the ASCII/text format. Thus, you want to have a directory structure as:

annotationData/
  chipTypes/
    MoEx-1_0-st-v1/
      MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf

Note how the chip type directory has the same name as the *name* part (before the first comma) of the CDF file. See (2a) above. The same applies to your raw data; you want to have the data set directory as:

rawData/
  affy-brain-dev/
    MoEx-1_0-st-v1/
      *.CEL files

Have a look at the pages at http://aroma-project.org/setup which should clarify this further. ...then proceed with the rest.

/Henrik

PS. FYI, the approach you have used *may* have worked in other setups, but it is incorrect and should not be used. You are likely to run into problems sooner or later.

R> cs <- AffymetrixCelSet$byName ("affy-brain-dev", cdf = cdf) # produces cell set class object
R> bc <- GcRmaBackgroundCorrection (cs)
R> csB <- process (bc)
Error in list(`process(bc)` = environment, `process.GcRmaBackgroundCorrection(bc)` = environment, : [2010-03-24 16:36:00] Exception: Either argument 'names' or 'pattern' must be specified.
at throw(Exception(...))
at throw.default(Either argument 'names' or 'pattern' must be specified.)
at throw(Either argument 'names' or 'pattern' must be specified.)
at indexOf.UnitNamesFile(this, names = unitName) at indexOf(this, names = unitName) at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose at getProbeSequenceData(this, safe = safe, verbose = verbose) at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose = at computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose)) at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain- dev,GRBC/Mo at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0- st-v1.full at do.call(bgAdjustGcrma, args = args) at process.GcRmaBackgroundCorrection(bc) at process(bc) In addition: Warning message: In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName $` = character), : Argument 'rows' was out of range [1,0]. Ignored rows beyond this range. R traceback () 15: throw.Exception(Exception(...)) 14: throw(Exception(...)) 13: throw.default(Either argument 'names' or 'pattern' must be specified.) 12: throw(Either argument 'names' or 'pattern' must be specified.) 11: indexOf.UnitNamesFile(this, names = unitName) 10: indexOf(this, names = unitName) 9: getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose) 8: getProbeSequenceData(this, safe = safe, verbose = verbose) 7: computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose = less(verbose)) 6: computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose)) 5: bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain- dev,GRBC/MoEx-1_0-st-v1.fullR1.A20080718.MR.bin,
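The fullname convention Henrik describes in (2a) above (the *name* is the part before the first comma, the *tags* are the comma-separated parts after it) can be sketched in a few lines of base R. This is only an illustration of the naming rule, not aroma's own parser; the helper parseFullname() is a hypothetical name introduced here.

```r
# Hypothetical helper: split a fullname into its name and tags,
# following the "name,tag1,tag2,..." convention.
parseFullname <- function(fullname) {
  parts <- strsplit(fullname, split = ",", fixed = TRUE)[[1]]
  list(name = parts[1], tags = parts[-1])
}

# The CDF file from the chip type page, with its filename extension dropped.
cdfFile  <- "MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf"
fullname <- sub("\\.cdf$", "", cdfFile, ignore.case = TRUE)
good <- parseFullname(fullname)
print(good$name)   # chip type directory name
print(good$tags)   # tags identifying the custom CDF

# Where the file is expected to live, given the directory layout above.
print(file.path("annotationData", "chipTypes", good$name, cdfFile))

# With commas replaced by periods there is nothing to split on, so the
# whole string is taken as the chip type name and no tags are found.
bad <- parseFullname("MoEx-1_0-st-v1.fullR1.A20080718.MR.bin")
print(bad$name)
print(length(bad$tags))
```

This is why the period-separated filename ends up being treated as a chip type that does not exist, while the comma-separated one resolves to the chip type MoEx-1_0-st-v1 with three tags.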
Re: [aroma.affymetrix] Is it correct to do analyze different expression data based on the same platform on the same time?
Hi.

On Sun, Mar 14, 2010 at 3:04 AM, Yong pkuonl...@gmail.com wrote: Hi Everyone, I seem to remember that it is difficult or not correct to analyze multiple datasets based on different array designs like hgu133plus2 and HuEx-1_0-st-v2. So, I am wondering whether it is OK to pool different experiments together if they are based on the same array design, such as HuEx-1_0-st-v2. Specifically, if we follow the standard routine, i.e., RmaBackgroundCorrection + QuantileNormalization + ExonRmaPlm + getChipEffectSet, could we still filter out those experiment-specific factors and make these experiments comparable?

Are you aware of: M.D. Robinson & T.P. Speed. A comparison of Affymetrix gene expression arrays. BMC Bioinformatics, 2007, 8, 449. Available via: http://aroma-project.org/publications Maybe that helps you answer your question/problem.

/Henrik

Many thanks ahead. Yong Zhang University of Chicago
Re: [aroma.affymetrix] Alternatives to quantile normalization? (Was: Re: [aroma.affymetrix] ProbeLevelTransform subclasses: how to use)
Hi. On Thu, Mar 25, 2010 at 5:02 PM, Richard Beyer rpbe...@gmail.com wrote: Hi Henrik, Thanks very much for your help. I saw your post about the new documentation link after I wrote my question. So I will look through that. What you are saying is very helpful even though I wasn't thinking quite so large scale or ambitious. Ok. I have an immediate problem with seeing very weird results on a dataset of 35 rat ST arrays. When I run the RmaBackgroundCorrection(), QuantileNormalization() normalized data through limma and do MA plots of one group against another, the main cloud of data points is not the expected lens shape centered about the origin. The shape is more like a s-wave with part of the cloud of points above and part below the zero line. All the arrays pass Affy QC as done by Expression Console and they seem fine when I plot NUSE and RLE. Judging from the shapes seen in the MA plots, my first reaction is that the assumption of most-probesets-are-unchanged is not being enforced by the quantile normalization step. So, I wanted to just try a few reasonable alternatives to the quantile normalization step. In addition, I think I've seen less pronounced versions of these s-wave shapes in MA plots from ST arrays in other data sets, but not nearly so pronounced as this one. The end result is I'm stuck and puzzled. In theory quantile normalization should do a decent job of making the log-ratios independent of the log-intensities, that is, the cloud in an M vs. A scatter plot should be fairly straight (with the possible exception at very weak signals or very large signals). If you really want to dive into the arguments, see: H. Bengtsson O. Hössjer, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method. BMC Bioinformatics, 2006. 
[http://www.aroma-project.org/publications]

Since you don't get this, I would first make sure that you are plotting the same data points that you are normalizing. Note that quantile normalization can be done on PMs only, on all probes, etc. See page 'Empirical probe-signal densities and rank-based quantile normalization' for how different settings give different normalization outputs.

Hope this helps

/Henrik

PS. It is possible to attach PNGs to emails to this list; you may want to share your figures.

Thanks again, Dick

On Thu, Mar 25, 2010 at 8:26 AM, Henrik Bengtsson henrik.bengts...@gmail.com wrote:

Hi.

On Thu, Mar 25, 2010 at 5:30 AM, dbe...@u.washington.edu rpbe...@gmail.com wrote:

Hello, I would like to get more info, perhaps example calls, on the various subclasses of ProbeLevelTransform. From a previous post by Mark Robinson, I have this example:

  if (doNorm) {
    bc <- RmaBackgroundCorrection(cs)
    csBC <- process(bc, verbose=verbose, force=force)
    setCdf(csBC, cdf)
    qn <- QuantileNormalization(csBC, typesToUpdate="pm")
    csN <- process(qn, verbose=verbose, force=force)  # time required
    setCdf(csN, cdf)
  }

What I'd like to be able to do is something akin to what I used to be able to do with the affy expresso call. That is, specify different background methods and different normalization methods, such as invariant set, rma, constant, etc.

It sounds like you wish to set up a high-level API providing wrappers for common preprocessing sequences/pipelines. There have been some independent attempts by us at doing this, but we haven't made a serious attempt at standardizing it. An *important* objective is that whenever providing such wrappers, we should make sure that they replicate existing implementations as well as possible. For instance, if you set up an expresso() method operating on aroma.affymetrix classes, you want to make sure it can replicate the results of expresso() in the affy package, otherwise you will just add lots of confusion out there.
For some methods, we do assert near-perfect reproducibility, cf. http://aroma-project.org/replication

If you want to provide an expresso() method for AffymetrixCelSet objects, I suggest that you simply implement it case by case, using the scripts provided in the online vignettes, and only for the cases where we can guarantee to replicate affy exactly. All other cases should throw an error. For each case provided there should be at least one redundancy test, so that we can assert that reproducibility still holds whenever we release a new version of aroma.affymetrix or other packages are updated. After this we can discuss the missing cases and add support for them one by one.

One way you can start is to override the expresso() method like this using S3 dispatching:

  # This will make sure expresso() of affy is called whenever an AffyBatch is used.
  setMethodS3("expresso", "AffyBatch", function(...) {
    affy::expresso(...);
  });

  # This is your expresso() method for AffymetrixCelSet objects.
  setMethodS3("expresso", "AffymetrixCelSet", function(cs
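Returning to the quantile-normalization point made earlier in this thread: the following is a minimal base-R sketch of rank-based quantile normalization, followed by the M and A values for two arrays. It is not the aroma.affymetrix implementation behind QuantileNormalization(); the helper name quantileNormalize() and the simulated data are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of aroma.affymetrix): rank-based quantile
# normalization of a probes-by-arrays matrix without missing values.
quantileNormalize <- function(X) {
  Xs <- apply(X, 2, sort)          # sort each array (column) separately
  target <- rowMeans(Xs)           # target: mean of the k-th smallest values
  Xn <- apply(X, 2, function(x) target[rank(x, ties.method = "first")])
  dimnames(Xn) <- dimnames(X)
  Xn
}

# Simulated intensities for two arrays; the second one has a different
# scale and offset (made-up numbers).
set.seed(1)
X <- cbind(a = rlnorm(1000, meanlog = 7),
           b = 1.5 * rlnorm(1000, meanlog = 7) + 50)

Y <- log2(quantileNormalize(X))
M <- Y[, "a"] - Y[, "b"]           # log-ratios
A <- (Y[, "a"] + Y[, "b"]) / 2     # average log-intensities

# After normalization both arrays share the same empirical distribution.
stopifnot(isTRUE(all.equal(sort(Y[, "a"]), sort(Y[, "b"]))))

# plot(A, M); abline(h = 0)        # uncomment to look at the M vs. A cloud
```

If the cloud in such a plot is clearly curved for real data, one thing to check, as suggested in the thread, is whether the plotted probes are the same ones that were normalized.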
Re: [aroma.affymetrix] Re: GCRMA normalization with MoEx-1_0-st-v1
What's the verbose output, if you do: csB - process(bc, verbose=-10) /H On Thu, Mar 25, 2010 at 4:34 PM, Gil Tomás gil@gmail.com wrote: Thank you much for your reply Henrik. Each of your comments were very enlightening. First of all, the documentation source I'm following is that from the official aroma.affymetrix site, particularly the section documenting reproducible research for the gcRMA code (http://aroma- project.org/replication/gcRMA). I then redefined the filesystem of the project according to your instructions (keeping the cdf file with the tag comma separated nomenclature) and reran the code: ** R cdf - AffymetrixCdfFile$byChipType (MoEx-1_0-st- v1,fullR1,A20080718,MR) # taken from http://www.aroma-project.org/node/31 R cs - AffymetrixCelSet$byName (affy-brain-dev, cdf = cdf) # produces cell set class object R print (cs) AffymetrixCelSet: Name: affy-brain-dev Tags: Path: rawData/affy-brain-dev/MoEx-1_0-st-v1 Platform: Affymetrix Chip type: MoEx-1_0-st-v1,fullR1,A20080718,MR Number of arrays: 12 Names: hyb7808_(MoEx-1_0-st-v1), hyb7809_(MoEx-1_0-st-v1), ..., hyb7819_(MoEx-1_0-st-v1) Time period: 2009-11-19 11:00:18 -- 2009-11-20 13:15:37 Total file size: 752.61MB RAM: 0.01MB R bc - GcRmaBackgroundCorrection (cs) R csB - process (bc) Error in list(`process(bc)` = environment, `process.GcRmaBackgroundCorrection(bc)` = environment, : [2010-03-25 16:17:22] Exception: Either argument 'names' or 'pattern' must be specified. at throw(Exception(...)) at throw.default(Either argument 'names' or 'pattern' must be specified.) at throw(Either argument 'names' or 'pattern' must be specified.) 
at indexOf.UnitNamesFile(this, names = unitName) at indexOf(this, names = unitName) at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose at getProbeSequenceData(this, safe = safe, verbose = verbose) at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose = at computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose)) at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain- dev,GRBC/Mo at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0- st-v1, ve at do.call(bgAdjustGcrma, args = args) at process.GcRmaBackgroundCorrection(bc) at process(bc) In addition: Warning message: In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName $` = character), : Argument 'rows' was out of range [1,0]. Ignored rows beyond this range. R traceback () 15: throw.Exception(Exception(...)) 14: throw(Exception(...)) 13: throw.default(Either argument 'names' or 'pattern' must be specified.) 12: throw(Either argument 'names' or 'pattern' must be specified.) 
11: indexOf.UnitNamesFile(this, names = unitName) 10: indexOf(this, names = unitName) 9: getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose) 8: getProbeSequenceData(this, safe = safe, verbose = verbose) 7: computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., verbose = less(verbose)) 6: computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose)) 5: bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain- dev,GRBC/MoEx-1_0-st-v1, verbose = FALSE, overwrite = FALSE, subsetToUpdate = NULL, typesToUpdate = pm, indicesNegativeControl = NULL, affinities = NULL, type = fullmodel, opticalAdjust = TRUE, gsbAdjust = TRUE, gsbParameters = NULL, .deprecated = FALSE) 4: bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-st- v1, verbose = FALSE, overwrite = FALSE, subsetToUpdate = NULL, typesToUpdate = pm, indicesNegativeControl = NULL, affinities = NULL, type = fullmodel, opticalAdjust = TRUE, gsbAdjust = TRUE, gsbParameters = NULL, .deprecated = FALSE) 3: do.call(bgAdjustGcrma, args = args) 2: process.GcRmaBackgroundCorrection(bc) 1: process(bc) *** Now that my modus operandi conforms to your prescribed norm, I still observe an error message that is very much the same as the one before. Could you give me a hint as to why it occurs? Is it that because the gcRMA implementation of aroma.affymetrix doesn't support the MoEx-1_0- st-v1 chip? How could I infer that? On Mar 25, 3:26 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi, GCRMA is not fully supported for all chip types, and I haven't checked if MoEx-1_0-st-v1 is one. But, first, lets fix some other mistakes you're doing. On Wed, Mar 24, 2010 at 4:48 PM, Gil Tomás gil@gmail.com wrote: Dear all, I am trying to normalize a dataset hybridized with MoEx-1_0-st-v1 with GCRMA on aroma.affymetrix. 
Here's the code I'm using to do so: I assume you have put this together from what you have found online; if there is a particular source/webpage/manual where you've found this, please let me know so we can make sure it is corrected. ** R prj.dir - /Users/giltomas/projects/brain-dev
Re: [aroma.affymetrix] getUniqueCdf inflates dimensions of original cdf
Hi, I leave this one to Mark Robinson, who designed createUniqueCdf() for AffymetrixCdfFile and is on top of this. In the meanwhile, though, could you please:

1. Clarify the origin of Mm_PromPR_v02.CDF, because Affymetrix does not provide a CDF.
2. Make the Mm_PromPR_v02.CDF available to us? If you're happy to share it (and have the rights), I'm happy to have aroma-project.org either link to it or host it.

/Henrik

On Fri, Mar 12, 2010 at 8:04 PM, stvjc carey...@gmail.com wrote:

cdfU
AffymetrixCdfFile:
Path: annotationData/chipTypes/Mm_PromPR_v02
Filename: Mm_PromPR_v02,unique.CDF
Filesize: 126.33MB
Chip type: Mm_PromPR_v02,unique
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 3026x3026
Number of cells: 9156676
Number of units: 25373
Cells per unit: 360.88
Number of QC units: 0

cdf
AffymetrixCdfFile:
Path: annotationData/chipTypes/Mm_PromPR_v02
Filename: Mm_PromPR_v02.cdf
Filesize: 126.33MB
Chip type: Mm_PromPR_v02
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 2166x2166
Number of cells: 4691556
Number of units: 25373
Cells per unit: 184.90
Number of QC units: 0

this leads to (i think)

csU = convertToUnique(csN, verbose=verbose)
20100312 14:02:59|Converting to unique CDF...
20100312 14:02:59| Getting unique CDF...
20100312 14:02:59| Getting unique CDF...done
20100312 14:02:59| Input tags:MN,lm
20100312 14:02:59| Input Path: probeData/Dawn,MN,lm/Mm_PromPR_v02
20100312 14:02:59| Output Path:probeData/Dawn,MN,lm,UNQ/Mm_PromPR_v02
20100312 14:02:59| allTags:MN,lm,UNQ
20100312 14:02:59| Test whether dataset exists
20100312 14:02:59| Reading cell indices from standard CDF...
20100312 14:03:08| Reading cell indices from standard CDF...done
20100312 14:03:08| Reading cell indices list from unique CDF...
20100312 14:03:17| Reading cell indices list from unique CDF...done
20100312 14:03:17| Converting CEL data from standard to unique CDF for sample 1 ( 10_BL6_IP_Mmp ) of 8...
20100312 14:03:17| Reading intensity values according to standard CDF...
Error in readCel(filename, indices = indices, readHeader = FALSE, readOutliers = FALSE, : Argument 'indices' contains an element out of range.
20100312 14:03:23| Reading intensity values according to standard CDF...done
20100312 14:03:23| Converting CEL data from standard to unique CDF for sample 1 ( 10_BL6_IP_Mmp ) of 8...done
20100312 14:03:23|Converting to unique CDF...done

sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-02 r51194)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base

other attached packages:
[1] gsmoothr_0.1.4 limma_3.3.4 aroma.affymetrix_1.5.0
[4] aroma.apd_0.1.7 affxparser_1.19.6 R.huge_0.2.0
[7] aroma.core_1.5.0 aroma.light_1.15.1 matrixStats_0.1.9
[10] R.rsp_0.3.6 R.cache_0.2.0 R.filesets_0.8.0
[13] R.utils_1.3.3 R.oo_1.6.7 R.methodsS3_1.1.0
[16] weaver_1.13.0 codetools_0.2-2 digest_0.4.2
Re: [aroma.affymetrix] Re: can't load CDF file
Hi. On Wed, Mar 10, 2010 at 7:23 PM, dkny169 daniela...@yahoo.com wrote: Hi Henrik, The upload of the cdf file worked now perfectly. Thanks for pointing out the right version of the supplementary file. Unfortunately, the upload of the .CEL files still doesn't work? Any ideas? cs-AffymetrixCelSet$byName(tissues,cdf=cdf) Error in list(`AffymetrixCelSet$byName(tissues, cdf = cdf)` = environment, : [2010-03-10 13:20:35] Exception: Could not locate a file for this chip type: MoGene-1_0-st-v1 This means that it could not locate any CEL files in the data set directory. In other words, make sure your CEL files are located in: rawData/tissues/MoGene-1_0-st-v1/ This is explained on page 'Setup: Location of raw data files': http://www.aroma-project.org/node/68 Hope this helps /Henrik PS. It is called the loading of... or better the setup of..., not upload of at throw(Exception(...)) at throw.default(The specified CDF structure (', getChipType(cdf), ') is no at throw(The specified CDF structure (', getChipType(cdf), ') is not compat at setCdf.AffymetrixCelSet(set, cdf) at setCdf(set, cdf) at byPath.AffymetrixCelSet(static, path = path, cdf = cdf, ...) at byPath(static, path = path, cdf = cdf, ...) at withCallingHandlers(expr, warning = function(w) invokeRestart(muffleWarnin at suppressWarnings({ at method(static, ...) at AffymetrixCelSet$byName(tissues, cdf = cdf) Many thanks, Daniela On Mar 9, 5:12 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi. On Tue, Mar 9, 2010 at 11:01 PM, dkny169 daniela...@yahoo.com wrote: Hi Henrik, I got my documentation from here:http://bioinf.wehi.edu.au/folders/firmagene/sup3.R Thanks. That is from the FIRMAGene supplementary materials: http://bioinf.wehi.edu.au/folders/firmagene/ Mark provides a more up-to-date version: [04-Feb-2010] Made a modification to the sup3.R script, now available as sup3_04feb2010.R, to make sure we use the Ensembl annotation that corresponds to the hg18 (Mar 2006) build. 
Mark, would you mind making it more clear on the above URL that 'sup3.R' is out dated? Reversing the NEWS list so that the most recent events are at the top may help too. Maybe also by renaming the outdated one sup3.R to sup3_06jun2009.R. If you could give me a link to a vignette or manual on how to use FIRMAGene, which is up to date and understandable, I would be more than thankful! More comments below... Many thanks, Daniela On Mar 9, 4:56 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi, please let us know what your source of documentation is, e.g. webpages, because you are using method names that are either outdated or non-public. Then I'll answer your questions... /Henrik On Tue, Mar 9, 2010 at 10:44 PM, dkny169 daniela...@yahoo.com wrote: Hi Mark, Thanks for your answer. I think it works now; I had the working directory set at chipTypes and not at the parent directory of annotationData. I am getting following back: cdf-AffymetrixCdfFile$findByChipType(MoEx-1_0-st-v1) cdf [1] annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1.cdf Don't use findByChipType(), which only returns a pathname, but byChipType(), which returns an AffymetrixCdfFile object, i.e. cdf - AffymetrixCdfFile$byChipType(MoEx-1_0-st-v1); [You did this in your first email, but then all of a sudden findByChipType(), which is why I wondered where you got that from.] I am trying to upload the CEL files now that are stored in rawData/ tissues/MoEx-1_0-st-v1. The working directory is set at the parent directory of rawData. But again I am getting a failure message. What am I doing wrong now? cs-AffymetrixCelSet$fromName(tissues,cdf=cdf) Error in list(`AffymetrixCelSet$fromName(tissues, cdf = cdf)` = environment, : [2010-03-09 16:28:12] Exception: AffymetrixCelSet$fromName() is defunct. Use AffymetrixCelSet$byName() instead. This error message is clear, ehe? at throw(Exception(...)) at throw.default(msg) at throw(msg) at method(static, ...) 
at AffymetrixCelSet$fromName(tissues, cdf = cdf) cs-AffymetrixCelSet$byName(tissues,cdf=cdf) Error in list(`AffymetrixCelSet$byName(tissues, cdf = cdf)` = environment, : [2010-03-09 16:38:21] Exception: Argument 'cdf' is neither of nor inherits class AffymetrixCdfFile: character This one is because you used findByChipType() above; use byChipType() and it will work. Hope this helps Henrik at throw(Exception(...)) at throw.default(sprintf(Argument '%s' is neither of nor inherits class %s: % at throw(sprintf(Argument '%s' is neither of nor inherits class %s: %s, .nam at method(static, ...) at Arguments$getInstanceOf(cdf, AffymetrixCdfFile) at method(static, ...) at AffymetrixCelSet$byName(tissues, cdf = cdf) Thanks, Daniela On Mar 9, 3:23 pm, Mark Robinson mrobin...@wehi.edu.au wrote: Hi
Re: [aroma.affymetrix] Re: can't load CDF file
Hi. On Tue, Mar 9, 2010 at 11:01 PM, dkny169 daniela...@yahoo.com wrote: Hi Henrik, I got my documentation from here: http://bioinf.wehi.edu.au/folders/firmagene/sup3.R Thanks. That is from the FIRMAGene supplementary materials: http://bioinf.wehi.edu.au/folders/firmagene/ Mark provides a more up-to-date version: [04-Feb-2010] Made a modification to the sup3.R script, now available as sup3_04feb2010.R, to make sure we use the Ensembl annotation that corresponds to the hg18 (Mar 2006) build. Mark, would you mind making it more clear on the above URL that 'sup3.R' is out dated? Reversing the NEWS list so that the most recent events are at the top may help too. Maybe also by renaming the outdated one sup3.R to sup3_06jun2009.R. If you could give me a link to a vignette or manual on how to use FIRMAGene, which is up to date and understandable, I would be more than thankful! More comments below... Many thanks, Daniela On Mar 9, 4:56 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote: Hi, please let us know what your source of documentation is, e.g. webpages, because you are using method names that are either outdated or non-public. Then I'll answer your questions... /Henrik On Tue, Mar 9, 2010 at 10:44 PM, dkny169 daniela...@yahoo.com wrote: Hi Mark, Thanks for your answer. I think it works now; I had the working directory set at chipTypes and not at the parent directory of annotationData. I am getting following back: cdf-AffymetrixCdfFile$findByChipType(MoEx-1_0-st-v1) cdf [1] annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1.cdf Don't use findByChipType(), which only returns a pathname, but byChipType(), which returns an AffymetrixCdfFile object, i.e. cdf - AffymetrixCdfFile$byChipType(MoEx-1_0-st-v1); [You did this in your first email, but then all of a sudden findByChipType(), which is why I wondered where you got that from.] I am trying to upload the CEL files now that are stored in rawData/ tissues/MoEx-1_0-st-v1. 
The working directory is set at the parent directory of rawData. But again I am getting a failure message. What am I doing wrong now? cs-AffymetrixCelSet$fromName(tissues,cdf=cdf) Error in list(`AffymetrixCelSet$fromName(tissues, cdf = cdf)` = environment, : [2010-03-09 16:28:12] Exception: AffymetrixCelSet$fromName() is defunct. Use AffymetrixCelSet$byName() instead. This error message is clear, ehe? at throw(Exception(...)) at throw.default(msg) at throw(msg) at method(static, ...) at AffymetrixCelSet$fromName(tissues, cdf = cdf) cs-AffymetrixCelSet$byName(tissues,cdf=cdf) Error in list(`AffymetrixCelSet$byName(tissues, cdf = cdf)` = environment, : [2010-03-09 16:38:21] Exception: Argument 'cdf' is neither of nor inherits class AffymetrixCdfFile: character This one is because you used findByChipType() above; use byChipType() and it will work. Hope this helps Henrik at throw(Exception(...)) at throw.default(sprintf(Argument '%s' is neither of nor inherits class %s: % at throw(sprintf(Argument '%s' is neither of nor inherits class %s: %s, .nam at method(static, ...) at Arguments$getInstanceOf(cdf, AffymetrixCdfFile) at method(static, ...) at AffymetrixCelSet$byName(tissues, cdf = cdf) Thanks, Daniela On Mar 9, 3:23 pm, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Daniela. Is your CDF in the: annotationData/chipTypes/MoEx-1_0-st-v1/ directory? (http://aroma-project.org/node/66) Cheers, Mark Hi, I stored my CDF file in annotationData/chipTypes; nevertheless I cannot upload the file. Can anyone please tel me what I am doing wrong: cdf-AffymetrixCdfFile$byChipType(MoEx-1_0-st-v1.cdf) ror in list(`AffymetrixCdfFile$byChipType(MoEx-1_0-st-v1.cdf)` = environment, : [2010-03-09 14:50:58] Exception: Could not locate a file for this chip type: MoEx-1_0-st-v1.cdf at throw(Exception(...)) at throw.default(Could not locate a file for this chip type: , paste(c(chipT at throw(Could not locate a file for this chip type: , paste(c(chipType, tag at method(static, ...) 
at AffymetrixCdfFile$byChipType(MoEx-1_0-st-v1.cdf) Many thanks for your help!
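Both "Could not locate a file for this chip type" errors in these threads come down to where files are expected on disk. Below is a minimal base-R sketch for checking this, assuming the layout described in the replies (annotationData/chipTypes/<chipType>/ and rawData/<dataSet>/<chipType>/ relative to the working directory). checkAromaLayout() is a hypothetical helper, not an aroma.affymetrix function, and the demo files are empty placeholders.

```r
# Hypothetical helper: count the CDF and CEL files found in the directories
# that the thread says are expected under a given root directory.
checkAromaLayout <- function(dataSet, chipType, root = ".") {
  cdfDir <- file.path(root, "annotationData", "chipTypes", chipType)
  celDir <- file.path(root, "rawData", dataSet, chipType)
  cdfFiles <- list.files(cdfDir, pattern = "[.]cdf$", ignore.case = TRUE)
  celFiles <- list.files(celDir, pattern = "[.]cel$", ignore.case = TRUE)
  list(cdfDir = cdfDir, nbrOfCdfFiles = length(cdfFiles),
       celDir = celDir, nbrOfCelFiles = length(celFiles))
}

# Usage on a throw-away directory tree with empty placeholder files.
root <- file.path(tempdir(), "aromaLayoutDemo")
chipType <- "MoEx-1_0-st-v1"
cdfDir <- file.path(root, "annotationData", "chipTypes", chipType)
celDir <- file.path(root, "rawData", "tissues", chipType)
dir.create(cdfDir, recursive = TRUE, showWarnings = FALSE)
dir.create(celDir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(cdfDir, paste0(chipType, ".cdf")))
file.create(file.path(celDir, "sample1.CEL"))

res <- checkAromaLayout("tissues", chipType, root = root)
str(res)
```

A count of zero for either directory points to the kind of misplaced file or wrong working directory discussed above. Note that the chip type is given without the ".cdf" extension.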
Re: [aroma.affymetrix] aroma.affymetrix and R 64bit
On Mon, Mar 1, 2010 at 6:38 PM, zaid z...@genomedx.com wrote:

Is there support for aroma.affymetrix under R 64 bit?

Definitely on Linux. Are you asking about Windows 64-bit? Then, I think so, cf. http://cran.r-project.org/web/checks/check_results_aroma.affymetrix.html but I don't have machines to test it myself. Is there someone out there with a Win64 system who can test?

You should also be aware of the 'Update 2 on MinGW-w64 builds for 64-bit Windows' (March 1, 2010) post on R-devel: http://tolstoy.newcastle.edu.au/R/e9/devel/10/03/0598.html

/Henrik
Re: [aroma.affymetrix] Re: aroma.affymetrix and R 64bit
So affxparser is built by the Bioconductor framework. Please post to the BioC mailing list and ask there; they should be able to tell you what the Win64 plans are.

/Henrik

On Mon, Mar 1, 2010 at 9:41 PM, zaid z...@genomedx.com wrote:

I'm using the March 1st 64-bit version on Windows, but for some reason affxparser seems to be unavailable.

On Mar 1, 9:48 am, Henrik Bengtsson henrik.bengts...@gmail.com wrote:

On Mon, Mar 1, 2010 at 6:38 PM, zaid z...@genomedx.com wrote:

Is there support for aroma.affymetrix under R 64 bit?

Definitely on Linux. Are you asking about Windows 64-bit? Then, I think so, cf. http://cran.r-project.org/web/checks/check_results_aroma.affymetrix.html but I don't have machines to test it myself. Is there someone out there with a Win64 system who can test?

You should also be aware of the 'Update 2 on MinGW-w64 builds for 64-bit Windows' (March 1, 2010) post on R-devel: http://tolstoy.newcastle.edu.au/R/e9/devel/10/03/0598.html

/Henrik
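To tell whether the running R session is a 64-bit build, and whether affxparser is installed for it, a check along these lines can be used. This is a sketch in base R only; nothing in it is specific to aroma.affymetrix, and it does not load affxparser.

```r
# 8-byte pointers indicate a 64-bit build of R; 4-byte pointers a 32-bit build.
is64bit <- (.Machine$sizeof.pointer == 8L)
cat("Architecture:", R.version$arch, "\n")
cat("64-bit R:", is64bit, "\n")

# system.file() returns "" when the package cannot be found in the
# current library paths.
hasAffxparser <- nzchar(system.file(package = "affxparser"))
cat("affxparser installed:", hasAffxparser, "\n")
```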
Re: [aroma.affymetrix] Segmentation
Hi.

On Tue, Feb 23, 2010 at 3:36 AM, Alfredo Hidalgo ahida...@inmegen.gob.mx wrote:

Hi! We are interested in running a GISTIC analysis on the data we obtained after segmentation with GLAD in Aroma, but there seems to be a problem regarding the start and end positions of the segments, which apparently do not match the physical positions in the markers file we are using for GISTIC.

I don't know what marker file you are using, but the locations of the markers (units) used by the segmentation methods in aroma.affymetrix are given by the UGP (unit genome position) file you are using. The unit names are given by the CDF. You can find the UGP from the CDF as follows:

  cesN <- ...
  glad <- GladModel(cesN);
  cdf <- getCdf(cesN);
  ugp <- getAromaUgpFile(cdf);
  df <- data.frame(unitName=getUnitNames(cdf));
  gp <- readDataFrame(ugp);
  df <- cbind(df, gp);

Does the segmentation report the actual physical positions of the first and last markers, or does it report other nearby positions?

It returns whatever glad() of the GLAD package returns. See help(glad, package=GLAD) for more details. If you wish to troubleshoot more at a low level, you can extract the low-level data like this:

  cn <- extractRawCopyNumbers(glad, array=1, chromosome=1);
  data <- as.data.frame(cn);
  pos <- data$x;
  M <- data$cn;

and then use that to call glad(). That might help you.

Another question: I have a copy number file obtained from a paired analysis in Partek Genomics Suite, and want to do the segmentation using GLAD or CBS. How can I incorporate my CN file into the Aroma pipeline to do the segmentation?

This requires that you first allocate so-called binary CN files and import the CN data into them. There is no pipeline to do this automatically for Partek data, so you have to do it manually. This requires a bit of understanding of the annotation data files involved etc.
It is explained in Vignette 'Creating binary data files containing copy number estimates': http://www.aroma-project.org/node/88

If you get that far, you can then, using the example in that vignette, load the complete data set as:

  ds <- AromaUnitTotalCnBinarySet$byName("MyDataSet,tagA,tagB", chipType="HG-CGH-244A");
  glad <- GladModel(ds);

and continue from there. It is on the todo list to document all this better, but don't expect anything soon.

/Henrik

Thanks a lot!!
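One way to check whether reported segment boundaries coincide with marker positions is to compare them directly against the positions read from the UGP file. The sketch below does this in base R with made-up positions and segments; the objects pos and segs and their column names are assumptions for illustration, not output of GladModel or glad().

```r
# Made-up marker positions on one chromosome (in a real analysis these would
# come from the UGP file, e.g. via readDataFrame(ugp) as shown in the thread).
pos <- c(1000, 2500, 4000, 7200, 9100, 12000)

# Made-up segments; the second one ends at a position that is not a marker.
segs <- data.frame(start = c(1000, 7200), stop = c(4000, 11000))

# Which boundaries are exact marker positions?
startOk <- segs$start %in% pos
stopOk  <- segs$stop  %in% pos
print(cbind(segs, startOk, stopOk))

# For boundaries that do not match, look up the nearest marker position.
nearest <- function(x) pos[which.min(abs(pos - x))]
segs$nearestStop <- vapply(segs$stop, nearest, numeric(1))
print(segs)
```

If the boundaries match the UGP positions but not the GISTIC markers file, the two annotation sources probably differ (for instance in genome build), which is something to verify rather than something this thread establishes.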
Re: [aroma.affymetrix] Re: QC analysis and HuEx
Ok, before we try to troubleshoot this one, please update to the latest aroma.affymetrix version. The one you are using is nearly three months old, and I prefer to troubleshoot the current code base. When you've done that, it should be enough to run plotRle(); you don't have to rerun everything. BTW, did you remember to call fit() on the probe-level model? /Henrik On Wed, Feb 24, 2010 at 8:15 PM, zaid z...@genomedx.com wrote: traceback() 8: plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) 7: bxp(bxpStats, ylim = ylim, outline = outline, las = las, ...) 6: plotBoxplotStats.list(stats, main = main, ylab = ylab, ...) 5: plotBoxplotStats(stats, main = main, ylab = ylab, ...) 4: plotBoxplot.ChipEffectSet(ces, type = RLE, ...) 3: plotBoxplot(ces, type = RLE, ...) 2: plotRle.QualityAssessmentModel(qamTr) 1: plotRle(qamTr) qamTr QualityAssessmentModel: Name: tissues Tags: RBC,QN,RMA,merged,QC Path: qcData/tissues,RBC,QN,RMA,merged,QC/HuEx-1_0-st-v2 Chip-effect set: ExonChipEffectSet: Name: tissues Tags: RBC,QN,RMA,merged Path: plmData/tissues,RBC,QN,RMA,merged/HuEx-1_0-st-v2 Platform: Affymetrix Chip type: HuEx-1_0-st-v2,monocell Number of arrays: 2 Names: S370-A-HuEx-1_0-st-v2-01-1 (S09-13138), S371-A-HuEx-1_0-st- v2-01-1 (S09-07848) Time period: 2010-02-24 10:28:31 -- 2010-02-24 10:28:31 Total file size: 5.43MB RAM: 0.01MB Parameters: (probeModel: chr pm, mergeGroups: logi TRUE) RAM: 0.00MB sessionInfo() R version 2.10.0 (2009-10-26) i386-pc-mingw32 locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 [4] LC_NUMERIC=C LC_TIME=English_Canada.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Biobase_2.6.0 aroma.affymetrix_1.3.0 aroma.apd_0.1.7 affxparser_1.18.0 [5] R.huge_0.2.0 aroma.core_1.3.1 aroma.light_1.15.1 matrixStats_0.1.8 [9] R.rsp_0.3.6 R.filesets_0.6.5 digest_0.4.1 R.cache_0.2.0 [13] R.utils_1.2.4 R.oo_1.6.5 affy_1.24.2 
R.methodsS3_1.0.3

loaded via a namespace (and not attached):
[1] affyio_1.14.0 preprocessCore_1.8.0

How can I get more details on the error? I tried to use fewer CEL files, as few as 3, and still no luck. Thanks in advance

On Feb 24, 10:46 am, Henrik Bengtsson henrik.bengts...@gmail.com wrote:

Hi, there is probably more output from the error, isn't there? If so, could you please provide us with it? Also, whenever you get an error, it is always helpful to report the output of traceback() [see email footer]. What's your sessionInfo()?

/Henrik

On Wed, Feb 24, 2010 at 7:29 PM, zaid z...@genomedx.com wrote:

Error: Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) : NAs not allowed in 'ylim'

On Feb 24, 10:19 am, zaid z...@genomedx.com wrote:

I was doing QC analysis in aroma in R on a HuEx chip but got an error while trying to plot NUSE: ylim contains NA. I'm running R 2.10 (32-bit) on Windows 7 (64-bit). My commands:

  library(aroma.affymetrix)
  verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
  chipType <- "HuEx-1_0-st-v2"
  cdf <- AffymetrixCdfFile$byChipType(chipType)
  print(cdf)
  cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
  bc <- RmaBackgroundCorrection(cs)
  csBC <- process(bc, verbose=verbose)
  qn <- QuantileNormalization(csBC, typesToUpdate="pm")
  csN <- process(qn, verbose=verbose)
  plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
  fit(plmTr, verbose=verbose)
  qamTr <- QualityAssessmentModel(plmTr)
  plotNuse(qamTr)
  plotRle(qamTr)

I was able to run the above on U95A data and Plus 2 data. Also, in the past I was able to run it on HuEx data. The CDF file I'm using is binary, and I have used multiple ones (HuEx-1_0-st-v2,core,A20071112,EP.cdf, HuEx-1_0-st-v2,control,A20071112,EP.cdf, HuEx-1_0-st-v2,extended,A20071112,EP.cdf etc., offered on Elizabeth's Column http://groups.google.com/group/aroma-affymetrix/web/affymetrix-define... ). Could you please point me to how to fix this problem?
Thanks in advance -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups aroma.affymetrix group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group athttp://groups.google.com/group/aroma-affymetrix?hl=en- Hide quoted text - - Show quoted text - -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output
Re: [aroma.affymetrix] error in doing GenomeGraphs
On Tue, Feb 16, 2010 at 6:50 PM, camelbbs camel...@gmail.com wrote:

Hi, can anyone help me with this error?

  > u <- indexOf(cdf, "6811818")
  > u
  integer(0)

This tells you that there is no unit with name 6811818 in the MoEx-1_0-st-v1,coreR1,A20080718,MR CDF file. You are simply asking for information on a non-existing unit. You can get the unit names available in a CDF with:

  unitNames <- getUnitNames(cdf);

  > ugcM <- getUnitGroupCellMap(getCdf(ds), units=u, retNames=TRUE)
  Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
    missing value where TRUE/FALSE needed

This is an error because you request the (unit,group,cell) map of zero units (an empty set). The error message is not clear about this, because it is really an unexpected use case.

Hope this helps

Henrik

  > cdf
  AffymetrixCdfFile:
  Path: annotationData/chipTypes/MoEx-1_0-st-v1
  Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf
  Filesize: 30.53MB
  Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
  RAM: 0.62MB
  File format: v4 (binary; XDA)
  Dimension: 2560x2560
  Number of cells: 6553600
  Number of units: 17831
  Cells per unit: 367.54
  Number of QC units: 1
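A guard against looking up units that do not exist can be sketched in base R as follows. This is only an analogy to the thread, not aroma.affymetrix code: the vector unitNames is a hypothetical stand-in for the result of getUnitNames(cdf), and base R's match() reports a missing name as NA, whereas indexOf() in the thread returned integer(0). Either way, the idea is to check the lookup result before passing it on to something like getUnitGroupCellMap().

```r
# Hypothetical stand-in for getUnitNames(cdf); real names come from the CDF.
unitNames <- c("6747309", "6747310", "6747311")

# Look up the positions of the requested unit names; a missing name gives NA.
wanted <- c("6747310", "6811818")
u <- match(wanted, unitNames)

# Separate the names that were found from those that were not.
notFound <- wanted[is.na(u)]
u <- u[!is.na(u)]

if (length(notFound) > 0) {
  message("Not in CDF: ", paste(notFound, collapse = ", "))
}

# Only continue if at least one unit was found.
if (length(u) == 0) {
  stop("None of the requested units exist; nothing to look up.")
}
print(u)
```

With the placeholder names above, only the second unit is found, so downstream code would receive a non-empty, NA-free index vector.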
[aroma.affymetrix] GenomeWideSNP_6: Updated UFL and UGP annotation data files
For your information: The unit fragment length (UFL) and the unit genome position (UGP) annotation files for the Affymetrix GenomeWideSNP_6 chip type have been updated and are available at:

http://aroma-project.org/chipTypes/GenomeWideSNP_6

The source was the two Affymetrix NetAffx CSV files GenomeWideSNP_6.cn.na30.annot.csv (777MB) and GenomeWideSNP_6.na30.annot.csv (1.32GB). The changes from previous versions are only minor. More details below.

/Henrik

HISTORY:
# UGP:
# o na27.1 -> na30
#   No differences.
# o na27 -> na27.1
#   No differences.
# o na26 -> na27
#   Two units (932039, 1872834) were moved from ChrX to ChrY.
#   Same location.
# o na24 -> na26
#   Only minor modifications for non-missing values:
#   - three loci changed chromosomes
#   - an additional 23 loci changed positions, of which only 17 moved
#     more than 2 base pairs.
# UFL:
# o na27.1 -> na30
#   - No changes.
# o na27 -> na27.1
#   - No changes.
# o na26 -> na27
#   - No changes.
# o na24 -> na26
#   - All changes are for SNP units.
#   - There are 6 NspI and 14 StyI changes in SNP fragment lengths,
#     some of which are only minor.
#   - There are 1108 more SNPs that now have missing values.
Re: [aroma.affymetrix] Access to source files
Hi.

On Tue, Feb 9, 2010 at 7:40 PM, Randy Gobbel randy.gob...@gmail.com wrote:

I've just noticed in the past few days that with the new Web site, it's not at all obvious how to download source files for the various packages that go into aroma.affymetrix. I've managed it by pawing through old messages looking for links, but it would be nice to have a direct link in an obvious place. Sometimes there's no substitute for reading the sources, or occasionally building a custom version, if you're running a bleeding-edge version of R.

True. I've added a how-to page 'Access the source code':

http://www.aroma-project.org/howtos/AccessSourceCode

You were probably thinking of the old approach that was based on hbGet(). It is no longer supported/maintained, because basically everything is now on CRAN.

Hope this helps

Henrik

PS. It might be confusing that there is still old documentation over at the Google Group. The plan is first to make sure everything is moved, then to replace the Google Group front page with a link to http://www.aroma-project.org/, and eventually to delete all old Google Group pages. What prevents us from doing this is that we are still not sure whether the Google Group will be blocked yet again if we touch it. When it is blocked, the mailing list is also blocked and we are at the mercy of Google to unlock it.
Re: [aroma.affymetrix] SNP Call rates
Hi Rama.

On Tue, Feb 9, 2010 at 10:15 AM, Rama Gullapalli dr.ramachan...@gmail.com wrote:

Hi All,

First-time poster, long-time lurker. I really appreciate all the wonderful stuff you guys are doing (Henrik et al). I would love to help in any way deemed necessary (including documentation work...). I am going through the process of understanding the capabilities of the software right now.

Thanks, we appreciate your feedback. We are grateful for any help with documentation, proofreading, script validation etc.

I had a question. The CNAG 3.0 and GTC 3.0.1 software have something called a SNP call rate estimator. I was wondering if there is a similar function in Aroma? Which function would be able to perform a similar analysis?

Sorry, there is no SNP call rate feature available in aroma.affymetrix. Depending on how you define it, it may be more or less easy to calculate by hand given probe signals etc. I think some tools provide rough estimates from the raw probe signals, some from the results of a long-running genotyping step, and so on.

Cheers,

Henrik

Thanks for your time.

Regards
Rama
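As a rough illustration of "calculating it by hand", one common definition of the call rate is the fraction of SNPs on an array that received a genotype call. The base R sketch below assumes that definition and a hypothetical matrix of genotype calls in which NA encodes a "no call"; neither the data layout nor the definition comes from aroma.affymetrix, and other tools may define the call rate differently.

```r
# Hypothetical genotype calls: rows are SNPs, columns are arrays.
# NA stands for a "no call"; this coding is an assumption for illustration.
calls <- matrix(
  c("AA", "AB", NA,
    "BB", NA,   NA,
    "AB", "AB", "AA",
    "AA", "BB", "AB"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("SNP", 1:4), c("array1", "array2", "array3"))
)

# Per-array call rate: fraction of SNPs with a non-missing call.
callRate <- colMeans(!is.na(calls))
print(round(callRate, 2))
```

How the calls themselves are obtained (from raw probe signals or from a full genotyping step) is the part that differs between tools, as noted in the reply above.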
Re: Runtime error with (Was: Re: [aroma.affymetrix] Re: Question for custom CDF of ST-Array)
On Thu, Jan 28, 2010 at 4:58 PM, branko b.miso...@lumc.nl wrote:

Hi all,

[snip]

4) Last one, regarding a QC issue with plotting. When doing array (pseudo) image plots, my RGui in Windows complains. E.g. if I do:

  cf <- getFile(cs, 1)
  plotImage(cf, transform=list(log2), palette=rainbow(256))
  # Loading required package: EBImage
  # Loading required package: abind

I get a “Runtime error!” message from Visual C++ and I have to click “ok” twice before I get the picture. Here is the link to the error message:

http://www.4shared.com/file/209798878/f5a3f82e/Aromaplotimageerror.html

So you can imagine I'm not enthusiastic about clicking twice for 300 arrays and then again for several types of plots. Any idea where the issue is? (I guess something in the EBImage dependency causes it.)

If you google EBImage together with the error message [http://goo.gl/J9KQ], you'll get to the EBImage installation PDF, which in Section 3 (Windows) explains the reason and how to solve it.

/Henrik

Below is my session info. Hope you can help.

Best regards,
Branko

--
Branislav Misovic, Department of Toxicogenetics
Leiden University Medical Center
PO.box 9600, Building2,Room:T3-11
2300 RC Leiden
The Netherlands
Phone: +31 71 526 9636
Mob: 0653135855
E-mail: b.miso...@lumc.nl

  > sessionInfo()
  R version 2.10.0 (2009-10-26)
  i386-pc-mingw32

  locale:
  [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
  [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
  [5] LC_TIME=English_United Kingdom.1252

  attached base packages:
  [1] stats graphics grDevices datasets utils methods base

  other attached packages:
  [1] abind_1.1-0 aroma.affymetrix_1.3.4 aroma.apd_0.1.7
  [4] affxparser_1.18.0 R.huge_0.2.0 aroma.core_1.3.4
  [7] aroma.light_1.15.1 matrixStats_0.1.8 R.rsp_0.3.6
  [10] R.filesets_0.6.5 digest_0.4.1 R.cache_0.2.0
  [13] R.utils_1.2.4 R.oo_1.6.6 EBImage_3.2.0
  [16] R.methodsS3_1.0.3

  loaded via a namespace (and not attached):
  [1] tools_2.10.0
Re: [aroma.affymetrix] extracting data from plmEx
Hi.

On Tue, Jan 26, 2010 at 7:00 PM, parantu shah parantu.s...@gmail.com wrote:

Hi,

I want to extract, as an ExpressionSet, the set of normalized arrays (200+) after fitting

  plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
  fit(plmEx, verbose=verbose)

[not the chip effects or anything else]

I don't understand this comment. Are you saying you want to extract the normalized probe signals, not the probe-level summarized signals? If so, I believe you have misunderstood what fit() on an ExonRmaPlm does, because the ExonRmaPlm class is for summarizing probe-level data into chip effects. Thus, if you want to work with the probe signals, you do not need to do this step, but can instead work with the 'csN' data set. I don't know what 'csN' is here, but one guess is that it is the CEL data set output from a QuantileNormalization step. If so, it is a set of CEL files just like your original CEL files. Then you can use standard Bioconductor methods to read the CEL data, e.g.

  pathnames <- getPathnames(csN);
  ab <- ReadAffy(filenames=pathnames);
  print(ab);

Note that this gives you an AffyBatch object, not an ExpressionSet, though both extend the eSet class. More importantly, ReadAffy() really reads all probe signals into memory, so you have to deal with all the regular memory issues that you have with Bioconductor; you are no longer riding on the aroma.affymetrix framework.

to use the normalized data in another Bioconductor module. ExtractDataFrame and ExtractMatrix don't work.

Exactly what did you try, and in what way did they not work?

/Henrik

Any ideas will be welcome.

Thanks
Parantu

--
Parantu Shah, PhD
Dept. of Biostatistics Computational Biology
Dana-Farber Cancer Institute
Harvard School of Public Health
CLS-11075, 3 Blackfan Circle
Boston MA 02115
Phone : 617 582 8852
http://www.hsph.harvard.edu/~pshah
Re: [aroma.affymetrix] ArrayExplorer problem
Hi,

I managed to reproduce this when using the same names as you. It turns out to be a bug causing indexOf(ds, "foo+bar") of a data set 'ds' to return NA when the requested name contains a '+' symbol. The reason was that the '+' was parsed as a regular expression symbol. I've fixed R.filesets, where the bug is located. Until I release a new version of R.filesets and submit it to CRAN, you can use the following patch to R.filesets:

  # INSTALL
  library(aroma.affymetrix);
  downloadPackagePatch("R.filesets");

When you've done this once, the patch will be available in all future R sessions. It will uninstall itself when you later install the new R.filesets version.

Let me know if this works.

/Henrik

On Tue, Jan 26, 2010 at 1:59 AM, Randy Gobbel randy.gob...@gmail.com wrote:

I'm running into the following error when trying to use ArrayExplorer. I'm running on an 8 CPU (Xeon) Mac Pro, OS 10.5.8, with EBImage and supporting software just installed from scratch:

  > rs <- calculateResidualSet(plm)
  > ae <- ArrayExplorer(rs)
  > setColorMaps(ae, c('log2,log2neg,rainbow','log2,log2pos,rainbow'))
  > process(ae, interleaved='auto')
  Error in readCelHeader(pathname) :
    Cannot read CEL file header. File not found: NA/NA
  In addition: Warning messages:
  1: In min(x) : no non-missing arguments to min; returning Inf
  2: In max(x) : no non-missing arguments to max; returning -Inf

  > traceback()
  34: stop("Cannot read CEL file header. File not found: ", filename)
  33: readCelHeader(pathname)
  32: getHeader.AffymetrixCelFile(this)
  31: getHeader(this)
  30: getCdf.AffymetrixCelFile(this$files[[1]], ...)
  29: getCdf(this$files[[1]], ...)
  28: getCdf.AffymetrixCelSet(this)
  27: getCdf(this)
  26: clearCache.AffymetrixCelSet(res)
  25: NextMethod(generic = "clearCache", object = this, ...)
  24: clearCache.ResidualSet(res)
  23: clearCache(res)
  22: extract.GenericDataFileSet(X[[1L]], ...)
  21: FUN(X[[1L]], ...)
  20: lapply.default(dsList, FUN = extract, files, ...)
  19: lapply(dsList, FUN = extract, files, ...)
  18: extract.GenericDataFileSetList(this, ..., onDuplicated = "error")
  17: extract(this, ..., onDuplicated = "error")
  16: getFileList.GenericDataFileSetList(this, ii, ...)
  15: getFileList(this, ii, ...)
  14: getFullNames.AromaMicroarrayDataSetTuple(setTuple)
  13: getFullNames(setTuple)
  12: eval(expr, envir, enclos)
  11: eval(rExpr, envir = envir)
  10: sourceRsp.default(file = pathname, response = response, ...)
  9: sourceRsp(file = pathname, response = response, ...)
  8: rspToHtml.default(pathname, path = NULL, outFile = outFile, outPath = outPath, overwrite = TRUE, envir = env)
  7: rspToHtml(pathname, path = NULL, outFile = outFile, outPath = outPath, overwrite = TRUE, envir = env)
  6: updateOnChipTypeJS.ArrayExplorer(this, ...)
  5: updateOnChipTypeJS(this, ...)
  4: setup.ArrayExplorer(this, ..., verbose = less(verbose))
  3: setup(this, ..., verbose = less(verbose))
  2: process.ArrayExplorer(ae, interleaved = "auto")
  1: process(ae, interleaved = "auto")

  > rs
  ResidualSet:
  Name: all
  Tags: RBC,QN,RMA
  Path: plmData/all,RBC,QN,RMA/Hs133P_Hs_REFSEQ
  Platform: Affymetrix
  Chip type: Hs133P_Hs_REFSEQ
  Number of arrays: 9
  Names: EA08034_98020_H133+_MCCW199, EA08034_98021_H133+_SKINW199, ..., EA08034_98031_H133+_PN-1NN2
  Time period: 2010-01-20 15:59:10 -- 2010-01-20 15:59:28
  Total file size: 116.30MB
  RAM: 0.01MB
  Parameters: (probeModel: chr "pm")

  > cdf
  AffymetrixCdfFile:
  Path: annotationData/chipTypes/Hs133P_Hs_REFSEQ
  Filename: Hs133P_Hs_REFSEQ.cdf
  Filesize: 15.21MB
  Chip type: Hs133P_Hs_REFSEQ
  RAM: 0.00MB
  File format: v4 (binary; XDA)
  Dimension: 1164x1164
  Number of cells: 1354896
  Number of units: 25102
  Cells per unit: 53.98
  Number of QC units: 9

Any suggestions? Everything else seems to be working fine at this point.
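The kind of problem described in this thread, a '+' in a name being interpreted as a regular expression symbol, can be illustrated in base R. This is a minimal sketch using two of the array names printed above; it is not the actual R.filesets code or its fix, only a demonstration of why a name containing '+' fails to match itself when used as a pattern.

```r
# Array names taken from the data set printed in this thread.
arrayNames <- c("EA08034_98020_H133+_MCCW199",
                "EA08034_98021_H133+_SKINW199")
target <- "EA08034_98020_H133+_MCCW199"

# As a regular expression, "3+" means "one or more 3s", so the literal
# "+" in the name is never matched and no index is found.
asRegex <- grep(paste0("^", target, "$"), arrayNames)

# Matching the name as a fixed string finds the array.
asFixed <- grep(target, arrayNames, fixed = TRUE)

print(length(asRegex))
print(asFixed)
```

An empty lookup result like the first one is the sort of value that can later surface as NA paths, such as the "File not found: NA/NA" in the error message above.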
Re: How to find help on aroma.* packages? (Was: Re: [aroma.affymetrix] Re: Question for custom CDF of ST-Array)
Hi,

On Thu, Jan 28, 2010 at 4:58 PM, branko b.miso...@lumc.nl wrote:

Hi all,

[snip]

3) A few questions regarding quality checks and basic data manipulations in Aroma:

[snip]

I ask these silly questions because using R commands like str() doesn’t show me the object fields etc., so I can’t use standard R matrix commands,

str() does not work on the aroma.* objects, but if you do ll() [two L's] you will see some of the contents of these objects. However, the idea is that you use the methods API to access the objects, not the fields (slots).

also help(“some Aroma command”) doesn’t show enough information. Sometimes it gives an empty help page.

The R help pages for the aroma.* packages are sparse. The reason for this is simply that it is a daunting task to set them up and keep them up to date. There is not enough time/resources/people for this. The ones you do find are up to date. The others point to a generic help page saying the function is not documented. Instead, I ask everyone to use the online documentation at:

http://www.aroma-project.org/

That is *the* place to find documentation about aroma.* packages. You can trust what you find there. There are of course more features in the aroma.* packages. Some advanced users dive into the code to find these, and even add their own extensions. However, our strategy is that new features will be documented online only when we consider them to be stable. This is the only way we can keep up with the maintenance. FYI, the aroma.* and R.* packages now consist of 6+ lines of code.

I could not find a pdf manual in the installed Aroma libraries nor in the Google group. I see only an html file showing me all the functions and classes. Is there an easier way to look for functions than the main html pages?

If you mean a PDF vignette when you say pdf, that does not exist. We don't use (Sweave) vignettes. Instead we prefer to document things online, for the above reasons.

The code of functions is not visible by just typing func.name(). I guess I can always get the source code and search, but there is likely an easy way to do it.

This is not specific to aroma.* packages. When you type print(methodName) you will see the S3 generic function. In order to see the function for your particular object, you need to use:

  methods(methodName)

Please see the R help on the S3 class system for why/how this works.

It is visible that Aroma uses different classes than BioConductor. I assume there is a good reason for that, but maybe you can give some link with an explanation?

That is too daunting a task to document, but the short answer is that the Bioconductor classes do not support large data sets. This is why we developed aroma.affymetrix in the first place. Advanced developers can also look at R.filesets to see the core of how we deal with large data sets.

[snip]

/Henrik

Hope you can help.

Best regards,
Branko

--
Branislav Misovic, Department of Toxicogenetics
Leiden University Medical Center
PO.box 9600, Building2,Room:T3-11
2300 RC Leiden
The Netherlands
Phone: +31 71 526 9636
Mob: 0653135855
E-mail: b.miso...@lumc.nl

  > sessionInfo()
  R version 2.10.0 (2009-10-26)
  i386-pc-mingw32

  locale:
  [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
  [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
  [5] LC_TIME=English_United Kingdom.1252

  attached base packages:
  [1] stats graphics grDevices datasets utils methods base

  other attached packages:
  [1] abind_1.1-0 aroma.affymetrix_1.3.4 aroma.apd_0.1.7
  [4] affxparser_1.18.0 R.huge_0.2.0 aroma.core_1.3.4
  [7] aroma.light_1.15.1 matrixStats_0.1.8 R.rsp_0.3.6
  [10] R.filesets_0.6.5 digest_0.4.1 R.cache_0.2.0
  [13] R.utils_1.2.4 R.oo_1.6.6 EBImage_3.2.0
  [16] R.methodsS3_1.0.3

  loaded via a namespace (and not attached):
  [1] tools_2.10.0
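The point about S3 generics in this thread can be tried out with a toy example in base R. The sketch below defines a hypothetical generic describe() with one method for a made-up class 'Gadget' (neither exists in any aroma.* package), and shows that printing the generic reveals only the dispatcher, while methods() and getS3method() lead to the class-specific implementation.

```r
# A toy S3 generic with one method; the names are hypothetical.
describe <- function(x, ...) UseMethod("describe")
describe.Gadget <- function(x, ...) paste("Gadget with id", x$id)

obj <- structure(list(id = 42), class = "Gadget")

# Printing the generic shows only the dispatcher, not the implementation.
print(describe)

# List the available methods, then fetch the one for class 'Gadget'.
print(methods(describe))
impl <- getS3method("describe", "Gadget")
print(impl)

# Calling the generic dispatches on the class of 'obj'.
result <- describe(obj)
print(result)
```

The same pattern should apply to any S3 method name and class: list the candidates with methods(), then inspect the one matching the class of the object at hand.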
Re: [aroma.affymetrix] Re: Directory structure: FIRMAGene
2010/1/27 Mikhail mikhail.dozmo...@gmail.com:

Henrik, thank you for such a thorough answer. Now I understand how to create two datasets, and it did work. I'm trying to use these datasets for a FIRMAGene analysis, as described in the http://bioinf.wehi.edu.au/folders/firmagene/sup3.R file. Nowhere in this file can I see how and where the two datasets are defined and compared. It starts with reading ONE dataset into cs,

  # this assumes the CEL files are at ./rawData/tissues/HuGene-1_0-st-v1/
  cs <- AffymetrixCelSet$fromName("tissues", cdf=cdf, verbose=verbose)

and continues through FIRMAGene. Yes, I can create two datasets, as you described. Should they be joined together for normalization and then for FIRMAGene? I tried

  csM <- AffymetrixCelSet$byName("MS", tags="M", cdf=cdf)
  csS <- AffymetrixCelSet$byName("MS", tags="S", cdf=cdf)
  cs <- cbind(csM, csS)

Now I'm starting to understand your question better. You want to keep the data sets in different directories (as solved), but join them together into one for the analysis. In order to do this, you can append one set to another. The safest/best way to do this is:

  cs <- append(csM, csS);
  setFullName(cs, "MS,M+S");

This will set up a set of your 3+3 CEL files with fullname MS,M+S. This name will be used in all downstream analysis/output data sets. If you don't use setFullName(), the fullname will be that of the first data set ('csM').

/Henrik

Is it correct? I doubt it, because the subsequent code in the example file mentioned above doesn't work. I wonder if FIRMAGene can recognize the tags from two datasets, for a proper comparison. There are several example files at http://bioinf.wehi.edu.au/folders/firmagene/; none of them, however, explains where the datasets for comparison are defined. Therefore I wonder which directory structure I should create and how to properly read the data for FIRMAGene processing. Originally, I have 3 .cel files for one condition (M), and 3 .cel files for another (S).

Thank you!
Mikhail.
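How a fullname such as MS,M+S breaks down into a name and tags can be sketched in base R. The helper parseFullName() below is hypothetical and written only for illustration; it is not a function from aroma.affymetrix. It assumes the convention used in this thread: the part before the first comma is the name, and the remaining comma-separated parts are tags.

```r
# Hypothetical helper: split a fullname into its name (first part)
# and its tags (the remaining comma-separated parts).
parseFullName <- function(fullname) {
  parts <- strsplit(fullname, ",", fixed = TRUE)[[1]]
  list(name = parts[1], tags = parts[-1])
}

res <- parseFullName("MS,M+S")
print(res$name)
print(res$tags)

# The raw-data directory for data set "MS,M" on this chip type, following
# the rawData/<fullname>/<chip type>/ layout discussed in this thread.
path <- file.path("rawData", "MS,M", "HuGene-1_0-st-v1")
print(path)
```

Note that the '+' in M+S is part of a single tag here, so the appended data set has name MS and one tag, M+S.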
On Jan 26, 6:58 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote:

Hi

On Tue, Jan 26, 2010 at 3:32 PM, Mikhail mikhail.dozmo...@gmail.com wrote:

Hi,

On http://www.aroma-project.org/node/79 there's a nice summary of what the directory structure should look like. To my understanding, the directory structure also reflects the project structure.

The annotationData/chipTypes/ directory structure shouldn't. Think of it as a global structure shared with everyone.

For example, I want to identify differentially expressed exons from Affymetrix Human Gene 1.0 ST, using FIRMAGene (http://bioinf.wehi.edu.au/folders/firmagene/). I have two groups, and need to compare them. So I set up the structure like this. For annotation:

  annotationData\chipTypes\HuGene-1_0-st-v1
  annotationData\chipTypes\HuGene-1_0-st-v1\HuGene-1_0-st-v1_M
  annotationData\chipTypes\HuGene-1_0-st-v1\HuGene-1_0-st-v1_S

Not sure what you mean by two groups in this context and what 'M' and 'S' refer to. Are the two latter ones subdirectories? Note that the definition of a 'chip type' differs from the definition of an annotation data file (e.g. a CDF file). The chip type never changes after the array is designed and produced by Affymetrix. The annotation data files will change as the human genome annotation and other things get updated. Thus, if you buy HuGene-1_0-st-v1 arrays from Affymetrix, you want any annotation data files to be stored under annotationData/chipTypes/HuGene-1_0-st-v1/. The same goes for your raw data files. Let's see if my comments below clarify it for you.

For rawData:

  rawData\MS
  rawData\MS\HuGene-1_0-st-v1_M
  rawData\MS\HuGene-1_0-st-v1_S

I believe you want to do:

  rawData/MS,M/HuGene-1_0-st-v1/
  rawData/MS,S/HuGene-1_0-st-v1/

This way you will have two data sets for the same chip type with fullnames MS,M and MS,S. By the definition of names, tags, and fullnames, both data sets have the name MS and differ by the tags M and S. You can also use fullnames MS_M and MS_S, which gives data sets with different names and no tags (so each name is the same as its fullname).

The following code runs OK

  chipType <- "HuGene-1_0-st-v1"
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")

So, I'm not sure where you placed the CDF, but yes, the CDF will be found if it is located in (or in a subdirectory of) annotationData/chipTypes/HuGene-1_0-st-v1/.

But this

  cs <- AffymetrixCelSet$byName("MS", cdf=cdf)

gives an error: No such directory: MS/HuGene-1_0-st-v1

I don't want to put all my files in the same HuGene-1_0-st-v1 folder; they are from different groups and are supposed to be compared.

Using the rawData/ structure I suggest above, you can do:

  csM <- AffymetrixCelSet$byName("MS", tags="M", cdf=cdf)
  csS <- AffymetrixCelSet$byName("MS", tags="S", cdf=cdf)

or equivalently

  csM <- AffymetrixCelSet$byName("MS,M", cdf=cdf)
  csS <- AffymetrixCelSet$byName("MS,S", cdf=cdf)

If you then process:

  cs <- csM

all output data sets will have