Hi.

On Tue, Feb 3, 2009 at 10:39 PM, David Rosenberg <david.m.rosenb...@gmail.com> wrote:
>
> Thank you greatly for your help. A couple of additional questions:
[snip]
> 2.) I don't fully follow the acc model described in the CRMA paper
> (2008). The section in question references a 2001 Wirapati P, Speed
> T. paper that is listed as 'draft.' Can you recommend another
> reference / example of how the acc transformation is performed and the
> crosstalk matrix/offset vector are calculated?

There is also:

Wirapati P, Speed TP (2002) An algorithm to fit a simplex to a set of multidimensional points. WEHI Bioinformatics technical notes.
http://www.isrec.isb-sib.ch/~pwirapat/sfit/wirapati2002wehi-bioinf.pdf

For a while now, it has been on my and Asa's (P. Wirapati) todo list to write up and submit a manuscript on this and more. The last time we had a serious chat and made a start on this was in May 2008, but higher-priority requests keep postponing it. I think that is very unfortunate, because Asa's work on this over the years (he should get most of the credit) is quite amazing; he is careful about theory, estimators, algorithm, implementation, speed and everything you can imagine. This project has gained somewhat higher priority because of related work on resequencing arrays. We are also testing out more generic weighted estimators that take prior estimates.

...one day.

/Henrik

>
> Thanks again.
>
> On Feb 3, 2009, at 3:23 PM, Henrik Bengtsson wrote:
>
>>
>> Ok, sorry I didn't answer your question explicitly. No, they do not
>> have to contain ChrX and ChrY data. However, independent of your
>> question, I would recommend that you import all annotation data
>> available, since that might become of interest in future usage.
>>
>> Currently part of the code is hardwired to assume the human genome.
>> Thus, if you pass "-XY", it interprets "X" to correspond to chromosome
>> 23 and "Y" to be chromosome 24. So, '-XY' tries to exclude units on
>> chromosomes 23 and 24. The plan is to support genome-specific
>> annotation data where you can specify what chromosome index "X" and
>> "Y" map to.
>> The directory annotationData/genomes/ is reserved for
>> this, and the ChromosomeExplorer is somewhat sensitive to this. But
>> that is all future plans.
>>
>> /Henrik
>>
>> On Tue, Feb 3, 2009 at 1:03 PM, David Rosenberg
>> <david.m.rosenb...@gmail.com> wrote:
>>>
>>> Do the ufl/ugp/acc files need to be built with X and Y chromosomes
>>> included? If so, this won't be too hard to fix.
>>>
>>>
>>> On Jan 30, 9:50 pm, David Rosenberg <david.m.rosenb...@gmail.com>
>>> wrote:
>>>> As I think about this further, it occurs to me that there are other
>>>> potential problems with this chip/cdf. I was looking at the ugp/ufl
>>>> files and it appears that fragment length normalization etc. is
>>>> performed on a unit-by-unit basis. The way the array/cdf is currently
>>>> structured, not all probes in a particular unit are precisely
>>>> co-located. While the location differences within a unit are quite
>>>> small (100 bp or so), this does result in units where probes
>>>> hybridize to multiple digestion fragments. This definitely 'breaks'
>>>> fragment length normalization as I see it currently implemented.
>>>> Now, the cdf can be restructured such that all units map to a single
>>>> genomic location, but that seems to preclude merging/summarizing
>>>> further down the analysis workflow. If there were a way to perform
>>>> these normalization procedures on a per-probe basis rather than a
>>>> per-unit basis, this would be preferable. Let me know your thoughts.
>>>>
>>>> On Jan 30, 2009, at 6:41 PM, Henrik Bengtsson wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>
>>>>> could you please forward your UGP file to me; I think I know what
>>>>> the problem is, but I guess it is easier for me to check it myself
>>>>> first.
>>>>
>>>>> BTW, although this looks like a custom CDF - if you want to, I can
>>>>> put up a group page specific to this chip type, documenting the chip
>>>>> type (and either link or host those annotation files).
>>>>> Might be useful for a future fellow researcher. It's your call.
>>>>
>>>>> /Henrik
>>>>
>>>>> On Fri, Jan 30, 2009 at 12:08 PM, David Rosenberg
>>>>> <david.m.rosenb...@gmail.com> wrote:
>>>>
>>>>>> I am receiving the following errors when attempting to perform
>>>>>> allelic crosstalk calibration on a dataset using a custom CDF. I
>>>>>> don't know if this is indicative of errors in the internal
>>>>>> structure of the CDF, or if there are parameters that I must pass
>>>>>> to AllelicCrosstalkCalibration due to the properties of the CDF
>>>>>> (i.e. # of chromosomes, etc.)
>>>>
>>>>>>> library("aroma.affymetrix")
>>>>>> Loading required package: R.utils
>>>>>> Loading required package: R.oo
>>>>>> Loading required package: R.methodsS3
>>>>>> R.methodsS3 v1.0.3 (2008-07-02) successfully loaded. See ?R.methodsS3 for help.
>>>>>> R.oo v1.4.6 (2008-08-11) successfully loaded. See ?R.oo for help.
>>>>>> R.utils v1.1.3 (2009-01-12) successfully loaded. See ?R.utils for help.
>>>>>> Loading required package: aroma.core
>>>>>> Loading required package: R.cache
>>>>>> R.cache v0.1.7 (2008-02-27) successfully loaded. See ?R.cache for help.
>>>>>> Loading required package: R.rsp
>>>>>> R.rsp v0.3.4 (2008-03-06) successfully loaded. See ?R.rsp for help.
>>>>>> Type browseRsp() to open the RSP main menu in your browser.
>>>>>> Loading required package: matrixStats
>>>>>> Loading required package: digest
>>>>>> Loading required package: aroma.light
>>>>>> aroma.light v1.11.1 (2009-01-12) successfully loaded. See ?aroma.light for help.
>>>>>> aroma.core v1.0.0 (2009-01-12) successfully loaded. See ?aroma.core for help.
>>>>>> Loading required package: affxparser
>>>>>> Loading required package: R.huge
>>>>>> R.huge v0.1.6 (2008-07-03) successfully loaded. See ?R.huge for help.
>>>>>> Loading required package: aroma.apd
>>>>>> aroma.apd v0.1.3 (2006-06-14) successfully loaded. See ?aroma.apd for help.
>>>>>> aroma.affymetrix v1.0.0 (2009-01-12) successfully loaded. See ?aroma.affymetrix for help.
>>>>
>>>>>>> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>>>>
>>>>>>> # Don't display too many decimals.
>>>>>>> options(digits=4)
>>>>>>> chipType="MOUSEDIVm520650"
>>>>>>> cdf<-AffymetrixCdfFile$byChipType("MOUSEDIVm520650")
>>>>>>> print(cdf)
>>>>>> AffymetrixCdfFile:
>>>>>> Path: annotationData/chipTypes/MOUSEDIVm520650
>>>>>> Filename: MOUSEDIVm520650.CDF
>>>>>> Filesize: 463.91MB
>>>>>> Chip type: MOUSEDIVm520650
>>>>>> RAM: 0.00MB
>>>>>> File format: v4 (binary; XDA)
>>>>>> Dimension: 2572x2680
>>>>>> Number of cells: 6892960
>>>>>> Number of units: 973990
>>>>>> Cells per unit: 7.08
>>>>>> Number of QC units: 4
>>>>>>> gi<-getGenomeInformation(cdf)
>>>>>>> print(gi)
>>>>>> UgpGenomeInformation:
>>>>>> Name: MOUSEDIVm520650
>>>>>> Tags: DMR20090129
>>>>>> Pathname: annotationData/chipTypes/MOUSEDIVm520650/MOUSEDIVm520650,DMR20090129.ugp
>>>>>> File size: 4.64MB
>>>>>> RAM: 0.00MB
>>>>>> Chip type: MOUSEDIVm520650
>>>>>>> si<-getSnpInformation(cdf)
>>>>>>> print(si)
>>>>>> UflSnpInformation:
>>>>>> Name: MOUSEDIVm520650
>>>>>> Tags: DMR20090129
>>>>>> Pathname: annotationData/chipTypes/MOUSEDIVm520650/MOUSEDIVm520650,DMR20090129.ufl
>>>>>> File size: 3.72MB
>>>>>> RAM: 0.00MB
>>>>>> Chip type: MOUSEDIVm520650
>>>>>> Number of enzymes: 2
>>>>>>> acs<-AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
>>>>>>> print(acs)
>>>>>> AromaCellSequenceFile:
>>>>>> Name: MOUSEDIVm520650
>>>>>> Tags: DMR20090129
>>>>>> Pathname: annotationData/chipTypes/MOUSEDIVm520650/MOUSEDIVm520650,DMR20090129.acs
>>>>>> File size: 170.91MB
>>>>>> RAM: 0.00MB
>>>>>> Number of data rows: 6892960
>>>>>> File format: v1
>>>>>> Dimensions: 6892960x26
>>>>>> Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw
>>>>>> Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
>>>>>> Footer: <createdOn>20090129 12:45:11 CST</createdOn><platform>Affymetrix</platform><chipType>MOUSEDIVm520650</chipType>
>>>>>> Chip type: MOUSEDIVm520650
>>>>>> Platform: Affymetrix
>>>>>>> csR<-AffymetrixCelSet$byName("mDIV/testset", cdf=cdf)
>>>>>>> print(csR)
>>>>>> AffymetrixCelSet:
>>>>>> Name: testset
>>>>>> Tags:
>>>>>> Path: rawData/mDIV/testset/MOUSEDIVm520650
>>>>>> Platform: Affymetrix
>>>>>> Chip type: MOUSEDIVm520650
>>>>>> Number of arrays: 44
>>>>>> Names: SNP_mDIV_A10-10_081308, SNP_mDIV_A10-201_091708, ..., SNP_mDIV_A9-9_081308
>>>>>> Time period: 2008-08-13 15:39:47 -- 2008-09-18 00:20:33
>>>>>> Total file size: 2899.65MB
>>>>>> RAM: 0.06MB
>>>>>> There were 50 or more warnings (use warnings() to see the first 50)
>>>>>>> acc<-AllelicCrosstalkCalibration(csR, model="CRMAv2")
>>>>>>> print(acc)
>>>>>> AllelicCrosstalkCalibration:
>>>>>> Data set: testset
>>>>>> Input tags:
>>>>>> User tags: *
>>>>>> Asterisk ('*') tags: ACC,ra,-XY
>>>>>> Output tags: ACC,ra,-XY
>>>>>> Number of files: 44 (2899.65MB)
>>>>>> Platform: Affymetrix
>>>>>> Chip type: MOUSEDIVm520650
>>>>>> Algorithm parameters: (rescaleBy: chr "all", targetAvg: num 2200, subsetToAvg: chr "-XY", mergeShifts: logi TRUE, B: int 1, flavor: chr "sfit", algorithmParameters:List of 3, ..$ alpha: num [1:8] 0.1 0.075 0.05 0.03 0.01 0.0025 0.001 0.0001, ..$ q: num 2, ..$ Q: num 98)
>>>>>> Output path: probeData/testset,ACC,ra,-XY/MOUSEDIVm520650
>>>>>> Is done: FALSE
>>>>>> RAM: 0.01MB
>>>>>>> csC<-process(acc, verbose=verbose)
>>>>>> 20090130 14:03:43|Calibrating data set for allelic cross talk...
>>>>>> Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
>>>>>> missing value where TRUE/FALSE needed
>>>>>> 20090130 14:03:43|Calibrating data set for allelic cross talk...done
>>>>>>> traceback()
>>>>>> 13: readCdfCellIndices(pathname, ...)
>>>>>> 12: getCellIndicesChunk(getPathname(this), units = unitsChunk, ..., verbose = verbose)
>>>>>> 11: fcn(idxs[ii], ...)
>>>>>> 10: lapplyInChunks.numeric(units, function(unitsChunk) {
>>>>>> cdfChunk <- getCellIndicesChunk(getPathname(this), units = unitsChunk, ..., verbose = verbose)
>>>>>> res <- vector("list", length(unitsChunk))
>>>>>> res[[1]] <- unlist(cdfChunk, use.names = useNames)
>>>>>> res
>>>>>> }, chunkSize = 1e+05, useNames = useNames, verbose = verbose)
>>>>>> 9: lapplyInChunks(units, function(unitsChunk) {
>>>>>> cdfChunk <- getCellIndicesChunk(getPathname(this), units = unitsChunk, ..., verbose = verbose)
>>>>>> res <- vector("list", length(unitsChunk))
>>>>>> res[[1]] <- unlist(cdfChunk, use.names = useNames)
>>>>>> res
>>>>>> }, chunkSize = 1e+05, useNames = useNames, verbose = verbose)
>>>>>> 8: getCellIndices.AffymetrixCdfFile(cdf, units = subset, useNames = FALSE, unlist = TRUE)
>>>>>> 7: getCellIndices(cdf, units = subset, useNames = FALSE, unlist = TRUE)
>>>>>> 6: getSubsetToAvg.AllelicCrosstalkCalibration(this)
>>>>>> 5: getSubsetToAvg(this)
>>>>>> 4: getParameters.AllelicCrosstalkCalibration(this)
>>>>>> 3: getParameters(this)
>>>>>> 2: process.AllelicCrosstalkCalibration(acc, verbose = verbose)
>>>>>> 1: process(acc, verbose = verbose)
>>>>
>>>
>>
>

--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.