Re: [aroma.affymetrix] problem using CRMA v2
For the record, here's an updated on this thread, which we took the discussion and troubleshooting off line: The problems was identified and solved. The short story is that it was an unfortunate and somewhat unlikely set of actions (using a different CDF with the exact same name and then changing to use the official one) that caused it to happen, but it could still happen. The outcome of this report (thanks!), is that I've made updates so that in the next release this will not happen again. The long story is that the Aroma Framework utilizes so called memoization (aka caching) to speed up the processing of repetitive tasks. For instance, when you query a CDF to identify all cell indices for PM probes in all SNPs, it takes quite some time to scan through the CDF (which you notice the first time you work with a new chip type), but the second time you ask Aroma to do the same thing, Aroma is clever enough to recognize that you've done this before, and instead loads the results from a cache (on your file system). In order lookup results in the cache, Aroma looks at the input arguments, where the CDF file is one of them, and generate a unique hash key. In the current version, this key is generated only from the filename of the CDF (which caused the about key clash), but in the upcoming version of the package it will be generated from the filename, the filesize and the file MD5 checksum, which makes the key unique. So, case closed. /Henrik On Tue, Nov 27, 2012 at 12:15 PM, Ioana Cutcutache ioana.cutcuta...@gmail.com wrote: Hi Henrik, I just did what you suggested and seems it's all ok, the checksum is fine, output below. Any idea what else might be causing the problem? cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdf), 3fbe0f6e7c8a346105238a3f3d10d4ec)) cdfM - getMonocellCdf(cdf); print(cdfM); AffymetrixCdfFile: Path: annotationData/chipTypes/GenomeWideSNP_6 Filename: GenomeWideSNP_6,Full,monocell.CDF Filesize: 416.27MB Chip type: GenomeWideSNP_6,Full,monocell RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 1681x1681 Number of cells: 2825761 Number of units: 1881415 Cells per unit: 1.50 Number of QC units: 4 stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da)); getChecksum(cdfM) [1] 3a797e24a62f4f2e5214a6bf819151da Ioana On Tuesday, 27 November 2012 21:40:54 UTC+2, Henrik Bengtsson wrote: Hi, this certainly looks weird and it's very puzzling. My last guess it that your so called monocell CDF is corrupt (but I cannot see how that happened). Verify that you get the same: library(aroma.affymetrix); cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdf), 3fbe0f6e7c8a346105238a3f3d10d4ec)) cdfM - getMonocellCdf(cdf); print(cdfM); ## AffymetrixCdfFile: ## Path: annotationData/chipTypes/GenomeWideSNP_6 ## Filename: GenomeWideSNP_6,Full,monocell.CDF ## File size: 416.27 MB (436492121 bytes) ## Chip type: GenomeWideSNP_6,Full,monocell ## RAM: 0.00MB ## File format: v4 (binary; XDA) ## Dimension: 1681x1681 ## Number of cells: 2825761 ## Number of units: 1881415 ## Cells per unit: 1.50 ## Number of QC units: 4 # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da)); If you get an error at the latter step, then I recommend that you remove that file by doing: pathnameM - getPathname(cdfM); file.remove(pathnameM); stopifnot(!isFile(pathnameM)); and then also clear the file cache for aroma.affymetrix by doing: path - getCachePath(dirs=aroma.affymetrix); removeDirectory(path, recursive=TRUE); Then regenerate the monocell CDF (it will take a long time): cdfM - getMonocellCdf(cdf, verbose=-10); and again compare with the output I get, particularly: stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da)); If get to this point, it means that was the problem and you should be able to start your analysis. I'm really sorry that you get hit by this - it's really odd and it's extremely rare that someone has to got through all this mess to get things working - it should just work out of the box and hopefully it will do after this. /Henrik On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache ioana.cu...@gmail.com wrote: I am still not able to get same as you for units (so if I continue doing units2 - fit(plm, verbose=-10) I am not getting the right thing either). I ran everything again and here is the output of the last part: plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) units1 - fitCnProbes(plm, verbose=verbose); Estimating single-probe CN units... Getting chip-effect set... Getting chip-effect set from data set... Getting chip-effect set from data set...done Getting chip-effect set...done Identifying CN units... int [1:945826] 935590 935591 935592 935593 935594
Re: [aroma.affymetrix] problem using CRMA v2
I am still not able to get same as you for units (so if I continue doing units2 - fit(plm, verbose=-10) I am not getting the right thing either). I ran everything again and here is the output of the last part: plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) units1 - fitCnProbes(plm, verbose=verbose); Estimating single-probe CN units... Getting chip-effect set... Getting chip-effect set from data set... Getting chip-effect set from data set...done Getting chip-effect set...done Identifying CN units... int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... Identifying CN units...done Identifying subset to fit... Identifying non-fitted units in chip-effect file... Pathname: plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL Found indices cached on file Reading data for these 1881415 cells... Reading data for these 1881415 cells...done Looking for stdvs = 0 indicating non-estimated units: int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... Identifying non-fitted units in chip-effect file...done int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... Identifying subset to fit...done Identifying cell indices for CN units... Identifying cell indices for CN units...done Keeping only single-cell units... Single-cell units: int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... Cell indices: int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106 3660159 3524469 3264141 ... Keeping only single-cell units...done Identifying cell indices for estimates... int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 1876060 1876061 1876062 1876063 ... Identifying cell indices for estimates...done Fitting 1 arrays... Array #1 ('91228050T') of 1... Reading signals... num [1:945826] 6448 7260 7878 6151 12230 ... Reading signals...done Transforming signals to estimates... 'data.frame': 945826 obs. of 4 variables: $ cell: int 1876054 1876055 1876056 1876057 1876058 1876059 1876060 1876061 1876062 1876063 ... $ theta : num 6448 7260 7878 6151 12230 ... $ sdTheta : num 1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ... $ outliers: logi FALSE FALSE FALSE FALSE FALSE FALSE ... Transforming signals to estimates...done Writing estimates... Storing flat data to file... Updating file... Updating file...done Storing flat data to file...done Writing estimates...done Array #1 ('91228050T') of 1...done used (Mb) gc trigger (Mb) max used (Mb) Ncells 483738 25.91733216 92.6 12913475 689.7 Vcells 28397870 216.7 83594476 637.8 130588565 996.4 Fitting 1 arrays...done Estimating single-probe CN units...done str(units1); int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... units - findUnitsTodo(plm); str(units); int [1:935589] 1 2 3 4 5 6 7 8 9 10 ... On Monday, 26 November 2012 23:08:47 UTC+2, Henrik Bengtsson wrote: On Mon, Nov 26, 2012 at 12:46 PM, Ioana Cutcutache ioana.cu...@gmail.com javascript: wrote: Ok, so the first part ran without any errors, output below: print(csN); AffymetrixCelSet: Name: test Tags: ACC,ra,-XY,BPN,-XY Path: probeData/test,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6 Platform: Affymetrix Chip type: GenomeWideSNP_6,Full Number of arrays: 1 Names: 91228050T [1] Time period: 2011-07-19 05:08:51 -- 2011-07-19 05:08:51 Total file size: 65.83MB RAM: 0.01MB print(getFullNames(csN)); [1] 91228050T cdf2 - getCdf(csN); stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT However I am not getting same as you for the units: plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); units - findUnitsTodo(plm); str(units); int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... Alright (but has nothing to do with the error you first reported). If so, do: units1 - fitCnProbes(plm, verbose=verbose); str(units1); ## int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... and you should get the same as: units - findUnitsTodo(plm); str(units); ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... If so, the try with units2 - fit(plm, verbose=-10); and compare to the output I already pasted in my previous message (below). /Henrik On Monday, 26 November 2012 21:15:32 UTC+2, Henrik Bengtsson wrote: Hmm... first run this in a fresh R session: library(aroma.affymetrix); cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); csR - AffymetrixCelSet$byName(test, cdf=cdf); acc - AllelicCrosstalkCalibration(csR, model=CRMAv2); csC - process(acc, verbose=verbose); bpn - BasePositionNormalization(csC, target=zero); csN - process(bpn, verbose=verbose); then
Re: [aroma.affymetrix] problem using CRMA v2
Hi, this certainly looks weird and it's very puzzling. My last guess it that your so called monocell CDF is corrupt (but I cannot see how that happened). Verify that you get the same: library(aroma.affymetrix); cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdf), 3fbe0f6e7c8a346105238a3f3d10d4ec)) cdfM - getMonocellCdf(cdf); print(cdfM); ## AffymetrixCdfFile: ## Path: annotationData/chipTypes/GenomeWideSNP_6 ## Filename: GenomeWideSNP_6,Full,monocell.CDF ## File size: 416.27 MB (436492121 bytes) ## Chip type: GenomeWideSNP_6,Full,monocell ## RAM: 0.00MB ## File format: v4 (binary; XDA) ## Dimension: 1681x1681 ## Number of cells: 2825761 ## Number of units: 1881415 ## Cells per unit: 1.50 ## Number of QC units: 4 # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da)); If you get an error at the latter step, then I recommend that you remove that file by doing: pathnameM - getPathname(cdfM); file.remove(pathnameM); stopifnot(!isFile(pathnameM)); and then also clear the file cache for aroma.affymetrix by doing: path - getCachePath(dirs=aroma.affymetrix); removeDirectory(path, recursive=TRUE); Then regenerate the monocell CDF (it will take a long time): cdfM - getMonocellCdf(cdf, verbose=-10); and again compare with the output I get, particularly: stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da)); If get to this point, it means that was the problem and you should be able to start your analysis. I'm really sorry that you get hit by this - it's really odd and it's extremely rare that someone has to got through all this mess to get things working - it should just work out of the box and hopefully it will do after this. /Henrik On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache ioana.cutcuta...@gmail.com wrote: I am still not able to get same as you for units (so if I continue doing units2 - fit(plm, verbose=-10) I am not getting the right thing either). I ran everything again and here is the output of the last part: plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) units1 - fitCnProbes(plm, verbose=verbose); Estimating single-probe CN units... Getting chip-effect set... Getting chip-effect set from data set... Getting chip-effect set from data set...done Getting chip-effect set...done Identifying CN units... int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... Identifying CN units...done Identifying subset to fit... Identifying non-fitted units in chip-effect file... Pathname: plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL Found indices cached on file Reading data for these 1881415 cells... Reading data for these 1881415 cells...done Looking for stdvs = 0 indicating non-estimated units: int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... Identifying non-fitted units in chip-effect file...done int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... Identifying subset to fit...done Identifying cell indices for CN units... Identifying cell indices for CN units...done Keeping only single-cell units... Single-cell units: int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... Cell indices: int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106 3660159 3524469 3264141 ... Keeping only single-cell units...done Identifying cell indices for estimates... int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 1876060 1876061 1876062 1876063 ... Identifying cell indices for estimates...done Fitting 1 arrays... Array #1 ('91228050T') of 1... Reading signals... num [1:945826] 6448 7260 7878 6151 12230 ... Reading signals...done Transforming signals to estimates... 'data.frame': 945826 obs. of 4 variables: $ cell: int 1876054 1876055 1876056 1876057 1876058 1876059 1876060 1876061 1876062 1876063 ... $ theta : num 6448 7260 7878 6151 12230 ... $ sdTheta : num 1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ... $ outliers: logi FALSE FALSE FALSE FALSE FALSE FALSE ... Transforming signals to estimates...done Writing estimates... Storing flat data to file... Updating file... Updating file...done Storing flat data to file...done Writing estimates...done Array #1 ('91228050T') of 1...done used (Mb) gc trigger (Mb) max used (Mb) Ncells 483738 25.91733216 92.6 12913475 689.7 Vcells 28397870 216.7 83594476 637.8 130588565 996.4 Fitting 1 arrays...done Estimating single-probe CN units...done str(units1); int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 935598 935599 ... units - findUnitsTodo(plm); str(units); int [1:935589] 1 2 3 4 5 6 7 8 9 10 ... On Monday,
Re: [aroma.affymetrix] problem using CRMA v2
Hello, Please see below the requested information: 1. sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] aroma.affymetrix_2.6.0 affxparser_1.30.0 aroma.apd_0.2.3 [4] R.huge_0.4.1 aroma.light_1.28.0 aroma.core_2.6.0 [7] matrixStats_0.6.2 R.rsp_0.8.2R.cache_0.6.5 [10] R.devices_2.1.1R.filesets_1.6.0 digest_0.5.2 [13] R.utils_1.18.0 R.oo_1.10.1R.methodsS3_1.4.2 2. The script (up to the point where I get the error) (sorry it does not work to attach files so I am pasting it in here): library(aroma.affymetrix) log - verbose - Arguments$getVerbose(-8, timestamp=TRUE) options(digits=4) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) gi - getGenomeInformation(cdf) print(gi) si - getSnpInformation(cdf) print(si) acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE)) print(acs) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) csR - AffymetrixCelSet$byName(test, cdf=cdf) print(csR) acc - AllelicCrosstalkCalibration(csR, model=CRMAv2) print(acc) csC - process(acc, verbose=verbose) print(csC) bpn - BasePositionNormalization(csC, target=zero) print(bpn) csN - process(bpn, verbose=verbose) print(csN) plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) print(plm) if (length(findUnitsTodo(plm)) 0) { # Fit CN probes quickly (~5-10s/array + some overhead) units - fitCnProbes(plm, verbose=verbose) str(units) # int [1:945826] 935590 935591 935592 935593 935594 935595 ... # Fit remaining units, i.e. SNPs (~5-10min/array) units - fit(plm, verbose=verbose) # --- error here str(units) } 3. cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) AffymetrixCdfFile: Path: annotationData/chipTypes/GenomeWideSNP_6 Filename: GenomeWideSNP_6,Full.cdf Filesize: 470.44MB Chip type: GenomeWideSNP_6,Full RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 2572x2680 Number of cells: 6892960 Number of units: 1881415 Cells per unit: 3.66 Number of QC units: 4 Thank you! On Monday, 26 November 2012 17:52:39 UTC+2, Henrik Bengtsson wrote: Hi, please provide the following information so I can help troubleshoot: 1. What's your sessionInfo() after doing library(aroma.affymetrix)? 2. Your complete script (right now I not sure which chip type your are working on) up to the point where you get the error. 3. The output of print(cdf) where 'cdf' is your AffymetrixCdfFile object. Henrik On Mon, Nov 26, 2012 at 7:40 AM, Ioana Cutcutache ioana.cu...@gmail.com javascript: wrote: Hello, I am trying to use CRMA v2 and basically just following the instructions at http://www.aroma-project.org/vignettes/CRMAv2. I have all the annotation files and I think everything is right (the checking of this files gives me the same output as the webpage mentioned above). However when I get to units - fit(plm, verbose=verbose) I get the following error which I am no able to solve: [... removed output...] 20121126 19:51:39| Setting up parameter sets...done 20121126 19:51:39| Fitting AvgCnPlm for each unit type separately... 20121126 19:51:39| Unit types: unknown expression genotyping 0 1 2 20121126 19:51:39| Unit type #1 ('unknown') of 3... 20121126 19:51:39| Unit type: unknown (code=0) 20121126 19:51:39| Number of units of this type: 1590374 int [1:1590374] 291042 291043 291044 291045 291046 291047 291048 291049 291050 291051 ... 20121126 19:51:39| Fitting the model by unit dimensions (at least for the large classes)... 20121126 19:51:39|Grouping units into equivalent (unit,group,cell) dimensions... [2012-11-26 19:51:39] Exception: Argument 'units' contains 25346 NA value(s). at #11. getNumerics.Arguments(static, ..., asMode = integer, disallow = disallow) - getNumerics.Arguments() is in environment 'R.utils' at #10. getNumerics(static, ..., asMode = integer, disallow = disallow) - getNumerics() is in environment 'R.utils' at #09. getIntegers.Arguments(static, x, ..., range = range, .name = .name) - getIntegers.Arguments() is in environment 'R.utils' at #08. getIntegers(static, x, ..., range = range, .name = .name) - getIntegers() is in environment 'R.utils' at #07.
Re: [aroma.affymetrix] problem using CRMA v2
Ok, everything looks alright to me. However, the error you get looks weird to me and I've never seen it, but I'm sure there is an easy explanation. Lets do some troubleshooting; what output do you when doing the following (I've pasted my output): cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); print(cdf); print(getChecksum(cdf)); ## [1] 3fbe0f6e7c8a346105238a3f3d10d4ec ut - getUnitTypes(cdf); print(table(ut)); ## ut ## 1 2 5 ##621 934968 945826 If you don't get the same unit-types output as I do, retry with: ut - getUnitTypes(cdf, force=TRUE); print(table(ut)); /Henrik On Mon, Nov 26, 2012 at 8:55 AM, Ioana Cutcutache ioana.cutcuta...@gmail.com wrote: Hello, Please see below the requested information: 1. sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] aroma.affymetrix_2.6.0 affxparser_1.30.0 aroma.apd_0.2.3 [4] R.huge_0.4.1 aroma.light_1.28.0 aroma.core_2.6.0 [7] matrixStats_0.6.2 R.rsp_0.8.2R.cache_0.6.5 [10] R.devices_2.1.1R.filesets_1.6.0 digest_0.5.2 [13] R.utils_1.18.0 R.oo_1.10.1R.methodsS3_1.4.2 2. The script (up to the point where I get the error) (sorry it does not work to attach files so I am pasting it in here): library(aroma.affymetrix) log - verbose - Arguments$getVerbose(-8, timestamp=TRUE) options(digits=4) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) gi - getGenomeInformation(cdf) print(gi) si - getSnpInformation(cdf) print(si) acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE)) print(acs) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) csR - AffymetrixCelSet$byName(test, cdf=cdf) print(csR) acc - AllelicCrosstalkCalibration(csR, model=CRMAv2) print(acc) csC - process(acc, verbose=verbose) print(csC) bpn - BasePositionNormalization(csC, target=zero) print(bpn) csN - process(bpn, verbose=verbose) print(csN) plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) print(plm) if (length(findUnitsTodo(plm)) 0) { # Fit CN probes quickly (~5-10s/array + some overhead) units - fitCnProbes(plm, verbose=verbose) str(units) # int [1:945826] 935590 935591 935592 935593 935594 935595 ... # Fit remaining units, i.e. SNPs (~5-10min/array) units - fit(plm, verbose=verbose) # --- error here str(units) } 3. cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) AffymetrixCdfFile: Path: annotationData/chipTypes/GenomeWideSNP_6 Filename: GenomeWideSNP_6,Full.cdf Filesize: 470.44MB Chip type: GenomeWideSNP_6,Full RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 2572x2680 Number of cells: 6892960 Number of units: 1881415 Cells per unit: 3.66 Number of QC units: 4 Thank you! On Monday, 26 November 2012 17:52:39 UTC+2, Henrik Bengtsson wrote: Hi, please provide the following information so I can help troubleshoot: 1. What's your sessionInfo() after doing library(aroma.affymetrix)? 2. Your complete script (right now I not sure which chip type your are working on) up to the point where you get the error. 3. The output of print(cdf) where 'cdf' is your AffymetrixCdfFile object. Henrik On Mon, Nov 26, 2012 at 7:40 AM, Ioana Cutcutache ioana.cu...@gmail.com wrote: Hello, I am trying to use CRMA v2 and basically just following the instructions at http://www.aroma-project.org/vignettes/CRMAv2. I have all the annotation files and I think everything is right (the checking of this files gives me the same output as the webpage mentioned above). However when I get to units - fit(plm, verbose=verbose) I get the following error which I am no able to solve: [... removed output...] 20121126 19:51:39| Setting up parameter sets...done 20121126 19:51:39| Fitting AvgCnPlm for each unit type separately... 20121126 19:51:39| Unit types: unknown expression genotyping 0 1 2 20121126 19:51:39| Unit type #1 ('unknown') of 3... 20121126 19:51:39| Unit type: unknown (code=0) 20121126 19:51:39| Number of units of this type: 1590374 int [1:1590374] 291042 291043 291044 291045 291046 291047 291048 291049 291050 291051 ... 20121126 19:51:39| Fitting the model by unit dimensions (at least for the large classes)... 20121126 19:51:39|Grouping units into equivalent (unit,group,cell) dimensions... [2012-11-26 19:51:39] Exception: Argument 'units' contains 25346 NA
Re: [aroma.affymetrix] problem using CRMA v2
Hi Henrik, Looks like the output is ok: cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(getChecksum(cdf)); [1] 3fbe0f6e7c8a346105238a3f3d10d4ec ut - getUnitTypes(cdf); print(table(ut)); ut 0 1 2 1565028 621 290420 ut - getUnitTypes(cdf, force=TRUE); print(table(ut)); ut 1 2 5 621 934968 945826 Ioana On Monday, 26 November 2012 19:59:44 UTC+2, Henrik Bengtsson wrote: Ok, everything looks alright to me. However, the error you get looks weird to me and I've never seen it, but I'm sure there is an easy explanation. Lets do some troubleshooting; what output do you when doing the following (I've pasted my output): cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); print(cdf); print(getChecksum(cdf)); ## [1] 3fbe0f6e7c8a346105238a3f3d10d4ec ut - getUnitTypes(cdf); print(table(ut)); ## ut ## 1 2 5 ##621 934968 945826 If you don't get the same unit-types output as I do, retry with: ut - getUnitTypes(cdf, force=TRUE); print(table(ut)); /Henrik On Mon, Nov 26, 2012 at 8:55 AM, Ioana Cutcutache ioana.cu...@gmail.com javascript: wrote: Hello, Please see below the requested information: 1. sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] aroma.affymetrix_2.6.0 affxparser_1.30.0 aroma.apd_0.2.3 [4] R.huge_0.4.1 aroma.light_1.28.0 aroma.core_2.6.0 [7] matrixStats_0.6.2 R.rsp_0.8.2R.cache_0.6.5 [10] R.devices_2.1.1R.filesets_1.6.0 digest_0.5.2 [13] R.utils_1.18.0 R.oo_1.10.1R.methodsS3_1.4.2 2. The script (up to the point where I get the error) (sorry it does not work to attach files so I am pasting it in here): library(aroma.affymetrix) log - verbose - Arguments$getVerbose(-8, timestamp=TRUE) options(digits=4) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) gi - getGenomeInformation(cdf) print(gi) si - getSnpInformation(cdf) print(si) acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE)) print(acs) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) csR - AffymetrixCelSet$byName(test, cdf=cdf) print(csR) acc - AllelicCrosstalkCalibration(csR, model=CRMAv2) print(acc) csC - process(acc, verbose=verbose) print(csC) bpn - BasePositionNormalization(csC, target=zero) print(bpn) csN - process(bpn, verbose=verbose) print(csN) plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) print(plm) if (length(findUnitsTodo(plm)) 0) { # Fit CN probes quickly (~5-10s/array + some overhead) units - fitCnProbes(plm, verbose=verbose) str(units) # int [1:945826] 935590 935591 935592 935593 935594 935595 ... # Fit remaining units, i.e. SNPs (~5-10min/array) units - fit(plm, verbose=verbose) # --- error here str(units) } 3. cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) AffymetrixCdfFile: Path: annotationData/chipTypes/GenomeWideSNP_6 Filename: GenomeWideSNP_6,Full.cdf Filesize: 470.44MB Chip type: GenomeWideSNP_6,Full RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 2572x2680 Number of cells: 6892960 Number of units: 1881415 Cells per unit: 3.66 Number of QC units: 4 Thank you! On Monday, 26 November 2012 17:52:39 UTC+2, Henrik Bengtsson wrote: Hi, please provide the following information so I can help troubleshoot: 1. What's your sessionInfo() after doing library(aroma.affymetrix)? 2. Your complete script (right now I not sure which chip type your are working on) up to the point where you get the error. 3. The output of print(cdf) where 'cdf' is your AffymetrixCdfFile object. Henrik On Mon, Nov 26, 2012 at 7:40 AM, Ioana Cutcutache ioana.cu...@gmail.com wrote: Hello, I am trying to use CRMA v2 and basically just following the instructions at http://www.aroma-project.org/vignettes/CRMAv2. I have all the annotation files and I think everything is right (the checking of this files gives me the same output as the webpage mentioned above). However when I get to units - fit(plm, verbose=verbose) I get the following error which I am no able to solve: [... removed output...] 20121126 19:51:39| Setting up
Re: [aroma.affymetrix] problem using CRMA v2
Hmm... first run this in a fresh R session: library(aroma.affymetrix); cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); csR - AffymetrixCelSet$byName(test, cdf=cdf); acc - AllelicCrosstalkCalibration(csR, model=CRMAv2); csC - process(acc, verbose=verbose); bpn - BasePositionNormalization(csC, target=zero); csN - process(bpn, verbose=verbose); then what's the output of: print(csN); print(getFullNames(csN)); cdf2 - getCdf(csN); stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT If you get an error above, what is the output of print(cdf2) and getChecksum(cdf2)? If not, continue: plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); units - findUnitsTodo(plm); str(units); ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... Do you get the same? If so, continue: units2 - fit(plm, verbose=-10); You should get something like the following, except from the dataset/sample names (if not, let me know): ## Fitting model of class AvgCnPlm... ## AvgCnPlm: ## Data set: GSE13372 ## Chip type: GenomeWideSNP_6,Full ## Input tags: testset,ACC,ra,-XY,BPN,-XY ## Output tags: testset,ACC,ra,-XY,BPN,-XY,AVG,A+B ## Parameters: {probeModel: chr pm, shift: num 0, flavor: chr median, mergeStrands: logi TRUE, combineAlleles: logi TRUE} ## Path: plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6 ## RAM: 0.01MB ## Identifying non-estimated units... ## Identifying non-fitted units in chip-effect file... ##Pathname: plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/GSM337708,chipEffects.CEL ##Found indices cached on file ##Reading data for these 1881415 cells... ##Reading data for these 1881415 cells...done ##Looking for stdvs = 0 indicating non-estimated units: ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... ## Identifying non-fitted units in chip-effect file...done ## Identifying non-estimated units...done ## Getting model fit for 934968 units. ## Identifying unit types:... ## Units: ##int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... ## Unit types: ##atomic [1:934968] 2 2 2 2 2 2 2 2 2 2 ... ##- attr(*, types)= Named int [1:8] 0 1 2 3 4 5 6 7 ## ..- attr(*, names)= chr [1:8] unknown expression genotyping resequencing ... ## unitTypes ##2 ## 934968 ## Unique unit types: ## genotyping ##2 ## Identifying unit types:...done ## Setting up parameter sets... ## Creating CEL file... ##Chip type: GenomeWideSNP_6,Full ##Pathname: plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/probeAffinities.CEL ##Returning already existing file. ## Creating CEL file...done ## used (Mb) gc trigger (Mb) max used (Mb) ## Ncells 549305 29.41606180 85.8 13015638 695.2 ## Vcells 7692698 58.7 23004541 175.6 40154683 306.4 ## Setting up parameter sets...done ## Fitting AvgCnPlm for each unit type separately... ## Unit types: ## genotyping ##2 ## Unit type #1 ('genotyping') of 1... ##Unit type: genotyping (code=2) ##Number of units of this type: 934968 ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... ##Fitting the model by unit dimensions (at least for the large classes)... ## Grouping units into equivalent (unit,group,cell) dimensions... [...] What is particular interesting are the outputs of your unit types and unit indices. /Henrik On Mon, Nov 26, 2012 at 10:45 AM, Ioana Cutcutache ioana.cutcuta...@gmail.com wrote: library(aroma.affymetrix) log - verbose - Arguments$getVerbose(-8, timestamp=TRUE) options(digits=4) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) print(cdf) gi - getGenomeInformation(cdf) print(gi) si - getSnpInformation(cdf) print(si) acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE)) print(acs) cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) csR - AffymetrixCelSet$byName(test, cdf=cdf) print(csR) acc - AllelicCrosstalkCalibration(csR, model=CRMAv2) print(acc) csC - process(acc, verbose=verbose) print(csC) bpn - BasePositionNormalization(csC, target=zero) print(bpn) csN - process(bpn, verbose=verbose) print(csN) -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups aroma.affymetrix group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/