Re: [aroma.affymetrix] problem using CRMA v2

2012-11-30 Thread Henrik Bengtsson
For the record, here's an updated on this thread, which we took the
discussion and troubleshooting off line:

The problems was identified and solved.  The short story is that it
was an unfortunate and somewhat unlikely set of actions (using a
different CDF with the exact same name and then changing to use the
official one) that caused it to happen, but it could still happen.
The outcome of this report (thanks!), is that I've made updates so
that in the next release this will not happen again.

The long story is that the Aroma Framework utilizes so called
memoization (aka caching) to speed up the processing of repetitive
tasks.  For instance, when you query a CDF to identify all cell
indices for PM probes in all SNPs, it takes quite some time to scan
through the CDF (which you notice the first time you work with a new
chip type), but the second time you ask Aroma to do the same thing,
Aroma is clever enough to recognize that you've done this before, and
instead loads the results from a cache (on your file system).  In
order lookup results in the cache, Aroma looks at the input arguments,
where the CDF file is one of them, and generate a unique hash key.  In
the current version, this key is generated only from the filename of
the CDF (which caused the about key clash), but in the upcoming
version of the package it will be generated from the filename, the
filesize and the file MD5 checksum, which makes the key unique.

So, case closed.

/Henrik

On Tue, Nov 27, 2012 at 12:15 PM, Ioana Cutcutache
ioana.cutcuta...@gmail.com wrote:
 Hi Henrik,

 I just did what you suggested and seems it's all ok, the checksum is fine,
 output below.
 Any idea what else might be causing the problem?

 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full);
 # ASSERT CORRECTNESS
 stopifnot(identical(getChecksum(cdf), 3fbe0f6e7c8a346105238a3f3d10d4ec))
 cdfM - getMonocellCdf(cdf);
 print(cdfM);
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/GenomeWideSNP_6
 Filename: GenomeWideSNP_6,Full,monocell.CDF
 Filesize: 416.27MB
 Chip type: GenomeWideSNP_6,Full,monocell
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 1681x1681
 Number of cells: 2825761
 Number of units: 1881415
 Cells per unit: 1.50
 Number of QC units: 4
 stopifnot(identical(getChecksum(cdfM),
 3a797e24a62f4f2e5214a6bf819151da));
 getChecksum(cdfM)
 [1] 3a797e24a62f4f2e5214a6bf819151da

 Ioana

 On Tuesday, 27 November 2012 21:40:54 UTC+2, Henrik Bengtsson wrote:

 Hi,

 this certainly looks weird and it's very puzzling.  My last guess it
 that your so called monocell CDF is corrupt (but I cannot see how that
 happened).  Verify that you get the same:

 library(aroma.affymetrix);

 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full);
 # ASSERT CORRECTNESS
 stopifnot(identical(getChecksum(cdf), 3fbe0f6e7c8a346105238a3f3d10d4ec))

 cdfM - getMonocellCdf(cdf);
 print(cdfM);
 ## AffymetrixCdfFile:
 ## Path: annotationData/chipTypes/GenomeWideSNP_6
 ## Filename: GenomeWideSNP_6,Full,monocell.CDF
 ## File size: 416.27 MB (436492121 bytes)
 ## Chip type: GenomeWideSNP_6,Full,monocell
 ## RAM: 0.00MB
 ## File format: v4 (binary; XDA)
 ## Dimension: 1681x1681
 ## Number of cells: 2825761
 ## Number of units: 1881415
 ## Cells per unit: 1.50
 ## Number of QC units: 4

 # ASSERT CORRECTNESS
 stopifnot(identical(getChecksum(cdfM),
 3a797e24a62f4f2e5214a6bf819151da));


 If you get an error at the latter step, then I recommend that you
 remove that file by doing:

 pathnameM - getPathname(cdfM);
 file.remove(pathnameM);
 stopifnot(!isFile(pathnameM));

 and then also clear the file cache for aroma.affymetrix by doing:

 path - getCachePath(dirs=aroma.affymetrix);
 removeDirectory(path, recursive=TRUE);

 Then regenerate the monocell CDF (it will take a long time):

 cdfM - getMonocellCdf(cdf, verbose=-10);

 and again compare with the output I get, particularly:

 stopifnot(identical(getChecksum(cdfM),
 3a797e24a62f4f2e5214a6bf819151da));

 If get to this point, it means that was the problem and you should be
 able to start your analysis.


 I'm really sorry that you get hit by this - it's really odd and it's
 extremely rare that someone has to got through all this mess to get
 things working - it should just work out of the box and hopefully it
 will do after this.

 /Henrik

 On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache
 ioana.cu...@gmail.com wrote:
  I am still not able to get same as you for units (so if I continue doing
  units2 - fit(plm, verbose=-10) I am not getting the right thing
  either).
  I ran everything again and here is the output of the last part:
 
  plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
  units1 - fitCnProbes(plm, verbose=verbose);
  Estimating single-probe CN units...
   Getting chip-effect set...
Getting chip-effect set from data set...
Getting chip-effect set from data set...done
   Getting chip-effect set...done
   Identifying CN units...
 int [1:945826] 935590 935591 935592 935593 935594 

Re: [aroma.affymetrix] problem using CRMA v2

2012-11-27 Thread Ioana Cutcutache
I am still not able to get same as you for units (so if I continue 
doing units2 - fit(plm, verbose=-10) I am not getting the right thing 
either).
I ran everything again and here is the output of the last part:

 plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
 units1 - fitCnProbes(plm, verbose=verbose); 
Estimating single-probe CN units...
 Getting chip-effect set...
  Getting chip-effect set from data set...
  Getting chip-effect set from data set...done
 Getting chip-effect set...done
 Identifying CN units...
   int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 
935598 935599 ...
 Identifying CN units...done
 Identifying subset to fit...
  Identifying non-fitted units in chip-effect file...
   Pathname: 
plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL
   Found indices cached on file
   Reading data for these 1881415 cells...
   Reading data for these 1881415 cells...done
   Looking for stdvs = 0 indicating non-estimated units:
int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ...
  Identifying non-fitted units in chip-effect file...done
   int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 
935598 935599 ...
 Identifying subset to fit...done
 Identifying cell indices for CN units...
 Identifying cell indices for CN units...done
 Keeping only single-cell units...
  Single-cell units:
   int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 
935598 935599 ...
  Cell indices:
   int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106 
3660159 3524469 3264141 ...
 Keeping only single-cell units...done
 Identifying cell indices for estimates...
   int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 1876060 
1876061 1876062 1876063 ...
 Identifying cell indices for estimates...done
 Fitting 1 arrays...
  Array #1 ('91228050T') of 1...
   Reading signals...
 num [1:945826] 6448 7260 7878 6151 12230 ...
   Reading signals...done
   Transforming signals to estimates...
'data.frame':   945826 obs. of  4 variables:
 $ cell: int  1876054 1876055 1876056 1876057 1876058 1876059 
1876060 1876061 1876062 1876063 ...
 $ theta   : num  6448 7260 7878 6151 12230 ...
 $ sdTheta : num  1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ...
 $ outliers: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
   Transforming signals to estimates...done
   Writing estimates...
Storing flat data to file...
 Updating file...
 Updating file...done
Storing flat data to file...done
   Writing estimates...done
  Array #1 ('91228050T') of 1...done
 used  (Mb) gc trigger  (Mb)  max used  (Mb)
  Ncells   483738  25.91733216  92.6  12913475 689.7
  Vcells 28397870 216.7   83594476 637.8 130588565 996.4
 Fitting 1 arrays...done
Estimating single-probe CN units...done
 str(units1);
 int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 
935598 935599 ...
 units - findUnitsTodo(plm); 
 str(units);
 int [1:935589] 1 2 3 4 5 6 7 8 9 10 ...

On Monday, 26 November 2012 23:08:47 UTC+2, Henrik Bengtsson wrote:

 On Mon, Nov 26, 2012 at 12:46 PM, Ioana Cutcutache 
 ioana.cu...@gmail.com javascript: wrote: 
  Ok, so the first part ran without any errors, output below: 
  
  print(csN); 
  AffymetrixCelSet: 
  Name: test 
  Tags: ACC,ra,-XY,BPN,-XY 
  Path: probeData/test,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6 
  Platform: Affymetrix 
  Chip type: GenomeWideSNP_6,Full 
  Number of arrays: 1 
  Names: 91228050T [1] 
  Time period: 2011-07-19 05:08:51 -- 2011-07-19 05:08:51 
  Total file size: 65.83MB 
  RAM: 0.01MB 
  print(getFullNames(csN)); 
  [1] 91228050T 
  cdf2 - getCdf(csN); 
  stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT 
  
  
  However I am not getting same as you for the units: 
  
  plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); 
  units - findUnitsTodo(plm); 
  str(units); 
   int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... 

 Alright (but has nothing to do with the error you first reported).  If so, 
 do: 

 units1 - fitCnProbes(plm, verbose=verbose); 
 str(units1); 
 ##  int [1:945826] 935590 935591 935592 935593 935594 935595 935596 
 935597 935598 935599 ... 

 and you should get the same as: 

 units - findUnitsTodo(plm); 
 str(units); 
 ##  int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... 

 If so, the try with 

 units2 - fit(plm, verbose=-10); 

 and compare to the output I already pasted in my previous message (below). 

 /Henrik 

  
  
  On Monday, 26 November 2012 21:15:32 UTC+2, Henrik Bengtsson wrote: 
  
  Hmm... first run this in a fresh R session: 
  
  library(aroma.affymetrix); 
  cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); 
  csR - AffymetrixCelSet$byName(test, cdf=cdf); 
  acc - AllelicCrosstalkCalibration(csR, model=CRMAv2); 
  csC - process(acc, verbose=verbose); 
  bpn - BasePositionNormalization(csC, target=zero); 
  csN - process(bpn, verbose=verbose); 
  
  then 

Re: [aroma.affymetrix] problem using CRMA v2

2012-11-27 Thread Henrik Bengtsson
Hi,

this certainly looks weird and it's very puzzling.  My last guess it
that your so called monocell CDF is corrupt (but I cannot see how that
happened).  Verify that you get the same:

library(aroma.affymetrix);

cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full);
# ASSERT CORRECTNESS
stopifnot(identical(getChecksum(cdf), 3fbe0f6e7c8a346105238a3f3d10d4ec))

cdfM - getMonocellCdf(cdf);
print(cdfM);
## AffymetrixCdfFile:
## Path: annotationData/chipTypes/GenomeWideSNP_6
## Filename: GenomeWideSNP_6,Full,monocell.CDF
## File size: 416.27 MB (436492121 bytes)
## Chip type: GenomeWideSNP_6,Full,monocell
## RAM: 0.00MB
## File format: v4 (binary; XDA)
## Dimension: 1681x1681
## Number of cells: 2825761
## Number of units: 1881415
## Cells per unit: 1.50
## Number of QC units: 4

# ASSERT CORRECTNESS
stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da));


If you get an error at the latter step, then I recommend that you
remove that file by doing:

pathnameM - getPathname(cdfM);
file.remove(pathnameM);
stopifnot(!isFile(pathnameM));

and then also clear the file cache for aroma.affymetrix by doing:

path - getCachePath(dirs=aroma.affymetrix);
removeDirectory(path, recursive=TRUE);

Then regenerate the monocell CDF (it will take a long time):

cdfM - getMonocellCdf(cdf, verbose=-10);

and again compare with the output I get, particularly:

stopifnot(identical(getChecksum(cdfM), 3a797e24a62f4f2e5214a6bf819151da));

If get to this point, it means that was the problem and you should be
able to start your analysis.


I'm really sorry that you get hit by this - it's really odd and it's
extremely rare that someone has to got through all this mess to get
things working - it should just work out of the box and hopefully it
will do after this.

/Henrik

On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache
ioana.cutcuta...@gmail.com wrote:
 I am still not able to get same as you for units (so if I continue doing
 units2 - fit(plm, verbose=-10) I am not getting the right thing either).
 I ran everything again and here is the output of the last part:

 plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
 units1 - fitCnProbes(plm, verbose=verbose);
 Estimating single-probe CN units...
  Getting chip-effect set...
   Getting chip-effect set from data set...
   Getting chip-effect set from data set...done
  Getting chip-effect set...done
  Identifying CN units...
int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
 935598 935599 ...
  Identifying CN units...done
  Identifying subset to fit...
   Identifying non-fitted units in chip-effect file...
Pathname:
 plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL
Found indices cached on file
Reading data for these 1881415 cells...
Reading data for these 1881415 cells...done
Looking for stdvs = 0 indicating non-estimated units:
 int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ...
   Identifying non-fitted units in chip-effect file...done
int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
 935598 935599 ...
  Identifying subset to fit...done
  Identifying cell indices for CN units...
  Identifying cell indices for CN units...done
  Keeping only single-cell units...
   Single-cell units:
int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
 935598 935599 ...
   Cell indices:
int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106
 3660159 3524469 3264141 ...
  Keeping only single-cell units...done
  Identifying cell indices for estimates...
int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 1876060
 1876061 1876062 1876063 ...
  Identifying cell indices for estimates...done
  Fitting 1 arrays...
   Array #1 ('91228050T') of 1...
Reading signals...
  num [1:945826] 6448 7260 7878 6151 12230 ...
Reading signals...done
Transforming signals to estimates...
 'data.frame':   945826 obs. of  4 variables:
  $ cell: int  1876054 1876055 1876056 1876057 1876058 1876059
 1876060 1876061 1876062 1876063 ...
  $ theta   : num  6448 7260 7878 6151 12230 ...
  $ sdTheta : num  1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ...
  $ outliers: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
Transforming signals to estimates...done
Writing estimates...
 Storing flat data to file...
  Updating file...
  Updating file...done
 Storing flat data to file...done
Writing estimates...done
   Array #1 ('91228050T') of 1...done
  used  (Mb) gc trigger  (Mb)  max used  (Mb)
   Ncells   483738  25.91733216  92.6  12913475 689.7
   Vcells 28397870 216.7   83594476 637.8 130588565 996.4
  Fitting 1 arrays...done
 Estimating single-probe CN units...done
 str(units1);
  int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
 935598 935599 ...
 units - findUnitsTodo(plm);
 str(units);
  int [1:935589] 1 2 3 4 5 6 7 8 9 10 ...

 On Monday, 

Re: [aroma.affymetrix] problem using CRMA v2

2012-11-26 Thread Ioana Cutcutache
Hello,

Please see below the requested information:

1.  sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
 [1] aroma.affymetrix_2.6.0 affxparser_1.30.0  aroma.apd_0.2.3   
 [4] R.huge_0.4.1   aroma.light_1.28.0 aroma.core_2.6.0  
 [7] matrixStats_0.6.2  R.rsp_0.8.2R.cache_0.6.5 
[10] R.devices_2.1.1R.filesets_1.6.0   digest_0.5.2  
[13] R.utils_1.18.0 R.oo_1.10.1R.methodsS3_1.4.2

2. The script (up to the point where I get the error) (sorry it does not 
work to attach files so I am pasting it in here):
library(aroma.affymetrix)
log - verbose - Arguments$getVerbose(-8, timestamp=TRUE)
options(digits=4)
cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
print(cdf)
gi - getGenomeInformation(cdf)
print(gi)
si - getSnpInformation(cdf)
print(si)
acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
print(acs)
cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
csR - AffymetrixCelSet$byName(test, cdf=cdf)
print(csR)
acc - AllelicCrosstalkCalibration(csR, model=CRMAv2)
print(acc)
csC - process(acc, verbose=verbose)
print(csC)
bpn - BasePositionNormalization(csC, target=zero)
print(bpn)
csN - process(bpn, verbose=verbose)
print(csN)
plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
print(plm)
if (length(findUnitsTodo(plm))  0) {
  # Fit CN probes quickly (~5-10s/array + some overhead)
  units - fitCnProbes(plm, verbose=verbose)
  str(units)
  # int [1:945826] 935590 935591 935592 935593 935594 935595 ...
  
  # Fit remaining units, i.e. SNPs (~5-10min/array)
  units - fit(plm, verbose=verbose)  # --- error here
  str(units)
}

3.  cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 print(cdf)
AffymetrixCdfFile:
Path: annotationData/chipTypes/GenomeWideSNP_6
Filename: GenomeWideSNP_6,Full.cdf
Filesize: 470.44MB
Chip type: GenomeWideSNP_6,Full
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 2572x2680
Number of cells: 6892960
Number of units: 1881415
Cells per unit: 3.66
Number of QC units: 4

Thank you!


On Monday, 26 November 2012 17:52:39 UTC+2, Henrik Bengtsson wrote:

 Hi, 

 please provide the following information so I can help troubleshoot: 

 1. What's your sessionInfo() after doing library(aroma.affymetrix)? 

 2. Your complete script (right now I not sure which chip type your are 
 working on) up to the point where you get the error. 

 3. The output of print(cdf) where 'cdf' is your AffymetrixCdfFile object. 

 Henrik 

 On Mon, Nov 26, 2012 at 7:40 AM, Ioana Cutcutache 
 ioana.cu...@gmail.com javascript: wrote: 
  Hello, 
  
  I am trying to use CRMA v2 and basically just following the instructions 
 at 
  http://www.aroma-project.org/vignettes/CRMAv2. 
  I have all the annotation files and I think everything is right (the 
  checking of this files gives me the same output as the webpage mentioned 
  above). 
  However when I get to 
  units - fit(plm, verbose=verbose) 
  I get the following error which I am no able to solve: 
  
  [... removed output...] 
  20121126 19:51:39| Setting up parameter sets...done 
  20121126 19:51:39| Fitting AvgCnPlm for each unit type separately... 
  20121126 19:51:39|  Unit types: 
   unknown expression genotyping 
 0  1  2 
  20121126 19:51:39|  Unit type #1 ('unknown') of 3... 
  20121126 19:51:39|   Unit type: unknown (code=0) 
  20121126 19:51:39|   Number of units of this type: 1590374 
  int [1:1590374] 291042 291043 291044 291045 291046 291047 291048 
  291049 291050 291051 ... 
  20121126 19:51:39|   Fitting the model by unit dimensions (at least 
  for the large classes)... 
  20121126 19:51:39|Grouping units into equivalent (unit,group,cell) 
  dimensions... 
  [2012-11-26 19:51:39] Exception: Argument 'units' contains 25346 NA 
  value(s). 
  
at #11. getNumerics.Arguments(static, ..., asMode = integer, 
  disallow = disallow) 
- getNumerics.Arguments() is in environment 'R.utils' 
  
at #10. getNumerics(static, ..., asMode = integer, disallow = 
 disallow) 
- getNumerics() is in environment 'R.utils' 
  
at #09. getIntegers.Arguments(static, x, ..., range = range, .name = 
  .name) 
- getIntegers.Arguments() is in environment 'R.utils' 
  
at #08. getIntegers(static, x, ..., range = range, .name = .name) 
- getIntegers() is in environment 'R.utils' 
  
at #07. 

Re: [aroma.affymetrix] problem using CRMA v2

2012-11-26 Thread Henrik Bengtsson
Ok, everything looks alright to me.  However, the error you get looks
weird to me and I've never seen it, but I'm sure there is an easy
explanation.  Lets do some troubleshooting; what output do you when
doing the following (I've pasted my output):

cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full);
print(cdf);
print(getChecksum(cdf));
## [1] 3fbe0f6e7c8a346105238a3f3d10d4ec

ut - getUnitTypes(cdf);
print(table(ut));
## ut
##  1  2  5
##621 934968 945826

If you don't get the same unit-types output as I do, retry with:
ut - getUnitTypes(cdf, force=TRUE);
print(table(ut));

/Henrik

On Mon, Nov 26, 2012 at 8:55 AM, Ioana Cutcutache
ioana.cutcuta...@gmail.com wrote:
 Hello,

 Please see below the requested information:

 1.  sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_2.6.0 affxparser_1.30.0  aroma.apd_0.2.3
  [4] R.huge_0.4.1   aroma.light_1.28.0 aroma.core_2.6.0
  [7] matrixStats_0.6.2  R.rsp_0.8.2R.cache_0.6.5
 [10] R.devices_2.1.1R.filesets_1.6.0   digest_0.5.2
 [13] R.utils_1.18.0 R.oo_1.10.1R.methodsS3_1.4.2

 2. The script (up to the point where I get the error) (sorry it does not
 work to attach files so I am pasting it in here):
 library(aroma.affymetrix)
 log - verbose - Arguments$getVerbose(-8, timestamp=TRUE)
 options(digits=4)
 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 print(cdf)
 gi - getGenomeInformation(cdf)
 print(gi)
 si - getSnpInformation(cdf)
 print(si)
 acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
 print(acs)
 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 csR - AffymetrixCelSet$byName(test, cdf=cdf)
 print(csR)
 acc - AllelicCrosstalkCalibration(csR, model=CRMAv2)
 print(acc)
 csC - process(acc, verbose=verbose)
 print(csC)
 bpn - BasePositionNormalization(csC, target=zero)
 print(bpn)
 csN - process(bpn, verbose=verbose)
 print(csN)
 plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
 print(plm)
 if (length(findUnitsTodo(plm))  0) {
   # Fit CN probes quickly (~5-10s/array + some overhead)
   units - fitCnProbes(plm, verbose=verbose)
   str(units)
   # int [1:945826] 935590 935591 935592 935593 935594 935595 ...

   # Fit remaining units, i.e. SNPs (~5-10min/array)
   units - fit(plm, verbose=verbose)  # --- error here
   str(units)
 }

 3.  cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 print(cdf)
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/GenomeWideSNP_6
 Filename: GenomeWideSNP_6,Full.cdf
 Filesize: 470.44MB
 Chip type: GenomeWideSNP_6,Full
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 2572x2680
 Number of cells: 6892960
 Number of units: 1881415
 Cells per unit: 3.66
 Number of QC units: 4

 Thank you!


 On Monday, 26 November 2012 17:52:39 UTC+2, Henrik Bengtsson wrote:

 Hi,

 please provide the following information so I can help troubleshoot:

 1. What's your sessionInfo() after doing library(aroma.affymetrix)?

 2. Your complete script (right now I not sure which chip type your are
 working on) up to the point where you get the error.

 3. The output of print(cdf) where 'cdf' is your AffymetrixCdfFile object.

 Henrik

 On Mon, Nov 26, 2012 at 7:40 AM, Ioana Cutcutache
 ioana.cu...@gmail.com wrote:
  Hello,
 
  I am trying to use CRMA v2 and basically just following the instructions
  at
  http://www.aroma-project.org/vignettes/CRMAv2.
  I have all the annotation files and I think everything is right (the
  checking of this files gives me the same output as the webpage mentioned
  above).
  However when I get to
  units - fit(plm, verbose=verbose)
  I get the following error which I am no able to solve:
 
  [... removed output...]
  20121126 19:51:39| Setting up parameter sets...done
  20121126 19:51:39| Fitting AvgCnPlm for each unit type separately...
  20121126 19:51:39|  Unit types:
   unknown expression genotyping
 0  1  2
  20121126 19:51:39|  Unit type #1 ('unknown') of 3...
  20121126 19:51:39|   Unit type: unknown (code=0)
  20121126 19:51:39|   Number of units of this type: 1590374
  int [1:1590374] 291042 291043 291044 291045 291046 291047 291048
  291049 291050 291051 ...
  20121126 19:51:39|   Fitting the model by unit dimensions (at least
  for the large classes)...
  20121126 19:51:39|Grouping units into equivalent (unit,group,cell)
  dimensions...
  [2012-11-26 19:51:39] Exception: Argument 'units' contains 25346 NA
  

Re: [aroma.affymetrix] problem using CRMA v2

2012-11-26 Thread Ioana Cutcutache
Hi Henrik,

Looks like the output is ok:

 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 print(getChecksum(cdf));
[1] 3fbe0f6e7c8a346105238a3f3d10d4ec
 ut - getUnitTypes(cdf); 
 print(table(ut));
ut
  0   1   2 
1565028 621  290420 
 ut - getUnitTypes(cdf, force=TRUE); 
 print(table(ut));
ut
 1  2  5 
   621 934968 945826 

Ioana

On Monday, 26 November 2012 19:59:44 UTC+2, Henrik Bengtsson wrote:

 Ok, everything looks alright to me.  However, the error you get looks 
 weird to me and I've never seen it, but I'm sure there is an easy 
 explanation.  Lets do some troubleshooting; what output do you when 
 doing the following (I've pasted my output): 

 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full); 
 print(cdf); 
 print(getChecksum(cdf)); 
 ## [1] 3fbe0f6e7c8a346105238a3f3d10d4ec 

 ut - getUnitTypes(cdf); 
 print(table(ut)); 
 ## ut 
 ##  1  2  5 
 ##621 934968 945826 

 If you don't get the same unit-types output as I do, retry with: 
 ut - getUnitTypes(cdf, force=TRUE); 
 print(table(ut)); 

 /Henrik 

 On Mon, Nov 26, 2012 at 8:55 AM, Ioana Cutcutache 
 ioana.cu...@gmail.com javascript: wrote: 
  Hello, 
  
  Please see below the requested information: 
  
  1.  sessionInfo() 
  R version 2.15.2 (2012-10-26) 
  Platform: x86_64-unknown-linux-gnu (64-bit) 
  
  locale: 
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C 
   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 
   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 
   [7] LC_PAPER=C LC_NAME=C 
   [9] LC_ADDRESS=C   LC_TELEPHONE=C 
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C 
  
  attached base packages: 
  [1] stats graphics  grDevices utils datasets  methods   base 
  
  other attached packages: 
   [1] aroma.affymetrix_2.6.0 affxparser_1.30.0  aroma.apd_0.2.3 
   [4] R.huge_0.4.1   aroma.light_1.28.0 aroma.core_2.6.0 
   [7] matrixStats_0.6.2  R.rsp_0.8.2R.cache_0.6.5 
  [10] R.devices_2.1.1R.filesets_1.6.0   digest_0.5.2 
  [13] R.utils_1.18.0 R.oo_1.10.1R.methodsS3_1.4.2 
  
  2. The script (up to the point where I get the error) (sorry it does not 
  work to attach files so I am pasting it in here): 
  library(aroma.affymetrix) 
  log - verbose - Arguments$getVerbose(-8, timestamp=TRUE) 
  options(digits=4) 
  cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) 
  print(cdf) 
  gi - getGenomeInformation(cdf) 
  print(gi) 
  si - getSnpInformation(cdf) 
  print(si) 
  acs - AromaCellSequenceFile$byChipType(getChipType(cdf, 
 fullname=FALSE)) 
  print(acs) 
  cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) 
  csR - AffymetrixCelSet$byName(test, cdf=cdf) 
  print(csR) 
  acc - AllelicCrosstalkCalibration(csR, model=CRMAv2) 
  print(acc) 
  csC - process(acc, verbose=verbose) 
  print(csC) 
  bpn - BasePositionNormalization(csC, target=zero) 
  print(bpn) 
  csN - process(bpn, verbose=verbose) 
  print(csN) 
  plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) 
  print(plm) 
  if (length(findUnitsTodo(plm))  0) { 
# Fit CN probes quickly (~5-10s/array + some overhead) 
units - fitCnProbes(plm, verbose=verbose) 
str(units) 
# int [1:945826] 935590 935591 935592 935593 935594 935595 ... 
  
# Fit remaining units, i.e. SNPs (~5-10min/array) 
units - fit(plm, verbose=verbose)  # --- error here 
str(units) 
  } 
  
  3.  cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full) 
  print(cdf) 
  AffymetrixCdfFile: 
  Path: annotationData/chipTypes/GenomeWideSNP_6 
  Filename: GenomeWideSNP_6,Full.cdf 
  Filesize: 470.44MB 
  Chip type: GenomeWideSNP_6,Full 
  RAM: 0.00MB 
  File format: v4 (binary; XDA) 
  Dimension: 2572x2680 
  Number of cells: 6892960 
  Number of units: 1881415 
  Cells per unit: 3.66 
  Number of QC units: 4 
  
  Thank you! 
  
  
  On Monday, 26 November 2012 17:52:39 UTC+2, Henrik Bengtsson wrote: 
  
  Hi, 
  
  please provide the following information so I can help troubleshoot: 
  
  1. What's your sessionInfo() after doing library(aroma.affymetrix)? 
  
  2. Your complete script (right now I not sure which chip type your are 
  working on) up to the point where you get the error. 
  
  3. The output of print(cdf) where 'cdf' is your AffymetrixCdfFile 
 object. 
  
  Henrik 
  
  On Mon, Nov 26, 2012 at 7:40 AM, Ioana Cutcutache 
  ioana.cu...@gmail.com wrote: 
   Hello, 
   
   I am trying to use CRMA v2 and basically just following the 
 instructions 
   at 
   http://www.aroma-project.org/vignettes/CRMAv2. 
   I have all the annotation files and I think everything is right (the 
   checking of this files gives me the same output as the webpage 
 mentioned 
   above). 
   However when I get to 
   units - fit(plm, verbose=verbose) 
   I get the following error which I am no able to solve: 
   
   [... removed output...] 
   20121126 19:51:39| Setting up 

Re: [aroma.affymetrix] problem using CRMA v2

2012-11-26 Thread Henrik Bengtsson
Hmm... first run this in a fresh R session:

library(aroma.affymetrix);
cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full);
csR - AffymetrixCelSet$byName(test, cdf=cdf);
acc - AllelicCrosstalkCalibration(csR, model=CRMAv2);
csC - process(acc, verbose=verbose);
bpn - BasePositionNormalization(csC, target=zero);
csN - process(bpn, verbose=verbose);

then what's the output of:
print(csN);
print(getFullNames(csN));
cdf2 - getCdf(csN);
stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT

If you get an error above, what is the output of print(cdf2) and
getChecksum(cdf2)?  If not, continue:

plm - AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE);
units - findUnitsTodo(plm);
str(units);
##  int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...

Do you get the same?  If so, continue:

units2 - fit(plm, verbose=-10);

You should get something like the following, except from the
dataset/sample names (if not, let me know):

## Fitting model of class AvgCnPlm...
##  AvgCnPlm:
##  Data set: GSE13372
##  Chip type: GenomeWideSNP_6,Full
##  Input tags: testset,ACC,ra,-XY,BPN,-XY
##  Output tags: testset,ACC,ra,-XY,BPN,-XY,AVG,A+B
##  Parameters: {probeModel: chr pm, shift: num 0, flavor: chr
median, mergeStrands: logi TRUE, combineAlleles: logi TRUE}
##  Path: plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6
##  RAM: 0.01MB
##  Identifying non-estimated units...
##   Identifying non-fitted units in chip-effect file...
##Pathname:
plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/GSM337708,chipEffects.CEL
##Found indices cached on file
##Reading data for these 1881415 cells...
##Reading data for these 1881415 cells...done
##Looking for stdvs = 0 indicating non-estimated units:
## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
##   Identifying non-fitted units in chip-effect file...done
##  Identifying non-estimated units...done
##  Getting model fit for 934968 units.
##  Identifying unit types:...
##   Units:
##int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
##   Unit types:
##atomic [1:934968] 2 2 2 2 2 2 2 2 2 2 ...
##- attr(*, types)= Named int [1:8] 0 1 2 3 4 5 6 7
## ..- attr(*, names)= chr [1:8] unknown expression
genotyping resequencing ...
##   unitTypes
##2
##   934968
##   Unique unit types:
##   genotyping
##2
##  Identifying unit types:...done
##  Setting up parameter sets...
##   Creating CEL file...
##Chip type: GenomeWideSNP_6,Full
##Pathname:
plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/probeAffinities.CEL
##Returning already existing file.
##   Creating CEL file...done
## used (Mb) gc trigger  (Mb) max used  (Mb)
##   Ncells  549305 29.41606180  85.8 13015638 695.2
##   Vcells 7692698 58.7   23004541 175.6 40154683 306.4
##  Setting up parameter sets...done
##  Fitting AvgCnPlm for each unit type separately...
##   Unit types:
##   genotyping
##2
##   Unit type #1 ('genotyping') of 1...
##Unit type: genotyping (code=2)
##Number of units of this type: 934968
## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
##Fitting the model by unit dimensions (at least for the large classes)...
## Grouping units into equivalent (unit,group,cell) dimensions...
[...]

What is particular interesting are the outputs of your unit types and
unit indices.

/Henrik

On Mon, Nov 26, 2012 at 10:45 AM, Ioana Cutcutache
ioana.cutcuta...@gmail.com wrote:
 library(aroma.affymetrix)
 log - verbose - Arguments$getVerbose(-8, timestamp=TRUE)
 options(digits=4)
 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 print(cdf)
 gi - getGenomeInformation(cdf)
 print(gi)
 si - getSnpInformation(cdf)
 print(si)
 acs - AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
 print(acs)
 cdf - AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags=Full)
 csR - AffymetrixCelSet$byName(test, cdf=cdf)
 print(csR)
 acc - AllelicCrosstalkCalibration(csR, model=CRMAv2)
 print(acc)
 csC - process(acc, verbose=verbose)
 print(csC)
 bpn - BasePositionNormalization(csC, target=zero)
 print(bpn)
 csN - process(bpn, verbose=verbose)
 print(csN)

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/