[aroma.affymetrix] How to cite CRMA v2

2010-11-03 Thread Henrik Bengtsson
Hi there,

Thanks to all for citing our publications directly or indirectly related
to the aroma project framework.

Since I noticed that the original CRMA paper often gets cited even when
the CRMA v2 method is used/meant, I would like to clarify to the list
that CRMA v2 is preferably referenced as:

H. Bengtsson, P. Wirapati & T.P. Speed, A single-array preprocessing
method for estimating full-resolution raw copy numbers from all
Affymetrix genotyping arrays including GenomeWideSNP 5 & 6,
Bioinformatics, 2009. [PMID: 19535535]

You can find all our references at
http://www.aroma-project.org/publications/ each with a small "Cite
this for:" note indicating for which method it should be cited.  If
you are uncertain, just drop us an email and we'll clarify.

Cheers,

Henrik

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


Re: [aroma.affymetrix] Re: CbsModel parameters

2010-10-27 Thread Henrik Bengtsson
Are you sure you are not picking up old results?  That is, did you use
fit(cbs, ..., force=TRUE), or did you simply remove the previous
segmentation results in cbsData/?

You can troubleshoot with one array and one chromosome, e.g.

fit(cbs, arrays=6, chromosomes=16, min.width=5, undo.splits="sdundo",
undo.SD=1, force=TRUE, verbose=-10);
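
To see which arrays still contain short segments after refitting, something along these lines may help.  It is only a sketch: it assumes that getRegions(cbs) returns one data frame per array with a 'count' column, as in the output quoted further down in this thread.  The 'regions' list is made-up stand-in data and minSegmentCounts() is a hypothetical helper, not part of aroma.*.

```r
# Stand-in for getRegions(cbs): one data frame per array (made-up values).
regions <- list(
  data.frame(chromosome=16, start=c(100, 900), stop=c(800, 1500),
             mean=c(0.1, -0.4), count=c(12, 5)),
  data.frame(chromosome=16, start=c(100, 45057510), stop=c(700, 45057696),
             mean=c(0.0, -1.427), count=c(30, 2))
);

# Hypothetical helper: smallest number of markers per segment in each array.
minSegmentCounts <- function(regions) {
  sapply(regions, function(df) min(df$count));
}

counts <- minSegmentCounts(regions);
print(counts);

# Indices of arrays that still have segments with fewer than 5 markers
tooShort <- which(counts < 5);
print(tooShort);
```

With real results you would replace 'regions' by getRegions(cbs) and refit only the arrays listed in 'tooShort'.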

/Henrik

On Wed, Oct 27, 2010 at 11:20 AM, Kai wangz...@gmail.com wrote:
 Hi Henrik,

 Thank you for your reply. However, I followed your instructions but
 still got segments with only 2 markers:

 This is the code I ran:

 cbs = CbsModel(ds);
 cbs$.calculateRatios = FALSE;
 fit(cbs, chromosomes=c(1:23), min.width=5, undo.splits="sdundo",
 undo.SD=1, verbose=-10);
 ce = ChromosomeExplorer(cbs);
 process(ce,chromosomes=c(1:23));

 This is what I found in the results (there are 4 samples in
 total):

 min(getRegions(cbs)[[1]][,5])
 [1] 5
 min(getRegions(cbs)[[2]][,5])
 [1] 2
 min(getRegions(cbs)[[3]][,5])
 [1] 2
 min(getRegions(cbs)[[4]][,5])
 [1] 2
 which(getRegions(cbs)[[4]][,5]==2)
 [1]  52 139
 getRegions(cbs)[[4]][139,1:5]
    chromosome    start     stop   mean count
 139         16 45057510 45057696 -1.427     2

 It seems to me that min.width=5 worked only in the first sample. Do
 you have any idea on this? Thanks!

 Best,
 Kai


 On Oct 26, 9:09 pm, Henrik Bengtsson henrik.bengts...@aroma-
 project.org wrote:
 I forgot to say that in the next release of aroma.core package, you
 will be able to specify additional arguments when you setup the CBS
 model:

 cbs <- CbsModel(ds, min.width=5);

 ...but until then you have to stick with the below workaround.

 /Henrik

 On Tue, Oct 26, 2010 at 9:07 PM, hb h...@biostat.ucsf.edu wrote:
  Hi,

  sorry my mistake. I meant to write that you should pass the additional 
  arguments to fit() for the CbsModel (not process()), e.g.

  cbs <- CbsModel(ds);
  cbs$.calculateRatios <- FALSE;
  fit(cbs, chromosomes=1:23, min.width=5, verbose=-10);

  This will (explicitly) fit the segmentation model. Have a look at the 
  verbose output; you'll see that min.width should show up in the output 
  just before the DNAcopy segment() is called.

  After you've done the segmentation for all of your arrays and chromosomes, 
  you can have the ChromosomeExplorer generate the report for you as usual, 
  i.e.

  ce <- ChromosomeExplorer(cbs);
  process(ce, chromosomes=1:23);

  Note that in your case you have to either delete already generated CBS 
  results, or use fit(..., force=TRUE), in order for aroma.* not to pick up 
  the old segmentation. You also need to delete the already generated PNG 
  files for the ChromosomeExplorer under reports/...

  On Tue, Oct 26, 2010 at 4:43 PM, Kai wangz...@gmail.com wrote:
  Hi Henrik,

  Thank you very much for your response. However, I tried the following
  codes to set the minimal number of marker to 5, but the results I got
  still contain segments with only 2 markers ...

  cbs = CbsModel(ds);
  cbs$.calculateRatios = FALSE;
  ce = ChromosomeExplorer(cbs);
  process(ce,chromosomes=c(1:23),min.width=5);

  I am not clear where I should put min.width=5? If I do
  process(cbs,min.width=5) first, how can I send the results to be
  displayed by chromosome explorer?

  Thanks again for your help. I look forward to hearing from you soon.

  Best,
  Kai

  On Sep 27, 9:47 pm, Henrik Bengtsson henrik.bengts...@gmail.com
  wrote:
  Hi.

  On Mon, Sep 27, 2010 at 4:51 PM, Kai wangz...@gmail.com wrote:
   Hi Henrik,

   I was wondering whether there is a way I can fine tune the behavior of
   CbsModel. Sometimes the default algorithm produces too many small
   fragments right next to each other without much separation in mean
   copy numbers. Is there a way to control how smooth the segmentation
   results are?

  Any additional arguments (in ...) that you pass to process(cbs, ...)
  will be passed down to DNAcopy::segment(), which is the function
  doing the actual segmentation.  For more details on fine-tuning
  the CBS algorithm, see help("segment", package="DNAcopy").  You may
  also want to contact the authors of that method/package.

  /Henrik

   Thanks a lot!

   Best,
   Kai


Re: [aroma.affymetrix] Re: CbsModel parameters

2010-10-26 Thread Henrik Bengtsson
Hi,

sorry my mistake.  I meant to write that you should pass the
additional arguments to fit() for the CbsModel (not process()), e.g.

cbs <- CbsModel(ds);
cbs$.calculateRatios <- FALSE;
fit(cbs, chromosomes=1:23, min.width=5, verbose=-10);

This will (explicitly) fit the segmentation model.  Have a look at the
verbose output; you'll see that min.width should show up in the
output just before the DNAcopy segment() is called.

After you've done the segmentation for all of your arrays and
chromosomes, you can have the ChromosomeExplorer generate the report
for you as usual, i.e.

ce <- ChromosomeExplorer(cbs);
process(ce, chromosomes=1:23);

Note that in your case you have to either delete already generated CBS
results, or use fit(..., force=TRUE), in order for aroma.* not to pick
up the old segmentation.   You also need to delete the already
generated PNG files for the ChromosomeExplorer under reports/...
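
A rough sketch of the clean-up step, using base R only.  The directory below is a temporary stand-in; the actual layout under reports/ (elided above) depends on your data set and is not assumed here.  listStalePngs() is a hypothetical helper, not an aroma.* function.

```r
# Hypothetical helper: list PNG files below a report directory.
listStalePngs <- function(path) {
  list.files(path, pattern="[.]png$", recursive=TRUE, full.names=TRUE,
             ignore.case=TRUE);
}

# Demo on a temporary directory standing in for the report directory.
demoPath <- file.path(tempdir(), "reportsDemo");
dir.create(file.path(demoPath, "sub"), recursive=TRUE, showWarnings=FALSE);
file.create(file.path(demoPath, "sub", c("a.png", "b.png", "keep.txt")));

pngs <- listStalePngs(demoPath);
print(basename(pngs));

# Inspect the list first, then remove the files.
unlink(pngs);
remaining <- list.files(demoPath, recursive=TRUE);
print(remaining);
```

Listing before deleting makes it easy to confirm that only report images, and no data files, are about to be removed.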




On Tue, Oct 26, 2010 at 4:43 PM, Kai wangz...@gmail.com wrote:
 Hi Henrik,

 Thank you very much for your response. However, I tried the following
 codes to set the minimal number of marker to 5, but the results I got
 still contain segments with only 2 markers ...

 cbs = CbsModel(ds);
 cbs$.calculateRatios = FALSE;
 ce = ChromosomeExplorer(cbs);
 process(ce,chromosomes=c(1:23),min.width=5);

 I am not clear where I should put min.width=5? If I do
 process(cbs,min.width=5) first, how can I send the results to be
 displayed by chromosome explorer?

 Thanks again for your help. I look forward to hearing from you soon.

 Best,
 Kai



 On Sep 27, 9:47 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi.

 On Mon, Sep 27, 2010 at 4:51 PM, Kai wangz...@gmail.com wrote:
  Hi Henrik,

  I was wondering whether there is a way I can fine tune the behavior of
  CbsModel. Sometimes the default algorithm produces too many small
  fragments right next to each other without much separation in mean
  copy numbers. Is there a way to control how smooth the segmentation
  results are?

 Any additional arguments (in ...) that you pass to process(cbs, ...)
 will be passed down to DNAcopy::segment(), which is the function
 doing the actual segmentation.  For more details on fine-tuning
 the CBS algorithm, see help("segment", package="DNAcopy").  You may
 also want to contact the authors of that method/package.

 /Henrik



  Thanks a lot!

  Best,
  Kai



Re: [aroma.affymetrix] Can aroma.affymertix handle the data of agilent chip?

2010-10-24 Thread Henrik Bengtsson
Hi.

On Wed, Oct 13, 2010 at 5:24 PM, Yue Hu yuehu.m...@gmail.com wrote:
 Hi,

  Just shift from affymetrix to agilent recently and since I prefer
 the plot generated by aroma.affymetrix I am just wondering if
 aroma.affymetrix is able to handle agilent chip data in some way.

When you say "plot generated by aroma.affymetrix", are you thinking of
the copy-number image files generated by the ChromosomeExplorer?  If
so, yes, there is *some* support for using data from other platforms, but
it is less documented.  The main hurdle is that there are no automated
ways to import data (other than Affymetrix), but on the other hand in
most cases it is not really harder than using read.table().
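
As a minimal sketch of the read.table() route: the column names (ProbeName, Chr, Position, LogRatio) and the values below are invented for illustration and do not correspond to any particular Agilent export format.

```r
# Made-up tab-delimited content standing in for an exported data file.
lines <- c(
  "ProbeName\tChr\tPosition\tLogRatio",
  "P1\t1\t51599\t0.20",
  "P2\t1\t14944039\t-0.35",
  "P3\t2\t16878364\t-0.07"
);

data <- read.table(text=lines, header=TRUE, sep="\t",
                   stringsAsFactors=FALSE);
str(data);

# Order by genomic position before any downstream copy-number plotting.
data <- data[order(data$Chr, data$Position), ];
```

For a real file you would pass its pathname as the first argument instead of 'text=lines'.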

See section 'Generalization to other technologies than Affymetrix' on
the 'Future directions' page
[http://aroma-project.org/features/future/] for more information.

/Henrik


  best,

  Yue



Re: [aroma.affymetrix] Exception: None of the data directories exist

2010-10-24 Thread Henrik Bengtsson
Hi.

On Thu, Oct 14, 2010 at 4:38 AM, allab asphod...@googlemail.com wrote:
 Dear aroma users/authors,
 I am now doing Affy 6.0 SNP data analysis and my goal is to obtain BAF
 values so that I can use them further with the method SOMATICS
 (Assie'08).
 I have not used the wrapper from the very beginning:
 ds <- doASCRMAv2("TumorProjekt", chipType="GenomeWideSNP_6");

 but did all analysis steps explicitly,  actually 2 times: one with
 argument  combineAlleles=FALSE and one with argument
 combineAlleles=TRUE.

If you wish to get allele B fractions (BAFs) you need
combineAlleles=FALSE, which is the default of doASCRMAv2().


 As I understand it, if I had the object ds I could use ds$fracB
 to obtain the BAFs.

Correct.  The 'ds' object is actually an R list containing two data
set elements: 'total' and 'fracB'.  You are interested in the latter
here.

 I did not want to recalculate everything and started following:

FYI, the key thing with the aroma framework is that it will *not*
recalculate already processed data; your results are persistent across
R sessions since they are stored on the file system.  Sure, if you
redo doASCRMAv2() there will be some overhead, but most steps are
skipped.

However, it is true that you can also load the data sets as you try next:


 dataSet <- "TumorProjekt";
 tags <- "ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY";
 chipType <- "GenomeWideSNP_6";
 ds <- AromaUnitFracBCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType);

 dfTxt <- writeDataFrame(ds, columns=c("unitName",
 "chromosome", "position", "*"));

 but got
 Exception: None of the data directories exist: totalAndFracBData,
 rawCnData, cnData, smoothCnData

 What could be the reason for this?

So the error occurs in the ds <- AromaUnitFracBCnBinarySet$byName(...)
step. The writeDataFrame() step is not part of this. (Please try to
paste the error where it belongs/as it occurs).  The error says that
it cannot find the data set you are asking it to load (in any of the
so called root directories totalAndFracBData/, rawCnData/, cnData/,
smoothCnData/ in the current directory).  The key to get this right is
that you are in the same working directory as you were when you did
doASCRMAv2(); from the error message it looks like you are in a
different working directory because it cannot find any of the reported
directories and it should find totalAndFracBData/.
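
A quick way to check this from within R is sketched below.  It relies on base R only; the root directory names are the ones listed in the error message, and checkRootDirs() is a hypothetical helper, not an aroma.* function.

```r
# Hypothetical helper: which of the root directories named in the error
# message exist below a given directory (default: the working directory)?
checkRootDirs <- function(path=getwd()) {
  roots <- c("totalAndFracBData", "rawCnData", "cnData", "smoothCnData");
  found <- file.exists(file.path(path, roots));
  names(found) <- roots;
  found;
}

print(getwd());
print(checkRootDirs());
```

If every entry is FALSE, you are most likely in the wrong working directory; use setwd() to move to the directory where doASCRMAv2() was run and try again.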

You also have to make sure you are using exactly the same data set
name and tags.  It should match what:

dsList <- doASCRMAv2("TumorProjekt", chipType="GenomeWideSNP_6");
print(dsList);

outputs, especially what print(dsList$fracB) reports.

Hope this helps

/Henrik





Re: [aroma.affymetrix] Re: Trying to create a CDF file from an R package/environment problems

2010-10-24 Thread Henrik Bengtsson
Thanks for following up/reporting back to the list.

/Henrik

On Thu, Oct 14, 2010 at 12:03 PM, Fong fongchunc...@gmail.com wrote:
 For those interested I got a reply from the makers of the CDF files
 and apparently it is an issue on their end.  Here is their reply:

 We have been studying the problem and we have discovered a bug in
 perfect match and mismatch probes annotation that makes Env2Cdf
 function unable to use packages from GATExplorer.

 We send you GeneMapper and TranscriptMapper for HG_U133_Plus2
 with the bug corrected. In a couple of days we will upload all
 the packages to GATExplorer website.

 Looks like it was an issue on their end.

 On Oct 12, 12:48 pm, Fong fongchunc...@gmail.com wrote:
 Hi,

 I've found a set of R packages (CDF files) from a service called
 GATExplorer
 (http://bioinfow.dep.usal.es/xgate/mapping/mapping.php?content=rprogram)
 and I am trying to create a CDF file from the R
 packages.  I've followed the instructions found at
 http://www.aroma-project.org/node/41
 but I am running into errors.  This is what happens:

 Env2Cdf("genemapperhgu133plus2cdf", "u1332plus_ivt_breast_A.CEL",
 overwrite=TRUE)
 Loading required package: affxparser
 Reading environment: genemapperhgu133plus2cdf.
 Reading CEL file header.
 Creating CDF list for 20172 units.
 Error in FUN(X[[1L]], ...) : no 'dimnames' attribute for array

 I am not too familiar with how CDF R packages work.  Does anyone have
 any advice on what I could do?

 Thanks,

 Fong



Re: [aroma.affymetrix] Help needed regarding GCRMA normalization of exon arrays using aroma.affymetrix.....

2010-09-29 Thread Henrik Bengtsson
 to replace the NaN and missing values to some other
 value (like the mean or median or some small number close to 0)?

It is good that we made progress.  The new error looks like a bug in
the sense that that piece of code does not expect NAs to appear.  I
have to look into this and figure out whether NAs can indeed be expected
or whether they are incorrectly introduced earlier in the pipeline.
I will also go back to our redundancy tests and check, because we do
not detect this problem there.  If NAs should be allowed, the fix
should be simple, but has to be done by me updating the code.

I'll get back to you when I've done this.

 Also I am getting some output in the probeData folder as:
 probeData/Exon Data,OBC/MoEx-1_0-st-v1/(all the *.cel files) and
 probeData/Exon Data,GRBC/MoEx-1_0-st-v1/MoEx-1_0-st-v1-affinities.apa.
 What do these outputs correspond to?

That file contains the GCRMA probe affinities computed from the CDF
and the probe-sequence file.  Consider it as an internal file that is
saved to disk so that the next time you run the pipeline, if redone,
it will be found and the processing will be much faster.

Sorry about all these issues.  As I write in almost every
correspondence related to gcRMA processing - the inner code was
written in the early days and specifically for a few chip types.
After that new chip types came around and things became a bit shaky.
However, and although not really visible to the end user, we are slowly
updating the code and moving to a more robust and generic solution.
What the end user probably sees is more and more informative error
messages.  So, bear with us.

Cheers,

Henrik

 Again thank you for the help.
 Prithish Banerjee,
 Graduate Research Assistant,
 Department of Statistics,
 West Virginia University.

 On Tue, Sep 28, 2010 at 1:47 AM, Henrik Bengtsson h...@aroma-project.org
 wrote:

 Hi,

 sorry, my mistake.  I missed that you already did this.

 You are missing the 'MoEx-1_0-st-v1.probe.tab' annotation data file.
 You can download it from Affymetrix and you'll find a link to their
 support page via http://www.aroma-project.org/chipTypes/MoEx-1_0-st-v1
 .  Download it (something like "MoEx-1_0-st-v1 Probe Sequences,
 tabular format (130 MB, 3/19/08)") and place it in
 annotationData/chipTypes/MoEx-1_0-st-v1/. You can verify that it is
 correct by trying the following:

  library(aroma.affymetrix);
  ptf <- AffymetrixProbeTabFile$byChipType("MoEx-1_0-st-v1");
  ptf
 AffymetrixProbeTabFile:
 Name: MoEx-1_0-st-v1
 Tags:
 Full name: MoEx-1_0-st-v1
 Pathname: annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1.probe.tab
 File size: 460.47 MB (482839635 bytes)
 RAM: 0.01 MB
 Number of data rows: NA
 Columns [12]: 'probeID', 'probeSetID', 'probeXPos', 'probeYPos',
 'assembly', 'seqname', 'start', 'stop', 'strand', 'probeSequence',
 'targetStrandedness', 'category'
 Number of text lines: NA
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/MoEx-1_0-st-v1
 Filename: MoEx-1_0-st-v1.cdf
 Filesize: 274.30MB
 Chip type: MoEx-1_0-st-v1
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 2560x2560
 Number of cells: 6553600
 Number of units: 1257006
 Cells per unit: 5.21
 Number of QC units: 0

 If you get that to work, then your script should work.

 Let me know if this solved your problem.

 /Henrik

 On Mon, Sep 27, 2010 at 1:58 PM, Prithish Banerjee
 prithish.baner...@gmail.com wrote:
  Respected Dr Bengtsson,
  My code and outputs are as follows:
  source("http://aroma-project.org/hbLite.R");
  hbInstall("aroma.affymetrix")
  source("http://aroma-project.org/hbLite.R");
  hbInstall("aroma.cn")
  verbose <- Arguments$getVerbose(-10, timestamp=TRUE);
  dataSet <- "Exon Data"
  chipType <- "MoEx-1_0-st-v1"
  cdf <-
  AffymetrixCdfFile$byChipType(chipType,tags="coreR1,A20080718,MR")
  print(cdf)
  AffymetrixCdfFile:
  Path: annotationData/chipTypes/MoEx-1_0-st-v1
  Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf
  Filesize: 30.53MB
  Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
  RAM: 0.00MB
  File format: v4 (binary; XDA)
  Dimension: 2560x2560
  Number of cells: 6553600
  Number of units: 17831
  Cells per unit: 367.54
  Number of QC units: 1
  csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType)
  print(csR)
  AffymetrixCelSet:
  Name: Exon Data
  Tags:
  Path: rawData/Exon Data/MoEx-1_0-st-v1
  Platform: Affymetrix
  Chip type: MoEx-1_0-st-v1
  Number of arrays: 7
  Names: DK Litter D15 P1_(MoEx-1_0-st-v1), DK Litter D15
  P14_(MoEx-1_0-st-v1), ..., DK Litter D15 P6_(MoEx-1_0-st-v1)
  Time period: 2009-06-18 13:22:04 -- 2009-06-30 15:13:54
  Total file size: 440.55MB
  RAM: 0.01MB
  cdf <- getCdf(csR)
  cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE))
  setCdf(csR, cdfS)
  bc <- GcRmaBackgroundCorrection(csR, type="affinities")
  print(bc)
  GcRmaBackgroundCorrection:
  Data set: Exon Data
  Input tags:
  User tags: *
  Asterisk ('*') tags: GRBC
  Output tags: GRBC
  Number of files: 7 (440.55MB)
  Platform: Affymetrix
  Chip type: MoEx-1_0-st-v1

Re: [aroma.affymetrix] Parameters Sent To DNAcopy Functions

2010-09-29 Thread Henrik Bengtsson
Hi.

On Wed, Sep 29, 2010 at 8:00 PM, Dario Strbenac
d.strbe...@garvan.org.au wrote:
 Hello,

 I remember reading a while ago that you can pass in additional parameters to 
 CbsModel, and they will get passed onto DNAcopy functions. However, it 
 doesn't seem to be working for me. I don't want any segments less than 5 
 probes wide. However, the 8th segment is only 2 wide.

 model <- CbsModel(extract(normalisedCels, 1), extract(normalisedCels, 2), 
 min.width = 5)
 fit(model, force = TRUE)

Your expectation that you should specify the extra parameters in the
setup of the CbsModel object follows the overall style of the aroma
framework.  In this particular case, however, we haven't implemented
passing parameters that way.  A workaround is to do it via
the fit() call instead.  In your case, you would do:

model <- CbsModel(extract(normalisedCels, 1), extract(normalisedCels, 2));
fit(model, min.width=5, force=TRUE);
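
Why the workaround works can be illustrated with a toy example of how R forwards extra arguments through '...'.  Nothing below is aroma.* or DNAcopy code; toySegment() and toyFit() are invented stand-ins for the segmentation function and for fit().

```r
# Invented stand-in for the low-level segmentation function.
toySegment <- function(y, min.width=2, ...) {
  list(n=length(y), min.width=min.width);
}

# Invented stand-in for fit(): it does not know about min.width itself,
# but passes anything in '...' on to the low-level function.
toyFit <- function(y, ..., force=FALSE) {
  toySegment(y, ...);
}

resDefault <- toyFit(rnorm(100));
resCustom <- toyFit(rnorm(100), min.width=5, force=TRUE);
print(resDefault$min.width);
print(resCustom$min.width);
```

An argument given where the object is constructed never reaches the low-level function unless the constructor stores and forwards it, which is why it has to go in the fit() call for now.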

Hope this helps

Henrik

 There were 50 or more warnings (use warnings() to see the first 50)
 foldChangeTable - getRegions(model)[[1]]
 foldChangeTable[1:10,]
    chromosome     start      stop    mean count url
 1           1     51599  14941584  0.1975  7929 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A0-16430583
 2           1  14944039  16878322 -0.3535  1163 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A14750611-17071750
 3           1  16878364  17215511 -0.0717   163 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A16844649-17249226
 4           1  17217671  26830062 -0.3424  6070 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A16256432-27791301
 5           1  26830481  72541505  0.1856 28889 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A22259379-77112607
 6           1  72541525  72583737  2.0660    45 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A72537304-72587958
 7           1  72584492 101046159  0.1900 18772 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A69738325-103892326
 8           1 101046857 101047369 -2.8866     2 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A101046806-101047420
 9           1 101047606 150822152  0.1841 15874 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A96070151-155799607
 10          1 150822331 150852863  2.6550    31 http://genome.ucsc.edu/cgi-bin/hgTracks?clade=vertebrate&org=Human&db=hg18&position=chr1%3A150819278-150855916

 I don't think the warnings are related to my question, but here they are, 
 anyway :

 warnings()
 Warning messages:
 1: In log(M, base = 2) : NaNs produced
 2: In log(A, base = 2) : NaNs produced
 3: In DNAcopy::CNA(genomdat = data$y, chrom = data$chromosome,  ... :
  array has repeated maploc positions

 4: In log(M, base = 2) : NaNs produced
 5: In log(A, base = 2) : NaNs produced
 6: In DNAcopy::CNA(genomdat = data$y, chrom = data$chromosome,  ... :
  array has repeated maploc positions
 ...                ...                ...

 I'm using aroma.affymetrix 1.7.0 on R 2.12.0 alpha.

 --
 Dario Strbenac
 Research Assistant
 Cancer Epigenetics
 Garvan Institute of Medical Research
 Darlinghurst NSW 2010
 Australia



Re: [aroma.affymetrix] unusual copy number analysis result - split copy numbers

2010-09-28 Thread Henrik Bengtsson
Hi.

On Tue, Sep 28, 2010 at 11:54 AM, Patrick Danaher
patrickjdana...@gmail.com wrote:
 Hi Henrik,
 Thanks for your response.  The thread you suggested ( http://goo.gl/FGVe )
 describes my problem well - I'm getting a very similar intensity profile for
 some chromosomes in some samples.  The attached png shows the problem (red
 dots are intensities; the black dots are from a copy number calling problem
 and can be ignored).  The second figure plots the called intensities against
 normal reference intensities for the same loci.
 As for your specific questions: I've never used CRMAv2 before, my dataset
 isn't public, and it's an affy 6.0 chip.

What's your chip type?  The other thread reported problems on
Mapping250K_Sty though labelled as "Sty 2" and I don't know what the
"2" means there.

 Do you think annotation issues would cause this problem only in a small
 subset of my samples?

When I say "annotation issues", I really mean that if the CDF for the
chip type is not the correct one, you might pick up the wrong probe
signals, especially for SNPs, e.g. PM_A may get the value of a total
CN probe once in a while, say.  It could be a software/annotation bug
in the Affymetrix DAT to CEL file conversion and so on.  That's why it
is crucial to know more about the chip used.

I also recommend that you try dChip and/or Affymetrix GTC, if possible.

/Henrik

 Thanks,
 Patrick
 On Sun, Sep 26, 2010 at 1:36 PM, Henrik Bengtsson h...@aroma-project.org
 wrote:

 Hi.

 On Mon, Sep 13, 2010 at 4:19 PM, Patrick patrickjdana...@gmail.com
 wrote:
  Hi everyone,
  I'm using AROMA's implementation of the CRMA v2 method to get copy
  number estimates for cancer samples, and I'm getting a very unusual
  result.  Many of the samples have a chromosome where AROMA has called
  primarily copy number gains or losses, and the losses are mixed in
  with each other.  That is, if you plot the probes' intensities by
  their positions on the chromosome, you see large stretches (~10,000
  probes) where there are no intensities in the normal range
  (corresponding to no gain or loss), and there are intensities both
  above and below the normal range, mixed in with each other along the
  chromosome.  It is as if the plots for a chromosome with a long
  deletion and a chromosome with a long addition were laid atop each
  other.

 It is not 100% clear from your description what you are observing.
 Note that it is possible to attach PNGs to messages sent to this
 mailing list as long as you send it as an email (not via the web
 interface).  What chip type are you working on and do you look at a
 public data set?  Have you used CRMAv2 on other data sets without a
 problem?

 FYI, Johan Staaf reported odd looking copy number results that are
 reproducible and very odd. See thread 'Problems with Affymetrix 250K
 Sty2 arrays after CRMAv2 analysis' on June 23-August 5, 2010, cf.
 http://goo.gl/FGVe.  From the discussion in that thread, it seems to
 have something to do with annotation issues, but it is still to be
 solved.  Is that what you are experiencing?

  It seems implausible that a cancer sample would have copy number gains
  and losses mixed in with each other in such small intervals, over such
  large stretches of chromosome, without any loci having the usual 2
  copies, so I suspect the normalization or the affy array is the source
  of this phenomenon.  I looked at the data without using AROMA, and the
  phenomenon was not evident.  I re-normalized the data 3 times, each
  time using only one step of the AROMA normalization in isolation.  The
  base position normalization step produced the phenomenon, and the
  allele crosstalk calibration and the fragment length normalization
  steps did not.

 What would help troubleshooting is if you could see whether other software
 such as dChip or Affymetrix GTC produces the same oddities.  If they
 do, we know for sure it's something odd with the annotation.

 /Henrik

  Any thoughts on what I'm seeing and on how the base pair normalization
  could cause it would be very appreciated.
  Thanks,
  Patrick
 

Re: [aroma.affymetrix] unusual copy number analysis result - split copy numbers

2010-09-28 Thread Henrik Bengtsson
On Tue, Sep 28, 2010 at 12:03 PM, hb h...@biostat.ucsf.edu wrote:
 Hi.

 On Tue, Sep 28, 2010 at 11:54 AM, Patrick Danaher patrickjdana...@gmail.com 
 wrote:
 Hi Henrik,
 Thanks for your response.  The thread you suggested ( http://goo.gl/FGVe )
 describes my problem well - I'm getting a very similar intensity profile for
 some chromosomes in some samples.  The attached png shows the problem (red
 dots are intensities; the black dots are from a copy number calling program
 and can be ignored).  The second figure plots the called intensities against
 normal reference intensities for the same loci.
 As for your specific questions: I've never used CRMAv2 before, my dataset
 isn't public, and it's an affy 6.0 chip.

 What's your chip type? The other thread reported problems on Mapping250K_Sty,
 though labelled as "Sty 2", and I don't know what the "2" means there.

Whoops, I misread your "...and it's an affy 6.0 chip" note as saying it
was *not* a GenomeWideSNP_6 chip.  So, it is GenomeWideSNP_6.


 Do you think annotation issues would cause this problem only in a small
 subset of my samples?

 When I say "annotation issues", I really mean that if the CDF for the chip type
 is not the correct one, you might pick up the wrong probe signals, especially
 for SNPs, e.g. PM_A may occasionally get the value of a total CN probe.
 It could also be a software/annotation bug in the Affymetrix DAT-to-CEL file
 conversion, and so on. That's why it is crucial to know more about the chip
 used.

 I also recommend that you try dChip and/or Affymetrix GTC, if possible.

Since it is GenomeWideSNP_6, you should be able to try it on Affymetrix GTC.

/Henrik


 /Henrik

 Thanks,
 Patrick
 On Sun, Sep 26, 2010 at 1:36 PM, Henrik Bengtsson h...@aroma-project.org
 wrote:

 Hi.

 On Mon, Sep 13, 2010 at 4:19 PM, Patrick patrickjdana...@gmail.com
 wrote:
  Hi everyone,
  I'm using AROMA's implementation of the CRMA v2 method to get copy
  number estimates for cancer samples, and I'm getting a very unusual
  result.  Many of the samples have a chromosome where AROMA has called
  primarily copy number gains or losses, and the losses are mixed in
  with each other.  That is, if you plot the probes' intensities by
  their positions on the chromosome, you see large stretches (~10,000
  probes) where there are no intensities in the normal range
  (corresponding to no gain or loss), and there are intensities both
  above and below the normal range, mixed in with each other along the
  chromosome.  It is as if the plots for a chromosome with a long
  deletion and a chromosome with a long addition were laid atop each
  other.

 It is not 100% clear from your description what you are observing.
 Note that it is possible to attach PNGs to messages sent to this
 mailing list as long as you send it as an email (not via the web
 interface).  What chip type are you working on and do you look at a
 public data set?  Have you used CRMAv2 on other data sets without a
 problem?

 FYI, Johan Staaf reported odd looking copy number results that are
 reproducible and very odd. See thread 'Problems with Affymetrix 250K
 Sty2 arrays after CRMAv2 analysis' on June 23-August 5, 2010, cf.
 http://goo.gl/FGVe.  From the discussion in that thread, it seems to
 have something to do with annotation issues, but it is still to be
 solved.  Is that what you are experiencing?

  It seems implausible that a cancer sample would have copy number gains
  and losses mixed in with each other in such small intervals, over such
  large stretches of chromosome, without any loci having the usual 2
  copies, so I suspect the normalization or the affy array is the source
  of this phenomenon.  I looked at the data without using AROMA, and the
  phenomenon was not evident.  I re-normalized the data 3 times, each
  time using only one step of the AROMA normalization in isolation.  The
  base position normalization step produced the phenomenon, and the
  allele crosstalk calibration and the fragment length normalization
  steps did not.

 What would help troubleshooting is if you could see whether other software
 such as dChip or Affymetrix GTC produces the same oddities.  If they
 do, we know for sure it's something odd with the annotation.

 /Henrik

  Any thoughts on what I'm seeing and on how the base pair normalization
  could cause it would be very appreciated.
  Thanks,
  Patrick
 

Re: [aroma.affymetrix] CbsModel parameters

2010-09-27 Thread Henrik Bengtsson
Hi.

On Mon, Sep 27, 2010 at 4:51 PM, Kai wangz...@gmail.com wrote:
 Hi Henrik,

 I was wondering whether there is a way I can fine tune the behavior of
 CbsModel. Sometimes the default algorithm produces too many small
 fragments right next to each other without much separation in mean
 copy numbers. Is there a way to control how smooth the segmentation
 results are?

Any additional arguments (in '...') that you pass to process(cbs, ...)
will be passed down to DNAcopy::segment(), which is the function
doing the actual segmentation.  For more details on how to fine-tune
the CBS algorithm, see help("segment", package="DNAcopy").  You may
also want to contact the authors of that method/package.
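
To make the "passed down" part concrete, here is a minimal base-R sketch of
how arguments travel through '...'.  It does not use aroma.affymetrix or
DNAcopy: segmentStub() and fitStub() are made-up stand-ins, and the default
values in the stub are placeholders.  Only the argument names (min.width,
undo.splits, undo.SD) are borrowed from DNAcopy::segment().

```r
# Stand-in for the segmentation routine; NOT the real DNAcopy::segment().
# Argument names mirror ones that segment() accepts; defaults are placeholders.
segmentStub <- function(x, min.width = 2, undo.splits = "none", undo.SD = 3) {
  list(n = length(x), min.width = min.width,
       undo.splits = undo.splits, undo.SD = undo.SD)
}

# Hypothetical wrapper: everything in '...' is handed down untouched.
fitStub <- function(x, ...) {
  segmentStub(x, ...)
}

# The tuning arguments are given to the wrapper but end up in the stub.
res <- fitStub(rnorm(100), min.width = 5, undo.splits = "sdundo", undo.SD = 1)
str(res)
```

The point is only that the outer function never needs to know the tuning
arguments by name; whatever it does not use itself reaches the segmentation
function as is.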

/Henrik


 Thanks a lot!

 Best,
 Kai



Re: [aroma.affymetrix] Help needed regarding GCRMA normalization of exon arrays using aroma.affymetrix.....

2010-09-27 Thread Henrik Bengtsson
)
   at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ..., ver
   at computeAffinities(cdf, paths = probePath, ..., verbose = less(verbos
   at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/Exon Data,GRBC/
   at bgAdjustGcrma(NA, path = probeData/Exon Data,GRBC/MoEx-1_0-st-v1,
   at do.call(bgAdjustGcrma, args = args)
   at process.GcRmaBackgroundCorrection(
 20100927 16:33:30|     Locating probe-tab file...done
 20100927 16:33:30|    Retrieving probe-sequence data...done
 20100927 16:33:30|   Reading probe-sequence data...done
 20100927 16:33:30|  Computing GCRMA probe affinities for 1257006
 units...done
 20100927 16:33:30| Computing probe affinities...done
 20100927 16:33:30|Background correcting data set...done
 traceback()
 17: throw.Exception(Exception(...))
 16: throw(Exception(...))
 15: throw.default(Found probe-tab file only by means of deprectated (v1)
 search rules: ,
         pathname)
 14: throw(Found probe-tab file only by means of deprectated (v1) search
 rules: ,
         pathname)
 13: method(static, ...)
 12: AffymetrixProbeTabFile$findByChipType(chipType, what = what,
         ...)
 11: method(static, ...)
 10: AffymetrixProbeTabFile$byChipType(chipType = chipType, verbose =
 less(verbose,
         100))
 9: getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose =
 verbose)
 8: getProbeSequenceData(this, safe = safe, verbose = verbose)
 7: computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
        verbose = less(verbose))
 6: computeAffinities(cdf, paths = probePath, ..., verbose = less(verbose))
 5: bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/Exon
 Data,GRBC/MoEx-1_0-st-v1,
        verbose = TRUE, overwrite = FALSE, subsetToUpdate = NULL,
        typesToUpdate = pm, indicesNegativeControl = NULL, affinities =
 NULL,
        type = affinities, opticalAdjust = TRUE, gsbAdjust = TRUE,
        gsbParameters = NULL, .deprecated = FALSE)
 4: bgAdjustGcrma(NA, path = probeData/Exon Data,GRBC/MoEx-1_0-st-v1,
        verbose = TRUE, overwrite = FALSE, subsetToUpdate = NULL,
        typesToUpdate = pm, indicesNegativeControl = NULL, affinities =
 NULL,
        type = affinities, opticalAdjust = TRUE, gsbAdjust = TRUE,
        gsbParameters = NULL, .deprecated = FALSE)
 3: do.call(bgAdjustGcrma, args = args)
 2: process.GcRmaBackgroundCorrection(bc, verbose = verbose)
 1: process(bc, verbose = verbose)
 sessionInfo()
 R version 2.11.1 (2010-05-31)
 x86_64-apple-darwin9.8.0
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base
 other attached packages:
  [1] aroma.affymetrix_1.7.0 aroma.apd_0.1.7        affxparser_1.20.0
  [4] R.huge_0.2.0           aroma.core_1.7.0       aroma.light_1.16.1
  [7] matrixStats_0.2.1      R.rsp_0.4.0            R.filesets_0.9.0
 [10] digest_0.4.2           R.cache_0.3.0          R.utils_1.5.2
 [13] R.oo_1.7.4             R.methodsS3_1.2.1
 loaded via a namespace (and not attached):
 [1] tools_2.11.1
 The working directory is desktop and the path for the cdf file and the raw
 data is as follows:
 /Users/prithish/Desktop/annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf
 ( I have several other cdf files
 like MoEx-1_0-st-v1,extendedR1,A20080718,MR.cdf/MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf
 and MoEx-1_0-st-v1.cdf in the same directory.)
 /Users/prithish/Desktop/rawData/Exon Data/MoEx-1_0-st-v1/
 DK Litter D15 P1_(MoEx-1_0-st-v1).CEL
 DK Litter D15 P2_(MoEx-1_0-st-v1).CEL
 DK Litter D15 P3 #2_(MoEx-1_0-st-v1).CEL
 DK Litter D15 P3_(MoEx-1_0-st-v1).CEL
 DK Litter D15 P6_(MoEx-1_0-st-v1).CEL
 DK Litter D15 P14_(MoEx-1_0-st-v1).CEL
 Moreover I am following the thread and implementing the code you suggested
 there but it is not working with my dataset. Please help.
 Thank you,
 Prithish Banerjee,
 Graduate Research Assistant,
 Department of Statistics,
 West Virginia University.

 On Sun, Sep 26, 2010 at 4:23 PM, Henrik Bengtsson h...@aroma-project.org
 wrote:

 Hi,

 first of all, for this chip type you need to specify:

 bc <- GcRmaBackgroundCorrection(csR, type="affinities");

 Moreover, you cannot use the custom CDF in the
 GcRmaBackgroundCorrection step, and you have to use the following
 workaround, illustrated in the example below:


 library(aroma.affymetrix);
 verbose <- Arguments$getVerbose(-10, timestamp=TRUE);
 dataSet <- "Affymetrix-Tissues";
 chipType <- "MoEx-1_0-st-v1";

 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 # Setup data set
 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR");
 print(cdf);
 csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType);
 print(csR);
 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 # gcRMA-style background correction
 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 # Currently, you must use the standard CDF file.
 cdf <- getCdf(csR

Re: [aroma.affymetrix] Multiple NUSE and RLE Plots?

2010-09-26 Thread Henrik Bengtsson
Hi,

On Fri, Sep 24, 2010 at 1:27 PM, Vonn vwal...@email.unc.edu wrote:
 Hi All,

 I'm using aroma to analyze CEL files from 141 SNP 6.0 arrays.  I fit
 the quality assessment model as follows:

 plm = RmaPlm(csR)
 fit(plm, verbose = log)
 qam = QualityAssessmentModel(plm)

 Then I'd like to produce NUSE and RLE plots for 10 arrays at a time.
 Can someone please tell me how to do this?

plotNuse() and plotRle() for QualityAssessmentModel take the argument
'arrays', e.g.

plotNuse(qam, arrays=1:10);
plotNuse(qam, arrays=11:20);
...

Note that the NUSE and RLE estimates are, as wanted, calculated using
the complete data set, that is, the 'arrays' argument is only applied
to the plotting part.
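
If you do not want to type the index ranges by hand, the batches can be
generated.  The following is a small base-R sketch; the array count of 141
is taken from the question, while the variable names and the chunk size are
just choices for illustration.  The plotting calls are left as comments
because they need the fitted 'qam' object.

```r
nbrOfArrays <- 141   # number of arrays mentioned in the question
chunkSize <- 10      # arrays per plot

# Split 1, 2, ..., 141 into consecutive groups of (at most) 10 indices
idxs <- seq_len(nbrOfArrays)
chunks <- split(idxs, ceiling(idxs / chunkSize))

for (kk in seq_along(chunks)) {
  arrays <- chunks[[kk]]
  cat(sprintf("Chunk %d: arrays %d-%d\n", kk, min(arrays), max(arrays)))
  # With the fitted model one would plot here, e.g.:
  # plotNuse(qam, arrays=arrays);
  # plotRle(qam, arrays=arrays);
}
```

The last chunk simply holds whatever is left over (here a single array).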

/Henrik


 Thanks in advance for your response,

 Vonn



Re: [aroma.affymetrix] Help needed regarding GCRMA normalization of exon arrays using aroma.affymetrix.....

2010-09-26 Thread Henrik Bengtsson
Hi,

first of all, for this chip type you need to specify:

bc <- GcRmaBackgroundCorrection(csR, type="affinities");

Moreover, you cannot use the custom CDF in the
GcRmaBackgroundCorrection step, and you have to use the following
workaround, illustrated in the example below:


library(aroma.affymetrix);
verbose <- Arguments$getVerbose(-10, timestamp=TRUE);
dataSet <- "Affymetrix-Tissues";
chipType <- "MoEx-1_0-st-v1";

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Setup data set
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR");
print(cdf);
csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType);
print(csR);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# gcRMA-style background correction
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Currently, you must use the standard CDF file.
cdf <- getCdf(csR);
cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE));
setCdf(csR, cdfS);
bc <- GcRmaBackgroundCorrection(csR, type="affinities");
print(bc);
csB <- process(bc, verbose=verbose);
print(csB);
# Now, use the custom CDF in what follows
setCdf(csB, cdf);
print(csB);

Yes, those last steps are rather confusing - we're working on updating
the code so you don't have to do that yourself.

FYI, the above solution/workaround was worked out in the thread 'GCRMA
normalization with MoEx-1_0-st-v1' of March 24-April 8, 2010, cf.
http://goo.gl/cniq.

Hope this helps

/Henrik

On Fri, Sep 24, 2010 at 2:30 PM, Prithish Banerjee
prithish.baner...@gmail.com wrote:
 Hi All,
 I am trying to normalize a mouse exon array dataset using GCRMA
 normalization technique. I have exactly followed all the necessary steps for
 storing the dataset and the cdf file. the code and the output I am using are
 as follows:

 source("http://aroma-project.org/hbLite.R");

 hbInstall("aroma.affymetrix")

 source("http://aroma-project.org/hbLite.R");

 hbInstall("aroma.cn")

 verbose <- Arguments$getVerbose(-10, timestamp=TRUE);

 dataSet <- "Exon Data" [the path in the working directory is rawData/Exon
 Data/MoEx-1_0-st-v1/*.CEL files]

 chipType <- "MoEx-1_0-st-v1" [the path in the working directory is
 annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf]

 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR")
 [converted to binary using convertCdf command]

 print(cdf)

 AffymetrixCdfFile:

 Path: annotationData/chipTypes/MoEx-1_0-st-v1

 Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf

 Filesize: 30.53MB

 Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR

 RAM: 0.00MB

 File format: v4 (binary; XDA)

 Dimension: 2560x2560

 Number of cells: 6553600

 Number of units: 17831

 Cells per unit: 367.54

 Number of QC units: 1

 csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType)

 print(csR)

 AffymetrixCelSet:

 Name: Exon Data

 Tags:

 Path: rawData/Exon Data/MoEx-1_0-st-v1

 Platform: Affymetrix

 Chip type: MoEx-1_0-st-v1

 Number of arrays: 7

 Names: DK Litter D15 P1_(MoEx-1_0-st-v1), DK Litter D15
 P14_(MoEx-1_0-st-v1), ..., DK Litter D15 P6_(MoEx-1_0-st-v1)

 Time period: 2009-06-18 13:22:04 -- 2009-06-30 15:13:54

 Total file size: 440.55MB

 RAM: 0.01MB

 cdf <- getCdf(csR)

 cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE))

 setCdf(csR, cdfS)

 bc <- GcRmaBackgroundCorrection(csR, type="affinities")

 print(bc)

 GcRmaBackgroundCorrection:

 Data set: Exon Data

 Input tags:

 User tags: *

 Asterisk ('*') tags: GRBC

 Output tags: GRBC

 Number of files: 7 (440.55MB)

 Platform: Affymetrix

 Chip type: MoEx-1_0-st-v1

 Algorithm parameters: (subsetToUpdate: NULL, typesToUpdate: chr pm,
 indicesNegativeControl: NULL, affinities: NULL, type: chr affinities,
 opticalAdjust: logi TRUE, gsbAdjust: logi TRUE, gsbParameters: NULL)

 Output path: probeData/Exon Data,GRBC/MoEx-1_0-st-v1

 Is done: FALSE

 RAM: 0.00MB

 csB <- process(bc, verbose=verbose)

 20100923 13:24:12|Background correcting data set...

 20100923 13:24:12| Computing probe affinities...

 20100923 13:24:12|  Computing GCRMA probe affinities for 1257006 units...

 20100923 13:24:12|   Identify PMs and MMs among the CDF cell indices...

      logi [1:5266159] TRUE TRUE TRUE TRUE TRUE TRUE ...

        Mode   FALSE    TRUE    NA's

     logical  334476 4931683       0

 20100923 13:25:57|    MMs are defined as non-PMs

 20100923 13:25:57|    Number of PMs: 4931683

 20100923 13:25:57|    Number of MMs: 334476

 20100923 13:25:57|   Identify PMs and MMs among the CDF cell indices...done

 20100923 13:25:57|   Reading probe-sequence data...

 20100923 13:25:57|    Retrieving probe-sequence data...

 20100923 13:25:57|     Chip type (full): MoEx-1_0-st-v1

 20100923 13:25:57|     Locating probe-tab file...

 20100923 13:25:57|      Chip type: MoEx-1_0-st-v1

 Error in list(`process(bc, verbose = verbose)` = environment,
 `process.GcRmaBackgroundCorrection(bc, verbose = verbose)` = environment,
  :



 [2010-09-23 

Re: [aroma.affymetrix] unusual copy number analysis result - split copy numbers

2010-09-26 Thread Henrik Bengtsson
Hi.

On Mon, Sep 13, 2010 at 4:19 PM, Patrick patrickjdana...@gmail.com wrote:
 Hi everyone,
 I'm using AROMA's implementation of the CRMA v2 method to get copy
 number estimates for cancer samples, and I'm getting a very unusual
 result.  Many of the samples have a chromosome where AROMA has called
 primarily copy number gains or losses, and the gains and losses are mixed in
 with each other.  That is, if you plot the probes' intensities by
 their positions on the chromosome, you see large stretches (~10,000
 probes) where there are no intensities in the normal range
 (corresponding to no gain or loss), and there are intensities both
 above and below the normal range, mixed in with each other along the
 chromosome.  It is as if the plots for a chromosome with a long
 deletion and a chromosome with a long addition were laid atop each
 other.

It is not 100% clear from your description what you are observing.
Note that it is possible to attach PNGs to messages sent to this
mailing list as long as you send them by email (not via the web
interface).  What chip type are you working on, and are you looking at a
public data set?  Have you used CRMAv2 on other data sets without a
problem?

FYI, Johan Staaf reported odd-looking copy number results that are
reproducible. See thread 'Problems with Affymetrix 250K
Sty2 arrays after CRMAv2 analysis' on June 23-August 5, 2010, cf.
http://goo.gl/FGVe.  From the discussion in that thread, it seems to
have something to do with annotation issues, but it is still to be
solved.  Is that what you are experiencing?

 It seems implausible that a cancer sample would have copy number gains
 and losses mixed in with each other in such small intervals, over such
 large stretches of chromosome, without any loci having the usual 2
 copies, so I suspect the normalization or the affy array is the source
 of this phenomenon.  I looked at the data without using AROMA, and the
 phenomenon was not evident.  I re-normalized the data 3 times, each
 time using only one step of the AROMA normalization in isolation.  The
 base position normalization step produced the phenomenon, and the
 allele crosstalk calibration and the fragment length normalization
 steps did not.

What would help troubleshooting is if you could see whether other software
such as dChip or Affymetrix GTC produces the same oddities.  If they
do, we know for sure it's something odd with the annotation.

/Henrik

 Any thoughts on what I'm seeing and on how the base pair normalization
 could cause it would be very appreciated.
 Thanks,
 Patrick



[aroma.affymetrix] Re: Alternatives way to access the mailing list archive

2010-09-26 Thread Henrik Bengtsson
Hi,

as a follow-up on this one: does anyone know of alternative websites
that archive our mailing list?  We currently have:

http://groups.google.com/group/aroma-affymetrix/topics/
http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/maillist.html

More specifically, I'm looking for alternatives that are accessible
from within China, and the above seem not to be.  It would be great to
solve this, because the archive is a very useful resource.

Thanks

Henrik

On Tue, Sep 21, 2010 at 9:32 PM, Henrik Bengtsson h...@aroma-project.org 
wrote:
 Hi,

 it has been brought to my attention that the Google Group site, which
 provides our mailing list and its archive:

  http://groups.google.com/group/aroma-affymetrix/topics/

 is not accessible from/blocked by certain countries.  Luckily there
 are some alternatives by other services providing archives of the
 mailing list, such as:

  http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/maillist.html

 I have added a link to the latter on http://aroma-project.org/forum/

 /Henrik




Re: [aroma.affymetrix] Re: CbsModel

2010-09-26 Thread Henrik Bengtsson
Hi.

On Wed, Sep 22, 2010 at 9:35 AM, Kai wangz...@gmail.com wrote:
 Hi Henrik,

 Thank you for letting me know the hidden trick. It seems to work. I
 have another related question regarding the CbsModel function.

 You've mentioned in addressing another post that for non-affy
 platforms, once one has setup a platform-independent data set (*.asb
 files) as in Vignette: Creating binary data files containing copy
 number estimates, e.g.

 ds <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
 chipType="*");

 One can then pass this to CbsModel just as one passes an
 CnChipEffectSet 'ces' in other vignettes for affymetrix genotyping
 platforms. However, it seems to me that what are stored in the
 CnChipEffectSet are raw CN estimates, whereas the
 AromaUnitTotalCnBinaryFile objects contain log2 CN ratios.

 If these are correct, my question is that how CbsModel can tell
 whether the input data are in log2-scale or not, or whether the input
 data are ratios or not? Thank you very much for your help on this.

It does this by looking for special tags in the name of the *.asb file.
More precisely, if the filename has a 'log2ratio' tag, then its content is
assumed to be log2 ratios.  Likewise, if there is a 'log10ratio' tag, its
content is assumed to be log10 ratios.  For historical reasons, a
'logRatio' tag is interpreted as 'log10ratio'.  If none of these tags
exist, the content is assumed to be on the non-logarithmic scale.  I
recommend using non-logged storage, because that is well defined also
for non-positive values.

DETAILS: The above is taken care of by the AromaUnitTotalCnBinaryFile
class, more precisely by its internal/private getAM() method.
CbsModel and similar models do not know about this layer and happily
receive log2 ratios regardless of what is stored on file.
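
As a rough sketch of the tag rule above (this is not the actual aroma.core
code; guessCnScale() is a made-up helper, and the details of the matching,
such as case sensitivity and the example filenames, are assumptions):

```r
# Hypothetical helper: infer the storage scale from the tags of a filename.
# A fullname is "<name>,<tag1>,<tag2>,..." followed by the file extension.
guessCnScale <- function(filename) {
  fullname <- sub("[.][^.]*$", "", basename(filename))  # drop the extension
  parts <- strsplit(fullname, ",", fixed = TRUE)[[1]]
  tags <- parts[-1]                                     # first part is the name
  if ("log2ratio" %in% tags) {
    "log2"
  } else if (any(c("log10ratio", "logRatio") %in% tags)) {
    "log10"     # 'logRatio' is treated as 'log10ratio'
  } else {
    "non-log"   # no scale tag: assume non-logarithmic scale
  }
}

print(guessCnScale("sample1,log2ratio,total.asb"))
print(guessCnScale("sample1,logRatio,total.asb"))
print(guessCnScale("sample1,total.asb"))
```

So the practical consequence is that the tags you put in the filename when
creating the *.asb files decide how the stored values are interpreted.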

Hope this helps

/Henrik


 Best,
 Kai



 On Sep 20, 1:06 pm, Henrik Bengtsson h...@aroma-project.org wrote:
 Hi Kai.

 I am aware of the issue, and it is on the todo list to add an argument
 for specifying that you don't want ratios to be calculated.  There is
 currently a "secret" workaround for this that should not be read as an
 official documented feature [that's a warning for users reading this
 thread in the future], but it should solve your immediate needs.

 cbs <- CbsModel(ds);
 cbs$.calculateRatios <- FALSE;

 See if that does it for you.

 /Henrik

 On Wed, Sep 15, 2010 at 10:14 PM, Kai wangz...@gmail.com wrote:
  Dear Henrik,

  I was trying to run CBS model on a set of paired CN estimates. The
  data were generated using an Illumina platform, so I have followed
  Vignette: Creating binary data files containing copy number
  estimates to create the log2ratio CN estimates between a tumor sample
  and its matched normal.

  I have loaded the data with the following codes:

  dataSet = "Dataset,tagA,tagB";
  chipType = "HumanOmni1-Quad";
  ds = AromaUnitTotalCnBinarySet$byName(dataSet, chipType=chipType);
  cbs = CbsModel(ds);

  However, when I looked at how the CBS model was set up, it says:

  cbs
  CbsModel:
  Name: Dataset
  Tags: tagA,tagB
  Chip type (virtual): HumanOmni1-Quad
  Path: cbsData/Dataset,tagA,tagB/HumanOmni1-Quad
  Number of chip types: 1
  Sample & reference file pairs:
  Chip type #1 of 1 ('HumanOmni1-Quad'):
  Sample data set:
  AromaUnitTotalCnBinarySet:
  Name: Dataset
  Tags: tagA,tagB
  Full name: Dataset,tagA,tagB
  Number of files: 10
  Names: sample1, sample2, ..., sample10 [10]
  Path (to the first file): rawCnData/Dataset,tagA,tagB/HumanOmni1-Quad
  Total file size: 43.46 MB
  RAM: 0.02MB
  Reference data set/file:
  average across arrays
  RAM: 0.00MB

  It seems to me that the CBS model is using average across arrays
  as reference, which would not be what I want, since my CN estimates
  have already been referenced. So my questions are:

  1. Is this how CBS will behave?
  2. Is there a way to let CBS take the CN estimates as is, without
  contrasting to any reference?

  Thank you very much for your help on this.

  Best,
  Kai


Re: [aroma.affymetrix] Relative Copy Number Analysis

2010-09-24 Thread Henrik Bengtsson
Hi Dario,

Pierre Neuvial has kindly provided a more up-to-date vignette for
doing paired total copy number analysis.  You find it at:

  http://aroma-project.org/vignettes/pairedTotalCopyNumberAnalysis

See if that helps

/Henrik

On Wed, Sep 22, 2010 at 5:05 PM, Dario Strbenac
d.strbe...@garvan.org.au wrote:
 Hello,

 I see the vignette for absolute copy number analysis, where you compare to a 
 HapMap sample pool, but I'm not sure how to do a control / treatment 
 analysis. I have 1 Affymetrix SNP6 .CEL of a cancer sample and 1 of a normal 
 sample. The documentation is brief or non-existent for most of the functions 
 that appear in the total copy number vignette. Can anyone share a workflow 
 for a relative analysis ?

 --
 Dario Strbenac
 Research Assistant
 Cancer Epigenetics
 Garvan Institute of Medical Research
 Darlinghurst NSW 2010
 Australia


Re: [aroma.affymetrix] CbsModel

2010-09-20 Thread Henrik Bengtsson
Hi Kai.

I am aware of the issue, and it is on the todo list to add an argument
to specify that you don't want ratios to be calculated.  There is
currently a secret workaround for this that should not be read as an
official documented feature [that's a warning for users reading this
thread in the future], but it should solve your immediate needs.

cbs <- CbsModel(ds);
cbs$.calculateRatios <- FALSE;

See if that does it for you.

/Henrik


On Wed, Sep 15, 2010 at 10:14 PM, Kai wangz...@gmail.com wrote:
 Dear Henrik,

 I was trying to run the CBS model on a set of paired CN estimates. The
 data were generated using an Illumina platform, so I have followed
 the vignette 'Creating binary data files containing copy number
 estimates' to create the log2ratio CN estimates between a tumor sample
 and its matched normal.

 I have loaded the data with the following code:

 dataSet = "Dataset,tagA,tagB";
 chipType = "HumanOmni1-Quad";
 ds = AromaUnitTotalCnBinarySet$byName(dataSet, chipType=chipType);
 cbs = CbsModel(ds);

 However, when I looked at how the CBS model was set up, it says:

 cbs
 CbsModel:
 Name: Dataset
 Tags: tagA,tagB
 Chip type (virtual): HumanOmni1-Quad
 Path: cbsData/Dataset,tagA,tagB/HumanOmni1-Quad
 Number of chip types: 1
 Sample & reference file pairs:
 Chip type #1 of 1 ('HumanOmni1-Quad'):
 Sample data set:
 AromaUnitTotalCnBinarySet:
 Name: Dataset
 Tags: tagA,tagB
 Full name: Dataset,tagA,tagB
 Number of files: 10
 Names: sample1, sample2, ..., sample10 [10]
 Path (to the first file): rawCnData/Dataset,tagA,tagB/HumanOmni1-Quad
 Total file size: 43.46 MB
 RAM: 0.02MB
 Reference data set/file:
 average across arrays
 RAM: 0.00MB

 It seems to me that the CBS model is using 'average across arrays'
 as reference, which would not be what I want, since my CN estimates
 have already been referenced. So my questions are:

 1. Is this how CBS will behave?
 2. Is there a way to let CBS take the CN estimates as is, without
 contrasting to any reference?

 Thank you very much for your help on this.

 Best,
 Kai


Re: [aroma.affymetrix] How long should it take to run CRMAv2 on 270 samples for Affymetrix SNP 6.0 arrays

2010-09-18 Thread Henrik Bengtsson
Hi.

On Fri, Sep 17, 2010 at 12:58 PM, Matt matt.kowg...@gmail.com wrote:
 Hi Henrik,

 I am processing the data from the 270 HapMap samples on the SNP 6.0
 arrays using the CRMAv2 method.  I wrote a script to follow the steps,
 minus the plotting, outlined on

 http://www.aroma-project.org/vignettes/CRMAv2

 It has been running for a week now without error. I have checked
 the .Rout file and it is still running, but it says it is doing chunk
 #622 of 1252, so only half-way. I wonder if there is a way I can run
 this faster? Is it possible to break the 270 samples up?

Have a look at the how-to page on 'Improve processing time':

  http://www.aroma-project.org/howtos/ImproveProcessingTime

Obviously it depends on your computer, but on a decent machine I would
expect something like 5-10 mins per array.  If you see more than say
15 min per array, you should definitely look into the above how-to
page.

The CRMAv2 algorithm was designed to be a truly single-array
statistical method, meaning it will give identical results whether you
process each array independently and then merge, or merge and then
process all in one batch.  This is neat, because you can process samples as
they are added to an experiment/project.  Because of this you can also
run CRMAv2 on multiple machines in parallel.  Note that this is not the
case with CRMA (v1) or any other CN preprocessing method out there.
If you wish to take this approach, you might find the doCRMAv2()
function useful.  See page 'Block: doCRMAv2() / doASCRMAv2()':

 http://www.aroma-project.org/blocks/doCRMAv2
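
Since each array can be processed on its own, one way to spread the
work is to split the array indices into chunks and hand each chunk to a
separate R session or machine.  A minimal sketch in base R follows; the
chunk size of 30 and the variable names are illustrative assumptions
only, and how each chunk is then passed on to doCRMAv2() should follow
the page above rather than this sketch.

```r
# Split the indices of 270 arrays into chunks that can be processed
# independently (e.g. one chunk per machine or per R session).
nbrOfArrays <- 270L
chunkSize <- 30L   # illustrative choice, not a recommendation

arrays <- seq_len(nbrOfArrays)
chunks <- split(arrays, ceiling(arrays / chunkSize))

# Each element of 'chunks' holds the array indices for one job.
print(length(chunks))
print(range(chunks[[1]]))
```

With a single-array method the per-chunk results can simply be
combined afterwards, so the chunk size only affects scheduling, not
the estimates.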

Hope this helps

Henrik


 Thanks for your help.

 Matt


Re: [aroma.affymetrix] Re: Custom Canine SNP (DogSty06m520431); problem with chr24-39

2010-09-18 Thread Henrik Bengtsson
Hi,

On Tue, Aug 24, 2010 at 3:01 AM, Denis amer.ak...@rub.de wrote:

 Hi Henrik,

 Sorry for the delay, I had some difficulties in getting GLAD started
 (including gsl ...).
 What else should I say than you're the best, and thank you very much for
 your help. I finally got it. I would like to provide you with a
 finalized version of the Canine,chromosomes.txt including the band
 pattern for your aroma-project.
 If you could give me a hint how to manage this (with respect to what
 information and column headings are necessary) I would start asap and
 attach it hopefully to the next reply :).

I missed that you were asking for more help here.  If you could send
me a tab-delimited text file with two columns 'chromosome' and
'nbrOfBases', where the chromosome column should contain chromosome
indices (integers) and the nbrOfBases column the number of
nucleotides/length of that chromosome (integer), that would be great
and I'll add it to the package.  Here is an example of what it should
look like:

chromosome  nbrOfBases
1   12548891
2   88182672
3   94659212
4   91429679
5   91969480
6   80531300
7   83918830
8   76319553
9   64388646
10  72471775
11  77415697
12  75458181
13  66171193
14  63877078
15  67136955
16  62534543
17  67108084
18  58768331
19  56709702
20  61246656
21  53889065
22  64179564
23  55386667
24  50674002
25  54562819
26  42004589
27  48236582
28  44161646
29  44727140
30  43154895
31  41581023
32  41575543
33  34226766
34  45065038
35  29486428
36  33827475
37  33893929
38  26869798
39  125840674
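
A file in this layout can be written and sanity-checked with base R.
The following is only a sketch: it uses the first three rows of the
example above as stand-in data and writes to a temporary directory, so
the path is an assumption and not where the aroma framework expects
the file.

```r
# Stand-in data: the first three rows of the example table above.
genome <- data.frame(
  chromosome = 1:3,
  nbrOfBases = c(12548891L, 88182672L, 94659212L)
)

# Write a tab-delimited file with a header row and no quotes/row names.
pathname <- file.path(tempdir(), "Canine,chromosomes.txt")
write.table(genome, file = pathname, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back and check the two required columns.
df <- read.table(pathname, header = TRUE, sep = "\t")
stopifnot(all(c("chromosome", "nbrOfBases") %in% colnames(df)))
stopifnot(is.numeric(df$chromosome), is.numeric(df$nbrOfBases))
print(df)
```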

/Henrik


 Cheers

 Denis


Re: [aroma.affymetrix] Problems updating aroma.affymetrix

2010-09-02 Thread Henrik Bengtsson
Hi,

if you are following Pierre's advice and still getting *that* error
message my best guess is that you are getting an error while
installing one of the packages that aroma.affymetrix depends on, which
in turn probably will fail the installation of aroma.affymetrix
itself.  The reason why one of the required packages didn't install
may be that your R version is far too old.  So, update R, then try
again.

More details: The installation code - hbInstall() - assumes that the
installation went well and at the end tries to download and install
patches.  Since it did not install well, you'll get that error on a too
old version.  I've updated that piece of the code not to try to patch
if installation failed.

Hope this helps

Henrik


On Wed, Sep 1, 2010 at 10:03 PM, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Matt.

 Another point to mention is that you should update your version of R ... 
 2.7.x is 2.5 years old, which is a long time in R.  I'd recommend 2.11.x ..

 Cheers,
 Mark

 On 2010-09-02, at 2:51 PM, Pierre Neuvial wrote:

 [Forwarding this to the list so that others can read this thread]

 Pierre

 On Wed, Sep 1, 2010 at 12:57 PM, Matt matt.kowg...@gmail.com wrote:
 Hi Pierre,

 Thanks for the reply. I guess it's a problem with the source because what
 you say is exactly what I did and I get the error message shown.

 Best,
 Matt

 On Wed, Sep 1, 2010 at 3:20 PM, Pierre Neuvial pie...@stat.berkeley.edu
 wrote:

 Hi Matt,

 You want to *update* the package, not *patch* it: the difference
 between updates and patches is explained at
 http://aroma-project.org/howtos/updateOrPatch.

 So, to update aroma.affymetrix, do:

 source("http://aroma-project.org/hbLite.R");
 hbInstall("aroma.affymetrix");

 as explained at http://aroma-project.org/install

 Hope this helps,

 Pierre.

 On Wed, Sep 1, 2010 at 10:08 AM, Matt matt.kowg...@gmail.com wrote:
 Hi there,

 I'd like to update my version of aroma.affymetrix, current version in
 use 0.9.1, so that I can utilize the new CN processing method. I
 followed the instructions on the site but I get the following error
 message

 Patching /home/matthew/.Rpatches/aroma.affymetrix/20080508/
 WeightsFile.R
 Failed to source:
 http://www.braju.com/R//patches/aroma.affymetrix/download.R
 Error in stop(ex$message) :
  Your version (0.9.1) of aroma.affymetrix is out of date. Please
 update.
 In addition: There were 11 warnings (use warnings() to see them)
 sessionInfo()
 R version 2.7.0 Under development (unstable) (2008-01-21 r44087)
 x86_64-unknown-linux-gnu

 locale:

 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

 attached base packages:
 [1] tools     stats     graphics  grDevices datasets  utils
 methods
 [8] base

 other attached packages:
  [1] Biobase_2.0.1          aroma.affymetrix_0.9.1
 aroma.apd_0.1.7
  [4] R.huge_0.2.0           digest_0.4.2
 aroma.light_1.16.1
  [7] affxparser_1.12.2      R.rsp_0.3.6
 R.cache_0.3.0
 [10] R.utils_1.5.0          R.oo_1.7.3             R.methodsS3_1.2.0


 How can I fix this?

 Thanks for any help.
 Matt


 --
 Mark Robinson, PhD (Melb)
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628
 f: +61 (0)3 9347 0852
 --






Re: [aroma.affymetrix] Custom Canine SNP (DogSty06m520431); problem with chr24-39

2010-08-23 Thread Henrik Bengtsson
Hi Denis,

you refer to the thread 'Custom Canine SNP' started on July 18, 2008.
In KD's message on August 14, 2008 you can see how he explicitly set
argument genome="Canine" when he sets up the GLAD model.  From the
verbose output I can see you are using CBS, but it is not clear how
you set it up.  Are you doing:

cbs <- CbsModel(ces, genome="Canine")?

If not, do that.  Then (for troubleshooting purposes only) try

df <- getGenomeData(cbs, verbose=verbose);
print(df);

This latter step will try to load the tab-delimited file containing
the information about the number of bases per chromosome. Since you
specify "Canine" above, it will try to locate and read the file:

annotationData/genomes/Canine/Canine,chromosomes.txt

or any with additional tags, e.g.

annotationData/genomes/Canine/Canine,chromosomes,UGP,HB20100822.txt

It needs to contain (at least) the two columns 'chromosome' and
'nbrOfBases'.  I don't have the exact numbers for the Canine genome,
but see the attached file for an example.  Feel free to forward the
data to me, and I'll add this Canine annotation data so it's built in
to the aroma framework.
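
To check by hand whether such a file is in place, something like the
following base R sketch can be used.  Note that findGenomeFile() is a
hypothetical helper written for this illustration; it only mimics the
file naming described above ("<genome>,chromosomes[,<tags>].txt") and
is not the lookup code used inside the aroma framework.  The demo runs
against a temporary directory rather than a real annotationData/ tree.

```r
# Hypothetical helper: list files named '<genome>,chromosomes[,<tags>].txt'
# under <root>/<genome>/.
findGenomeFile <- function(genome, root = "annotationData/genomes") {
  path <- file.path(root, genome)
  pattern <- sprintf("^%s,chromosomes(,.*)?[.]txt$", genome)
  list.files(path, pattern = pattern, full.names = TRUE)
}

# Demo in a temporary directory with an empty, tagged file.
root <- file.path(tempdir(), "annotationData", "genomes")
dir.create(file.path(root, "Canine"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "Canine", "Canine,chromosomes,UGP,HB20100822.txt"))

found <- findGenomeFile("Canine", root = root)
print(basename(found))
```

If this returns nothing for your own annotationData/ directory, the
file is either missing or misnamed.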

If you get the above working once, then process(ce) should work too.

Hope this helps

Henrik

On Mon, Aug 23, 2010 at 4:28 AM, Denis amer.ak...@rub.de wrote:
 Hi there,

 I hope you can help me with my problem since I have followed kind help
 with a similar problem on the google aroma.affymetrix forum, yet
 without the last bit of information I would need to succeed:
 http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/96676bd38d64e884/e01329e5f44ba42b?lnk=gst&q=building+ufl+file+for+DogSty06m520431#e01329e5f44ba42b


 We are processing the DogSty06m520431 chips and my Prof. wants me to
 generate CNV calculations for the corresponding samples. So far I
 could stick to the helpful guide on the aroma homepage for “Total Copy
 Number Analysis (GWS5 & GWS6)”. Yet I am now faced with 2 problems
 with which I hope you could give me a hand.

 First one is that when executing the command:

 process(ce, chromosomes=c(38), verbose=verbose)

 for chromosomes 24 and above the analysis aborts with the following
 output, e.g. for chr25:

 process(ce, chromosomes=c(25), verbose=verbose)
 20100823 13:05:15|Generating ChromosomeExplorer report...
 20100823 13:05:15| Setting up ChromosomeExplorer report files...
 20100823 13:05:15|  Copying template files...
 20100823 13:05:15|   Source path: C:/Programme/R/R-2.11.1/library/
 aroma.core/reports/includes
 20100823 13:05:15|   Destination path: reports/includes
 20100823 13:05:23|  Copying template files...done
 20100823 13:05:23| Setting up ChromosomeExplorer report files...done
 20100823 13:05:23| Explorer output version: 3
 20100823 13:05:23| Compiling ChromosomeExplorer.onLoad.js.rsp...
 20100823 13:05:23|  Source: C:/Programme/R/R-2.11.1/library/aroma.core/
 reports/templates/rsp/ChromosomeExplorer3/
 ChromosomeExplorer.onLoad.js.rsp
 20100823 13:05:23|  Output path: reports/Weim/ACC,-XY,AVG,+300,A+B
 20100823 13:05:23|  Scanning directories for available chip types...
 20100823 13:05:23|   Detected chip types: DogSty06m520431
 20100823 13:05:23|  Scanning directories for available chip
 types...done
 20100823 13:05:23|  Scanning image files for available zooms...
 20100823 13:05:24|   Detected (or default) zooms: 1, 2, 4, 8, 16, 32,
 64
 20100823 13:05:24|  Scanning image files for available zooms...done
 20100823 13:05:24|  Scanning directory for subdirectories...
 20100823 13:05:24|   Detected (or default) sets: cbs
 20100823 13:05:24|  Scanning directory for subdirectories...done
 20100823 13:05:24|  Compiling RSP...
           member data.class dimension objectSize
   1    chipTypes  character         1         72
   2    chrLayers  character         0         24
   3 sampleLabels  character         4        264
   4 sampleLayers  character         0         24
   5      samples  character         4        264
   6         sets  character         1         64
   7        zooms    numeric         7         56
 20100823 13:05:28|   Sample names:
   [1] W24a_(DogSty06m520431) W469_(DogSty06m520431)
 W511_(DogSty06m520431)
   [4] W513_(DogSty06m520431)
 20100823 13:05:28|   Full sample names:
   [1] W24a_(DogSty06m520431) W469_(DogSty06m520431)
 W511_(DogSty06m520431)
   [4] W513_(DogSty06m520431)
 20100823 13:05:28|  Compiling RSP...done
 20100823 13:05:29| Compiling ChromosomeExplorer.onLoad.js.rsp...done
 Loading required package: RColorBrewer
 Loading required package: Cairo
 20100823 13:05:40| Building tuple of reference sets...
 20100823 13:05:40|  No reference available.
 20100823 13:05:40|  Calculating average copy-number signals...
 20100823 13:05:40|   Retrieving average cell signals across 4
 arrays...
    CnChipEffectFile:
    Name: .average-intensities-median-mad
    Tags: f1b4541a56b9bb2404325d6053edc91e
    Full name: .average-intensities-median-
 mad,f1b4541a56b9bb2404325d6053edc91e
    Pathname: plmData/Weim,ACC,-XY,AVG,+300,A+B/
 

[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-08-04 Thread Henrik Bengtsson
Hi Christian,

On Wed, Aug 4, 2010 at 9:04 AM, cstratowa
christian.strat...@vie.boehringer-ingelheim.com wrote:
 Dear Henrik,

 Thank you for your suggestion to use ceRef directly.

 Regarding your explanation of getAverageFile() the question is where
 the generated output will be saved.

 As I have mentioned, each node creates first a plmData subdirectory,
 e.g. Prostate/Prostate21/plmData and makes symbolic links to the
 normalized CEL-files located in Prostate/plmData. Thus the output of
 getAverageFile() should be stored for each node separately.

Ah, now I see; I've been reading it as you were linking the
directories, not the individual CEL files.


 This seems indeed to be the case, since e.g. the subdirectory
 Prostate/Prostate21/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/
 Mapping250K_Nsp contains the file .average-intensities-median-
 mad,a1c33926939ee43fbed83ae69301d215.CEL created at a certain time
 while subdirectory Prostate/Prostate8/plmData/Prostate,ACC,-
 XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp contains a file with the same
 name, i.e. .average-intensities-median-
 mad,a1c33926939ee43fbed83ae69301d215.CEL created at a different
 time.

Yes.

As I understand it now, you preprocess all of the data, and wait for
everything to be done (all *,chipEffects.CEL files to be generated)
before continuing with the above, correct?  If so, I'd suggest that
you also wait for getAverageFile() to finish first.  Then that
average/results file will be available to all your cluster nodes as
well.  I even think you don't have to link each CEL file separately,
because nothing else should be written back to the data set.  It should
be enough to link each data set directory, or even just plmData/ itself
(I'm not even sure you need to split it up anymore).


 As far as I understand these are the files created by getAverageFile()
 and thus each node creates its own file saved in its own subdirectory,
 so there will be no problem.

Yes.  Now I agree with you.


 It seems that the problem was indeed the result of saveObject() stored
 in .Rcache, which caused the race conditions. Since the removal of
 saveObject() I have until now experienced no problems.

Yes.  You are correct.

Since caching is mainly done for memoization purposes, that is, to
load from file already calculated results that are computationally
expensive to obtain, it is recommended to store the cache in a fast
place.  In other words, it is better if the .Rcache directory is on
the local drive of the machine, rather than on a shared file system.
If you had done that, then each machine would have had to do those
calculations by itself once, but when done the memoization would
be faster and you would not have had any race conditions accessing the
memoized results.  The default ~/.Rcache/ can be changed, cf.
http://www.aroma-project.org/archive/GoogleGroups/web/caching.
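
As a rough sketch of pointing the cache at a node-local directory: the
code below assumes the setCacheRootPath() and getCacheRootPath()
functions of the R.cache package, and uses a temporary directory as a
stand-in for a path on the local drive (on a cluster you would pick
the node's local scratch space instead).  The call is guarded so that
the sketch also runs where R.cache is not installed.

```r
# Stand-in for a directory on the local drive of the machine.
localCache <- file.path(tempdir(), ".Rcache")
dir.create(localCache, recursive = TRUE, showWarnings = FALSE)

# Redirect the R.cache root for this R session, if R.cache is available.
if (requireNamespace("R.cache", quietly = TRUE)) {
  R.cache::setCacheRootPath(localCache)
  print(R.cache::getCacheRootPath())
}
```

This would have to be done at the start of every R session (or in a
startup script) on each node, before any cached results are requested.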

This was a useful conversation to me; it made me see other ways for
(unnecessary) race conditions to occur, and reminded me how important it
is to not overlook the smallest details in scientific communication
since they can make big differences.

Cheers,

Henrik


 Thank you for your help.
 Best regards
 Christian

 On Aug 2, 2:54 pm, Henrik Bengtsson h...@stat.berkeley.edu wrote:



  Hi.

  On Mon, Jul 26, 2010 at 12:00 PM, cstratowa

  christian.strat...@vie.boehringer-ingelheim.com wrote:
   Dear Henrik,

   Maybe, my explanation was not clear enough:

   I have created my own package based on S4 classes, where one subclass
   is AromaSNP with slots celset, normset, plmset, effectset as lists,
   and methods readSNPData(), normalizeSNPData(), computeCN(),
   computeRawCN(), among others. Furthermore, the package includes
   scripts batch.aroma.norm.R, batch.aroma.model.R,
   batch.aroma.combine.R, and a perl script which distributes these
   scripts to the different cluster nodes.

   1, Normalization: Script batch.aroma.norm.R creates first the
   subdirectory structure which I have already described, and then does
   the normalization. All normalization steps run on one server and the
   results are saved as AromaSNP object aroma in Prostate/
   Prostate.Rdata. Furthermore, subdirectories Prostate/probeData and
   Prostate/plmData are created.

   2, GLAD: Script batch.aroma.norm.R is called from each node
   separately. For each node it creates first a plmData subdirectory,
   e.g. Prostate/Prostate21/plmData and makes symbolic links to the
   normalized CEL-files located in Prostate/plmData. Then it loads
   object aroma from Prostate/Prostate.Rdata, whereby each node has a
   separate RAM of 2GB. Slot ar...@effectset contains the normalized
   data and is called from computeCN() as cesList <- ar...@effectset.
   This cesList (which is in the RAM of each node) is passed to model
   <- GladModel(cesList, refList), and is thus used to compute
   getAverageFile(), if refList=NULL (which is the default). Since each
   node calls the same cesList object, function saveObject() writes

Re: [aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-08-02 Thread Henrik Bengtsson
 available.  First time one of your
processes completes a getAverageFile() call, a new file will be
created and stored on your file system.  Its name will be an MD5
checksum that is generated from the names of the arrays in the set
that you call getAverageFile() on.  If you do it twice for the same
set of arrays, you will the second time get the results stored on
file, because they have already been calculated.

So far so good.  The race condition occurs when you have two processes
A and B that operate on the same data set 'cesList'.  Process A runs
the script, requests the reference, which is missing, and starts
running getAverageFile(cesList[[1]]).  While this is running, Process B
reaches the same point, and since the *result file* of
getAverageFile(cesList[[1]]) is not yet available, it starts doing the
same calculation.  Now Process A finishes and writes its result file.
Later Process B writes its results to the same result file, because they
process the same data set, more precisely getNames(cesList[[1]]) are
the same.  If Process B starts writing at the same time as Process A
writes, there is a potential problem.

From my troubleshooting, as far as I understand it, the only way you
could have gotten that error message was when two or more processes
did getAverageFile(cesList[[1]]) where getNames(cesList[[1]]) were
identical.  Are you 100% sure that is not the case? Are you saying
that is not the case?  If not, I am really puzzled how there could be
a clash in the first place.  Thus, the key point is to make sure that
multiple processes are not trying to calculate getAverageFile() on
the same array set at the same time.
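
The two ingredients of the clash can be illustrated in base R.  This is
a sketch under stated assumptions, not the aroma implementation:
keyFromNames() and saveAtomically() are hypothetical helpers, the key
is simply the MD5 checksum of the array names written to a temporary
file, and the write-then-rename pattern is one common way to avoid a
reader or a second writer seeing a half-written file.

```r
# Hypothetical helper: derive a key from the array names.  Identical
# names in identical order give identical keys, hence identical filenames.
keyFromNames <- function(names) {
  f <- tempfile()
  on.exit(file.remove(f))
  writeLines(names, con = f)
  unname(tools::md5sum(f))
}

# Hypothetical helper: write to a process-specific temporary file first,
# then rename it into place, so the target never holds partial content.
saveAtomically <- function(object, pathname) {
  pathnameT <- sprintf("%s.%d.tmp", pathname, Sys.getpid())
  saveRDS(object, file = pathnameT)
  if (!file.rename(pathnameT, pathname)) {
    file.remove(pathnameT)
    stop("Failed to rename temporary file to: ", pathname)
  }
  invisible(pathname)
}

namesA <- c("W24a", "W469", "W511", "W513")  # array names seen by process A
namesB <- c("W24a", "W469", "W511", "W513")  # array names seen by process B
keyA <- keyFromNames(namesA)
keyB <- keyFromNames(namesB)
print(keyA == keyB)  # same names => both processes target the same file

pathname <- file.path(tempdir(), sprintf("%s.rds", keyA))
saveAtomically(c(1.2, 3.4), pathname)
print(readRDS(pathname))
```

Renaming only protects against partially written files; it does not
stop two processes from doing the same calculation twice, which is why
the average is best computed once up front.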

/Henrik


 I hope that this explanation could explain better what the different
 steps are.

 Best regards
 Christian


 On Jul 23, 4:35 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi.

 On Jul 22, 10:24 am, cstratowa christian.strat...@vie.boehringer-



 ingelheim.com wrote:
  Dear Henrik,

  Thank you very much for changing the code for getAverageFile(), I will
  try it and let you know.

  Thank you also for the explanation of writing to a temporary file, now
  I understand your intention.

  Regarding race conditions: No, I do not assume that aroma.* takes care
  of potential race conditions. Here is what I do:

  Assume that I have downloaded from GEO a prostate cancer dataset
  consisting of 40 CEL-files. Then I create a directory Prostate and
  subdirectories Prostate/annotationData and Prostate/rawData
  following your required file structure.

   However, starting with the 2nd CEL-file I create subdirectories
  Prostate/Prostate2,...,Prostate/Prostate40, each containing a
  symbolic link to ../annotationData and ../rawData from Prostate.

 Do I understand you correctly that you use a separate project
 directory for each CEL file, so that when you process the data you get
 separate subdirectories probeData/ and plmData/ in each of these
 project directories?

  Thus when running GLAD each cluster node has its own directory to
  write to, e.g. Prostate/Prostate21/reports for creating the images.

 This is where I get lost.  In order to do CN segmentation (here GLAD),
 you need to calculate CN ratios relative to a reference.  Looking at
 your error message, that reference is calculated from the pool of
 samples, i.e. getAverageFile() is done on the pool of references.
 Thus, for this to make sense you need a *pool of samples*, but if I
 understood you correctly above, you don't have that, but only one
 array per project directory.  I guess I misunderstood you, because
 your error indicates something else.

 The only way the error you got occurred was because multiple R
 sessions tried to run getAverageFile(ces) on data sets that contain
 arrays with the same names and in the same order (more precisely
 getNames(ces)).  If they would contain different array names, there
 would be no clash, because that saveObject() statement (that I just
 removed) would write to different filenames.  This makes me suspect
 that you indeed use the same pool of reference samples.

  Only after all nodes have finished their computations, then I move the
  relevant files to the main directory, e.g. all images are moved to
  Prostate/reports. Afterwards I delete the subdirectories
  Prostate2,...,Prostate40 and their contents.

  As you can see, using this setup there should not be any race
  conditions. The only remaining problem are the temporary files which
  you store in .Rcache in my home directory.

 So, there is something I don't understand above.  Can you post your
 full script, because that would certainly remove some of the
 ambiguities.

 Also, it helps if you change your script to be explicit about the
 getAverageFile() calculation, i.e.

 print(cesN);
 ceR <- getAverageFile(cesN);
 print(ceR);
 seg <- GladModel(cesN, ceR);
 print(seg);

 instead of letting GladModel() do it implicitly:

 seg <- GladModel(cesN);
 print(seg);
 print(seg);

 As explained above, if your parallelized R sessions calculate ceR

Re: [aroma.affymetrix] medpolish in analyzing HuGene arrays

2010-08-02 Thread Henrik Bengtsson
Hi Steven.

On Mon, Jul 26, 2010 at 9:00 PM, Steven Bosinger
steven.bosin...@gmail.com wrote:
 Hi,

 I'm new to aroma and bioC in general, so these are probably a very
 straightforward questions:

 I am using aroma to get QC on some Human Affymetrix Gene arrays.

 1. To keep it consistent with previous analyses using RMA pre-
 processing (BKG subtraction, quantile normalization and median polish
 summarization), can I use the medpolish function instead of RmaPlm?

Not sure what implementation you used before, but the RMA
summarization step in the oligo package uses median polish.  RmaPlm
does not do the low-level calculation itself, but relies on existing
code/packages for this.   By default it is using
affyPLM/preprocessCore.  You can tell RmaPlm to use that of oligo as:

plm <- RmaPlm(..., flavor="oligo");

There are some more comments in help(RmaPlm).

Note that even if two different implementations/software say they are
using median polish, they may not be numerically reproducible.  How
median polish is started, how many iterations it runs etc. may give you
different results.  If I remember correctly, it is also known for not
always converging, i.e. it can oscillate between two results.

Note that median polish and rlm (robust linear modelling) are both
estimators for the same log-additive probe-level model, i.e. they try
to estimate the same parameters but in different ways.  Some
people/software documentation are sloppy and say they use median
polish, but in reality they might actually have used rlm.

I would recommend to use rlm, if possible.  You can always run both
variants and see how much the results differ.
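
To get a feeling for how the two estimators relate, here is a small
self-contained simulation in plain R.  It is only a sketch on made-up
data for one probeset (10 probes, 6 arrays), using stats::medpolish()
and, if installed, MASS::rlm(); it does not use RmaPlm or any aroma
code, and the simulation settings are arbitrary assumptions.

```r
set.seed(1)
nProbes <- 10L
nArrays <- 6L

# Log-additive model: y[j,i] = theta[i] + phi[j] + noise
theta <- seq(8, 13, length.out = nArrays)   # chip effects (log2 scale)
phi <- rnorm(nProbes, sd = 0.5)
phi <- phi - mean(phi)                      # probe affinities, sum to zero
noise <- matrix(rnorm(nProbes * nArrays, sd = 0.1), nProbes, nArrays)
y <- outer(phi, theta, "+") + noise         # rows = probes, cols = arrays

# Median polish: chip effect = overall + column effect
mp <- medpolish(y, trace.iter = FALSE)
thetaMP <- mp$overall + mp$col

# Robust linear model on the same data (only if MASS is available)
if (requireNamespace("MASS", quietly = TRUE)) {
  df <- data.frame(
    value = as.vector(y),
    probe = factor(rep(seq_len(nProbes), times = nArrays)),
    array = factor(rep(seq_len(nArrays), each = nProbes))
  )
  fit <- MASS::rlm(value ~ 0 + array + probe, data = df,
                   contrasts = list(probe = "contr.sum"))
  thetaRLM <- unname(coef(fit)[seq_len(nArrays)])
  print(round(cbind(truth = theta, medpolish = thetaMP, rlm = thetaRLM), 2))
}
```

The two sets of estimates may differ by a small constant offset, since
median polish centers the probe effects on their median whereas the
sum-to-zero contrast centers them on their mean.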


 2. I read in the forum that NUSE plots aren't available when you
 summarize using medpolish, is this the case?

Good catch.  Could you provide a link where you found that?

In order to calculate NUSE (Normalized Unscaled Standard Errors), you
need standard deviations of the parameter estimates.  The median
polish estimator [see help(oligo::basicRMA)] does *not* give/return
standard deviation/errors of the parameter estimates.  Internally, we
fix the stddev to 1 (one), so if you try to calculate NUSE, you'll get
nothing useful or even an error.
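
A minimal sketch of why fixed standard errors make NUSE uninformative.  It assumes the usual definition of NUSE (each probeset's standard error divided by the median standard error of that probeset across arrays); the matrices below are invented numbers, not output from aroma.affymetrix.

```r
# NUSE: rows = probesets, columns = arrays
nuse <- function(se) {
  sweep(se, MARGIN = 1, STATS = apply(se, 1, median), FUN = "/")
}

# With real standard errors, a poor array stands out (array 3 here)
seRlm <- rbind(c(0.10, 0.11, 0.30),
               c(0.20, 0.19, 0.55))
print(nuse(seRlm))

# With all standard errors fixed to 1, NUSE is 1 everywhere,
# so it cannot distinguish good arrays from bad ones
seMedpolish <- matrix(1, nrow = 2, ncol = 3)
print(nuse(seMedpolish))
```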


 3. Is there a vignette/pdf file similar to BioC that lists all the
 available functions for aroma?

The website is the main source of documentation.


 4. How can I export the RMA pre-processed data matrix to another 3rd
 party software?

A good start is probably to use extractDataFrame() and write its
content in a format you like.  See the how-to page 'Extract probeset
summaries (chip effects) as a data frame':

  http://aroma-project.org/howtos/extractDataFrame
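
Once you have the data frame, writing it out is plain R.  A minimal sketch, assuming 'df' stands in for whatever extractDataFrame() returned; the column names and values below are hypothetical placeholders.

```r
# Stand-in for the data frame returned by extractDataFrame()
df <- data.frame(
  unitName = c("unitA", "unitB"),
  array1   = c(7.1, 9.3),
  array2   = c(7.4, 9.0)
)

# Tab-delimited text is readable by most third-party software
pathname <- file.path(tempdir(), "chipEffects.txt")
write.table(df, file = pathname, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to verify the round trip
df2 <- read.table(pathname, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
print(df2)
```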


 5. Is there a function for MvA plots?

It's not clear to me *what* you want to plot, but basically:

plotMvsA(cf, reference=cfR);

where 'cf' and 'cfR' are two AffymetrixCelFile:s, e.g.

cf <- getFile(cs, 1);
cfR <- getAverageFile(cs);

You can do the same by replacing the AffymetrixCelSet 'cs' with a
ChipEffectSet 'ces'.
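
If you want to see what such a plot contains, or control the formatting yourself, here is a minimal base-R sketch.  It assumes the standard definitions M = log2(y/yRef) and A = average of the two log2 intensities; the intensities are simulated and this is not the internal code of plotMvsA().

```r
set.seed(1)

# Hypothetical probe intensities for one array and for a reference
y    <- 2^rnorm(1000, mean = 8, sd = 1.5)
yRef <- y * 2^rnorm(1000, sd = 0.2)

M <- log2(y) - log2(yRef)          # log-ratio
A <- (log2(y) + log2(yRef)) / 2    # average log-intensity

# Ranges, colors etc. are set with the usual graphics arguments
plot(A, M, pch = ".", col = "blue",
     xlim = c(0, 16), ylim = c(-2, 2),
     xlab = "A", ylab = "M")
abline(h = 0, col = "red")
```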


 6. How do I format plots? ie alter range, color etc

The same way as you usually do.  What have you tried, and what didn't work?


 Sorry for these newbie questions...

No worries.  Though, next time, please try to post one question/topic
per message, and try to be more precise in what you are asking/have
tried.  Then it is easier to help and quicker to reply to.

Cheers,

Henrik


 Steve.



Re: [aroma.affymetrix] Batch Adjusted RMA data

2010-08-02 Thread Henrik Bengtsson
Hi Fong(?).

On Sat, Jul 24, 2010 at 2:10 AM, Fong fongchunc...@gmail.com wrote:
 Hi,

 I've used aroma.affymetrix to generate and extract the probeset
 summaries (chip effects) from a set of Human Exon array samples I
 have.  And then performed batch adjustment on these probeset summaries
 using another R script (ComBat.R).  Now I got the adjusted probeset
 values and I was running whether it was possible to feed these into
 firma again to use?

 I can't figure out how to load external RMA data into the
 aroma.affymetrix package,

Unfortunately, there is no such option available in aroma.affymetrix.
It can be done, but you really have to dig into the low-level parts
which requires lots of knowledge, which only a few developers have.

What adds to the complication is that the FIRMA model relies on the
residuals of the probe-level modelling (PLM), see the bottom equation
in column 2 page 2 of Purdom et al. (2008):

  r_ijk = y_ijk - c_i - p_k

where {y} are the probe signals, {c} are the estimated chip effects
and {p} the estimated probe affinities.  At the risk of making a fool
of myself, I think ComBat is correcting only the chip effects {c}.
This means that when calculating the above residuals you would use
ComBat-normalized chip effects but the default RMA probe affinities.
I don't think the FIRMA algorithm was designed for this.  Have you
thought about this?
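
To make the concern concrete, here is a minimal sketch on simulated numbers.  It assumes residuals are the log-scale probe signal minus the fitted chip effect and probe affinity, for a single exon; the 0.5 shift stands in for a batch adjustment and is purely hypothetical, as are all names.

```r
set.seed(7)
nArrays <- 4
nProbes <- 5

c_hat <- rnorm(nArrays, mean = 8)    # chip effects, one per array i
p_hat <- rnorm(nProbes, sd = 0.5)    # probe affinities, one per probe k

# Log-scale probe signals: arrays in rows, probes in columns
y <- outer(c_hat, p_hat, "+") +
     matrix(rnorm(nArrays * nProbes, sd = 0.1), nrow = nArrays)

# Residual = signal minus fitted chip effect minus fitted affinity
plmResiduals <- function(y, chip, probe) y - outer(chip, probe, "+")

r <- plmResiduals(y, c_hat, p_hat)

# Pretend a batch adjustment shifted the chip effect of array 1 only,
# while signals and probe affinities are left as they were
c_adj <- c_hat
c_adj[1] <- c_adj[1] + 0.5
rAdj <- plmResiduals(y, c_adj, p_hat)

# All residuals of array 1 move by -0.5; the other arrays are unchanged
print(round(rAdj - r, 3))
```

In other words, the adjustment ends up in the residuals as a systematic offset, and the residuals are what FIRMA scores are built from.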

/Henrik


 Any help would be greatly appreciated.



Re: [aroma.affymetrix] Re: Mo-Ex 1.0st array Analysis using aroma.affymetrix and FIRMA model

2010-08-02 Thread Henrik Bengtsson
Hi Sundar,

I've been leaving your messages to the FIRMA experts, because they can
better answer your questions.  However, I'll give a quick reply to the
things I can answer.

On Mon, Aug 2, 2010 at 6:59 PM, Sundar sundar...@gmail.com wrote:
 Hi,
       I am trying to analyze Mouse Exon 1.0 st array data using
 aroma.affymetrix and FIRMA model to find the splicing variants.
 CDF file : MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf  ( downloaded from
 the aroma.affymetrix website)
 CEL files: 12 Mouse Exon 1.0 st arrays. ( 6 arrays for each strain A
 and B) and Within A and B, i have 3 arrays each of two experimental
 conditions)

 1) I am  not sure how the RMA normalization in aroma.affymetrix
 performs, is the normalization performed within array or between the
 array ?

The aroma.affymetrix package implements a standard RMA normalization,
that is, it reproduces it very well.  I recommend that you read up on
the RMA model:

Bolstad, B.M., Irizarry R. A., Astrand, M., and Speed, T.P. (2003), A
Comparison of Normalization Methods for High Density Oligonucleotide
Array Data Based on Bias and Variance. Bioinformatics 19(2):185-193
Supplemental information

Rafael. A. Irizarry, Benjamin M. Bolstad, Francois Collin, Leslie M.
Cope, Bridget Hobbs and Terence P. Speed (2003), Summaries of
Affymetrix GeneChip probe level data Nucleic Acids Research 31(4):e15

Irizarry, RA, Hobbs, B, Collin, F, Beazer-Barclay, YD, Antonellis, KJ,
Scherf, U, Speed, TP (2002) Exploration, Normalization, and Summaries
of High Density Oligonucleotide Array Probe Level Data. Accepted for
publication in Biostatistics.

Then you'll see that the RMA pipeline is a multi-array method.  For
more questions on this, I recommend that you use the larger Bioconductor
mailing list, because this one is used for aroma.* specific questions.

 How do i pass arguments to the function to control the process
 of normalization ?

What do you want to control?


 2) What are .js and .css files generated at the end of the analysis
 described in vignette ( Human Exon array ). Are there any third party
 software that could be used to analyze this out put ?

Those are Javascript and CSS files used by the ArrayExplorer HTML
reports.  They do not contain any kind of data.  They cannot be used
by other software.


 3) How do i convert the probe set ID or transcript ID into gene name ?
 Can i calculate the fold change in aroma.affymetrix ?

I leave this one to the FIRMA experts.  Make sure to search/go through
the aroma.affymetrix mailing archives - I think the question has been
asked and answered before.

More below



 Thank you,
 Sundar



 On Jul 27, 11:52 am, Sundar sundar...@gmail.com wrote:
 Hi,
          I am new to the Exon array concept. I am trying to analyze
 Mouse Exon 1.0 st array data using aroma.affymetrix and FIRMA model.
 Few questions i have are below.

 1) To get a start i have just implemented the code described in
 vignette  FIRMA: Human exon array analysis.  There are certain .CEL
 files and other extension files generated.

It is not clear which particular CEL files you are referring to.  The
ones under probeData/ contain normalized/calibrated probe signals in
CEL files of the same format/layout as the ones in rawData/.  CEL
files contain probe intensities, probe standard deviations, and the
number of pixels per probe.

The ones in plmData/ are special aroma-specific CEL files that should
be treated as internal files only; in particular, they cannot be read
by other software.  These files contain chip-effect estimates and
their standard deviations.  There is also one CEL file containing
probe-affinity estimates.  This is if you use the RMA-style
probe-level modelling.

 How can i read them into
 R ?

What do you want to do with the data that aroma doesn't do?

 How do i know what information they carry ? Is there any other
 software i can upload them to read these files ?

Answered above.

 How do i get control
 of the analysis in applying extractMatrix() or extractDataFrame() ,
 readUnits() functions ?

Again, it is not clear what you want to do, but maybe the How-to page
'Extract probeset summaries (chip effects) as a data frame'
[http://aroma-project.org/howtos/extractDataFrame] illustrates your
options.  I recommend that you use that instead of extractDataFrame().

Don't use readUnits() unless you are a developer who knows how aroma.affymetrix works internally.


 2)  I am unable to get a clear understanding of what the background
 calculation are taking place using the functions ( used to generate
 the files in Q.1) ? Except that these function performs QC,
 Normalization, Summarization etc. I am lacing clear understanding how
 those methods are implemented on Exon arrays , when compared to 3'
 prime arrays ? for instance,, In 3'-IVT arrays, I have the control of
 Normalizing within the group or across the group, where as in this
 exon array analysis I'm not sure what the function does ?

I believe a detailed understanding of the FIRMA paper will help here and
help you be 

Re: [aroma.affymetrix] non finite values in FIRMA results

2010-08-02 Thread Henrik Bengtsson
Hi,

I'll leave the details to the FIRMA experts, but you are using really
large values of argument 'ram'.  It might be that you ran out of
memory (in which case you would have gotten an error message).  If you
used cut'n'paste instead of source() to run the analysis, it might be
that one of the fit() calls was interrupted prematurely.  If so, not
all units have been fitted.  Rerun with ram=1 to see if you get a
different result; the units already fitted will be skipped.

I'm also not sure if

> 431210/(1190297*107)
[1] 0.003385710

is an exceptionally large fraction.  Note also that 431210/107 is
exactly 4030; it could be that it is the exact same 4030 units that
are NaN in all samples.  That could be explained by some units not
being fitted or being Affymetrix control units.
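
A quick way to check this is to count the NaN:s per unit (row).  A minimal sketch, using a tiny made-up matrix in place of your 'myres'; with the real data you would run the same three lines on 'myres' directly.

```r
# Toy stand-in for the FIRMA score matrix (units x arrays)
myres <- matrix(c(1.2, NaN, 0.8, NaN, 1.1,
                  0.9, NaN, 1.3, NaN, 1.0,
                  1.0, NaN, 0.7, NaN, NaN), nrow = 5)

# Number of NaN:s per unit
nanPerUnit <- rowSums(is.nan(myres))
print(table(nanPerUnit))

# Units that are NaN in every array
allNaN <- which(nanPerUnit == ncol(myres))
print(allNaN)
```

If nearly all NaN:s sit in units that are NaN across every array, the cause is more likely those units (not fitted, or control units) than individual bad arrays.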

BTW, not all methods take argument 'ram'.  Instead, use the global
aroma settings to achieve the same, e.g. setOption(aromaSettings,
"memory/ram", 50).  More info at http://aroma-project.org/settings.
This way your scripts stay clean and can be run as-is on other
machines with less memory.

Maybe this helps(?)

/Henrik

On Mon, Aug 2, 2010 at 7:24 PM, Adi Tarca ata...@med.wayne.edu wrote:
 Hi all,
 I have a batch of 107 mice exon arrays for which I computed FIRMA
 scores and I got many NaN, Inf and 0 values which disable further
 analysis based on log FIRMA values for some probesets. I was wondering
 if this is a known issue or I am the only one to get these results.


 Here is the code I use to get the FIRMA scores:

 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 chipType <- "MoEx-1_0-st-v1"
 cdf <- AffymetrixCdfFile$byChipType(chipType,
 tags="fullR1,A20080718,MR")
 cs <- AffymetrixCelSet$byName("mice2010", cdf=cdf)
 bc <- RmaBackgroundCorrection(cs, tag="fullR1,A20080718,MR")
 csBC <- process(bc,verbose=verbose,ram=500)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 csN <- process(qn, verbose=verbose,ram=500)
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 fit(plmTr, verbose=verbose,ram=500)

 firma <- FirmaModel(plmTr)
 fit(firma, verbose=verbose,ram=500)
 fs <- getFirmaScores(firma)
 myres2=extractDataFrame(fs,addNames=FALSE)
 myres=as.matrix(myres2[,-(1:3)])


 Here is the counts of NaN Inf and 0 values:

 dim(myres)
 [1] 1190297     107
 sum(is.nan(myres))
 [1] 431210
 sum(is.infinite(myres),na.rm=TRUE)
 [1] 855
 sum(myres==0,na.rm=TRUE)
 [1] 214



 R version 2.11.0 (2010-04-22)
 x86_64-unknown-linux-gnu

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 base

 other attached packages:
  [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7
 affxparser_1.20.0
  [4] R.huge_0.2.0           aroma.core_1.5.0
 aroma.light_1.16.0
  [7] matrixStats_0.2.1      R.rsp_0.3.6
 R.cache_0.3.0
 [10] R.filesets_0.8.1       digest_0.4.2
 R.utils_1.4.0
 [13] R.oo_1.7.2             R.methodsS3_1.2.0

















[aroma.affymetrix] aroma.affymetrix v1.7.0 released

2010-07-28 Thread Henrik Bengtsson
Hi all,

new versions of aroma.affymetrix and friends have been released.  It
is highly recommended to update:

source("http://aroma-project.org/hbLite.R");
hbInstall("aroma.affymetrix");

In addition to some added features, there were also a few bugs fixed
in this release. Thanks for the reports!  All details on what's new
can be found below.

Cheers,
Henrik & co-developers


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to aroma.affymetrix
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 1.7.0 [2010-07-26]
o Committed to CRAN. No updates.

Version: 1.6.8 [2010-07-21]
o CLEAN UP: Now getAverageFile() for AffymetrixCelSet no longer writes
  debug information to ${Rcache}/aroma.affymetrix/idChecks/.

Version: 1.6.7 [2010-07-19]
o Now byPath(..., cdf) for ChipEffectSet will silently try to retrieve
  the monocell CDF if argument 'cdf' is the main CDF.  If it fails
  an error is thrown.  This makes it possible to specify the main/regular
  CDF (or chip type), instead of the monocell CDF, when retrieving a
  chip-effect data set.

Version: 1.6.6 [2010-07-02]
o Now AffymetrixCelSet$byName(..., chipType="GenomeWideSNP_6,Full") will
  work (before chip types with tags would give an error).  This is now
  done by first locating the CDF for the chip type (with tags).
o Added doASCRMAv1() and doASCRMAv2() for convenient allele-specific
  doCRMAv1() and doCRMAv2() wrappers.
o CLEAN UP: Dropped argument 'transforms' from getImage()
  for AffymetrixCdfFile.

Version: 1.6.5 [2010-06-16]
o Added doRMA() for AffymetrixCelSet and data-set names.
  doRMA() runs in bounded memory and replicates the results of
  fitPLM() in the affyPLM package with great precision.

Version: 1.6.4 [2010-06-07]
o BUG FIX: Added argument shift=+300 to doCRMAv1().

Version: 1.6.3 [2010-05-30]
o Now translateFullName() of AffymetrixProbeTabFile translates
  'PROBE_STRAND' to 'targetStrandedness'.

Version: 1.6.2 [2010-05-26]
o Started to add scripts for downloading example data.

Version: 1.6.1 [2010-05-19]
o CORRECTION: doCRMAv1() did not shift +300 the signals before
  doing the probe-level summarization.
o BUG FIX: Fixed a bug in PdInfo2Cdf().  Thanks Kasper Daniel Hansen
  for reporting this.


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to aroma.core
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 1.7.0 [2010-07-26]
o Committed to CRAN. No updates.

Version: 1.6.8 [2010-07-24]
o Added several methods for CopyNumberRegions, e.g. xRange(), prune(),
  simulateRawCopyNumbers(), +(), -() and *().

Version: 1.6.7 [2010-07-20]
o Added writeDataFrame() for AromaUnitTotalCnBinarySet and
  AromaUnitFracBCnBinarySet to get the correct filename extension.
  Thanks Nicolas Vergne at the Curie Institute for reporting this.

Version: 1.6.6 [2010-07-19]
o Added subset() for CopyNumberRegions.
o Now extractRegion() for RawGenomicSignals also accepts a
  CopyNumberRegions object for argument 'regions'.
o Added extractRegions() for RawGenomicSignals.

Version: 1.6.5 [2010-07-08]
o BUG FIX: writeDataFrame() for AromaUnitSignalBinarySet would
  write the same data chunk over and over.

Version: 1.6.4 [2010-07-06]
o BUG FIX: indexOf() for ChromosomalModel would return NA if a search
  pattern contained parentheses '(' and ')'.  There was a similar issue
  in indexOf() for GenericDataFileSet/List in R.filesets, which was
  solved in R.filesets 0.8.3.  Now indexOf() for ChromosomalModel
  utilizes ditto for GenericDataFileSet for its solution.

Version: 1.6.3 [2010-06-22]
o BUG FIX: as.GrayscaleImage(..., transforms=NULL) for 'matrix' would
  throw "Exception: Argument 'transforms' contains a non-function: NULL".

Version: 1.6.2 [2010-06-02]
o BUG FIX: updateDataColumn() of AromaTabularBinaryFile would
  censor *signed integers* incorrectly; it should censor at/to
  [-(n+1),n], but did it at [-n,(n+1)] (two's complement).
  This caused it to write too large values as n+1, which then
  would be read as -(n+1), e.g. writing 130 would be censored
  to 128 (should be 127), which then would be read as -128.
  Added more detailed information on how many values were censored.
  Thanks Robert Ivanek for reporting this.

Version: 1.6.1 [2010-05-27]
o Added trial version of fullname translator files.
o doCBS() for character:s support data set tuples.
o Added doCBS() for CopyNumberDataSetTuple:s.
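
The censoring range described for the aroma.core 1.6.2 fix can be illustrated with a minimal base-R sketch.  It assumes two's-complement signed integers of a given bit width; the function name censor() is made up here and this is not the code used by updateDataColumn().

```r
# Clamp values to the range of a signed two's-complement integer
censor <- function(x, bits = 8L) {
  n <- 2^(bits - 1) - 1          # largest representable value, e.g. 127
  pmin(pmax(x, -(n + 1)), n)     # clamp to [-(n+1), n]
}

# 130 is censored to 127 (not 128, which would wrap around to -128)
print(censor(c(-200, -128, 0, 127, 130)))
```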


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to R.filesets
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 0.8.3 [2010-07-06]
o BUG FIX: indexOf() for GenericDataFileSet/List would return NA if
  the search pattern/string contained parentheses.  The reason is that
  such have a special meaning in regular expressions.  Now indexOf()
  first searches by regular expression patterns, then by fixed strings.
  Thanks Johan Staaf at Lund University and Larry(?) for reporting
  this issue.

Version: 0.8.2 [2010-05-26]

Re: [aroma.affymetrix] peculiar array quality

2010-07-23 Thread Henrik Bengtsson
Hi,

it's hard to say what is causing this, but if you see it in several
samples at the same location, then my immediate thought is that your
reference signal may carry it.  Are you using the average of the pool
of all samples as a reference, or how do you calculate it?  How many
samples do you have in your reference pool?  CN polymorphic regions
that are frequent enough in your population could cause this, but then
it should be a real biological signal, which you say it isn't.

Are you using the full or the default GenomeWideSNP_6 CDF?  Affymetrix
removed several CN loci from the former to make the latter - CN loci
that they found to be poor for CN analysis.  This could also be a
reason, though those loci should be scattered fairly randomly along
the genome.

You could also check if there is a difference between the signals from
SNPs and CN loci.  If there is, that would indicate that there are some
artifacts on the arrays.

Also, are you really sure you are using the correct annotation data?
For instance, if you use the full CDF to generate the data, but only
the default for extracting genome locations (assuming the same
ordering of row indices), such weird things may show up.  If you plot
your data using the ChromosomeExplorer, this should be taken care of
automatically.

Also, do some QC plots using ArrayExplorer; there might be spatial
artifacts, although it sounds unlikely.

Sorry, not much help, but at least some directions for troubleshooting.

/Henrik


On Wed, Jul 21, 2010 at 5:34 PM, Matt Wilkerson mdwilk...@gmail.com wrote:
 Hello,

 I have detected what I think is an array quality issue and wanted to get
 others' opinions about this phenomena.

 I observed this issue on chromosome views of CN from SNP6 arrays.  It looks
 like a smearing effect where CN has irregular values and a range of large
 negative numbers to zero within specific regions.  The regions at which this
 happens are identical in affected samples and occur on basically all
 chromosomes.  This smearing is not cancer DNA segment loss, where probes
 belonging to a segment have similar CN values.  In a group of about 70
 arrays, 1/3 of the arrays have this issue and the others have expected
 segments of discrete amplifications/deletions.  I have compared specimen,
 technical, and array characteristics to try to find a batch or quality
 issues, but the effect appears so far to be randomly occuring.

 I put an example at:
 http://www.unc.edu/~mwilkers/artifact.png
 In the plot, the points are probes. Axes are base position and log2 median
 centered copy number.  The lines are segments overlaid.  The colors are not
 important.

 I don't think this is an aroma issue - I detect the phenomena using
 apt-copynumber-workflow also.  The only affymetrix summary option that
 associated with the artifact samples was allele summarization mean.  The
 artifact arrays had lower values.
 Also, I have used aroma successfully with 250K_Sty arrays often and never
 seen this phenomena.

 My question:
 Has anyone seen this phenomena before?
 Does anyone have an explanation or suggestion?


 Thank you,

 Matt Wilkerson
 Lineberger Comprehensive Cancer Center
 University of North Carolina at Chapel Hill



Re: [aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-07-23 Thread Henrik Bengtsson
Hi.

On Thu, Jul 22, 2010 at 10:24 AM, cstratowa
christian.strat...@vie.boehringer-ingelheim.com wrote:
 Dear Henrik,

 Thank you very much for changing the code for getAverageFile(), I will
 try it and let you know.

 Thank you also for the explanation of writing to a temporary file, now
 I understand your intention.

 Regarding race conditions: No, I do not assume that aroma.* takes care
 of potential race conditions. Here is what I do:

 Assume that I have downloaded from GEO a prostate cancer dataset
 consisting of 40 CEL-files. Then I create a directory Prostate and
 subdirectories Prostate/annotationData and Prostate/rawData
 following your required file structure.

  However, starting with the 2nd CEL-file I create subdirectories
 Prostate/Prostate2,...,Prostate/Prostate40, each containing a
 symbolic link to ../annotationData and ../rawData from Prostate.

Do I understand you correctly that you use a separate project
directory for each CEL file, so that when you process the data you get
separate subdirectories probeData/ and plmData/ in each of these
project directories?

 Thus when running GLAD each cluster node has its own directory to
 write to, e.g. Prostate/Prostate21/reports for creating the images.

This is where I get lost.  In order to do CN segmentation (here GLAD),
you need to calculate CN ratios relative to a reference.  Looking at
your error message, that reference is calculated from the pool of
samples, i.e. getAverageFile() is done on the pool of references.
Thus, for this to make sense you need a *pool of samples*, but if I
understood you correctly above, you don't have that, but only one
array per project directory.  I guess I misunderstood you, because
your error indicates something else.

The only way the error you got could have occurred is if multiple R
sessions tried to run getAverageFile(ces) on data sets that contain
arrays with the same names and in the same order (more precisely,
the same getNames(ces)).  If they contained different array names,
there would be no clash, because that saveObject() statement (that I
just removed) would write to different filenames.  This makes me
suspect that you indeed use the same pool of reference samples.

 Only after all nodes have finished their computations, then I move the
 relevant files to the main directory, e.g. all images are moved to
 Prostate/reports. Afterwards I delete the subdirectories
 Prostate2,...,Prostate40 and their contents.

 As you can see, using this setup there should not be any race
 conditions. The only remaining problem are the temporary files which
 you store in .Rcache in my home directory.

So, there is something I don't understand above.  Can you post your
full script?  That would certainly remove some of the ambiguities.

Also, it helps if you change your script to be explicit about the
getAverageFile() calculation, i.e.

print(cesN);
ceR <- getAverageFile(cesN);
print(ceR);
seg <- GladModel(cesN, ceR);
print(seg);

instead of letting GladModel() do it implicitly:

seg <- GladModel(cesN);
print(seg);

As explained above, if your parallelized R sessions calculate ceR <-
getAverageFile(cesN) on the same 'cesN' data set, they will try to
generate the same 'ceR' result file, and you have a race condition.
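
For reference, the generic way to keep concurrent writers from leaving a half-written result file is to write to a process-specific temporary file and then rename it.  A minimal base-R sketch of that pattern follows; saveAtomically() is a made-up name and this is not what getAverageFile() does internally.  Note that it only protects the file itself; it does not stop several sessions from redoing the same calculation.

```r
saveAtomically <- function(object, pathname) {
  # Temporary file in the same directory, unique to this R process
  pathnameT <- sprintf("%s.%d.tmp", pathname, Sys.getpid())
  saveRDS(object, file = pathnameT)

  # Renaming within one file system is typically atomic, so readers
  # see either no file or the complete file
  if (!file.rename(pathnameT, pathname)) {
    file.remove(pathnameT)
    stop("Failed to rename ", pathnameT, " to ", pathname)
  }
  invisible(pathname)
}

# Usage with a throwaway file
pathname <- file.path(tempdir(), "average.rds")
saveAtomically(1:3, pathname)
print(readRDS(pathname))
```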


 I know that you store the monocell files in .Rcache/
 aroma.affymetrix, so that the monocell files have to be created only
 once.

Actually, the monocell *CDF* is stored in the corresponding
annotationData/chipTypes/<chipType>/ directory.

What is stored in .Rcache/ is mainly there for performance purposes,
i.e. we use it for memoization [http://en.wikipedia.org/wiki/Memoization].
Moreover, we mostly use it for memoization of annotation data, because
that type of information is likely to be requested multiple times for
the same chip types regardless of data set.  In order for memoization
to work well across R sessions and hosts, the .Rcache/ directory needs
to be accessible globally.  We rarely use memoization for experimental
data, because that is typically only requested once (in the data set's
lifetime).
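
In case memoization is unfamiliar, here is a minimal in-memory sketch of the idea in base R.  The file-based cache under .Rcache/ follows the same principle but persists across sessions; the names memoized(), slowLookup() and the key below are hypothetical and not part of aroma.

```r
cache <- new.env()

# Return the cached value for 'key' if there is one; otherwise
# compute it with fcn(), store it, and return it
memoized <- function(key, fcn) {
  if (!exists(key, envir = cache, inherits = FALSE)) {
    assign(key, fcn(), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

# A stand-in for an expensive annotation lookup
nCalls <- 0
slowLookup <- function() {
  nCalls <<- nCalls + 1
  Sys.sleep(0.1)
  "chr16"
}

a <- memoized("someUnit", slowLookup)  # computed
b <- memoized("someUnit", slowLookup)  # served from the cache
print(nCalls)
```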

 However, for the temporary files please allow me to suggest that
 you create a temporary directory in your file structure, e.g.
 Prostate/tmp, where these files are stored. In my case this would
 definitely solve my problem since each subdirectory would contain its
 own temporary directory, e.g. Prostate/Prostate21/tmp. I do not know
 if this change would break any code or cause any problems, it is only
 a naive suggestion. What is your  opinion?

Your suggestion makes sense for dataset-specific temporary files etc.,
but again, I don't think that is the case here.  Instead I think we
are misunderstanding each other.  Your script will help.

/Henrik


 Best regards
 Christian


 On Jul 21, 6:46 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi Christian.

 On Wed, Jul 21, 2010 at 2:59 PM, cstratowa



 christian.strat...@vie.boehringer-ingelheim.com wrote:
  Dear Henrik,

  Thank you for this extensive explanation

[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-07-23 Thread Henrik Bengtsson
Hi.

On Jul 22, 10:24 am, cstratowa christian.strat...@vie.boehringer-
ingelheim.com wrote:
 Dear Henrik,

 Thank you very much for changing the code for getAverageFile(), I will
 try it and let you know.

 Thank you also for the explanation of writing to a temporary file, now
 I understand your intention.

 Regarding race conditions: No, I do not assume that aroma.* takes care
 of potential race conditions. Here is what I do:

 Assume that I have downloaded from GEO a prostate cancer dataset
 consisting of 40 CEL-files. Then I create a directory Prostate and
 subdirectories Prostate/annotationData and Prostate/rawData
 following your required file structure.

  However, starting with the 2nd CEL-file I create subdirectories
 Prostate/Prostate2,...,Prostate/Prostate40, each containing a
 symbolic link to ../annotationData and ../rawData from Prostate.

Do I understand you correctly that you use a separate project
directory for each CEL file, so that when you process the data you get
separate subdirectories probeData/ and plmData/ in each of these
project directories?

 Thus when running GLAD each cluster node has its own directory to
 write to, e.g. Prostate/Prostate21/reports for creating the images.

This is where I get lost.  In order to do CN segmentation (here GLAD),
you need to calculate CN ratios relative to a reference.  Looking at
your error message, that reference is calculated from the pool of
samples, i.e. getAverageFile() is done on the pool of references.
Thus, for this to make sense you need a *pool of samples*, but if I
understood you correctly above, you don't have that, but only one
array per project directory.  I guess I misunderstood you, because
your error indicates something else.

The only way the error you got occurred was because multiple R
sessions tried to run getAverageFile(ces) on data sets that contain
arrays with the same names and in the same order (more precisely
getNames(ces)).  If they would contain different array names, there
would be no clash, because that saveObject() statement (that I just
removed) would write to different filenames.  This makes me suspect
that you indeed use the same pool of reference samples.

 Only after all nodes have finished their computations, then I move the
 relevant files to the main directory, e.g. all images are moved to
 Prostate/reports. Afterwards I delete the subdirectories
 Prostate2,...,Prostate40 and their contents.

 As you can see, using this setup there should not be any race
 conditions. The only remaining problem are the temporary files which
 you store in .Rcache in my home directory.

So, there is something I don't understand above.  Can you post you
full script, because that would certainly remove some of the
ambiguities.

Also, it helps if change your script to be explicit about the
getAverageFile() calculation, i.e.

print(cesN);
ceR - getAverageFile(cesN);
print(ceR);
seg - GladModel(cesN, ceR);
print(seg);

instead of letting GladModel() do it implicitly:

seg - GladModel(cesN);
print(seg);

As explained above, if your parallelized R sessions calculate ceR -
getAverageFile(cesN) on the same 'cesN data set they will try to
generated the same 'ceR' result file, and you have a race condition.


 I know that you store the monocell files in .Rcache/
 aroma.affymetrix, so that the monocell files have to be created only
 once.

Actually, the monocell *CDF* is stored in the corresponding
annotationData/chipTypes/chipType/ directory.

What is stored in .Rcache/ is mainly there for performance purposes, i.e.
we use it for memoization [http://en.wikipedia.org/wiki/Memoization].
Moreover, we mostly use it for memoization of annotation data, because
that type of information is likely to be requested multiple times for
the same chip types regardless of data set.  In order for memoization
to work well across R sessions and hosts, the .Rcache/ directory needs
to be accessible globally.  We rarely use memoization for experimental
data, because that is typically only requested once (in the data set's
lifetime).
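
As a rough illustration of the memoization idea, here is a minimal
base-R sketch that caches the result of a slow function on disk, keyed
by its argument.  It is not how R.cache or aroma.affymetrix implement
their cache; memoizedCall(), slowAnnotationLookup() and the cache
directory are hypothetical names made up for this example.

```r
# Minimal file-based memoization sketch in base R.
# Assumption: this only illustrates the idea; it is NOT the code behind
# the .Rcache/ directory used by R.cache / aroma.affymetrix.

# Hypothetical cache directory (a real setup would use a shared location).
cacheDir <- file.path(tempdir(), "demoCache")
unlink(cacheDir, recursive=TRUE)  # start from an empty cache in this demo
dir.create(cacheDir, showWarnings=FALSE, recursive=TRUE)

# Counter so we can see how often the slow function really runs.
nbrOfCalls <- 0L

# Hypothetical stand-in for an expensive annotation lookup.
slowAnnotationLookup <- function(chipType) {
  nbrOfCalls <<- nbrOfCalls + 1L
  nchar(chipType)  # pretend this took a long time to compute
}

# Return the cached value if present; otherwise compute and store it.
memoizedCall <- function(key, fcn) {
  pathname <- file.path(cacheDir, sprintf("%s.rds", key))
  if (file.exists(pathname)) {
    return(readRDS(pathname))
  }
  value <- fcn(key)
  saveRDS(value, file=pathname)
  value
}

res1 <- memoizedCall("GenomeWideSNP_6", slowAnnotationLookup)  # computed
res2 <- memoizedCall("GenomeWideSNP_6", slowAnnotationLookup)  # from cache
```

The second call never reaches slowAnnotationLookup(), which is the whole
point: the key identifies the request, so any session that can see the
cache directory can reuse the stored result.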

 However, for the temporary files please allow me to suggest that
 you create a temporary directory in your file structure, e.g.
 Prostate/tmp, where these files are stored. In my case this would
 definitely solve my problem since each subdirectory would contain its
 own temporary directory, e.g. Prostate/Prostate21/tmp. I do not know
 if this change would break any code or cause any problems, it is only
 a naive suggestion. What is your  opinion?

Your suggestion makes sense for data-set-specific temporary files etc.,
but again, I don't think that is the case here.  Instead I think we
are misunderstanding each other.  Your script will help.

/Henrik


 Best regards
 Christian

 On Jul 21, 6:46 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:



  Hi Christian.

  On Wed, Jul 21, 2010 at 2:59 PM, cstratowa

  christian.strat...@vie.boehringer-ingelheim.com wrote:
   Dear Henrik,

   Thank you for this extensive explanation and sorry

[aroma.affymetrix] Re: Reference dataset for ACNE

2010-07-21 Thread Henrik Bengtsson
[sorry my repost did not contain my full reply due to a cut'n'paste
error.]

Hi Nicolas.

On Tue, Jul 20, 2010 at 11:42 AM, Nicolas Vergne
nicolas.vergne@gmail.com wrote:
 Hi everybody,

 I use ACNE for the normalization of SNP6.0 chip arrays.
 As ACNE is a multi-array method, I would like to know if there is an
 option to specify the reference dataset in the doACNE function?

You may be asking one of two things.  Either you want (a) to be able to
specify the subset of the arrays that you trust and wish to base the
ACNE model parameter estimates on, or you wish (b) to
estimate them from a separate reference (training) set.  Unfortunately,
the ACNE package does not support either of these yet.

For (a), I can only say that you have to rely on the robust estimators
of ACNE and the assumption that most arrays behave as normals at any
given SNP (it can be a different set of samples for each SNP).  For (b),
the best you can do for now is to include your training data set when
you fit ACNE.  If it is large enough, it will dominate the estimates.

As long as you do ACNE manually (i.e. not doACNE()):

 http://aroma-project.org/vignettes/ACNE

you can still do the CRMAv2 preprocessing part of ACNE separately for
the training data set.  It is only when you get to the NmfSnpPlm step
that you have to merge your test and training data sets, e.g.

csNRef <- ...  # Probe-normalized training data set
csN <- ...  # Probe-normalized test data set

# Append the training (reference) set to the test data set
csN <- append(csN, csNRef);

# And fit the ACNE probe summarization for the lot
plm <- NmfSnpPlm(csN, mergeStrands=TRUE);
...

and so on.

DETAILS:
In order to truly use external parameter estimates (priors), we need to
be able to specify that in the NmfSnpPlm setup.  Part of this
mechanism is already in place (generically in the aroma.affymetrix
framework), but not fully.  What is mainly missing is that the
internal low-level fitSnpNmf() of ACNE does not yet recognize/utilize
such prior estimates.  I cannot predict when I will be able to do this.
If you want to look at it yourself, I recommend getting it working
with fitSnpNmf() first.  There is an example in help(fitSnpNmfArray) that
could be adjusted for testing it.  When that is in place, it
shouldn't be that hard for me to update NmfSnpPlm and the wrapper
doACNE() accordingly.  That is for alternative (b), though alternative
(a) also needs to be implemented in fitSnpNmf().

 I would like to use the same sample for each new chip normalization. And
 I wouldn't like to use the dataset that I want to normalize. Is it a
 good way? My problem is to not reproduce the analysis for each new
 chip in the project.

This sounds like the (b) alternative: It is rather well known that
there are large lab and batch effects in Affymetrix data.
Preprocessing removes some of this but certainly not everything.
Others have observed this over and over.  Because of this, estimating
the ACNE model parameters on one data set from a different batch/lab
and using them to normalize another data set will work less well than if
the parameters were estimated from samples in the same batch.

Hope this helps (a bit).

/Henrik

On Jul 20, 11:42 am, Nicolas Vergne nicolas.vergne@gmail.com
wrote:
 Hi everybody,

 I use ACNE for the normalization of SNP6.0 chip arrays.
 As ACNE is a multi-array method, I would like to know if there is an
 option to specify the reference dataset in the doACNE function? I
 would like to use the same sample for each new chip normalization. And
 I wouldn't like to use the dataset that I want to normalize. Is it a
 good way? My problem is to not reproduce the analysis for each new
 chip in the project.

 Thanks in advance for your answers,

                              Nicolas

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


Re: [aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-07-21 Thread Henrik Bengtsson
Hi Christian.

On Wed, Jul 21, 2010 at 2:59 PM, cstratowa
christian.strat...@vie.boehringer-ingelheim.com wrote:
 Dear Henrik,

 Thank you for this extensive explanation and sorry for the late reply
 but I was pretty busy.

 Yes, it did work before! As I mentioned with versions
 aroma.affymetrix_1.1.0 and earlier I have never had a  problem doing
 the analyses on cluster nodes.

 Looking at the source code of different versions of saveObject() I
 realize that using saveObject(..,safe=FALSE) would be the same as
 using saveObject() from R.utils_0.9.1. Thus in principle this could
 solve my problem. Is this correct?

 Sadly, method AffymetrixCelSet::getAverageFile() in
 aroma.affymetrix_1.6.2 does not allow to pass parameter safe=FALSE
 to saveObject(). Is it possible for you to change it?

I have decided to remove the debug code that calls saveObject(),
because it is obsolete and not really needed anymore.  The intention of
that code snippet in getAverageFile() was never to protect against race
conditions (that was just an unplanned side effect).

Until next release, you can get a patched version as:

library(aroma.affymetrix);
downloadPackagePatch("aroma.affymetrix");

Note, as I said in my previous reply, by processing (=here calling
getAverageFile() on) the same data set on multiple hosts, you are
potentially running into race conditions resulting in corrupt data.
You should at least be aware of it and understand why this is the
case.


 It is still not clear to me why you create first a temporary file
 which you then rename (although you mention power failures etc).
 However, would it be possible to add a random number to the temporary
 filename, e.g. *.tmp.1948234, so that the problem with the existing
 temporary file could be avoided?

The main purpose of writing to a temporary file and then renaming is
to make sure that the file is complete.  If something happens while
writing the temporary file, the final file will not exist/be created.
If one wrote to the final file from the beginning, there would be no
way for us to know whether the file was correctly created or not.  So,
by writing via a temporary file, we effectively have a way of creating
files in one atomic action.
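
The write-then-rename pattern described above can be sketched in base R
roughly as follows.  This is only an illustration under my own
assumptions: saveAtomically() is a hypothetical helper, not the
saveObject() code from R.utils, and the file name is made up.

```r
# Sketch of "write to a temporary file, then rename" in base R.
# Assumption: saveAtomically() is a hypothetical helper written for this
# illustration; it is not the saveObject() implementation in R.utils.
saveAtomically <- function(object, pathname) {
  pathnameT <- sprintf("%s.tmp", pathname)

  # Refuse to touch a temporary file that another process may be writing.
  if (file.exists(pathnameT)) {
    stop("Temporary file already exists: ", pathnameT)
  }

  # All writing goes to the temporary file ...
  saveRDS(object, file=pathnameT)

  # ... and only a completely written file is given the final name.
  if (!file.rename(pathnameT, pathname)) {
    stop("Failed to rename temporary file to: ", pathname)
  }
  invisible(pathname)
}

pathname <- file.path(tempdir(), "average.rds")
saveAtomically(list(mean=1.5), pathname)
restored <- readRDS(pathname)
```

If the session dies while saveRDS() is running, only the *.tmp file is
left behind, so a reader never mistakes a half-written file for a
finished result.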


 Probably you only need to change line 59 to:

 pathnameT <- sprintf("%s.tmp.%i", pathname,
 as.integer(runif(1,1,)))

In order not to corrupt the temporary file, we check whether it already
exists, as protection against it being overwritten/appended to by another
process.  Yes, you could randomize the name of the temporary file,
lowering the risk of two hosts writing to the same temporary file.
However, when done, both hosts will try to rename their temporary
files to the same pathname.  If done at the same time, we may still
have problems.


 Regarding your suggestion to wrap getAverageFile() in Mutex calls I
 have no idea if there exists an R-package for this purpose. Neither
 Rmpi nor snow seem to be suitable for this purpose (at least  not
 without a complete re-write of my package).

No, I do not know of a functional mutex implementation in R either.  You
can achieve something similar by utilizing the lock mechanisms of database
servers (not SQLite), but nothing ready-made is available to my knowledge.

Again, you seem to assume that aroma.* takes care of potential race
conditions for you - it does not.  It only tries to detect them
without warranty - and indeed, the fact that you got the error in the
first place indicates that you are pushing the system and that race
conditions may very well happen.  If you run things in parallel and
you are updating/writing the *same data resource*, you should really
have protection against race conditions.  This is a generic problem
unrelated to aroma.*.
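
For what it is worth, one low-tech workaround that people use outside of
aroma.* is a lock directory.  The base-R sketch below assumes that
creating a directory is atomic, which is typically true on local file
systems but should be verified on network file systems before relying
on it.  acquireLock(), releaseLock() and withLock() are hypothetical
helpers written for this example, not functions from any package.

```r
# Sketch of a lock-directory "mutex" in base R.
# Assumptions: acquireLock(), releaseLock() and withLock() are hypothetical
# helpers, and the approach relies on directory creation being atomic
# (check this for your file system, especially over the network).

acquireLock <- function(lockDir, timeout=10, interval=0.1) {
  t0 <- Sys.time()
  # dir.create() returns FALSE if the directory already exists,
  # i.e. if another process currently holds the lock.
  while (!dir.create(lockDir, showWarnings=FALSE)) {
    if (as.numeric(difftime(Sys.time(), t0, units="secs")) > timeout) {
      return(FALSE)
    }
    Sys.sleep(interval)
  }
  TRUE
}

releaseLock <- function(lockDir) {
  unlink(lockDir, recursive=TRUE)
  invisible(!file.exists(lockDir))
}

# Evaluate 'expr' only while holding the lock; always release afterwards.
withLock <- function(lockDir, expr) {
  if (!acquireLock(lockDir)) {
    stop("Could not acquire lock: ", lockDir)
  }
  on.exit(releaseLock(lockDir))
  expr  # lazily evaluated here, i.e. after the lock has been acquired
}

lockDir <- file.path(tempdir(), "getAverageFile.lock")
releaseLock(lockDir)  # start from a clean state in this demo

result <- withLock(lockDir, {
  # Critical section: only one R session at a time gets here.
  lockedInside <- file.exists(lockDir)
  sum(1:10)
})
```

In a cluster job, the critical section would be the one step that writes
the shared result file, with the lock directory placed next to that file.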

/Henrik


 One other question:
 Is it allowed to delete the contents of directory .Rcache/
 aroma.affymetrix/idChecks?

Yes, it should be safe to delete any .Rcache/ as long as no R session
is in the process of writing to it.  It's a cache containing redundant
information.


 Best regards
 Christian


 On Jul 2, 12:47 am, Henrik Bengtsson h...@stat.berkeley.edu wrote:
 Hi Christian.

 On Tue, Jun 29, 2010 at 3:39 PM, cstratowa



 christian.strat...@vie.boehringer-ingelheim.com wrote:
  Dear Henrik,

  Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could
  run my analysis on our sge-cluster w/o any problems.

  Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and am
  currently testing with 8 chips whether my package based on
  aroma.affymetrix still works on the cluster. The normalization step on
  a server did run fine, however, distributing the 8 samples on the
  cluster to run GladModel() resulted in the problem that 3 of 8 cluster
  nodes did stop with the following error message:

  Loading required package: GLAD
  ...
  Loading required package: RColorBrewer
  Loading required package: Cairo
  Error in list(`computeCN(aroma, model = model, arrays = arrays[i],
  chromosomes = 1:23, ref

Re: [aroma.affymetrix] writeDataFrame in CRMAv2 and ACNE

2010-07-20 Thread Henrik Bengtsson
[reposting; the forum has hiccups and does not put my replies in the
archives or deliver to everyone.]

Hi Nicolas,

thanks for reporting this unwanted feature.  I've fixed it so that the
default filenames are *,total.txt and *,fracB.txt, respectively.  Until
the next release is available, you can install a patch as:

library(aroma.affymetrix);
downloadPackagePatch("aroma.core");

FYI, the writeDataFrame() method takes arguments 'filename' (and
'path'), allowing you to name the files whatever you wish.

/Henrik

On Tue, Jul 20, 2010 at 11:45 AM, Nicolas Vergne
nicolas.vergne@gmail.com wrote:
 Hi everybody,

 Just a little remark.

 I can not use writeDataFrame for CN and for BAF in the same script
 because txt file already exists. So I have to create two directories
 (or delete txt directory). Is there another solution?

 tags <- "ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY";
 chipType <- "GenomeWideSNP_6";
 ds1 <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType);
 dfTxt1 <- writeDataFrame(ds1, columns=c("unitName", "chromosome",
 "position", "*"));

 tags <- "ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY";
 chipType <- "GenomeWideSNP_6";
 ds2 <- AromaUnitFracBCnBinarySet$byName(dataSet, tags=tags,
 chipType=chipType);
 dfTxt2 <- writeDataFrame(ds2, columns=c("unitName", "chromosome",
 "position", "*"));

 Thank you in advance for your answers,

                                     Nicolas



[aroma.affymetrix] IGNORE: Mail test #1

2010-07-19 Thread Henrik Bengtsson
Hi, please ignore this message. I am trying to figure out why messages
that I have sent (group owner) yesterday have not been delivered to
the group and mailinglist archive.  /Henrik



Re: Exporting summarized signals to be used by Affymetrix GTC? (Was: Re: [aroma.affymetrix] Re: CRMA v2 errors)

2010-07-18 Thread Henrik Bengtsson
Hi Markus,

sorry but this one slipped through my net.

On Thu, Jun 17, 2010 at 5:27 PM, Smaug72 leber.mar...@gmx.de wrote:
 Dear Henrik,

 unfortunately we are faced with another problem.
 We processed several CEL files with CRMAv2 as described in the vignette:
 http://aroma-project.org/vignettes/CRMAv2
 We received no errors during the run.

 The problem is that we cannot process the CEL files, which are
 generated.

 The CEL-Files generated in step 2 (Normalization for nucleotide-
 position probe sequence effects) have a size of about 65,9 MB.
 These files can be loaded by GTC Software version 4.0, but they cannot
 be processed by further examination (e.g. QC-matrix calculation)

Yes, all *probe-level* preprocessing methods generate
CEL files of the same format/layout, which you can treat as if they were
raw CEL files.  You should be able to use these CEL files in, for
instance, the Affymetrix GTC software, dChip and so on.


 The CEL files generated in step 3 (Probe summarization) have a size of
 about 26,9 MB (decrease in size of about 39 MB).
 These CEL-Files cannot by loaded by GTC Software version 4.0.

Correct, because those CEL files are so-called chip-effect CEL
files (*,chipEffects.CEL), which are custom-made for (= only recognized by)
the aroma.affymetrix software.  They cannot be read by other software
(and you should not try to either).

It sounds like you wish to export the summarized CN signals from
aroma.affymetrix into the Affymetrix GTC software.  I don't know what
kind of data GTC can import, but you can export/write CN signals to
tab-delimited text files by:

# CRMA v2 vignette
cesN <- ... # from the PCR fragment-length normalization

# Generate platform-independent data sets
dsNList <- exportTotalAndFracB(cesN, drop=FALSE);

Then you can do:

writeDataFrame(dsNList$total, ...);

and (only if you used combineAlleles=FALSE):

writeDataFrame(dsNList$fracB, ...);

For more details on writeDataFrame() and what is written, see

   http://aroma-project.org/howtos/writeDataFrame

The data is (total,fracB) = (total signal, allele B fractions).  If
you want (thetaA,thetaB) or similar you need to fix that yourself
afterward.  Again, I don't know what kind of data Affymetrix GTC can
import, if any.
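
If you do need (thetaA,thetaB), a minimal base-R sketch of the
conversion could look like the following.  It assumes that total =
thetaA + thetaB and fracB = thetaB / total; please check that against
the writeDataFrame() how-to page before relying on it.  The function
name totalFracBToThetaAB() and the numbers are made up for illustration.

```r
# Sketch: convert (total, fracB) to (thetaA, thetaB) in base R.
# Assumption: total = thetaA + thetaB and fracB = thetaB / total.
# totalFracBToThetaAB() is a hypothetical helper, not part of aroma.*.
totalFracBToThetaAB <- function(total, fracB) {
  thetaB <- fracB * total
  thetaA <- total - thetaB
  data.frame(thetaA=thetaA, thetaB=thetaB)
}

# Toy values (made up for illustration, not real data)
total <- c(2000, 1500, 1800)
fracB <- c(0.0, 0.5, 1.0)
theta <- totalFracBToThetaAB(total, fracB)
print(theta)
```

Units without an allele B fraction (missing fracB) simply give missing
thetaA and thetaB with this sketch.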

BTW, if you use doCRMAv2() it will do/return 'dsNList' for you, cf.
http://aroma-project.org/blocks/doCRMAv2.   That may be more
convenient for you.

Hope this helps

Henrik


 Do you have experience with this error, or any explanations?

 Thank you and best regards,
 Markus



Re: [aroma.affymetrix] problem with CDF file

2010-07-12 Thread Henrik Bengtsson
Hi,

before continuing, do you have the latest version of aroma.affymetrix
(v1.6.0) installed, e.g. what does

library(aroma.affymetrix);
print(sessionInfo());

report?  You probably also want to update to R v2.11.1 (R v2.9.0 is rather old).

Second, the annotationData/ directory should be located in your working
directory, i.e.

print(getwd());

I doubt that 'C:/Program Files/R/R-2.9.0/library/aroma.affymetrix/' is
your working directory.  See thread 'Could not locate a file for this
chip type (Was: ...)' from August 27, 2009

  
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/c18f714638a6eb24/9b34427b16128ef3

for more troubleshooting tricks.
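
As a quick sanity check, the following base-R sketch prints where a CDF
would be expected relative to the current working directory.  The helper
expectedCdfPath() is hypothetical and only mirrors the
annotationData/chipTypes/<chipType>/ layout mentioned in this thread;
the actual lookup in aroma.affymetrix may consider more locations.

```r
# Sketch: where would a CDF be expected, relative to the working directory?
# Assumption: expectedCdfPath() is a hypothetical helper that only mirrors
# the annotationData/chipTypes/<chipType>/ layout; it is not the lookup
# code used by AffymetrixCdfFile$byChipType().
expectedCdfPath <- function(chipType, tags=NULL, root=getwd()) {
  # The full name is the chip type followed by comma-separated tags
  fullname <- paste(c(chipType, tags), collapse=",")
  file.path(root, "annotationData", "chipTypes", chipType,
            sprintf("%s.cdf", fullname))
}

pathname <- expectedCdfPath("MoGene-1_0-st-v1", tags="r3")
cat("Working directory:", getwd(), "\n")
cat("Expected CDF file:", pathname, "\n")
cat("File exists:", file.exists(pathname), "\n")
```

If the last line prints FALSE, either the working directory is not the
one you think it is, or the file is named or placed differently.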

/Henrik

On Tue, Jul 13, 2010 at 12:09 AM, Zsuzsa zsu...@gmail.com wrote:
 Hello Henrik,

 I am trying to use the the aroma.affymetrix package to check the
 quality of some mouse GeneST arrays.  I got stuck with the cdf file.
 I downloaded the unsupported cdf file from Affymetrix, placed it in
 annotationData/chipTypes/MoGene-1_0-st-v1 folder and the run the
 following commands:
 library(affxparser)
 convertCdf(filename = "C:/Program Files/R/R-2.9.0/library/aroma.affymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1.r3.cdf",
  outFilename = "C:/Program Files/R/R-2.9.0/library/aroma.affymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf")

 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 chipType <- "MoGene-1_0-st-v1"
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")

 The problem is I am getting the following error message:
 Error in list(`AffymetrixCdfFile$byChipType(chipType, tags = "r3")` =
 <environment>,  :

 [2010-07-12 12:42:56] Exception: Could not locate a file for this chip
 type: MoGene-1_0-st-v1,r3
  at throw(Exception(...))
  at throw.default("Could not locate a file for this chip type: ",
 paste(c(chipType, tags), collapse = ","))
  at throw("Could not locate a file for this chip type: ",
 paste(c(chipType, tags), collapse = ","))
  at method(static, ...)
  at AffymetrixCdfFile$byChipType(chipType, tags = "r3")

 I think the function is looking for the file someplace else, but I
 don't know where.  Would you be able to help me out on this.

 Thank you.
 Zsuzsa



Re: [aroma.affymetrix] ArrayExplorer: Error in readCelHeader(pathname)

2010-07-06 Thread Henrik Bengtsson
Hi Richard,

sorry for the delay - this one slipped through and I simply missed your
message, but I caught it while troubleshooting the same problem
reported in thread 'ArrayExplorer issue' started on 2010-07-02.

The reason for your problems is a bug in aroma.core/R.filesets that
causes hiccups when there are parentheses in the file/array names.
This will be solved in the next release of aroma.core and R.filesets.
Until that is available, please use the provided patches:

library(aroma.affymetrix);
downloadPackagePatch("R.filesets");
downloadPackagePatch("aroma.core");
downloadPackagePatch("aroma.affymetrix");

Let me know if this helps.

/Henrik

On Fri, May 14, 2010 at 6:17 PM, Richard Beyer rpbe...@gmail.com wrote:
 Hi All,

 I am having a new problem with code I've run many times in the past.
 I guess something changed and I was hoping someone else has seen
 something similar and could point me in the right direction.  I have
 affy rat ST chips (also same error with mouse ST).  I do the usual
 preprocessing (using Mark Robinson's doEverything script).  All is
 well as far as getting results, plotRle and plotNuse work fine. I have
 also executed the command in doEverything one at a time and seen no
 errors.

 The problem appears here:

 e.AndersonRat1 <-
 doEverything("AndersonRatST_10.03.12.all_probes_bg_qn",
 "RaGene-1_0-st-v1", getExpression=TRUE, doNorm=FALSE,
 doResiduals=TRUE)
  .
  .
  .
  .
 Calculating PLM residuals...done
 Warning message:
 In fitfcn(y) :
  Ignoring a unit group when fitting probe-level model, because it has
 a ridiculously large number of data points: 6515x50 > 5000x1
 plotRle(e.AndersonRat1$qam, main="RLE e.AndersonRat1$qam probe level QN")

 rs <- e.AndersonRat1$res
 ae <- ArrayExplorer(rs)
 setColorMaps(ae, c("log2","log2pos","rainbow"))
 process(ae, interleaved="auto")
 Error in readCelHeader(pathname) :
  Cannot read CEL file header. File not found: NA/NA
 In addition: Warning messages:
 1: In min(x) : no non-missing arguments to min; returning Inf
 2: In max(x) : no non-missing arguments to max; returning -Inf


 R version 2.11.0 (2010-04-22)
 x86_64-redhat-linux-gnu

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C
  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
               LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices datasets  utils     methods   base

 other attached packages:
  [1] preprocessCore_1.10.0  affyio_1.16.0          Biobase_2.8.0
   aroma.affymetrix_1.5.0 aroma.apd_0.1.7        affxparser_1.20.0
  [7] R.huge_0.2.0           aroma.core_1.5.0       aroma.light_1.15.1
   matrixStats_0.2.1      R.rsp_0.3.6            R.cache_0.3.0
 [13] R.filesets_0.8.1       digest_0.4.2           R.utils_1.4.0
   R.oo_1.7.2             affy_1.24.2            R.methodsS3_1.2.0

 loaded via a namespace (and not attached):
 [1] tools_2.11.0


 All of this code was working with R 2.10.0.

 There seems to be lots of CEL files in the right places.  For example:

     Pathname: 
 plmData/AndersonRatST_10.03.12.all_probes_bg_noqn,RBC,RMA/RaGene-1_0-st-v1/Anderson_PG50_042210_(RaGene-1_0-st-v1),residuals.CEL

 I'm not sure how best to track this down.  If anyone has a suggestion
 or pointer, I'd be very grateful..

 Thanks much,
 Dick
 ***
 Richard P. Beyer, Ph.D. University of Washington
 Tel.:(206) 616 7378     Env.  Occ. Health Sci. , Box 354695
 Fax: (206) 685 4696     4225 Roosevelt Way NE, # 100
                        Seattle, WA 98105-6099
 http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
 http://staff.washington.edu/~dbeyer
 ***



Re: [aroma.affymetrix] Error in using the extract() function

2010-07-06 Thread Henrik Bengtsson
Hi Johan,

On Wed, Jun 30, 2010 at 9:13 AM, Johan Staaf johan.st...@med.lu.se wrote:
 Dear Henrik,
 I get an error when trying to extract CRMAv2 processed data when using the
 extract() function like below.

 cesSamples <- extract(cesNList[[chipType]], assay.vector)

 The error occurs when the assay vector contains sample names with
 parentheses in them, like:
 WHOOP_p_STY30_(CO-108057)_Mapping250K_Sty_H01_107610

 However, there is no errors in the actual processing of the data, meaning
 that I get the file:
 WHOOP_p_STY30_(CO-108057)_Mapping250K_Sty_H01_107610,chipEffects.CEL

Correct.  When you use extract() to pull out a subset of the arrays,
extract() calls indexOf(), and the bug is in the latter.  Processing a
data set does not use indexOf() in your case, which is why you saw no
error there.

As you might have noticed from the discussion on the mailing list,
this problem is related to recent reports by others who also use
parentheses in their filenames.
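
The thread does not spell out what goes wrong inside indexOf().  One common
way parentheses break name lookups in R is when a name ends up being used as
a regular expression.  The base-R sketch below only illustrates that general
pitfall, under that assumption; it is not the R.filesets code.

```r
# Illustration only (base R); not the R.filesets implementation.
name <- "WHOOP_p_STY30_(CO-108057)_Mapping250K_Sty_H01_107610"
names <- c("SampleA", name)

# As a regular expression, "(" and ")" delimit a group, so the pattern
# does not match the literal parentheses in the name itself.
asRegexp <- grep(name, names)

# Matching the name as a fixed string finds it.
asFixed <- grep(name, names, fixed = TRUE)

print(asRegexp)
print(asFixed)
```

If this is indeed the mechanism, avoiding parentheses in file names is a
workaround until the patched packages are installed.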

I've solved the bug for the next release of aroma.core and R.filesets.
 Until that is available, please use the provided patches:

library(aroma.affymetrix);
downloadPackagePatch("R.filesets");
downloadPackagePatch("aroma.core");
downloadPackagePatch("aroma.affymetrix");

Let me know if this helps.

/Henrik



 Best regards
 Johan



Re: [aroma.affymetrix] Microsoft Visual C++ Runtime Library

2010-07-01 Thread Henrik Bengtsson
It sounds like one of your CEL files is corrupt.  Start out with the 5
CEL files that work and add the other CEL files one by one to the
directory to figure out which work and which do not.  Also, the CEL
files should be of roughly the same file size; if one differs
much, that is a likely clue that it may be corrupt.
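
The file-size check can be sketched in base R as follows.  The function name
flagOddSizes() and the 10% tolerance are hypothetical choices made for this
illustration, and a deviating size is only a hint, not proof of corruption.

```r
# Flag files whose size deviates from the median size by more than 'tol'
# (a fraction).  'flagOddSizes' and 'tol' are hypothetical names.
flagOddSizes <- function(pathnames, tol = 0.10) {
  sizes <- file.info(pathnames)$size
  relDiff <- abs(sizes - median(sizes)) / median(sizes)
  data.frame(file = basename(pathnames), size = sizes,
             suspicious = relDiff > tol, stringsAsFactors = FALSE)
}

# Usage (the directory below is a placeholder for your own data set):
# pathnames <- list.files("rawData/MyDataSet/Mapping250K_Nsp",
#                         pattern = "[.](cel|CEL)$", full.names = TRUE)
# print(flagOddSizes(pathnames))
```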

Details: Ideally a corrupt CEL file should not crash R, but rather
generate a nice error message.  Unfortunately, it is the low-level
Affymetrix Fusion SDK code that causes the crash, which is beyond the
control of R and aroma.affymetrix.

/Henrik

On Thu, Jul 1, 2010 at 11:36 PM, Liang Cheng vikingch...@gmail.com wrote:
 Thank you, Pierre,
 in the beginning:
 cs <- AffymetrixCelSet$byName(1,
 chipType="Mapping250K_Nsp")
 if I try to put 10 CEL files, a window will come out and it shows that:
  the application has requested the runtime to terminate it in an
 unusual way.

 But if there are 5 CEL files, it works well.
 I appreciate your help,
 Liang

 2010/7/1 Pierre Neuvial pie...@stat.berkeley.edu

 Thanks, and when do you get an error?  Can you paste the error message
 and the output of traceback()?

 Pierre

 On Thu, Jul 1, 2010 at 10:26 AM, Liang Cheng vikingch...@gmail.com
 wrote:
  Thank you, Pierre,
  the following is the sessionInfo and code:
 
  R version 2.11.0 (2010-04-22)
  i386-pc-mingw32
  locale:
  [1] LC_COLLATE=English_United States.1252
  [2] LC_CTYPE=English_United States.1252
  [3] LC_MONETARY=English_United States.1252
  [4] LC_NUMERIC=C
  [5] LC_TIME=English_United States.1252
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base
  other attached packages:
   [1] preprocessCore_1.10.0  aroma.affymetrix_1.6.0 aroma.apd_0.1.7
   [4] affxparser_1.20.0  R.huge_0.2.0   aroma.core_1.6.0
   [7] aroma.light_1.16.0 matrixStats_0.2.1  R.rsp_0.3.6
  [10] R.cache_0.3.0  R.filesets_0.8.2   digest_0.4.2
  [13] R.utils_1.4.2  R.oo_1.7.3 R.methodsS3_1.2.0
 
  Code:
 
  library(aroma.affymetrix)
  cs <- AffymetrixCelSet$byName(1,
  chipType="Mapping250K_Nsp")
  qn <- QuantileNormalization(cs)
  csQN <- process(qn, verbose=TRUE)
  plm <- RmaCnPlm(csQN, combineAlleles=TRUE, mergeStrands=TRUE)
  fit(plm, verbose=TRUE)
  ces <- getChipEffectSet(plm)
  exData <- extractDataFrame(ces, units=NULL, addNames=TRUE)
  write.table(exData, file="fileName.txt", row.names=FALSE)
 
  thank you very much,
 
  Liang
 
 
  2010/7/1 Pierre Neuvial pie...@stat.berkeley.edu
 
  Hi,
 
  Could you please report the output of sessionInfo() and traceback(),
  and post a complete code example ?
 
  Pierre
 
  On Tue, Jun 29, 2010 at 10:09 AM, Liang Cheng vikingch...@gmail.com
  wrote:
   Hello everyone,
   I get this error when I try to read 10 CEL files using
   AffymetrixCelSet:
  
   the application has requested the runtime to terminate it in an
   unusual way.
  
   Can someone help me?
   thanks a lot,
   Liang
  

Re: Where can I download the CDF file? (Was: Re: [aroma.affymetrix] problem with Arguments$getInstanceOf(dataSet, SnpChipEffectSet))

2010-07-01 Thread Henrik Bengtsson
Please see FAQ. 2007-05-24 on http://aroma-project.org/FAQ

/Henrik

On Fri, Jun 25, 2010 at 10:49 AM, Liang Cheng vikingch...@gmail.com wrote:
 Henrik,
 I found that if I want to read CEL files, I have to get the CDF files. Where
 can I get the ones mentioned in your slides, especially the 250K ones?
 thanks a lot,
 Viking

[snip]



Re: [aroma.affymetrix] Problem with GLAD on linux cluster

2010-07-01 Thread Henrik Bengtsson
Hi Christian.

On Tue, Jun 29, 2010 at 3:39 PM, cstratowa
christian.strat...@vie.boehringer-ingelheim.com wrote:
 Dear Henrik,

 Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could
 run my analysis on our sge-cluster w/o any problems.

 Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and am
 currently testing with 8 chips whether my package based on
 aroma.affymetrix still works on the cluster. The normalization step on
 a server ran fine; however, distributing the 8 samples on the
 cluster to run GladModel() resulted in 3 of the 8 cluster
 nodes stopping with the following error message:

 Loading required package: GLAD
 ...
 Loading required package: RColorBrewer
 Loading required package: Cairo
 Error in list(`computeCN(aroma, model = model, arrays = arrays[i],
 chromosomes = 1:23, ref` = environment,  :

 [2010-06-29 15:08:49] Exception: Cannot save to file. Temporary file
 already exists: ~/.Rcache/aroma.affymetrix/idChecks/
 a1c33926939ee43fbed83ae69301d215.tmp
  at throw(Exception(...))
  at throw.default(Cannot save to file. Temporary file already
 exists: , pathn
  at throw(Cannot save to file. Temporary file already exists: ,
 pathnameT)
  at saveObject.default(list(key = key, keyIds = lapply(key, digest2),
 id = id),
  at saveObject(list(key = key, keyIds = lapply(key, digest2), id =
 id), idPathn
  at getAverageFile.AffymetrixCelSet(ces, force = force, verbose =
 less(verbose)
  at NextMethod(generic = getAverageFile, object = this, indices =
 indices, ..
  at getAverageFile.ChipEffectSet(ces, force = force, verbose =
 less(verbose))
  at NextMethod(generic = getAverageFile, object = this, ...)
  at getAverageFile.SnpChipEffectSet(ces, force = force, verbose =
 less(verbose)
  at NextMethod(generic = getAverageFile, object = this, ...)
  at getAverageFile.CnChipEffectS
 Calls: computeCN ... saveObject.default - throw - throw.default -
 throw - throw.Exception
 Execution halted

 Interestingly, on the other 5 nodes GladModel() seems to run fine.

 Do you have any idea what the reason for this problem might be?

This seems to be due to a race condition: several processes call
getAverageFile() on the same data set (set of data files) at the same
time.  It has nothing to do with GladModel itself, which only calls
getAverageFile() in order to calculate the average signal across all
samples in the data set.

More precisely, in this particular case it is saveObject() of R.utils
that detects that there already exists a temporary file (with the added
filename extension *.tmp) that is currently being created and written to
by another process.  This temporary file is renamed to its final name
when done.  The reason why you didn't observe it before is most likely
that this feature was only added to saveObject() in R.utils
v1.2.4:

Version: 1.2.4 [2009-10-30]
o ROBUSTIFICATION: Lowered the risk for saveObject() to leave an
  incomplete file due to say power failures etc.  This is done by
  first writing to a temporary file, which is then renamed.  If the
  temporary file already exists, an exception is thrown.

OK, those are the details explaining the error message and the
traceback you report.

So, did it work before?  Did you get valid estimates?  Probably,
because the way getAverageFile() is written makes it unlikely that a
corrupt result file is created.  What is certain is that the calculations
were done multiple times if there were race conditions.

As a disclaimer: I try to write methods so that they work even when
there are race conditions.  However, as you've noticed, I am also very
conservative, that is, I would rather detect the race condition and
throw an exception than silently ignore it.  The plan is to loosen this
up in the future.  I mention this so that you understand my current
design decisions/plans.

I have to think about this particular case, because I think I could
loosen up getAverageFile() a bit.  However, at the moment it is better
if you take care of the race conditions yourself.  Assume your current
code looks something like this:

fln <- FragmentLengthNormalization(ces);
cesN <- process(fln);
seg <- GladModel(cesN);
process(seg);

Then first you should know that the latter two lines are
computationally identical to [it is only slightly more complicated if
you use chip type pairs]:

ceR <- getAverageFile(cesN);
seg <- GladModel(cesN, ceR);
process(seg);

So, if you can synchronize the averaging by (conceptually only):

mutex <- waitForMutex("foo");
ceR <- getAverageFile(cesN);
releaseMutex(mutex);

then it should all be fine.  Replace waitForMutex()/releaseMutex()
with your favorite synchronization mechanism.  FYI, if there were
a cross-platform, bulletproof and generic synchronization mechanism in
R, I would add synchronization internally to lots of methods.
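
One simple mechanism that could stand in for waitForMutex()/releaseMutex()
is a lock directory on a file system shared by all nodes.  The sketch below
is base R only; acquireLock(), releaseLock() and withLock() are hypothetical
helper names, and it assumes that creating a directory is atomic on the
shared file system, which should be verified for your cluster setup.

```r
# Hypothetical helpers; a sketch, not part of aroma.affymetrix or R.utils.

# Try to create the lock directory until it succeeds or 'timeout' (seconds)
# has passed.  dir.create() returns FALSE if the directory already exists.
acquireLock <- function(lockDir, timeout = 60, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (dir.create(lockDir, showWarnings = FALSE)) return(invisible(TRUE))
    if (Sys.time() > deadline) stop("Timed out waiting for lock: ", lockDir)
    Sys.sleep(interval)
  }
}

# Remove the lock directory so that other processes can proceed.
releaseLock <- function(lockDir) {
  unlink(lockDir, recursive = TRUE)
  invisible(TRUE)
}

# Evaluate 'expr' while holding the lock; the lock is released even if
# 'expr' throws an error.  'expr' is evaluated lazily, after the lock is held.
withLock <- function(lockDir, expr, ...) {
  acquireLock(lockDir, ...)
  on.exit(releaseLock(lockDir), add = TRUE)
  force(expr)
}

# Conceptual usage with the code above (path is a placeholder):
# ceR <- withLock("/shared/locks/getAverageFile.lock", getAverageFile(cesN))
```

A process that crashes while holding the lock leaves the directory behind,
so stale locks have to be removed by hand (or the timeout handled) in this
simple form.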

Hope this helps(?)

Henrik


 sessionInfo()
 R version 2.11.1 (2010-05-31)
 x86_64-unknown-linux-gnu

 locale:
 [1] C

 attached base packages:
 [1] stats     

Re: [aroma.affymetrix] Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis

2010-06-23 Thread Henrik Bengtsson
Hi Johan,

this certainly looks like a computational hiccup.  I have never seen it,
though I can imagine various ways it could happen.  Instead of
guessing: do you have a complete script that you ran, or did you type
the commands on the command line one by one?  You say you did
the analysis according to the tutorial for 10-500K analysis
(CRMAv1), but the steps you describe are from the CRMAv2 method.

And most importantly, because I think the answer lies here: how
did you extract the summarized data and how did you generate the plot?
How did you calculate your reference signals?  One of my guesses is
that the upper CN band, which looks correct, is for the
non-polymorphic CN loci, whereas the lower one, which is shifted, is
from SNP signals.  It could be that you are somehow only plotting
allele-specific CA and/or CB signals and not C=CA+CB for SNPs.
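
To see why plotting a single allele would produce a shifted band, here is a
small arithmetic sketch in base R.  The numbers are toy values chosen for
illustration (a copy-neutral heterozygous SNP against a diploid reference),
not data from this thread.

```r
# Toy values for illustration only; not data from this thread.
# A copy-neutral heterozygous (AB) SNP: one copy of each allele.
CA <- 1
CB <- 1
Cref <- 2                      # total copy number of the reference

C <- CA + CB                   # total copy number for the SNP
M_total <- log2(C / Cref)      # log2 ratio when the total is used
M_alleleA <- log2(CA / Cref)   # log2 ratio when only allele A is used

print(c(total = M_total, alleleA = M_alleleA))
```

With these toy values the total signal sits on the copy-neutral level, while
the allele-A-only signal is one log2 unit lower.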

So, if you send all the commands verbatim, I can let you know what
needs to be changed.

/Henrik


On Wed, Jun 23, 2010 at 9:02 AM, Johan Staaf johan.st...@med.lu.se wrote:
 Hi Henrik
 I have a question about strange looking genomic profiles for Affymetrix 250K
 Sty2 chips from GSE14994 after CRMAv2 analysis (please see attached png
 figures). Processing was done using calibration for allelic crosstalk,
 normalization for nucleotide-position probe sequence effects, probe
 summarization, and pcr fragment normalization according to the tutorial for
 10-500K analysis. CN was obtained by comparison to normal samples also
 obtained from public repositories and processed simultaneously.

 Do you know of the cause of this, and how it could be corrected?

 Best regards
 Johan



Re: [aroma.affymetrix] problem with Arguments$getInstanceOf(dataSet, SnpChipEffectSet)

2010-06-23 Thread Henrik Bengtsson
Hi.

On Wed, Jun 23, 2010 at 11:04 AM, mortiz mortiz...@gmail.com wrote:
 Hi everyone,

 I'm trying to develop a function based on FragmentLengthNormalization,
 but when I try to execute my new function it gives me the following error
 message:

 Error in process.NormalRegions(normalReg, verbose = verbose) :
  attempt to apply non-function

 sessionInfo()
 R version 2.9.2 (2009-08-24)
 i386-pc-mingw32

It's really time to update your R installation.


 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.
 1252;LC_MONETARY=English_United States.
 1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 base

 other attached packages:
  [1] sfit_0.1.8             setRNG_2009.11-1
 MASS_7.2-48            aroma.affymetrix_1.3.0 aroma.apd_0.1.7
 affxparser_1.16.0      R.huge_0.2.0           aroma.core_1.3.1
  [9] aroma.light_1.15.1     matrixStats_0.1.8
 R.rsp_0.3.6            R.filesets_0.6.5       digest_0.4.1
 R.cache_0.2.0          R.utils_1.2.4          R.oo_1.6.7
 [17] R.methodsS3_1.0.3

 loaded via a namespace (and not attached):
 [1] tools_2.9.2

 traceback()
 2: process.NormalRegions(prueba, verbose = verbose)
 1: process(prueba, verbose = verbose)

 The thing is that if I try FragmentLengthNormalization it does not
 give any problem, but if I do
 source("I:/aroma/FragmentLengthNormalization.R")

 then when I execute FragmentLengthNormalization on my ces variable
 it gives the same error

 str(ces)
 Classes 'CnChipEffectSet', 'SnpChipEffectSet', 'ChipEffectSet',
 'ParameterCelSet', 'AffymetrixCelSet', 'AffymetrixFileSet',
 'AromaPlatformInterface', 'AromaMicroarrayDataSet',
 'GenericDataFileSet', 'FullNameInterface', 'Object'  atomic [1:1] NA
  ..- attr(*, .env)=environment: 0x0419c000
  ..- attr(*, ...instantiationTime)= POSIXct[1:1], format:
 2010-06-23 10:28:05

 can anyone help me with this???

I cannot see how you would get an error because you source() the
FragmentLengthNormalization.R source file.  That is really weird, though
there is probably a very simple explanation.

If you can indeed reproduce this error by sourcing
FragmentLengthNormalization.R, then let's focus on that first and
forget about your new code.  Please provide a complete script showing
what you are doing.  Also, you can always do:

debug(process.FragmentLengthNormalization);
cesN <- process(fln, verbose=-50);

so that you can step through it to figure out exactly at which
statement the error occurs.  Make sure to start out from a fresh R
session.  If FLN doesn't give an error, you can move on to debug()
your own method(s).

Hope this helps

/Henrik


 thanks!! :)

 maria o.



Re: [aroma.affymetrix] Residual plot vertical separation

2010-06-22 Thread Henrik Bengtsson
Hi Mamun,

On Fri, Jun 18, 2010 at 12:25 PM, Mamun Rashid mamunbabu2...@gmail.com wrote:

 Hi everyone,
 I am analysing some Affymetrix exon array data. I have been doing some
 quality checks of the data. I have plotted the residuals from the PLM fit
 of the raw intensity data.

 I am using the core CDF file HuEx-1_0-st-v2,core,A20071112,EP

  chipType <- "HuEx-1_0-st-v2"
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="core,A20071112,EP")

I'm not sure where/when you downloaded this CDF file, because its
name has the 'core' tag rather than the 'coreR3' tag of the file that is
available at

http://aroma-project.org/chipTypes/HuEx-1_0-st-v2/transcriptClustersCDFs

You probably have the same file as
HuEx-1_0-st-v2,coreR3,A20071112,EP.CDF.  Compare the checksum you get
with the one below.

 cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2,coreR3,A20071112,EP");
 cdf
AffymetrixCdfFile:
Path: annotationData/chipTypes/HuEx-1_0-st-v2
Filename: HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf
Filesize: 38.25MB
Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 2560x2560
Number of cells: 6553600
Number of units: 18708
Cells per unit: 350.31
Number of QC units: 1
 getChecksum(cdf);
[1] "e7b0bacd27699534d125b16266d7cc09"

If the checksums are identical, the file content is identical.
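
The same comparison can be done outside aroma.affymetrix with base R.  The
sketch below assumes that the checksum shown above is an MD5 sum of the file
contents (its 32 hexadecimal digits are consistent with that, but the thread
does not say so); md5Equal() is a hypothetical helper name.

```r
# Base-R sketch: compare two files by their MD5 checksums.
# 'md5Equal' is a hypothetical helper, not an aroma.affymetrix function.
md5Equal <- function(pathnameA, pathnameB) {
  sums <- tools::md5sum(c(pathnameA, pathnameB))
  unname(sums[1] == sums[2])
}

# Small demonstration with temporary files standing in for CDF files.
f1 <- tempfile(fileext = ".cdf")
f2 <- tempfile(fileext = ".cdf")
f3 <- tempfile(fileext = ".cdf")
writeLines("same content", f1)
writeLines("same content", f2)
writeLines("other content", f3)

print(md5Equal(f1, f2))
print(md5Equal(f1, f3))
```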


  exp_name <- "Affy-Exon"
  cs <- AffymetrixCelSet$byName(exp_name, cdf=cdf)


 ## *** Background Correction *** ##

  bc <- RmaBackgroundCorrection(cs, tag="core")
  csBC <- process(bc, verbose=verbose)    ## Background-corrected raw data


 ## *** Summarization with PLM *** ##
  plmTr <- ExonRmaPlm(csBC, mergeGroups=TRUE)
  fit(plmTr, verbose=verbose)

 ## *** Residual calculation of PLM fit *** ##
  rs <- calculateResiduals(plmTr, verbose=verbose)

 # To browse spatial false-colored images of the residuals

  ae <- ArrayExplorer(rs)
  setColorMaps(ae, c("log2,log2neg,rainbow", "log2,log2pos,rainbow"))
  process(ae, interleaved="auto", verbose=verbose)
  display(ae)

 Now I see a clear vertical separation between most of the residual plots.

I am not 100% sure what you mean by "a clear vertical separation", but
I guess you mean that there is a narrow white-ish band near the
left-right center of the array.

This is the likely reason: The PLM

plmTr <- ExonRmaPlm(csBC, mergeGroups=TRUE);

is fitted using only the PM probes, and residuals are therefore
only defined for those probes.  For all other probes (e.g. MMs but
also all the PMs not included in the CDF) there are no residuals
defined (these show up as white in your plots).  Next, if you look at
the distribution of the PMs *as defined by* the CDF
(HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf), there are fewer PMs in that
center band than on the rest of the array, which means you will see
fewer residuals in that band as well.  There are also fewer PMs
in the left part compared with the right part of the array, which is
probably the reason why the right part seems to be darker than the
left part (when you look at the residual plots).

The spatial distribution of PMs *according to the CDF* can be studied
as follows:

library(aroma.affymetrix);
downloadPackagePatch("aroma.core");
verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP");

# Get a spatial image showing the PMs (as defined by the CDF)
# NOTE: The first time you do this for a new CDF, this will be very
# slow (~20 mins) because internally extractDataFrame(cdf) is called.
# After that, it'll be fast.
img <- getImage(cdf, field="isPm", verbose=verbose);
img <- 1-img; # PM=1 (white) -> PM=0 (black).

# Write to file...
pathname <- sprintf("%s,isPm.png", getFilename(cdf));
EBImage::writeImage(img, file=pathname);

# This image is available at:
# http://www.aroma-project.org/images/public/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf,isPm.png
# By just looking at it, you can see it is darker (more PMs) in
# the right part than the left.  You also see the much lower
# density of PMs in the narrow band in the middle.

# Average density of PMs in left and right parts
cols <- 1:(nbrOfColumns(cdf)/2);
imgL <- img[cols,];  # sic! - the 'img' is rotated
imgR <- img[-cols,];
print(mean(imgL == 0));
## [1] 0.1414371
print(mean(imgR == 0));
## [1] 0.1888806
print(mean(imgR == 0) / mean(imgL == 0));
## [1] 1.335439

Thus, about 19% of the cells in the right part are PMs and 14% in the
left part (as defined by the HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf),
which means there are 33.5% more PMs in the right part than in the left part.
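
The density comparison itself needs nothing beyond base R once a 0/1 mask is
available.  The sketch below uses a made-up 4x4 mask (0 marks a PM cell, as
in 'img' above after inversion) purely to show the computation; it does not
use real CDF data.

```r
# Toy mask for illustration: 0 marks a PM cell, 1 any other cell.
# The values are made up and do not come from a real CDF.
img <- matrix(1, nrow = 4, ncol = 4)
img[1:2, 1] <- 0        # 2 PM cells among the first two rows
img[3:4, 1:2] <- 0      # 4 PM cells among the last two rows

half <- 1:(nrow(img) / 2)
densA <- mean(img[half, ] == 0)    # fraction of PM cells, first half
densB <- mean(img[-half, ] == 0)   # fraction of PM cells, second half
ratio <- densB / densA

print(c(densA = densA, densB = densB, ratio = ratio))
```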

 Some of the plots revealed some artefacts and scratches which might
 have occurred due to hybridization and scanning problems.

Almost all artifacts show up in the residual plots, because an
artifact often affects only one of the probes in a probeset (since
Affymetrix designed the arrays so that probes in a probeset are spread
out on the array to avoid all being affected).  Since it only affects
one of the probes, its residual will be an outlier 

Re: [aroma.affymetrix] How to convert a xxx.CEL file to a yyy.txt file?

2010-06-22 Thread Henrik Bengtsson
On Wed, Jun 23, 2010 at 7:29 AM, Liang Cheng vikingch...@gmail.com wrote:
 I want to read the data in it and then process it.
 So: the method to read it and all kinds of methods to process it.

I am sorry, but that is still an extremely vague specification of what
you are going to do.

The best I can tell you is to have a look at the various vignettes
online - http://www.aroma-project.org/ - to get a feel for the vast
number of alternatives you have.  You probably also want to look at
the various Bioconductor packages supporting Affymetrix data -
http://www.bioconductor.org/.

/Henrik


 thank you

 2010/6/23 Henrik Bengtsson h...@stat.berkeley.edu

 On Wed, Jun 23, 2010 at 7:00 AM, Liang Cheng vikingch...@gmail.com
 wrote:
  Thank you, Henrik
 
  So there is no function from aroma package, which can deal with the
  xx.cel
  file?

 Please explain what you mean by "deal with".   There are hundreds of
 methods in the aroma.affymetrix package that process CEL files in
 various ways.

 /Henrik

 
  Viking
 
  2010/6/22 Henrik Bengtsson h...@stat.berkeley.edu
 
  Hi Viking,
 
  On Tue, Jun 22, 2010 at 6:28 PM, Viking vikingch...@gmail.com wrote:
   I am new to aroma-project. I didn't find some materials to learn how
   to do it.
  
   Can somebody please help me? Thanks a lot.
 
  although you are new here, please allow me to use a bit of sarcasm,
  because then I can award you the prize for 'Posting The Least Precise
  Question' on the list during the last four years ;)
 
  More seriously, what are you trying to do?  A CEL file contains (avg.
  intensity, std.dev. of intensity, nbr of pixels) for each probe.  In
  addition to this there is some information about probes flagged as
  outliers by the Affymetrix scanner/image analysis software.  You can
  find more information in the help pages of the affxparser package.
  There is no information about probesets, genes etc.  Just so you
  know.
 
  The easiest way to access all the information in a CEL file is probably
  to use the low-level affxparser package and its readCel() function, e.g.
 
  pathname <- "0001-7,10K,15-08-2006.CEL";
  data <- readCel(pathname, readXY=TRUE, readIntensities=TRUE,
  readStdvs=TRUE, readPixels=TRUE);
 
  List of 8
   $ header     :List of 14
   ..$ filename      : chr C:/Users/hb/Documents/My
  Data/rawData/Jeremy_2007-10k
  /Mapping10K_Xba142/0001-7,10K,15-08-2006.CEL
   ..$ version       : int 4
   ..$ cols          : int 658
   ..$ rows          : int 658
   ..$ total         : int 432964
   ..$ algorithm     : chr Percentile
   ..$ parameters    : chr
  Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierL
 
 
  ow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:5;FullFeatureH|
  __t
  runcated__
   ..$ chiptype      : chr Mapping10K_Xba142
   ..$ header        : chr
  Cols=658\nRows=658\nTotalX=658\nTotalY=658\nOffsetX=0
  \nOffsetY=0\nGridCornerUL=235 130\nGridCornerUR=3603
  136\nGridCornerLR=359| __t
  runcated__
   ..$ datheader     : chr [25..29720]  0001-7 10K 15-08-2006:CLS=3715
  RWS=3715
  XIN=1  YIN=1  VE=30        2.0 08/15/06 10:02:13 50206820  M10   \024
   \02| __t
  runcated__
   ..$ librarypackage: chr 
   ..$ cellmargin    : int 2
   ..$ noutliers     : int 1527
   ..$ nmasked       : int 0
   $ x          : int [1:432964] 0 1 2 3 4 5 6 7 8 9 ...
   $ y          : int [1:432964] 0 0 0 0 0 0 0 0 0 0 ...
   $ intensities: num [1:432964] 271 16038 282 17471 138 ...
   $ stdvs      : num [1:432964] 34.9 2321.7 36 3107.4 16.9 ...
   $ pixels     : int [1:432964] 9 9 9 9 9 9 9 9 9 9 ...
   $ outliers   : int [1:1527] 272 307 345 360 486 624 952 1019 1037 1155
  ...
   $ masked     : NULL
 
  See help(readCel, package=affxparser) for more information.
 
  Then you can write whichever fields you like to file.
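
Writing selected fields to a text file can be sketched in base R as below.
To keep the example runnable without a CEL file, it builds a small list with
the same per-cell fields as in the output above, using the first three values
shown there; the output file name and the tab-delimited layout are choices
made for this illustration.

```r
# A list shaped like part of the readCel() output shown above; the three
# values per field are the first ones printed in that output.
data <- list(
  x = c(0L, 1L, 2L),
  y = c(0L, 0L, 0L),
  intensities = c(271, 16038, 282),
  stdvs = c(34.9, 2321.7, 36),
  pixels = c(9L, 9L, 9L)
)

# Pick the per-cell fields and write them as a tab-delimited text file.
fields <- c("x", "y", "intensities", "stdvs", "pixels")
df <- as.data.frame(data[fields])
outfile <- file.path(tempdir(), "0001-7,10K,15-08-2006.txt")
write.table(df, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE)
```

With a real CEL file, 'data' would instead come from the readCel() call
above; fields such as 'header' and 'outliers' have a different length and
would need to be written separately.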
 
  Hope this helps
 
  Henrik
 
  
   Viking
  

Re: [aroma.affymetrix] an error on locating cdf

2010-06-21 Thread Henrik Bengtsson
Hi.

On Mon, Jun 21, 2010 at 2:26 PM, Albyn albyn.dhun...@gmail.com wrote:
 Dear all,

 I am new to both R and aroma.affymetrix. I have 25 SNP6.0 CEL files
 and I have to find LOH and UPD. I would like to use
 aroma.affymetrix; could anybody tell me whether it is a good idea to use
 this package?

You can do lots of different types of preprocessing in
aroma.affymetrix, particularly CRMAv2.  You can do total CN
segmentation, e.g. CBS.  To do LOH and UPD analysis, you'll need
parent-specific CN (PSCN) analysis, e.g. PSCN segmentation.  We are
still working on getting PSCN analysis/segmentation into a standard
pipeline form, and that will take time.  However, you can still
generate raw PSCN signals and use those to look for/confirm LOH regions,
e.g. by plotting the allele B fraction along the genome.  See for instance,

  http://aroma-project.org/vignettes/tumorboost-highlevel
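
As a reminder of what is being plotted: the allele B fraction of a SNP is
commonly computed as the allele B signal divided by the sum of the two
allele signals.  The base-R sketch below uses made-up allele signals for
three SNPs; the variable names are chosen for this illustration.

```r
# Toy allele-specific signals for three SNPs (made-up values):
# a heterozygous (AB), a homozygous AA and a homozygous BB SNP.
thetaA <- c(1.0, 2.0, 0.0)
thetaB <- c(1.0, 0.0, 2.0)

# Allele B fraction per SNP.
beta <- thetaB / (thetaA + thetaB)
print(beta)
```

In such a plot, heterozygous SNPs cluster around 1/2 in regions without
allelic imbalance, so a region where that middle band is missing or split is
a candidate LOH region.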

I haven't used it in a while, but some people also use dChip for this.


 I am trying to install aroma.affymetrix and following the steps given
 on vignette.

Could you please be more precise with which vignette you are looking at?

 I got stuck on my second step.

 cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
 Error in list(`AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags =
 Full)` = environment,  :

 [2010-06-21 14:12:57] Exception: Could not locate a file for this chip
 type: GenomeWideSNP_6,Full
  at throw(Exception(...))
  at throw.default(Could not locate a file for this chip type: ,
 paste(c(chipType, tags), collapse = ,))
  at throw(Could not locate a file for this chip type: ,
 paste(c(chipType, tags), collapse = ,))
  at method(static, ...)
  at AffymetrixCdfFile$byChipType(GenomeWideSNP_6, tags = Full)

 Could anyone suggest what might have gone wrong here?

Please read the Setup instructions at

   http://aroma-project.org/setup/annotationData

The CRMAv2 vignette for processing GenomeWideSNP_6 arrays should also
be rather clear about this;

  http://aroma-project.org/vignettes/CRMAv2
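
A quick way to check the setup is to build the expected location of the CDF
and test whether the file is there.  The sketch below is base R only and
assumes the annotationData/chipTypes/<chipType>/ layout described on the
setup page, relative to the current working directory; expectedCdfPath() is
a hypothetical helper and the lowercase ".cdf" extension is an assumption.

```r
# Hypothetical helper: where a CDF with the given chip type and tags is
# expected to live, relative to the working directory.
expectedCdfPath <- function(chipType, tags = NULL, root = "annotationData") {
  fullname <- paste(c(chipType, tags), collapse = ",")
  file.path(root, "chipTypes", chipType, paste0(fullname, ".cdf"))
}

pathname <- expectedCdfPath("GenomeWideSNP_6", tags = "Full")
print(pathname)
print(getwd())                # the directory the path is relative to
print(file.exists(pathname))  # FALSE suggests the file is missing or misplaced
```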

Does this help?

/Henrik

 /yogesh

 Here is my session Info:
 sessionInfo()
 R version 2.11.0 (2010-04-22)
 i386-apple-darwin9.8.0

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 base

 other attached packages:
  [1] genomewidesnp6Crlmm_1.0.2 aroma.affymetrix_1.6.0
 aroma.apd_0.1.7           affxparser_1.20.0
 R.huge_0.2.0              aroma.core_1.6.0
  [7] aroma.light_1.16.0        matrixStats_0.2.1
 R.rsp_0.3.6               R.filesets_0.8.2
 digest_0.4.2              R.cache_0.3.0
 [13] R.utils_1.4.2             R.oo_1.7.3
 R.methodsS3_1.2.0

 loaded via a namespace (and not attached):
 [1] tools_2.11.0



Re: [aroma.affymetrix] Problem with Total Copy Number Vignette

2010-06-07 Thread Henrik Bengtsson
Hi Jack.

On Mon, Jun 7, 2010 at 3:36 PM, Jack Yu j.yu...@gmail.com wrote:
 Hello,

 I sent an e-mail earlier regarding errors in running the total copy
 number vignette, but please disregard that as it turns out it was just
 an issue with the annotation files. Sorry for the inconveniences.

Good.  I've just sent a message to that thread/discussion ('Error
during Total copy number analysis using CRMA v1', June 4, 2010)
closing it.  That way anyone reading the forum archives can see it was
solved.  In the future, please always try to reply to the original thread
saying whether it's been solved or not.

 However, I've encountered another problem that I'm hoping someone
 could help me with.

 After running the normalization of the chip effects using:

 cesNList[[chipType]] <- process(fln, verbose=verbose)

 I encountered the error of:

 Error in list(`process(fln, verbose = verbose)` = <environment>,
 `process.FragmentLengthNormalization(fln, verbose = verbose)` =
 <environment>,  :

 [2010-06-07 09:34:07] Exception: Cannot fit target function to enzyme,
 because there are no (finite) data points that are unique to this
 enzyme: 1

Are you following the same vignette ('Total copy number analysis using
CRMA v1 (10K, 100K, 500K)') as you did in your previous thread?  Then my
best guess is that you forgot to do:

fit(plm, verbose=verbose);

before moving on to the PCR fragment length normalization.  If that
does not help, let me know the output of:

ces <- getChipEffectSet(plm);
print(ces);


More importantly, is there a reason why you want to use CRMAv1 and not
CRMAv2?  Note that the latter is recommended for GenomeWideSNP_6 data
sets.  To use CRMAv2, see vignette 'Estimation of total copy numbers
using the CRMA v2 method (10K-GWS6)'
[http://aroma-project.org/vignettes/CRMAv2].

...or even easier, just use the new doCRMAv1() or doCRMAv2(), cf.

  http://aroma-project.org/blocks

/Henrik


 sessionInfo()
 R version 2.11.1 (2010-05-31)
 powerpc-apple-darwin9.8.0

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.6.0 aroma.apd_0.1.7        affxparser_1.20.0
   R.huge_0.2.0           aroma.core_1.6.0
  [6] aroma.light_1.16.0     matrixStats_0.2.1      R.rsp_0.3.6
   R.cache_0.3.0          R.filesets_0.8.1
 [11] digest_0.4.2           R.utils_1.4.0          R.oo_1.7.2
   R.methodsS3_1.2.0

 traceback()
 8: throw.Exception(Exception(...))
 7: throw(Exception(...))
 6: throw.default("Cannot fit target function to enzyme, because there
 are no (finite) data points that are unique to this enzyme: ",
       ee)
 5: throw("Cannot fit target function to enzyme, because there are no
 (finite) data points that are unique to this enzyme: ",
       ee)
 4: getTargetFunctions.FragmentLengthNormalization(this, verbose = 
 less(verbose))
 3: getTargetFunctions(this, verbose = less(verbose))
 2: process.FragmentLengthNormalization(fln, verbose = verbose)
 1: process(fln, verbose = verbose)

 Thanks in advance,
 Jack

 --
 Jack Y. Yu
 Washington University in St.Louis
 (505) 920-0701



Re: [aroma.affymetrix] re: base pair normalization in CRMAv2

2010-06-07 Thread Henrik Bengtsson
On Mon, Jun 7, 2010 at 4:45 PM, seth redmond
seth.redm...@imperial.ac.uk wrote:
 sadly it appears my CDF is square:
 print(cdf)
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/Ag_SNP_1m520721
 Filename: Ag_SNP_1m520721.CDF
 Filesize: 243.65MB
 Chip type: Ag_SNP_1m520721
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 2560x2560
 Number of cells: 6553600
 Number of units: 404170
 Cells per unit: 16.21
 Number of QC units: 4

Ok, then that is ruled out.


 So the script to construct the ACS just fills in the sequences based on
 position, it doesn't map to probe names?

There is no such thing as probe names, only probeset/unit names.
The only unique/safe way to refer to a cell (probe) on an array is
by its (x,y) coordinate.

Which cells (probes) belong to which units (probesets) is defined in
the CDF file.  The CDF may get updated over time, or custom CDFs may be
used.  Thus cells may belong to different units, depending on the CDF
used.  In contrast, the cells, the cell sequences, and their (x,y)
locations never change.

This is the reason why one only wants to use (x,y) coordinates to infer
the probe sequences.

 I guess it could be worth
 hand-checking a few to see if they match up.

Yes, you want to do something like this:

library(aroma.affymetrix);
chipType <- "GenomeWideSNP_6";
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="Full");
acs <- AromaCellSequenceFile$byChipType(chipType);

ugc <- getUnitGroupCellMap(cdf, units=1002:1003);
str(ugc);

Classes 'UnitGroupCellMap' and 'data.frame':   12 obs. of  3 variables:
 $ unit : int  1002 1002 1002 1002 1002 1002 1003 1003 1003 1003 ...
 $ group: int  1 1 1 2 2 2 1 1 1 2 ...
 $ cell : int  640620 52 6150942 640619 51 6150941 ...

seqs <- readSequences(acs, cells=ugc$cell);
str(seqs);
 chr [1:12] "AAGCCTTTCTTACCTCCAAATGTTG" ...

ugcs <- cbind(ugc, sequence=seqs);
print(ugcs);

   unit group    cell                  sequence
1  1002     1  640620 AAGCCTTTCTTACCTCCAAATGTTG
2  1002     1      52 AAGCCTTTCTTACCTCCAAATGTTG
3  1002     1 6150942 AAGCCTTTCTTACCTCCAAATGTTG
4  1002     2  640619 AAGCCTTTCTTACCTCTAAATGTTG
5  1002     2      51 AAGCCTTTCTTACCTCTAAATGTTG
6  1002     2 6150941 AAGCCTTTCTTACCTCTAAATGTTG
7  1003     1 6212010 ATTCAGTAGGTCTGGTGAAATCTCA
8  1003     1 1346300 ATTCAGTAGGTCTGGTGAAATCTCA
9  1003     1 2406114 ATTCAGTAGGTCTGGTGAAATCTCA
10 1003     2 6212009 ATTCAGTAGGTCTAGTGAAATCTCA
11 1003     2 1346299 ATTCAGTAGGTCTAGTGAAATCTCA
12 1003     2 2406113 ATTCAGTAGGTCTAGTGAAATCTCA
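
One way to hand-check such output is to confirm that the allele A and
allele B probes of a unit differ at exactly one position.  The base-R
sketch below does this for the two sequence pairs printed above; the
helper mismatchPositions() is made up for illustration and is not part
of aroma.affymetrix.

```r
# Allele A / allele B probe sequences copied from the print(ugcs) output
# above (units 1002 and 1003).
alleleA <- c("AAGCCTTTCTTACCTCCAAATGTTG", "ATTCAGTAGGTCTGGTGAAATCTCA")
alleleB <- c("AAGCCTTTCTTACCTCTAAATGTTG", "ATTCAGTAGGTCTAGTGAAATCTCA")

# Hypothetical helper: 1-based positions at which two equally long
# sequences differ.
mismatchPositions <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  which(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# One mismatch position per unit is what a correct ACS file should give
pos <- mapply(mismatchPositions, alleleA, alleleB, USE.NAMES=FALSE)
print(pos)
```

With a mis-indexed ACS file one would expect many mismatches per pair,
or sequences that look unrelated.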

I'd still like to return to what I said in my first reply:

 It is hard to tell what is happening and even if something goes wrong
 - try to zoom out a bit so you see most of the data cloud when
 plotting the signals after ACC.  It looks like the data is zoomed in
 to the lower quantiles.

So, if you redo those plots after ACC with a greater xlim, e.g.
xlim=5*xlim, it might not look that bad after all.

/Henrik






Re: [aroma.affymetrix] Error in sort.list(pairsToBuild) during AllelicCrosstalkCalibration

2010-06-07 Thread Henrik Bengtsson
Hi.

On Mon, Jun 7, 2010 at 4:41 PM, k o ott4...@gmail.com wrote:
 Dear user group,

 while processing data from the 5000k Sty array (but not the Nsp set),
 I receive the following error:

 acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")
 csC <- process(acc)
 Error in sort.list(pairsToBuild) :
  'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?
 Calls: process ... groupBySnpNucleotides.AromaCellSequenceFile -> sort
 -> sort.list
 In addition: Warning messages:
 1: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) :
  object 'idxs' not found
 2: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) :
  object 'seqsPP' not found
 3: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) :
  object 'positions' not found
 4: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) :
  object 'cellsPP' not found
 5: In rm(idxs, seqsPP, positions, cellsPP, snpPosition, cells, pos) :
  object 'snpPosition' not found
 Execution halted

 Any ideas what could be the cause?

Yes, it looks like you are not using a correct ACS file
(Mapping250K_Sty,.acs / 170,393,859 bytes).   Where did you get yours
from? The one (Mapping250K_Sty,HB20080710.acs) you can download from
http://aroma-project.org/chipTypes/Mapping250K_Nsp-and-Mapping250K_Sty
is of size 170,394,014 bytes (after gunzip).
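
A quick way to compare a local file against the size quoted above is
sketched below in base R.  The helper hasExpectedSize() and the local
path are assumptions for illustration only; adjust the path to wherever
your ACS file actually lives.

```r
# Hypothetical helper: TRUE only if the file exists and has exactly the
# expected number of bytes.
hasExpectedSize <- function(pathname, expectedBytes) {
  isTRUE(file.info(pathname)$size == expectedBytes)
}

# Assumed local location of the downloaded (gunzipped) ACS file
pathname <- file.path("annotationData", "chipTypes", "Mapping250K_Sty",
                      "Mapping250K_Sty,HB20080710.acs")
print(hasExpectedSize(pathname, 170394014))
```

A matching size does not prove the file is correct, but a mismatch is a
cheap hint that a different or truncated file is being picked up.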

Also, are you following one of the vignettes online, or some other
documentation?  The reason why I ask is because you are using dChip
annotation data files, i.e.

DChipGenomeInformation:
Pathname: annotationData/chipTypes/Mapping250K_Sty/Mapping500K genome
info hg17.txt

DChipSnpInformation:
Pathname: annotationData/chipTypes/Mapping250K_Sty/Mapping250K_Sty snp info.txt

It's been several years since we updated the documentation and moved
to use so called UGP and UFL annotation data files instead. We are no
longer using dChip annotation files in the aroma project. You can
download the UGP and UFL files for your chip type from:

http://aroma-project.org/chipTypes/Mapping250K_Nsp-and-Mapping250K_Sty

You certainly want to get the above correct before you start analyzing
your 5,000 arrays.

Also, I strongly recommend that you update to aroma.affymetrix v1.6.0.
Just reinstall according to http://aroma-project.org/install (already
installed/up-to-date packages will be skipped).  You may also want to
update to R v2.11.1; your R v2.10.1 is 6 months old (and no longer
supported by R/Bioconductor officials; though aroma.affymetrix works
with it).

Hope this helps

/Henrik


 Thank you
 Karl-Heinz


 Session log:
 R version 2.10.1 (2009-12-14)
 Copyright (C) 2009 The R Foundation for Statistical Computing
 ISBN 3-900051-07-0

 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and
 'citation()' on how to cite R or R packages in publications.

 Type 'demo()' for some demos, 'help()' for on-line help, or
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.

 Attempting to load the environment 'package:R.utils'
 Loading required package: R.oo
 Loading required package: R.methodsS3
 R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3 for 
 help.
 Loading required package: utils
 R.oo v1.7.2 (2010-04-13) successfully loaded. See ?R.oo for help.
 R.utils v1.4.0 (2010-03-24) successfully loaded. See ?R.utils for help.
 [Previously saved workspace restored]

 library(aroma.affymetrix)
 Loading required package: R.filesets
 Loading required package: digest
 R.filesets v0.8.1 (2010-04-22) successfully loaded. See ?R.filesets for help.
 Loading required package: aroma.core
 Loading required package: R.cache
 R.cache v0.3.0 (2010-03-13) successfully loaded. See ?R.cache for help.
 Loading required package: R.rsp
 R.rsp v0.3.6 (2009-09-16) successfully loaded. See ?R.rsp for help.
  Type browseRsp() to open the RSP main menu in your browser.
 Loading required package: matrixStats
 matrixStats v0.2.1 (2010-04-05) successfully loaded. See ?matrixStats for 
 help.
 Loading required package: aroma.light
 aroma.light v1.15.1 (2009-11-01) successfully loaded. See ?aroma.light for 
 help.
 aroma.core v1.5.0 (2010-02-22) successfully loaded. See ?aroma.core for help.
 Loading required package: aroma.apd
 Loading required package: R.huge
 R.huge v0.2.0 (2009-10-16) successfully loaded. See ?R.huge for help.
 Loading required package: affxparser
 aroma.apd v0.1.7 (2009-10-16) successfully loaded. See ?aroma.apd for help.
 aroma.affymetrix v1.5.0 (2010-02-22) successfully loaded. See
 ?aroma.affymetrix for help.
 log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 options(digits=4) # Don't display too many decimals.

 chiptype='Mapping250K_Sty'
 project <- "haprefTrio"

 cdf <-

Re: [aroma.affymetrix] re: base pair normalization in CRMAv2

2010-06-05 Thread Henrik Bengtsson
Hi.

On Thu, Jun 3, 2010 at 4:59 PM, seth redmond
seth.redm...@imperial.ac.uk wrote:
 yes, this is a custom chip. The code used to create the ACS file [...]

See comments below; if your chip type does not have the same number of
probe rows as probe columns, there is an error causing you to get
incorrect sequences.  Is your CDF square or rectangular?  FYI, it
helps me/us help you if you report as much as possible when you give
issue reports, e.g. print(cdf).

 and run the acc is below, as far as I remember it's pretty standard.

Yes running the ACC is standard, but requires the correct
probe-sequence files, since that is what is used to infer the probe
pairs for allele pairs.

 Head of the input file is also included.

You seem to have forgotten to send/paste this one.

 So even if the seq files were completely wrong I wouldn't necessarily expect
 to see this degree of wrongness?

Again, from the plots along I was convinced something was wrong, but only from t

 Is there any way to skip the ACC step altogether?

Yes, the input and the output of ACC are standard CEL sets.  That is,
you can just pass the ACC's input set to the downstream step instead
of the output set, e.g. csC <- csR (replacing the ACC step).





 db <- TabularTextFile("Ag_SNP_1m520721.ACS_input_file.txt", path=path);
 print(db);
 colClassPattern <- c("^Probe (X|Y)"="integer",
 "^(Probe Sequence|Target Strandedness)$"="character");
 df <- readDataFrame(db, colClassPattern=colClassPattern);
 cells <- affy::xy2indices(x=df[["Probe X"]], y=df[["Probe Y"]],
 nr=nbrOfRows(cdf));

Whoops, my bad.  The example on how-to page 'How to: Create an Aroma
Cell Sequence (ACS) file' [http://aroma-project.org/node/100] should
read:

cells <- affy::xy2indices(x=df[["Probe X"]], y=df[["Probe Y"]],
                          nr=nbrOfColumns(cdf));

The example was only correct for chip types with square dimensions,
i.e. nbrOfRows(cdf) == nbrOfColumns(cdf).  When I wrote the example, I
was misled into believing that 'nr' was the number of rows.
This is the only place where it was wrong; all internal code of
aroma.affymetrix uses:

x <- df[["Probe X"]];
y <- df[["Probe Y"]];
cells <- nbrOfColumns(cdf) * y + x + 1L;

I have updated the online example to use the above code instead.  (If
you insist on using affy::xy2indices() you should also add an explicit
xy.offset=0, because the default is not safe; see the code of
affy::xy2indices).

In conclusion, if your CDF is rectangular, then you need to recreate your ACS file.
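
To see why the distinction only matters for rectangular chips, here is
a small base-R sketch of the cell-index formula quoted above, applied to
a made-up 4-column by 3-row layout.  The helper xyToCell() and the toy
dimensions are assumptions for illustration; the (x,y) coordinates are
taken to be zero-based.

```r
# Cell index from zero-based (x, y) coordinates: cells are numbered row
# by row, so the multiplier must be the number of columns.
xyToCell <- function(x, y, nbrOfColumns) {
  as.integer(nbrOfColumns * y + x + 1L)
}

# A made-up rectangular layout: 4 columns by 3 rows
nbrOfColumns <- 4L
nbrOfRows <- 3L
x <- c(0L, 3L, 1L)
y <- c(0L, 0L, 2L)

print(xyToCell(x, y, nbrOfColumns))  # correct: uses the number of columns
print(xyToCell(x, y, nbrOfRows))     # wrong multiplier on a rectangular chip
```

The two calls agree for cells in the first row (y = 0) and diverge from
the second row on, which is why a square CDF never exposed the problem.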

 seqs <- df[["Probe Sequence"]];
 strands <- df[["Target Strandedness"]];
 rm(df);
 acs <- AromaCellSequenceFile$allocateFromCdf(cdf);
 updateSequences(acs, cells=cells, seqs=seqs, verbose=-10);
 updateTargetStrands(acs, cells=cells, strands=strands, verbose=-10);
 footer <- readFooter(acs);
 footer$srcFile <- list(filename=getFilename(db), checksum=getChecksum(db));
 footer$createdBy <- list(name="Seth Redmond",
 email="seth.redm...@imperial.ac.uk");
 writeFooter(acs, footer);

Other than the above calculation of 'cells', this all looks correct.

Hope this helps

/Henrik

 ...
 acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2");
 csC <- process(acc, verbose=-10);
 plotAllelePairs(acc, array=array, pairs=1:6, what="input", xlim=1.5*xlim);

 On 28 May 2010, at 17:59, Henrik Bengtsson wrote:

 Hi.

 On Thu, May 27, 2010 at 6:17 PM, seth redmond
 seth.redm...@imperial.ac.uk wrote:

 I've been working through the CRMAv2 vignette here:

        http://www.aroma-project.org/vignettes/CRMAv2

 Have you been following it exactly, or have you done modifications?
 It always helps to show the code you are doing.

 And though I am getting CNV calls that make some kind of sense, the
 crosstalk calibration looks quite amazingly far from OK (before and after
 graphs attached).

 Clearly I have a problem here, but it's hard to start figuring out where.
 The probe sequence files were constructed from some config files, so there
 may be missing tag sequences or similar, but as far as I can see the
 sequences do seem to be matching up to the correct probes. So is there
 anywhere else I could look?

 What chip type is this?  Is it a custom SNP chip?

 Are all arrays like this, or is this an exceptionally bad one?  Is
 there anyone you are satisfied with?

 It is hard to tell what is happening and even if something goes wrong
 - try to zoom out a bit so you see most of the data cloud when
 plotting the signals after ACC.  It looks like the data is zoomed in
 to the lower quantiles.

 Also, ACC only corrects for global offset and global crosstalk (for
 each of the six possible nucleotide pairs); it will not magically give
 cleaner genotype clouds/arms.  Some of the offset is definitely
 corrected for.

 If it is a custom chip type and you get the probe sequences wrong,
 then the six groups of nucleotide pairs will be wrong, which will
 give suboptimal correction, but probably not totally wrong.

 /Henrik


 -s




Re: [aroma.affymetrix] Re: Mouse diversity array --building the required files for aroma.affymetrix UGP, UFL

2010-06-03 Thread Henrik Bengtsson
On Thu, Jun 3, 2010 at 12:21 PM, Ivanek, Robert robert.iva...@fmi.ch wrote:
 Hi Henrik,

 I think you are right, the fragment sizes are theoretical ones. I
 would guess that the reason why also the long fragments are reported
 is because the same SNP is present in short fragment produced by the
 other enzyme.

 Thank you very much for the patch.

 Would you mind to update the MOUSEDIVm520650 chipType page and add
 there the UGP and UFL files?

Ideally users contribute UGP and UFL files too, though this time I've
done it since I've already done most of the work.  Please compare with
what you got when you built yours.

/Henrik


 Best Regards

 Robert



Re: [aroma.affymetrix] Re: Mouse diversity array --building the required files for aroma.affymetrix UGP, UFL

2010-06-02 Thread Henrik Bengtsson
Hi.

On Wed, Jun 2, 2010 at 11:16 AM, Ivanek, Robert robert.iva...@fmi.ch wrote:
 HI Henrik,

 I investigated the error a little and found out that some
 of the fragments reported in NetAffx files are really long.
 Why did they get a negative value of -32768 and not a positive one?

Thanks for reporting.  It turns out to be a bug in aroma.core causing
it to censor values into [-32767,32768], whereas it should have been
[-32768,32767].  Thus, the fragment lengths that are too large were
written as 32768, which when read back became -32768 (that's how
signed integers wrap around when out of range).  They should have
been written as 32767.
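
The effect can be reproduced with plain arithmetic.  The base-R sketch
below is only a simulation, not the aroma.core code: censor() clips
values to a range and asInt16() mimics storing a number as a signed
16-bit integer (two's complement).  Both helper names are made up; the
two large example lengths are taken from the warning messages further
down.

```r
# Clip values into [lo, hi]
censor <- function(x, lo, hi) pmin(pmax(x, lo), hi)

# Simulate storage as a signed 16-bit integer: values outside
# [-32768, 32767] wrap around modulo 2^16.
asInt16 <- function(x) ((x + 32768) %% 65536) - 32768

len <- c(614, 35102, 655381)  # fragment lengths; the last two are too large

# Off-by-one range: 32768 survives censoring and then wraps to -32768
buggy <- asInt16(censor(len, lo=-32767, hi=32768))
# Correct range: too-large values are stored as 32767
fixed <- asInt16(censor(len, lo=-32768, hi=32767))

print(buggy)
print(fixed)
```

This matches the minimum of -32768 reported by summary(ufl) before the
patch.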

I have fixed this in the next release of aroma.core.  Until that is
released, you can install a patch as explained in:

 http://aroma-project.org/howtos/updateOrPatch

With the patch, you will get correct censoring and more informative
warnings, e.g.

Warning messages:
1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr,  :
  33 values to be assigned were out of range [-32768,32767] and
therefore censored to fit the range. Of these, 33 values in
[35102,655381] were too large.
2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = hdr,  :
  21 values to be assigned were out of range [-32768,32767] and
therefore censored to fit the range. Of these, 21 values in
[50496,56758] were too large.

About the very large fragment lengths:  My guess is that they are
theoretical fragment lengths.  After running the PCR in the assay,
very long fragments are not amplified and hence filtered out.  For the
specific enzyme, you should not get any hybridization signal for very
long fragments.  It is possible that you have signal from the cuts of
the other enzyme.   Maybe someone else has a better explanation of why
they are so long and still on the array?   You could also drop a
message on the Affymetrix forums and ask.

/Henrik


 Robert

 On Jun 1, 7:16 pm, Ivanek, Robert robert.iva...@fmi.ch wrote:
 Hi Henrik,

 Thanks for the answer and also the ACS file.
 I have one more question regarding the UFL file generation.

 I tried it by using the NetAffx files and I got the following warnings:

 R> ufl <- AromaUflFile$allocateFromCdf(cdf, nbrOfEnzymes=2,
 tags=c("na30", "RI20100601"))
 R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".na30");
 R> units <- importFrom(ufl, csv);
 Warning messages:
 1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
 hdr,  :
   Values to be assigned were out of range [-32767,32768] and therefore
 censored to fit the range.
 2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
 hdr,  :
   Values to be assigned were out of range [-32767,32768] and therefore
 censored to fit the range.

 R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType,
 tags=".cn.na30");
 R> units <- importFrom(ufl, csv);
 Warning messages:
 1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
 hdr,  :
   Values to be assigned were out of range [-32767,32768] and therefore
 censored to fit the range.
 2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
 hdr,  :
   Values to be assigned were out of range [-32767,32768] and therefore
 censored to fit the range.

 And the summary produces the following:
 R> summary(ufl)
  length           length.02
  Min.   :-32768   Min.   :-32768
  1st Qu.:   614   1st Qu.:   541
  Median :  1146   Median :   997
  Mean   :  1601   Mean   :  1466
  3rd Qu.:  2195   3rd Qu.:  2000
  Max.   : 22095   Max.   : 30002
  NA's   :230775   NA's   :230775

 Would you be so kind as to also build the UFL and UGP files?

 Best Regards

 Robert


Re: [aroma.affymetrix] Mouse diversity array --building the required files for aroma.affymetrix UGP, UFL

2010-05-30 Thread Henrik Bengtsson
Hi.

On Wed, May 26, 2010 at 3:24 PM, Ivanek, Robert robert.iva...@fmi.ch wrote:
 Dear Sir or Madam,

 I would like to analyse the copy number variation data from Affymetrix
 Mouse Diversity Array. I have not found any information on your website
 about this particular array.

I have created a page for this:

http://aroma-project.org/chipTypes/MOUSEDIVm520650

 I have tried to build the annotation files
 which are required by aroma but without success. I have few questions
 regarding that:

 1: Is aroma.affymetrix able to analyse the Mouse Diversity Array ?

Yes, there should be no reason why it shouldn't - it uses a
standard CDF etc.  As you've noted, UGP (and UFL) files have not been
created by anyone yet.

For CN analysis, at least the UGP (genome positions) annotation data
file needs to be there.


 2: I tried to build the UGP file directly from NetAffx annotation
 files using the code on your website, however I am getting the following
 error.

 ##
 library(aroma.affymetrix)
 ##
 ## create UGP from NetAffx files
 cdf <- AffymetrixCdfFile$byChipType("MOUSEDIVm520650")
 ##
 ## Creates an empty UGP file for the CDF, if missing.
 ugp <- AromaUgpFile$allocateFromCdf(cdf, tags=c("na30", "RI20100526"))
 ##
 ## Import NetAffx unit position data
 csv <- AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650",
 tags=".na30")

 Error in list(`AffymetrixNetAffxCsvFile$byChipType(MOUSEDIVm520650,
 tags = .na30)` = environment,  :

 [2010-05-26 15:11:00] Exception: File format error of the tabular file
 ('annotationData/chipTypes/MOUSEDIVm520650/NetAffx/MOUSEDIVm520650.na30.annot.csv'):
  \
 line 1 did not have 12 elements
  at throw(Exception(...))
  at throw.default(File format error of the tabular file (',
 getPathname(this), '): , ex$message)
  at throw(File format error of the tabular file (',
 getPathname(this), '): , ex$message)
  at value[[3]](cond)
  at tryCatchOne(expr, names, parentenv, handlers[[1]])
  at tryCatchList(expr, classes, parentenv, handlers)
  at tryCatch({
  at verify.TabularTextFile(this, ...)
  at verify(this, ...)
  at this(...)
  at newInstance.Class(clazz, ...)
  at newInstance(clazz, ...)
  at newInstance.Object(static, pathname)
  at newInstance(static, pathname)
  at method(static, ...)
  at AffymetrixNetAffxCsvFile$byChipType(MOUSEDIVm520650, tags =
 .na30)
 In addition: Warning message:
 In read.table(3L, header = TRUE, colClasses = c(NA_character_,
 NA_character_,  :
  not all columns named in 'colClasses' exist

I had a look at the MOUSEDIVm520650.na30.annot.csv file.  The line
containing the column names, that is:

Probe Set ID,dbSNP RS ID,Chromosome,Physical
Position,Strand,Cytoband,Allele A,Allele B,Associated
Gene,Genetic Map,Fragment Enzyme Type Length Start Stop,

contains a trailing comma (,) that shouldn't be there (a file format
error).  This causes R to think there should be 12 and not 11 columns
in the data set.  Open the file in an editor and remove that trailing
comma and any whitespace after "Fragment Enzyme Type Length Start
Stop".  Then save the file.  That should solve the problem.
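
If you prefer not to edit the file by hand, the same fix can be
scripted.  Below is a minimal sketch in base R.  Note that
fixTrailingComma() is a hypothetical helper (it is not part of
aroma.affymetrix), and it assumes that the column header is the first
line that does not start with '#'.  It overwrites the file, so it is
safest to run it on a copy.

```r
# Hypothetical helper (not part of aroma.affymetrix): drops a stray
# trailing comma, and any whitespace after it, from the header line.
fixTrailingComma <- function(pathname, commentChar="#") {
  lines <- readLines(pathname, warn=FALSE);

  # Assumption: the header is the first line that is not a comment
  headerIdx <- which(!startsWith(lines, commentChar))[1];
  if (is.na(headerIdx)) {
    stop("No header line found: ", pathname);
  }

  lines[headerIdx] <- sub(",[[:space:]]*$", "", lines[headerIdx]);
  writeLines(lines, con=pathname);
  invisible(lines[headerIdx]);
}

# Toy example on a made-up three-line file
pathname <- tempfile(fileext=".csv");
writeLines(c(
  "# a comment line",
  "Probe Set ID,Chromosome,Physical Position, ",
  "SNP_1,1,3000"
), con=pathname);
header <- fixTrailingComma(pathname);
print(header);
```

Only the header line is modified; all other lines are written back
unchanged.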

The other CSV file - MOUSEDIVm520650.cn.na30.annot.csv - does not have
this problem.



 3. I tried it also by using the manual approach using the
 tab=delimited file, however it seems to me that the mitochondria probes
 are skipped  (NA values in ugp[,1] but valid values in ugp[,2]).

The Affymetrix NetAffx CSV files use 'M' for the mitochondrial
chromosome.  In aroma we encode this by the integer 25.
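
When importing positions manually, the chromosome labels therefore have
to be mapped to integers before they are stored.  Here is a minimal
sketch in base R.  chrToInt() is a hypothetical helper, and while
M = 25 is as stated above, the values X = 23 and Y = 24 are assumed
here and should be verified before use.

```r
# Hypothetical helper: maps chromosome labels to integer codes.
# M = 25 follows the text above; X = 23 and Y = 24 are assumptions.
chrToInt <- function(chr) {
  chr <- toupper(as.character(chr));
  map <- c(X=23L, Y=24L, M=25L);

  # Plain numbers convert directly; everything else becomes NA for now
  res <- suppressWarnings(as.integer(chr));

  # Fill in the special labels
  special <- chr %in% names(map);
  res[special] <- map[chr[special]];
  res;
}

print(chrToInt(c("1", "19", "X", "Y", "M", "---")));
```

Labels that are neither numbers nor in the map (such as "---" in the
example) are returned as NA, which makes it easy to spot units whose
chromosome was not recognized.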

 Another
 problem is that some positions for other chromosomes are not loaded in
 properly (valid values in ugp[,1] but NA values in ugp[,2]).

You don't show how you read the data manually, so it is hard to say
what you are doing wrong here.  But note that there are quite a few
arguments in read.table() that you need to set correctly in order to
read Affymetrix NetAffx CSV files (it doesn't make it easier that
Affymetrix changes the file format once in a while and leaves stray
erroneous symbols such as the above comma).
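
As an illustration of the kind of read.table() arguments involved, here
is a sketch that reads a tiny made-up stand-in for a NetAffx file.  The
file content is invented, and the settings (comment lines starting with
'#', double-quoted fields, '---' for missing values) are assumptions
about the format that you should check against your actual file.

```r
# Write a tiny stand-in for a NetAffx annotation file (made-up content)
pathname <- tempfile(fileext=".annot.csv");
writeLines(c(
  "#%hypothetical-header-comment=example",
  "\"Probe Set ID\",\"Chromosome\",\"Physical Position\"",
  "\"SNP_1\",\"1\",\"3000\"",
  "\"SNP_2\",\"M\",\"16000\""
), con=pathname);

data <- read.table(pathname,
  sep=",",                 # comma-separated fields
  header=TRUE,             # first non-comment line holds the column names
  quote="\"",              # fields are double-quoted
  comment.char="#",        # skip the leading comment lines
  na.strings="---",        # assumed marker for missing values
  colClasses="character",  # read everything as strings; convert afterwards
  check.names=FALSE,       # keep names such as "Physical Position" as-is
  stringsAsFactors=FALSE
);

chromosome <- data[["Chromosome"]];
position <- as.integer(data[["Physical Position"]]);
str(data);
```

Reading all columns as character strings and converting them
afterwards makes it easier to see which values fail to convert, which
is one way to track down NAs in the chromosome or position column.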

Also, search our forum for 'MOUSEDIVm520650', because about a year ago
David Rosenberg discussed this chip type and I think he created
various annotation data files for it.  This was before the
chip type was publicly announced by Affymetrix.

/Henrik



 Here is the sessionInfo:

 R version 2.11.0 (2010-04-22)
 x86_64-unknown-linux-gnu

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C
  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.6.0 aroma.apd_0.1.7        affxparser_1.20.0
 R.huge_0.2.0           aroma.core_1.6.0       aroma.light_1.16.0
  [7] matrixStats_0.2.1      R.rsp_0.3.6            R.cache_0.3.0
 R.filesets_0.8.1       digest_0.4.2   

Re: [aroma.affymetrix] Suggestions for multiple processing jobs of the same platform (Affymetrix SNP chips)

2010-05-18 Thread Henrik Bengtsson
Hi.

On Tue, May 18, 2010 at 5:39 AM, Tae-Hoon Chung hoontaech...@gmail.com wrote:
 Hi, All;

 I have a simple question: What's the best way of performing multiple
 processing jobs of the same platform (Affymetrix SNP chips)?

 My concerns are as follows: (1) Many of the jobs involving Affymetrix SNP
 chips may access files in annotationData and it may result in conflict due
 to multiple jobs trying to access the same files in annotationData at the
 same time.
 Is this the case and is there any safeguard for this?
 If this is the real possibility, then what is the best way of avoiding this
 kind of trouble?

These are definitely valid concerns.  The quick answer is that the
aroma framework tries very hard to protect you against potential
conflicts and to minimize the risk of getting invalid results from
running parallel analyses on the same data set.

The data under annotationData/ is basically only read, which means any
number of R sessions can access those files without conflicts.  The
only exception is when a so-called monocell CDF is created for a new
CDF.  This is only done once per CDF lifetime, so the risk of having
two processes trying to create the same monocell CDF is very small.
Still, there is a risk (some monocell CDFs take several minutes to
generate), and in order to protect ourselves against corrupt monocell
CDFs, they are created/written atomically (this is done by first
writing to a temporary file which is then renamed).  For tiling array
analysis so-called unique CDFs are created in a similar fashion.
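
The write-then-rename idea can be sketched in a few lines of base R.
This is only an illustration of the general technique, not the code
used in the aroma framework; writeAtomically() and the '.tmp' suffix
are made up for the example.

```r
# Hypothetical helper: writes to '<pathname>.tmp' first and renames the
# file to its final name only when the write has completed.
writeAtomically <- function(pathname, writeFcn) {
  tmpPathname <- sprintf("%s.tmp", pathname);

  # A leftover temporary file means another process is writing, or
  # that a previous run was interrupted; refuse to continue.
  if (file.exists(tmpPathname)) {
    stop("Temporary file already exists: ", tmpPathname);
  }

  writeFcn(tmpPathname);

  if (!file.rename(tmpPathname, pathname)) {
    stop("Failed to rename temporary file: ", tmpPathname);
  }
  invisible(pathname);
}

pathname <- tempfile(fileext=".txt");
writeAtomically(pathname, function(p) writeLines("done", con=p));
print(readLines(pathname));
```

With this pattern a reader either sees no file or a complete file, but
never a half-written one under the final name.  Note that a rename can
only be expected to be atomic when both names are on the same file
system.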

Likewise, any data sets under rawData/ should/can be considered
read-only, meaning any number of R sessions can access those without
conflicts.  Again, there are exceptions, namely when the average
signals across arrays are calculated (via getAverageFile()) or when
the target distribution is calculated for quantile normalization;
those kinds of result files are stored where the data set is located
(which can be rawData/).   As above, all data files created in the
aroma framework are generated/written in an atomic fashion, decreasing
the risk of conflicts (and if they occur they are very likely to be
detected).

In order to be completely protected against multiple (write) access of
the same data files, there is a need for a formal synchronization
mechanism.  This turns out to be a very hard problem, especially if we
want to support it on all operating systems and file systems out there.
But for your information, we are working toward it and we take nothing
for granted.  See also the page on 'Future directions'
[http://aroma-project.org/features/future/].

Finally, as long as you only analyze different data sets (or apply
different methods on the same data set) you will be fine.


 (2) Many (or most) of the jobs produce lots of intermediate files in
 probeData/plmData folders, requiring many disk accessing and it seems like
 this takes up a lot of computational resources of the machine, slowing down
 other jobs.
 Is this just my impression or is it what's really going on?

Yes, all intermediate results are stored in persistent memory, i.e. on
the file system.  The overhead from the actual I/O is not that big,
but sure, it is significant.  Note that in any analysis you have to read
the data once and often write the results at least once.  To this, the
aroma framework adds I/O for doing the same for the intermediate
results.

One major bottleneck is when you fit the probe-level models (probe
summarization), and it is mostly because data from multiple arrays are
read and restructured into a list reflecting the structure of the CDF,
then fitted, and finally unstructured to be written to separate
files.  The wrapping and unwrapping into nested CDF list structures is
what takes time.  If you look at the verbose output from fit() of a
PLM, you can see that most of the writing time is spent on
unwrapping/encoding the estimates.   For the more recent SNP & CN chip
types (GWS5, GWS6, ...) that also have non-polymorphic CN units, we
can speed up the fitting of those CN units a lot by fitting the PLM as:

if (length(findUnitsTodo(plm)) > 0) {
  # Fit CN probes quickly (~5-10s/array + some overhead)
  units <- fitCnProbes(plm, verbose=verbose);
  str(units);
  # int [1:945826] 935590 935591 935592 935593 935594 935595 ...

  # Fit remaining units, i.e. SNPs (~5-10min/array)
  units <- fit(plm, verbose=verbose);
  str(units);
}

(I noticed from your other message that you were looking at ACNE;
I've updated the ACNE vignette to reflect the above, which should
speed things up a lot).   FYI, fitCnProbes() utilizes the knowledge
that (many) CN units are single probes, which allows us to quickly
fit those units without having to go through the wrapping/unwrapping
into a CDF list structure.  Analogously, one can optimize the
processing of other common dimensions of SNP/CN units; I am slowly
preparing for such a move but it takes time because any algorithm/code
has to be able to work with any existing and future CDF.
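
To illustrate why single-probe units can bypass the list structure,
here is a toy sketch in base R.  It is not how fitCnProbes() is
implemented; the signals, the unit names, and the use of the median as
a stand-in for a probe-level model are all made up for the
illustration.

```r
# Toy data: six probe signals and a unit-to-probe map (all made up)
signals <- c(10, 20, 30, 40, 50, 60);
cells <- list(cnA=1L, cnB=2L, snpC=c(3L, 4L), snpD=c(5L, 6L));

est <- rep(NA_real_, times=length(cells));
names(est) <- names(cells);

# Fast path: a single-probe unit needs no model fit and no nested
# list; its estimate can be read straight from a flat vector.
isSingle <- (lengths(cells) == 1L);
est[isSingle] <- signals[unlist(cells[isSingle])];

# Slow path: multi-probe units are processed unit by unit.  The median
# is only a placeholder for a real probe-level model here.
est[!isSingle] <- sapply(cells[!isSingle], FUN=function(idxs) {
  median(signals[idxs]);
});

print(est);
```

The fast path is a single vectorized subsetting operation regardless
of the number of units, whereas the slow path needs one function call
per unit, which is where the time goes when there are hundreds of
thousands of units.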


 If this is 

Re: [aroma.affymetrix] removal of bad quality chips from a big dataset

2010-05-17 Thread Henrik Bengtsson
Hi.

On Sat, May 15, 2010 at 6:10 PM, Gabriele Zoppoli zopp...@gmail.com wrote:
 Hi,

 I'm new here, and I'm sorry if I'll post obvious questions. I looked
 throughout the newsgroup and on the aroma.affymetrix web page, but I
 couldn't find the answer from my question, so here it is:

 I'm trying to analyze a 950 chip dataset from Wooster et al (318
 cancer cell lines in triplicate - on average). So I followed the steps
 as in the web page, and arrived to the plotNuse and plotRle part.
 Nothing wrong so far, and I can clearly see that some arrays are
 outliers and possibly have poor quality, so I would like to remove
 them for further analyses. The issue is, I don't have a clue how to
 know which ones they are and how to remove them, because the plots are
 too crowdy and I don't know how to see what is what and how to take it
 out.

The plotRle() and plotNuse() methods both take argument 'arrays',
which allows you to specify *which* arrays to display.  This allows
you to plot a smaller number of arrays per plot, which should make it
possible for you to narrow down the arrays of interest.  For example,

plotRle(qam, arrays=1:50);
plotNuse(qam, arrays=c(54,80:90,130:144));

Note also that if you are plotting to an image file, you can make it
really wide to fit almost any number of arrays, and then use an image
browser to scroll it:

filename <- sprintf("%s,plotRle.png", getName(qam));
arrays <- 1:200;
png(filename, width=100+20*length(arrays), height=400);
plotRle(qam, arrays=arrays);
dev.off();

This way you should be able to graphically identify which arrays are
bad, and from that build an index vector of all arrays you wish to
exclude, e.g.

exclArrays <- c(4,54,57,98);

Then you can drop these from the data set as:

ces - extract(ces, -exclArrays);

The new 'ces' object will contain all but the excluded arrays.

FYI, there are also ways to grab the RLE statistics and identify bad
arrays using that.  The easiest way to do this is to get what
plotRle()/plotNuse() returns, e.g.

stats <- plotRle(qam, arrays=1:50);

where 'stats' will be a list of length length(arrays) containing
boxplot statistics.  See str(stats) for the output.
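
Such statistics can then be used to flag arrays programmatically.  The
sketch below is self-contained and uses made-up data.  It assumes that
each list element has a 'stats' field holding the five boxplot numbers,
as returned by boxplot.stats() in base R, so check str(stats) on your
own object first.  The two thresholds are arbitrary and only for
illustration.

```r
# Made-up RLE-like values for three arrays
rleValues <- list(
  a1 = seq(-0.2, 0.2, length.out=101),        # well behaved
  a2 = seq(-0.2, 0.2, length.out=101) + 0.5,  # shifted median
  a3 = seq(-1.0, 1.0, length.out=101)         # wide spread
);

# Stand-in for what a plot method might return (an assumption)
stats <- lapply(rleValues, FUN=boxplot.stats);

# Elements 2, 3 and 4 of 'stats' are lower hinge, median, upper hinge
medians <- sapply(stats, FUN=function(s) s$stats[3]);
iqrs <- sapply(stats, FUN=function(s) s$stats[4] - s$stats[2]);

# Arbitrary thresholds (assumptions): median far from zero, or wide box
exclArrays <- which(abs(medians) > 0.1 | iqrs > 0.5);
print(exclArrays);
```

The resulting index vector can be used in the same way as the manually
created 'exclArrays' above, i.e. extract(ces, -exclArrays).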

Hope this helps

Henrik


 Some information:

 sessionInfo()
 R version 2.10.1 (2009-12-14)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United States.1252
 [2] LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 base

 other attached packages:
  [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7
 affxparser_1.18.0
  [4] R.huge_0.2.0           aroma.core_1.5.0
 aroma.light_1.16.0
  [7] matrixStats_0.2.1      R.rsp_0.3.6
 R.cache_0.3.0
 [10] R.filesets_0.8.1       digest_0.4.2
 R.utils_1.4.0
 [13] R.oo_1.7.2             R.methodsS3_1.2.0

 loaded via a namespace (and not attached):
 [1] tools_2.10.1

 About my data:

 print(qam)
 QualityAssessmentModel:
 Name: Dataset Wooster
 Tags: RBC,QN,RMA,QC
 Path: qcData/Dataset Wooster,RBC,QN,RMA,QC/HG-U133_Plus_2
 Chip-effect set:
    ChipEffectSet:
    Name: Dataset Wooster
    Tags: RBC,QN,RMA
    Path: plmData/Dataset Wooster,RBC,QN,RMA/HG-U133_Plus_2
    Platform: Affymetrix
    Chip type: HG-U133_Plus_2,monocell
    Number of arrays: 950
    Names: 1A2 _SS392785_HG-U133_Plus_2_HCHP-186915_, 1A2 _SS392786_HG-
 U133_Plus_2_HCHP-186916_, ..., YAPC_SS331347_HG-
 U133_Plus_2_HCHP-182915_
    Time period: [not reported if more than 500 arrays]
    Total file size: 546.67MB
    RAM: 0.84MB
    Parameters: (probeModel: chr pm)
 RAM: 0.00MB

 And a final question (a very stupid one, I'm sure): once I finished my
 quality controls, how do I average technical replicates?

 Thank you and I beg your pardon for any silly question

 Gabriele


[aroma.affymetrix] aroma.affymetrix v1.6.0 released

2010-05-17 Thread Henrik Bengtsson
Hi all,

aroma.affymetrix and friends have been updated and are now being rolled
out to the CRAN servers.  It is highly recommended to update:

source("http://aroma-project.org/hbLite.R");
hbInstall("aroma.affymetrix");

This update follows the April releases of R v2.11.0 and Bioconductor
v2.6, which we also recommend using with the aroma framework.  In
this release we have added further protection against ending up with
partially written data files due to an abruptly terminated R session.
There were also some bug fixes, which mainly were due to changes in
the new release of Bioconductor that broke some existing methods, e.g.
SNPRMA and CRLMM.  Thanks to all users for reporting bugs and other
potential issues.  In addition, we have better (although not perfect)
support for gcRMA on more chip types.  Affymetrix's recent SNP & CN
chip type Cytogenetics_Array is also better supported.  For other
updates and more details, see the end of this message.

Documentation keeps getting added to the http://www.aroma-project.org/
website.  As before, any kind of contribution to it is greatly
appreciated.

Cheers,
Henrik & co-developers


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Updates to aroma.affymetrix
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Version: 1.6.0 [2010-05-14]
o Package submitted to CRAN.
o Package passes system and redundancy tests.
o Package passes R CMD check on R v2.11.0 and v2.12.0 devel.

Version: 1.5.9 [2010-05-13]
o SPEED UP: Now the constructor AllelicCrosstalkCalibration() is set to
  recognize the Cytogenetics_Array chip type.  This avoids having to
  scan the CDF for unit types and check for SNPs, which is slow and not
  really wanted for a constructor function.
o ROBUSTNESS: Added a redundancy test of CRMA v1.5 for the
  Cytogenetics_Array chip type.
o ROBUSTNESS: Now fromDataFile() of ChipEffectFile and FirmaFile, as
  well as convertToUnique() of AffymetrixCelSet allocates/creates data
  files atomically.  As elsewhere, this is done by first creating and
  writing to a temporary file, which when complete is then renamed.
  This lowers the risk of generating incomplete files.
o CLEAN UP/DEPRECATED: AffymetrixCelSet$createBlankSet() was removed,
  because it has not been used anywhere since 2007.
o BUG FIX: convertToUnique() for AffymetrixCelSet would not recognize
  Windows Shortcut links.

Version: 1.5.8 [2010-05-09]
o Made justSNPRMA(..., normalizeSNPsOnly="auto") for AffymetrixCelSet
  the default.
o Now all findUnitsTodo() for data sets check the data file that
  comes last in a lexicographic ordering.  This is now consistent
  with how the summarization methods update the files.  Before it
  used to be the one that is last in the data set.
o Now all updateUnits() for data sets update the data files in
  lexicographic order.
o Now CrlmmModel(..., recalibrate=TRUE) is the default.
o Now justSNPRMA(..., returnESet=TRUE) for AffymetrixCelSet
  returns an AlleleSet due to updates in oligo v1.12.0.
o Added extractAlleleSet() to SnpChipEffectSet.  Replaces
  extractSnpQSet(), because the SnpQSet class was dropped
  in oligo v1.12.0 and replaced by the AlleleSet class.
o BUG FIX: fit() of CrlmmModel would not work with oligo v1.12.0
  and newer.
o BUG FIX: getCallSet() and getCrlmmParametersSet() of CrlmmModel
  used non-existing verbose object 'log' instead of 'verbose'.

Version: 1.5.7 [2010-04-22]
o Added groupUnitsByDimension() to AffymetrixCdfFile.
o ROBUSTNESS: Added redundancy tests for doCRMAv2() and
  writeDataFrame().
o BUG FIX: doCRMAv1() for AffymetrixCelSet used undefined 'csN'
  internally instead of 'csC'.

Version: 1.5.6 [2010-04-15]
o BUG FIX: computeAffinities(..., verbose=FALSE) of AffymetrixCdfFile
  would throw "Error in reset(pb) : object 'pb' not found".
  Thanks Stephen ? at Mnemosyne BioSciences, Finland, for this report.

Version: 1.5.5 [2010-04-07]
o ROBUSTNESS: Added a test script for gcRMA background correction
  on the MoEx-1_0-st-v1 chip type.

Version: 1.5.4 [2010-04-06]
o Added an internal version of doCRMAv1().
o Added argument 'plm' to existing doCRMAv2().

Version: 1.5.3 [2010-03-31]
o Updated getProbeSequenceData() for AffymetrixCdfFile to recognize
  more NetAffx probe-tab files, e.g. MoEx-1_0-st-v1.probe.tab.
o KNOWN ISSUES: getProbeSequenceData() for AffymetrixCdfFile requires
  that the unit names in the probe-tab file match the ones in the
  CDF.  This may cause issues if custom CDFs with custom unit names
  are used.  This is another reason why we should move away from
  probe-tab files and instead use aroma binary cell sequence files.

Version: 1.5.2 [2010-03-26]
o Added argument 'defValue' to createFrom() for AffymetrixCelFile
  so that one can specify the default value for cleared elements.

Version: 1.5.1 [2010-03-14]
o BUG FIX: allocateFromCdf() of AromaCellCpgFile, AromaCellPositionFile,
  and AromaCellMatchScoreFile would drop all but the first tag.


- - - - - - - - - - - - - - - - - - - - - - - 

Re: [aroma.affymetrix] Re: CRMA v2 errors

2010-05-14 Thread Henrik Bengtsson
Hi.

On Fri, May 14, 2010 at 10:36 AM, Markus Leber leber.mar...@gmx.de wrote:
 Dear Henrik,

 thank you very much for your support.
 You are right. I am sorry that I didn't notice this problem.
 Step Calibration for crosstalk between allele probe pairs works without 
 problems now.

 Unfortunately at the beginning of step Normalization for nucleotide-position 
 probe sequence effects I noticed another error.
 First I initialize with:

 bpn <- BasePositionNormalization(csC, target="zero")
 print(bpn)

 Afterwards I call:

 csN <- process(bpn, verbose=verbose)

 Within this procedure I noticed this error:

 ...
 20100513 23:03:38|    Storing normalized data...
 20100513 23:03:38|     Temporary pathname: 
 probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
 20100513 23:03:38|     Creating CEL file for results, if missing...
 20100513 23:03:38|      Creating CEL file...
 20100513 23:03:38|       Chip type: GenomeWideSNP_6,Full
 20100513 23:03:38|       Pathname: 
 probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
 20100513 23:03:38|       Method 'copy'...
 20100513 23:03:38|        Copying file...
 Error in list(`process(bpn, verbose = verbose)` = environment, 
 `process.AbstractProbeSequenceNormalization(bpn, verbose = verbose)` = 
 environment,  :


 I printed the workflow  output of this session in file CRMAv2_Error.txt, 
 which is attached.
 Do you have experience with this error or you have an idea about the reason 
 of this problem?

It's good that you attach logs, especially when they are really long.
However, there is an advantage to pasting the error message and
traceback into the message itself, because then it will be found when
others search the archives for similar messages.  In your case, if you
had added a little bit more of the error message, that would have been
enough for most of us to immediately spot what is going on.  From the
end of your CRMAv2_Error.txt log:

20100513 23:03:38|Storing normalized data...
20100513 23:03:38| Temporary pathname:
probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
20100513 23:03:38| Creating CEL file for results, if missing...
20100513 23:03:38|  Creating CEL file...
20100513 23:03:38|   Chip type: GenomeWideSNP_6,Full
20100513 23:03:38|   Pathname:
probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp
20100513 23:03:38|   Method 'copy'...
20100513 23:03:38|Copying file...
Error in list(`process(bpn, verbose = verbose)` = environment,
`process.AbstractProbeSequenceNormalization(bpn, verbose = verbose)` =
environment,  :

[2010-05-13 23:03:38] Exception: Failed to copy file. Temporary copy
file exists: 
probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/GIGAS_g_GAINmixHapMapAffy2_GenomeWideEx_6_A03_31250.CEL.tmp.tmp.tmp
  at throw(Exception(...))
  at throw.default(Failed to copy file. Temporary copy file exists: , tmpPathn
  at throw(Failed to copy file. Temporary copy file exists: , tmpPathname)
  at copyFile.default(getPathname(this), pathname, overwrite = overwrite, verbos
  at copyFile(getPathname(this), pathname, overwrite = overwrite, verbose = less
  at copyTo.GenericDataFile(this, filename = tmpPathname, path = NULL, verbose =
  at copyTo(this, filename = tmpPathname, path = NULL, verbose = less(verbose))
  at createFrom.AffymetrixCelFile(df, filename = pathnameT, path = NULL, verbose
  at createFrom(df, filename = pathnameT, path = NULL, verbose = less(verbose))
  at process.AbstractProbeSequenceNormalization(bpn, verbose = verbose)
  at process(bpn, verbose
In addition: Warning message:
In log2(y) : NaNs produced
20100513 23:03:39|Copying file...done
[...]

See that traceback?  That is really useful because it tells us what
commands have been called internally and in what function the error
occurs.

EXPLANATION:
The error message tries to be as clear as possible on what the problem
is, even though it does not suggest how to solve it (that is a
long-term wish I have for the aroma framework).  This error is thrown in
order to protect you by telling you that there seems to be an existing
temporary file that has been generated but not been completed.  It can
either come from running the same script simultaneously/in parallel on a
different machine with access to the same file directory, or from
having interrupted a previous run, leaving a half-written file.  I
suspect the latter is the case for you; did you run it before and then
interrupt it and restart it?
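
To check whether a directory contains such leftover temporary files,
something like the following base R sketch can be used.  The directory
and file names below are made up for the demonstration; in practice
'path' would point to the output directory named in the error message.

```r
# Set up a throw-away directory with made-up file names
path <- tempfile("tmpFileDemo");
dir.create(path);
invisible(file.create(file.path(path,
  c("a.CEL", "b.CEL.tmp", "c.CEL.tmp.tmp"))));

# List all files whose names end with ".tmp"; this also matches
# ".tmp.tmp", ".tmp.tmp.tmp" and so on.
leftovers <- list.files(path, pattern="[.]tmp$");
print(leftovers);
```

Inspect the list before removing anything, for instance with
file.remove(file.path(path, leftovers)), and make sure that no other
R session is still writing to the directory.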

SOLUTION:
Go to the 
probeData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6/
directory and delete any files with file extension *.tmp (or
*.tmp.tmp, *.tmp.tmp.tmp and so on).  Then restart the script.  You
can keep 

Re: [aroma.affymetrix] Re: CRMA v2 errors

2010-05-12 Thread Henrik Bengtsson
Hi.

On Wed, May 12, 2010 at 12:23 PM, Smaug72 leber.mar...@gmx.de wrote:
 Dear Henrik,

 thank you very much for your reply.
 Again I installed R and CRMA v2 on a new virtual machine (Suse 11.2),
 so that I can step back if necessary.

To get the terms correct: you installed the aroma.affymetrix package.
CRMA v2 is a statistical method, not a piece of software.

 This time I received 21 warnings after the installation process. But
 no error was detected.
 Nevertheless the same error (unexpected symbol in "array <- 1xlim")
 occurred.

Did you read my reply?  The statement that gives the error should be
two statements on two different lines of code.  Again, the *only*
thing you have to fix is:

array <- 1
xlim <- c(-500,15000)
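
The difference between the two versions can be verified without running
any aroma code, simply by asking R to parse them.  A small sketch in
base R:

```r
# The statement as it was pasted, with the newline missing
bad <- "array <- 1xlim <- c(-500,15000)";
res <- tryCatch({
  parse(text=bad);
  "parsed ok";
}, error=function(ex) {
  "parse error";
});
print(res);

# The corrected version: two statements on two lines
good <- "array <- 1\nxlim <- c(-500,15000)";
exprs <- parse(text=good);
print(length(exprs));
```

The first string fails already at the parsing stage, before anything is
evaluated, which is why the error cannot be related to the data files.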


 You propose to test the 6 CEL files mentioned in the vignette.
 But it seems the six CEL files (NA06985.CEL, ..., NA07019.CEL) can't
 be downloaded from the web, or?

No, because I/we do not have the right to redistribute those data files.

 It seems that they belongs to the Genome-Wide Human SNP Array 6.0
 Sample Data Set, which consists of 3 DVDs and must be ordered by
 affymetrix?

That is one of many possible sources.  The example data set is using
HapMap samples.  There are a few data sets for this out there. As long
as you get the CEL files, you should be fine.  Under
http://aroma-project.org/node/51 you find links to the HapMap
Consortium that also provide (individual) CEL files for download.

 So far I use a dataset from the Broad Institute:

 http://www.broadinstitute.org/mpg/birdsuite/download.html
 - birdsuite_test_inputs_1.5.3.tgz

 Do you think the reason for the error can be a false integration of
 the CEL files?

No; please read my previous reply - again, the error has nothing to
do with your CEL files; it is only those two lines of code that you
have to use.

 I thought the annotation data and raw data are checked within the
 previous procedure?

Not sure what "previous procedure" refers to, but if you mean the part
of the script above:

array <- 1
xlim <- c(-500,15000)

then yes.  You seem to have gotten it right.

/Henrik


 Thank you,
 Markus


Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm

2010-05-11 Thread Henrik Bengtsson
(units)
 +
 +   ## Fit remaining units, i.e. SNPs (~5-10min/array)
 +   units <- fit(plm, verbose=verbose)
 +   str(units)
 + }

  *** caught segfault ***
 address 0x10ae8d020, cause 'memory not mapped'

 Traceback:
  1: .Call(R_affx_get_cel_file, filename, readHeader, readIntensities,
 rea\
 dXY, readXY, readPixels, readStdvs, readOutliers, readMasked, indices,
 as.i\
 nteger(verbose), PACKAGE = affxparser)
  2: readCel(getPathname(this), indices = idxs, readIntensities = FALSE,
 rea\
 dStdvs = TRUE, readPixels = FALSE)
  3: findUnitsTodo.ChipEffectFile(ce, ...)
  4: findUnitsTodo(ce, ...)
  5: findUnitsTodo.ChipEffectSet(ces, verbose = verbose, ...)
  6: findUnitsTodo(ces, verbose = verbose, ...)
  7: findUnitsTodo.ProbeLevelModel(plm)
  8: findUnitsTodo(plm)

 Possible actions:
 1: abort (with core dump, if enabled)
 2: normal R exit
 3: exit R without saving workspace
 4: exit R saving workspace
 Selection:

 The session info is as follows:

 print(sessionInfo())
 R version 2.11.0 (2010-04-22)
 x86_64-apple-darwin9.8.0

 locale:
 [1] C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7    affxparser_1.20.0
  [4] R.huge_0.2.0   aroma.core_1.5.0   aroma.light_1.16.0
  [7] matrixStats_0.2.1  R.rsp_0.3.6    R.cache_0.3.0
 [10] R.filesets_0.8.1   digest_0.4.2   R.utils_1.4.0
 [13] R.oo_1.7.2 R.methodsS3_1.2.0

 loaded via a namespace (and not attached):
 [1] tools_2.11.0
 Warning message:
 'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible

 TH

 2010/5/10 Henrik Bengtsson h...@stat.berkeley.edu

 Ok.

 This could be an issue with affxparser and 64-bit OSX; recent problem
 reports with affxparser have been with 64-bit OSX.

 You are running R v2.10.x, which is outdated.  The new stable release
 of R is R v2.11.x.  I recommend that you update, because the rest of
 the community has already moved on and any bug fixes to R and
 packages will be for R v2.11.x.   Updating will give you access to
 newer versions of packages, including affxparser v1.20.0.  There have
 been some fixes to affxparser, and with some luck they solve your
 problem.  If you update R, then just rerun the aroma installation:

 source("http://aroma-project.org/hbLite.R");
 hbInstall("aroma.affymetrix");

 and several other packages will also be updated.

 [
 If you're really stuck with R v2.10.x, you could try installing
 affxparser v1.20.0 as:

 source("http://aroma-project.org/hbLite.R");
 biocLite("affxparser", rver="2.11.0");

 but I really recommend to update R.
 ]

 Let's see if that solves your problem.  If not, we have to do some
 more troubleshooting...

 /Henrik


 On Mon, May 10, 2010 at 9:37 AM, Chung Tae-Hoon hoontaech...@gmail.com
 wrote:
  I'm sorry to forget providing necessary information.
 
 
 
  .libPaths(/Library/Frameworks/R.framework/Versions/2.10/Resources/library64
  \
  )
  library(aroma.affymetrix)
  Loading required package: R.utils
  Loading required package: R.oo
  Loading required package: R.methodsS3
  R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3
  for
  help.
  R.oo v1.7.1 (2010-03-17) successfully loaded. See ?R.oo for help.
  R.utils v1.4.0 (2010-03-24) successfully loaded. See ?R.utils for help.
  Loading required package: R.filesets
  Loading required package: digest
  R.filesets v0.8.0 (2010-02-22) successfully loaded. See ?R.filesets for
  help.
  Loading required package: aroma.core
  Loading required package: R.cache
  R.cache v0.3.0 (2010-03-13) successfully loaded. See ?R.cache for help.
  Loading required package: R.rsp
  R.rsp v0.3.6 (2009-09-16) successfully loaded. See ?R.rsp for help.
   Type browseRsp() to open the RSP main menu in your browser.
  Loading required package: matrixStats
  matrixStats v0.2.1 (2010-04-05) successfully loaded. See ?matrixStats
  for
  help.
  Loading required package: aroma.light
  aroma.light v1.15.1 (2009-11-01) successfully loaded. See ?aroma.light
  for
  help\
  .
  aroma.core v1.5.0 (2010-02-22) successfully loaded. See ?aroma.core for
  help.
  Loading required package: aroma.apd
  Loading required package: R.huge
  R.huge v0.2.0 (2009-10-16) successfully loaded. See ?R.huge for help.
  Loading required package: affxparser
  aroma.apd v0.1.7 (2009-10-16) successfully loaded. See ?aroma.apd for
  help.
  aroma.affymetrix v1.5.0 (2010-02-22) successfully loaded. See
  ?aroma.affymetrix\
   for help.
  Patching
  /Users/thchung/.Rpatches/aroma.affymetrix/20100331/AffymetrixCdfFile.g\
  etProbeSequenceData.R
  print(sessionInfo())
  R version 2.10.1 Patched (2010-02-01 r51089)
  x86_64-apple-darwin9.8.0
 
  locale:
  [1] C
 
  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base
 
  other attached packages:
   [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7        affxparser_1.18.0
   [4] R.huge_0.2.0           aroma.core_1.5.0

Re: [aroma.affymetrix] CRMA v2 errors

2010-05-11 Thread Henrik Bengtsson
Hi.

On Tue, May 11, 2010 at 2:42 PM, Smaug72 leber.mar...@gmx.de wrote:
 Dear Henrik Bengtsson,

 we would like to use the CRMA v2 for our work.
 Unfortunately we receive errors.

 First we use a Linux system (Suse 11.2).
 I installed R (version 2.11.0) without any problems.
 Afterwards I followed your instructions on your webpage
 (http://aroma-project.org/install) to install CRMA v2.
 After the installation process I received this warning:

 In packageDescription(pkg) : no package 'DNAcopy' was found

The DNAcopy package is not needed until you do segmentation.


 Nevertheless the program works fine at the beginning.
 I followed your instructions on this page:

 http://aroma-project.org/vignettes/CRMAv2

 The analysis startup and the declaration of the raw data set work
 without errors.
 The section "Step 1 - Calibration for crosstalk between allele probe
 pairs" also works until I come to the command:

 array <- 1xlim <- c(-500,15000)
 Error: unexpected symbol in "array <- 1xlim"

That is a cut'n'paste error where a newline has gone missing; that web page
now reads:

array <- 1
xlim <- c(-500,15000)

just as it does a few lines down.   This is the only cause of your problems.
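
To see why the second error follows from the first, here is a small base-R sketch (not taken from the vignette). Because the pasted line fails to parse, 'array' is never assigned, so the name still refers to base R's array() function. That is what the "is not a vector: function" exception is reporting.

```r
# The pasted line is a syntax error, so nothing on it is evaluated.
res <- tryCatch(parse(text = "array <- 1xlim <- c(-500,15000)"),
                error = function(e) "parse error")
print(res)

# 'array' was therefore never assigned; unless it was defined earlier
# in the session, the name still refers to base::array().
print(is.function(base::array))
print(storage.mode(base::array))

# With the newline restored, both assignments work as intended.
array <- 1
xlim <- c(-500, 15000)
print(storage.mode(array))
```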

 plotAllelePairs(acc, array=array, pairs=1:6, what="input", xlim=xlim/3)
 Error in list(`plotAllelePairs(acc, array = array, pairs = 1:6, what
 = "input", xlim = xli` = environment,  :
 [2010-05-11 10:40:58] Exception: Argument 'array' is not a vector:
 function
  at throw(Exception(...))
  at throw.default(sprintf(Argument '%s' is not a vector: %s, .name,
 storage.m
  at throw(sprintf(Argument '%s' is not a vector: %s, .name,
 storage.mode(x)))
  at getVector.Arguments(static, x, ..., .name = .name)
  at getVector(static, x, ..., .name = .name)
  at getNumerics.Arguments(static, ..., asMode = integer, disallow =
 disallow)
  at getNumerics(static, ..., asMode = integer, disallow = disallow)
  at getIntegers.Arguments(static, x, ..., range = range, .name
 = .name)
  at getIntegers(static, x, ..., range = range, .name = .name)
  at getIndices.Arguments(static, ..., length = length)
  at getIndices(static, ..., length = length)
  at method(static, ...)
  at Arguments$getIndex(array, range = c(1, Inf))
  at plotAllelePairs.AllelicCrosstalkCalibration(acc, array = array,
 pairs = 1:6
  at plotAllelePairs(acc, array = array, pairs = 1:6, what = input,
 xlim
 ..

 First I receive the error: unexpected symbol in "array <- 1xlim"
 Do you have experience with this error?
 I don't know whether the following error "Argument 'array' is not a
 vector" occurs as a consequence of the first error.

 I removed the ^M symbols in the CEL files. But this error occurs
 with ^M and without it.

Not necessary - a good rule of thumb is that if you have to mess with
your raw data files, you are probably doing something wrong.

 Do you know an accurate dataset, which we can use as basis to test the
 software with a unix system.

Any Affymetrix data set will work the same regardless of operating
system.  The example HapMap data set used in the vignette should
work.

Hope this helps

Henrik


 I hope you find some time to answer my request.
 Thanks in advance.

 Cheers
 Markus

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm

2010-05-10 Thread Henrik Bengtsson
What does

print(sessionInfo());

report after doing library(aroma.affymetrix)?

/Henrik

On Mon, May 10, 2010 at 4:47 AM, Chung Tae-Hoon hoontaech...@gmail.com wrote:
 Hi, All;



 I was trying to process Affymetrix 250K Sty SNP Chip of HapMap project using
 CRMAv2 algorithm.

 I was following the vignette on the web.

 It worked out smoothly without trouble until I got segfault error while
 fitting plm as follows:



 ## All annotation data file verification worked out fine!

 cdf <- AffymetrixCdfFile$byChipType("Mapping250K_Sty")
 print(cdf)  ## worked fine.

 gi <- getGenomeInformation(cdf)
 print(gi)  ## worked fine.

 si <- getSnpInformation(cdf)
 print(si)  ## worked fine.

 acs <- AromaCellSequenceFile$byChipType(getChipType(cdf))
 print(acs)  ## worked fine.

 ## step 1. declaring raw data set
 csR <- AffymetrixCelSet$byName("HapMap500K,Sty", cdf=cdf)
 ## print(csR)  ## worked fine
 ## AffymetrixCelSet:
 ## Name: HapMap500K
 ## Tags: Sty
 ## Path: rawData/HapMap500K,Sty/Mapping250K_Sty
 ## Platform: Affymetrix
 ## Chip type: Mapping250K_Sty
 ## Number of arrays: 270
 ## Names: NA06985, NA06991, ..., NA19240
 ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56
 ## Total file size: 16918.81MB
 ## RAM: 0.35MB



 ## step 2. processing data
 ##--- Processing step 1. calibration for crosstalk between allele probe pairs
 acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")
 ## print(acc)
 ## AllelicCrosstalkCalibration:
 ## Data set: HapMap500K
 ## Input tags: Sty
 ## User tags: *
 ## Asterisk ('*') tags: ACC,-XY
 ## Output tags: Sty,ACC,-XY
 ## Number of files: 270 (16918.81MB)
 ## Platform: Affymetrix
 ## Chip type: Mapping250K_Sty
 ## Algorithm parameters: (rescaleBy: chr "groups", targetAvg: num [1:2] 2200 2200, subsetToAvg: chr "-XY", mergeShifts: logi TRUE, B: int 1, flavor: chr "sfit", algorithmParameters:List of 3, ..$ alpha: num [1:8] 0.1 0.075 0.05 0.03 0.01 0.0025 0.001 0.0001, ..$ q: num 2, ..$ Q: num 98)
 ## Output path: probeData/HapMap500K,Sty,ACC,-XY/Mapping250K_Sty
 ## Is done: FALSE
 ## RAM: 0.01MB



 csC <- process(acc, verbose=verbose)
 ## print(csC)
 ## AffymetrixCelSet:
 ## Name: HapMap500K
 ## Tags: Sty,ACC,-XY
 ## Path: probeData/HapMap500K,Sty,ACC,-XY/Mapping250K_Sty
 ## Platform: Affymetrix
 ## Chip type: Mapping250K_Sty
 ## Number of arrays: 270
 ## Names: NA06985, NA06991, ..., NA19240
 ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56
 ## Total file size: 16918.81MB
 ## RAM: 0.35MB




 ##--- Processing step 2. Normalization for nucleotide-position probe sequence effects
 bpn <- BasePositionNormalization(csC, target="zero")
 ## print(bpn)
 ## BasePositionNormalization:
 ## Data set: HapMap500K
 ## Input tags: Sty,ACC,-XY
 ## User tags: *
 ## Asterisk ('*') tags: BPN,-XY
 ## Output tags: Sty,ACC,-XY,BPN,-XY
 ## Number of files: 270 (16918.81MB)
 ## Platform: Affymetrix
 ## Chip type: Mapping250K_Sty
 ## Algorithm parameters: (unitsToFit: chr "-XY", typesToFit: chr "pm", unitsToUpdate: NULL, typesToUpdate: chr "pm", shift: num 0, target: chr "zero", model: chr "smooth.spline", df: int 5)
 ## Output path: probeData/HapMap500K,Sty,ACC,-XY,BPN,-XY/Mapping250K_Sty
 ## Is done: FALSE
 ## RAM: 0.01MB



 csN <- process(bpn, verbose=verbose)
 ## print(csN)
 ## AffymetrixCelSet:
 ## Name: HapMap500K
 ## Tags: Sty,ACC,-XY,BPN,-XY
 ## Path: probeData/HapMap500K,Sty,ACC,-XY,BPN,-XY/Mapping250K_Sty
 ## Platform: Affymetrix
 ## Chip type: Mapping250K_Sty
 ## Number of arrays: 270
 ## Names: NA06985, NA06991, ..., NA19240
 ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56
 ## Total file size: 16918.81MB
 ## RAM: 0.35MB



 ##--- Processing step 3. Probe summarization
 plm <- RmaCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
 ## print(plm)
 ## RmaCnPlm:
 ## Data set: HapMap500K
 ## Chip type: Mapping250K_Sty
 ## Input tags: Sty,ACC,-XY,BPN,-XY
 ## Output tags: Sty,ACC,-XY,BPN,-XY,RMA,A+B
 ## Parameters: (probeModel: chr "pm"; shift: num 0; flavor: chr "affyPLM"; treatNAsAs: chr "weights"; mergeStrands: logi TRUE; combineAlleles: logi TRUE).
 ## Path: plmData/HapMap500K,Sty,ACC,-XY,BPN,-XY,RMA,A+B/Mapping250K_Sty
 ## RAM: 0.00MB




 if (length(findUnitsTodo(plm)) > 0) {
   ## Fit CN probes quickly (~5-10s/array + some overhead)
   units <- fitCnProbes(plm, verbose=verbose)
   str(units)

   ## Fit remaining units, i.e. SNPs (~5-10min/array)
   units <- fit(plm, verbose=verbose)
   str(units)
 }



 *** caught segfault ***
 address 0x104fd2020, cause 'memory not mapped'

 Traceback:
  1: .Call("R_affx_get_cel_file", filename, readHeader, readIntensities, readXY, readXY, readPixels, readStdvs, readOutliers, readMasked, indices, as.integer(verbose), PACKAGE = "affxparser")
  2: readCel(getPathname(this), indices = idxs, readIntensities = FALSE, readStdvs = TRUE, 

Re: [aroma.affymetrix] segfault while fitting plm in copy number processing using CRMAv2 algorithm

2010-05-10 Thread Henrik Bengtsson
Ok.

This could be an issue with affxparser and 64-bit OSX; recent problem
reports with affxparser have been with 64-bit OSX.

You are running R v2.10.x, which is outdated.  The new stable release
of R is R v2.11.x.  I recommend that you update, because the rest of
the community has already moved on and any bug fixes to R and
packages will be for R v2.11.x.   Updating will give you access to
newer versions of packages, including affxparser v1.20.0.  There have
been some fixes to affxparser, and with some luck they solve your
problem.  If you update R, then just rerun the aroma installation:

source("http://aroma-project.org/hbLite.R");
hbInstall("aroma.affymetrix");

and several other packages will also be updated.
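
Before and after updating, it can be handy to confirm which versions are actually in use. The following is only a sketch in base R; the variable name needsUpdate is made up, and the affxparser check is skipped if that package is not installed.

```r
# Compare the running R version against the release mentioned above
needsUpdate <- getRversion() < "2.11.0"
cat("R version:", as.character(getRversion()), "\n")
cat("Older than R v2.11.0:", needsUpdate, "\n")

# Report the installed affxparser version, if the package is available
if (requireNamespace("affxparser", quietly = TRUE)) {
  print(packageVersion("affxparser"))
} else {
  cat("affxparser is not installed in this library\n")
}
```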

[
If you're really stuck with R v2.10.x, you could try installing
affxparser v1.20.0 as:

source("http://aroma-project.org/hbLite.R");
biocLite("affxparser", rver="2.11.0");

but I really recommend updating R.
]

Let's see if that solves your problem.  If not, we have to do some
more troubleshooting...

/Henrik


On Mon, May 10, 2010 at 9:37 AM, Chung Tae-Hoon hoontaech...@gmail.com wrote:
 I'm sorry to forget providing necessary information.


 .libPaths("/Library/Frameworks/R.framework/Versions/2.10/Resources/library64")
 library(aroma.affymetrix)
 Loading required package: R.utils
 Loading required package: R.oo
 Loading required package: R.methodsS3
 R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3 for help.
 R.oo v1.7.1 (2010-03-17) successfully loaded. See ?R.oo for help.
 R.utils v1.4.0 (2010-03-24) successfully loaded. See ?R.utils for help.
 Loading required package: R.filesets
 Loading required package: digest
 R.filesets v0.8.0 (2010-02-22) successfully loaded. See ?R.filesets for help.
 Loading required package: aroma.core
 Loading required package: R.cache
 R.cache v0.3.0 (2010-03-13) successfully loaded. See ?R.cache for help.
 Loading required package: R.rsp
 R.rsp v0.3.6 (2009-09-16) successfully loaded. See ?R.rsp for help.
  Type browseRsp() to open the RSP main menu in your browser.
 Loading required package: matrixStats
 matrixStats v0.2.1 (2010-04-05) successfully loaded. See ?matrixStats for help.
 Loading required package: aroma.light
 aroma.light v1.15.1 (2009-11-01) successfully loaded. See ?aroma.light for help.
 aroma.core v1.5.0 (2010-02-22) successfully loaded. See ?aroma.core for help.
 Loading required package: aroma.apd
 Loading required package: R.huge
 R.huge v0.2.0 (2009-10-16) successfully loaded. See ?R.huge for help.
 Loading required package: affxparser
 aroma.apd v0.1.7 (2009-10-16) successfully loaded. See ?aroma.apd for help.
 aroma.affymetrix v1.5.0 (2010-02-22) successfully loaded. See ?aroma.affymetrix for help.
 Patching /Users/thchung/.Rpatches/aroma.affymetrix/20100331/AffymetrixCdfFile.getProbeSequenceData.R
 print(sessionInfo())
 R version 2.10.1 Patched (2010-02-01 r51089)
 x86_64-apple-darwin9.8.0

 locale:
 [1] C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7        affxparser_1.18.0
  [4] R.huge_0.2.0           aroma.core_1.5.0       aroma.light_1.15.1
  [7] matrixStats_0.2.1      R.rsp_0.3.6            R.cache_0.3.0
 [10] R.filesets_0.8.0       digest_0.4.2           R.utils_1.4.0
 [13] R.oo_1.7.1             R.methodsS3_1.2.0

 I am using 64-bit R-2.10.1 on Mac OS x.

 TH

 -Original Message-
 From: aroma-affymetrix@googlegroups.com
 [mailto:aroma-affymet...@googlegroups.com] On Behalf Of Henrik Bengtsson
 Sent: Monday, May 10, 2010 3:08 PM
 To: aroma-affymetrix
 Subject: Re: [aroma.affymetrix] segfault while fitting plm in copy number
 processing using CRMAv2 algorithm

 What does

 print(sessionInfo());

 report after doing library(aroma.affymetrix)?

 /Henrik

 On Mon, May 10, 2010 at 4:47 AM, Chung Tae-Hoon hoontaech...@gmail.com
 wrote:
 Hi, All;



 I was trying to process Affymetrix 250K Sty SNP Chip of HapMap project
 using
 CRMAv2 algorithm.

 I was following the vignette on the web.

 It worked out smoothly without trouble until I got segfault error while
 fitting plm as follows:



 ## All annotation data file verification worked out fine!

 cdf <- AffymetrixCdfFile$byChipType("Mapping250K_Sty")
 print(cdf)  ## worked fine.

 gi <- getGenomeInformation(cdf)
 print(gi)  ## worked fine.

 si <- getSnpInformation(cdf)
 print(si)  ## worked fine.

 acs <- AromaCellSequenceFile$byChipType(getChipType(cdf))
 print(acs)  ## worked fine.

 ## step 1. declaring raw data set
 csR <- AffymetrixCelSet$byName("HapMap500K,Sty", cdf=cdf)
 ## print(csR)  ## worked fine
 ## AffymetrixCelSet:
 ## Name: HapMap500K
 ## Tags: Sty
 ## Path: rawData/HapMap500K,Sty/Mapping250K_Sty
 ## Platform: Affymetrix
 ## Chip type: Mapping250K_Sty
 ## Number of arrays: 270
 ## Names: NA06985, NA06991, ..., NA19240
 ## Time period: 2005-08-31 11:28:01 -- 2005-12-09 14:53:56

Re: [aroma.affymetrix] error with extractSnpQSet

2010-05-06 Thread Henrik Bengtsson
Hi,

before doing anything else, please provide what

print(sessionInfo())

reports.

/Henrik

On Wed, Apr 28, 2010 at 12:43 PM, Nolwenn Le Meur nlem...@gmail.com wrote:
 Hi everyone,

 I am trying to analyze pooling-based GWAS (I am used to expression
 data but new to the GWAS field) .

 I have 2 datasets from Illumina 610SNP and Affymetrix 250K_Nsp. I have
 started with the Affy one but I am not sure my preprocessing is valid.
 I followed Marco and Henrik exchange for a start and now I would like
 to compute genotype calls using the Crlmm model I can't make it run.
 Here is my script and the errors:

 library(aroma.affymetrix)
 log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 name <- "moins60-1"
 chipType <- c("Mapping250K_Nsp")
 ## verify cdf
 cdf <- AffymetrixCdfFile$byChipType(chipType)
 ## sequence
 acs <- AromaCellSequenceFile$byChipType(chipType)
 ## read in cel
 cs <- AffymetrixCelSet$byName(name, chipType=chipType)
 ## normalization (note: should do something specific because pooled data?)
 cn <- justSNPRMA.AffymetrixCelSet(cs, normalizeToHapmap=TRUE,
                 returnESet=FALSE, verbose=log)
 ## Genotype call (does not seem to work)
 crlmm <- CrlmmModel(cn, tags="*,oligo")
 ## ... I did not copy the whole log because of its length
      ..$ : chr [1:18] m601-1 m601-2 m602-1 m602-2 ...
 20100428 12:28:09|   Extracting data...done
 20100428 12:28:09|   Ordering unit groups to be (sense, antisense)...
 20100428 12:28:09|    Swapping elements:
     int [1:1918] 20 58 59 100 109 178 183 207 227 237 ...
 20100428 12:28:09|   Ordering unit groups to be (sense,
 antisense)...done
 20100428 12:28:09|   Allocate and populate SnpQSet...
 Error in getClass(Class, where = topenv(parent.frame())) :
  SnpQSet is not a defined class
 20100428 12:28:09|   Allocate and populate SnpQSet...done
 20100428 12:28:09|  Extracting data...done
 20100428 12:28:09| Chunk #1 of 7...done
 20100428 12:28:09|Calling genotypes by CRLMM...done

 units3 <- fit(crlmm, ram="oligo", verbose=log)
 ##.. same
 20100428 12:35:35|    Swapping elements:
     int [1:1918] 20 58 59 100 109 178 183 207 227 237 ...
 20100428 12:35:35|   Ordering unit groups to be (sense,
 antisense)...done
 20100428 12:35:35|   Allocate and populate SnpQSet...
 Error in getClass(Class, where = topenv(parent.frame())) :
  SnpQSet is not a defined class
 20100428 12:35:35|   Allocate and populate SnpQSet...done
 20100428 12:35:35|  Extracting data...done
 20100428 12:35:35| Chunk #1 of 7...done
 20100428 12:35:35|Calling genotypes by CRLMM...done
 str(units3)
 Error in str(units3) : object 'units3' not found

 and all calls are NA


 I directly tried extractSNPQSet but same thing:
 ## Extract SNPQSet require oligo
 snpqset <- extractSnpQSet(cn)
 Error in getClass(Class, where = topenv(parent.frame())) :
  SnpQSet is not a defined class

 Any help or suggestion is appreciated

 Nolwenn


Re: [aroma.affymetrix] quantile normalisation - what to expect?

2010-04-23 Thread Henrik Bengtsson
Hi.

On Thu, Apr 22, 2010 at 9:26 PM, mike dewar mikede...@gmail.com wrote:
 Hi,

 I'm trying to normalize data generated by the immunological genome
 project (immgen.org). They have released raw data for 128 arrays and I
 would like to preprocess their data. I'm very new to this, so
 apologies for any obvious gaffes in what I'm about to show you.

No need to apologize; we're all learning new things all the time.

 I have
 been using aroma.affymetrix to preprocess the data, and the whole
 process occurs without error. However, when it comes time to look at
 differential expression, I'm finding that nearly /everything/ is diff
 expressed leading me to suspect that I'm doing some preprocessing
 wrong.

 My question is this: after having preprocessed the data should each of
 my arrays be similarly distributed? For example, if I plot my data on
 a QQ-plot, should it lie along the line y=x?

Yes, that's a correct expectation.  Depending on exactly how the
quantiles are normalized, you expect them to either be exactly on y =
x, or scattered around it (with or without tails behaving slightly off
the line).
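
As a rough illustration of that expectation, here is a minimal base-R sketch of rank-based quantile normalization on simulated data. It is not the code that QuantileNormalization() runs; the function name quantileNormalize() and the simulated matrix are made up for the example.

```r
set.seed(1)
# Three simulated "arrays" with clearly different distributions
X <- cbind(a = rexp(1000, rate = 1),
           b = rexp(1000, rate = 0.5),
           c = rnorm(1000, mean = 5))

# Hypothetical helper: map each column onto the average sorted profile
quantileNormalize <- function(X) {
  target <- rowMeans(apply(X, 2, sort))
  Xn <- apply(X, 2, function(x) target[rank(x, ties.method = "first")])
  dimnames(Xn) <- dimnames(X)
  Xn
}

Xn <- quantileNormalize(X)

# After normalization, every column has the same empirical quantiles,
# so a QQ plot of one array against another falls on y = x.
qq <- qqplot(Xn[, "a"], Xn[, "b"], plot.it = FALSE)
print(all.equal(qq$x, qq$y))
```

When the target distribution is estimated differently (for instance from a subset of probes), the points scatter around y = x instead of lying exactly on it.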


 The code I'm using for the preprocessing is (pretty much copied
 verbatim from the aroma website):

 cdf <- AffymetrixCdfFile$byChipType('MoGene-1_0-st-v1',tags='r3')
 cs <- AffymetrixCelSet$byName(GEOid,cdf=cdf)
 # background correction
 bc <- RmaBackgroundCorrection(cs)
 csBC <- process(bc,verbose=verbose)
 # normalise
 qn <- QuantileNormalization(csBC)
 csN <- process(qn, verbose=verbose)

First, here you are using the default settings, which means that *all*
probes on the array are used in the estimation and normalization.  You
can also tell it to normalize PMs only etc.  In your case you probably
want to use:

  qn <- QuantileNormalization(csBC, typesToUpdate="pm");

as also suggested in Vignette 'Gene 1.0 ST array analysis'
[http://aroma-project.org/node/38].  It makes a difference, which is
illustrated in Vignette 'Empirical probe-signal densities and
rank-based quantile normalization'
[http://aroma-project.org/node/141].

Also, when you want to validate the QN output, you should first do it
on the probe signals, because that is what is normalized here.  So,
plot the probe-signal densities before and after QN as done in the
latter vignette.

 # probe-level model
 plm <- RmaPlm(csN)
 fit(plm,verbose=verbose)

FYI, the default behavior is that the probe summary is done on PM
probes only, that is, the explicit equivalent to the above is:

  plm <- RmaPlm(csN, probeModel="pm")

(this is why you want to also do QN on PM only).

 # extract data from the probe level model
 ces <- getChipEffectSet(plm)

Note that you here are working with probe summaries, so you would not
expect perfect agreement on the empirical densities (because the QN
was done on the signals before summarization).  However, they will
probably agree well.  Try:

 plotDensity(ces);

 gene_summary <- extractMatrix(ces,returnUgcMap=TRUE)
 # transform to a log scale
 gene_summary <- log2(gene_summary)

 which all runs without error. However when I look at a few columns of
 the data, for the first 1000 genes using

 qqnorm(gene_summary[1:1000,1:3])

Note the difference between qqnorm() and qqplot()!   You want to use
qqplot() to compare your densities to each other, not to the normal
distribution.
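
A small base-R sketch of that difference, on simulated data: two samples with the same skewed distribution (standing in for two perfectly quantile-normalized arrays) agree with each other in qqplot(), while qqnorm() compares a sample against the normal distribution and therefore bends, even though nothing is wrong with the normalization. The variable names are made up.

```r
set.seed(42)
# Two samples with identical, right-skewed distributions
x <- rexp(2000)
y <- sample(x)  # same values as x, in a different order

# qqplot(): sample quantiles of x against sample quantiles of y
qqXY <- qqplot(x, y, plot.it = FALSE)
print(all.equal(qqXY$x, qqXY$y))

# qqnorm(): sample quantiles of x against theoretical normal quantiles
qqN <- qqnorm(x, plot.it = FALSE)
# A correlation noticeably below 1 reflects the curvature of the plot
print(cor(qqN$x, qqN$y))
```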

/Henrik


 I get a rather curvy line that's nowhere near the line y=x. This
 doesn't agree with my (admittedly rather limited) understanding of
 what quantile normalisation is supposed to do.

 Can anyone advise? Should I be worried that I don't have a qqnorm plot
 that lies along y=x? Is it the normalisation that I should be worried
 about? Is my naivety leading me down the wrong path when it comes to
 preprocessing?

 Thanks in advance,

 Mike Dewar

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11

2010-04-23 Thread Henrik Bengtsson
On Fri, Apr 23, 2010 at 2:20 AM, Karl Kornacker
kornac...@midohio.twcbc.com wrote:
 I'm running R-2.11 on 64-bit Windows 7. R-Forge does not show a Windows
 x86_64 version of sfit.

I've emailed the R-forge team asking about the Win64 plans:

  https://r-forge.r-project.org/forum/forum.php?thread_id=2420&forum_id=77

More importantly, could you please let me know what happens if you do:

  source("http://aroma-project.org/hbLite.R")
  hbInstall("aroma.affymetrix")

Do you get an error message?

What happens if you do:

  install.packages("sfit", repos="http://R-Forge.R-project.org")

I don't have access to Windows 64-bit so I cannot test this myself.
The reason why I ask is that I believe in the special case of 'sfit'
it will still work using the 32-bit version on a 64-bit system.  This is
because sfit actually contains an executable (bin/cfit.exe), and
Win64 should be able to run Win32 executables, shouldn't it?  The R binary is
just a dummy (libs/dummy.dll) that is never loaded by R.  The reason
for this rather special setup is historical and only for the sfit
package.
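
If it helps, that layout can be checked from within R. This is only a sketch; it assumes the executable sits under bin/ in the installed package, as described above. system.file() simply returns an empty string if the package or the file is not there.

```r
# Returns "" when the package or the file does not exist
exe <- system.file("bin", "cfit.exe", package = "sfit")
if (nzchar(exe)) {
  cat("Found executable at:", exe, "\n")
} else {
  cat("sfit (or bin/cfit.exe) not found in any of:\n")
  print(.libPaths())
}
```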

Could you please provide me with the above details?  Then I can make
decisions on what actions I should take next, e.g. do we need to build
a specific Win64 version now, or can we wait for r-forge to do it for
us and so on.

/Henrik

PS. The decision to (not) put 'sfit' on CRAN is not mine; the original
author (Pratyaksha Wirapati) wishes to migrate to the 'expectile'
package and instead put that on CRAN.  In order to make that move, we
have to make sure to get fully sfit-reproducible results using
expectile, and we still haven't tested it well enough.  We have also
identified convergence issues with the expectile code, causing the
cross-talk calibration to fail in very rare cases when using
expectile.  I need to find enough problematic real-world cases in
order for Pratyaksha to be able to troubleshoot it.  The latter delay
is due to me.  Since sfit has way more CPU mileage, and there are no
reported problems with it, we will use that as the default in
aroma.affymetrix.



 -kk

 -Original Message-
 From: aroma-affymetrix@googlegroups.com
 [mailto:aroma-affymet...@googlegroups.com] On Behalf Of Henrik Bengtsson
 Sent: Thursday, April 22, 2010 7:46 PM
 To: aroma-affymetrix
 Subject: Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on
 R-2.11

 Hi.

 On Thu, Apr 22, 2010 at 11:13 PM, Karl Kornacker
 kornac...@midohio.twcbc.com wrote:
 The key function AllelicCrosstalkCalibration has stealth dependencies on
 additional packages (sfit and/or expectile) which are currently
 unavailable
 for R-2.11. When might updated versions of these packages for R-2.11
 become
 available?

 What have you tried?  What platform are you using?

 If you install aroma.affymetrix as explained on

  http://www.aroma-project.org/install

 you should get 'sfit'.  If that for some reason should not work,
 'sfit' can be installed manually from r-forge, cf.

  http://r-forge.r-project.org/R/?group_id=349

 Hope this helps

 /Henrik




 Karl Kornacker








Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11

2010-04-23 Thread Henrik Bengtsson
Hi.

On Fri, Apr 23, 2010 at 1:24 PM, Karl Kornacker
kornac...@midohio.twcbc.com wrote:
 Henrik,

 Here are the error messages when attempting to load 32-bit versions of sfit
 and expectile under R-2.11:

 library(sfit)
 Error: package 'sfit' was built before R 2.10.0: please re-install it

Q1. That doesn't look correct to me; it looks like you have an old
version installed from somewhere else, and not from following one of
the two installation options.  Is that correct?

Q2. What do you get if you do:

packageDescription("sfit")

That is probably pointing to a previously installed version?!
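
For reference, here is a small sketch of how to see which copy of a package R picks up and which R version it was built under. It uses 'stats' as a stand-in since that is always installed; substitute "sfit" on your system. The helper name whereInstalled() is made up.

```r
# Hypothetical helper: report location and build info for a package
whereInstalled <- function(pkg) {
  path <- find.package(pkg, quiet = TRUE)
  if (length(path) == 0) {
    return(list(package = pkg, installed = FALSE))
  }
  desc <- packageDescription(pkg)
  list(package = pkg, installed = TRUE, path = path,
       version = desc$Version, built = desc$Built)
}

str(whereInstalled("stats"))
print(.libPaths())  # libraries are searched in this order
```

A 'built' field that names an older R version than the one you are running would be consistent with the "was built before R 2.10.0" error.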


Now, if I try (on my Win32 system, which should be the default on your
Win64 system):

  install.packages("sfit", repos="http://R-Forge.R-project.org",
                   type="win64.binary");

I get:

Warning in install.packages("sfit", repos = "http://R-Forge.R-project.org",  :
  argument 'lib' is missing: using 'C:\Users\hb/R/win-library/2.11'
Warning: unable to access index for repository
http://R-Forge.R-project.org/bin/windows64/contrib/2.11
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'sfit' is not available

Q3. In other words, nothing gets installed.  Is that also what you get?

Q4. If so, please try

  install.packages("sfit", repos="http://R-Forge.R-project.org",
                   type="win.binary");

Does it install now?


 library(expectile)
 Error: package 'expectile' was built before R 2.10.0: please re-install it

Don't worry about 'expectile'; you will not need it.


 This stealth dependency of Aroma.Affymetrix on unavailable packages
 remains hidden until a call to AllelicCrosstalkCalibration attempts to load
 the package specified by the flavor parameter.

That is a design decision.  Some packages are only loaded/required at
the point when it is known that the user really needs the feature.  I could
set up aroma.affymetrix to require that all packages be
available/required upon install.  However, many packages are optional
(formally listed under 'Suggests' in DESCRIPTION), because they are
rarely used/only used by some people in some studies.  It would be
annoying for those users to have to install packages they don't need.  The
fewer packages required, the fewer potential issues you will have.

The only real alternative is to provide a validate function that a
user can use to assert that all packages, even optional ones, are
installed.  That seems to be a feature R should provide and not
aroma.* per se.  I go half way, and have hbInstall(aroma.affymetrix)
install some of the optional packages (including sfit), but if they
are not installed, 99.9% of aroma.affymetrix will still work.
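
Such a validation step can be approximated with base R alone. The following is a minimal sketch, not an aroma.* feature; the helper name missingPackages() is hypothetical and the package names passed to it are examples only:

```r
# Hypothetical helper (not part of aroma.* or base R): report which of a set
# of optional packages cannot be found in the current library paths.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, function(pkg) {
    # system.file() returns "" when the package cannot be found
    nzchar(system.file(package = pkg))
  }, FUN.VALUE = logical(1))
  pkgs[!installed]
}

# Example package names; adjust to the optional packages of interest
notInstalled <- missingPackages(c("stats", "sfit", "expectile"))
if (length(notInstalled) > 0) {
  message("Optional packages not installed: ",
          paste(notInstalled, collapse = ", "))
}
```

This only checks that a package can be found, not that it was built for the running version of R.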

Cheers,

/Henrik


 Karl

 -Original Message-
 From: aroma-affymetrix@googlegroups.com
 [mailto:aroma-affymet...@googlegroups.com] On Behalf Of Henrik Bengtsson
 Sent: Friday, April 23, 2010 6:02 AM
 To: aroma-affymetrix
 Subject: Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on
 R-2.11

 On Fri, Apr 23, 2010 at 2:20 AM, Karl Kornacker
 kornac...@midohio.twcbc.com wrote:
 I'm running R-2.11 on 64-bit Windows 7. R-Forge does not show a Windows
 x86_64 version of sfit.

 I've emailed the R-forge team asking about the Win64 plans:

  https://r-forge.r-project.org/forum/forum.php?thread_id=2420&forum_id=77

 More importantly, could you please let me know what happens if you do:

  source("http://aroma-project.org/hbLite.R")
  hbInstall("aroma.affymetrix")

 Do you get an error message?

 What happens if you do:

  install.packages("sfit", repos="http://R-Forge.R-project.org")

 I don't have access to Windows 64-bit so I cannot test this myself.
 The reason why I ask is that I believe that, in the special case of 'sfit',
 the 32-bit build will still work on a 64-bit system.  This is
 because sfit actually contains an executable (bin/cfit.exe), and
 isn't it the case that Win64 can run Win32 executables?  The R binary is
 just a dummy (libs/dummy.dll) that is never loaded by R.  The reason
 for this rather special setup is historical and applies only to the sfit
 package.

 Could you please provide me with the above details?  Then I can make
 decisions on what actions I should take next, e.g. do we need to build
 a specific Win64 version now, or can we wait for r-forge to do it for
 us and so on.

 /Henrik

 PS. The decision to (not) put 'sfit' on CRAN is not mine; the original
 author (Pratyaksha Wirapati) wishes to migrate to the 'expectile'
 package and instead put that on CRAN.  In order to make that move, we
 have to make sure to get fully sfit-reproducible results using
 expectile, and we still haven't tested it well enough.  We have also
 identified convergence issues with the expectile code, causing the
 cross-talk calibration to fail in very rare cases when using
 expectile.  I need to find enough problematic real-world cases in
 order for Pratyaksha to be able to troubleshoot it.  The latter delay
 is due to me.  Since sfit has way more CPU mileage, and there are no
 reported problems with it, we will use that as the default

Re: [aroma.affymetrix] failure of AllelicCrosstalkCalibration on R-2.11

2010-04-22 Thread Henrik Bengtsson
Hi.

On Thu, Apr 22, 2010 at 11:13 PM, Karl Kornacker
kornac...@midohio.twcbc.com wrote:
 The key function AllelicCrosstalkCalibration has stealth dependencies on
 additional packages (sfit and/or expectile) which are currently unavailable
 for R-2.11. When might updated versions of these packages for R-2.11 become
 available?

What have you tried?  What platform are you using?

If you install aroma.affymetrix as explained on

  http://www.aroma-project.org/install

you should get 'sfit'.  If that for some reason should not work,
'sfit' can be installed manually from r-forge, cf.

  http://r-forge.r-project.org/R/?group_id=349

Hope this helps

/Henrik




 Karl Kornacker









Re: [aroma.affymetrix] Error ExtractDataFrame

2010-04-21 Thread Henrik Bengtsson
Please report your sessionInfo(). /Henrik

On Wed, Apr 21, 2010 at 11:51 AM, elodie elodie.chapeaubl...@gmail.com wrote:
 Hi,

 I am trying to use aroma.affymetrix for Human Exon chips with custom CDFs. I
 tested several BrainArray custom CDFs (refseq, ense, vegae). Beforehand, I
 used convertCdf() to convert the CDFs to the right format.

 With the ense or vegae custom CDF, I get an error from the extractDataFrame()
 method. I tested this code with only two HuEx-1_0-st-v2 chips before
 running the analysis on all samples (230 samples).

 My R code:

 library(aroma.affymetrix)
 library(affxparser)

 cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2")
 cs <- AffymetrixCelSet$byName("vessie", cdf=cdf)

 setCdf(cs, cdf)
 bc <- RmaBackgroundCorrection(cs, tag="coreR2")
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 csBC <- process(bc, verbose=verbose)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 csN <- process(qn, verbose=verbose)
 getCdf(csN)
 plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
 fit(plmEx, verbose=verbose, force=TRUE) # change
 cesEx <- getChipEffectSet(plmEx)
 ExFitdf <- extractDataFrame(cesEx, units=NULL, addNames=TRUE)

 The last line returns this error:

 Error in list(`extractDataFrame(cesEx, units = NULL, addNames =
 TRUE)` = environment,  :

 [2010-04-21 11:43:33] Exception: Range of argument 'indices' is out of
 range [1,262144]: [1,304497]
  at throw(Exception(...))
  at throw.default(sprintf(Range of argument '%s' is out of range [%s,
 %s]: [%s,
  at throw(sprintf(Range of argument '%s' is out of range [%s,%s]:
 [%s,%s], .n
  at getNumerics.Arguments(static, ..., asMode = integer, disallow =
 disallow)
  at getNumerics(static, ..., asMode = integer, disallow = disallow)
  at getIntegers.Arguments(static, x, ..., range = range, .name
 = .name)
  at getIntegers(static, x, ..., range = range, .name = .name)
  at method(static, ...)
  at Arguments$getIndices(indices, max = nbrOfCells, disallow = NaN)
  at readRawData.AffymetrixCelFile(this, ...)
  at readRawData(this, ...)
  at getData.AffymetrixCelFile(this, indices = map[, cell], fields =
 celFields
  at getData(this, indices = map[, cell], fields =
 celFields[fields])
  at withCallingHandlers(expr, warning = function(w)
 invokeRestart(muffleWarnin
  at


 Can you help me to identify the problem and find a solution ?


 Thanks,





Re: [aroma.affymetrix] time to read the sequences of one chromosome in GWS6.0

2010-04-20 Thread Henrik Bengtsson
Hi.

On Tue, Apr 20, 2010 at 10:16 AM, mortiz mortiz...@gmail.com wrote:
 hi everyone,

 I need to read the sequences of the probes from the GWS6.0 chip and it
 has taken me more than 12 hours to do it for only 2 chromosomes.  I'm
 guessing I'm doing something wrong, because the basepairnormalization
 has to do the same and it doesn't take this long. This is what I'm
 doing:

 sessionInfo()
 R version 2.10.1 (2009-12-14)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
 LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
 LC_TIME=Spanish_Spain.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 base

 other attached packages:
  [1] MASS_7.3-4             aroma.affymetrix_1.5.0
 aroma.apd_0.1.7        affxparser_1.18.0      R.huge_0.2.0
 aroma.core_1.5.0       aroma.light_1.15.1
  [8] matrixStats_0.1.9      R.rsp_0.3.6
 R.cache_0.2.0          R.filesets_0.8.0       digest_0.4.2
 R.utils_1.3.3          R.oo_1.6.7
 [15] R.methodsS3_1.1.0

 loaded via a namespace (and not attached):
 [1] tools_2.10.1

 for (ii in 1:22){
  units <- getUnitsOnChromosome(gi, ii);
  cells <- getCellIndices(cdf, units=units)
  auxSeqs <- applyCdfGroups(cells, function(groups) lapply(groups,
  function(group) {
  readSequenceMatrix(acs, cells=group$indices)}))
 }

A good rule of thumb in R is that whenever you use an apply function
over a large number of elements you are most likely doing something
very slow.

More importantly (I think), in the above code, you are accessing the
probe sequence file for every single unit group, and on SNP6 there are
approximately 900,000*2+900,000 = 2,700,000 unit groups.  Reading from
file has some overhead, so you want to do as few requests as possible.

Instead, read the whole sequence matrix first and then do your
subsetting in memory.  (In aroma.* we do this in chunks, but the idea
is the same; we never read it one unit at a time.)

Thus, the first speed up would be to do:

acsData <- readSequenceMatrix(acs);  # One request instead of 2.7 million.

for (ii in 1:22) {
  units <- getUnitsOnChromosome(gi, chromosome=ii);
  cells <- getCellIndices(cdf, units=units);
  auxSeqs <- applyCdfGroups(cells, function(groups) {
    lapply(groups, function(group) {
      cells <- group$indices;
      acsData[cells,,drop=FALSE];
    });
  });
} # for (ii ...)

That should speed things up.  You still have two levels of apply calls
that slow things down.  What are you trying to do?  Do you need the
data in such a nested list structure?

If this is specific to SNP & CN chip types, you can write your
algorithm/code to deal with SNPs and (single-cell) CN loci separately.
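
If a nested structure is not strictly needed, the per-group lookups can be replaced by one vectorized subset plus a group label per row, splitting into a list only at the very end. The following is a minimal sketch in base R with made-up toy data; seqMatrix stands in for the matrix returned by readSequenceMatrix(), and the names and layout of cellsByGroup are assumptions for illustration only:

```r
# Toy stand-in for the full probe-sequence matrix (one row per cell)
seqMatrix <- matrix(seq_len(40), nrow = 10, ncol = 4)

# Toy stand-in for cell indices per unit group (group names are made up)
cellsByGroup <- list(snpA = c(1L, 2L), snpB = c(5L, 6L, 7L), cnC = 10L)

# Flatten the indices once and subset the matrix once
allCells <- unlist(cellsByGroup, use.names = FALSE)
groupOfCell <- rep(names(cellsByGroup), times = lengths(cellsByGroup))
allSeqs <- seqMatrix[allCells, , drop = FALSE]

# Only if a per-group list is really needed: split the row indices by group
rowsByGroup <- split(seq_along(allCells),
                     factor(groupOfCell, levels = names(cellsByGroup)))
seqsByGroup <- lapply(rowsByGroup, function(rows) {
  allSeqs[rows, , drop = FALSE]
})
```

Many downstream computations can work directly on allSeqs and groupOfCell, in which case the final split is unnecessary.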

/Henrik




 if anyone knows a faster way to do it, please let me know :)

 thanks ;)

 maria



Re: [aroma.affymetrix] problem with gcrma using HG-U133_Plus_2 CDF from affymetrix

2010-04-15 Thread Henrik Bengtsson
Hi.

On Thu, Apr 15, 2010 at 10:56 AM, step...@mnemosyne.co.uk
step...@mnemosyne.co.uk wrote:
 Dear aroma users,
 I am trying to run gcrma across a collection of human breast cell line CEL
 files without success - code has worked in previous aroma versions, and is
 still contemporary with documented instructions at
 http://www.aroma-project.org .
 From a freshly started R instance - CDF file binary formatted and straight
 from Affymetrix. Normalisation process is fine with e.g. RMA, but for
 consistency with a related project I would prefer the results using gcrma.
 The run returns a simple missing object 'pb' error.
 library(aroma.affymetrix)
 cs <- AffymetrixCelSet$byName("Breast", chipType="HG-U133_Plus_2");
 bc <- GcRmaBackgroundCorrection(cs);
 csB <- process(bc);
 Error in reset(pb) : object 'pb' not found

Setting the 'verbose' argument to anything but FALSE should work around
this bug, e.g.

 csB <- process(bc, verbose=0);

will give minimal output information (only a progressbar).  If you
don't mind the verbose output, use verbose=TRUE or similar.

Details:
The bug is in computeAffinities() for AffymetrixCdfFile. This bug has
probably been around for a very long time, so I suspect that when you
say the "code has worked in previous aroma versions", it could be that you
did set the 'verbose' argument before [though I've been wrong before].
Almost all our redundancy tests turn on the verbose output,
which is why this passed unnoticed.
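
The error message is consistent with a progress bar object that is created only when verbose output is on but is then used unconditionally. The following is a hypothetical illustration of that class of bug in base R; it is not the aroma.affymetrix source, and the function names are made up:

```r
# Hypothetical illustration only; this is NOT the aroma.affymetrix code.

# Buggy pattern: 'pb' exists only when verbose=TRUE, but is always used.
buggyProcess <- function(n, verbose = FALSE) {
  if (verbose) pb <- txtProgressBar(min = 0, max = n, style = 3)
  for (i in seq_len(n)) {
    setTxtProgressBar(pb, i)  # fails with "object 'pb' not found" if !verbose
  }
  invisible(n)
}

# Fixed pattern: every use of 'pb' is guarded by the same condition.
fixedProcess <- function(n, verbose = FALSE) {
  if (verbose) pb <- txtProgressBar(min = 0, max = n, style = 3)
  for (i in seq_len(n)) {
    if (verbose) setTxtProgressBar(pb, i)
  }
  if (verbose) close(pb)
  invisible(n)
}
```

A test suite that always runs with verbose output enabled would exercise only the branch where 'pb' exists, so the unguarded use would go unnoticed.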

BTW, for next time, make sure to also report traceback() after getting
an error. That helps narrowing down the issue.

This bug will be fixed for the next release; then you can skip
'verbose' and not even the progress bar will be outputted.

Hope this helps

/Henrik

 sessionInfo()
 R version 2.10.1 (2009-12-14)
 x86_64-unknown-linux-gnu

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.5.0 aroma.apd_0.1.7    affxparser_1.18.0
  [4] R.huge_0.2.0   aroma.core_1.5.0   aroma.light_1.15.2
  [7] matrixStats_0.1.9  R.rsp_0.3.6    R.cache_0.3.0
 [10] R.filesets_0.8.0   digest_0.4.2   R.utils_1.3.3
 [13] R.oo_1.7.1 R.methodsS3_1.2.0

 loaded via a namespace (and not attached):
 [1] splines_2.10.1 tools_2.10.1
 bc
 GcRmaBackgroundCorrection:
 Data set: Breast
 Input tags:
 User tags: *
 Asterisk ('*') tags: GRBC
 Output tags: GRBC
 Number of files: 84 (1085.65MB)
 Platform: Affymetrix
 Chip type: HG-U133_Plus_2
 Algorithm parameters: (subsetToUpdate: NULL, typesToUpdate: chr "pm",
 indicesNegativeControl: NULL, affinities: NULL, type: chr "fullmodel",
 opticalAdjust: logi TRUE, gsbAdjust: logi TRUE, gsbParameters: NULL)
 Output path: probeData/Breast,GRBC/HG-U133_Plus_2
 Is done: FALSE
 RAM: 0.00MB
 Is there any obvious reason for this FUBAR - do I wait for the next release
 of aroma?
 Cheers - greetings from Sunny Finland
 Stephen





Re: [aroma.affymetrix] Re: GCRMA normalization with MoEx-1_0-st-v1

2010-04-07 Thread Henrik Bengtsson
Hi,

sorry for not being clear; I never made the fix available, because I
thought it wouldn't help anyway, since you don't want to use the standard
CDF for this chip type either way.  However, I realized that you can
of course use it for the gcRMA background correction step, and from
there use one of the custom CDFs.  For example:

library(aroma.affymetrix);

verbose <- Arguments$getVerbose(-10, timestamp=TRUE);


dataSet <- "Affymetrix-Tissues";
chipType <- "MoEx-1_0-st-v1";

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Setup data set
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR1,A20080718,MR");
print(cdf);
csR <- AffymetrixCelSet$byName(dataSet, chipType=chipType);
print(csR);

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# gcRMA-style background correction
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Currently, you must use the standard CDF file.
cdf <- getCdf(csR);
cdfS <- AffymetrixCdfFile$byChipType(getChipType(cdf, fullname=FALSE));
setCdf(csR, cdfS);
bc <- GcRmaBackgroundCorrection(csR, type="affinities");
print(bc);
csB <- process(bc, verbose=verbose);
print(csB);
# Now, use the custom CDF in what follows
setCdf(csB, cdf);
print(csB);

(The above is now part of the redundancy test suite of aroma.affymetrix).

In order to install the patch, follow the instructions on
http://aroma-project.org/howtos/updateOrPatch

/Henrik

On Wed, Mar 31, 2010 at 4:37 PM, Gil Tomás gil@gmail.com wrote:
 Thanks for your reply.
 I've downloaded the MoEx-1_0-st-v1.cdf (binary version of the
 unsupported CDF file from Affymetrix) from 
 http://www.aroma-project.org/node/31.
 I used it to run the analysis and here's what I got:

 **
 R> ## * gcrma normalization
 R> cdf <- AffymetrixCdfFile$byChipType ("MoEx-1_0-st-v1") # taken from http://www.aroma-project.org/node/31
 R> cs <- AffymetrixCelSet$byName ("affy-brain-dev", cdf = cdf) # produces cell set class object
 R>
 R> bc <- GcRmaBackgroundCorrection (cs)
 R> csB <- process (bc, verbose = -10) # as suggested by Henrik Bengtsson
 Background correcting data set...
  Computing probe affinities...
  Computing GCRMA probe affinities for 1257006 units...
   Identify PMs and MMs among the CDF cell indices...
     logi [1:5266159] TRUE TRUE TRUE TRUE TRUE TRUE ...
       Mode   FALSE    TRUE    NA's
    logical  334476 4931683       0
    MMs are defined as non-PMs
    Number of PMs: 4931683
    Number of MMs: 334476
   Identify PMs and MMs among the CDF cell indices...done
   Reading probe-sequence data...
    Retrieving probe-sequence data...
     Chip type (full): MoEx-1_0-st-v1
     Locating probe-tab file...
      Chip type: MoEx-1_0-st-v1
      AffymetrixProbeTabFile:
      Name: MoEx-1_0-st-v1
      Tags:
      Full name: MoEx-1_0-st-v1
      Pathname: annotationData/chipTypes/MoEx-1_0-st-v1/NetAffx/
 MoEx-1_0-st-v1.probe.tab
      File size: 460.47 MB (482839635 bytes)
      RAM: 0.01 MB
      Number of data rows: NA
      Columns [12]: 'probeID', 'probeSetID', 'probeXPos', 'probeYPos',
 'assembly', 'seqname', 'start', 'stop', 'strand', 'probeSequence',
 'targetStrandedness', 'category'
      Number of text lines: NA
      AffymetrixCdfFile:
      Path: annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1
      Filename: MoEx-1_0-st-v1.cdf
      Filesize: 274.30MB
      Chip type: MoEx-1_0-st-v1
      RAM: 0.00MB
      File format: v4 (binary; XDA)
      Dimension: 2560x2560
      Number of cells: 6553600
      Number of units: 1257006
      Cells per unit: 5.21
      Number of QC units: 0
     Locating probe-tab file...done
     Validating probe-tab file against CDF...
       chr Unit name: 
 Error in list(`process(bc, verbose = -10)` = environment,
 `process.GcRmaBackgroundCorrection(bc, verbose = -10)` =
 environment,  :

 [2010-03-31 16:27:37] Exception: Either argument 'names' or 'pattern'
 must be specified.
  at throw(Exception(...))
  at throw.default(Either argument 'names' or 'pattern' must be
 specified.)
  at throw(Either argument 'names' or 'pattern' must be specified.)
  at indexOf.UnitNamesFile(this, names = unitName)
  at indexOf(this, names = unitName)
  at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose
 = verbose
  at getProbeSequenceData(this, safe = safe, verbose = verbose)
  at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
 verbose =
  at computeAffinities(cdf, paths = probePath, ..., verbose =
 less(verbose))
  at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-
 dev,GRBC/Mo
  at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-
 st-v1, ve
  at do.call(bgAdjustGcrma, args = args)
  at process.GcRmaBackgroundCorrection(bc, verbose = -10)
  at process(bc, verbose = -10)
 In addition: Warning message:
 In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName
 $` = character),  :
  Argument 'rows' was out of range [1,0]. Ignored rows beyond this
 range

Re: [aroma.affymetrix] Re: GCRMA normalization with MoEx-1_0-st-v1

2010-03-31 Thread Henrik Bengtsson
Thanks.

I've located and identified the problem.  I fixed it for the case when
you use the default CDF from Affymetrix.  Unfortunately, it won't work
when you use a custom CDF, as you do.  To solve that, we need to
make more updates.  I've already taken some steps toward that, but
it will take weeks before it's ready.  Maybe it will be ready for
the next big release of aroma.affymetrix.

/Henrik

On Tue, Mar 30, 2010 at 3:10 PM, Gil Tomás gil@gmail.com wrote:
 Sorry for the delayed reply:

 **
 R> ## * gcrma normalization
 R> cdf <- AffymetrixCdfFile$byChipType ("MoEx-1_0-st-v1,fullR1,A20080718,MR") # taken from http://www.aroma-project.org/node/31
 R> cs <- AffymetrixCelSet$byName ("affy-brain-dev", cdf = cdf) # produces cell set class object
 R>
 R> bc <- GcRmaBackgroundCorrection (cs)
 R> csB <- process (bc, verbose = -10) # as suggested by Henrik Bengtsson
 Background correcting data set...
  Computing probe affinities...
  Computing GCRMA probe affinities for 265508 units...
   Identify PMs and MMs among the CDF cell indices...
     logi [1:4565541] TRUE TRUE TRUE TRUE TRUE TRUE ...
       Mode    TRUE    NA's
    logical 4565541       0
    MMs are defined as non-PMs
    Number of PMs: 4565541
    Number of MMs: 0
   Identify PMs and MMs among the CDF cell indices...done
   Reading probe-sequence data...
    Retrieving probe-sequence data...
     Chip type (full): MoEx-1_0-st-v1,fullR1,A20080718,MR
     Locating probe-tab file...
      Chip type: MoEx-1_0-st-v1
      AffymetrixProbeTabFile:
      Name: MoEx-1_0-st-v1
      Tags:
      Full name: MoEx-1_0-st-v1
      Pathname: annotationData/chipTypes/MoEx-1_0-st-v1/NetAffx/
 MoEx-1_0-st-v1.probe.tab
      File size: 460.47 MB (482839635 bytes)
      RAM: 0.01 MB
      Number of data rows: NA
      Columns [12]: 'probeID', 'probeSetID', 'probeXPos', 'probeYPos',
 'assembly', 'seqname', 'start', 'stop', 'strand', 'probeSequence',
 'targetStrandedness', 'category'
      Number of text lines: NA
      AffymetrixCdfFile:
      Path: annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-
 v1,fullR1,A20080718,MR.cdf
      Filename: MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf
      Filesize: 176.32MB
      Chip type: MoEx-1_0-st-v1,fullR1,A20080718,MR
      RAM: 0.00MB
      File format: v4 (binary; XDA)
      Dimension: 2560x2560
      Number of cells: 6553600
      Number of units: 265508
      Cells per unit: 24.68
      Number of QC units: 1
     Locating probe-tab file...done
     Validating probe-tab file against CDF...
       chr Unit name: 
 Error in list(`process(bc, verbose = -10)` = environment,
 `process.GcRmaBackgroundCorrection(bc, verbose = -10)` =
 environment,  :

 [2010-03-30 15:07:34] Exception: Either argument 'names' or 'pattern'
 must be specified.
  at throw(Exception(...))
  at throw.default(Either argument 'names' or 'pattern' must be
 specified.)
  at throw(Either argument 'names' or 'pattern' must be specified.)
  at indexOf.UnitNamesFile(this, names = unitName)
  at indexOf(this, names = unitName)
  at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose
 = verbose
  at getProbeSequenceData(this, safe = safe, verbose = verbose)
  at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
 verbose =
  at computeAffinities(cdf, paths = probePath, ..., verbose =
 less(verbose))
  at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-
 dev,GRBC/Mo
  at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-
 st-v1, ve
  at do.call(bgAdjustGcrma, args = args)
  at process.GcRmaBackgroundCorrection(bc, verbose = -10)
  at process(bc, verbose = -10)
 In addition: Warning message:
 In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName
 $` = character),  :
  Argument 'rows' was out of range [1,0]. Ignored rows beyond this
 range.
     Validating probe-tab file against CDF...done
    Retrieving probe-sequence data...done
   Reading probe-sequence data...done
  Computing GCRMA probe affinities for 265508 units...done
  Computing probe affinities...done
 Background correcting data set...done
 **


 On Mar 25, 7:17 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 What's the verbose output, if you do:

 csB - process(bc, verbose=-10)

 /H



 On Thu, Mar 25, 2010 at 4:34 PM, Gil Tomás gil@gmail.com wrote:
  Thank you very much for your reply, Henrik. Each of your comments was very
  enlightening. First of all, the documentation source I'm following is
  that from the official aroma.affymetrix site, particularly the section
  documenting reproducible research for the gcRMA code
  (http://aroma-project.org/replication/gcRMA).
  I then redefined the filesystem of the project according to your
  instructions (keeping the CDF file with the comma-separated tag
  nomenclature) and reran the code:

  **
  R> cdf <- AffymetrixCdfFile$byChipType ("MoEx-1_0-st-v1,fullR1,A20080718,MR") # taken from http://www.aroma-project.org/node/31
  R cs - AffymetrixCelSet$byName (affy-brain-dev

[aroma.affymetrix] FYI: If the aroma.affymetrix mailing list goes down...

2010-03-29 Thread Henrik Bengtsson
Hi,

in a few moments, I will update a few pages on the aroma.affymetrix
Google Group so that they point to the new website
http://www.aroma-project.org/.  This may cause the aroma.affymetrix
mailing list/forum to go down, meaning that if you try to post, your
message will bounce back to you with an error message.  If this
happens, we will try to fix it asap, but we will then be in the hands of
Google to solve it.  Make sure to follow updates on:

http://www.aroma-project.org/

For a background and reasons for previous forum hiccups, see

http://aroma-project.org/forum/GoogleGroup/KnownIssues

Hopefully nothing goes wrong...

/Henrik



Re: [aroma.affymetrix] saturated Affy 500K SNP array signals?

2010-03-27 Thread Henrik Bengtsson
Hi.

On Fri, Mar 26, 2010 at 9:17 PM, Louie van de Lagemaat
louie...@gmail.com wrote:
 Hi Henrik et al,

 I have been reanalyzing an older dataset of Mapping500K (Nsp+Sty)
 arrays for CNVs using aroma, and this works in general really well.
 However, I have noticed that overall many individuals in this
 particular dataset appear to have CNVs highly skewed toward deletions.
  Only a few individuals seem to have the expected balance of
 insertions  and deletions.

It is not clear from your description what the problem is.  What do
you mean by "skewed toward deletions"?  Do you mean there is a higher
number of deleted regions, or do you mean that the CN mean levels are
shifted down away from CN=2, or something else?


 Is it possible that the samples that show almost exclusively deletions
 are saturated?  If the arrays are indeed saturated, is there a way
 around this that is implemented in aroma?   Or is there a better
 explanation?  Some signal intensity distributions are in the png
 attached.

To me those distributions look alright; could you explain which one
of those plots looks odd to you and why you think so?  Then I might be
able to clarify further.
/Henrik


 Thanks in advance for any help or ideas you can offer,

 Louie van de Lagemaat
 Sanger Fellow
 Wellcome Trust Sanger Institute
 Hinxton, Cambridge CB10 1SA

 ---

 # PS, here's the script I use:

 library(aroma.affymetrix)
 library(aroma.cn)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 setOption(aromaSettings, "memory/ram", 50)
 setOption(aromaSettings, "memory/gcArrayFrequency", 20)

 cs <- AffymetrixCelSet$byName(ProjName, chipType="Mapping250K_Nsp")
 IntermediateResults[["csNsp"]] <- extract(cs, !isDuplicated(cs))
 cs <- AffymetrixCelSet$byName(ProjName, chipType="Mapping250K_Sty")
 IntermediateResults[["csSty"]] <- extract(cs, !isDuplicated(cs))

 # ACC
 acc <- AllelicCrosstalkCalibration(IntermediateResults[["csNsp"]],
   model="CRMAv2")
 IntermediateResults[["csCNsp"]] <- process(acc, verbose=verbose)
 acc <- AllelicCrosstalkCalibration(IntermediateResults[["csSty"]],
   model="CRMAv2")
 IntermediateResults[["csCSty"]] <- process(acc, verbose=verbose)

 # BPN
 bpn <- BasePositionNormalization(IntermediateResults[["csCNsp"]],
   target="zero")
 IntermediateResults[["csNNsp"]] <- process(bpn, verbose=verbose)
 bpn <- BasePositionNormalization(IntermediateResults[["csCSty"]],
   target="zero")
 IntermediateResults[["csNSty"]] <- process(bpn, verbose=verbose)

 # QN
 qn <- QuantileNormalization(IntermediateResults[["csNNsp"]])
 IntermediateResults[["csQNsp"]] <- process(qn, verbose=verbose)
 qn <- QuantileNormalization(IntermediateResults[["csNSty"]])
 IntermediateResults[["csQSty"]] <- process(qn, verbose=verbose)

 # probe level model
 plm <- RmaCnPlm(IntermediateResults[["csQNsp"]], combineAlleles=TRUE,
   mergeStrands=TRUE)
 fit(plm, verbose=verbose)
 IntermediateResults[["plmNsp"]] = plm
 plm <- RmaCnPlm(IntermediateResults[["csQSty"]], combineAlleles=TRUE,
   mergeStrands=TRUE)
 fit(plm, verbose=verbose)
 IntermediateResults[["plmSty"]] = plm

 # fragment length normalization
 cesNList <- list()
 ces <- getChipEffectSet(IntermediateResults[["plmNsp"]])
 fln <- FragmentLengthNormalization(ces, target="zero")
 cesNList[["Mapping250K_Nsp"]] <- process(fln, verbose=verbose)
 ces <- getChipEffectSet(IntermediateResults[["plmSty"]])
 fln <- FragmentLengthNormalization(ces, target="zero")
 cesNList[["Mapping250K_Sty"]] <- process(fln, verbose=verbose)

 # get male reference - an all-male sample
 IntermediateResults[["ceRefNspM"]] <-
   calculateBaseline(cesNList[["Mapping250K_Nsp"]], chromosomes=1:22,
     ploidy=2, defaultPloidy=2, verbose=verbose)
 IntermediateResults[["ceRefStyM"]] <-
   calculateBaseline(cesNList[["Mapping250K_Sty"]], chromosomes=1:22,
     ploidy=2, defaultPloidy=2, verbose=verbose)
 IntermediateResults[["ceRefNspM"]] <-
   calculateBaseline(cesNList[["Mapping250K_Nsp"]], chromosomes=23,
     ploidy=1, defaultPloidy=1, verbose=verbose)
 IntermediateResults[["ceRefStyM"]] <-
   calculateBaseline(cesNList[["Mapping250K_Sty"]], chromosomes=23,
     ploidy=1, defaultPloidy=1, verbose=verbose)

 # call CNVs using both CBS and GLAD, for comparison
 CbsSegM <- CbsModel(cesNList,
   list(Mapping250K_Nsp = IntermediateResults[["ceRefNspM"]],
        Mapping250K_Sty = IntermediateResults[["ceRefStyM"]]))
 writeRegions(CbsSegM, chromosomes = 1:23, verbose = verbose)

 GladSegM <- GladModel(cesNList,
   list(Mapping250K_Nsp = IntermediateResults[["ceRefNspM"]],
        Mapping250K_Sty = IntermediateResults[["ceRefStyM"]]))
 writeRegions(GladSegM, chromosomes = 1:23, verbose = verbose)


Re: [aroma.affymetrix] Alternatives to quantile normalization? (Was: Re: [aroma.affymetrix] ProbeLevelTransform subclasses: how to use)

2010-03-27 Thread Henrik Bengtsson
Hi,

can you post your complete script where you go from CEL files to how
you generate those MA plots?

/Henrik

On Sat, Mar 27, 2010 at 12:57 AM, Richard Beyer rpbe...@gmail.com wrote:
 Hi Henrik,

 I have attached 3 png files.  Two density plots of raw intensities:
 for all probes and for pm probes:

 name = "AndersonRatST_10.03.12"
 chip = "RaGene-1_0-st-v1"
 checkChipType = FALSE
 cs <- AffymetrixCelSet$byName(name, chipType=chip,
   checkChipType=checkChipType)

 graphics.off()
 png(file="Anderson rat ST raw intensity all probes 26mar10.png",
   width=1240, height=1240, units="px", pointsize=16, bg="white",
   res=NA)
   plotDensity(cs, col=cols1, ylim=c(0,0.6), xlim=c(4,15),
     main="raw intensity all probes")
   legend(10, 0.6, legend=as.character(paste(grps,1:35)), fill=cols1, ncol=3)
 graphics.off()
 png(file="Anderson rat ST raw intensity pm probes 26mar10.png",
   width=1240, height=1240, units="px", pointsize=16, bg="white",
   res=NA)
   plotDensity(cs, col=cols1, ylim=c(0,0.4), types="pm", xlim=c(4,12),
     main="raw intensity pm probes")
   legend(10, 0.4, legend=as.character(paste(grps,1:35)), fill=cols1, ncol=3)
 graphics.off()

 I also attached a png file that is a MA plot for the output of the
 limma analysis.  In this experiment there are 7 groups: 2A   2B
 2C   4A   4B   4C   Sham. ( I didn't use your more elegant
 way of generating the plots, I just plotted the topTable results from
 limma.)

 The group names for the various contrasts are also shown in the
 density plots.  There are 5 arrays for each group.  The weird MA plots
 are the middle two in the bottom row of the MA plots png file. They
 are for the contrasts: 4A-4B and 4A-4C.  Actually all 12 MA plots look
 a bit weird, meaning the lens shaped cloud of points is not centered
 about the zero line.  We have exhaustively checked the wet lab QC and
 everything looks good.

 I appreciate you having a look at these figures.  Please let me know
 if you can suggest some further analysis.  I was even wondering about
 doing quantile normalization at the probeset level, rather than the
 probe level.  I am puzzled.

 Thanks very much,
 Dick

 On Fri, Mar 26, 2010 at 2:18 AM, Henrik Bengtsson
 henrik.bengts...@gmail.com wrote:
 Hi,

 On Thu, Mar 25, 2010 at 7:20 PM, Richard Beyer rpbe...@gmail.com wrote:
 Hi Henrik,

 I'm quite enjoying aroma-project.org.  Thanks for your detailed help.
 I am making some progress now.  I think my dataset has some issues
 that are becoming clearer.  I'm trying QuantileNormalization(csBC,
 typesToUpdate="pm", tags=c("*", type)), using just the pm probes.
 Maybe this will help.

 so when doing that, what do the density plots look like afterward, and
 more interestingly, what do the M vs A plots for the PM signals look
 like?

 You can get hold of the index vector for the PM probes by:

 cdf <- getCdf(cs);
 cells <- getCellIndices(cdf, stratifyBy="pm", unlist=TRUE, useNames=FALSE);

 Then you can plot the M vs A for any pair of CEL files as:

 cfT <- getFile(cs, 1);
 cfR <- getFile(cs, 2);
 smoothScatterMvsA(cfT, cfR, indices=cells);

 (or plotMvsA(...) for a scatter plot).

 If you want to compare to the pool of all arrays, do:

 cfR <- getAverageFile(cs);

 /Henrik

 PS. Please post PNGs instead of PDFs, because they are smaller.


 I've attached a pdf file that shows the raw intensities with and
 without the control spots:

   plotDensity(cs,col=rainbow(35),ylim=c(0,0.7))
   legend(0,0.7,legend= as.character(1:35),fill=rainbow(35))
   plotDensity(cs,col=rainbow(35),ylim=c(0,0.7),types="pm")
   legend(0,0.7,legend= as.character(1:35),fill=rainbow(35))

 Cheers,
 Dick

 On Thu, Mar 25, 2010 at 10:06 AM, Henrik Bengtsson
 henrik.bengts...@gmail.com wrote:
 Hi.

 On Thu, Mar 25, 2010 at 5:02 PM, Richard Beyer rpbe...@gmail.com wrote:
 Hi Henrik,

 Thanks very much for your help.  I saw your post about the new
 documentation link after I wrote my question.  So I will look through
 that.

 What you are saying is very helpful even though I wasn't thinking
 quite so large scale or ambitious.

 Ok.


 I have an immediate problem with seeing very weird results on a
 dataset of 35 rat ST arrays.  When I run the
 RmaBackgroundCorrection(), QuantileNormalization() normalized data
 through limma and do MA plots of one group against another, the main
 cloud of data points is not the expected lens shape centered about
 the origin.  The shape is more like a s-wave with part of the cloud of
 points above and part below the zero line.  All the arrays pass Affy
 QC as done by Expression Console and they seem fine when I plot NUSE
 and RLE.  Judging from the shapes seen in the MA plots, my first
 reaction is that the assumption of most-probesets-are-unchanged is not
 being enforced by the quantile normalization step.  So, I wanted to
 just try a few reasonable alternatives to the quantile normalization
 step.  In addition, I think I've seen less pronounced versions of
 these s-wave shapes in MA plots from ST arrays in other data sets, but
 not nearly so pronounced

Re: [aroma.affymetrix] Alternatives to quantile normalization? (Was: Re: [aroma.affymetrix] ProbeLevelTransform subclasses: how to use)

2010-03-26 Thread Henrik Bengtsson
Hi,

On Thu, Mar 25, 2010 at 7:20 PM, Richard Beyer rpbe...@gmail.com wrote:
 Hi Henrik,

 I'm quite enjoying aroma-project.org.  Thanks for your detailed help.
 I am making some progress now.  I think my dataset has some issues
 that are becoming clearer.  I'm trying QuantileNormalization(csBC,
 typesToUpdate="pm", tags=c("*", type)), using just the pm probes.
 Maybe this will help.

so when doing that, what do the density plots look like afterward, and
more interestingly, what do the M vs A plots for the PM signals look
like?

You can get hold of the index vector for the PM probes by:

cdf <- getCdf(cs);
cells <- getCellIndices(cdf, stratifyBy="pm", unlist=TRUE, useNames=FALSE);

Then you can plot the M vs A for any pair of CEL files as:

cfT <- getFile(cs, 1);
cfR <- getFile(cs, 2);
smoothScatterMvsA(cfT, cfR, indices=cells);

(or plotMvsA(...) for a scatter plot).

If you want to compare to the pool of all arrays, do:

cfR <- getAverageFile(cs);
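
For readers who want to see what goes into such a plot, the M and A values can be sketched in plain R. This is only an illustration on simulated intensities: mva() is a hypothetical helper, not a function of aroma.affymetrix, and the simulated signals merely stand in for the PM intensities of two arrays.

```r
# Hypothetical helper (not part of aroma.affymetrix): compute M and A
# for two vectors of positive intensities measured on the same probes.
mva <- function(yT, yR) {
  stopifnot(length(yT) == length(yR), all(yT > 0), all(yR > 0))
  M <- log2(yT / yR)               # log-ratio
  A <- (log2(yT) + log2(yR)) / 2   # average log-intensity
  data.frame(A = A, M = M)
}

# Simulated PM signals: the test array is the reference times noise.
set.seed(1)
yR <- 2^runif(1000, min = 6, max = 14)
yT <- yR * 2^rnorm(1000, sd = 0.2)
ma <- mva(yT, yR)
```

Calling plot(ma$A, ma$M) followed by abline(h = 0) then gives the scatter plot; for well-normalized data the cloud should be roughly centered on the zero line across the whole range of A.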

/Henrik

PS. Please post PNGs instead of PDFs, because they are smaller.


 I've attached a pdf file that shows the raw intensities with and
 without the control spots:

   plotDensity(cs,col=rainbow(35),ylim=c(0,0.7))
   legend(0,0.7,legend= as.character(1:35),fill=rainbow(35))
   plotDensity(cs,col=rainbow(35),ylim=c(0,0.7),types="pm")
   legend(0,0.7,legend= as.character(1:35),fill=rainbow(35))

 Cheers,
 Dick

 On Thu, Mar 25, 2010 at 10:06 AM, Henrik Bengtsson
 henrik.bengts...@gmail.com wrote:
 Hi.

 On Thu, Mar 25, 2010 at 5:02 PM, Richard Beyer rpbe...@gmail.com wrote:
 Hi Henrik,

 Thanks very much for your help.  I saw your post about the new
 documentation link after I wrote my question.  So I will look through
 that.

 What you are saying is very helpful even though I wasn't thinking
 quite so large scale or ambitious.

 Ok.


 I have an immediate problem with seeing very weird results on a
 dataset of 35 rat ST arrays.  When I run the
 RmaBackgroundCorrection(), QuantileNormalization() normalized data
 through limma and do MA plots of one group against another, the main
 cloud of data points is not the expected lens shape centered about
 the origin.  The shape is more like a s-wave with part of the cloud of
 points above and part below the zero line.  All the arrays pass Affy
 QC as done by Expression Console and they seem fine when I plot NUSE
 and RLE.  Judging from the shapes seen in the MA plots, my first
 reaction is that the assumption of most-probesets-are-unchanged is not
 being enforced by the quantile normalization step.  So, I wanted to
 just try a few reasonable alternatives to the quantile normalization
 step.  In addition, I think I've seen less pronounced versions of
 these s-wave shapes in MA plots from ST arrays in other data sets, but
 not nearly so pronounced as this one.

 The end result is I'm stuck and puzzled.

 In theory quantile normalization should do a decent job of making the
 log-ratios independent of the log-intensities, that is, the cloud in
 an M vs. A scatter plot should be fairly straight (with the possible
 exception at very weak signals or very large signals).  If you really
 want to dive into the arguments, see:

 H. Bengtsson & O. Hössjer, Methodological study of affine
 transformations of gene expression data with proposed robust
 non-parametric multi-dimensional normalization method. BMC
 Bioinformatics, 2006. [http://www.aroma-project.org/publications]

 Since you don't get this, I would first make sure that you are
 plotting the same data points that you are normalizing.  Note that
 quantile normalization can be done on PMs only, on all probes etc.
 See page 'Empirical probe-signal densities and rank-based quantile
 normalization' for how different settings give different normalization
 outputs.

 Hope this helps

 /Henrik

 PS. It is possible to attach PNGs to emails to this list; you may
 want to share your figures.


 Thanks again,
 Dick

 On Thu, Mar 25, 2010 at 8:26 AM, Henrik Bengtsson
 henrik.bengts...@gmail.com wrote:
 Hi.

 On Thu, Mar 25, 2010 at 5:30 AM, dbe...@u.washington.edu
 rpbe...@gmail.com wrote:
 Hello,

 I would like to get more info, perhaps example calls, on the various
 subclasses of ProbeLevelTransform.

 I see from a previous post by Mark Robinson, I have the examples:
   if(doNorm){
    bc <- RmaBackgroundCorrection(cs)
    csBC <- process(bc,verbose=verbose,force=force)
    setCdf(csBC, cdf)
    qn <- QuantileNormalization(csBC, typesToUpdate="pm")
    csN <- process(qn, verbose=verbose,force=force) #time required
    setCdf(csN, cdf)
   }

 What I'd like to be able to do is something akin to what I used to be
 able to do with the affy expresso call.  That is, specify different
 background methods, different normalization methods, such as invariant
 set, rma, constant, etc.

 It sounds like you wish to setup a high-level API providing wrappers
 for common preprocessing sequences/pipelines.  There have been some
 independent attempts by us doing this, but we haven't done a serious
 attempt

Re: [aroma.affymetrix] GCRMA normalization with MoEx-1_0-st-v1

2010-03-25 Thread Henrik Bengtsson
Hi,

GCRMA is not fully supported for all chip types, and I haven't checked
if MoEx-1_0-st-v1 is one.  But, first, let's fix some other mistakes
you're making.

On Wed, Mar 24, 2010 at 4:48 PM, Gil Tomás gil@gmail.com wrote:
 Dear all,

 I am trying to normalize a dataset hybridized with MoEx-1_0-st-v1 with
 GCRMA on aroma.affymetrix. Here's the code I'm using to do so:

I assume you have put this together from what you have found online;
if there is a particular source/webpage/manual where you've found
this, please let me know so we can make sure it is corrected.


 **
 R> prj.dir <- "/Users/giltomas/projects/brain-dev/raw-data/aroma-affymetrix" # sets up the project directory
 R> setwd (prj.dir)
 R> library (aroma.affymetrix)
 R> ## * gcrma normalization
 R> cdf <- AffymetrixCdfFile$byChipType ("MoEx-1_0-st-v1.fullR1.A20080718.MR.bin") # taken from http://www.aroma-project.org/node/31
 and convert to binary with convertCdf

Here you are trying to load a CDF for a chip type named
'MoEx-1_0-st-v1.fullR1.A20080718.MR.bin'.  Affymetrix does not produce
such a *chip type*, though they do produce a chip type named
'MoEx-1_0-st-v1'.  There are three things you've probably
misunderstood above:

(1) The terms *chip type* and *CDF* are not the same, cf. page
'Differences between chip type and chip definition file (CDF)':

   http://aroma-project.org/definitions/chipTypesAndCDFs

(2) You have a CDF file named
'MoEx-1_0-st-v1.fullR1.A20080718.MR.bin.cdf'.  On page 'Chip type:
MoEx-1_0-st-v1':

   http://aroma-project.org/chipTypes/MoEx-1_0-st-v1 (same as your URL)

there is a MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf file.  It looks like you have:

(2a) replaced the commas with periods - *do not do that*.  The aroma
framework uses well-structured comma-separated filenames, cf. page
'Definition: Fullnames, names and tags of directories and files':

  http://aroma-project.org/node/77

(2b) converted a CDF that is already in a binary format using
convertCdf().  FYI, all CDFs created/provided by us are already in a
binary format; it is only a few unofficial CDFs provided by Affymetrix
that come in the ASCII/text format.

Thus, you want to have a directory structure as:

annotationData/
  chipTypes/
MoEx-1_0-st-v1/
  MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf

See how the chip type directory has the same name as the *name* part
(before the first comma) of the CDF file.  See (2a) above.
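
To make the name/tags convention concrete, here is a small sketch in plain R that splits the CDF filename from above into its name and tags and builds the matching directory. It assumes nothing beyond base R; the temporary root directory is only there so the example does not touch a real project, and none of the variable names are aroma API.

```r
# Split an aroma-style fullname into its name and its tags.
cdfFilename <- "MoEx-1_0-st-v1,fullR1,A20080718,MR.cdf"
fullname <- sub("[.]cdf$", "", cdfFilename, ignore.case = TRUE)
parts <- strsplit(fullname, split = ",", fixed = TRUE)[[1]]
chipType <- parts[1]   # the *name* part, before the first comma
tags <- parts[-1]      # everything after the first comma

# The CDF file belongs in annotationData/chipTypes/<chipType>/.
# A temporary root is used here purely for illustration.
root <- tempfile("aromaDemo")
cdfDir <- file.path(root, "annotationData", "chipTypes", chipType)
dir.create(cdfDir, recursive = TRUE)
```

With periods instead of commas the whole string would be treated as the name, which is why the chip type directory could not be matched in the original attempt.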

Same applies to your raw data, you want to have the data set directory as:

rawData/
  affy-brain-dev/
 MoEx-1_0-st-v1/
*.CEL files

Have a look at the pages at

  http://aroma-project.org/setup

which should clarify this further.

...then proceed with the rest.

/Henrik

PS. FYI, the approach you have done *may* have worked in other setups,
but it is incorrect and should not be done. You are likely to run into
problems sooner or later.


 R> cs <- AffymetrixCelSet$byName ("affy-brain-dev", cdf = cdf) # produces cell set class object
 R> bc <- GcRmaBackgroundCorrection (cs)
 R> csB <- process (bc)
 Error in list(`process(bc)` = environment,
 `process.GcRmaBackgroundCorrection(bc)` = environment,  :

 [2010-03-24 16:36:00] Exception: Either argument 'names' or 'pattern'
 must be specified.
  at throw(Exception(...))
  at throw.default(Either argument 'names' or 'pattern' must be
 specified.)
  at throw(Either argument 'names' or 'pattern' must be specified.)
  at indexOf.UnitNamesFile(this, names = unitName)
  at indexOf(this, names = unitName)
  at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose
 = verbose
  at getProbeSequenceData(this, safe = safe, verbose = verbose)
  at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
 verbose =
  at computeAffinities(cdf, paths = probePath, ..., verbose =
 less(verbose))
  at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-
 dev,GRBC/Mo
  at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-
 st-v1.full
  at do.call(bgAdjustGcrma, args = args)
  at process.GcRmaBackgroundCorrection(bc)
  at process(bc)
 In addition: Warning message:
 In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName
 $` = character),  :
  Argument 'rows' was out of range [1,0]. Ignored rows beyond this
 range.
 R traceback ()
 15: throw.Exception(Exception(...))
 14: throw(Exception(...))
 13: throw.default(Either argument 'names' or 'pattern' must be
 specified.)
 12: throw(Either argument 'names' or 'pattern' must be specified.)
 11: indexOf.UnitNamesFile(this, names = unitName)
 10: indexOf(this, names = unitName)
 9: getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose =
 verbose)
 8: getProbeSequenceData(this, safe = safe, verbose = verbose)
 7: computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
       verbose = less(verbose))
 6: computeAffinities(cdf, paths = probePath, ..., verbose =
 less(verbose))
 5: bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-
 dev,GRBC/MoEx-1_0-st-v1.fullR1.A20080718.MR.bin,
   

Re: [aroma.affymetrix] Is it correct to do analyze different expression data based on the same platform on the same time?

2010-03-25 Thread Henrik Bengtsson
Hi.

On Sun, Mar 14, 2010 at 3:04 AM, Yong pkuonl...@gmail.com wrote:
 Hi Everyone,

 I kind of remember that it is difficult or not correct to analyze
 multiple datasets based on different array design like hgu133plus2 and
 HuEx-1_0-st-v2. So, I am wondering whether it is OK to pool different
 experiments together if they are based on the same array design, such
 as HuEx-1_0-st-v2. Specifically, if we follow the standard routine,
 i.e., RmaBackgroundCorrection + QuantileNormalization + ExonRmaPlm +
 getChipEffectSet , could we still filter those experiment specific
 factors and make these experiment comparable?

Are you aware of:

M.D. Robinson & T.P. Speed. A comparison of Affymetrix gene expression
arrays. BMC Bioinformatics, 2007, 8, 449.
Available via: http://aroma-project.org/publications

Maybe that helps you answer your question/problem.

/Henrik


 Many thanks ahead.

 Yong Zhang
 University of Chicago



Re: [aroma.affymetrix] Alternatives to quantile normalization? (Was: Re: [aroma.affymetrix] ProbeLevelTransform subclasses: how to use)

2010-03-25 Thread Henrik Bengtsson
Hi.

On Thu, Mar 25, 2010 at 5:02 PM, Richard Beyer rpbe...@gmail.com wrote:
 Hi Henrik,

 Thanks very much for your help.  I saw your post about the new
 documentation link after I wrote my question.  So I will look through
 that.

 What you are saying is very helpful even though I wasn't thinking
 quite so large scale or ambitious.

Ok.


 I have an immediate problem with seeing very weird results on a
 dataset of 35 rat ST arrays.  When I run the
 RmaBackgroundCorrection(), QuantileNormalization() normalized data
 through limma and do MA plots of one group against another, the main
 cloud of data points is not the expected lens shape centered about
 the origin.  The shape is more like a s-wave with part of the cloud of
 points above and part below the zero line.  All the arrays pass Affy
 QC as done by Expression Console and they seem fine when I plot NUSE
 and RLE.  Judging from the shapes seen in the MA plots, my first
 reaction is that the assumption of most-probesets-are-unchanged is not
 being enforced by the quantile normalization step.  So, I wanted to
 just try a few reasonable alternatives to the quantile normalization
 step.  In addition, I think I've seen less pronounced versions of
 these s-wave shapes in MA plots from ST arrays in other data sets, but
 not nearly so pronounced as this one.

 The end result is I'm stuck and puzzled.

In theory quantile normalization should do a decent job of making the
log-ratios independent of the log-intensities, that is, the cloud in
an M vs. A scatter plot should be fairly straight (with the possible
exception at very weak signals or very large signals).  If you really
want to dive into the arguments, see:

H. Bengtsson & O. Hössjer, Methodological study of affine
transformations of gene expression data with proposed robust
non-parametric multi-dimensional normalization method. BMC
Bioinformatics, 2006. [http://www.aroma-project.org/publications]

Since you don't get this, I would first make sure that you are
plotting the same data points that you are normalizing.  Note that
quantile normalization can be done on PMs only, on all probes etc.
See page 'Empirical probe-signal densities and rank-based quantile
normalization' for how different settings give different normalization
outputs.
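
As a rough illustration of what rank-based quantile normalization does, here is a minimal sketch in plain R. It is not the implementation used by QuantileNormalization() in aroma.affymetrix; quantileNormalize() is a hypothetical helper that assumes a complete numeric matrix with probes in rows and arrays in columns, and the simulated data only stand in for real probe signals.

```r
# Minimal rank-based quantile normalization (illustration only).
# X: numeric matrix, rows = probes, columns = arrays, no missing values.
quantileNormalize <- function(X) {
  stopifnot(is.matrix(X), nrow(X) >= 2, !anyNA(X))
  Xs <- apply(X, MARGIN = 2, FUN = sort)   # sort each array separately
  target <- rowMeans(Xs)                   # average empirical distribution
  Xn <- apply(X, MARGIN = 2, FUN = function(x) {
    # replace each value by the target quantile of the same rank
    target[rank(x, ties.method = "first")]
  })
  dimnames(Xn) <- dimnames(X)
  Xn
}

# Three simulated arrays with different scales and spreads.
set.seed(42)
X <- cbind(a = rlnorm(500, 8, 1), b = rlnorm(500, 9, 1.5),
           c = rlnorm(500, 7.5, 0.8))
Xn <- quantileNormalize(X)
```

After this step every column of Xn has exactly the same empirical distribution. Normalizing PMs only corresponds to passing only the PM rows, which is one reason why you should plot the same set of probes that you normalized.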

Hope this helps

/Henrik

PS. It is possible to attach PNGs to emails to this list; you may
want to share your figures.


 Thanks again,
 Dick

 On Thu, Mar 25, 2010 at 8:26 AM, Henrik Bengtsson
 henrik.bengts...@gmail.com wrote:
 Hi.

 On Thu, Mar 25, 2010 at 5:30 AM, dbe...@u.washington.edu
 rpbe...@gmail.com wrote:
 Hello,

 I would like to get more info, perhaps example calls, on the various
 subclasses of ProbeLevelTransform.

 I see from a previous post by Mark Robinson, I have the examples:
   if(doNorm){
    bc <- RmaBackgroundCorrection(cs)
    csBC <- process(bc,verbose=verbose,force=force)
    setCdf(csBC, cdf)
    qn <- QuantileNormalization(csBC, typesToUpdate="pm")
    csN <- process(qn, verbose=verbose,force=force) #time required
    setCdf(csN, cdf)
   }

 What I'd like to be able to do is something akin to what I used to be
 able to do with the affy expresso call.  That is, specify different
 background methods, different normalization methods, such as invariant
 set, rma, constant, etc.

 It sounds like you wish to setup a high-level API providing wrappers
 for common preprocessing sequences/pipelines.  There have been some
 independent attempts by us doing this, but we haven't done a serious
 attempt in standardizing this.

 An *important* objective is that whenever providing such wrappers, we
 should make sure that they replicate existing implementations as well
 as possible.  For instance, if you setup an expresso() method
 operating on aroma.affymetrix classes, you want to make sure it can
 replicate the results of expresso() in the affy package, otherwise you
 will just add lots of confusion out there.  For some methods, we
 do assert near-perfect reproducibility, cf.
 http://aroma-project.org/replication

 If you want to provide an expresso() method for AffymetrixCelSet
 objects, I suggest that you simply implement it case by case using the
 scripts provided in the online vignettes and for the case we can
 guarantee to replicate affy exactly.  All other cases should throw an
 error.  For each case provided there should be at least one redundancy
 test so that we can assert that the reproducibility is guaranteed
 whenever we release a new version of aroma.affymetrix/other packages
 are updated.  After this we can discuss the missing cases and add
 support for them one by one.
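
 A redundancy test of the kind described above could look roughly like the following. This is a sketch under stated assumptions, not existing aroma.affymetrix code: assertReplicates() is a hypothetical helper, and the two short vectors are toy stand-ins for estimates obtained from the two implementations being compared.

```r
# Hypothetical helper: fail loudly if new estimates drift from the
# stored reference values by more than a given tolerance.
assertReplicates <- function(new, reference, tolerance = 1e-6) {
  res <- all.equal(new, reference, tolerance = tolerance)
  if (!isTRUE(res)) {
    stop("Results differ from reference: ", paste(res, collapse = "; "))
  }
  invisible(TRUE)
}

# Toy stand-ins for estimates from two implementations.
reference <- c(7.21, 8.05, 6.93)
new <- reference + 1e-9
assertReplicates(new, reference)
```

 Running such a check for every supported case whenever a package is updated is one way to notice when reproducibility is lost.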

 One way you can start is to override the expresso() method like this
 using S3 dispatching:

 # This will make sure expresso() of affy is called whenever an
 # AffyBatch is used.
 setMethodS3("expresso", "AffyBatch", function(...) {
  affy::expresso(...);
 });

 # This is your expresso() method for AffymetrixCelSet objects.
 setMethodS3("expresso", "AffymetrixCelSet", function(cs

Re: [aroma.affymetrix] Re: GCRMA normalization with MoEx-1_0-st-v1

2010-03-25 Thread Henrik Bengtsson
What's the verbose output, if you do:

csB <- process(bc, verbose=-10)

/H

On Thu, Mar 25, 2010 at 4:34 PM, Gil Tomás gil@gmail.com wrote:
 Thank you much for your reply Henrik. Each of your comments were very
 enlightening. First of all, the documentation source I'm following is
 that from the official aroma.affymetrix site, particularly the section
 documenting reproducible research for the gcRMA code
 (http://aroma-project.org/replication/gcRMA).
 I then redefined the filesystem of the project according to your
 instructions (keeping the cdf file with the tag comma separated
 nomenclature) and reran the code:

 **
 R> cdf <- AffymetrixCdfFile$byChipType ("MoEx-1_0-st-v1,fullR1,A20080718,MR") # taken from http://www.aroma-project.org/node/31
 R> cs <- AffymetrixCelSet$byName ("affy-brain-dev", cdf = cdf) # produces cell set class object
 R print (cs)
 AffymetrixCelSet:
 Name: affy-brain-dev
 Tags:
 Path: rawData/affy-brain-dev/MoEx-1_0-st-v1
 Platform: Affymetrix
 Chip type: MoEx-1_0-st-v1,fullR1,A20080718,MR
 Number of arrays: 12
 Names: hyb7808_(MoEx-1_0-st-v1), hyb7809_(MoEx-1_0-st-v1), ...,
 hyb7819_(MoEx-1_0-st-v1)
 Time period: 2009-11-19 11:00:18 -- 2009-11-20 13:15:37
 Total file size: 752.61MB
 RAM: 0.01MB
 R> bc <- GcRmaBackgroundCorrection (cs)
 R> csB <- process (bc)
 Error in list(`process(bc)` = environment,
 `process.GcRmaBackgroundCorrection(bc)` = environment,  :

 [2010-03-25 16:17:22] Exception: Either argument 'names' or 'pattern'
 must be specified.
  at throw(Exception(...))
  at throw.default(Either argument 'names' or 'pattern' must be
 specified.)
  at throw(Either argument 'names' or 'pattern' must be specified.)
  at indexOf.UnitNamesFile(this, names = unitName)
  at indexOf(this, names = unitName)
  at getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose
 = verbose
  at getProbeSequenceData(this, safe = safe, verbose = verbose)
  at computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
 verbose =
  at computeAffinities(cdf, paths = probePath, ..., verbose =
 less(verbose))
  at bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-
 dev,GRBC/Mo
  at bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-
 st-v1, ve
  at do.call(bgAdjustGcrma, args = args)
  at process.GcRmaBackgroundCorrection(bc)
  at process(bc)
 In addition: Warning message:
 In readDataFrame.TabularTextFile(ptf, colClassPatterns = c(`^unitName
 $` = character),  :
  Argument 'rows' was out of range [1,0]. Ignored rows beyond this
 range.
 R traceback ()
 15: throw.Exception(Exception(...))
 14: throw(Exception(...))
 13: throw.default(Either argument 'names' or 'pattern' must be
 specified.)
 12: throw(Either argument 'names' or 'pattern' must be specified.)
 11: indexOf.UnitNamesFile(this, names = unitName)
 10: indexOf(this, names = unitName)
 9: getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose =
 verbose)
 8: getProbeSequenceData(this, safe = safe, verbose = verbose)
 7: computeAffinities.AffymetrixCdfFile(cdf, paths = probePath, ...,
       verbose = less(verbose))
 6: computeAffinities(cdf, paths = probePath, ..., verbose =
 less(verbose))
 5: bgAdjustGcrma.AffymetrixCelSet(NA, path = probeData/affy-brain-
 dev,GRBC/MoEx-1_0-st-v1,
       verbose = FALSE, overwrite = FALSE, subsetToUpdate = NULL,
       typesToUpdate = pm, indicesNegativeControl = NULL, affinities
 = NULL,
       type = fullmodel, opticalAdjust = TRUE, gsbAdjust = TRUE,
       gsbParameters = NULL, .deprecated = FALSE)
 4: bgAdjustGcrma(NA, path = probeData/affy-brain-dev,GRBC/MoEx-1_0-st-
 v1,
       verbose = FALSE, overwrite = FALSE, subsetToUpdate = NULL,
       typesToUpdate = pm, indicesNegativeControl = NULL, affinities
 = NULL,
       type = fullmodel, opticalAdjust = TRUE, gsbAdjust = TRUE,
       gsbParameters = NULL, .deprecated = FALSE)
 3: do.call(bgAdjustGcrma, args = args)
 2: process.GcRmaBackgroundCorrection(bc)
 1: process(bc)
 ***

 Now that my modus operandi conforms to your prescribed norm, I still
 observe an error message that is very much the same as the one before.
 Could you give me a hint as to why it occurs? Is it that because the
 gcRMA implementation of aroma.affymetrix doesn't support the MoEx-1_0-
 st-v1 chip? How could I infer that?

 On Mar 25, 3:26 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi,

 GCRMA is not fully supported for all chip types, and I haven't checked
 if MoEx-1_0-st-v1 is one.  But, first, let's fix some other mistakes
 you're making.

 On Wed, Mar 24, 2010 at 4:48 PM, Gil Tomás gil@gmail.com wrote:
  Dear all,

  I am trying to normalize a dataset hybridized with MoEx-1_0-st-v1 with
  GCRMA on aroma.affymetrix. Here's the code I'm using to do so:

 I assume you have put this together from what you have found online;
 if there is a particular source/webpage/manual where you've found
 this, please let me know so we can make sure it is corrected.



  **
  R prj.dir - /Users/giltomas/projects/brain-dev

Re: [aroma.affymetrix] getUniqueCdf inflates dimensions of original cdf

2010-03-15 Thread Henrik Bengtsson
Hi,

I leave this one to Mark Robinson, who designed createUniqueCdf()
for AffymetrixCdfFile and is on top of this. Though, in the meantime,
could you please:

1. Clarify the origin of Mm_PromPR_v02.CDF, because Affymetrix does
not provide a CDF.

2. Make the Mm_PromPR_v02.CDF available to us?   If you're happy to
share it (and got the rights), I'm happy to have aroma-project.org to
either link to it or host it.

/Henrik


On Fri, Mar 12, 2010 at 8:04 PM, stvjc carey...@gmail.com wrote:
 cdfU
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/Mm_PromPR_v02
 Filename: Mm_PromPR_v02,unique.CDF
 Filesize: 126.33MB
 Chip type: Mm_PromPR_v02,unique
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 3026x3026
 Number of cells: 9156676
 Number of units: 25373
 Cells per unit: 360.88
 Number of QC units: 0
 cdf
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/Mm_PromPR_v02
 Filename: Mm_PromPR_v02.cdf
 Filesize: 126.33MB
 Chip type: Mm_PromPR_v02
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 2166x2166
 Number of cells: 4691556
 Number of units: 25373
 Cells per unit: 184.90
 Number of QC units: 0

 this leads to (i think)

 csU = convertToUnique(csN, verbose=verbose)
 20100312 14:02:59|Converting to unique CDF...
 20100312 14:02:59| Getting unique CDF...
 20100312 14:02:59| Getting unique CDF...done
 20100312 14:02:59| Input tags:MN,lm
 20100312 14:02:59| Input Path: probeData/Dawn,MN,lm/Mm_PromPR_v02
 20100312 14:02:59| Output Path:probeData/Dawn,MN,lm,UNQ/Mm_PromPR_v02
 20100312 14:02:59| allTags:MN,lm,UNQ
 20100312 14:02:59| Test whether dataset exists
 20100312 14:02:59| Reading cell indices from standard CDF...
 20100312 14:03:08| Reading cell indices from standard CDF...done
 20100312 14:03:08| Reading cell indices list from unique CDF...
 20100312 14:03:17| Reading cell indices list from unique CDF...done
 20100312 14:03:17| Converting CEL data from standard to unique CDF for
 sample 1 ( 10_BL6_IP_Mmp ) of 8...
 20100312 14:03:17|  Reading intensity values according to standard
 CDF...
 Error in readCel(filename, indices = indices, readHeader = FALSE,
 readOutliers = FALSE,  :
  Argument 'indices' contains an element out of range.
 20100312 14:03:23|  Reading intensity values according to standard
 CDF...done
 20100312 14:03:23| Converting CEL data from standard to unique CDF for
 sample 1 ( 10_BL6_IP_Mmp ) of 8...done
 20100312 14:03:23|Converting to unique CDF...done

 sessionInfo()
 R version 2.11.0 Under development (unstable) (2010-03-02 r51194)
 x86_64-apple-darwin9.8.0

 locale:
 [1] C

 attached base packages:
 [1] stats     graphics  grDevices datasets  tools     utils
 methods
 [8] base

 other attached packages:
  [1] gsmoothr_0.1.4         limma_3.3.4
 aroma.affymetrix_1.5.0
  [4] aroma.apd_0.1.7        affxparser_1.19.6
 R.huge_0.2.0
  [7] aroma.core_1.5.0       aroma.light_1.15.1
 matrixStats_0.1.9
 [10] R.rsp_0.3.6            R.cache_0.2.0
 R.filesets_0.8.0
 [13] R.utils_1.3.3          R.oo_1.6.7
 R.methodsS3_1.1.0
 [16] weaver_1.13.0          codetools_0.2-2
 digest_0.4.2

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


Re: [aroma.affymetrix] Re: can't load CDF file

2010-03-10 Thread Henrik Bengtsson
Hi.

On Wed, Mar 10, 2010 at 7:23 PM, dkny169 daniela...@yahoo.com wrote:
 Hi Henrik,
 The upload of the cdf file worked now perfectly. Thanks for pointing
 out the right version of the supplementary file.
 Unfortunately, the upload of the .CEL files still doesn't work? Any
 ideas?

 cs<-AffymetrixCelSet$byName("tissues",cdf=cdf)
 Error in list(`AffymetrixCelSet$byName("tissues", cdf = cdf)` =
 environment,  :

 [2010-03-10 13:20:35] Exception: Could not locate a file for this chip
 type: MoGene-1_0-st-v1

This means that it could not locate any CEL files in the data set
directory.  In other words, make sure your CEL files are located in:

rawData/tissues/MoGene-1_0-st-v1/

This is explained on page 'Setup: Location of raw data files':

  http://www.aroma-project.org/node/68
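
A quick way to confirm the layout before calling AffymetrixCelSet$byName()
is to list the CEL files in the expected directory yourself.  The following
is a minimal sketch in base R only; findCelFiles() is a hypothetical helper
written for this illustration, not a function in aroma.affymetrix, and the
demo uses a temporary directory rather than a real project.

```r
# Sketch: list the CEL files found under rawData/<dataSet>/<chipType>/.
# 'findCelFiles' is a hypothetical helper, not part of aroma.affymetrix.
findCelFiles <- function(dataSet, chipType, rootPath = "rawData") {
  path <- file.path(rootPath, dataSet, chipType)
  if (!file.exists(path)) {
    return(character(0))
  }
  # CEL files may use upper- or lower-case extensions
  list.files(path, pattern = "[.]cel$", ignore.case = TRUE)
}

# Demo in a temporary directory so nothing in your project is touched
root <- file.path(tempdir(), "rawData")
dir.create(file.path(root, "tissues", "MoGene-1_0-st-v1"),
           recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "tissues", "MoGene-1_0-st-v1",
                                "sample1.CEL")))

celFiles <- findCelFiles("tissues", "MoGene-1_0-st-v1", rootPath = root)
print(celFiles)

# A chip type directory that does not exist yields an empty result
missingFiles <- findCelFiles("tissues", "MoEx-1_0-st-v1", rootPath = root)
print(length(missingFiles))
```

If the result is empty for your own data set and chip type, the files are
most likely in a different directory than the one the data set is set up
from.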

Hope this helps

/Henrik

PS. It is called "loading of ..." or, better, "setup of ...", not
"upload of ...".

  at throw(Exception(...))
  at throw.default(The specified CDF structure (', getChipType(cdf),
 ') is no
  at throw(The specified CDF structure (', getChipType(cdf), ') is
 not compat
  at setCdf.AffymetrixCelSet(set, cdf)
  at setCdf(set, cdf)
  at byPath.AffymetrixCelSet(static, path = path, cdf = cdf, ...)
  at byPath(static, path = path, cdf = cdf, ...)
  at withCallingHandlers(expr, warning = function(w)
 invokeRestart(muffleWarnin
  at suppressWarnings({
  at method(static, ...)
  at AffymetrixCelSet$byName(tissues, cdf = cdf)

 Many thanks,
 Daniela

Re: [aroma.affymetrix] Re: can't load CDF file

2010-03-09 Thread Henrik Bengtsson
Hi.

On Tue, Mar 9, 2010 at 11:01 PM, dkny169 daniela...@yahoo.com wrote:
 Hi Henrik,
 I got my documentation from here: 
 http://bioinf.wehi.edu.au/folders/firmagene/sup3.R

Thanks.  That is from the FIRMAGene supplementary materials:

   http://bioinf.wehi.edu.au/folders/firmagene/

Mark provides a more up-to-date version:

[04-Feb-2010] Made a modification to the sup3.R script, now available
as sup3_04feb2010.R, to make sure we use the Ensembl annotation that
corresponds to the hg18 (Mar 2006) build.

Mark, would you mind making it more clear on the above URL that
'sup3.R' is out dated?  Reversing the NEWS list so that the most
recent events are at the top may help too.  Maybe also by renaming the
outdated one sup3.R to sup3_06jun2009.R.

 If you could give me a link to a vignette or manual on how to use
 FIRMAGene, which is up to date and understandable, I would be more
 than thankful!

More comments below...

 Many thanks,
 Daniela

 On Mar 9, 4:56 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi,

 please let us know what your source of documentation is, e.g.
 webpages, because you are using method names that are either outdated
 or non-public.  Then I'll answer your questions...

 /Henrik

 On Tue, Mar 9, 2010 at 10:44 PM, dkny169 daniela...@yahoo.com wrote:
  Hi Mark,
  Thanks for your answer. I think it works now; I had the working
  directory set at chipTypes and not at the parent directory of
  annotationData.
  I am getting following back:
  cdf<-AffymetrixCdfFile$findByChipType("MoEx-1_0-st-v1")
  cdf
  [1] "annotationData/chipTypes/MoEx-1_0-st-v1/MoEx-1_0-st-v1.cdf"

Don't use findByChipType(), which only returns a pathname, but
byChipType(), which returns an AffymetrixCdfFile object, i.e.

cdf <- AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1");

[You did this in your first email, but then all of a sudden
findByChipType(), which is why I wondered where you got that from.]


  I am trying to upload the CEL files now that are stored in rawData/
  tissues/MoEx-1_0-st-v1. The working directory is set at the parent
  directory of rawData. But again I am getting a failure message. What
  am I doing wrong now?
  cs<-AffymetrixCelSet$fromName("tissues",cdf=cdf)
  Error in list(`AffymetrixCelSet$fromName("tissues", cdf = cdf)` =
  environment,  :

  [2010-03-09 16:28:12] Exception: AffymetrixCelSet$fromName() is
  defunct. Use AffymetrixCelSet$byName() instead.

This error message is clear, eh?

   at throw(Exception(...))
   at throw.default(msg)
   at throw(msg)
   at method(static, ...)
   at AffymetrixCelSet$fromName(tissues, cdf = cdf)
  cs<-AffymetrixCelSet$byName("tissues",cdf=cdf)
  Error in list(`AffymetrixCelSet$byName("tissues", cdf = cdf)` =
  environment,  :

  [2010-03-09 16:38:21] Exception: Argument 'cdf' is neither of nor
  inherits class AffymetrixCdfFile: character

This one is because you used findByChipType() above; use byChipType()
and it will work.
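
To illustrate the distinction in base R only: a pathname is a plain
character string and therefore does not inherit from AffymetrixCdfFile,
which is what the exception above reports.  The sketch below also shows
that the chip type is a name, not a filename; expectedCdfPathname() is a
hypothetical helper for this illustration and not part of
aroma.affymetrix.

```r
# Sketch: a chip type is a name, not a filename.  This hypothetical helper
# strips a trailing ".cdf" and builds the pathname where the CDF is
# expected to live under annotationData/.
expectedCdfPathname <- function(chipType, rootPath = "annotationData") {
  chipType <- sub("[.]cdf$", "", chipType, ignore.case = TRUE)
  file.path(rootPath, "chipTypes", chipType, sprintf("%s.cdf", chipType))
}

pathname <- expectedCdfPathname("MoEx-1_0-st-v1.cdf")
print(pathname)

# A character string is not an AffymetrixCdfFile object
isCdfObject <- inherits(pathname, "AffymetrixCdfFile")
print(isCdfObject)
```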

Hope this helps

Henrik

   at throw(Exception(...))
   at throw.default(sprintf(Argument '%s' is neither of nor inherits
  class %s: %
   at throw(sprintf(Argument '%s' is neither of nor inherits class %s:
  %s, .nam
   at method(static, ...)
   at Arguments$getInstanceOf(cdf, AffymetrixCdfFile)
   at method(static, ...)
   at AffymetrixCelSet$byName(tissues, cdf = cdf)

  Thanks,
  Daniela

  On Mar 9, 3:23 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
  Hi Daniela.

  Is your CDF in the:

  annotationData/chipTypes/MoEx-1_0-st-v1/

  directory?

  (http://aroma-project.org/node/66)

  Cheers,
  Mark

   Hi,
   I stored my CDF file in annotationData/chipTypes; nevertheless I cannot
   upload the file.
   Can anyone please tell me what I am doing wrong:

   cdf<-AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")

   Error in list(`AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")` =
   environment, :

   [2010-03-09 14:50:58] Exception: Could not locate a file for this chip
   type: MoEx-1_0-st-v1.cdf
   at throw(Exception(...))
   at throw.default(Could not locate a file for this chip type: ,
   paste(c(chipT
   at throw(Could not locate a file for this chip type: ,
   paste(c(chipType, tag
   at method(static, ...)
   at AffymetrixCdfFile$byChipType(MoEx-1_0-st-v1.cdf)

   Many thanks for your help!

Re: [aroma.affymetrix] aroma.affymetrix and R 64bit

2010-03-01 Thread Henrik Bengtsson
On Mon, Mar 1, 2010 at 6:38 PM, zaid z...@genomedx.com wrote:
 Is there support for aroma.affymetrix under R 64 bit?

Definitely on Linux.  Are you asking about Windows 64-bit?  Then, I
think so, cf:

  http://cran.r-project.org/web/checks/check_results_aroma.affymetrix.html

but I don't have machines to test it myself - someone out there with a
Win64 system that can test?  You should also be aware of the 'Update 2
on MinGW-w64 builds for 64-bit Windows' (March 1, 2010) post on
R-devel:

  http://tolstoy.newcastle.edu.au/R/e9/devel/10/03/0598.html
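
When reporting back on such a test, it helps to state which build of R
was actually running.  A minimal sketch using base R only:

```r
# Sketch: report whether the running R session is a 64-bit build.
# Pointers are 8 bytes wide on 64-bit builds of R.
is64bit <- (.Machine$sizeof.pointer == 8L)
cat("Architecture:", R.version$arch, "\n")
cat("64-bit build:", is64bit, "\n")
```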

/Henrik



Re: [aroma.affymetrix] Re: aroma.affymetrix and R 64bit

2010-03-01 Thread Henrik Bengtsson
So affxparser is built by the Bioconductor framework.  Please post to
the BioC mailing list and ask there - they should be able to tell what
the Win64 plans are.

/Henrik

On Mon, Mar 1, 2010 at 9:41 PM, zaid z...@genomedx.com wrote:
 I'm using March 1st 64bit version on Windows but for some reason
 affxparser seems to be unavailable.


Re: [aroma.affymetrix] Segmentation

2010-03-01 Thread Henrik Bengtsson
Hi.

On Tue, Feb 23, 2010 at 3:36 AM, Alfredo Hidalgo
ahida...@inmegen.gob.mx wrote:
 Hi!

 We are interested in running a GISTIC analysis on the data we obtained after
 segmentation with GLAD with Aroma, but there seems to be a problem regarding
 the start and end postions of the segments, which apparently do not match
 the physical positions of the markers file we are using for GISTIC.

I don't know what marker file you are using, but the locations of the
markers (units) used by the segmentation methods in aroma.affymetrix
are given by the UGP (unit genome position) file you are using.  The
unit names are given by the CDF.  You can find the UGP from the CDF as
follows:

cesN <- ...
glad <- GladModel(cesN);
cdf <- getCdf(cesN);
ugp <- getAromaUgpFile(cdf);

df <- data.frame(unitName=getUnitNames(cdf));
gp <- readDataFrame(ugp);
df <- cbind(df, gp);
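
With such a table of unit names and positions in hand, one way to see
whether segment boundaries coincide with marker locations is a simple
lookup.  The sketch below uses made-up numbers; the column names
'chromosome' and 'position' and the layout of the 'segments' table are
assumptions for illustration, so adjust them to what your own data frames
contain.

```r
# Sketch with made-up numbers: check whether segment boundaries coincide
# with marker positions in a UGP-like table.
markers <- data.frame(
  unitName   = c("SNP_A-1", "SNP_A-2", "CN_1", "SNP_A-3"),
  chromosome = c(1L, 1L, 1L, 2L),
  position   = c(1000L, 2500L, 4000L, 1500L),
  stringsAsFactors = FALSE
)
segments <- data.frame(
  chromosome = c(1L, 2L),
  start      = c(1000L, 1500L),
  end        = c(4000L, 1600L)
)

# TRUE where a (chromosome, position) pair is an exact marker location
isMarkerPosition <- function(chromosome, position, markers) {
  key <- paste(markers$chromosome, markers$position, sep = ":")
  paste(chromosome, position, sep = ":") %in% key
}

segments$startIsMarker <- isMarkerPosition(segments$chromosome,
                                           segments$start, markers)
segments$endIsMarker   <- isMarkerPosition(segments$chromosome,
                                           segments$end, markers)
print(segments)
```

Boundaries flagged FALSE are positions that do not appear in the marker
table, which is where a downstream tool expecting marker positions would
be likely to complain.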


 Does the segmentation reports the actual physical position of the first and
 last markers or does it reports other nearby position?

It returns whatever the glad() function of the GLAD package returns.  See
help("glad", package="GLAD") for more details.

If you wish to troubleshoot more at a low level, you can extract the
low-level data like this:

cn <- extractRawCopyNumbers(glad, array=1, chromosome=1);
data <- as.data.frame(cn);
pos <- data$x;
M <- data$cn;

and then use that to call glad().  That might help you.
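
Before handing such vectors to a segmentation routine, it is usually worth
dropping non-finite values and sorting by position.  The sketch below does
that in base R on made-up values.  The column names LogRatio, PosOrder,
Chromosome and PosBase are an assumption about what the GLAD package
expects as input; check the GLAD help pages before relying on them.

```r
# Sketch with made-up values standing in for 'pos' and 'M'
pos <- c(300, 100, 200, 400)
M   <- c(0.1, NA, -0.2, 0.05)

# Keep loci with finite position and signal, ordered along the chromosome
keep <- is.finite(pos) & is.finite(M)
o <- order(pos[keep])

# Column names are assumed, not confirmed against the GLAD documentation
input <- data.frame(
  LogRatio   = M[keep][o],
  PosOrder   = seq_along(o),
  Chromosome = 1L,
  PosBase    = pos[keep][o]
)
print(input)
```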


 Another question, I have a copy number file obtained from paired analyisis
 in Partek Genomics Suite, and want to do the segmentation using GLAD or CBS.
 How can I incorporate my CN file into the Aroma pipeline to do the
 segmentation?

This requires that you first allocate so-called binary CN files and
import the CN data to them.  There is no pipeline to do this
automatically for Partek data, so you have to do it manually.  This
requires a bit of understanding of the annotation data files involved
etc.  It is explained in Vignette 'Creating binary data files
containing copy number estimates':

  http://www.aroma-project.org/node/88

If you get that far, you can then, using the example in that vignette,
load the complete data set as:

ds <- AromaUnitTotalCnBinarySet$byName("MyDataSet,tagA,tagB",
chipType="HG-CGH-244A");
glad <- GladModel(ds);

and continue from there.

It is on the todo list to document all this better, but don't expect
anything soon.

/Henrik


 Thanks al lot!!


Re: [aroma.affymetrix] Re: QC analysis and HuEx

2010-02-24 Thread Henrik Bengtsson
Ok, before we try to troubleshoot this one, please update to the
latest aroma.affymetrix version.  The one you are using is nearly
three months old, and I prefer to troubleshoot the current code base.

When you've done that, it should be enough to run plotRle(); you don't
have to rerun everything.

BTW, did you remember to call fit() on the probe-level model?
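
One plausible cause of an NA in 'ylim' is that the values being plotted
are all missing for some array, for instance if the estimates were never
fitted.  A minimal sketch of such a check in base R, using a made-up
units-by-arrays matrix 'theta' (how you extract the estimates from your
own objects is not shown here):

```r
# Sketch: count finite values per array in a matrix of estimates.
# 'theta' is made up; array "B" mimics estimates that are all missing.
theta <- cbind(A = c(1.2, 0.8, NA, 1.1), B = c(NA, NA, NA, NA))
nbrFinite <- colSums(is.finite(theta))
print(nbrFinite)

# A y-range can only be computed for arrays with at least one finite value
canPlot <- nbrFinite > 0
print(canPlot)
```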

/Henrik

On Wed, Feb 24, 2010 at 8:15 PM, zaid z...@genomedx.com wrote:
traceback()
 8: plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs)
 7: bxp(bxpStats, ylim = ylim, outline = outline, las = las, ...)
 6: plotBoxplotStats.list(stats, main = main, ylab = ylab, ...)
 5: plotBoxplotStats(stats, main = main, ylab = ylab, ...)
 4: plotBoxplot.ChipEffectSet(ces, type = "RLE", ...)
 3: plotBoxplot(ces, type = "RLE", ...)
 2: plotRle.QualityAssessmentModel(qamTr)
 1: plotRle(qamTr)

qamTr
 QualityAssessmentModel:
 Name: tissues
 Tags: RBC,QN,RMA,merged,QC
 Path: qcData/tissues,RBC,QN,RMA,merged,QC/HuEx-1_0-st-v2
 Chip-effect set:
    ExonChipEffectSet:
    Name: tissues
    Tags: RBC,QN,RMA,merged
    Path: plmData/tissues,RBC,QN,RMA,merged/HuEx-1_0-st-v2
    Platform: Affymetrix
    Chip type: HuEx-1_0-st-v2,monocell
    Number of arrays: 2
    Names: S370-A-HuEx-1_0-st-v2-01-1 (S09-13138), S371-A-HuEx-1_0-st-
 v2-01-1 (S09-07848)
    Time period: 2010-02-24 10:28:31 -- 2010-02-24 10:28:31
    Total file size: 5.43MB
    RAM: 0.01MB
    Parameters: (probeModel: chr pm, mergeGroups: logi TRUE)
 RAM: 0.00MB

 sessionInfo()
 R version 2.10.0 (2009-10-26)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
 LC_MONETARY=English_Canada.1252
 [4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 base

 other attached packages:
  [1] Biobase_2.6.0          aroma.affymetrix_1.3.0
 aroma.apd_0.1.7        affxparser_1.18.0
  [5] R.huge_0.2.0           aroma.core_1.3.1
 aroma.light_1.15.1     matrixStats_0.1.8
  [9] R.rsp_0.3.6            R.filesets_0.6.5
 digest_0.4.1           R.cache_0.2.0
 [13] R.utils_1.2.4          R.oo_1.6.5
 affy_1.24.2            R.methodsS3_1.0.3

 loaded via a namespace (and not attached):
 [1] affyio_1.14.0        preprocessCore_1.8.0


 How can i get more details on the error.
 I tried to use less CEL files and as few as 3, still no luck.

 Thanks in advance

 On Feb 24, 10:46 am, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi,

 there is probably more output from the error, right?  If so, could you
 please provide us with that?  Also, whenever you get an error, it
 is always helpful to report the output of traceback() [see email
 footer].

 What's your sessionInfo()?

 /Henrik



 On Wed, Feb 24, 2010 at 7:29 PM, zaid z...@genomedx.com wrote:
  Error:
  Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars
  $yaxs) :
   NAs not allowed in 'ylim'

  On Feb 24, 10:19 am, zaid z...@genomedx.com wrote:
  I was doing QC analysis in aroma in R on HuEx chip but got an error
  while trying to plot NUSE.
  ylim contains NA.

  I'm running R 2.10(32bit) on a windows 7(64bit).

  my command:
  library(aroma.affymetrix)
   verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
   chipType <- "HuEx-1_0-st-v2"
   cdf <- AffymetrixCdfFile$byChipType(chipType)
   print(cdf)
   cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
  bc <- RmaBackgroundCorrection(cs)
  csBC <- process(bc,verbose=verbose)
  qn <- QuantileNormalization(csBC, typesToUpdate="pm")
  csN <- process(qn, verbose=verbose)
  plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
  fit(plmTr, verbose=verbose)
  qamTr <- QualityAssessmentModel(plmTr)
   plotNuse(qamTr)
   plotRle(qamTr)
  End of command

  I was able to run the previous on U95A data and Plus 2 data. Also, in
  the past I was able to run that on HuEx data.
  The cdf file I'm using is binary, and I have used multiple ones
  (HuEx-1_0-st-v2,core,A20071112,EP.cdf,
  HuEx-1_0-st-v2,control,A20071112,EP.cdf,
  HuEx-1_0-st-v2,extended,A20071112,EP.cdf etc. offered on Elizabeth's
  Column: http://groups.google.com/group/aroma-affymetrix/web/affymetrix-define...
  )

  Could you please point me how to fix this problem?

  Thanks in advance

Re: [aroma.affymetrix] error in doing GenomeGraphs

2010-02-18 Thread Henrik Bengtsson
On Tue, Feb 16, 2010 at 6:50 PM, camelbbs camel...@gmail.com wrote:
 Hi,
 Can anyone help me for this error?

 u<-indexOf(cdf,"6811818")
 u
 integer(0)

This tells you that there is no unit with name 6811818 in the
MoEx-1_0-st-v1,coreR1,A20080718,MR CDF file.  You are simply asking
for information on a non-existing unit. You can get the unit names
available in a CDF by:

unitNames <- getUnitNames(cdf);

 ugcM <- getUnitGroupCellMap(getCdf(ds), units=u, retNames=TRUE)
 Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
  missing value where TRUE/FALSE needed

This is an error, because you request to get the (unit,group,cell) map
of zero (an empty set of) units.   The error message is not clear on
this, because it is really an unexpected use case.
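
A lookup that fails early with a clear message avoids passing an empty or
missing index further down.  The following is a minimal sketch in base R
on a made-up vector of unit names; lookupUnits() is a hypothetical helper
for this illustration, not a function in aroma.affymetrix.

```r
# Sketch: look up unit indices by name and stop if any name is unknown.
# 'unitNames' is made up; in practice it would hold the names in the CDF.
unitNames <- c("6811817", "6811819", "6811820")

lookupUnits <- function(names, unitNames) {
  idxs <- match(names, unitNames)
  unknown <- names[is.na(idxs)]
  if (length(unknown) > 0) {
    stop("No such unit(s): ", paste(unknown, collapse = ", "))
  }
  idxs
}

found <- lookupUnits("6811819", unitNames)
print(found)

# An unknown name gives an informative error instead of an empty index
res <- tryCatch(lookupUnits("6811818", unitNames),
                error = function(e) conditionMessage(e))
print(res)
```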

Hope this helps

Henrik

 cdf
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/MoEx-1_0-st-v1
 Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf
 Filesize: 30.53MB
 Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
 RAM: 0.62MB
 File format: v4 (binary; XDA)
 Dimension: 2560x2560
 Number of cells: 6553600
 Number of units: 17831
 Cells per unit: 367.54
 Number of QC units: 1



[aroma.affymetrix] GenomeWideSNP_6: Updated UFL and UGP annotation data files

2010-02-15 Thread Henrik Bengtsson
For your information:

The unit fragment length (UFL) and the unit genome position (UGP)
annotation files for the Affymetrix GenomeWideSNP_6 chip type have been
updated and are available at:

  http://aroma-project.org/chipTypes/GenomeWideSNP_6

The source was the two Affymetrix NetAffx CSV files
GenomeWideSNP_6.cn.na30.annot.csv (777MB) and
GenomeWideSNP_6.na30.annot.csv (1.32GB).

The updates are only minor from previous versions.  More details below.

/Henrik

HISTORY:

# UGP:
# o na27.1 -> na30
#   No differences
# o na27 -> na27.1
#   No differences
# o na26 -> na27
#   Two units (932039, 1872834) were moved from ChrX to ChrY.
#   Same location.
# o na24 -> na26
#   Only minor modifications for non-missing values:
#   - three loci changed chromosomes
#   - an additional 23 loci changed positions, of which only 17 moved
#     more than 2 base pairs.

# UFL:
# o na27.1 -> na30
#   - No changes.
# o na27 -> na27.1
#   - No changes.
# o na26 -> na27
#   - No changes.
# o na24 -> na26
#   - All changes are for SNP units.
#   - There are 6 NspI and 14 StyI changes in SNP fragment lengths,
#     some of which are only minor.
#   - There are 1108 more SNPs that now have missing values.
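
A comparison of the kind summarized in the history above can be sketched
in base R as follows.  The two tables are made up, and the column names
and chromosome codes are assumptions for illustration only; they do not
reproduce the actual annotation data.

```r
# Sketch: count units whose chromosome or position differs between two
# versions of a UGP-like table (made-up data, same units in same order).
old <- data.frame(unit = 1:5,
                  chromosome = c(1L, 1L, 23L, 2L, NA),
                  position   = c(100L, 200L, 300L, 400L, NA))
new <- data.frame(unit = 1:5,
                  chromosome = c(1L, 1L, 24L, 2L, NA),
                  position   = c(100L, 203L, 300L, 401L, NA))

# Only compare units that are non-missing in both versions
both <- !is.na(old$chromosome) & !is.na(new$chromosome)

chrChanged <- both & (old$chromosome != new$chromosome)
posShift   <- abs(new$position - old$position)
posChanged <- both & !chrChanged & (posShift > 0)
movedFar   <- posChanged & (posShift > 2)

nChr <- sum(chrChanged)
nPos <- sum(posChanged)
nFar <- sum(movedFar)
cat("Changed chromosome:", nChr, "\n")
cat("Changed position:", nPos, "\n")
cat("Moved more than 2 bp:", nFar, "\n")
```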


Re: [aroma.affymetrix] Access to source files

2010-02-10 Thread Henrik Bengtsson
Hi.

On Tue, Feb 9, 2010 at 7:40 PM, Randy Gobbel randy.gob...@gmail.com wrote:
 I've just noticed in the past few days that with the new Web site,
 it's not at all obvious how to download source files for the various
 packages that go into aroma.affymetrix.  I've managed it by pawing
 through old messages looking for links, but it would be nice to have a
 direct link in an obvious place.  Sometimes there's no substitute for
 reading the sources, or occasionally building a custom version, if
 you're running a bleeding-edge version of R.

True.

I've added a how-to page 'Access the source code':

  http://www.aroma-project.org/howtos/AccessSourceCode

You were probably thinking of the old approach that was based on
hbGet() - it is no longer supported/maintained, because basically
everything is now on CRAN.

Hope this helps

Henrik

PS. It might be confusing that there is still old documentation over
at the Google Group.  The plan is first to make sure everything is
moved, then we will try to replace the Google Group front page with a
link to http://www.aroma-project.org/, and eventually delete all old
Google Group pages.  What prevents us from doing this is that we are
still not sure if the Google Group will be blocked yet again if we
touch it.  When it is blocked, the mailing list is also blocked and we
will be at the mercy of Google to unlock it.


Re: [aroma.affymetrix] SNP Call rates

2010-02-09 Thread Henrik Bengtsson
Hi Rama.

On Tue, Feb 9, 2010 at 10:15 AM, Rama Gullapalli
dr.ramachan...@gmail.com wrote:
 Hi All,
 First time poster, long-time lurker. Really appreciate all the wonderful
 stuff you guys are doing (Henrik et al). Would love to be able to help in
 anyway deemed necessary (Including documentation work...). I am going
 through the process of understanding the capabilities of the software right
 now.

Thanks, we appreciate your feedback.

We are grateful for any help with documentation, proofreading, script
validation etc.


 I had a question.

 CNAG 3.0 and GTC3.0.1 software have something called a SNP call rate
 estimator. I was wondering if there was a similar function in Aroma? What
 would be the function which would be able to perform a similar analysis?

Sorry, there is no SNP call rate feature available in aroma.affymetrix.

Depending on how you define it, it may be more or less easy to
calculate by hand from the probe signals etc.  I think some tools
provide rough estimates from the raw probe signals, others only after
a long-running genotyping step, and so on.
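
A minimal sketch of one possible by-hand definition, assuming genotype
calls are already available as a matrix in which NA marks a no-call.
The matrix, its values and the variable names are made up for
illustration; nothing here is an aroma.affymetrix function:

```r
# Hypothetical genotype calls: rows are SNPs, columns are arrays.
# NA marks a SNP for which no genotype could be called.
calls <- matrix(
  c("AA", "AB", NA, "BB",
    "AB", NA,   NA, "AA"),
  nrow = 4,
  dimnames = list(paste0("SNP", 1:4), c("array1", "array2"))
)

# Call rate per array = fraction of SNPs with a non-missing call.
callRate <- colMeans(!is.na(calls))
print(callRate)
```

Other definitions, for instance thresholding on a confidence score,
would give different numbers for the same arrays.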

Cheers,

Henrik


 Thanks for your time.
 Regards
 Rama


Re: Runtime error with (Was: Re: [aroma.affymetrix] Re: Question for custom CDF of ST-Array)

2010-02-08 Thread Henrik Bengtsson
On Thu, Jan 28, 2010 at 4:58 PM, branko b.miso...@lumc.nl wrote:
 Hi all,

[snip]

 4) Last  one , regarding QC  issue with plotting … SO when doing Array
 (pseudo) image plots  my RGui in windows complains e.g.:

  If I do:   cf <- getFile(cs, 1)
               plotImage(cf, transform=list(log2), palette=rainbow(256))

               #Loading required package: EBImage
               #Loading required package: abind
                ….

    I get “Runtime error!”  message from Visual C++ and I have to
 click 2 times “ok” and then I get the picture…
    Here is the link to the error msg::
      http://www.4shared.com/file/209798878/f5a3f82e/Aromaplotimageerror.html

    SO you can imagine I’m not  enthusiastic of clicking twice for 300
 arrays and then again for several type of plots.
    Any idea where is the issue ?  (I guess something with EBImage
 dependency make issue )

If you google "EBImage" together with the error message
[http://goo.gl/J9KQ] you'll get to the EBImage installation PDF, which
in Section 3 ("Windows") explains what the cause is and how to solve
it.

/Henrik

    Below  is my session info .

   Hope you can help .

  Best  regards,

  Branko

 --
 Branislav Misovic,
 Department of Toxicogenetics
 Leiden University Medical Center
 PO.box 9600, Building2,Room:T3-11
 2300 RC Leiden
 The Netherlands
 Phone: +31 71 526 9636
 Mob: 0653135855
 E-mail: b.miso...@lumc.nl

   sessionInfo()
 R version 2.10.0 (2009-10-26)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
 Kingdom.1252
 [3] LC_MONETARY=English_United Kingdom.1252
 LC_NUMERIC=C
 [5] LC_TIME=English_United Kingdom.1252

 attached base packages:
 [1] stats     graphics  grDevices datasets  utils     methods
 base

 other attached packages:
  [1] abind_1.1-0            aroma.affymetrix_1.3.4
 aroma.apd_0.1.7
  [4] affxparser_1.18.0      R.huge_0.2.0
 aroma.core_1.3.4
  [7] aroma.light_1.15.1     matrixStats_0.1.8
 R.rsp_0.3.6
 [10] R.filesets_0.6.5       digest_0.4.1
 R.cache_0.2.0
 [13] R.utils_1.2.4          R.oo_1.6.6
 EBImage_3.2.0
 [16] R.methodsS3_1.0.3

 loaded via a namespace (and not attached):
 [1] tools_2.10.0







Re: [aroma.affymetrix] extracting data from plmEx

2010-02-08 Thread Henrik Bengtsson
Hi.

On Tue, Jan 26, 2010 at 7:00 PM, parantu shah parantu.s...@gmail.com wrote:
 Hi

 I want to extract as ExpressionSet - the set of normalized arrays
 (200+) after fitting the

 plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
 fit(plmEx, verbose=verbose)

 [ not the chip effect or anything else]

I don't understand this comment.  You say you want to extract the
normalized probe signals, not the probe-level summarized signals?   If
so, I believe you have misunderstood what fit() on an ExonRmaPlm does,
because the ExonRmaPlm class is for summarizing probe-level data into
chip effects.  Thus, if you want to work with the probe signals you do
not need to do this step, but instead work with the 'csN' data set.

I don't know what 'csN' is here, but one guess is that it is the CEL
data set output from a QuantileNormalization step.  If so, this is a
set of CEL files just as your original CEL files.  Then you can use
standard Bioconductor methods to read the CEL data, e.g.

library(affy);  # Bioconductor package that provides ReadAffy()
pathnames <- getPathnames(csN);
ab <- ReadAffy(filenames=pathnames);
print(ab);

Note that this gives you an AffyBatch object, but not an
ExpressionSet, though both extend the eSet class.  More importantly,
ReadAffy() really reads all probe signals into memory, so you have to
deal with all the regular memory issues that you have with
Bioconductor - you are no longer riding on the aroma.affymetrix
framework.


 to use normalized data in another Bioconductor module.

 ExtractDataFrame and ExtractMatrix doesn't work.

Exactly what did you try, and in what way did they not work?

/Henrik


 Any ideas will be welcome.

 Thanks
 Parantu



 --
 Parantu Shah, PhD
 Dept. of Biostatistics & Computational Biology
 Dana-Farber Cancer Institute
 Harvard School of Public Health

 CLS-11075,  3 Blackfan Circle
 Boston MA 02115
 Phone : 617 582 8852
 http://www.hsph.harvard.edu/~pshah



Re: [aroma.affymetrix] ArrayExplorer problem

2010-02-07 Thread Henrik Bengtsson
Hi,

I managed to reproduce this when using the same names as you.  It
turns out to be a bug causing indexOf(ds, "foo+bar") of a data set
'ds' to return NA when the requested name contains a '+' symbol.  The
reason was that the '+' was parsed as a regular expression symbol.
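
A minimal base-R sketch of this class of bug, using made-up array
names; it only illustrates why a '+' in a name fails to match when the
name is used as a regular expression, and it is not the actual
R.filesets code:

```r
# Hypothetical array names containing a '+' symbol.
arrayNames <- c("EA_98020_H133+_MCCW199", "EA_98021_H133+_SKINW199")
wanted <- "EA_98020_H133+_MCCW199"

# Used as a regular expression, '3+' means "one or more 3s", so the
# pattern expects '_' right after the 3s and never matches the
# literal '+' in the name.
regexIdx <- grep(paste0("^", wanted, "$"), arrayNames)

# Matching the name as a fixed string finds it.
fixedIdx <- grep(wanted, arrayNames, fixed = TRUE)

# Plain equality avoids pattern matching altogether.
exactIdx <- which(arrayNames == wanted)

print(length(regexIdx))  # no regular-expression match
print(fixedIdx)
print(exactIdx)
```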

I've fixed R.filesets, where the bug is located.  Until I release a
new version of R.filesets and submit it to CRAN, you can use the
following patch to R.filesets:

# INSTALL
library(aroma.affymetrix);
downloadPackagePatch("R.filesets");

When you've done this once, the patch will be available in all future
R sessions.  It will uninstall itself when you later install the new
R.filesets version.

Let me know if this works.

/Henrik

On Tue, Jan 26, 2010 at 1:59 AM, Randy Gobbel randy.gob...@gmail.com wrote:
 I'm running into the following error when trying to use
 ArrayExplorer.  I'm running on an 8 CPU (Xeon) Mac Pro, OS 10.5.8,
 EBImage and supporting software just installed from scratch:

 rs <- calculateResidualSet(plm)
 ae <- ArrayExplorer(rs)
 setColorMaps(ae, c('log2,log2neg,rainbow','log2,log2pos,rainbow'))
 process(ae, interleaved='auto')
 Error in readCelHeader(pathname) :
  Cannot read CEL file header. File not found: NA/NA
 In addition: Warning messages:
 1: In min(x) : no non-missing arguments to min; returning Inf
 2: In max(x) : no non-missing arguments to max; returning -Inf
 traceback()
 34: stop("Cannot read CEL file header. File not found: ", filename)
 33: readCelHeader(pathname)
 32: getHeader.AffymetrixCelFile(this)
 31: getHeader(this)
 30: getCdf.AffymetrixCelFile(this$files[[1]], ...)
 29: getCdf(this$files[[1]], ...)
 28: getCdf.AffymetrixCelSet(this)
 27: getCdf(this)
 26: clearCache.AffymetrixCelSet(res)
 25: NextMethod(generic = "clearCache", object = this, ...)
 24: clearCache.ResidualSet(res)
 23: clearCache(res)
 22: extract.GenericDataFileSet(X[[1L]], ...)
 21: FUN(X[[1L]], ...)
 20: lapply.default(dsList, FUN = extract, files, ...)
 19: lapply(dsList, FUN = extract, files, ...)
 18: extract.GenericDataFileSetList(this, ..., onDuplicated = "error")
 17: extract(this, ..., onDuplicated = "error")
 16: getFileList.GenericDataFileSetList(this, ii, ...)
 15: getFileList(this, ii, ...)
 14: getFullNames.AromaMicroarrayDataSetTuple(setTuple)
 13: getFullNames(setTuple)
 12: eval(expr, envir, enclos)
 11: eval(rExpr, envir = envir)
 10: sourceRsp.default(file = pathname, response = response, ...)
 9: sourceRsp(file = pathname, response = response, ...)
 8: rspToHtml.default(pathname, path = NULL, outFile = outFile, outPath
 = outPath,
       overwrite = TRUE, envir = env)
 7: rspToHtml(pathname, path = NULL, outFile = outFile, outPath =
 outPath,
       overwrite = TRUE, envir = env)
 6: updateOnChipTypeJS.ArrayExplorer(this, ...)
 5: updateOnChipTypeJS(this, ...)
 4: setup.ArrayExplorer(this, ..., verbose = less(verbose))
 3: setup(this, ..., verbose = less(verbose))
 2: process.ArrayExplorer(ae, interleaved = "auto")
 1: process(ae, interleaved = "auto")
 rs
 ResidualSet:
 Name: all
 Tags: RBC,QN,RMA
 Path: plmData/all,RBC,QN,RMA/Hs133P_Hs_REFSEQ
 Platform: Affymetrix
 Chip type: Hs133P_Hs_REFSEQ
 Number of arrays: 9
 Names: EA08034_98020_H133+_MCCW199, EA08034_98021_H133+_SKINW199, ...,
 EA08034_98031_H133+_PN-1NN2
 Time period: 2010-01-20 15:59:10 -- 2010-01-20 15:59:28
 Total file size: 116.30MB
 RAM: 0.01MB
 Parameters: (probeModel: chr pm)
 cdf
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/Hs133P_Hs_REFSEQ
 Filename: Hs133P_Hs_REFSEQ.cdf
 Filesize: 15.21MB
 Chip type: Hs133P_Hs_REFSEQ
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 1164x1164
 Number of cells: 1354896
 Number of units: 25102
 Cells per unit: 53.98
 Number of QC units: 9

 Any suggestions? Everything else seems to be working fine at this
 point.



Re: How to find help on aroma.* packages? (Was: Re: [aroma.affymetrix] Re: Question for custom CDF of ST-Array)

2010-02-07 Thread Henrik Bengtsson
Hi,

On Thu, Jan 28, 2010 at 4:58 PM, branko b.miso...@lumc.nl wrote:
 Hi all,

[snip]

 3) Few  questions regarding Quality checks and basic data
 manipulations in Aroma:

[snip]

    I ask this silly questions because Using R commands like str()
 doesn’t show me the
    object fields etc. so I can’t use standard R matrix commands,

str() does not work on the aroma.* objects, but if you do ll() [two
L:s] you will see some of the contents of these objects.  However, the
idea is that you use the methods API to access the objects, not the
fields (slots).

 also  help (“some Aroma command” ) doesn’t show enough information.
    Sometimes it gives empty help page.

The R help pages for the aroma.* packages are sparse.  The reason for
this is simply that it is a daunting task to set them up and keep
them up to date; there is not enough time/resources/people for this.
The ones you do find are up to date.  The others point to a generic
help page saying that the method is not documented.

Instead, I ask everyone to use the online documentation at:

   http://www.aroma-project.org/

That is *the* place to find documentation about aroma.* packages.  You
can trust what you find there.  There are of course more features in
the aroma.* packages.  Some advanced users dive into the code to
find these, and even add their own extensions.  However, our
strategy is that new features will be documented online first when we
consider them to be stable.  This is the only way we can keep up with
the maintenance.  FYI, the aroma.* and R.* packages now consist of
6+ lines of code.

    I could not find pdf manual in Aroma installed libraries nor in
 the Google group. I see only html file showing me all the functions &
 classes.
     Is there easier way to look for functions than main html pages ?

If you mean a PDF vignette when you say "pdf", that does not exist.
We don't use (Sweave) vignettes.  Instead we prefer to document things
online, for the above reasons.

    Code of functions are not visible by just typing  func.name() , I
 guess I can always get source code and search but there is likely easy
 way to do it.

This is not specific to aroma.* packages.  When you type

print(methodName)

you will see only the S3 generic function.  To find the function that
is used for your particular object, list the available methods with:

methods(methodName)

and then print the one matching the class of your object.  Please see
the R help on the S3 class system for why/how this works.
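
A minimal base-R sketch of the above, using a toy generic and a toy
class.  The names describe() and ToyCelSet are hypothetical and exist
only for this illustration; they are not part of any aroma.* package:

```r
# A toy S3 generic: all it does is dispatch on the class of 'x'.
describe <- function(x, ...) UseMethod("describe")

# Fallback method and a class-specific method.
describe.default <- function(x, ...) "a generic object"
describe.ToyCelSet <- function(x, ...) {
  paste("a CEL set with", length(x$files), "files")
}

obj <- structure(list(files = c("a.CEL", "b.CEL")), class = "ToyCelSet")

# Printing the generic shows only the UseMethod() dispatcher.
print(describe)

# List the methods that exist for the generic ...
print(methods("describe"))

# ... and retrieve the implementation used for objects of this class.
impl <- getS3method("describe", "ToyCelSet")
print(impl)

result <- describe(obj)
print(result)
```

The same pattern applies to method names seen in a traceback(), such
as getCdf.AffymetrixCelSet, which follow the generic.Class convention.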


     It is visible that  Aroma uses different classes than
 BioConductor.  I assume there is a good reason for that, but maybe you
 can give some link with explanation?

That is too daunting a task to document, but the short answer is that
the Bioconductor classes do not support large data sets.  This is
why we developed aroma.affymetrix in the first place.  Advanced
developers can also look at the R.filesets package to see the core of
how we deal with large data sets.

[snip]

/Henrik

   Hope you can help .

  Best  regards,

  Branko

 --
 Branislav Misovic,
 Department of Toxicogenetics
 Leiden University Medical Center
 PO.box 9600, Building2,Room:T3-11
 2300 RC Leiden
 The Netherlands
 Phone: +31 71 526 9636
 Mob: 0653135855
 E-mail: b.miso...@lumc.nl

   sessionInfo()
 R version 2.10.0 (2009-10-26)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
 Kingdom.1252
 [3] LC_MONETARY=English_United Kingdom.1252
 LC_NUMERIC=C
 [5] LC_TIME=English_United Kingdom.1252

 attached base packages:
 [1] stats     graphics  grDevices datasets  utils     methods
 base

 other attached packages:
  [1] abind_1.1-0            aroma.affymetrix_1.3.4
 aroma.apd_0.1.7
  [4] affxparser_1.18.0      R.huge_0.2.0
 aroma.core_1.3.4
  [7] aroma.light_1.15.1     matrixStats_0.1.8
 R.rsp_0.3.6
 [10] R.filesets_0.6.5       digest_0.4.1
 R.cache_0.2.0
 [13] R.utils_1.2.4          R.oo_1.6.6
 EBImage_3.2.0
 [16] R.methodsS3_1.0.3

 loaded via a namespace (and not attached):
 [1] tools_2.10.0







Re: [aroma.affymetrix] Re: Directory structure: FIRMAGene

2010-01-27 Thread Henrik Bengtsson
2010/1/27 Mikhail mikhail.dozmo...@gmail.com:
 Henrik, thank you for such a thorough answer. Now I understand how to
 create two datasets, and it did work. I'm trying to use these datasets
 for FIRMAGene analysis, as described in 
 http://bioinf.wehi.edu.au/folders/firmagene/sup3.R
 file. Nowhere in this file I can see how and where the two datasets
 are defined and compared. It starts with ONE dataset reading in cs,

 # this assumes the CEL files are at ./rawData/tissues/HuGene-1_0-st-v1/
 cs <- AffymetrixCelSet$fromName("tissues", cdf=cdf, verbose=verbose)

 and continues through FIRMAGene.

 Yes, I can create two datasets, as you described. Shall they be joined
 together to process for normalization and then to FIRMAGene? I tried

 csM <- AffymetrixCelSet$byName("MS", tags="M", cdf=cdf)
 csS <- AffymetrixCelSet$byName("MS", tags="S", cdf=cdf)

 cs <- cbind(csM, csS)

Now I'm starting to understand your question better.  You want to
keep the data sets in different directories (as solved), but join them
together into one for the analysis.  In order to do this, you can
append one set to another.  The safest/best way to do this would be to
do:

cs <- append(csM, csS);
setFullName(cs, "MS,M+S");

This will set up a set of your 3+3 CEL files with fullname "MS,M+S".
This name will be used in all downstream analysis/output data sets.
If you don't use setFullName(), the fullname will be that of the first
data set ('csM').

/Henrik


 Is it correct?  I doubt, because the following code in the example
 file mentioned above doesn't work.  I wonder if FIRMAGene can
 recognize the tags from two datasets, for proper comparison. There are
 several example files at http://bioinf.wehi.edu.au/folders/firmagene/,
 none of them, however, explains where datasets for comparison are
 defined. Therefore I wonder which directory structure shall I create
 and how to properly read the data for FIRMAGene processing.

 Originally, I have 3 .cel files for one condition (M), and 3 .cel
 files for another (S). Thank you! Mikhail.

 On Jan 26, 6:58 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi

 On Tue, Jan 26, 2010 at 3:32 PM, Mikhail mikhail.dozmo...@gmail.com wrote:
  Hi,
  On http://www.aroma-project.org/node/79 there's a nice summary of what
  the directory structure should look like. To my understanding the
  directory structure also reflects the project structure.

 The annotationData/chipTypes/ directory structure shouldn't.  Imagine
 this as a global structure shared with everyone.

  For example,
  I want to identify differentially expressed exons from Affymetrix
  Human Gene 1.0 ST, using FIRMAgene
  (http://bioinf.wehi.edu.au/folders/firmagene/).
  I have two groups, and need to compare them. So I set the structure
  like this:

  For annotation:
  annotationData\chipTypes\HuGene-1_0-st-v1
  annotationData\chipTypes\HuGene-1_0-st-v1\HuGene-1_0-st-v1_M
  annotationData\chipTypes\HuGene-1_0-st-v1\HuGene-1_0-st-v1_S

 Not sure what you mean by "two groups" in this context and what 'M'
 and 'S' refer to.  Are those two latter subdirectories?

 Note that the definition of a 'chip type' differs from the definition
 of an annotation data file (e.g. CDF file).   The chip type never
 changes after the array is designed and produced by Affymetrix.  The
 annotation data files will change as the human genome annotation and
 other things get updated.

 Thus, if you buy HuGene-1_0-st-v1 arrays from Affymetrix, you want any
 annotation data files to be stored under
 annotationData/chipTypes/HuGene-1_0-st-v1/.   Similarly for your raw
 data files.

 Let's see if my comments below clarify it for you.



  For rawData:
  rawData\MS
  rawData\MS\HuGene-1_0-st-v1_M
  rawData\MS\HuGene-1_0-st-v1_S

 I believe you want to do:

 rawData/MS,M/HuGene-1_0-st-v1/
 rawData/MS,S/HuGene-1_0-st-v1/

 This way you will have two data sets for the same chip type with
 fullnames "MS,M" and "MS,S".  By definition of names, tags &
 fullnames, both data sets have the name "MS", differing by the tags "M"
 and "S".

 You can also use fullnames "MS_M" and "MS_S", which then gives data
 sets with different names (same as the fullnames) and no tags.
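
 A minimal base-R sketch of the fullname convention described above,
 assuming a fullname is the name followed by zero or more
 comma-separated tags.  The helper parseFullName() is hypothetical and
 written only for this illustration; it is not an aroma.* function:

```r
# Split a fullname into its name (first part) and tags (the rest).
parseFullName <- function(fullname) {
  parts <- strsplit(fullname, split = ",", fixed = TRUE)[[1]]
  list(name = parts[1], tags = parts[-1])
}

a <- parseFullName("MS,M")   # name "MS", one tag "M"
b <- parseFullName("MS,S")   # name "MS", one tag "S"
c <- parseFullName("MS_M")   # name "MS_M", no tags
str(a); str(b); str(c)

# The raw data directory follows rawData/<fullname>/<chipType>/.
path <- file.path("rawData", "MS,M", "HuGene-1_0-st-v1")
print(path)
```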



  The following code runs OK
  chipType <- "HuGene-1_0-st-v1"
  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")

 So, I'm not sure where you placed the CDF, but yes, the CDF will be
 found if it is located in (or in a subdirectory of)
 annotationData/chipTypes/HuGene-1_0-st-v1/.



  But this
  cs <- AffymetrixCelSet$byName("MS", cdf=cdf)
  gives an error: No such directory: MS/HuGene-1_0-st-v1

  I don't want to put all my files in the same HuGene-1_0-st-v1 folder,
  they are from different groups and supposed to be compared.

 Using the rawData/ structure I suggest above, you can do:

 csM <- AffymetrixCelSet$byName("MS", tags="M", cdf=cdf)
 csS <- AffymetrixCelSet$byName("MS", tags="S", cdf=cdf)

 or equivalently

 csM <- AffymetrixCelSet$byName("MS,M", cdf=cdf)
 csS <- AffymetrixCelSet$byName("MS,S", cdf=cdf)

 If you then process:

 cs <- csM

 all output data sets will have
