[aroma.affymetrix] Questions on extracting probeset summaries

2015-01-22 Thread Qingzhou Zhang
Hi Henrik, 

I was processing HG-U133_Plus_2 datasets. While extracting probeset 
summaries(chip effects) as a data frame, I only got 27604 objs * n 
variables. 
I was hoping to get a data frame of 54675 objs., which equals the number of 
units in HG-U133_Plus_2 chip. Am I missing some steps, or processing the 
wrong CEL files?

Thanks a lot!

-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[aroma.affymetrix] Re: sfit

2015-01-22 Thread Chengyu Liu
Hi,

Please check old discussions. I remember there were discussions. I had the 
same issue. 
Most probably, you did not install if you were using Linux system

On Wednesday, January 21, 2015 at 3:46:49 PM UTC+2, Juanjo Lozano wrote:

 Hi,

 I found

 Error: Package not loaded: sfit
 Execution halted

 in 

 R version 3.1.2 (2014-10-31) -- Pumpkin Helmet

 Could you help-me?

 Best

 Juanjo



-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[aroma.affymetrix] Re: Regarding the copy number states and further processing

2015-01-22 Thread Chengyu Liu
Hi,
 

 I have tried this and works good but at the end I need the information 
 whether there is a gain or loss at the segment. I will use GLAD model to 
 get gain or loss at a segment. My samples and controls are completely 
 unrelated so I am little bit doubtful whether I am doing right or not. I 
 also found some other algorithms that can work on segments produced by CBS 
 model still looking into them.


 I think you can use GLAD to call gain and loss. But CBS does not return 
gain or loss, only segments. If you use CBS you should call gain or loss 
yourself (or use other tools such as GISTIC).
 

 Then I am also looking for CNA. What other softwares have you tried on 
 data from CytoScan HD array? 

Like you I used aroma to preprocess, segmented using CBS and manually call 
gain or loss. The simplest way is using a threshold to define gain or loss. 
If I remember correctly, one of TCGA papers in Nature, there a fixed 
threshold was used to define gain and loss. Maybe you can check that.

Br,

 




 Br,
 C.Y

  


 Thanks,

 Best Regards,
 Sam.

 On Tuesday, January 20, 2015 at 10:38:27 AM UTC+1, Chengyu Liu wrote:

 Hi,

 On Monday, January 19, 2015 at 3:42:59 PM UTC+2, Sam Padmanabhuni 
 wrote:

 Dear AromaAffymetrix Team,

 First of all, thank you very much for such a detailed vignette on 
 how to perform the CNV analysis. 

 I am Sam, a PhD student in genetics, working on CNV analysis on data 
 from CytoScan HD Array. I have read the vignette to do CRMAv2 and 
 non-paired CBS. I have copied the commands and ran in R.

 But, I have few questions regarding CbsModel and GladModel in 
 segmentation algorithm:

 1. It is mentioned that, copy number states is not calculated in 
 CbsModel segmentation. How do I get information of whether the segment 
 is a 
 loss or gain from output of CbsModel? I mean can this information be 
 passed 
 to other algorithms to estimate copy number state.

 As far as I know, the out put of CBS is the relative copy number.  It 
 does not directly tell you the copy number states. 


 2. I have looked in to GLAD model and it is mentioned that it is 
 developed for aCGH but my data is not from aCGH. Can it be still used 
 to 
 calculate copy number states for the data I am working on?

 GLAD can calculate copy number states for affy-array, although I have 
 not used it before.


 3. Also, do you have a vignette on how to run CRMAv2 and CBS on 
 CytoScan HD array? This would be really helpful.

 It is the same with other chiptype, prepare input as required (there 
 is vignette).


 BTW, I am also working on CytoScan HD. What kind of analysis are you 
 going to do? Do you have paired samples or non-paired? Maybe we have 
 something common and we can discuss.

 Br,
 C.Y



 Thank you,

 Best,
 Sam.



 Best Regards,
 Sam.  



 Best,
 Sam. 


-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [aroma.affymetrix] Questions on extracting probeset summaries

2015-01-22 Thread Henrik Bengtsson
Thanks. I can *not* reproduce this, e.g.

 ces
ChipEffectSet:
Name: GSE9890
Tags: GRBC,QN,RMA,oligo
Path: plmData/GSE9890,GRBC,QN,RMA,oligo/HG-U133_Plus_2
Platform: Affymetrix
Chip type: HG-U133_Plus_2,monocell
Number of arrays: 10
Names: GSM249671, GSM249672, GSM249673, ..., GSM249680 [10]
Time period: 2015-01-17 09:43:28 -- 2015-01-17 09:43:35
Total file size: 5.75MB
RAM: 0.02MB
Parameters: {}

 ces[[1]]
ChipEffectFile:
Name: GSM249671
Tags: chipEffects
Full name: GSM249671,chipEffects
Pathname: 
plmData/GSE9890,GRBC,QN,RMA,oligo/HG-U133_Plus_2/GSM249671,chipEffects.CEL
File size: 589.25 kB (603394 bytes)
RAM: 0.02 MB
File format: v4 (binary; XDA)
Platform: Affymetrix
Chip type: HG-U133_Plus_2,monocell
Timestamp: 2015-01-17 09:43:28
Parameters: {probeModel: chr pm}

 data - extractDataFrame(ces, units=NULL, addNames=TRUE)
 str(data)
'data.frame':   54675 obs. of  15 variables:
 $ unitName : chr  AFFX-BioB-5_at AFFX-BioB-M_at AFFX-BioB-3_at
AFFX-BioC-5_at ...
 $ groupName: chr  ...
 $ unit : int  1 2 3 4 5 6 7 8 9 10 ...
 $ group: int  1 1 1 1 1 1 1 1 1 1 ...
 $ cell : int  1 2 3 4 5 6 7 8 9 10 ...
 $ GSM249671: num  1614 2691 2120 3904 2238 ...
 $ GSM249672: num  2612 4060 3301 5686 3280 ...
 $ GSM249673: num  2876 5178 4014 6861 4050 ...
 $ GSM249674: num  3328 5704 4350 7617 4505 ...
 $ GSM249675: num  3101 5455 4131 7735 4560 ...
 $ GSM249676: num  5081 8883 7173 10997 7188 ...
 $ GSM249677: num  2329 4186 3209 5853 3482 ...
 $ GSM249678: num  1723 3177 2353 5537 3141 ...
 $ GSM249679: num  1442 2458 2114 4285 2370 ...
 $ GSM249680: num  1469 2641 2154 4583 2582 ...

So, let's start troubleshooting.  First, you should see the exact same
as I do for:

 cdf - getCdf(ces)
 cdf
AffymetrixCdfFile:
Path: annotationData/chipTypes/HG-U133_Plus_2
Filename: HG-U133_Plus_2,monocell.CDF
File size: 9.63 MB (10098009 bytes)
Chip type: HG-U133_Plus_2,monocell
RAM: 3.34MB
File format: v4 (binary; XDA)
Dimension: 246x245
Number of cells: 60270
Number of units: 54675
Cells per unit: 1.10
Number of QC units: 9

If not, that's where the problem is.  If ok, then check this output:

 map - getUnitGroupCellMap(cdf)
str(map) str(map)
Classes 'UnitGroupCellMap' and 'data.frame':54675 obs. of  3 variables:
 $ unit : int  1 2 3 4 5 6 7 8 9 10 ...
 $ group: int  1 1 1 1 1 1 1 1 1 1 ...
 $ cell : int  1 2 3 4 5 6 7 8 9 10 ...

This map is essential in what information gets pulled out and
returned.  The number of rows/observations in this data frame should
match the number of units in the 'cdf', i.e. 54,675 units.

Let's start with that.

Henrik

On Thu, Jan 22, 2015 at 5:00 PM, Qingzhou Zhang zqznept...@gmail.com wrote:
 Hi, Henrik,

 Thanks for the reply!

 Here is my code:

 library(aroma.affymetrix)
 RawName = Project1
 RawChipType = HG-U133_Plus_2

 ces - doGCRMA(RawName, chipType = RawChipType)
 data - extractDataFrame(ces, units = NULL, addNames = TRUE)

 Here is the sessionInfo()

 R version 3.1.1 (2014-07-10)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 LC_TIME=en_GB.UTF-8
  [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_GB.UTF-8
 LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_GB.UTF-8   LC_NAME=C  LC_ADDRESS=C
 [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8
 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] aroma.light_2.2.1   aroma.affymetrix_2.13.0 aroma.core_2.13.0
 R.devices_2.12.0
 [5] R.filesets_2.6.0R.utils_1.34.0  R.oo_1.18.2
 affxparser_1.38.0
 [9] R.methodsS3_1.6.2

 loaded via a namespace (and not attached):
  [1] aroma.apd_0.5.0base64enc_0.1-2Cairo_1.5-7digest_0.6.8
 DNAcopy_1.40.0
  [6] matrixStats_0.13.0 PSCBS_0.43.0   R.cache_0.11.0 R.huge_0.8.0
 R.rsp_0.19.7
 [11] tools_3.1.1

 Here is the traceback()

 1: extractDataFrame(ces, units = NULL, addNames = TRUE)


 I tried several times, but always got a data frame containing 27604 obj. :-(

 Thanks


 On Friday, 23 January 2015 01:36:00 UTC+8, Henrik Bengtsson wrote:

 On Thu, Jan 22, 2015 at 5:44 AM, Qingzhou Zhang zqzne...@gmail.com
 wrote:
  Hi Henrik,
 
  I was processing HG-U133_Plus_2 datasets. While extracting probeset
  summaries(chip effects) as a data frame, I only got 27604 objs * n
  variables.
  I was hoping to get a data frame of 54675 objs., which equals the number
  of
  units in HG-U133_Plus_2 chip. Am I missing some steps, or processing the
  wrong CEL files?

 Hard to say.  Can you share your code (from beginning to end) showing
 what you're doing?

 Henrik

 
  Thanks a lot!
 
  --
  --
  When reporting problems on aroma.affymetrix, make sure 1) to run the
  latest
  version of the package, 2) to report the output of sessionInfo() and
  traceback(), and 3) to post a complete code example.
 
 
  You received this message because you are subscribed to the Google
  Groups
  aroma.affymetrix group with 

Re: [aroma.affymetrix] Questions on extracting probeset summaries

2015-01-22 Thread Qingzhou Zhang
Hi, Henrik,

Thanks for the reply!

Here is my code:

library(aroma.affymetrix)
RawName = Project1
RawChipType = HG-U133_Plus_2

ces - doGCRMA(RawName, chipType = RawChipType)
data - extractDataFrame(ces, units = NULL, addNames = TRUE)

Here is the sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_GB.UTF
-8   
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_GB.UTF-8LC_MESSAGES=en_US
.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8   LC_NAME=C  LC_ADDRESS=C 
 
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION
=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] aroma.light_2.2.1   aroma.affymetrix_2.13.0 aroma.core_2.13.0   
R.devices_2.12.0   
[5] R.filesets_2.6.0R.utils_1.34.0  R.oo_1.18.2 
affxparser_1.38.0  
[9] R.methodsS3_1.6.2  

loaded via a namespace (and not attached):
 [1] aroma.apd_0.5.0base64enc_0.1-2Cairo_1.5-7digest_0.6.8 
  DNAcopy_1.40.0
 [6] matrixStats_0.13.0 PSCBS_0.43.0   R.cache_0.11.0 R.huge_0.8.0 
  R.rsp_0.19.7  
[11] tools_3.1.1 

Here is the traceback()

1: extractDataFrame(ces, units = NULL, addNames = TRUE)


I tried several times, but always got a data frame containing 27604 obj. :-(

Thanks


On Friday, 23 January 2015 01:36:00 UTC+8, Henrik Bengtsson wrote:

 On Thu, Jan 22, 2015 at 5:44 AM, Qingzhou Zhang zqzne...@gmail.com 
 javascript: wrote: 
  Hi Henrik, 
  
  I was processing HG-U133_Plus_2 datasets. While extracting probeset 
  summaries(chip effects) as a data frame, I only got 27604 objs * n 
  variables. 
  I was hoping to get a data frame of 54675 objs., which equals the number 
 of 
  units in HG-U133_Plus_2 chip. Am I missing some steps, or processing the 
  wrong CEL files? 

 Hard to say.  Can you share your code (from beginning to end) showing 
 what you're doing? 

 Henrik 

  
  Thanks a lot! 
  
  -- 
  -- 
  When reporting problems on aroma.affymetrix, make sure 1) to run the 
 latest 
  version of the package, 2) to report the output of sessionInfo() and 
  traceback(), and 3) to post a complete code example. 
  
  
  You received this message because you are subscribed to the Google 
 Groups 
  aroma.affymetrix group with website http://www.aroma-project.org/. 
  To post to this group, send email to aroma-af...@googlegroups.com 
 javascript: 
  To unsubscribe and other options, go to 
 http://www.aroma-project.org/forum/ 
  
  --- 
  You received this message because you are subscribed to the Google 
 Groups 
  aroma.affymetrix group. 
  To unsubscribe from this group and stop receiving emails from it, send 
 an 
  email to aroma-affymetr...@googlegroups.com javascript:. 
  For more options, visit https://groups.google.com/d/optout. 


-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [aroma.affymetrix] Re: Regarding the copy number states and further processing

2015-01-22 Thread Henrik Bengtsson
Hi guys,

here are some late feedback on this discussion:

* When talking about copy numbers, it is important to always be very
clear and distinguish between whether we talk about normal/germline
CNs or tumor CNs.  The former take integer CN levels (0, 1, 2, 3,
...), whereas for tumors we very rarely observe pure homogeneous tumor
cells, which is why we only measure and observe non-integer CN levels.
Hopefully, we observe at least discrete CN levels in tumors, but one
should never expect integer levels.

* aCGH: a historical term often used as a synonym for total copy
numbers.  For example, some say aCGH analysis when they really mean
total copy-number analysis.  aCGH stands for array-CGH, or in full
'array comparative genomic hybridization'.  This refers to the older
generation two-color/two-channel arrays where a test and a reference
sample where labelled with two different dyes and competitively
hybridized to the same array and the same probes.  I recommend to stop
using this term and instead use total copy number, total CN, or
TCN (when it's clear).   By being explicit about total, you're
also explicitly contrasting it to parent-specific CNs (which you can
do if you have SNP data).

* CNA: Copy-Number Aberration.  This term can be applied to both tumor
and germline samples.  In tumors you expect non-integer CN levels.  In
germline/normals you expect integer CN levels (0, 1, 2, 3, ...).

* CNP: Copy-Number Polymorphism.  This term applies to copy-number
differences in relationship to a population.  This also implies we're
talking about germline genomes.  In other words, CNPs are also integer
CN levels (0, 1, 2, 3, ...).  CNPs are used to specify, say, 2% of
the Europeans have a 1 copy deletion of length 1.0-1.5 Mb on Chr 3 at
124.5Mb.  CNPs is for segment deletions and gains what SNPs are for
nucleotide polymorphisms.  The term CNP is rare.  It is much more
common to hear/see CNV.

* CNV: Copy-Number Variation.  Ideally the word variation refers to
polymorphism and therefore the term CNV should be used only to refer
to CNPs.  I don't know if there is a formal definitions, but I find it
unfortunate to see CNV being used when CNA should be used.  By my
books, CNV only takes integer CN levels (0, 1, 2, 3, ...).  The term
CNV should never be used to refer to CN levels in tumors.

* Calling total CN levels is very hard in tumors, and as the first
above point alludes to, it may not even be a well defined problem.
For instance, imagine you have a tumor sample with 5% tumor cells and
95% normal cells, and that the those tumors cells all have a deletion
on Chr 2.  Then, at what point to you consider that sample itself to
have a deletion on Chr 2?  Are you after he sample/tissue itself, or
are you after those 5% tumors cells?  What if you have a heterogeneous
mix of tumor cells?  The more precise you can specify your question
the more easy it is for you to decided what approach forward (may)
work and what doesn't work.  Here work can also be read as make
sense.

* The first and most important task for almost all segmentation
methods is to *segment* the genome, that is, identify at what genomic
locations the observed DNA (tumor, normal or a mix) changes in CN
level.  Together, these location, aka change points, defines how the
genome can be partitioned into segments with equal CN levels, such
that when we look at a particular segment, we can assume that all
genomic locations within that segment has the same underlying genomic
composition (e.g. gain, loss, loss in 5% of the cells, etc.).  CBS,
GLAD, and many other methods, segment the genome this way as a first
step.

* A common task after having decided on the segments (partitioning of
the genome), is to decide on what is going on within each segment.
Not all methods does this.  For instance, CBS only provides you with
the change points.  GLAD on the other hand does both the segmentation
and then also provides a method for calling.  Theoretically, there is
nothing preventing you from using the GLAD *calling* algorithm using
the segmentation found by CBS.  Unfortunately, I don't think it is
straightforward to do that in practice; at least you have to coerce
one data format into one that GLAD understands.

* GLAD does not scale well with the number of loci, because it's
computational complexity is ~O(n^2), unless things have changed since.
In 2007, I tried to predict GLAD's processing time when we were using
the Affymetrix 500K chips and the GenomeWideSNP_5 and GenomeWideSNP_6
were starting to come out.  A GWS6 chip would basically take days to
segment.  See attached PNG for a table.

* CBS is much faster as an algorithm.  Also, the implementation in the
DNAcopy package has been made even faster over time.  There was a
major speedup back in 2009, cf.
http://aroma-project.org/benchmarks/DNAcopy_v1.19.2-speedup/

Over and for now

Henrik

On Thu, Jan 22, 2015 at 12:42 AM, Chengyu Liu chengyu.liu...@gmail.com wrote:
 Hi,


 I have tried this and works good but at the 

Re: [aroma.affymetrix] Questions on extracting probeset summaries

2015-01-22 Thread Henrik Bengtsson
On Thu, Jan 22, 2015 at 5:44 AM, Qingzhou Zhang zqznept...@gmail.com wrote:
 Hi Henrik,

 I was processing HG-U133_Plus_2 datasets. While extracting probeset
 summaries(chip effects) as a data frame, I only got 27604 objs * n
 variables.
 I was hoping to get a data frame of 54675 objs., which equals the number of
 units in HG-U133_Plus_2 chip. Am I missing some steps, or processing the
 wrong CEL files?

Hard to say.  Can you share your code (from beginning to end) showing
what you're doing?

Henrik


 Thanks a lot!

 --
 --
 When reporting problems on aroma.affymetrix, make sure 1) to run the latest
 version of the package, 2) to report the output of sessionInfo() and
 traceback(), and 3) to post a complete code example.


 You received this message because you are subscribed to the Google Groups
 aroma.affymetrix group with website http://www.aroma-project.org/.
 To post to this group, send email to aroma-affymetrix@googlegroups.com
 To unsubscribe and other options, go to http://www.aroma-project.org/forum/

 ---
 You received this message because you are subscribed to the Google Groups
 aroma.affymetrix group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to aroma-affymetrix+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.