Re: [galaxy-user] Identification of replicate outlier

2012-11-11 Thread Jeremy Goecks

> c) if you can create an appropriate input matrix (read counts by exon
> or other contig for each sample eg), the Principal Component Analysis
> tool might be helpful (library size normalization is one devil that
> lies in the detail and it's not quite the same as MDS - see below)

I like starting with this approach because it can be done easily in Galaxy. You
can take the gene expression dataset that Cufflinks produces for each replicate,
join the datasets on gene name to get one big table of per-replicate expression
values, and then either eyeball it or run PCA on it. Note that since Cufflinks
reports FPKM, library size is already accounted for.
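
Outside Galaxy, the join-and-eyeball step could be sketched roughly as below. This is only an illustration under assumptions: the column names gene_short_name and FPKM are assumed to match the per-replicate gene tables, the toy values are made up, and ranking replicates by their mean correlation with the others is a stand-in for eyeballing the table, not something Cufflinks or Galaxy does for you.

```python
import csv
import io
import math

def read_fpkm(handle, key="gene_short_name", value="FPKM"):
    """Return {gene name: FPKM} from one tab-separated gene table.

    The column names are assumptions; if a gene name repeats, the last row wins.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: float(row[value]) for row in reader}

def join_on_gene(tables):
    """Inner-join {sample: {gene: FPKM}} on gene name.

    Returns (sorted gene list, {sample: [FPKM per gene, in that order]}).
    """
    shared = set.intersection(*(set(t) for t in tables.values()))
    genes = sorted(shared)
    return genes, {s: [t[g] for g in genes] for s, t in tables.items()}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_correlation(columns):
    """Mean Pearson r of log2(FPKM + 1) between each sample and all others."""
    logged = {s: [math.log2(v + 1.0) for v in vals] for s, vals in columns.items()}
    result = {}
    for s in logged:
        others = [pearson(logged[s], logged[t]) for t in logged if t != s]
        result[s] = sum(others) / len(others)
    return result

def toy_table(fpkms):
    """Build a tiny in-memory table standing in for one replicate's file."""
    lines = ["gene_short_name\tFPKM"]
    lines += [f"gene{i}\t{v}" for i, v in enumerate(fpkms)]
    return io.StringIO("\n".join(lines) + "\n")

# Hypothetical replicates: rep3 is deliberately unlike the other two.
tables = {
    "rep1": read_fpkm(toy_table([10, 200, 0, 55, 8])),
    "rep2": read_fpkm(toy_table([12, 180, 1, 60, 7])),
    "rep3": read_fpkm(toy_table([150, 3, 40, 2, 90])),
}
genes, columns = join_on_gene(tables)
scores = mean_correlation(columns)
suspect = min(scores, key=scores.get)
print(suspect, scores)
```

With real data, replace the toy tables with open file handles, one per replicate. A replicate whose mean correlation sits well below the rest is worth a closer look, but where to draw the line remains a judgement call.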

Another idea/approach: Cuffdiff already has an advanced model for dealing with 
replicates: 

http://cufflinks.cbcb.umd.edu/howitworks.html#reps

You may want to investigate how this model works and whether you can tune it 
with parameter settings before giving up on using all your replicates. 

One challenge with this approach is that the Galaxy Cuffdiff wrapper does not 
yet include all parameters, so you might try enhancing the Cuffdiff wrapper 
with additional, relevant parameters and using those as well as the existing 
ones. If you do this, please consider submitting your enhancements back to me 
and I can integrate them into our code base.

Best,
J.

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-user] Identification of replicate outlier

2012-11-09 Thread Dave Corney
Hi Ross,

Thanks for the suggestions. I'm aware that this is not really a
Galaxy-specific question. I've been browsing through SEQanswers and
found a couple of suggestions using edgeR or DESeq, but nothing for the
Tuxedo suite. I have no experience with either of those tools, so I was
wondering how others have approached this problem when their workflow is
based on Cufflinks.

In the meantime, I'll go through your suggestions and see where I get.

Thanks,
Dave




Re: [galaxy-user] Identification of replicate outlier

2012-11-08 Thread Ross
Hi Dave,
This is an interesting and non-trivial question that extends well
beyond Galaxy, and there's no simple solution AFAIK.
Defining an 'outlier' tends to boil down to subjective judgement in
most real cases I've seen.
E.g. see
http://comments.gmane.org/gmane.science.biology.informatics.conductor/40927

My 2c worth:
a) Use the FastQC tool to confirm that library sizes and quality score
distributions are comparable across all of your samples. A sample with
a relatively small library may indicate an upstream technical failure
with (e.g.) RNA extraction or a flowcell lane.
b) Check that the number of unique alignments to the reference is
similar across samples (e.g. with Picard alignment summary metrics or
even the samtools flagstat tool).
c) If you can create an appropriate input matrix (e.g. read counts by
exon or other contig for each sample), the Principal Component Analysis
tool might be helpful. Library size normalization is one devil that
lies in the detail, and PCA is not quite the same as MDS (see d).
d) If you're an R hacker, you might find
http://gettinggeneticsdone.blogspot.com.au/2012/09/deseq-vs-edger-comparison.html
useful. It shows how to get MDS plots, which are probably the most
reliable way to identify samples that don't cluster well with the
other members of their tribe.
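
For anyone who wants to see what (c) involves outside Galaxy, here is a rough sketch. The assumptions are considerable: the sample names and counts are invented, counts-per-million is used as a deliberately simple library size normalization (it is not what edgeR or DESeq do), and the PCA is a bare-bones power iteration rather than the Galaxy tool or an MDS plot.

```python
import math

def log_cpm(counts):
    """counts: {sample: [raw read count per feature]} -> log2(CPM + 1)."""
    out = {}
    for sample, row in counts.items():
        library_size = sum(row)
        out[sample] = [math.log2(c * 1e6 / library_size + 1.0) for c in row]
    return out

def gram_matrix(rows):
    """Center each feature across samples; return the sample-by-sample Gram matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]

def top_eigen(matrix, iterations=500):
    """Dominant eigenpair of a symmetric positive semi-definite matrix."""
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # deterministic, non-constant start
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
        value = norm
    return value, vec

def pca_scores(rows, components=2):
    """Sample coordinates on the leading principal components."""
    gram = gram_matrix(rows)
    n = len(gram)
    axes = []
    for _ in range(components):
        value, vec = top_eigen(gram)
        axes.append([math.sqrt(value) * x for x in vec])
        # Deflate: subtract the component just found before looking for the next.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return [list(point) for point in zip(*axes)]

# Hypothetical counts for 5 features; A2 and B2 are sequenced about twice as
# deeply, and A3 is deliberately unlike the other members of group A.
counts = {
    "A1": [100, 200, 50, 400, 250],
    "A2": [210, 390, 105, 790, 505],
    "A3": [400, 60, 300, 100, 140],
    "B1": [100, 200, 50, 100, 550],
    "B2": [205, 410, 95, 190, 1100],
    "B3": [95, 190, 55, 110, 560],
}
samples = list(counts)
logged = log_cpm(counts)
coords = pca_scores([logged[s] for s in samples])
distance = {s: math.hypot(*xy) for s, xy in zip(samples, coords)}
farthest = max(distance, key=distance.get)
for s, xy in zip(samples, coords):
    print(s, [round(v, 2) for v in xy])
```

Plotting the two coordinates per sample gives the usual PCA scatter. In this toy data A3 lands far from everything else, while the deeper-sequenced A2 and B2 stay with their groups because the library size difference has been normalized away.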



On Fri, Nov 9, 2012 at 10:22 AM, Dave Corney dcor...@princeton.edu wrote:
> Hello list,
>
> I've been analyzing an experiment with two groups each with three
> replicates. My workflow was TopHat (paired end) -> Cufflinks -> CuffDiff.
> Unfortunately, there are not many significant differences identified by
> CuffDiff.
>
> I am wondering whether one of my replicates might be an outlier. Does
> anybody have a suggestion on how to search for an outlier? The quality
> statistics of the unprocessed data looked equally good for all samples, so I
> don't think that this is a problem.
>
> Thanks,
> Dave





-- 
Ross Lazarus MBBS MPH;
Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444
http://scholar.google.com/citations?hl=enuser=UCUuEM4J