Re: [galaxy-user] a question about cuffdiff "values"

2012-08-06 Thread Jeremy Goecks
Hi El,

> 1) what do these numbers represent?

They are the FPKM values (fragments per kilobase of transcript per million 
mapped fragments) for samples 1 and 2. The Cufflinks documentation is the 
place to get definitions for all columns: 
http://cufflinks.cbcb.umd.edu/manual.html#gene_exp_diff

> 2) In the "value" column where I expect the higher number, does a value of 
> 10 or less mean anything, or should one select only for values higher than 
> these single-digit numbers?
> 3) And in the column for genes that might be repressed, is there really a 
> difference between a value of 0.1 and something like 0.01, since that 
> can change my log ratios significantly? This, of course, goes back to my 
> first question.

These questions get at the challenge of interpreting FPKM values. One thing to 
look at is the confidence intervals (CIs) produced by Cufflinks/Cuffdiff. FPKM 
estimates whose CIs include 0 are, in my experience, unreliable no matter how 
large the FPKM. 
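
To make the concern in question 3 concrete, here is a minimal sketch of how 
much a log ratio moves when only the near-zero value changes. The FPKM numbers 
and the helper name log2_ratio are made up for illustration; they are not 
taken from the data discussed here.

```python
import math


def log2_ratio(value_1, value_2):
    """Return log2(value_2 / value_1). Both values must be greater than 0."""
    return math.log2(value_2 / value_1)


# Hypothetical FPKMs: the induced sample is 8.0 in both cases;
# only the near-zero "repressed" value differs.
ratio_a = log2_ratio(0.1, 8.0)
ratio_b = log2_ratio(0.01, 8.0)

# A tenfold change in the small value shifts the log ratio by log2(10),
# about 3.3, however poorly that small value was measured.
shift = ratio_b - ratio_a
print(round(ratio_a, 2), round(ratio_b, 2), round(shift, 2))
```

The shift is the same whatever the larger value is, which is why the 
reliability of the small estimate matters so much.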

Most likely, genes with FPKM values near 0 have CIs that include 0, which means 
there's likely no meaningful difference between values such as 0.1 and 0.01. 
However, genes with low FPKM values (e.g. < 10) but tight CIs that exclude 0 
should probably be included for further analysis.
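
As a rough sketch of that rule of thumb, the filter below keeps a gene only 
when its CI excludes 0 and is narrow relative to the FPKM. The record fields 
(fpkm, conf_lo, conf_hi), the example genes, and the max_rel_width cutoff are 
all assumptions chosen for illustration; "tight" has no fixed definition, so 
the cutoff should be adjusted to taste.

```python
def keep_gene(fpkm, conf_lo, conf_hi, max_rel_width=1.0):
    """Keep a gene if its CI excludes 0 and is tight relative to its FPKM.

    max_rel_width is a hypothetical cutoff: the CI width may be at most
    this multiple of the FPKM estimate.
    """
    if fpkm <= 0 or conf_lo <= 0:
        return False  # the CI reaches 0, so the estimate is treated as unreliable
    return (conf_hi - conf_lo) / fpkm <= max_rel_width


# Made-up records, not real output.
genes = [
    {"gene": "geneA", "fpkm": 0.05, "conf_lo": 0.0, "conf_hi": 0.4},
    {"gene": "geneB", "fpkm": 6.0, "conf_lo": 4.5, "conf_hi": 7.5},
    {"gene": "geneC", "fpkm": 250.0, "conf_lo": 0.0, "conf_hi": 900.0},
]

kept = [g["gene"] for g in genes
        if keep_gene(g["fpkm"], g["conf_lo"], g["conf_hi"])]
print(kept)
```

Here geneB survives despite its single-digit FPKM, while geneC is dropped 
despite its large FPKM because its CI reaches 0.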

Another thing to look at is whether a few highly expressed genes are 
depressing the FPKM values of all the other genes. If so, using the 
upper-quartile normalization option can help you get better resolution for 
genes expressed at low levels.
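
The effect can be illustrated with a simplified sketch. It is not Cufflinks' 
actual computation: the counts, gene lengths, and helper names are invented, 
and the upper quartile is taken over all five toy genes. The point is only 
that one dominant gene inflates the total-fragment denominator, while the 
upper quartile of per-gene counts barely notices it.

```python
import statistics


def scaled_expression(count, length_bp, denominator):
    """1e9 * count / (denominator * length). With the total fragment count
    as the denominator this is the usual FPKM formula."""
    return 1e9 * count / (denominator * length_bp)


def upper_quartile(counts):
    """75th percentile of the per-gene fragment counts."""
    return statistics.quantiles(counts, n=4, method="inclusive")[2]


length_bp = 1000                      # same length for every toy gene
sample_1 = [10, 20, 30, 40, 100]      # hypothetical fragment counts
sample_2 = [10, 20, 30, 40, 10000]    # same, except one dominant gene

# First gene, normalized by total fragments: identical counts, yet the
# value in sample 2 is pushed down by the dominant gene.
total_1 = scaled_expression(sample_1[0], length_bp, sum(sample_1))
total_2 = scaled_expression(sample_2[0], length_bp, sum(sample_2))

# First gene, normalized by the upper quartile: unaffected.
uq_1 = scaled_expression(sample_1[0], length_bp, upper_quartile(sample_1))
uq_2 = scaled_expression(sample_2[0], length_bp, upper_quartile(sample_2))

print(total_1 / total_2, uq_1 / uq_2)
```

Note that upper-quartile-scaled values are on a different scale from 
total-normalized FPKMs, so the two should not be compared directly.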

Good luck,
J.
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-user] a question about cuffdiff "values"

2012-08-06 Thread Elwood Linney
Hello:
I am a Galaxy-naive molecular, developmental biologist studying
repression/derepression of early embryonic gene expression in zebrafish
embryos.

After attending the Galaxy meeting I returned home and worked up two
mRNAseq files to determine RNA expression differences using cuffdiff
between a treated and an untreated sample (i.e. data from cuffdiff under
the title of "gene differential expression testing").

I downloaded the data, opened it up in an Excel file and captured all the
"significant" rows.

If I look at the "value 1" and "value 2" columns I find that many of the
numbers are single digits.  I expect that in one of the columns the
numbers will be very low (that is, less than 1) because the treatment
should be inducing gene expression in a subfamily of genes that are
repressed.

My questions are:

1) what do these numbers represent?

2) In the "value" column where I expect the higher number, does a value of
10 or less mean anything, or should one select only for values higher than
these single-digit numbers?

3) And in the column for genes that might be repressed, is there really a
difference between a value of 0.1 and something like 0.01, since that
can change my log ratios significantly? This, of course, goes back to my
first question.

I would appreciate any help I could get, sincerely,

el linney
Professor of Molecular Genetics and Microbiology
Duke University Medical Center