Re: [galaxy-user] Extract data and new genes

2012-05-18 Thread Luciano Cosme
Thanks Jeremy,
   I will do it before trying the *de novo* assembly.

Luciano

On Fri, May 18, 2012 at 1:44 PM, Jeremy Goecks wrote:

> I find a lot of potential new genes (hundreds or thousands of reads
> aligning to regions where there is no gene annotation),
>
>
> This shouldn't be completely unexpected. High-coverage RNA-seq data is
> constantly revealing new exons/splicing/transcripts, even in well-annotated
> genomes.
>
> I also find new exons for some genes or exons with different sizes. I was
> thinking of doing a *de novo* assembly to find new transcripts and genes,
> but I was wondering if there is something else I could do.
>
>
> My suggestion: do reference-guided assembly with Cufflinks; this will
> yield both existing and new transcripts.
>
> For example, maybe I could just extract those regions where thousands of
> reads align (new gene). I know that we can extract the sequence data for
> a specific transcript. Is it possible to extract reads for regions without
> annotation, based only on the number of reads aligned?
>
>
> You could subtract known genes from the Cufflinks assembly to get only
> novel transcripts.
>
> Best,
> J.
>
>
>
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-user] Extract data and new genes

2012-05-18 Thread Jeremy Goecks
> I find a lot of potential new genes (hundreds or thousands of reads aligning 
> to regions where there is no gene annotation),

This shouldn't be completely unexpected. High-coverage RNA-seq data is 
constantly revealing new exons/splicing/transcripts, even in well-annotated 
genomes.

> I also find new exons for some genes or exons with different sizes. I was
> thinking of doing a de novo assembly to find new transcripts and genes, but I
> was wondering if there is something else I could do.

My suggestion: do reference-guided assembly with Cufflinks; this will yield 
both existing and new transcripts.

> For example, maybe I could just extract those regions where thousands of
> reads align (new gene). I know that we can extract the sequence data for
> a specific transcript. Is it possible to extract reads for regions without
> annotation, based only on the number of reads aligned?

You could subtract known genes from the Cufflinks assembly to get only novel 
transcripts.
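
As a rough illustration of the subtraction step, here is a minimal sketch in
plain Python. It assumes both the Cufflinks assembly and the known-gene
annotation are available as GTF lines with "exon" rows and a transcript_id
attribute. The function names (parse_gtf_exons, novel_transcripts) and the
sample records are hypothetical, and strand is ignored, so treat this as one
possible way to do it rather than the recommended procedure.

```python
from collections import defaultdict


def parse_gtf_exons(lines):
    """Yield (chrom, start, end, transcript_id) for each exon row of a GTF."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tid = None
        for part in fields[8].split(";"):
            part = part.strip()
            if part.startswith("transcript_id "):
                tid = part.split(" ", 1)[1].strip('"')
        yield fields[0], int(fields[3]), int(fields[4]), tid


def novel_transcripts(assembly_lines, annotation_lines):
    """Return assembled transcript IDs with no exon overlapping a known exon."""
    # Index known exons by chromosome (assumption: strand is ignored).
    known = defaultdict(list)
    for chrom, start, end, _ in parse_gtf_exons(annotation_lines):
        known[chrom].append((start, end))

    order = []           # transcript IDs in first-seen order
    overlapping = set()  # transcripts that touch at least one known exon
    for chrom, start, end, tid in parse_gtf_exons(assembly_lines):
        if tid not in order:
            order.append(tid)
        # Closed intervals overlap when each starts before the other ends.
        if any(s <= end and start <= e for s, e in known.get(chrom, ())):
            overlapping.add(tid)
    return [tid for tid in order if tid not in overlapping]


# Made-up example records, for illustration only.
assembly = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t5000\t5200\t.\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]
annotation = [
    'chr1\tref\texon\t150\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(novel_transcripts(assembly, annotation))  # ['CUFF.2.1']
```

The linear scan over known exons is fine for a small test but would be slow
on a whole genome; sorting the known exons or using an interval tree would be
the usual refinement.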

Best,
J.

