[galaxy-dev] From NCBI SRA to UCSC viewer pipeline.

2011-07-07 Thread colin molter
Hi all,
I am trying to use a local Galaxy instance to pre-format data stored
at NCBI SRA.
Does anyone have a working pipeline that they could share?

Here is the pipeline I am using, with some questions.

1/ Download the SRA files to my server.
2/ Convert them to FASTQ using the SRA Toolkit.
3/ Upload them to Galaxy using 'add to data library'.
4/ Run the FASTQ Groomer so that the FASTQ files can be used in Galaxy.
Note: I guess that the data at SRA are already in the FASTQ Sanger format,
so it would be nice to be able to skip this step (it took 10 hours to groom
a 25 GB FASTQ file).
5/ Map with Bowtie (FASTQ to SAM).
6/ Filter the SAM.
7/ Convert SAM to BAM.
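
For reference, steps 2 and 5-7 can be sketched as command lines outside Galaxy. This is only an illustration: the function name, file names and index name are hypothetical, the output name of fastq-dump is assumed to follow the input basename, and the exact flags depend on the installed versions of the SRA Toolkit, Bowtie 1 and SAMtools. The "-F 4" filter (drop unmapped reads) is an assumption, since the filtering criteria are not specified above.

```python
import os
import shlex


def build_pipeline(sra_file, bowtie_index):
    """Return the commands for SRA -> FASTQ -> SAM -> BAM as argument lists.

    Nothing is executed here; the lists can be inspected, printed, or
    passed to subprocess.run() one by one.
    """
    base = os.path.splitext(os.path.basename(sra_file))[0]
    fastq = base + ".fastq"  # assumed name of the fastq-dump output
    sam = base + ".sam"
    bam = base + ".bam"
    return [
        # Step 2: convert the SRA archive to FASTQ (SRA Toolkit).
        ["fastq-dump", sra_file],
        # Step 5: map with Bowtie 1; -S requests SAM output.
        ["bowtie", "-S", bowtie_index, fastq, sam],
        # Steps 6-7: drop unmapped reads (-F 4, an assumed filter) and
        # write BAM (-b) from SAM input (-S) to the file given by -o.
        ["samtools", "view", "-b", "-S", "-F", "4", "-o", bam, sam],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("example.sra", "my_index"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the commands as lists keeps them easy to check before anything is run on a 25 GB file.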

Problems:
* The SRA data I got are RNA-seq. I heard that Bowtie is not suitable because
it can't deal with splicing (so Bowtie is fine for genomic reads but not for
RNA-seq). What is the best way to align RNA-seq reads? TopHat? The problem is
that I heard that although TopHat can deal with gaps, it loses information
about deletions. Someone told me that it could be better to use BWA and then
add a further step to deal with the splicing and the gaps. Any information?

* To view my data in IGV, an index (BAI) has to be created. Normally IGV
could create it itself, but that didn't work. I heard that the data should be
sorted. The SAM I got from Bowtie is ordered by name, and it should be
ordered by chromosome and position. Is that right? In that case I could use
the sort tool in Galaxy and apply it to the SAM before converting it to BAM.
Is that right?
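
To make "ordered by chromosome and position" concrete, here is a small sketch of a coordinate sort on SAM text lines. It is an illustration only: the function name is hypothetical, it loads everything into memory, and it does not update the @HD header line. In practice a tool such as "samtools sort" followed by "samtools index" would normally do this on the BAM file.

```python
def sort_sam_by_coordinate(lines):
    """Sort SAM text lines by reference sequence, then by position.

    Header lines (starting with '@') stay first, in their original order.
    References are ranked in the order of the @SQ header lines; anything
    not listed there (including '*' for unmapped reads) goes last.
    """
    header = [l for l in lines if l.startswith("@")]
    records = [l for l in lines if l.strip() and not l.startswith("@")]

    # Rank each reference name by its first appearance in the @SQ lines.
    ref_rank = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_rank.setdefault(field[3:], len(ref_rank))

    def key(line):
        cols = line.split("\t")
        rname, pos = cols[2], int(cols[3])  # SAM columns 3 (RNAME) and 4 (POS)
        return (ref_rank.get(rname, len(ref_rank)), pos)

    return header + sorted(records, key=key)
```

The sort key is the pair (reference rank, position), which is the ordering a BAM index expects.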

Any other/related hints?
Isn't there a simple tutorial/screencast about this process, which I guess
most Galaxy users have already done?

thx
colin
-- 
Colin Molter
University of Brussels - InSilico Team - http://insilico.ulb.ac.be/
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] From NCBI SRA to UCSC viewer pipeline.

2011-07-07 Thread Peter Cock
On Thu, Jul 7, 2011 at 7:14 AM, colin molter colin.mol...@gmail.com wrote:
> Hi all,
> I am trying to use a local Galaxy instance to pre-format data stored
> at NCBI SRA.
> Does anyone have a working pipeline that they could share?
> Here is the pipeline I am using, with some questions.
> 1/ Download the SRA files to my server.
> 2/ Convert them to FASTQ using the SRA Toolkit.
> 3/ Upload them to Galaxy using 'add to data library'.
> 4/ Run the FASTQ Groomer so that the FASTQ files can be used in Galaxy.
> Note: I guess that the data at SRA are already in the FASTQ Sanger format,
> so it would be nice to be able to skip this step (it took 10 hours to groom
> a 25 GB FASTQ file).

Just upload them and set the file format to fastqsanger
(either when you upload, or afterwards via the pencil icon
to edit the attributes - this is fast as it doesn't change the file).
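
Before setting the datatype by hand, it may be worth a quick check that the quality scores really are in the Sanger (Phred+33) range. The sketch below is a heuristic, not part of Galaxy: the function name and the read limit are hypothetical, and it assumes plain four-line FASTQ records without wrapped sequence lines. Quality characters below ASCII 59 (';') can only occur with the +33 offset, so finding one is taken as evidence for Sanger encoding; finding none is inconclusive.

```python
def looks_like_sanger(fastq_lines, max_reads=10000):
    """Return True if a quality character below ASCII 59 is seen.

    Only the first max_reads records are inspected. A False result
    means "no evidence found", not "definitely not Sanger".
    """
    for i, line in enumerate(fastq_lines):
        if i // 4 >= max_reads:
            break
        if i % 4 == 3:  # every fourth line holds the quality string
            if any(ord(c) < 59 for c in line.rstrip("\n")):
                return True
    return False
```

It can be applied to an open file handle, for example `looks_like_sanger(open("example.fastq"))`, where the file name is a placeholder.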

Peter