Re: [galaxy-user] Pre-processing of Illumina RNA-Seq paired end data

2012-02-23 Thread SHAUN WEBB


Hi Ravi,

I got around this problem by using the FASTQ interlacer to join the reads
into a single file, then using the de-interlacer to output only the reads
that have a pair, in the correct order.
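
For anyone working outside Galaxy, the same effect (keep only reads whose mate survived, in matching order) can be sketched in Python. This is a minimal illustration and not the code behind the Galaxy interlacer tools. It assumes plain 4-line FASTQ records, and the helper names parse_fastq, pair_key and repair_pairs are made up for this example.

```python
def parse_fastq(lines):
    """Return a list of (header, sequence, plus, quality) tuples.

    Assumes strict 4-line FASTQ records; blank lines are ignored.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of 4 lines")
    return [tuple(rows[i:i + 4]) for i in range(0, len(rows), 4)]


def pair_key(header):
    """Read ID without the leading '@', any description, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair_pairs(read1_records, read2_records):
    """Keep only reads present in both inputs, both returned in read1 order."""
    # Index the read2 records by ID so each mate lookup is a dictionary hit.
    mates = {pair_key(rec[0]): rec for rec in read2_records}
    kept1, kept2 = [], []
    for rec in read1_records:
        mate = mates.get(pair_key(rec[0]))
        if mate is not None:  # reads without a mate are dropped
            kept1.append(rec)
            kept2.append(mate)
    return kept1, kept2
```

The sketch holds all read2 records in memory, so for large files it is only a starting point.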


You may need to alter the read IDs first by adding /1 and /2 to the end
(see the interlacer help text). I used sed on the Unix command line, but
I'm sure you can use Galaxy tools to do this.
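
As a rough alternative to sed, the ID change can be sketched in Python. This is only an illustration: the function name add_mate_suffix is hypothetical, and it assumes strict 4-line FASTQ records with no blank lines, so that every fourth line starting from the first is a header.

```python
def add_mate_suffix(lines, suffix):
    """Yield FASTQ lines with `suffix` appended to the read ID on each header line.

    Header lines are found by position (every 4th line), not by a leading '@',
    because a quality line can also start with '@'.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index % 4 == 0 and line:
            # The read ID is the first whitespace-delimited token of the header.
            parts = line.split(None, 1)
            if not parts[0].endswith(suffix):
                parts[0] += suffix
            line = " ".join(parts)
        yield line
```

You would call it with "/1" for the read1 file and "/2" for the read2 file, and write the yielded lines to new files.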


Shaun


Quoting Ravi Karra ravi.ka...@gmail.com on Wed, 22 Feb 2012 12:29:18 -0500:


Hello,

I have Illumina 76bp paired end data for a zebrafish RNA-seq  
experiment and am basically stuck while trying to pre-process my  
data prior to using Tophat/CuffDiff.


For each sample, I have a read1 fastq file and a paired read2 fastq  
file.  After using FASTQ Groomer, I trimmed the ends using FASTQ  
quality trimmer with a threshold quality score of 20 and a window
size of 1 (I think that will essentially lop off the end of the read
until the quality score is >= 20).  Next, I trimmed the adapters
using Clip.


What I am left with is a modified read1 fastq file and a modified  
read2 file, where the pairs are not in the same order and some reads  
are left without pairs. From what I have read, I don't think TopHat  
can incorporate paired end data that is out of order. I tried to
get around the ordering issue using FASTQ joiner, but this tool is  
not able to join the reads (return is 0 joined reads).  I am not  
really sure why FASTQ joiner didn't work for me and am looking for  
suggestions of what to try next.


Thanks!
ravi
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/






--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



[galaxy-user] Pre-processing of Illumina RNA-Seq paired end data

2012-02-22 Thread Ravi Karra
Hello, 

I have Illumina 76bp paired end data for a zebrafish RNA-seq experiment and am 
basically stuck while trying to pre-process my data prior to using 
Tophat/CuffDiff.

For each sample, I have a read1 fastq file and a paired read2 fastq file.  
After using FASTQ Groomer, I trimmed the ends using FASTQ quality trimmer with 
a threshold quality score of 20 and a window size of 1 (I think that will 
essentially lop off the end of the read until the quality score is >= 20).  
Next, I trimmed the adapters using Clip.
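
The trimming behaviour described above (window size 1, threshold 20) can be illustrated with a short Python sketch. It shows that reading of the setting and is not the implementation of the Galaxy FASTQ quality trimmer. The function name trim_3prime is hypothetical, and the sketch assumes Sanger-encoded qualities (ASCII offset 33), which is what FASTQ Groomer produces.

```python
def trim_3prime(sequence, quality, threshold=20, offset=33):
    """Drop bases from the 3' end until the last base has Phred quality >= threshold.

    `offset` is the ASCII offset of the quality encoding (33 assumed here).
    """
    end = len(sequence)
    # Walk back from the 3' end while the base at the end is below threshold.
    while end > 0 and ord(quality[end - 1]) - offset < threshold:
        end -= 1
    return sequence[:end], quality[:end]
```

Because each read is trimmed independently, the two mates of a pair can end up with different lengths, or one of them can be trimmed away entirely.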

What I am left with is a modified read1 fastq file and a modified read2 file, 
where the pairs are not in the same order and some reads are left without 
pairs. From what I have read, I don't think TopHat can incorporate paired end 
data that is out of order. I tried to get around the ordering issue using 
FASTQ joiner, but this tool is not able to join the reads (return is 0 joined 
reads).  I am not really sure why FASTQ joiner didn't work for me and am 
looking for suggestions of what to try next.

Thanks!
ravi


Re: [galaxy-user] Pre-processing of Illumina RNA-Seq paired end data

2012-02-22 Thread Sameet Mehta
Hi,

I think you need to remove the adaptors first and then trim the reads.
That is probably the correct order.  As for the second part of the question,
you could try a rudimentary approach: search for each sequence header in the
other file.  I have seen this difference in size between the r1 and r2 read
files before, but taken together almost 90% of the reads turn out to be true
pairs.
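
A rough way to check how many headers are shared between the two files is sketched below in Python. It is only an illustration: the names read_ids and paired_fraction are made up here, and it assumes strict 4-line FASTQ records with IDs that differ at most by a trailing /1 or /2.

```python
def read_ids(lines):
    """Return the set of read IDs in a FASTQ file, without a trailing /1 or /2."""
    ids = set()
    for index, line in enumerate(lines):
        if index % 4 != 0:  # only every 4th line is a header
            continue
        parts = line.strip()[1:].split()
        if not parts:
            continue
        name = parts[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        ids.add(name)
    return ids


def paired_fraction(read1_lines, read2_lines):
    """Return (number of shared IDs, shared IDs as a fraction of all distinct IDs)."""
    ids1, ids2 = read_ids(read1_lines), read_ids(read2_lines)
    shared = ids1 & ids2
    total = len(ids1 | ids2)
    return len(shared), (len(shared) / total if total else 0.0)
```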

Hope this helps,
Sameet

On Wed, Feb 22, 2012 at 12:29 PM, Ravi Karra ravi.ka...@gmail.com wrote:

 Hello,

 I have Illumina 76bp paired end data for a zebrafish RNA-seq experiment
 and am basically stuck while trying to pre-process my data prior to using
 Tophat/CuffDiff.

 For each sample, I have a read1 fastq file and a paired read2 fastq file.
  After using FASTQ Groomer, I trimmed the ends using FASTQ quality trimmer
 with a threshold quality score of 20 and a window size of 1 (I think that
 will essentially lop off the end of the read until the quality score is >=
 20).  Next, I trimmed the adapters using Clip.

 What I am left with is a modified read1 fastq file and a modified read2
 file, where the pairs are not in the same order and some reads are left
 without pairs. From what I have read, I don't think TopHat can incorporate
 paired end data that is out of order. I tried to get around the ordering
 issue using FASTQ joiner, but this tool is not able to join the reads
 (return is 0 joined reads).  I am not really sure why FASTQ joiner didn't
 work for me and am looking for suggestions of what to try next.

 Thanks!
 ravi


-- 
Sameet Mehta, Ph.D.,
Phone:  (301) 842-4791