Re: [galaxy-user] Please help me check the quality of the Tophat mapping to reference genome

2012-08-27 Thread Du, Jianguang
Hi Jen,
Thank you very much for the information. I will not worry about the Tophat 
outputs now. For this particular run, I used a single-end dataset. The whole 
experiment contains both paired-end and single-end datasets. 
I ran Tophat with the paired-end setting for the paired-end datasets and the 
single-end setting for the single-end datasets, and then ran Cufflinks, 
Cuffmerge, and Cuffdiff.
Jianguang


From: Jennifer Jackson [j...@bx.psu.edu]
Sent: Monday, August 27, 2012 12:36 PM
To: Du, Jianguang
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Please help me check the quality of the Tophat 
mapping to reference genome

Hello Jianguang,

This is the expected output from this particular tool. Your TopHat
output file 'accepted hits' contains only mapped data.

I did notice this option for the TopHat run:
 > Is this library mate-paired?
 > Single-end

Your data was originally paired-end, so this is unexpected. But perhaps
you are working with a different dataset now? If you are running with
the original paired dataset, then this would be an option to correct:
change to mate-paired = yes and run TopHat with both the fwd and rev
reads in a single mapping process (the same method as in the RNA-seq
tutorial).
http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

Best,

Jen
Galaxy team


On 8/27/12 8:15 AM, Du, Jianguang wrote:
> Dear All,
>
> I ran "Flagstat" under "NGS: SAM Tools" to check the quality of the
> Tophat output (the file of accepted hits).  I got the diagnosis results
> as follows:
>
> 9471730 + 0 in total (QC-passed reads + QC-failed reads)
> 0 + 0 duplicates
> 9471730 + 0 mapped (100.00%:-nan%)
> 0 + 0 paired in sequencing
> 0 + 0 read1
> 0 + 0 read2
> 0 + 0 properly paired (-nan%:-nan%)
> 0 + 0 with itself and mate mapped
> 0 + 0 singletons (-nan%:-nan%)
> 0 + 0 with mate mapped to a different chr
> 0 + 0 with mate mapped to a different chr (mapQ>=5)
>
> I ran Tophat with settings as shown below:
>
> Will you select a reference genome from your history or use a built-in
> index?
> Use a built-in index
> Select a reference genome
> /galaxy/data/mm9/bowtie_index/mm9
> Is this library mate-paired?
> Single-end
> TopHat settings to use
> Full parameter list
> Library Type
> FR Unstranded
> Anchor length (at least 3)
> 8
> Maximum number of mismatches that can appear in the anchor region of
> spliced alignment
> 0
> The minimum intron length
> 70
> The maximum intron length
> 50
> Allow indel search
> Yes
> Max insertion length.
> 3
> Max deletion length.
> 3
> Maximum number of alignments to be allowed
> 20
> Minimum intron length that may be found during split-segment (default)
> search
> 50
> Maximum intron length that may be found during split-segment (default)
> search
> 50
> Number of mismatches allowed in the initial read mapping
> 1
> Number of mismatches allowed in each segment alignment for reads mapped
> independently
> 1
> Minimum length of read segments
> 25
> Use Own Junctions
> Yes
> Use Gene Annotation Model
> Yes
> Gene Model Annotations
> /iGenome version of mm9 genes. GTF/
> Use Raw Junctions
> No
> Only look for supplied junctions
> No
> Use Closure Search
> No
> Use Coverage Search
> Yes
> Minimum intron length that may be found during coverage search
> 50
> Maximum intron length that may be found during coverage search
> 2
> Use Microexon Search
> No
>
> Please help me find out what is wrong with the Tophat.
>
> Thanks,
>
> Jianguang
>
>
>
> ___
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
>http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
>http://lists.bx.psu.edu/
>

--
Jennifer Jackson
http://galaxyproject.org


Re: [galaxy-user] Please help me check the quality of the Tophat mapping to reference genome

2012-08-27 Thread Jennifer Jackson

Hello Jianguang,

This is the expected output from this particular tool. Your TopHat 
output file 'accepted hits' contains only mapped data.
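
To make the reading of that Flagstat report concrete, here is a minimal
Python sketch that parses the report quoted in this thread. It is not part
of Galaxy or SAMtools; the names `parse_flagstat` and `percent` are made up
for illustration. It assumes the percentages on the pair-related lines are
taken relative to the "paired in sequencing" count, which would explain the
`-nan%` entries: with single-end input that count is zero, so the ratio is
0/0. The second `-nan%` on the "mapped" line would likewise be the
QC-failed ratio, 0/0.

```python
import re

# Flagstat report as quoted in this thread.
FLAGSTAT_TEXT = """\
9471730 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
9471730 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""

LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Map each category label to (QC-passed count, QC-failed count)."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        label = match.group(3)
        # Drop a trailing "(xx%:yy%)" block; keep other parentheticals.
        label = re.sub(r"\s*\([^()]*%\)$", "", label)
        if label.startswith("in total"):
            label = "in total"
        counts[label] = (int(match.group(1)), int(match.group(2)))
    return counts


def percent(numerator, denominator):
    """Return a percentage, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


counts = parse_flagstat(FLAGSTAT_TEXT)
total = counts["in total"][0]
mapped = counts["mapped"][0]
paired = counts["paired in sequencing"][0]

print("mapped %:", percent(mapped, total))
print("properly paired %:", percent(counts["properly paired"][0], paired))
print("no reads flagged as paired:", paired == 0)
```

Because the 'accepted hits' file holds only mapped reads, the mapped
fraction says nothing about how many input reads failed to align; that
would need the input read count as the denominator.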


I did notice this option for the TopHat run:
> Is this library mate-paired?
> Single-end

Your data was originally paired-end, so this is unexpected. But perhaps 
you are working with a different dataset now? If you are running with 
the original paired dataset, then this would be an option to correct: 
change to mate-paired = yes and run TopHat with both the fwd and rev 
reads in a single mapping process (the same method as in the RNA-seq 
tutorial).

http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
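
Outside the Galaxy form, the difference between the two settings comes down
to whether TopHat receives one reads file or two. The Python sketch below
only assembles an argument list; it does not run anything. The helper name
`build_tophat_command`, the FASTQ file names, and the mate inner distance
of 150 are hypothetical. The index path is the one shown in the settings
in this thread, and `-o`, `-r`, and `--library-type` are options from the
TopHat command-line manual. How Galaxy's wrapper builds its own command is
not shown here.

```python
def build_tophat_command(index_prefix, reads_1, reads_2=None,
                         mate_inner_dist=None, output_dir="tophat_out"):
    """Assemble a TopHat argument list for single-end or paired-end reads."""
    cmd = ["tophat", "-o", output_dir, "--library-type", "fr-unstranded"]
    if reads_2 is not None and mate_inner_dist is not None:
        # Expected inner distance between mates; only meaningful for pairs.
        cmd += ["-r", str(mate_inner_dist)]
    cmd.append(index_prefix)
    cmd.append(reads_1)
    if reads_2 is not None:
        # Paired-end: forward and reverse reads go into one mapping run.
        cmd.append(reads_2)
    return cmd


INDEX = "/galaxy/data/mm9/bowtie_index/mm9"

# Single-end run (hypothetical file name).
print(" ".join(build_tophat_command(INDEX, "sample_se.fastq")))

# Paired-end run (hypothetical file names and inner distance).
print(" ".join(build_tophat_command(INDEX, "sample_fwd.fastq",
                                    "sample_rev.fastq",
                                    mate_inner_dist=150)))
```

Running Flagstat on the result of a paired-end run should then report
non-zero counts on the "paired in sequencing" line.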

Best,

Jen
Galaxy team


On 8/27/12 8:15 AM, Du, Jianguang wrote:

Dear All,

I ran "Flagstat" under "NGS: SAM Tools" to check the quality of the
Tophat output (the file of accepted hits).  I got the diagnosis results
as follows:

9471730 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
9471730 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I ran Tophat with settings as shown below:

Will you select a reference genome from your history or use a built-in
index?
Use a built-in index
Select a reference genome
/galaxy/data/mm9/bowtie_index/mm9
Is this library mate-paired?
Single-end
TopHat settings to use
Full parameter list
Library Type
FR Unstranded
Anchor length (at least 3)
8
Maximum number of mismatches that can appear in the anchor region of
spliced alignment
0
The minimum intron length
70
The maximum intron length
50
Allow indel search
Yes
Max insertion length.
3
Max deletion length.
3
Maximum number of alignments to be allowed
20
Minimum intron length that may be found during split-segment (default)
search
50
Maximum intron length that may be found during split-segment (default)
search
50
Number of mismatches allowed in the initial read mapping
1
Number of mismatches allowed in each segment alignment for reads mapped
independently
1
Minimum length of read segments
25
Use Own Junctions
Yes
Use Gene Annotation Model
Yes
Gene Model Annotations
/iGenome version of mm9 genes. GTF/
Use Raw Junctions
No
Only look for supplied junctions
No
Use Closure Search
No
Use Coverage Search
Yes
Minimum intron length that may be found during coverage search
50
Maximum intron length that may be found during coverage search
2
Use Microexon Search
No

Please help me find out what is wrong with the Tophat.

Thanks,

Jianguang




--
Jennifer Jackson
http://galaxyproject.org

[galaxy-user] Please help me check the quality of the Tophat mapping to reference genome

2012-08-27 Thread Du, Jianguang
Dear All,

I ran "Flagstat" under "NGS: SAM Tools" to check the quality of the Tophat 
output (the file of accepted hits).  I got the diagnosis results as follows:

9471730 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
9471730 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I ran Tophat with settings as shown below:

Will you select a reference genome from your history or use a built-in index?
Use a built-in index
Select a reference genome
/galaxy/data/mm9/bowtie_index/mm9
Is this library mate-paired?
Single-end
TopHat settings to use
Full parameter list
Library Type
FR Unstranded
Anchor length (at least 3)
8
Maximum number of mismatches that can appear in the anchor region of spliced 
alignment
0
The minimum intron length
70
The maximum intron length
50
Allow indel search
Yes
Max insertion length.
3
Max deletion length.
3
Maximum number of alignments to be allowed
20
Minimum intron length that may be found during split-segment (default) search
50
Maximum intron length that may be found during split-segment (default) search
50
Number of mismatches allowed in the initial read mapping
1
Number of mismatches allowed in each segment alignment for reads mapped 
independently
1
Minimum length of read segments
25
Use Own Junctions
Yes
Use Gene Annotation Model
Yes
Gene Model Annotations
iGenome version of mm9 genes. GTF
Use Raw Junctions
No
Only look for supplied junctions
No
Use Closure Search
No
Use Coverage Search
Yes
Minimum intron length that may be found during coverage search
50
Maximum intron length that may be found during coverage search
2
Use Microexon Search
No

Please help me find out what is wrong with the Tophat.

Thanks,

Jianguang