Re: [galaxy-user] Bowtie: Outputting unmapped reads

2013-07-31 Thread Mayank Tandon
That's a neat trick, and I definitely wouldn't have thought of that
approach, so thanks for that!

After I finished writing this out, I realized it was super long.  So here
are the questions I'm asking up front, so you can choose whether or not to
read the details.  Thanks!

1. How do I output the quality scores when converting from FASTQ to FASTA?
2. Does the SAM-to-interval tool output only mapped reads by looking at the
flag values?
3. Why am I getting the mentioned error and is there a way to resolve it?

Here are the details:

   1. I don't see an option to output both the sequences and the quality
   scores.  I found two FASTQ-to-FASTA converters (one under the Convert
   Formats and the other in the FastX Toolkit) and both only output one fasta
   file with the sequences.  Am I missing something, or should I be using some
   other tool to output both the sequences and the quality scores?
   2. The Extract Genomic Sequences tool seems to want an Interval file as
   input, not a list of IDs.  Does that mean I should convert the filtered SAM
   output to Interval?  Currently I'm using the SAM-to-interval conversion to
   extract the mapped reads and make the data more manageable in one step
   (pretty sure I picked that up from one of the tutorials...).  I was
   assuming that by definition it could only output an interval if it was
   mapped, and if so, I wouldn't be able to convert the unmapped reads to
   Interval anyway.  Is that wrong?
   3. I was setting up a workflow with Bowtie and I noticed that the
   Workflow Editor does show options to output unmapped reads.  But when I try
   to output them, I get this error:

Error due to input mapping of 'Compute quality statistics' in
'output_unmapped_reads_l'. A common cause of this is conditional outputs
that cannot be determined until runtime, please review your workflow.

Superficially, this seems silly.  Obviously a conditional output will not
be determined until runtime because it's dependent on something else.  So
why is that an error?  I have tried outputting to a few different tools, so
it doesn't seem to be specific to the tool into which the unmapped reads go
(in this case, Compute Quality Statistics).


Any thoughts, insights, or even other approaches to the original problem
would be great.  Currently, I'm thinking my best bet is to filter out the
unmapped reads locally with a Perl script and re-upload, but that feels like
overkill and will be time-consuming when I inevitably want to tweak or re-run
things.  Also, installing a local instance is currently not an option for
me (though it should be in a few months). In any case, I appreciate your
help a lot!
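
For the local-script route, a minimal Python sketch (standing in for the
Perl script mentioned above) could look like the following. It assumes the
unmapped read names have already been exported as a plain list, one name per
line, and that the FASTQ file uses strict four-line records. The function
names load_ids and filter_fastq are hypothetical, and this is not how any
Galaxy tool is implemented.

```python
def load_ids(id_lines):
    """Read one read name per line, ignoring blank lines."""
    return {line.strip() for line in id_lines if line.strip()}


def filter_fastq(fastq_lines, keep_ids):
    """Yield four-line FASTQ records whose read name is in keep_ids.

    Lines are counted in groups of four instead of searching for '@',
    because a quality string may itself begin with '@'.
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Read name: header text after '@', up to the first whitespace.
            name = record[0][1:].split()[0]
            if name in keep_ids:
                yield "\n".join(record) + "\n"
            record = []
```

With real data, open file handles can be passed in place of lists of lines.
Because the sequence and quality lines are copied through untouched, a
colorspace FASTQ should stay in colorspace, provided the names in the ID
list match the names in the FASTQ headers.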

Thanks, again!
Mayank Tandon


On Thu, Jul 18, 2013 at 5:38 PM, Jennifer Jackson j...@bx.psu.edu wrote:

  Hi Mayank,

 The best option I know of is to do the following:

 1 - obtain the sequence identifiers for the unmapped reads by filtering the
 SAM file, then cutting them out
 2 - convert the original FASTQ file to FASTA - you should get two outputs,
 one for the sequences and one for the quality score values
 3 - use the tool Fetch Sequences -> Extract Genomic DNA. The query is
 the list from #1, the target is the genome from #2. Do this twice - once
 for seqs, once for quals. This means using the target datasets from #2 as
 Custom Reference genomes - help about how to do this is here:
 http://wiki.galaxyproject.org/Support#Custom_reference_genome
 4 - combine the FASTA seq and qual files back to FASTQ

 If you will be doing this again, then capture the process into a workflow
 for future use, in a way creating your own tool.

 Hopefully this helps!

 Jen
 Galaxy


 On 7/18/13 9:40 AM, Mayank Tandon wrote:

 I was hoping this question had been asked, but I haven't been able to find
 it.  I want to output the unmapped reads from bowtie as a fastq file for
 subsequent mapping to other genomes (i.e. the --un <filename> option).  I
 know I can extract the unmapped reads by filtering on the bitwise values in
 the sam output and converting to fastq with the Picard tool, but I'm using
 colorspace data and bowtie converts them to letterspace. My understanding
 (coming mostly from forums and personal discussions) was that the
 color-to-letter conversion was somehow lossy so mapping the colorspace data
 directly is always preferable.

  So the question is: Is bowtie's '--un' option implemented in Galaxy and
 if so, how do I access it?

  Thanks in advance!


  Mayank Tandon


 ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org.  Please keep all replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:

   

Re: [galaxy-user] Bowtie: Outputting unmapped reads

2013-07-31 Thread Peter Cock
On Wednesday, July 31, 2013, Mayank Tandon wrote:

 That's a neat trick, and I definitely wouldn't have thought of that
 approach, so thanks for that!

 After I finished writing this out, I realized it was super long.  So here
 are the questions I'm asking up front, so you can choose whether or not to
 read the details.  Thanks!

 1. How do I output the quality scores when converting from FASTQ to FASTA?


You can't, unless you mean converting a FASTQ file into a FASTA and
matching QUAL file?
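
To make the FASTA-plus-QUAL split concrete, here is a minimal Python sketch
of what such a conversion does. It is an illustration only, not the code of
any Galaxy or FASTX tool, and the function name split_fastq is hypothetical.
It assumes strict four-line FASTQ records and keeps each quality string
as-is; a conventional QUAL file would instead list decoded integer scores,
which requires knowing the quality encoding offset.

```python
def split_fastq(fastq_lines):
    """Return (fasta_text, qual_text) with matching names and record order."""
    fasta, qual = [], []
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, _plus, quality = record
            name = header[1:]  # drop the leading '@'
            fasta.append(f">{name}\n{seq}\n")
            qual.append(f">{name}\n{quality}\n")
            record = []
    return "".join(fasta), "".join(qual)
```

Both outputs use the same header for each record, which is what lets the two
files be matched up again later.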

Peter
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-user] Bowtie: Outputting unmapped reads

2013-07-31 Thread Mayank Tandon
Exactly. Jennifer's solution for outputting unmapped reads involves
splitting the FASTQ file into basically two FASTA files, one with sequences
and the other with the corresponding quality score string. So, yes, they
would be matched files.
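
As a rough illustration of how two matched files can be merged back into
FASTQ (the last step of that approach), here is a Python sketch. It assumes
both files hold strict two-line records (a '>' header line followed by one
line of sequence or raw quality string) in the same order. The function name
combine_to_fastq is hypothetical.

```python
def combine_to_fastq(fasta_lines, qual_lines):
    """Merge matched two-line sequence and quality records into FASTQ text."""
    seqs = [line.rstrip("\n") for line in fasta_lines]
    quals = [line.rstrip("\n") for line in qual_lines]
    if len(seqs) != len(quals) or len(seqs) % 2:
        raise ValueError("inputs must hold the same number of two-line records")
    out = []
    for i in range(0, len(seqs), 2):
        seq_name, seq = seqs[i][1:], seqs[i + 1]
        qual_name, quality = quals[i][1:], quals[i + 1]
        if seq_name != qual_name:
            raise ValueError(f"record mismatch: {seq_name!r} vs {qual_name!r}")
        out.append(f"@{seq_name}\n{seq}\n+\n{quality}\n")
    return "".join(out)
```

Records are read by position (header, payload, header, payload) instead of by
looking for '>' because a raw quality string can itself start with '>'.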






[galaxy-user] Bowtie: Outputting unmapped reads

2013-07-18 Thread Mayank Tandon
I was hoping this question had been asked, but I haven't been able to find
it.  I want to output the unmapped reads from bowtie as a fastq file for
subsequent mapping to other genomes (i.e. the --un <filename> option).  I
know I can extract the unmapped reads by filtering on the bitwise values in
the sam output and converting to fastq with the Picard tool, but I'm using
colorspace data and bowtie converts them to letterspace. My understanding
(coming mostly from forums and personal discussions) was that the
color-to-letter conversion was somehow lossy so mapping the colorspace data
directly is always preferable.

So the question is: Is bowtie's '--un' option implemented in Galaxy and if
so, how do I access it?
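
For reference, filtering on the bitwise values means testing individual bits
of the SAM FLAG field; a read is unmapped when bit 0x4 is set. A small Python
sketch of that test follows. The bit meanings come from the SAM
specification; the helper names decode_flag and is_unmapped are hypothetical.

```python
# Bit meanings of the SAM FLAG field, per the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_unmapped(flag):
    """True when the read itself is flagged as unmapped (bit 0x4)."""
    return bool(flag & 0x4)
```

For example, a FLAG of 4 decodes to just "unmapped", while 16 marks a mapped
read on the reverse strand.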

Thanks in advance!


Mayank Tandon

Re: [galaxy-user] Bowtie: Outputting unmapped reads

2013-07-18 Thread Jennifer Jackson

Hi Mayank,

The best option I know of is to do the following:

1 - obtain the sequence identifiers for the unmapped reads by filtering the
SAM file, then cutting them out
2 - convert the original FASTQ file to FASTA - you should get two
outputs, one for the sequences and one for the quality score values
3 - use the tool Fetch Sequences -> Extract Genomic DNA. The query is
the list from #1, the target is the genome from #2. Do this twice -
once for seqs, once for quals. This means using the target datasets from
#2 as Custom Reference genomes - help about how to do this is here:
http://wiki.galaxyproject.org/Support#Custom_reference_genome
4 - combine the FASTA seq and qual files back to FASTQ

If you will be doing this again, then capture the process into a 
workflow for future use, in a way creating your own tool.
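
The logic of steps 1 and 3 can be sketched outside Galaxy in a few lines of
Python. This is only an illustration of the idea, not what the Galaxy tools
do internally. The function names unmapped_ids and select_records are
hypothetical, and the sketch assumes the step 2 outputs hold two-line
records (a '>' header followed by a single payload line).

```python
def unmapped_ids(sam_lines):
    """Step 1: names of reads whose FLAG (column 2) has the 0x4 bit set."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and int(fields[1]) & 0x4:
            ids.add(fields[0])
    return ids


def select_records(lines, keep_ids):
    """Step 3: keep two-line '>name' / payload records named in keep_ids."""
    rows = [line.rstrip("\n") for line in lines]
    kept = []
    for i in range(0, len(rows) - 1, 2):
        header, payload = rows[i], rows[i + 1]
        if header[1:].split()[0] in keep_ids:
            kept.append(f"{header}\n{payload}\n")
    return "".join(kept)
```

Calling select_records once on the sequence file and once on the quality
file, with the same ID set, mirrors the "do this twice" part of step 3.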


Hopefully this helps!

Jen
Galaxy





--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
