[galaxy-user] problem with GATK tools not accepting BAM file as input

2013-06-24 Thread Politz, Samuel M.
I am using GATK tools on the useGalaxy main server to detect variants in a 
mutant C. elegans whole genome sequence obtained with an Illumina instrument 
(my own data).  The first GATK tool I tried to use, Realigner Target Creator, 
gave me an error message.  In the tool window, my input file (a BAM file 
previously run through Add or Replace Groups) did not generate an error, but 
the reference genome file (ce10) which I specified as found in History, 
produced the following reference list-specific error:  History does not 
include a dataset of the required format/build.  I got the same error when I 
tried to use this input file to run the GATK Depth of Coverage tool.  I have 
searched Galaxy mail archives for this error, and have found other examples, 
but none involving these tools.
The ce10 database was listed in the History attributes of the BAM file I used, 
and this database has worked with all of the Galaxy tools I used up to this 
point.  Something about the ce10 format is unacceptable to GATK, or it is not 
even picking it up from the History. I don't know how to access ce10 to check 
its format.  I have only found the inbuilt reference genome files in Galaxy in 
drop-down menus for each tool.
Searching the GATK site for solutions has not been helpful, because they 
suggest GATK-specific functions to fix the format such as Create Sequence 
Dictionary.  I don't have access to these tools within the Galaxy main server.
Can someone suggest a workaround or a direct solution?

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-user] problem with GATK tools not accepting BAM file as input

2013-06-24 Thread Jennifer Jackson

Hello,

GATK requires that reference genomes are sorted in a specific way. For 
certain genomes, the chromosomes included in the build are also 
restricted. This often differs from how most genomes are released in full 
format (with random, haplotype, and/or unmapped data). The full format is 
sometimes required by other tools, or is simply how the genome has already 
been used, so changing it at this point would be an issue for 
backwards-compatibility. This is where using a genome from the history 
(on the public Main server, but only for small genomes) or a cloud or 
local Galaxy fits in with GATK.


This sort/build information can be found on the GATK web site. The data 
can be formatted prior to upload into Galaxy, or, within Galaxy, it can be 
converted to fasta-tabular and then subset and ordered with a combination 
of filters/sorting (each genome is a bit different, so there is no single 
method).
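
For anyone formatting the data before upload, here is a minimal Python 
sketch of the subset-and-reorder step. It is only an illustration: the 
function names are made up for this example, and the order passed in is 
something you have to look up for your genome (the list in the usage 
lines below is a toy placeholder, not the documented GATK order for ce10).

```python
def parse_fasta(text):
    """Map each sequence name to its list of sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None and line.strip():
            records[name].append(line.strip())
    return records


def reorder_fasta(text, wanted_order):
    """Keep only the sequences named in wanted_order, in that order."""
    records = parse_fasta(text)
    missing = [n for n in wanted_order if n not in records]
    if missing:
        raise KeyError("not found in FASTA: " + ", ".join(missing))
    out = []
    for name in wanted_order:
        out.append(">" + name)
        out.extend(records[name])
    return "\n".join(out) + "\n"


# Toy input: three sequences, one of which (chrUn) is dropped.
toy = ">chrM\nAC\n>chrI some description\nGG\nTT\n>chrUn\nNN\n"
print(reorder_fasta(toy, ["chrI", "chrM"]), end="")
```

Note that the sketch also trims each header down to the bare sequence 
name, and it raises an error rather than silently skipping a name that is 
not in the file.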


But for ce10, this has already been done. You can import a GATK-friendly 
version of the genome from one of the CloudMap publication's histories 
(Shared Data - Published Pages - CloudMap), as it also uses ce10. See 
this link for a history that you can import. Dataset #5 is the ce10 
reference genome.

https://main.g2.bx.psu.edu/u/gm2123/h/cloudmapot266proofofprinciple

The publication may also give you ideas about how to format inputs for 
these tools. The ce10 reference genome can also be a model for how to 
sort other genomes (sometimes it takes a few tries to get the right 
ordering).


If you are switching genomes, you may need to start over from mapping. 
Some help about how to determine if that is needed is in our wiki here:

http://wiki.galaxyproject.org/Support#Reference_genomes
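
As a rough illustration of one such check, the Python sketch below 
compares the sequence names and lengths recorded in a BAM header (the @SQ 
lines, as plain text, for example as printed by samtools view -H) with 
those in a reference FASTA. The function names are invented for this 
example, and treating "same names, same lengths, same order" as the test 
is an assumption on my part, not a statement of what the wiki or GATK 
checks.

```python
def sam_header_contigs(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def fasta_contigs(fasta_text):
    """Return [(name, length), ...] in the order sequences appear."""
    contigs = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            contigs.append([line[1:].split()[0], 0])
        elif contigs:
            contigs[-1][1] += len(line.strip())
    return [tuple(c) for c in contigs]


def same_reference(header_text, fasta_text):
    """True if names, lengths, and order all agree."""
    return sam_header_contigs(header_text) == fasta_contigs(fasta_text)
```

If this returns False for your BAM and the reference you plan to use, 
that is a hint the BAM was mapped against a differently built or 
differently ordered genome.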

Hopefully this helps,

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
