Hello,
GATK requires that reference genomes be sorted in a specific order. For
certain genomes, the chromosomes included in the build are also
restricted. This often differs from how genomes are released in full
format (with random, haplotype, and/or unmapped data). The full format
is sometimes required by other tools, or is simply what has already
been used in earlier steps, so changing it at this point creates a
backward-compatibility problem. This is where using a genome from the
history (on the public Main server, but only for small genomes) or a
cloud or local Galaxy fits in with GATK.
This sort/build information can be found on the GATK web site. The
data can be formatted before upload into Galaxy, or it can be converted
to fasta-tabular format and then subset and ordered with a combination
of filtering and sorting tools (each genome is a bit different, so
there is no single method).
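To illustrate the subset-and-reorder step, here is a minimal Python
sketch for preparing a FASTA file before upload. It keeps only the
contigs named in an order list, writes them in that order, and drops
everything else (random, haplotype, unmapped). The order list is an
assumption supplied by the caller: EXAMPLE_ORDER below is hypothetical,
and the order GATK actually expects should be taken from the GATK web
site or copied from an existing GATK-friendly reference.

```python
"""Subset and reorder a FASTA file to a caller-supplied contig order (sketch)."""
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs; the name is the first word after '>'."""
    records: List[Tuple[str, str]] = []
    name = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq_lines)))
            name = line[1:].split()[0]
            seq_lines = []
        elif line.strip():
            seq_lines.append(line.strip())
    if name is not None:
        records.append((name, "".join(seq_lines)))
    return records


def reorder_fasta(text: str, order: List[str], width: int = 60) -> str:
    """Write only the contigs listed in `order`, in that order.

    Contigs not in `order` are dropped. A contig named in `order` but
    missing from the input raises an error instead of being skipped.
    """
    by_name: Dict[str, str] = dict(parse_fasta(text))
    missing = [n for n in order if n not in by_name]
    if missing:
        raise ValueError(f"contigs not found in input: {missing}")
    out: List[str] = []
    for n in order:
        seq = by_name[n]
        out.append(">" + n)
        # Re-wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


# HYPOTHETICAL order, for illustration only; confirm the real one with GATK.
EXAMPLE_ORDER = ["chrI", "chrII", "chrIII", "chrIV", "chrV", "chrX", "chrM"]

if __name__ == "__main__":
    demo = ">chrX\nACGT\n>chrI\nTTGG\n>chrUn_random\nNNNN\n>chrM\nAC\n"
    print(reorder_fasta(demo, ["chrI", "chrX", "chrM"]), end="")
```

Raising on a missing contig, rather than silently skipping it, is a
deliberate choice: a reference missing a chromosome that the reads
were mapped to would cause problems later in the GATK steps.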
For ce10, however, this has already been done. You can import a
GATK-friendly version of the genome from one of the CloudMap
publication's histories (Shared Data - Published Pages - CloudMap),
since CloudMap also uses ce10. The history at the link below can be
imported; dataset #5 is the ce10 reference genome.
https://main.g2.bx.psu.edu/u/gm2123/h/cloudmapot266proofofprinciple
The publication may also give you ideas about how to format inputs for
these tools. The ce10 reference genome can also be a model for how to
sort other genomes (sometimes it takes a few tries to get the right
ordering).
If you are switching genomes, you may need to start over from mapping.
Some help about how to determine if that is needed is in our wiki here:
http://wiki.galaxyproject.org/Support#Reference_genomes
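As a rough first check on whether existing BAM files can be reused,
you can compare the contig names, lengths, and order in the BAM header
(the @SQ lines, which `samtools view -H file.bam` prints) against the
reference FASTA. The Python sketch below does that comparison. The
three labels it returns are hypothetical categories for illustration,
and the wiki page above remains the guide for deciding whether
remapping is actually needed.

```python
"""Compare BAM header @SQ contigs with a reference FASTA (sketch)."""
from typing import List, Tuple


def sq_from_sam_header(header: str) -> List[Tuple[str, int]]:
    """Extract (SN, LN) pairs from @SQ lines of a SAM-format header."""
    contigs: List[Tuple[str, int]] = []
    for line in header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def fasta_lengths(text: str) -> List[Tuple[str, int]]:
    """Return (name, length) for each FASTA record, in file order."""
    names: List[str] = []
    lengths: List[int] = []
    for line in text.splitlines():
        if line.startswith(">"):
            names.append(line[1:].split()[0])
            lengths.append(0)
        elif names:
            lengths[-1] += len(line.strip())
    return list(zip(names, lengths))


def compare_contigs(bam: List[Tuple[str, int]], ref: List[Tuple[str, int]]) -> str:
    """Classify the match (labels are hypothetical, for illustration)."""
    if bam == ref:
        return "identical"
    if sorted(bam) == sorted(ref):
        # Same names and lengths, different order.
        return "same_contigs_different_order"
    # Names or lengths differ, e.g. the BAM has random/unmapped contigs.
    return "different_contigs"


if __name__ == "__main__":
    header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrI\tLN:4\n@SQ\tSN:chrX\tLN:3\n"
    ref = ">chrX\nAAA\n>chrI\nACGT\n"
    print(compare_contigs(sq_from_sam_header(header), fasta_lengths(ref)))
```

Comparing lengths as well as names matters because two builds can use
the same chromosome names while differing in sequence.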
Hopefully this helps,
Jen
Galaxy team
On 6/24/13 8:14 AM, Politz, Samuel M. wrote:
I am using GATK tools on the useGalaxy main server to detect variants
in a mutant C. elegans whole genome sequence obtained with an Illumina
instrument (my own data). The first GATK tool I tried to use,
Realigner Target Creator, gave me an error message. In the tool
window, my input file (a BAM file previously run through Add or
Replace Groups) did not generate an error, but the reference genome
file (ce10), which I specified as coming from the History, produced
this error in the reference list: "History does not include a
dataset of the required format/build." I got the same error when I
tried to use this input file to run the GATK Depth of Coverage tool.
I have searched Galaxy mail archives for this error, and have found
other examples, but none involving these tools.
The ce10 database was listed in the History attributes of the BAM file
I used, and this database has worked with all of the Galaxy tools I
used up to this point. Either something about the ce10 format is
unacceptable to GATK, or GATK is not picking it up from the History at
all. I don't know how to access ce10 to check its format; I have only
found Galaxy's built-in reference genome files in each tool's drop-down
menus.
Searching the GATK site for solutions has not been helpful, because
it suggests GATK-specific functions, such as Create Sequence
Dictionary, to fix the format. I don't have access to these tools on
the Galaxy main server.
Can someone suggest a workaround or a direct solution?
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using reply all in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org