Thanks so much for the prompt reply. I don't mind using last year's
GenBank, as long as I am getting accurate hits. I just have a couple
more questions to confirm I am safe using the Galaxy pipeline for
this...
So if I continue to work within the 1-year-old database, can I
trust the output as accurate matches? Specifics about my project: I
have environmental samples that were sequenced for fungal ITS. I have
clustered these into OTUs and chosen a representative sequence for
each. If I retrieve hits for this representative sequence file, can I
trust the hits as being the correct hits as of last year? I'm just
worried about what that one person said who thought there were some
column arrangement problems, because I'm getting hits from different
phyla for the same sequence using default parameters in megablast...
Can I also assume, then, that I should NOT match my representative
sequence file to updated GI numbers using another pipeline, and then
bring the file of GI numbers to Galaxy to fetch taxonomic assignments?
(I would do this because of the nice, neat columns for each taxonomic
level that Galaxy puts out.)
Sarah
On Mon, Apr 23, 2012 at 2:26 PM, Jennifer Jackson j...@bx.psu.edu wrote:
Hi Sarah,
Peter defined the columns (thanks) but I can provide some information about
the GenBank identifiers. The megablast databases on the public server are
roughly a year old, and there have been updates at NCBI since that time. As I
understand it, this manifests as occasional mismatches between hits at
Galaxy and GenBank when comparing certain IDs linked to updated records.
We are working to update these three databases, but there are some
complicating factors around this processing specifically related to the
public instance and the metagenomics workflow that have yet to be resolved.
Please know that getting updated is a priority for us and we apologize for
the inconvenience.
To use the most current databases, the recommendation is a local or (better)
cloud instance with either the regular or BLAST+ version of the tool and a
database of your choice. Instructions to get started are at:
getgalaxy.org
getgalaxy.org/cloud
Hopefully this explains the data mismatch. This question has come up before,
but I think you are correct that the final conclusion was never posted
back to the galaxy-user list (for different reasons). So, thank you for
asking so that we could send out a clear reply for everyone using the tool.
Best,
Jen
Galaxy team
On 4/23/12 9:56 AM, Sarah Hicks wrote:
I am having trouble finding information on the MegaBLAST output
columns. What is each column for? I can't seem to figure this out by
comparing the info in the columns to NCBI directly, because the GI numbers
don't match the correct entries on NCBI. I've seen that others have
posted about that problem, so I'm also waiting on details on that
question, but for now, I'd just like to know what to make of the
output...
best,
Sarah
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using reply all in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
--
Jennifer Jackson
http://galaxyproject.org