Hello,
The Megablast htgs, nt, and wgs databases are in the process of being
updated to the latest NCBI releases and are expected to be available by
tomorrow morning (possibly sooner).
Should you wish to continue your analysis using the prior versions,
these are available through our rsync
Jennifer
I am megablasting a simple 500,000-line dataset that is definitely in
Galaxy FASTA format.
For a week I have been seeing numerous errors, so I have reprocessed
the data multiple times.
The error message is: "could not find specified database directory".
Is there an alternative approach? I
Dear all,
Some time ago, I reported on this list the same problem with
megablast that Sarah mentioned. I finally used another way to analyse
my data, but my conclusion was similar to Sarah's: most of the
time there was a shift of « -1 » between the GI number in the output and the
following
Hi Sarah,
We appreciate all of the information you have provided and have been
working here since yesterday to investigate the issue in more detail.
This includes incorporating the additional data both you and Peter have
been posting.
We don't have anything conclusive to report yet, but it
On Tue, Apr 24, 2012 at 10:24 PM, Jennifer Jackson j...@bx.psu.edu wrote:
..., using
the BLAST+ BLASTN megablast wrapper that Peter authored, in a local or cloud
instance, would be the best immediate remedy (this version has the standard
12-column output). Sequence length data could always be
Thanks Peter,
Excellent point. From there, the Cut tool could be used to reorganize
the output to exactly match that of the 13-column regular megablast
output. So, no external data needed, no tool modifications needed.
This can't be done on the main public Galaxy instance as BLAST+ is not
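Reorganizing the BLAST+ output to match the older megablast layout amounts to inserting one extra column. The sketch below assumes (as an inference from the example hits posted later in this thread, not from tool documentation) that the 13-column Galaxy megablast output is the standard 12 BLAST tabular fields with the subject length inserted as column 3. The `subject_lengths` lookup is hypothetical: the lengths have to come from somewhere, such as the database FASTA.

```python
"""Sketch: reshape a 12-column BLAST+ tabular hit into the 13-column layout,
assuming column 3 of the legacy megablast output is the subject length."""


def to_13_columns(line, subject_lengths):
    """Insert the subject length right after the subject id (column 2).

    line: one tab-separated BLAST+ hit with the standard 12 fields
    subject_lengths: hypothetical dict mapping subject id -> sequence length
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    qseqid, sseqid, rest = fields[0], fields[1], fields[2:]
    slen = subject_lengths[sseqid]  # KeyError if the length is unknown
    return "\t".join([qseqid, sseqid, str(slen)] + rest)


if __name__ == "__main__":
    row = "\t".join(["0", "324034994", "93.23", "266", "13", "5",
                     "1", "265", "22", "283", "7e-102", "379.0"])
    print(to_13_columns(row, {"324034994": 527}))
```

When running BLAST+ directly, a custom column list such as `-outfmt "6 qseqid sseqid slen pident length mismatch gapopen qstart qend sstart send evalue bitscore"` can emit the subject length without a separate lookup; whether a given Galaxy wrapper exposes that option is not covered here.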
I am having trouble finding information on the MegaBLAST output
columns. What is each column for? I can't seem to figure this out by
comparing info in the columns to NCBI directly because the GIs don't
match with the correct entry on NCBI. I've seen that others have
posted about that problem, so
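One way to keep the columns straight is to give each one a name when parsing. The field names and types below are an assumption: the usual BLAST tabular fields with the subject length as column 3, which is consistent with the example row posted in this thread (length 527, query 1-265, subject 22-283). Parsing correctly does not fix the GI mismatch discussed here; a GI from an older database can still point at a different current NCBI entry.

```python
from typing import NamedTuple


class MegablastHit(NamedTuple):
    """Assumed meaning of the 13 columns (inferred, not from tool docs)."""
    query_id: str
    subject_id: str          # GI number of the database hit
    subject_length: int
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


# One converter per column, in the same order as the fields above.
CONVERTERS = (str, str, int, float, int, int, int, int, int, int, int, float, float)


def parse_hit(line: str) -> MegablastHit:
    """Split a whitespace-separated hit and convert each column."""
    fields = line.split()
    if len(fields) != len(CONVERTERS):
        raise ValueError(f"expected 13 columns, got {len(fields)}")
    return MegablastHit(*(conv(value) for conv, value in zip(CONVERTERS, fields)))


if __name__ == "__main__":
    hit = parse_hit("0 324034994 527 93.23 266 13 5 1 265 22 283 7e-102 379.0")
    print(hit.subject_id, hit.percent_identity, hit.evalue)
```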
Hi Sarah,
Peter defined the columns (thanks) but I can provide some information
about the GenBank identifiers. The megablast databases on the public
server are roughly a year old, and there have been updates at NCBI since
that time. As I understand it, this manifests as occasional mismatches
Thanks so much for the prompt reply. I don't mind using last year's
GenBank, as long as I am getting accurate hits. I just have a couple
more questions to confirm I am safe using the Galaxy pipeline for
this...
So if I continue to work within the 1-year-old database, can I
trust the output as
Peter, you requested an example; here are the first five hits for my
first query sequence (OTU#0):
0 324034994 527 93.23 266 13 5 1 265 22 283 7e-102 379.0
0 56181650513 93.26 267 10 8 1 265
25
Hi Vasu,
The three primary megablast databases available on the public main
Galaxy instance are composed of individual fragments/sequences of
different types from many species (not assembled genomes):
http://user.list.galaxyproject.org/Question-about-megablast-td4543260.html
If you want to
Hi,
I am using megablast and was wondering how I can get the chromosome number
and coordinates of its hits.
Thanks
Shamesher
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
Hello Scott,
For #1, option -p:
Here is a link to some megablast parameter documentation online:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/megablast.html#3
(the primary paper for the Galaxy tool is noted at the bottom of the
tool form, but this is convenient)
Quote:
Table 3.30 Parameter
Hi Scott
I have never used megablast, so what I am writing is true of any
FASTA file (so if there is anything quirky in megablast that I
don't know about, apologies!):
Take your FASTA file and convert it to tabular (under "FASTA
manipulation", this will
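Converting to tabular puts each FASTA record on a single line, so later line-based tools cannot cut a sequence in two. The sketch below is roughly the idea behind a FASTA-to-tabular conversion (title, a tab, then the joined sequence); it is an illustration, not Galaxy's own converter, and its exact output format is an assumption.

```python
def fasta_records(lines):
    """Yield (title, sequence) pairs, one per FASTA record.

    Multi-line sequences are joined; blank lines and any lines
    before the first '>' header are ignored.
    """
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def fasta_to_tabular(lines):
    """Return one 'title<TAB>sequence' line per record."""
    return [f"{title}\t{seq}" for title, seq in fasta_records(lines)]


if __name__ == "__main__":
    example = [">seq1 some description\n", "ACGT\n", "GG\n", ">seq2\n", "TTAA\n"]
    for row in fasta_to_tabular(example):
        print(row)
```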
Noa has the right idea, but if you're asking how to split a dataset into
two non-overlapping halves, you'll want to use Select First and Select Last
instead of random lines. Get an accurate line count from your file using the
Line/Word/Character count tool and then split it right in the
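The count-then-split approach can be sketched on a list of tabular rows: take the line count, keep the first half (as Select First would), then take the remaining lines from the end (as Select Last would). This only emulates the two Galaxy tools. Putting the extra row in the second half when the count is odd is an arbitrary choice made here.

```python
def split_in_half(rows):
    """Return (first_half, second_half): no overlap and no rows lost."""
    n = len(rows)             # the accurate line count
    first_n = n // 2          # "Select First" this many lines
    last_n = n - first_n      # "Select Last" the rest
    first_half = rows[:first_n]
    second_half = rows[n - last_n:]  # avoids the rows[-0:] pitfall when last_n == 0
    return first_half, second_half


if __name__ == "__main__":
    rows = [f"seq{i}\tACGT" for i in range(7)]
    a, b = split_in_half(rows)
    print(len(a), len(b))
```

Applying this to tabular rows rather than raw FASTA lines keeps every record whole in exactly one half.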