Re: [galaxy-user] GenBank database files

2013-04-29 Thread Peter Cock
On Mon, Apr 29, 2013 at 3:27 AM, Mike Dyall-Smith
mike.dyallsm...@gmail.com wrote:
> Dear Peter Cock, thanks for your advice. Just to be clear, do I leave the
> files within their decompressed folders or do I put all the individual files
> into one folder? I assume the former, but want to be sure.
> Thanks again, Mike DS

Hi Mike,

Unless you're using a graphical decompression tool which is trying
to be too helpful, each tar-ball does *not* decompress into its own
folder. The files should all be in the *same* folder.

I use this to verify the checksums,

$ md5sum --check nr.00.tar.gz.md5
nr.00.tar.gz: OK

Then I use this to decompress the tar-balls,

$ tar -zxvf nr.00.tar.gz
etc

(Actually I don't do this personally any more - it has been set up
to happen automatically when the NCBI update the databases.)
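
For anyone doing this by hand, the checksum and decompression steps above can be sketched as one loop over all the volumes. This is only an illustration: the function name unpack_volumes is made up, and it assumes every nr.NN.tar.gz sits next to its nr.NN.tar.gz.md5 file in the same folder.

```shell
# Hypothetical helper: verify and unpack every volume of one database.
# Usage: unpack_volumes DIR PREFIX   (e.g. unpack_volumes /data/blastdb/ncbi nr)
# The body runs in a subshell, so the caller's working directory is untouched.
unpack_volumes() (
    cd "$1" || exit 1
    for archive in "$2".*.tar.gz; do
        # If the glob matched nothing, $archive is the literal pattern.
        [ -e "$archive" ] || { echo "no $2 archives in $1" >&2; exit 1; }
        # Stop at the first bad checksum rather than unpack a corrupt download.
        md5sum --check "$archive.md5" || exit 1
        # Extract into the current folder, alongside the other volumes.
        tar -zxf "$archive" || exit 1
    done
)
```

Calling unpack_volumes /data/blastdb/ncbi nr would then leave all the nr.NN.* files together in that one folder.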

We keep all our NCBI databases in the same folder,

$ ls /data/blastdb/ncbi/nr.*
/data/blastdb/ncbi/nr.00.phd
/data/blastdb/ncbi/nr.00.phi
/data/blastdb/ncbi/nr.00.phr
/data/blastdb/ncbi/nr.00.pin
/data/blastdb/ncbi/nr.00.pnd
/data/blastdb/ncbi/nr.00.pni
/data/blastdb/ncbi/nr.00.pog
/data/blastdb/ncbi/nr.00.ppd
/data/blastdb/ncbi/nr.00.ppi
/data/blastdb/ncbi/nr.00.psd
/data/blastdb/ncbi/nr.00.psi
/data/blastdb/ncbi/nr.00.psq
/data/blastdb/ncbi/nr.00.tar.gz
/data/blastdb/ncbi/nr.00.tar.gz.md5
...
/data/blastdb/ncbi/nr.10.phd
/data/blastdb/ncbi/nr.10.phi
/data/blastdb/ncbi/nr.10.phr
/data/blastdb/ncbi/nr.10.pin
/data/blastdb/ncbi/nr.10.pnd
/data/blastdb/ncbi/nr.10.pni
/data/blastdb/ncbi/nr.10.pog
/data/blastdb/ncbi/nr.10.ppd
/data/blastdb/ncbi/nr.10.ppi
/data/blastdb/ncbi/nr.10.psd
/data/blastdb/ncbi/nr.10.psi
/data/blastdb/ncbi/nr.10.psq
/data/blastdb/ncbi/nr.10.tar.gz
/data/blastdb/ncbi/nr.10.tar.gz.md5
/data/blastdb/ncbi/nr.pal

We can then refer to the NR database at the command line
as /data/blastdb/ncbi/nr or as just nr if the BLAST database
path is configured to check this folder.
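
As a minimal sketch of the second option, assuming the BLAST+ tools are installed and that the BLASTDB environment variable is how the database path is configured on your machine (a configuration file is another way). The helper db_in_blastdb and the file names query.fasta and hits.txt are placeholders invented for this example.

```shell
# Assumption: the databases live in /data/blastdb/ncbi, as in the listing above.
export BLASTDB=/data/blastdb/ncbi

# Hypothetical helper: succeed if NAME looks like a database in $BLASTDB,
# i.e. it has an alias file (multi-volume) or an index file (single volume).
db_in_blastdb() {
    for ext in pal nal pin nin; do
        [ -e "$BLASTDB/$1.$ext" ] && return 0
    done
    echo "no database called $1 in $BLASTDB" >&2
    return 1
}

# With BLASTDB set, these two searches should find the same database
# (query.fasta stands in for your own protein sequences):
#   blastp -db /data/blastdb/ncbi/nr -query query.fasta -out hits.txt
#   db_in_blastdb nr && blastp -db nr -query query.fasta -out hits.txt
```

The check is only a convenience before starting a long search; BLAST does its own lookup and will report an error itself if the name cannot be resolved.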

In this folder we also have other NCBI databases, like NT:

$ ls /data/blastdb/ncbi/nt.*
/data/blastdb/ncbi/nt.00.nhd
/data/blastdb/ncbi/nt.00.nhi
/data/blastdb/ncbi/nt.00.nhr
/data/blastdb/ncbi/nt.00.nin
/data/blastdb/ncbi/nt.00.nnd
/data/blastdb/ncbi/nt.00.nni
/data/blastdb/ncbi/nt.00.nog
/data/blastdb/ncbi/nt.00.nsd
/data/blastdb/ncbi/nt.00.nsi
/data/blastdb/ncbi/nt.00.nsq
/data/blastdb/ncbi/nt.00.tar.gz
/data/blastdb/ncbi/nt.00.tar.gz.md5
...
/data/blastdb/ncbi/nt.13.nhd
/data/blastdb/ncbi/nt.13.nhi
/data/blastdb/ncbi/nt.13.nhr
/data/blastdb/ncbi/nt.13.nin
/data/blastdb/ncbi/nt.13.nnd
/data/blastdb/ncbi/nt.13.nni
/data/blastdb/ncbi/nt.13.nog
/data/blastdb/ncbi/nt.13.nsd
/data/blastdb/ncbi/nt.13.nsi
/data/blastdb/ncbi/nt.13.nsq
/data/blastdb/ncbi/nt.13.tar.gz
/data/blastdb/ncbi/nt.13.tar.gz.md5
/data/blastdb/ncbi/nt.nal

Note you don't need to keep the *.tar.gz and the *.md5 files
once you've verified the checksum (using md5sum to detect
any data corruption during download) and decompressed the
tar-ball.
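
A cautious way to do that clean-up, sketched under the assumption that a volume counts as unpacked once its index file (.pin for protein, .nin for nucleotide) exists. The function name remove_archives is made up for this example.

```shell
# Hypothetical helper: delete the archive and checksum of each unpacked volume.
# Usage: remove_archives DIR PREFIX   (e.g. remove_archives /data/blastdb/ncbi nr)
remove_archives() (
    cd "$1" || exit 1
    for archive in "$2".*.tar.gz; do
        [ -e "$archive" ] || continue          # glob matched nothing
        volume=${archive%.tar.gz}              # nr.00.tar.gz -> nr.00
        if [ -e "$volume.pin" ] || [ -e "$volume.nin" ]; then
            rm -f -- "$archive" "$archive.md5"
        else
            echo "keeping $archive: $volume is not unpacked yet" >&2
        fi
    done
)
```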

Peter

P.S. This galaxy-user list is meant for discussion of using the
tools within Galaxy from an end-user perspective. Although
there is talk about creating a new Galaxy mailing list specifically
for deployment questions like this, currently galaxy-dev is
preferred for this kind of discussion.
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/


[galaxy-user] GenBank database files

2013-04-28 Thread Mike Dyall-Smith
This should be easy (but not for me so far). I want to do local blast searches,
so I downloaded the premade nr protein blast database from GenBank. It is split
into 10 .tar.gz files.
I've decompressed them all, and now I want to put all the file parts 
together. Can I simply concatenate all similar files? (e.g. all 10 parts of the 
.phd files). The Readme mentions use of an alias file, but I did not find this 
at all clear. A set of step-by-step decompression and restoration instructions 
would be useful. I could not find any.
Thanks for any assistance, Mike DS



Re: [galaxy-user] GenBank database files

2013-04-28 Thread Peter Cock
On Sun, Apr 28, 2013 at 11:22 PM, Mike Dyall-Smith
mike.dyallsm...@gmail.com wrote:
> This should be easy (but not for me so far). I want to do local blast
> searches, so I downloaded the premade nr protein blast database from GenBank.
> It is split into 10 .tar.gz files.
> I've decompressed them all, and now I want to put all the file parts
> together. Can I simply concatenate all similar files? (e.g. all 10 parts of
> the .phd files). The Readme mentions use of an alias file, but I did not find
> this at all clear. A set of step-by-step decompression and restoration
> instructions would be useful. I could not find any.
> Thanks for any assistance, Mike DS

Don't cat anything - just download all nr.*.tar.gz files, and
decompress them. You'll have a load of files including a
special alias file called nr.pal which is how BLAST knows
how to deal with the combined 'nr' database.
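
To check that nothing is missing after decompressing, one rough approach is to read the volume names out of the alias file and confirm each one has its index file. This sketch assumes the alias file is plain text with a DBLIST line naming the volumes (possibly in double quotes); check_volumes is a made-up name, not part of BLAST.

```shell
# Hypothetical helper: confirm every volume named in an alias file is present.
# Usage: check_volumes DIR ALIAS   (e.g. check_volumes /data/blastdb/ncbi nr.pal)
check_volumes() (
    cd "$1" || exit 1
    [ -r "$2" ] || { echo "cannot read alias file $2" >&2; exit 1; }
    case $2 in
        *.pal) index_ext=pin ;;    # protein database
        *.nal) index_ext=nin ;;    # nucleotide database
        *) echo "not an alias file: $2" >&2; exit 1 ;;
    esac
    status=0
    # Take everything after DBLIST, drop any quotes, split on whitespace.
    for volume in $(sed -n 's/^DBLIST[[:space:]]*//p' "$2" | tr -d '"'); do
        if [ ! -e "$volume.$index_ext" ]; then
            echo "missing volume: $volume" >&2
            status=1
        fi
    done
    exit "$status"
)
```

A non-zero exit status means at least one nr.NN.tar.gz still needs to be downloaded or decompressed.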

Peter
