Re: [ccp4bb] [phenixbb] local BLAST server

2014-08-08 Thread Zhijie Li

Hi Rob,

The BLAST nr database (fasta format) can be downloaded from the NCBI ftp:
ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
As I remember, it is the nr.gz file. When unzipped, the file is called nr.

According to the BLAST documentation, the nr database does contain PDB entries:
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide

It is significantly larger than the PDB data file you are currently using.
You might consider extracting all the PDB sequences from it so that you do not
need to search through all the non-PDB sequences.
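
A rough sketch of that extraction step, in Python. It assumes the PDB
entries in the nr FASTA file can be recognised by a "pdb|" tag somewhere in
the definition line (merged identical entries are assumed to share one
line, separated by Ctrl-A). Check a few headers in your own download first,
as the layout may differ. The function names and the output file name are
made up for illustration.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def pdb_records(lines: Iterable[str], tag: str = "pdb|") -> Iterator[Tuple[str, str]]:
    """Keep only records whose definition line mentions the PDB tag.

    The default tag is an assumption about the header layout.
    """
    for header, seq in read_fasta(lines):
        if tag in header:
            yield header, seq


def write_pdb_subset(nr_path: str, out_path: str) -> int:
    """Stream a gzipped nr file and write the PDB subset; return the count."""
    count = 0
    with gzip.open(nr_path, "rt") as handle, open(out_path, "w") as out:
        for header, seq in pdb_records(handle):
            out.write(">{}\n{}\n".format(header, seq))
            count += 1
    return count
```

Calling write_pdb_subset("nr.gz", "nr_pdb.fasta") would stream the
compressed file without unpacking it to disk. The resulting FASTA file is
much smaller and can then be used to build the local database.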


Zhijie



-Original Message- 
From: R.D. Oeffner

Sent: Friday, August 08, 2014 10:14 AM
To: pheni...@phenix-online.org
Subject: [phenixbb] local BLAST server

Hi,

I'm in the process of installing a local BLAST server for doing blast
protein queries. As I understand it I need a file with all the FASTA
sequences as input for initially generating my local BLAST database.
The one present in
ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt seems to
contain redundant entries. Querying it produces many extra PDB
chain-ids when compared to a BLAST query on the NCBI web server.

Does anyone know where to get a non-redundant version of FASTA records
so that I can create a similar database as the one used by NCBI?
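
One possible way to reduce the redundancy locally is to collapse chains
with exactly identical sequences into a single record. The Python sketch
below does only that. It assumes pdb_seqres.txt headers look like
">101m_A mol:protein length:154  MYOGLOBIN", and it is not necessarily the
procedure NCBI uses for its own database. The function names and the
"identical_chains" header field are made up for illustration.

```python
from collections import OrderedDict
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records: Iterable[Tuple[str, str]],
                       protein_only: bool = True) -> "OrderedDict[str, List[str]]":
    """Group chain ids by exact sequence, preserving first-seen order.

    protein_only relies on the assumed "mol:protein" header field.
    """
    groups: "OrderedDict[str, List[str]]" = OrderedDict()
    for header, seq in records:
        if protein_only and "mol:protein" not in header:
            continue
        # The chain id is assumed to be the first whitespace-separated token.
        chain_id = header.split(None, 1)[0] if header.strip() else "unknown"
        groups.setdefault(seq, []).append(chain_id)
    return groups


def to_fasta(groups: "OrderedDict[str, List[str]]") -> Iterator[str]:
    """Emit one FASTA record per unique sequence, listing all its chains."""
    for seq, ids in groups.items():
        yield ">{} identical_chains:{}\n{}\n".format(ids[0], ",".join(ids), seq)
```

Each unique sequence is written once, with the first chain as its
representative and the remaining chain ids kept in the header so that hits
can still be mapped back to every PDB chain.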


Many thanks,

Rob

--
Robert Oeffner, Ph.D.
Research Associate, The Read Group
Department of Haematology,
Cambridge Institute for Medical Research
University of Cambridge
Cambridge Biomedical Campus
Wellcome Trust/MRC Building
Hills Road
Cambridge CB2 0XY

www.cimr.cam.ac.uk/investigators/read/index.html
tel: +44(0)1223 763234
mobile: +44(0)7712 887162
___
phenixbb mailing list
pheni...@phenix-online.org
http://phenix-online.org/mailman/listinfo/phenixbb 


Re: [ccp4bb] [phenixbb] local BLAST server

2014-08-08 Thread Zhijie Li

Hello,

My apologies for sending the previous post to the wrong BB.

Zhijie


-Original Message- 
From: Zhijie Li

Sent: Friday, August 08, 2014 11:10 AM
To: R.D. Oeffner ; CCP4BB@JISCMAIL.AC.UK
Subject: Re: [phenixbb] local BLAST server

Hi Rob,

The BLAST nr database (fasta format) can be downloaded from the NCBI ftp:
ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
As I remember, it is the nr.gz file. When unzipped, the file is called nr.

According to BLAST the nr database does contain PDB entries.
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide

It is significantly larger than the PDB data file you are currently using.
You might consider extracting all the PDB sequences from it so that you do not
need to search through all the non-PDB sequences.

Zhijie


