Re: [ccp4bb] pdb sequence search

2012-06-23 Thread Vandu Murugan
Hi Ed,
 What about submitting the UniProt accession number of your
protein to the PDB? As you know, this will list all the entries
that contain your protein sequence.

-Vandu murugan..




On 6/23/12, Ed Pozharski epozh...@umaryland.edu wrote:
 Silly question.

 Say I want to find every structure in the PDB with the exact sequence or
 with perhaps 1-2 mutations.  I know of two ways of doing this.

 1. Go to NCBI BLAST and run the sequence against the PDB subset.  The
 resulting list will have identities listed, so manual parsing is doable
 if there aren't too many hits.

 2. PDB and PDBe both have the search by sequence features.  Trouble is
 the default E value seems to be tailored to poor sequence identity
 (which makes sense if you are looking for potential MR models).  Sure, I can
 reduce the target E value, but it's a little cumbersome and I have no
 idea what the target level should be so that I don't get any 50%
 identical sequences yet not miss single/double mutants.

 Wouldn't it be nice if one could use the sequence identity cutoff/query
 coverage instead?  Much more comprehensible than the E-value.  Is there
 a search engine that does that? Seems like a fairly common need, and
 perhaps I just can't find it on the PDB website.

 Thanks in advance for any suggestions,

 Ed.

 --
 Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
  Julian, King of Lemurs



Re: [ccp4bb] pdb sequence search

2012-06-23 Thread Robbie Joosten

Hi Ed,

If you are looking for a specific protein, why not get all PDB files
with a DBREF record pointing at the UniProt record of the protein you want? You
can do a simple text search in the PDB, e.g. 'MYG_PHYCA'.

Cheers,
Robbie
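
For anyone who would rather check DBREF records in files they already have on disk, here is a minimal Python sketch. It assumes the fixed-column DBREF layout of the PDB file format (database name in columns 27-32, accession in 34-41, database ID code in 43-54). The function name uniprot_refs is made up for illustration, and DBREF1/DBREF2 records (used for long accessions) are deliberately ignored.

```python
def uniprot_refs(pdb_lines, db_id_code):
    """Return (pdb_id, chain, accession) for DBREF records that point at
    the given UniProt ID code (e.g. 'MYG_PHYCA').

    Assumes the fixed-column DBREF layout; DBREF1/DBREF2 are skipped.
    """
    found = []
    for line in pdb_lines:
        # The trailing space excludes DBREF1 and DBREF2 records.
        if not line.startswith("DBREF "):
            continue
        database = line[26:32].strip()   # columns 27-32
        accession = line[33:41].strip()  # columns 34-41
        id_code = line[42:54].strip()    # columns 43-54
        if database == "UNP" and id_code == db_id_code:
            found.append((line[7:11], line[12], accession))
    return found
```

Typical use would be something like uniprot_refs(open("some_entry.pdb"), "MYG_PHYCA"), where the file name is a placeholder for a local PDB-format file.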
  Date: Fri, 22 Jun 2012 22:39:12 -0400
 From: epozh...@umaryland.edu
 Subject: Re: [ccp4bb] pdb sequence search
 To: CCP4BB@JISCMAIL.AC.UK
 
 Tim,
 
 
  I did not understand your objection against solution 1 - is it because
  it is not automated? You can sort the results by max. Ident so that
  you can scroll down to the limit you set yourself.
 
 More that it does not generate a list of PDB IDs.  What I want to do is 
 to find every structure of a particular protein and line them all up.  I 
 am not saying it's not doable with option 1, it's just not too convenient.
 
  Why do you think an identity cut-off is a good criterion? I usually
  cut by E-value because I assume the developers of blast know what they
  are doing and I have the impression they consider the E-value a better
  criterion than the max. Ident.
 Because I want all the structures of a particular protein itself, not 
 its homologues.  I just went through several cycles of reducing E-value 
 down to 1e-100, and I still get one hit included at 88% identity.  
 Setting E-value cutoff to 0 doesn't work, it just returns them all.  
 Well, thanks to you I now see how to figure out the cutoff - the results 
 are sorted by E-values and list them, so I can just go to the first 
 non-identical hit and use a slightly smaller number.  It's just that 
 sequence identity is easier for me to interpret and it's (emotionally) 
 easier to select a cutoff at, say, no more than 5 mutations rather than 
 E-value of 10e-150.
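
 One way to get identity-based filtering out of option 1 is to save the
 BLAST hits in tabular form and filter them locally. The sketch below
 assumes the standard 12-column tabular output (qseqid, sseqid, pident,
 length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).
 The function name, the default thresholds and the query_length argument
 are my own choices for illustration, not anything the PDB or NCBI sites
 provide.

```python
import csv
import io


def filter_blast_hits(tabular_text, query_length, min_identity=98.0,
                      min_coverage=0.95, max_mismatches=5):
    """Return subject IDs of hits that look like the same protein.

    tabular_text is BLAST tabular output with the 12 default columns.
    Coverage is the aligned query span divided by query_length.
    Gaps are not counted in the mismatch column, but they lower pident.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        sseqid = row[1]
        pident = float(row[2])
        mismatch = int(row[4])
        qstart, qend = int(row[6]), int(row[7])
        coverage = (qend - qstart + 1) / query_length
        if (pident >= min_identity and coverage >= min_coverage
                and mismatch <= max_mismatches and sseqid not in kept):
            kept.append(sseqid)
    return kept
```

 The subject IDs come back exactly as BLAST wrote them, so turning them
 into bare PDB IDs depends on how the database you searched names its
 sequences.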
 
 Cheers,
 
 Ed.
 
 Cheers
 
 
 
 -- 
 Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
  Julian, King of Lemurs
  

Re: [ccp4bb] pdb sequence search

2012-06-23 Thread sameer
Hi,
  The up-to-date list of mappings between the PDB and the UniProt sequence
database is available at:

ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/csv/pdb_chain_uniprot.csv

This gives the mapping between PDB chains and UniProt accession numbers. It
will allow you to find all PDB entries for a particular UniProt accession
number.
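
As a rough illustration of how the mapping file can be used, the Python sketch below collects the PDB chains listed for one accession. It assumes the file has a leading comment line starting with '#' followed by a header row with columns named PDB, CHAIN and SP_PRIMARY; please check the header of the file you download, since that layout is an assumption here. The function name is made up for the example.

```python
import csv


def pdb_chains_for_accession(lines, accession):
    """Return sorted (pdb_id, chain) pairs mapped to a UniProt accession.

    lines is any iterable of text lines from pdb_chain_uniprot.csv.
    Assumed columns: PDB, CHAIN, SP_PRIMARY (verify against your copy).
    """
    # Drop comment lines so that DictReader sees the header row first.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    hits = set()
    for row in csv.DictReader(data_lines):
        if row["SP_PRIMARY"] == accession:
            hits.add((row["PDB"].lower(), row["CHAIN"]))
    return sorted(hits)
```

With a local copy of the file, something like pdb_chains_for_accession(open("pdb_chain_uniprot.csv"), "P02185") would give the list of chains to fetch and superpose.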

To answer the original question about sequence search, the following PDBe
service:
pdbe.org/fasta

allows you to set a % identity value and search against PDB sequences.

cheers,
Sameer Velankar
PDBe




Re: [ccp4bb] pdb sequence search

2012-06-23 Thread Gerard DVD Kleywegt
Because I want all the structures of a particular protein itself, not its 
homologues.  I just went through several cycles of reducing E-value down to


If you know the UniProt accession code of your protein, then UniPDB is your 
friend - pdbe.org/unipdb


If not, try pdbe.org/fasta, where you can supply the sequence and a percentage 
sequence identity (SI) cut-off.


--Gerard