Re: [PyMOL] showing sequence conservation on the surface

Robert Campbell Fri, 21 May 2004 12:09:32 -0700

Hi Camille and Daniel,

I heard my name mentioned, so I'll join in!

* Daniel Rigden <drig...@liverpool.ac.uk> [2004-05-21 16:43] wrote:
> Hi Camille
> 
> The Espript server is the easiest way I know.  Go here
> 
> http://prodes.toulouse.inra.fr/ESPript/cgi-bin/ESPript.cgi
> 
> choose Execute then Expert mode.  Supply an alignment and a
> supplementary pdb file then execute.  You get back an annotated
> alignment + the pdb file with a conservation score in the b-factor
> column.  The only trouble can be making sure that the top sequence in
> your alignment matches exactly the sequence of the structure (no missing
> loops etc).  You can read this new pdb file into PyMOL and colour easily
> with Robert Campbell's script color_b.py which you get here
> 
> http://adelie.biochem.queensu.ca/~rlc/work/pymol/
> 
> Good luck
> 
> Daniel
> 
> On Fri, 2004-05-21 at 15:28, cami...@mrc-lmb.cam.ac.uk wrote:
> > Hello PyMol community!
> > 
> > is there any way to display sequence conservation on the surface of a 
> > protein?
> > i.e. to use the info I have in a sequence alignment.
> > 
> > do I have to do this by hand?
> > 
> > Thanks,
> >     Camille

I have a little Python routine for calculating the sequence variability
(a measure that has apparently been commonly used in the immunoglobulin
community -- Thanks for the tip Dave!), but it requires that you already
have your sequences in lists (or strings) and already aligned.  My
seq_convert.py routines could be used for reading files with these
individual sequences, or you could use Biopython to read, say, a FASTA
alignment file.
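
For anyone without seq_convert.py or Biopython to hand, here is a minimal
plain-Python sketch of reading an already-aligned FASTA file into a list
of strings.  It is not what seq_convert.py does internally; the function
name read_fasta_alignment is made up for illustration, and it assumes
every record has the same aligned length with '-' for gaps.  Biopython's
Bio.AlignIO.read(handle, "fasta") is the more complete option.

```python
def read_fasta_alignment(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples.

    Hypothetical helper for illustration only. Assumes the sequences are
    already aligned, so every record must end up the same length.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([line[1:].strip(), []])  # new record: name, chunks
        elif records:
            records[-1][1].append(line)  # sequence may span several lines
        else:
            raise ValueError("sequence data found before any '>' header")
    result = [(name, "".join(chunks)) for name, chunks in records]
    if len({len(seq) for _, seq in result}) > 1:
        raise ValueError("sequences are not all the same aligned length")
    return result


example = """>seq1
SGKSG
MDVAI
>seq2
AKCIGPDDAL
>seq3
ARCS-MDVAL
"""
alignment = read_fasta_alignment(example)
sequences = [seq for _, seq in alignment]
```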

Variability is defined as:

(Number of different residue types at a location)/(Frequency of the most
common residue type at that location)

Ok, so I quickly added the sequence reading capability and added
variability.py to my web site (where you can find seq_convert.py also):

  http://adelie.biochem.queensu.ca/~rlc/work/scripts/

Create three plain text sequence files (really short examples here!)
that look like:

SGKSGMDVAI

AKCIGPDDAL

ARCS-MDVAL

Then run it with:

  variability.py test1.seq test2.seq test3.seq 

giving:
    1 S A A   2   2   0.667  3.000
    2 G K R   3   1   0.333  9.000
    3 K C C   2   2   0.667  3.000
    4 S I S   2   2   0.667  3.000
    5 G G -   2   2   0.667  3.000
    6 M P M   2   2   0.667  3.000
    7 D D D   1   3   1.000  1.000
    8 V D V   2   2   0.667  3.000
    9 A A A   1   3   1.000  1.000
   10 I L L   2   2   0.667  3.000

That is, the sequences written in vertical columns followed by:

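the number of different residue types found at the position
the number of occurrences of the most common residue type at that position
the frequency of the most common residue type (occurrences/number of sequences)
the variability (number of different types/frequency of the most common)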
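
The calculation behind those columns can be sketched in a few lines of
Python.  This is a re-implementation from the definition given earlier,
not the code in variability.py itself, and the function name is only
illustrative.  It counts a gap ('-') as its own residue type, which is
what the row for position 5 in the table suggests.

```python
from collections import Counter


def variability(sequences):
    """Return (n_types, max_count, max_freq, variability) per alignment column.

    Sketch based on the definition in the text: variability is the number
    of different residue types divided by the frequency of the most common.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    n_seqs = len(sequences)
    rows = []
    for column in zip(*sequences):  # walk the alignment column by column
        counts = Counter(column)  # a gap '-' is counted like any other type
        n_types = len(counts)
        max_count = max(counts.values())
        max_freq = max_count / n_seqs
        rows.append((n_types, max_count, max_freq, n_types / max_freq))
    return rows


seqs = ["SGKSGMDVAI", "AKCIGPDDAL", "ARCS-MDVAL"]
rows = variability(seqs)
for pos, (column, row) in enumerate(zip(zip(*seqs), rows), start=1):
    n_types, max_count, max_freq, var = row
    print(f"{pos:5d} {' '.join(column)} {n_types:3d} {max_count:3d}"
          f"   {max_freq:.3f}  {var:.3f}")
```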

You would then need to extract the variability data (e.g. with awk, or
modify the script to write only that) and modify your B-factors within
PyMOL using my data2bfactor.py script followed by colouring on B-factor
either with the spectrum command or my color_b.py script.
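
If you would rather do that step outside PyMOL, the following is a rough
standalone sketch (it is not data2bfactor.py and does not use its input
format).  It pulls the first and last columns out of the table above and
writes the values into the B-factor field (columns 61-66) of the ATOM and
HETATM records of a fixed-column PDB file.  Both function names are made
up for illustration, and it assumes a single chain whose residue numbers
are the same as the alignment positions, which will not hold if the
numbering is offset or the top sequence contains gaps.

```python
def parse_variability_output(text):
    """Map alignment position -> variability from the printed table.

    Assumes the position is the first field and the variability the last.
    """
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        values[int(fields[0])] = float(fields[-1])
    return values


def set_bfactors(pdb_lines, values, default=0.0):
    """Return PDB lines with the B-factor column replaced by values[resnum].

    Assumption: residue numbers (columns 23-26) match the keys of `values`.
    Values of 1000 or more would overflow the 6-character field.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            resnum = int(line[22:26])  # resSeq, columns 23-26
            b = values.get(resnum, default)
            line = line[:60] + f"{b:6.2f}" + line[66:]  # tempFactor, 61-66
        out.append(line)
    return out
```

You would read the PDB file with readlines(), pass the result through
set_bfactors(), write it back out under a new name and load that into
PyMOL before colouring on B-factor.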

This was kind of a quick hack, so it isn't as user-friendly as it could
be! It would be ideal to get the alignment out of PyMOL that is created
with the align command but I haven't found a way to get at it.

Hope that helps,
Rob
-- 
Robert L. Campbell, Ph.D.                         <r...@post.queensu.ca>
Senior Research Associate                            phone: 613-533-6821
Dept. of Biochemistry, Queen's University,             fax: 613-533-2497
Kingston, ON K7L 3N6  Canada       http://adelie.biochem.queensu.ca/~rlc
    PGP Fingerprint: 9B49 3D3F A489 05DC B35C  8E33 F238 A8F5 F635 C0E2
