Dear Adrian and Markus,

On Fri, May 23, 2008 at 11:41 AM, Adrian Schreyer <[email protected]> wrote:
>
> I do not use RDKit for handling protein structures (I do not think
> this is supported);

Indeed it's not. The system cannot currently represent macromolecules
in anything like an efficient manner.

> instead, I use Bio.PDB to parse PDB files. This is
> one of the best PDB parsers (in my opinion), and better than the Open
> Babel or OEChem implementations. Unfortunately, there is no software I
> am aware of which can handle small molecules as well as protein
> structures reliably, i.e. keep track of disordered atoms, insertion
> codes, ligand identifiers, atom names, alternative atom locations etc.
>
> There is an implementation of the KDTree algorithm in Bio.PDB, which I
> use to get all contacts between ligand and amino acid atoms. I am
> working on a paper at the moment which describes the methods I used to
> get the SIFts from protein-ligand complexes. I intend to make my
> protein-ligand interaction database including the API publicly
> available (as database dumps); maybe that will be helpful.

I'd be interested in this and the paper. SIFts (or PLIFs as CCG calls
them) are a very interesting method.

> Bio.PDB uses the Biopython license which is extremely liberal; maybe
> there is a way to use it as the foundation for a PDBMolSupplier and a
> RDBioMol class...okay I am dreaming a bit here! ;)

I'm not sure that it's too extreme of a dream. Since most of the RDKit
functionality doesn't make sense for proteins and most protein
functionality doesn't make sense for small molecules, it would
primarily be a matter of having some code to extract ligand information
from the Biopython complex and convert that into an RDK molecule,
probably via a mol block. If you store the PDB atom ID <-> RDK atom
number information, you could then map information from small-molecule
operations (e.g. substructure searches) back onto ligand atoms in the
complex.

or something
-greg
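
A rough sketch of the "ligand -> mol block, keep the atom mapping" idea
above. To stay dependency-free it reads the fixed-column HETATM/CONECT
records by hand instead of going through Bio.PDB, and it does not call
RDKit at all. The names parse_ligand, to_mol_block and LigandAtom are
made up for illustration, the EOH coordinates are toy data, and every
bond is written as a single bond because CONECT records carry no
reliable bond orders. Treat it as a starting point under those
assumptions, not as a finished converter.

```python
from dataclasses import dataclass


@dataclass
class LigandAtom:
    serial: int   # PDB atom serial number (columns 7-11)
    name: str     # PDB atom name, e.g. "C1"
    element: str
    x: float
    y: float
    z: float


def parse_ligand(pdb_text, resname, chain):
    """Collect the HETATM atoms of one ligand and the CONECT bonds among them."""
    atoms = []
    for line in pdb_text.splitlines():
        if (line.startswith("HETATM") and line[17:20].strip() == resname
                and line[21:22] == chain):
            name = line[12:16].strip()
            # Element columns 77-78; crude fallback to the first letter of the name.
            element = (line[76:78].strip() or name[:1]).capitalize()
            atoms.append(LigandAtom(int(line[6:11]), name, element,
                                    float(line[30:38]), float(line[38:46]),
                                    float(line[46:54])))
    serials = {a.serial for a in atoms}
    bonds = set()
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        fields = [line[i:i + 5].strip() for i in range(6, len(line), 5)]
        nums = [int(f) for f in fields if f]
        for other in nums[1:]:
            # Keep only bonds inside the ligand, stored once as (low, high).
            if nums[0] in serials and other in serials and other != nums[0]:
                bonds.add((min(nums[0], other), max(nums[0], other)))
    return atoms, bonds


def to_mol_block(atoms, bonds, title="ligand"):
    """Write a V2000 mol block; return it with a serial -> 0-based index map."""
    index_of = {a.serial: i for i, a in enumerate(atoms)}
    lines = [title, "", ""]
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for a in atoms:
        lines.append(f"{a.x:10.4f}{a.y:10.4f}{a.z:10.4f} {a.element:<3s} 0"
                     + "  0" * 11)
    for s1, s2 in sorted(bonds):
        # ASSUMPTION: bond order 1 everywhere; real orders must come from elsewhere.
        lines.append(f"{index_of[s1] + 1:3d}{index_of[s2] + 1:3d}  1  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n", index_of


# Toy input: one protein atom, an ethanol ligand (EOH) and a water.
PDB_TEXT = """\
ATOM      1  N   MET A   1      11.000  21.000  31.000  1.00 20.00           N
HETATM 1201  C1  EOH A 501      10.000  20.000  30.000  1.00 15.00           C
HETATM 1202  C2  EOH A 501      11.500  20.000  30.000  1.00 15.00           C
HETATM 1203  O   EOH A 501      12.000  21.300  30.000  1.00 15.00           O
HETATM 1300  O   HOH A 601      15.000  25.000  35.000  1.00 30.00           O
CONECT 1201 1202
CONECT 1202 1201 1203
CONECT 1203 1202
"""

atoms, bonds = parse_ligand(PDB_TEXT, resname="EOH", chain="A")
mol_block, index_of = to_mol_block(atoms, bonds, title="EOH")
idx_to_serial = {i: s for s, i in index_of.items()}

# Stand-in for the atom indices a substructure match would return.
match = (1, 2)
matched_serials = [idx_to_serial[i] for i in match]

print(mol_block)
print(matched_serials)
```

If RDKit is available, the string should be readable with
Chem.MolFromMolBlock(mol_block, removeHs=False). The mol block atom
order becomes the RDKit atom order, so indices returned by something
like mol.GetSubstructMatch(query) can be looked up in idx_to_serial to
get back to PDB atom serials. Keeping removeHs=False avoids index
shifts if the ligand has explicit hydrogens.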

