Hi RDKit,
While parsing proteins from the PDB with RDKit, I've come across situations
where the distance-based bond determination creates 'incorrect' bonds
between atoms that are modelled too close together. PDB files carry no
bond information for standard residues, so the perception isn't really
'incorrect' (rather, the model coordinates are off), but the resulting
bonds are nonphysical, and it means the Mol objects won't sanitize.

Here's an example:

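import gzip

import requests
from rdkit import Chem

def getPDB(code):
    # Download the gzipped first biological assembly from RCSB
    out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
    out.raise_for_status()  # fail early on a bad PDB code or network error
    # Decompress and decode so RDKit receives a text string, not bytes
    return gzip.decompress(out.content).decode()

pdb_string = getPDB('3udn')
mol = Chem.MolFromPDBBlock(pdb_string)  # sanitization fails; error below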

Error is:

RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is greater than permitted

This is caused by the THR72 sidechain being too close to the TYR71
backbone carbonyl oxygen (this can be visualized at
https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction&sele=09B , TYR71
is near the ligand).
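
Offending atoms like this one can also be located programmatically rather
than by eye. The following is a minimal sketch, assuming the molecule was
parsed with sanitize=False so that a Mol object exists to inspect. The
helper names (atom_label, find_valence_problems) are made up for
illustration; Chem.DetectChemistryProblems and the PDB residue info
accessors are existing RDKit calls.

```python
from rdkit import Chem


def atom_label(atom):
    """Return e.g. 'TYR71:O' from PDB residue info, or 'O5' if none is set."""
    info = atom.GetPDBResidueInfo()
    if info is None:
        return f"{atom.GetSymbol()}{atom.GetIdx()}"
    res = f"{info.GetResidueName().strip()}{info.GetResidueNumber()}"
    return f"{res}:{info.GetName().strip()}"


def find_valence_problems(mol):
    """List atoms that fail the valence check, together with their neighbours.

    `mol` is expected to be unsanitized, for example the result of
    Chem.MolFromPDBBlock(pdb_string, sanitize=False).
    """
    report = []
    for problem in Chem.DetectChemistryProblems(mol):
        # Other problem types (e.g. kekulization) are ignored in this sketch
        if problem.GetType() != "AtomValenceException":
            continue
        atom = mol.GetAtomWithIdx(problem.GetAtomIdx())
        report.append({
            "atom_idx": atom.GetIdx(),
            "atom": atom_label(atom),
            "neighbors": [atom_label(nbr) for nbr in atom.GetNeighbors()],
        })
    return report


# Usage with the block downloaded above:
# mol = Chem.MolFromPDBBlock(pdb_string, sanitize=False)
# for entry in find_valence_problems(mol):
#     print(entry)
```

Each entry lists the neighbours of the over-valent atom, so a bond to an
atom in a different residue stands out as the likely proximity artifact.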

Does anyone know how to avoid this so that I can create a Chem.Mol? I've
tried using ParmEd and PDBFixer, since they use residue templates to
generate the correct bonding topology, but they don't write CONECT records
or SDFs, so the bonds are still lost to RDKit.
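
One possible workaround that stays inside RDKit, offered as a sketch rather
than an established recipe: parse without sanitizing, delete bonds that
join atoms in different residues unless they look like a legitimate link,
then sanitize. The rule used here is an assumption: only peptide bonds
(C to N) and disulfide bridges (SG to SG) are kept. It would also remove
genuine covalent ligand or glycan links, and it does nothing about spurious
bonds inside a single residue. The names ALLOWED_LINKS and
drop_suspect_bonds are hypothetical.

```python
from rdkit import Chem

# Assumption: the only inter-residue bonds worth keeping are the peptide
# bond (C-N) and disulfide bridges (SG-SG).
ALLOWED_LINKS = {frozenset(("C", "N")), frozenset(("SG",))}


def _residue_key(info):
    """Identify a residue by chain, residue number and insertion code."""
    return (info.GetChainId(), info.GetResidueNumber(), info.GetInsertionCode())


def drop_suspect_bonds(mol):
    """Return (new_mol, removed_pairs) with suspect inter-residue bonds removed."""
    to_remove = []
    for bond in mol.GetBonds():
        a, b = bond.GetBeginAtom(), bond.GetEndAtom()
        info_a, info_b = a.GetPDBResidueInfo(), b.GetPDBResidueInfo()
        if info_a is None or info_b is None:
            continue  # no PDB annotation, leave the bond alone
        if _residue_key(info_a) == _residue_key(info_b):
            continue  # intra-residue bond, not handled by this sketch
        names = frozenset((info_a.GetName().strip(), info_b.GetName().strip()))
        if names not in ALLOWED_LINKS:
            to_remove.append(tuple(sorted((a.GetIdx(), b.GetIdx()))))

    # Edit a copy so the caller's molecule is untouched
    rw = Chem.RWMol(mol)
    for i, j in to_remove:
        rw.RemoveBond(i, j)
    return rw.GetMol(), to_remove


# Usage with the block downloaded above:
# mol = Chem.MolFromPDBBlock(pdb_string, sanitize=False)
# mol, removed = drop_suspect_bonds(mol)
# Chem.SanitizeMol(mol)  # may still raise if other problems remain
```

Returning the removed index pairs makes it easy to check that only the
expected contacts were cut before trusting the sanitized molecule.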


Thanks for your time!
Lewis
PS - why not just use PDBFixer? I'm trying to calculate atom invariants
using RDKit's Morgan fingerprint implementation, so ultimately I need a
sanitized Mol object.
