Hi RDKit,

While parsing proteins from the PDB with RDKit, I've come across situations where the distance-based bond determination produces 'incorrect' bonds between atoms that are modelled too close together. PDB files carry no explicit bond information for standard residues, so the bonds are not really 'incorrect' (rather, the model coordinates are off), but they are nonphysical, and they mean the Mol objects won't sanitize.
Here's an example:

    import gzip
    from io import BytesIO

    import requests
    from rdkit import Chem


    def getPDB(code):
        # Download the gzipped biological assembly from RCSB and decompress it in memory
        out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
        binary_stream = BytesIO(out.content)
        return gzip.open(binary_stream).read()


    pdb_string = getPDB('3udn')
    Chem.MolFromPDBBlock(pdb_string)

Error is:

    RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is greater than permitted

This is caused by the threonine 72 sidechain being too close to the TYR71 backbone carbonyl oxygen (this can be visualized at https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction&sele=09B ; TYR71 is near the ligand).

Does anyone know how to avoid this to create a Chem.Mol? I've tried using Parmed and PDBFixer, since they use residue templates to generate the correct bonding topology, but they don't write CONECT records or SDFs, so the bonds are still lost to RDKit.

Thanks for your time!
Lewis

PS - why not just use PDBFixer? I'm trying to calculate atom invariants using RDKit's Morgan fingerprinter implementation, so ultimately I want a sanitized Mol object.
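For anyone who wants to locate contacts like the TYR71/THR72 one before handing the file to RDKit, here is a minimal stdlib-only sketch. It is not RDKit's bonding rule: the 1.9 Å cutoff, the helper names `parse_atoms` and `close_contacts`, and the name-based exemptions for peptide (C/N) and disulfide (SG/SG) bonds are all assumptions made for illustration. It also assumes fixed-column ATOM/HETATM records and heavy atoms only; if hydrogens are present, ordinary hydrogen bonds may show up as hits.

```python
import math


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                # chain ID, residue number and insertion code identify a residue
                "res": (line[21], int(line[22:26]), line[26:27]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def close_contacts(atoms, cutoff=1.9):
    """List inter-residue atom pairs closer than `cutoff` (in angstroms).

    The cutoff is an assumed value, not the threshold RDKit uses.
    Pairs named C/N or SG/SG are skipped as likely peptide or disulfide
    bonds; this check is name-based only and does not verify adjacency.
    """
    hits = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a["res"] == b["res"]:
                continue  # intra-residue bonds are expected
            names = {a["name"], b["name"]}
            if names == {"C", "N"} or names == {"SG"}:
                continue
            d = math.dist(a["xyz"], b["xyz"])
            if d < cutoff:
                label_a = f'{a["resname"]}{a["res"][1]}:{a["name"]}'
                label_b = f'{b["resname"]}{b["res"][1]}:{b["name"]}'
                hits.append((label_a, label_b, round(d, 2)))
    return hits
```

With the example above, this would be called as `close_contacts(parse_atoms(pdb_string.decode()))`. The pairwise loop is quadratic in the number of atoms, which is tolerable for a single chain but slow for large assemblies.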