On 27/09/2021 19:22, Lewis Martin wrote:
Very interesting - thank you Francois! PDB re-do does the trick:

import requests
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
    return out.content

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

I think this solves it for me, but if anyone knows how to infer
correct bonding information without relying on distances, I'd love to
hear it too! So far I've noticed that Parmed and PDBFixer infer
correct bonds, but they don't determine bond orders, so it's difficult
to port the molecule into RDKit.

I just remember one paper; it might give you an entry point into the
scientific literature:

Determination of molecular topology and atomic hybridization states from heavy atom coordinates
Elaine C. Meng, Richard A. Lewis
https://doi.org/10.1002/jcc.540120716

Regards,
F.

Cheers
Lewis

On Mon, Sep 27, 2021 at 5:55 PM Francois Berenger <mli...@ligand.eu>
wrote:

Hi Lewis,

Just an idea: you might try to load your PDB in UCSF Chimera, then
save it as a mol2 or sdf file.
Then, try to read this sdf file from rdkit.

Another idea: try to get your pdb file through the pdbredo service.
https://pdb-redo.eu/
They might have fixed a few things; maybe this PDB will read better
in rdkit.

Regards,
F.

On 26/09/2021 17:02, Lewis Martin wrote:
Hi RDKit,
While parsing proteins from the PDB with RDKit, I've come across
situations where the distance-based bond determination leads to
'incorrect' bonds between atoms that are erroneously too close
together. PDB files have no bond information, so it's not really
'incorrect' (rather the model coordinates are off), but the bonds
are nonphysical, and it means the Mol objects won't sanitize.

Here's an example:

import requests
from io import BytesIO
import gzip
from rdkit import Chem

def getPDB(code):
    # Download the gzipped biological assembly 1 file from RCSB and decompress it
    out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
    binary_stream = BytesIO(out.content)
    return gzip.open(binary_stream).read()

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

Error is:

RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is greater than permitted

This is caused by the threonine 72 sidechain being too close to the
TYR71 backbone carbonyl oxygen (this can be visualized at
https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction&sele=09B
where TYR71 is near the ligand).
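
To locate contacts like this one before handing a file to RDKit, the coordinates can be scanned directly. The following is only a rough sketch using the Python standard library: the function names and the 1.9 Å cutoff are assumptions made for illustration, not RDKit's actual proximity-bonding rule, and alternate locations and insertion codes are ignored.

```python
import math
from itertools import combinations

def parse_atoms(pdb_text):
    """Read heavy-atom ATOM/HETATM records using the fixed PDB columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[76:78].strip().upper() == "H":
            continue  # skip hydrogens when the element column is present
        atoms.append({
            "name": line[12:16].strip(),
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms

def close_contacts(atoms, cutoff=1.9):
    """Return (atom_a, atom_b, distance) for inter-residue pairs under cutoff.

    The cutoff (in angstroms) is an illustrative assumption.
    """
    hits = []
    for a, b in combinations(atoms, 2):
        same_chain = a["chain"] == b["chain"]
        if same_chain and a["resseq"] == b["resseq"]:
            continue  # bonds inside one residue are expected
        # The peptide bond between consecutive residues is also expected.
        if (same_chain and abs(a["resseq"] - b["resseq"]) == 1
                and {a["name"], b["name"]} == {"C", "N"}):
            continue
        d = math.dist(a["xyz"], b["xyz"])
        if d < cutoff:
            hits.append((a, b, d))
    return hits
```

Calling close_contacts(parse_atoms(pdb_string.decode())) on the downloaded file should list candidate pairs. The loop is quadratic in the number of atoms, so it is slow but workable for a single protein.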

Does anyone know how to avoid this to create a Chem.Mol? I've tried
using Parmed and PDBFixer, since they use residue templates to
generate the correct bonding topology, but they don't write CONECT
records or SDFs, so the bonds are still lost to RDKit.
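
If a bond list can be pulled out of one of those tools, CONECT records could be written by hand, since the PDB format only needs the atom serial numbers in fixed-width columns. This is a minimal sketch: conect_records is a hypothetical helper, the input is assumed to be pairs of PDB atom serial numbers obtained elsewhere (extracting them from Parmed or PDBFixer is not shown), and serials above 99999 are not handled. The records carry no bond orders, and RDKit would presumably also need proximityBonding=False in Chem.MolFromPDBBlock so that distance-based bonds are not added back.

```python
def conect_records(bonds):
    """Format CONECT lines from (serial_a, serial_b) pairs of PDB atom serials."""
    neighbours = {}
    for a, b in bonds:
        # CONECT is written from both atoms' point of view.
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    lines = []
    for serial in sorted(neighbours):
        partners = sorted(neighbours[serial])
        # The PDB format allows at most four bonded serials per CONECT line.
        for i in range(0, len(partners), 4):
            chunk = partners[i:i + 4]
            lines.append("CONECT" + f"{serial:5d}"
                         + "".join(f"{p:5d}" for p in chunk))
    return lines
```

The returned lines would be placed after the coordinate records and before the final END line of the PDB text.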

Thanks for your time!
Lewis
PS - why not just use PDBFixer? I'm trying to calculate atom
invariants using RDKit's Morgan fingerprinter implementation, so
ultimately I want a sanitized Mol object.

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

