Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-28 Thread Paul Emsley



>  PDB files have no bond information,

This is not true. The chemistry is specified in the Chemical Component Dictionary and looked up by the
residue identifier (so it's a reference to a chemical description rather than something embedded in the file).


https://www.wwpdb.org/data/ccd

https://github.com/pdbeurope/ccdutils
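
To illustrate looking bonds up by residue identifier instead of by distance, here is a minimal sketch. The `TEMPLATES` table is hand-written for this example and covers only the heavy atoms of threonine; a real workflow would take the bond table from the Chemical Component Dictionary entry. The function name, variable names and serial numbers are assumptions made for illustration and do not come from any library.

```python
# Hypothetical, hand-written template: heavy-atom bonds of threonine (THR)
# as (atom name, atom name, bond order). Real templates would come from the
# Chemical Component Dictionary rather than being typed in by hand.
TEMPLATES = {
    "THR": [
        ("N", "CA", 1),
        ("CA", "C", 1),
        ("C", "O", 2),
        ("CA", "CB", 1),
        ("CB", "OG1", 1),
        ("CB", "CG2", 1),
    ],
}


def bonds_from_template(res_name, atom_serials):
    """Map template bonds onto atom serial numbers for one residue.

    atom_serials maps atom name -> serial number for the atoms present
    in the residue. Bonds to atoms missing from the model are skipped.
    """
    bonds = []
    for name_a, name_b, order in TEMPLATES[res_name]:
        if name_a in atom_serials and name_b in atom_serials:
            bonds.append((atom_serials[name_a], atom_serials[name_b], order))
    return bonds


# Illustrative serial numbers, not taken from 3UDN.
thr_atoms = {"N": 10, "CA": 11, "C": 12, "O": 13, "CB": 14, "OG1": 15, "CG2": 16}
thr_bonds = bonds_from_template("THR", thr_atoms)
print(thr_bonds)
```

Because the bonds and their orders come from the template, atoms that are modelled too close together cannot introduce extra bonds.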

Paul.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss







Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Francois Berenger

On 27/09/2021 19:22, Lewis Martin wrote:

> Very interesting - thank you Francois! PDB-REDO does the trick:
>
> import requests
> from rdkit import Chem
>
> def getPDB(code):
>     out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
>     return out.content
>
> pdb_string = getPDB('3udn')
> Chem.MolFromPDBBlock(pdb_string)
>
> I think this solves it for me, but if anyone knows how to infer
> correct bonding information without relying on distances, I'd love to
> hear it too! So far I've noticed that Parmed and PDBFixer infer
> correct bonds, but they don't determine bond orders, so it's difficult
> to port the molecule into RDKit.


I just remember one paper; it might give you an entry point into the
scientific literature:

Determination of molecular topology and atomic hybridization states from heavy atom coordinates

Elaine C. Meng, Richard A. Lewis
https://doi.org/10.1002/jcc.540120716

Regards,
F.




Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Maciek Wójcikowski
Hi Lewis,

You can try PreparePDBMol in ODDT
https://github.com/oddt/oddt/blob/master/oddt/toolkits/extras/rdkit/fixer.py#L623-L669
which we used for PLEC model training; PDBFixer didn't work for us
either. Note that once you have the correct bonding, you can disable
automatic bonding in RDKit using proximityBonding=False.
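
For the bonding to survive in a PDB block, it has to be written out explicitly. Here is a minimal sketch of that step: it turns a known bond list into CONECT records that could be appended to the block before it is read with proximityBonding=False. The helper name `conect_records` and the serial numbers are assumptions for illustration. The layout follows the fixed-width CONECT format (record name in columns 1-6, then 5-character serial fields, at most four bonded atoms per line).

```python
from collections import defaultdict


def conect_records(bonds):
    """Return CONECT lines for a list of (serial_a, serial_b) bonds."""
    neighbours = defaultdict(list)
    for a, b in bonds:
        # Each bond is listed from both of its atoms.
        neighbours[a].append(b)
        neighbours[b].append(a)

    lines = []
    for serial in sorted(neighbours):
        partners = sorted(neighbours[serial])
        # A CONECT line holds at most four bonded atoms; continue on a new line.
        for i in range(0, len(partners), 4):
            chunk = partners[i:i + 4]
            lines.append("CONECT" + f"{serial:5d}" + "".join(f"{p:5d}" for p in chunk))
    return lines


# Illustrative serial numbers only.
example = conect_records([(11, 12), (12, 13), (11, 14)])
print("\n".join(example))
```

The resulting lines would go before the END record of the PDB block.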

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl




Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Lewis Martin
Very interesting - thank you Francois! PDB-REDO does the trick:

import requests
from rdkit import Chem

def getPDB(code):
    # Fetch the re-refined model for this entry from PDB-REDO.
    out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
    return out.content

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

I think this solves it for me, but if anyone knows how to infer correct
bonding information without relying on distances, I'd love to hear it too!
So far I've noticed that Parmed and PDBFixer infer correct bonds, but they
don't determine bond orders, so it's difficult to port the molecule into
RDKit.

Cheers
Lewis





Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Francois Berenger

Hi Lewis,

Just an idea: you might try to load your PDB in UCSF Chimera, then
save it as a mol2 or sdf file.
Then, try to read this sdf file from rdkit.

Another idea: try to get your pdb file through the pdbredo service.
https://pdb-redo.eu/
They might have fixed a few things; maybe this PDB will read better in 
rdkit.


Regards,
F.



[Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-26 Thread Lewis Martin
Hi RDKit,
While parsing proteins from the PDB with RDKit, I've come across situations
where the distance-based bond determination leads to 'incorrect' bonds
between atoms that are erroneously too close together. PDB files have no
bond information, so it's not really 'incorrect' (rather the model
coordinates are off), but the bonds are nonphysical - and it means the Mol
objects won't sanitize.
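
As a rough illustration of why a distance rule creates such bonds, here is a simplified sketch of distance-based bond perception. It is not RDKit's actual implementation: the radii are approximate covalent radii, the tolerance is an assumed value, and the coordinates are made up.

```python
import math

# Approximate covalent radii in angstroms; illustrative values only.
COVALENT_RADIUS = {"C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.45  # assumed slack added to the sum of the two radii


def perceive_bonds(atoms):
    """Bond every pair closer than the sum of covalent radii plus TOLERANCE.

    atoms is a list of (label, element, (x, y, z)) tuples.
    """
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            label_i, elem_i, xyz_i = atoms[i]
            label_j, elem_j, xyz_j = atoms[j]
            cutoff = COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j] + TOLERANCE
            if math.dist(xyz_i, xyz_j) < cutoff:
                bonds.append((label_i, label_j))
    return bonds


# Made-up coordinates: a carbonyl C=O plus a side-chain carbon modelled
# only 1.6 angstroms from the oxygen.
atoms = [
    ("TYR:C", "C", (0.0, 0.0, 0.0)),
    ("TYR:O", "O", (1.23, 0.0, 0.0)),
    ("THR:CG2", "C", (1.23, 1.6, 0.0)),
]
bonds = perceive_bonds(atoms)
print(bonds)
```

The rule has no notion of residues or valence, so the oxygen ends up with a second bond purely because of the short distance.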

Here's an example:

import requests
from io import BytesIO
import gzip
from rdkit import Chem

def getPDB(code):
    # Download the gzipped first biological assembly from RCSB.
    out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
    binary_stream = BytesIO(out.content)
    # Decompress in memory and return the PDB block.
    return gzip.open(binary_stream).read()

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

Error is:

RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is
greater than permitted

This is caused by the threonine 72 sidechain being too close to the TYR71
backbone carbonyl oxygen (this can be visualized at
https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction=09B , TYR71
is near the ligand).
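
One way to locate such contacts before handing the file to RDKit is to scan the ATOM records directly. The sketch below is only an illustration: the 2.0 angstrom cutoff, the helper names and the four-atom demo input are assumptions, and the peptide-bond check assumes residues appear in sequence order.

```python
import math


def atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Format a fixed-width ATOM record (helper for building the demo input)."""
    return (
        f"ATOM  {serial:5d}  {name:<3s} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
    )


def parse_atoms(pdb_block):
    """Read atom name, residue and coordinates from ATOM/HETATM columns."""
    atoms = []
    for line in pdb_block.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "res": (line[21], int(line[22:26]), line[17:20].strip()),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def label(atom):
    """Build a short label such as 'TYR71:O'."""
    _, res_seq, res_name = atom["res"]
    return f"{res_name}{res_seq}:{atom['name']}"


def close_contacts(pdb_block, cutoff=2.0):
    """List atom pairs from different residues that are closer than cutoff."""
    atoms = parse_atoms(pdb_block)
    hits = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a["res"] == b["res"]:
                continue
            (chain_a, seq_a, _), (chain_b, seq_b, _) = a["res"], b["res"]
            # The peptide bond between consecutive residues is expected.
            if (chain_a == chain_b and seq_b - seq_a == 1
                    and a["name"] == "C" and b["name"] == "N"):
                continue
            d = math.dist(a["xyz"], b["xyz"])
            if d < cutoff:
                hits.append((label(a), label(b), round(d, 2)))
    return hits


# Made-up coordinates, not taken from 3UDN.
demo = "\n".join([
    atom_line(1, "C", "TYR", "A", 71, 0.0, 0.0, 0.0, "C"),
    atom_line(2, "O", "TYR", "A", 71, 1.23, 0.0, 0.0, "O"),
    atom_line(3, "N", "THR", "A", 72, -0.6, 1.18, 0.0, "N"),
    atom_line(4, "CG2", "THR", "A", 72, 1.23, 1.7, 0.0, "C"),
])
hits = close_contacts(demo)
print(hits)
```

Pairs reported this way are candidates for the spurious bonds; whether a given contact is a real covalent link (a disulfide, for instance) still needs a chemical check.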

Does anyone know how to avoid this to create a Chem.Mol? I've tried using
Parmed and PDBFixer, since they use residue templates to generate the
correct bonding topology, but they don't write CONECT records or SDFs, so
the bonds are still lost to RDKit.


Thanks for your time!
Lewis
PS - why not just use PDBFixer? I'm trying to calculate atom invariants
using RDKit's morgan fingerprinter implementation, so ultimately I want a
sanitized Mol object