Hi Michal/Greg,

Many thanks for your thoughts. The compounds are from PubChem Substance. I'd 
prefer to filter out these types of molecules, but this may be hard to do 
with billions of compounds...?


What would be an efficient way to parse drug-like compounds and reject 
organometallics? Clearly, checking every atom/bond is too expensive.


Best


Mike

On Mon, Oct 7, 2019 at 3:20 PM +0100, "Michal Krompiec" 
<michal.kromp...@gmail.com> wrote:

Dear Mike,
Try changing all metal-ligand bonds to "dative" or "ionic", and
standardize afterwards (but disable adjusting of implicit Hs). This
way, I was able to process (in KNIME) >99% of organometallics (incl.
metallocenes) downloaded from Reaxys.
Example snippet (which doesn't check the "directionality" of the bond, though):

from rdkit import Chem

# input_table is the pandas DataFrame supplied by the KNIME Python node
metals = {'Ti', 'Al', 'Mo', 'Ru', 'Co', 'Rh', 'Ir', 'Ni', 'Zr', 'Hf', 'W'}
# run every sanitization step except the adjustment of implicit Hs
sanitize_ops = Chem.SanitizeFlags.SANITIZE_ALL ^ Chem.SanitizeFlags.SANITIZE_ADJUSTHS
mols = input_table['Molecule']
for mol in mols:
    for bond in mol.GetBonds():
        if (bond.GetBeginAtom().GetSymbol() in metals
                or bond.GetEndAtom().GetSymbol() in metals):
            print("found metal-ligand bond")
            print("original type: " + str(bond.GetBondType()))
            bond.SetBondType(Chem.rdchem.BondType.DATIVE)
            print("changed to: " + str(bond.GetBondType()))
    # sanitize once per molecule, after all of its bonds have been changed
    try:
        Chem.SanitizeMol(mol, sanitizeOps=sanitize_ops)
    except ValueError as ve:
        print("Sanitization failed")
        print(ve)
output_table = input_table.copy()
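
If the direction of the bond matters, something along these lines might
work. This is only a sketch: the helper name to_dative is made up for
illustration, the metal list is the same short list as above, and it relies
on the RDKit convention that a dative bond points from its begin atom (the
donor) to its end atom (here, the metal). Metal-metal bonds are left alone.

```python
from rdkit import Chem

# Assumption: same illustrative metal list as in the snippet above.
METALS = {'Ti', 'Al', 'Mo', 'Ru', 'Co', 'Rh', 'Ir', 'Ni', 'Zr', 'Hf', 'W'}

def to_dative(mol):
    """Return a copy of mol in which every metal-ligand bond is a dative
    bond that starts at the ligand atom and ends at the metal."""
    rw = Chem.RWMol(mol)
    # Collect (ligand, metal) index pairs first, then edit the bonds.
    pairs = []
    for bond in rw.GetBonds():
        begin, end = bond.GetBeginAtom(), bond.GetEndAtom()
        begin_is_metal = begin.GetSymbol() in METALS
        end_is_metal = end.GetSymbol() in METALS
        if begin_is_metal != end_is_metal:  # exactly one end is a metal
            ligand, metal = (end, begin) if begin_is_metal else (begin, end)
            pairs.append((ligand.GetIdx(), metal.GetIdx()))
    for ligand_idx, metal_idx in pairs:
        # Re-create the bond so that it runs ligand -> metal.
        rw.RemoveBond(ligand_idx, metal_idx)
        rw.AddBond(ligand_idx, metal_idx, Chem.BondType.DATIVE)
    return rw.GetMol()
```

The result would still need to be sanitized with the adjustment of implicit
Hs switched off, as in the snippet above.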

Best,
Michal



On Mon, 7 Oct 2019 at 13:45, Greg Landrum  wrote:
>
> Hi Mike,
>
> I think you mean "organometallics", not "metallocenes" (the two molecules in 
> that SDF are coordination complexes, but neither is a metallocene; I 
> stopped looking after that). The compounds are also drawn in such a way that 
> they are chemically unreasonable. This is pretty typical for organometallics 
> in V2000 mol files.
>
> Unless you have a reliable source of input molecules and/or are willing to 
> look at every one, I would just filter anything that has a metal-nonmetal 
> bond out of the dataset.
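
A filter of this kind could be sketched roughly as below. It is only an
illustration: the function names and the NON_METALS set (including the
choice to count metalloids as non-metals) are assumptions, not part of
RDKit. It walks the bonds once per molecule and stops at the first
metal-nonmetal bond it finds.

```python
from rdkit import Chem

# Assumption: atomic numbers treated as non-metals (metalloids such as B, Si,
# As and Te are counted as non-metals here); 0 is RDKit's dummy atom.
NON_METALS = {0, 1, 2, 5, 6, 7, 8, 9, 10, 14, 15, 16, 17, 18,
              33, 34, 35, 36, 52, 53, 54, 85, 86}

def has_metal_nonmetal_bond(mol):
    """True if any bond joins a metal atom to a non-metal atom."""
    for bond in mol.GetBonds():
        begin_is_metal = bond.GetBeginAtom().GetAtomicNum() not in NON_METALS
        end_is_metal = bond.GetEndAtom().GetAtomicNum() not in NON_METALS
        if begin_is_metal != end_is_metal:
            return True
    return False

def filter_smiles(smiles_list):
    """Keep SMILES that parse and contain no metal-nonmetal bond."""
    kept = []
    for smi in smiles_list:
        # Skip sanitization: badly drawn complexes often fail valence checks.
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        if mol is not None and not has_metal_nonmetal_bond(mol):
            kept.append(smi)
    return kept
```

For example, filter_smiles(["CCO", "C[Mg]Br"]) keeps only the ethanol
entry, while disconnected salts such as "[Na+].CC(=O)[O-]" pass because
they contain no bond to the metal.
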
>
> If you really want to do something with the molecules:
> The rdMolStandardize code, which is derived from MolVS, currently has one 
> approach for dealing with this type of complex: breaking all the covalent 
> bonds to the metal (this is also what InChI does). Given what a mess these 
> compounds are when they show up in most standard file formats, this seems 
> like a reasonable thing to do:
>
> In [4]: from rdkit import Chem
>
> In [5]: from rdkit.Chem.MolStandardize import rdMolStandardize
>
> In [6]: dcon = rdMolStandardize.MetalDisconnector()
> [14:34:03] Initializing MetalDisconnector
>
> In [8]: suppl = Chem.SDMolSupplier('/home/glandrum/Downloads/RDKit_input.sdf',sanitize=False,removeHs=False)
>
> In [9]: m = suppl[0]
>
> In [10]: om = dcon.Disconnect(m)
> [14:34:29] Running MetalDisconnector
> [14:34:29] Removed covalent bond between Tc and O
> [14:34:29] Removed covalent bond between Tc and O
> [14:34:29] Removed covalent bond between Tc and S
> [14:34:29] Removed covalent bond between Tc and S
> [14:34:29] Removed covalent bond between Tc and P
> [14:34:29] Removed covalent bond between Tc and P
>
> In [11]: Chem.SanitizeMol(om)
> Out[11]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>
> In [12]: Chem.MolToSmiles(om)
> Out[12]: 
> 'CSCC[C@@H](NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](Cc1cnc[nH]1)NC(=O)CNC(=O)[C@H](NC(=O)[C@@H](C)NC(=O)[C@H](CC(=O)[C@@H](CCC(N)=O)NC(=O)CCCCNC(=O)CCCCC(CC[SH-]CCC[PH-](CO)CO)[SH-]CCC[PH-](CO)CO)c1cc2ccccc2[nH]1)C(C)C)C(N)=O.[99Tc+9].[Cl-].[O-2].[O-2]'
>
>
> It's worth noting that this molecule is still a long way from making chemical 
> sense: the +9 charge on the Tc and the [SH-] and [PH-] groups are not 
> sensible. So there's more manual fixing required here.
>
>
> Best,
> -greg
>
>
> On Mon, Oct 7, 2019 at 12:06 PM Mike Mazanetz  wrote:
>>
>> Hello RDKit experts !
>>
>>
>>
>> Is there a function to handle metallocenes in the standardizer?
>>
>>
>>
>> I’ve enclosed some examples of compounds.
>>
>>
>>
>> Thanks,
>>
>> mike
>>
>>
>>
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss