Hi Michal/Greg,
Many thanks for your thoughts. The compounds are from PubChem's Substances. I'm inclined to filter out these types of molecules, but this may be hard to do with billions of compounds. What would be an efficient way to keep the drug-like compounds and reject the organometallics? Clearly, checking every atom/bond is too expensive.

Best,
Mike

On Mon, Oct 7, 2019 at 3:20 PM +0100, "Michal Krompiec" <michal.kromp...@gmail.com> wrote:

Dear Mike,
Try changing all metal-ligand bonds to "dative" or "ionic", and standardize afterwards (but disable adjusting of implicit Hs). This way, I was able to process (in KNIME) >99% of organometallics (incl. metallocenes) downloaded from Reaxys.
Example snippet (which doesn't check the "directionality" of the bond, though):

    from rdkit import Chem
    import pandas as pd

    # Metals whose bonds get converted; extend the list as needed.
    metals=['Ti','Al','Mo','Ru','Co','Rh', 'Ir', 'Ni','Zr', 'Hf', 'W']
    outmols=[]
    # input_table is supplied by the KNIME Python node.
    mols=input_table['Molecule']
    for mol in mols:
        for bond in mol.GetBonds():
            # A bond counts as metal-ligand if either end is a listed metal.
            if bond.GetEndAtom().GetSymbol() in metals or bond.GetBeginAtom().GetSymbol() in metals:
                print("found metal-ligand bond")
                print("original type: "+ str(bond.GetBondType()))
                btype=Chem.rdchem.BondType.DATIVE
                bond.SetBondType(btype)
                print("changed to: "+ str(mol.GetBonds()[bond.GetIdx()].GetBondType()))
        try:
            # Run every sanitization step except the implicit-H adjustment.
            Chem.SanitizeMol(mol,sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL^Chem.SanitizeFlags.SANITIZE_ADJUSTHS)
        except ValueError as ve:
            print("Sanitization failed")
            print(ve)
    output_table = input_table.copy()

Best,
Michal

On Mon, 7 Oct 2019 at 13:45, Greg Landrum wrote:
>
> Hi Mike,
>
> I think you mean "organometallics", not "metallocenes" (the first two molecules in
> that SDF are coordination complexes, but neither is a metallocene; I
> stopped looking after that). The compounds are also drawn in such a way that
> they are chemically unreasonable. This is pretty typical for organometallics
> in V2000 mol files.
>
> Unless you have a reliable source of input molecules and/or are willing to
> look at every one, I would just filter anything that has a metal-nonmetal
> bond out of the dataset.
>
> If you really want to do something with the molecules:
> The rdMolStandardize code, which is derived from MolVS, currently has one
> approach for dealing with this type of complex: breaking all the covalent
> bonds to the metal (this is also what InChI does). Given what a mess these
> compounds are when they show up in most standard file formats, this seems
> like a reasonable thing to do:
>
> In [4]: from rdkit import Chem
>
> In [5]: from rdkit.Chem.MolStandardize import rdMolStandardize
>
> In [6]: dcon = rdMolStandardize.MetalDisconnector()
> [14:34:03] Initializing MetalDisconnector
>
> In [8]: suppl = Chem.SDMolSupplier('/home/glandrum/Downloads/RDKit_input.sdf',sanitize=False,removeHs=False)
>
> In [9]: m = suppl[0]
>
> In [10]: om = dcon.Disconnect(m)
> [14:34:29] Running MetalDisconnector
> [14:34:29] Removed covalent bond between Tc and O
> [14:34:29] Removed covalent bond between Tc and O
> [14:34:29] Removed covalent bond between Tc and S
> [14:34:29] Removed covalent bond between Tc and S
> [14:34:29] Removed covalent bond between Tc and P
> [14:34:29] Removed covalent bond between Tc and P
>
> In [11]: Chem.SanitizeMol(om)
> Out[11]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>
> In [12]: Chem.MolToSmiles(om)
> Out[12]:
> 'CSCC[C@@H](NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](Cc1cnc[nH]1)NC(=O)CNC(=O)[C@H](NC(=O)[C@@H](C)NC(=O)[C@H](CC(=O)[C@@H](CCC(N)=O)NC(=O)CCCCNC(=O)CCCCC(CC[SH-]CCC[PH-](CO)CO)[SH-]CCC[PH-](CO)CO)c1cc2ccccc2[nH]1)C(C)C)C(N)=O.[99Tc+9].[Cl-].[O-2].[O-2]'
>
> It's worth noting that this molecule is still a long way from making chemical
> sense: the +9 charge on the Tc and the [SH-] and [PH-] groups are not
> sensible. So there's more manual fixing required here.
>
> Best,
> -greg
>
> On Mon, Oct 7, 2019 at 12:06 PM Mike Mazanetz wrote:
>>
>> Hello RDKit experts!
>>
>> Is there a function to handle metallocenes in the standardizer?
>>
>> I’ve enclosed some examples of compounds.
>>
>> Thanks,
>> mike
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
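
On the cost question for billions of records: one possible approach is a cheap text-level pre-filter that runs before any molecule is parsed, so that the per-bond check (as in Michal's loop, or the metal-nonmetal bond filter Greg suggests) only has to look at the small fraction of records that contain a metal at all. The following is a minimal sketch under stated assumptions, not code from this thread: it assumes the records are available as SMILES strings, the names NON_METALS, BRACKET_ELEMENT, metal_symbols and contains_metal are hypothetical, and the choice of which elements count as non-metals (metalloids are included) is a judgment call to adjust. It relies on the SMILES rule that only B, C, N, O, P, S, F, Cl, Br and I may be written outside square brackets, so every metal atom shows up as a bracket atom.

```python
import re

# Assumption: elements treated as non-metals for filtering purposes.
# Metalloids (B, Si, Ge, As, Sb, Te) are included so that ordinary organics
# containing them are not flagged; adjust this set to suit the dataset.
NON_METALS = frozenset({
    "H", "He", "B", "C", "N", "O", "F", "Ne", "Si", "P", "S", "Cl", "Ar",
    "Ge", "As", "Se", "Br", "Kr", "Sb", "Te", "I", "Xe", "At", "Rn",
})

# A bracket atom starts with '[', an optional isotope number, then the
# element symbol. Aromatic (lowercase) symbols such as [nH] are never metals
# and are deliberately not matched.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?)")


def metal_symbols(smiles):
    """Return the set of element symbols in the SMILES not listed as non-metals."""
    return {s for s in BRACKET_ELEMENT.findall(smiles) if s not in NON_METALS}


def contains_metal(smiles):
    """Coarse test: True if any bracket atom is a metal (salts are flagged too)."""
    return bool(metal_symbols(smiles))
```

Note that this flags any record containing a metal, including simple salts such as '[Na+].[Cl-]', so it is coarser than a metal-nonmetal bond test. It is meant only to decide which records need the more expensive bond-level check.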