Hi Mike,

I think you mean "organometallics", not "metallocenes" (the two molecules in that SDF are coordination complexes, but neither is a metallocene; I stopped looking after that). The compounds are also drawn in such a way that they are chemically unreasonable. This is pretty typical for organometallics in V2000 mol files.
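For spotting such records, one option is to check each molecule for a bond between a metal and a non-metal. What follows is only a minimal sketch, not an RDKit function: the names `NON_METALS`, `is_metal` and `has_metal_nonmetal_bond` are made up for illustration, and the list of non-metal atomic numbers (which counts metalloids such as B, Si, As and Te as non-metals) is an assumption you may want to change. It only relies on `GetBonds()`, `GetBeginAtom()`, `GetEndAtom()` and `GetAtomicNum()`, so it should not need a sanitized molecule.

```python
# Atomic numbers treated as non-metals. This set is an assumption made for
# illustration (metalloids B, Si, As, Te and At are counted as non-metals);
# adjust it to whatever definition suits your dataset.
NON_METALS = frozenset({
    0,  # dummy/wildcard atoms are not counted as metals
    1, 2, 5, 6, 7, 8, 9, 10, 14, 15, 16, 17, 18,
    33, 34, 35, 36, 52, 53, 54, 85, 86,
})


def is_metal(atomic_num):
    """Return True if the atomic number is not in the NON_METALS set."""
    return atomic_num not in NON_METALS


def has_metal_nonmetal_bond(mol):
    """Return True if any bond in mol joins a metal to a non-metal.

    mol is expected to behave like an RDKit Mol: it must provide
    GetBonds(), and each bond GetBeginAtom()/GetEndAtom(), whose atoms
    provide GetAtomicNum().
    """
    for bond in mol.GetBonds():
        begin_is_metal = is_metal(bond.GetBeginAtom().GetAtomicNum())
        end_is_metal = is_metal(bond.GetEndAtom().GetAtomicNum())
        # Exactly one end being a metal means a metal-nonmetal bond.
        if begin_is_metal != end_is_metal:
            return True
    return False
```

With RDKit this could be applied to each entry read from an `SDMolSupplier` opened with `sanitize=False`, skipping `None` entries and dropping any record for which the function returns True.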
Unless you have a reliable source of input molecules and/or are willing to look at every one, I would just filter anything that has a metal-nonmetal bond out of the dataset.

If you really want to do something with the molecules: The rdMolStandardize code, which is derived from MolVS, currently has one approach for dealing with this type of complex: breaking all the covalent bonds to the metal (this is also what InChI does). Given what a mess these compounds are when they show up in most standard file formats, this seems like a reasonable thing to do:

In [4]: from rdkit import Chem

In [5]: from rdkit.Chem.MolStandardize import rdMolStandardize

In [6]: dcon = rdMolStandardize.MetalDisconnector()
[14:34:03] Initializing MetalDisconnector

In [8]: suppl = Chem.SDMolSupplier('/home/glandrum/Downloads/RDKit_input.sdf',sanitize=False,removeHs=False)

In [9]: m = suppl[0]

In [10]: om = dcon.Disconnect(m)
[14:34:29] Running MetalDisconnector
[14:34:29] Removed covalent bond between Tc and O
[14:34:29] Removed covalent bond between Tc and O
[14:34:29] Removed covalent bond between Tc and S
[14:34:29] Removed covalent bond between Tc and S
[14:34:29] Removed covalent bond between Tc and P
[14:34:29] Removed covalent bond between Tc and P

In [11]: Chem.SanitizeMol(om)
Out[11]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [12]: Chem.MolToSmiles(om)
Out[12]: 'CSCC[C@@H](NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](Cc1cnc[nH]1)NC(=O)CNC(=O)[C@H](NC(=O)[C@@H](C)NC(=O)[C@H](CC(=O)[C@@H](CCC(N)=O)NC(=O)CCCCNC(=O)CCCCC(CC[SH-]CCC[PH-](CO)CO)[SH-]CCC[PH-](CO)CO)c1cc2ccccc2[nH]1)C(C)C)C(N)=O.[99Tc+9].[Cl-].[O-2].[O-2]'

It's worth noting that this molecule is still a long way from making chemical sense: the +9 charge on the Tc and the [SH-] and [PH-] groups are not sensible. So there's more manual fixing required here.

Best,
-greg

On Mon, Oct 7, 2019 at 12:06 PM Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:
> Hello RDKit experts!
>
> Is there a function to handle metallocenes in the standardizer?
>
> I’ve enclosed some examples of compounds.
>
> Thanks,
> mike
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss