Hi Greg,

Yes, that's true, thanks for the hint!
Btw, I came across the MolVS Python library (https://github.com/mcs07/MolVS/blob/master/molvs/metal.py), which uses RDKit to break covalent metal bonds. It does something similar to calculating the InChI and converting it back to a molecule.

Cheers
________________________________
From: Greg Landrum <[email protected]>
Sent: 03 November 2018 16:04:11
To: Malgorzata Werner
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] duplicate checks for organometallics

Hi Malgorzata,

Organometallics are definitely challenging.

The biggest problem here is that the two different SMILES actually correspond to different stoichiometries. This isn't just two different representations of the same thing:
N[Pt](N)(Cl)Cl is H4Cl2N2Pt
while
N.N.[Cl-].[Cl-].[Pt+2] is H6Cl2N2Pt

For what it's worth, I believe that the PubChem entry, "N.N.Cl[Pt]Cl", is correct.

You should get different InChI strings or keys for molecules that have different stoichiometries.

-greg

On Fri, Nov 2, 2018 at 9:01 AM Malgorzata Werner <[email protected]> wrote:

Hi there,

I was looking for a way to standardize structures of organometallics so I can match them across different databases. One example is cisplatin, which has different SMILES representations in different databases, e.g.:

* DrugBank (represented as covalent bonds): N[Pt](N)(Cl)Cl
* PubChem (represented as both ionic and covalent bonds): N.N.Cl[Pt]Cl

If I just calculate the InChIKey based on those SMILES strings, obviously they are different.

To standardize the structures, I came up with this solution:

1. Convert the RDKit mol to an InChI string (disconnects metal covalent bonds).
2. Convert the InChI string back to a molecule. For the above molecules, I get:

* DrugBank: [Cl-].[Cl-].[NH2-].[NH2-].[Pt+4]
* PubChem: N.N.[Cl-].[Cl-].[Pt+2]

3. Set all formal charges to zero and calculate the InChIKey, which is then identical.
Unfortunately, the last step is a bit brute force, so all charges in the molecule are lost. Could anyone think of a better solution?

Thanks,
Malgorzata
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
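For anyone who wants to try the steps discussed in this thread, here is a minimal Python sketch, assuming an RDKit build with InChI support. The helper names (total_hydrogens, formula, inchi_roundtrip, neutralized_key) are made up for illustration; they are not part of RDKit and are not the posters' actual code. The first two helpers show the stoichiometry difference Greg points out, and the last two follow the three-step InChI round trip from the original message. Whether the final keys really match is what the original post reports; the sketch does not claim it.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# SMILES strings quoted in the thread
DRUGBANK = "N[Pt](N)(Cl)Cl"  # everything drawn with covalent bonds
PUBCHEM = "N.N.Cl[Pt]Cl"     # ammonia drawn as separate fragments


def total_hydrogens(smiles):
    """Sum the hydrogens RDKit assigns to every atom of a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return sum(atom.GetTotalNumHs() for atom in mol.GetAtoms())


def formula(smiles):
    """Molecular formula as computed by RDKit."""
    return CalcMolFormula(Chem.MolFromSmiles(smiles))


def inchi_roundtrip(mol):
    """Steps 1 and 2: write an InChI string, then read it back as a molecule."""
    return Chem.MolFromInchi(Chem.MolToInchi(mol))


def neutralized_key(smiles):
    """Steps 1 to 3 (illustrative helper): round trip, zero charges, InChIKey."""
    mol = inchi_roundtrip(Chem.MolFromSmiles(smiles))
    for atom in mol.GetAtoms():
        atom.SetFormalCharge(0)  # brute force: every formal charge is discarded
    Chem.SanitizeMol(mol)  # recompute valences after changing the charges
    return Chem.InchiToInchiKey(Chem.MolToInchi(mol))


if __name__ == "__main__":
    for name, smi in (("DrugBank", DRUGBANK), ("PubChem", PUBCHEM)):
        print(name, formula(smi), total_hydrogens(smi), neutralized_key(smi))
```

Comparing the hydrogen counts or formulas before standardizing is a cheap way to spot cases like this one, where the two records differ in stoichiometry and not only in how the metal bonds are drawn.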

