Hi Malgorzata,

Organometallics are definitely challenging.

The biggest problem here is that the two SMILES actually correspond to different stoichiometries, so they are not just two representations of the same thing:

N[Pt](N)(Cl)Cl is H4Cl2N2Pt
while
N.N.[Cl-].[Cl-].[Pt+2] is H6Cl2N2Pt

For what it's worth, I believe that the PubChem entry, "N.N.Cl[Pt]Cl", is correct.
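The formula comparison above can be checked with a few lines of RDKit. This is only a minimal sketch: the dictionary keys and variable names are illustrative, and it assumes an RDKit build where `CalcMolFormula` is available from `rdkit.Chem.rdMolDescriptors`.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# SMILES taken from the thread; the dictionary keys are just labels for this sketch.
smiles_by_source = {
    "drugbank": "N[Pt](N)(Cl)Cl",
    "pubchem": "N.N.Cl[Pt]Cl",
    "pubchem_disconnected": "N.N.[Cl-].[Cl-].[Pt+2]",
}

formulas = {}
for source, smi in smiles_by_source.items():
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"RDKit could not parse {smi!r}")
    # The formula counts implicit hydrogens, which is where the two entries differ:
    # N bonded to Pt carries two H, a free ammonia N carries three.
    formulas[source] = CalcMolFormula(mol)
    print(f"{source:22s} {smi:26s} {formulas[source]}")

# Different formulas mean different compounds as far as InChI is concerned.
same_stoichiometry = formulas["drugbank"] == formulas["pubchem"]
print("same stoichiometry:", same_stoichiometry)
```

If the formulas differ at this stage, no amount of charge cleanup afterwards should be expected to make the InChIKeys match legitimately.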
You should get different InChI strings or keys for molecules that have different stoichiometries.

-greg

On Fri, Nov 2, 2018 at 9:01 AM Malgorzata Werner <malgorzata.wer...@molecularhealth.com> wrote:

> Hi there,
>
> I was looking for a way to standardize structures of organometallics so I
> can match them across different databases.
>
> One example is cisplatin which has different Smiles representations in
> different databases, e.g.:
>
>    - Drugbank (represented as covalent bonds): N[Pt](N)(Cl)Cl
>    - PubChem (represented as both ionic and covalent bonds): N.N.Cl[Pt]Cl
>
> If I just calculate the Inchikey based on those Smiles strings, obviously
> they are different.
>
> To standardize the structures, I came up with this solution:
>
>    1. Convert the rdkit mol to an Inchi string (disconnects metal
>       covalent bonds)
>    2. Convert the Inchi string back to a molecule. For the above
>       molecules, I get:
>          - Drugbank: [Cl-].[Cl-].[NH2-].[NH2-].[Pt+4]
>          - PubChem: N.N.[Cl-].[Cl-].[Pt+2]
>    3. Set all formal charges to zero and calculate the Inchikey, which is
>       then identical.
>
> Unfortunately, the last step is a bit brute force, so all charges in the
> molecule are lost. Could anyone think of a better solution?
>
> Thanks,
>
> Malgorzata
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss