Hi there, I was looking for a way to standardize structures of organometallics so I can match them across different databases.
One example is cisplatin, which has different SMILES representations in different databases, e.g.:

* DrugBank (all metal-ligand bonds drawn as covalent): N[Pt](N)(Cl)Cl
* PubChem (a mix of disconnected fragments and covalent bonds): N.N.Cl[Pt]Cl

If I just calculate the InChIKey from those SMILES strings, the two keys are obviously different.

To standardize the structures, I came up with this solution:

1. Convert the RDKit mol to an InChI string (this disconnects the covalent bonds to the metal).
2. Convert the InChI string back to a molecule. For the above molecules, I get:
   * DrugBank: [Cl-].[Cl-].[NH2-].[NH2-].[Pt+4]
   * PubChem: N.N.[Cl-].[Cl-].[Pt+2]
3. Set all formal charges to zero and calculate the InChIKey, which is then identical.

Unfortunately, the last step is a bit brute force: every formal charge in the molecule is lost, not just the ones introduced by disconnecting the metal.

Could anyone think of a better solution?

Thanks,
Malgorzata
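
P.S. For reference, here is a minimal sketch of the three steps above. It assumes an RDKit build with InChI support. The function names (inchi_round_trip, neutralized_inchikey, standardized_key) are just illustrative helpers, not RDKit API, and the UpdatePropertyCache call is a precaution I added after editing the charges. The exact keys may depend on the RDKit and InChI versions, so I have not included any output here.

```python
from rdkit import Chem


def inchi_round_trip(mol):
    """Steps 1-2: write an InChI string, then parse it back into a molecule."""
    inchi = Chem.MolToInchi(mol)
    round_tripped = Chem.MolFromInchi(inchi)
    if round_tripped is None:
        raise ValueError(f"could not parse InChI back into a molecule: {inchi!r}")
    return round_tripped


def neutralized_inchikey(mol):
    """Step 3: set every formal charge to zero and compute the InChIKey.

    This is the brute-force step: all formal charges are discarded,
    including ones that are genuinely part of the structure.
    """
    neutral = Chem.Mol(mol)  # work on a copy, leave the input untouched
    for atom in neutral.GetAtoms():
        atom.SetFormalCharge(0)
    # Refresh cached valence information after changing the charges.
    neutral.UpdatePropertyCache(strict=False)
    return Chem.MolToInchiKey(neutral)


def standardized_key(smiles):
    """Illustrative helper: SMILES -> InChI round trip -> charge-free InChIKey."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return neutralized_inchikey(inchi_round_trip(mol))


if __name__ == "__main__":
    for name, smiles in [
        ("DrugBank", "N[Pt](N)(Cl)Cl"),
        ("PubChem", "N.N.Cl[Pt]Cl"),
    ]:
        print(name, standardized_key(smiles))
```

Running the script prints one key per database entry, so the two can be compared directly.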
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss