Hi there,

I was looking for a way to standardize structures of organometallics so I can 
match them across different databases.

One example is cisplatin, which has different SMILES representations in
different databases, e.g.:

  *   DrugBank (Pt-N and Pt-Cl bonds all represented as covalent bonds): N[Pt](N)(Cl)Cl
  *   PubChem (ammonia ligands as separate, disconnected fragments; Pt-Cl bonds covalent): N.N.Cl[Pt]Cl

If I simply calculate the InChIKey from these SMILES strings, the keys are,
unsurprisingly, different.
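To make the mismatch concrete, here is a minimal RDKit sketch that computes the InChIKey directly from each SMILES string. It assumes an RDKit build with InChI support; the helper name `raw_inchikey` is just for illustration.

```python
from rdkit import Chem


def raw_inchikey(smiles: str) -> str:
    """Parse a SMILES string and return its standard InChIKey (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToInchiKey(mol)


# The two cisplatin SMILES quoted above
drugbank_key = raw_inchikey("N[Pt](N)(Cl)Cl")
pubchem_key = raw_inchikey("N.N.Cl[Pt]Cl")
print("DrugBank:", drugbank_key)
print("PubChem: ", pubchem_key)
```

In the DrugBank SMILES each nitrogen is bonded to Pt and therefore carries only two hydrogens, whereas the PubChem SMILES has free NH3, so the total hydrogen count already differs and the keys cannot match.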

To standardize the structures, I came up with this solution:

  1.  Convert the RDKit mol to an InChI string (InChI generation disconnects covalent bonds to metals).
  2.  Convert the InChI string back to a molecule. For the two molecules above, I get:

  *   DrugBank: [Cl-].[Cl-].[NH2-].[NH2-].[Pt+4]
  *   PubChem: N.N.[Cl-].[Cl-].[Pt+2]

  3.  Set all formal charges to zero and calculate the InChIKey; the two keys are then identical.

Unfortunately, this last step is rather brute-force: all charges in the
molecule are lost. Could anyone suggest a better solution?
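For reference, the three steps could be sketched in RDKit roughly as follows. This is a minimal sketch, not a verified implementation: the function name `charge_stripped_inchikey` is hypothetical, and the `Chem.SanitizeMol` call after zeroing the charges is an assumption added so that valences and implicit hydrogen counts are recomputed before the key is generated.

```python
from rdkit import Chem


def charge_stripped_inchikey(smiles: str) -> str:
    """Hypothetical helper sketching the three-step workaround."""
    mol = Chem.MolFromSmiles(smiles)
    # Step 1: InChI generation disconnects the bonds to the metal atom
    inchi = Chem.MolToInchi(mol)
    # Step 2: parse the InChI back into an RDKit molecule (now with formal charges)
    mol = Chem.MolFromInchi(inchi)
    # Step 3: zero every formal charge (the brute-force part)
    for atom in mol.GetAtoms():
        atom.SetFormalCharge(0)
    # Assumption: re-sanitize so valences and implicit Hs reflect the new charges
    Chem.SanitizeMol(mol)
    return Chem.MolToInchiKey(mol)


keys = {
    source: charge_stripped_inchikey(smi)
    for source, smi in [("DrugBank", "N[Pt](N)(Cl)Cl"), ("PubChem", "N.N.Cl[Pt]Cl")]
}
for source, key in keys.items():
    print(source, key)
```

Because the hydrogen counts after step 3 depend on how `MolFromInchi` stores them, printing the intermediate structure with `Chem.MolToSmiles(mol)` is a useful check alongside the final keys.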

Thanks,
Malgorzata
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
