Hi Greg,

Yes, that's true, thanks for the hint!


Btw, I came across the MolVS Python library 
(https://github.com/mcs07/MolVS/blob/master/molvs/metal.py), which uses RDKit to 
break covalent metal bonds. It does something similar to calculating the 
InChI and converting it back to a molecule.


Cheers

________________________________
From: Greg Landrum <[email protected]>
Sent: 03 November 2018 16:04:11
To: Malgorzata Werner
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] duplicate checks for organometallics

Hi Malgorzata,

Organometallics are definitely challenging. The biggest problem here is that 
the two different SMILES actually correspond to different stoichiometries. This 
isn't just two different representations of the same thing: N[Pt](N)(Cl)Cl is 
H4Cl2N2Pt while N.N.[Cl-].[Cl-].[Pt+2] is H6Cl2N2Pt.
For what it's worth, I believe that the PubChem entry, 
"N.N.Cl[Pt]Cl", is correct.

You should get different InChI strings or keys for molecules that have 
different stoichiometries.
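
The difference in stoichiometry can be checked directly. The following is a minimal sketch, assuming a standard RDKit installation; the helper name total_hydrogens is made up for illustration and is not part of RDKit. It counts the hydrogens RDKit assigns to each SMILES and prints the molecular formula for comparison.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula


def total_hydrogens(smiles):
    """Return the total number of H atoms (implicit + explicit) RDKit assigns."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    return sum(atom.GetTotalNumHs() for atom in mol.GetAtoms())


# SMILES strings taken from this thread.
examples = {
    "DrugBank": "N[Pt](N)(Cl)Cl",
    "PubChem": "N.N.Cl[Pt]Cl",
    "disconnected": "N.N.[Cl-].[Cl-].[Pt+2]",
}

for name, smi in examples.items():
    mol = Chem.MolFromSmiles(smi)
    # The formula string and the hydrogen count should tell the same story.
    print(name, CalcMolFormula(mol), total_hydrogens(smi))
```

The covalent DrugBank form leaves each nitrogen with only two hydrogens, while the other two forms contain ammonia, so the hydrogen counts differ by two.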

-greg



On Fri, Nov 2, 2018 at 9:01 AM Malgorzata Werner 
<[email protected]>
 wrote:

Hi there,

I was looking for a way to standardize structures of organometallics so I can 
match them across different databases.



One example is cisplatin, which has different SMILES representations in 
different databases, e.g.:

  *   Drugbank (represented as covalent bonds): N[Pt](N)(Cl)Cl
  *   PubChem (represented as both ionic and covalent bonds): 
N.N.Cl[Pt]Cl



If I just calculate the InChIKey based on those SMILES strings, obviously they 
are different.



To standardize the structures, I came up with this solution:

  1.  Convert the RDKit mol to an InChI string (disconnects metal covalent 
bonds)
  2.  Convert the InChI string back to a molecule. For the above molecules, I 
get:

  *   Drugbank: [Cl-].[Cl-].[NH2-].[NH2-].[Pt+4]
  *   PubChem: N.N.[Cl-].[Cl-].[Pt+2]

  3.  Set all formal charges to zero and calculate the InChIKey, which is then 
identical.
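
The three steps above can be sketched roughly as follows. This is only an illustration, assuming an RDKit build with InChI support; the function names inchi_round_trip and neutralized_inchikey are hypothetical, and the UpdatePropertyCache call after zeroing the charges is an assumption about how hydrogen counts get refreshed, not something stated in the steps. The resulting keys may depend on the RDKit version.

```python
from rdkit import Chem


def inchi_round_trip(smiles):
    """Steps 1 and 2: SMILES -> mol -> InChI -> mol."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    inchi = Chem.MolToInchi(mol)
    back = Chem.MolFromInchi(inchi)
    if back is None:
        raise ValueError("could not rebuild molecule from InChI: " + inchi)
    return back


def neutralized_inchikey(mol):
    """Step 3: zero every formal charge on a copy, then compute the InChIKey."""
    copy = Chem.Mol(mol)
    for atom in copy.GetAtoms():
        atom.SetFormalCharge(0)
    # Assumption: refresh cached valence information after editing charges.
    copy.UpdatePropertyCache(strict=False)
    return Chem.MolToInchiKey(copy)


for name, smi in (("Drugbank", "N[Pt](N)(Cl)Cl"), ("PubChem", "N.N.Cl[Pt]Cl")):
    raw_key = Chem.MolToInchiKey(Chem.MolFromSmiles(smi))
    rebuilt = inchi_round_trip(smi)
    print(name, Chem.MolToSmiles(rebuilt), raw_key, neutralized_inchikey(rebuilt))
```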

Unfortunately, the last step is a bit brute force, so all charges in the 
molecule are lost. Could anyone think of a better solution?



Thanks,

Malgorzata

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss