Hi Gustavo,

(Sorry, forgot to reply all before...)

Your deduplication task is quite familiar to me and something I do quite a lot of in my own work ;)

Can I suggest deduplicating using Canonical SMILES? It doesn't solve your InChIKey issue, but it is a solution for now. I updated my gist to show that it is feasible:

https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f

Adelene

Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG
LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
adelenelai

________________________________
From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: Sunday, October 25, 2020 2:27:15 PM
To: Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Actually, I was trying to generate all stereoisomers for molecules in a database, and filter duplicate molecules by using the InChI Key to detect duplicates. But it gives cis/trans isomers on sp2-N the same Key.

Gustavo.
--
Gustavo Seabra

________________________________
From: Adelene LAI <adelene....@uni.lu>
Sent: Sunday, October 25, 2020 1:44:01 AM
To: Gustavo Seabra <gustavo.sea...@gmail.com>
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi Gustavo,

It occurred to me while swimming yesterday - was there a reason you pointed out the hybridisation state of N in your original subject text? Was it just to specify which N to focus on, or did you expect something special about sp2 hybridisation wrt InChIKey?

Adelene

________________________________
From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: Saturday, October 24, 2020 5:37:09 AM
To: RDKit Discuss; Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Thanks for looking into it. I'm happy to see it wasn't just a mistake by me ;-)

I hope we can find what's wrong there.

Best,
Gustavo.
--
Gustavo Seabra

________________________________
From: Adelene LAI <adelene....@uni.lu>
Sent: Friday, October 23, 2020 11:28:55 PM
To: Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi Gustavo,

https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f

In the gist above, I did some further investigating. It seems that for the example you gave, the RDKit functions indeed give the same InChIKey and InChI, but different aux info. Why this different aux info doesn't translate into different InChIKeys/InChIs, I'm not sure.

Adelene

________________________________
From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: Friday, October 23, 2020 6:43:07 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that compounds that differ only in the cis-trans isomerization around an sp2 nitrogen get the same InChI Key from RDKit.
For example:

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'
> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'
> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an implementation issue.

Thanks a lot,
--
Gustavo Seabra.
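
The canonical SMILES workaround suggested earlier in the thread could look roughly like the sketch below. It is not the code from the gist: the names `dedupe`, `canonical_smiles` and `demo` are made up for illustration, and it assumes that RDKit's default `Chem.MolToSmiles` output (canonical, stereo retained) is an acceptable duplicate key for this use case. According to the thread, the two isomers above should then get different keys, but the exact strings are not shown here because they can vary between RDKit versions.

```python
def dedupe(items, key):
    """Keep the first item seen for each distinct key(item), preserving order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


def canonical_smiles(smiles):
    """Return RDKit's canonical isomeric SMILES for a SMILES string."""
    # Imported here so that dedupe() above stays usable without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    # MolToSmiles is canonical and keeps stereo information by default.
    return Chem.MolToSmiles(mol)


def demo():
    """Deduplicate the two isomers from the example plus one exact repeat."""
    isomers = [
        "C/N=C(/NC#N)NCCSCc1nc[nH]c1C",
        "C/N=C(\\NC#N)NCCSCc1nc[nH]c1C",
        "C/N=C(/NC#N)NCCSCc1nc[nH]c1C",  # exact repeat of the first entry
    ]
    unique = dedupe(isomers, key=canonical_smiles)
    print(len(unique), "unique structures")
    for smi in unique:
        print(canonical_smiles(smi))
```

Calling `demo()` in an environment with RDKit installed prints the number of structures kept and their canonical SMILES. Note that canonical SMILES are specific to the toolkit that generated them, so keys produced this way should only be compared with other keys generated by RDKit.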
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss