Canonical SMILES is probably the way to go, but you might also be able to
use the InChIKey and the InChI auxiliary information together as a compound
hash key.
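
A rough sketch of the canonical SMILES route, in case it helps. The helper
names (canonical_smiles_key, deduplicate) are made up for illustration and
are not part of RDKit; the only RDKit calls assumed are Chem.MolFromSmiles
and Chem.MolToSmiles, which by default writes canonical, isomeric SMILES.
I have not checked this against your database, so treat it as a starting
point rather than a tested recipe:

```python
from typing import Callable, Iterable, List


def canonical_smiles_key(smiles: str) -> str:
    """Return RDKit's canonical isomeric SMILES for use as a dedup key."""
    # Imported lazily so deduplicate() below has no hard RDKit dependency.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # MolToSmiles is canonical and keeps stereo information by default.
    return Chem.MolToSmiles(mol)


def deduplicate(
    smiles_list: Iterable[str],
    key: Callable[[str], str] = canonical_smiles_key,
) -> List[str]:
    """Keep the first entry seen for each distinct key, preserving order."""
    seen = set()
    unique = []
    for smi in smiles_list:
        k = key(smi)
        if k not in seen:
            seen.add(k)
            unique.append(smi)
    return unique
```

Calling deduplicate(list_of_smiles) would then keep one entry per canonical
SMILES. The key function is pluggable, so a combined InChIKey plus aux info
key could be passed in instead if that turns out to work for your data.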

-P.

On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene....@uni.lu> wrote:

> Hi Gustavo,
>
>
> (Sorry, forgot to reply all before...)
>
>
> Your deduplication task is quite familiar to me and something I do quite a
> lot of in my own work ;)
>
>
> Can I suggest deduplicating using Canonical SMILES?
>
>
> It doesn't solve your InChIKey issue, but it is a solution for now.
>
>
> I updated my gist to show that it is feasible:
>
>
> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>
>
> Adelene
>
>
>
> Doctoral Researcher
>
> Environmental Cheminformatics
>
> UNIVERSITÉ DU LUXEMBOURG
>
>
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>
> 6, avenue du Swing, L-4367 Belvaux
>
> T +356 46 66 44 67 18
>
> [image: github.png] adelenelai
>
>
>
>
>
> ------------------------------
> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
> *Sent:* Sunday, October 25, 2020 2:27:15 PM
> *To:* Adelene LAI
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Actually, I was trying to generate all stereoisomers for the molecules in a
> database and filter out duplicates by comparing InChI Keys. But cis/trans
> isomers about an sp2 N get the same Key.
>
> Gustavo.
>
> --
> Gustavo Seabra
>
> ------------------------------
> *From:* Adelene LAI <adelene....@uni.lu>
> *Sent:* Sunday, October 25, 2020 1:44:01 AM
> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
>
> Hi Gustavo,
>
>
> It occurred to me while swimming yesterday - was there a reason you
> pointed out the hybridisation state of N in your original subject text?
>
>
> Was it just to specify which N to focus on, or did you expect something
> special about sp2 hybridisation wrt InChIKey?
>
>
> Adelene
>
>
>
>
>
>
>
> ------------------------------
> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
> *Sent:* Saturday, October 24, 2020 5:37:09 AM
> *To:* RDKit Discuss; Adelene LAI
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Thanks for looking into it. I'm happy to see it wasn't just a mistake by
> me ;-)
>
> I hope we can find what's wrong there.
>
> Best,
> Gustavo.
>
> --
> Gustavo Seabra
>
> ------------------------------
> *From:* Adelene LAI <adelene....@uni.lu>
> *Sent:* Friday, October 23, 2020 11:28:55 PM
> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
>
> Hi Gustavo,
>
>
> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>
>
> In the gist above, I tried doing some further investigating.
>
>
> It seems that, for the example you gave, the RDKit functions do indeed give
> the same InChIKey and InChI, but different aux info.
>
>
> Why this difference in aux info doesn't translate into different
> InChIKeys/InChIs, I'm not sure.
>
>
> Adelene
>
>
>
>
>
>
>
>
>
>
>
> ------------------------------
> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
> *Sent:* Friday, October 23, 2020 6:43:07 PM
> *To:* RDKit Discuss
> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Hi all,
>
> I ran into an issue here, and I'd appreciate your input. I noticed that
> compounds that differ only in the cis/trans configuration around an sp2
> nitrogen get the same InChI Key from RDKit. For example:
>
> >>> from rdkit import Chem
> >>> # InChI Key for each configuration about the C=N bond
> >>> inchi_cis = Chem.inchi.MolToInchiKey(
> ...     Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> >>> inchi_cis
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>
> >>> inchi_trans = Chem.inchi.MolToInchiKey(
> ...     Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> >>> inchi_trans
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>
> >>> inchi_cis == inchi_trans
> True
>
> I wonder if this is a limitation of the InChI Key definition, or an
> implementation issue.
>
> Thanks a lot,
> --
> Gustavo Seabra.
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
