Hi Gustavo,

(Sorry, forgot to reply all before...)


Your deduplication task is familiar to me; I do quite a lot of it in my own 
work ;)


Can I suggest deduplicating using Canonical SMILES?


It doesn't solve your InChIKey issue, but it is a solution for now.
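
A minimal sketch of what I mean, assuming RDKit's `Chem.MolToSmiles` with `isomericSmiles=True` keeps the double-bond stereo for your molecules (worth checking on your own data). The helper names `canonical_isomeric_smiles` and `dedupe_by_key` are made up for illustration; they are not RDKit functions.

```python
from typing import Callable, Iterable, List


def canonical_isomeric_smiles(smiles: str) -> str:
    """Return RDKit's canonical SMILES, keeping stereo information."""
    # Imported lazily so dedupe_by_key below can be used without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Canonical output is the default; isomericSmiles=True keeps / and \ marks.
    return Chem.MolToSmiles(mol, isomericSmiles=True)


def dedupe_by_key(items: Iterable[str], key_fn: Callable[[str], str]) -> List[str]:
    """Keep the first item seen for each distinct key, preserving input order."""
    seen = set()
    unique = []
    for item in items:
        key = key_fn(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Calling `dedupe_by_key(smiles_list, canonical_isomeric_smiles)` on the enumerated stereoisomers should then drop only entries whose canonical SMILES are identical, so isomers written with different `/` and `\` marks are kept apart as long as RDKit assigns them different canonical strings.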


I updated my gist to show that it is feasible:


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f



Adelene


Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai





________________________________
From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: Sunday, October 25, 2020 2:27:15 PM
To: Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Actually, I was trying to generate all stereoisomers for the molecules in a 
database and then filter out duplicates using the InChI Key. But it gives 
cis/trans isomers about an sp2 N the same key.

Gustavo.

--
Gustavo Seabra

________________________________
From: Adelene LAI <adelene....@uni.lu>
Sent: Sunday, October 25, 2020 1:44:01 AM
To: Gustavo Seabra <gustavo.sea...@gmail.com>
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


It occurred to me while swimming yesterday - was there a reason you pointed out 
the hybridisation state of N in your original subject text?


Was it just to specify which N to focus on, or did you expect something special 
about sp2 hybridisation wrt InChIKey?


Adelene







________________________________
From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: Saturday, October 24, 2020 5:37:09 AM
To: RDKit Discuss; Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Thanks for looking into it. I'm happy to see it wasn't just a mistake by me ;-)

I hope we can find what's wrong there.

Best,
Gustavo.

--
Gustavo Seabra

________________________________
From: Adelene LAI <adelene....@uni.lu>
Sent: Friday, October 23, 2020 11:28:55 PM
To: Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


In the gist above, I did some further investigating.


It seems that, for the example you gave, the RDKit functions do give the same 
InChI and InChIKey, but different AuxInfo.


Why the different AuxInfo doesn't translate into different InChIs/InChIKeys, 
I'm not sure.
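
For anyone who wants to repeat the comparison without opening the gist, here is a rough sketch (not necessarily what the gist does). It assumes `MolToInchiAndAuxInfo` and `InchiToInchiKey` from `rdkit.Chem.inchi`; the helpers `inchi_record` and `differing_fields` are hypothetical names of my own.

```python
from typing import Dict, List


def inchi_record(smiles: str) -> Dict[str, str]:
    """Collect the InChI, InChIKey and AuxInfo that RDKit generates for a SMILES."""
    # Imported lazily so differing_fields below can be used without RDKit.
    from rdkit import Chem
    from rdkit.Chem import inchi as rd_inchi

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    inchi_str, aux_info = rd_inchi.MolToInchiAndAuxInfo(mol)
    return {
        "inchi": inchi_str,
        "inchikey": rd_inchi.InchiToInchiKey(inchi_str),
        "auxinfo": aux_info,
    }


def differing_fields(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Names of the fields in `a` whose values differ from those in `b`."""
    return [name for name in a if a[name] != b.get(name)]
```

Running `differing_fields(inchi_record(cis_smiles), inchi_record(trans_smiles))` on the two SMILES from the original post shows at a glance which of the three outputs actually differ.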


Adelene











________________________________
From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: Friday, October 23, 2020 6:43:07 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that 
compounds that differ only in the cis/trans configuration about an sp2 
nitrogen get the same InChI Key from RDKit. For example:

> from rdkit import Chem

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an 
implementation issue.

Thanks a lot,
--
Gustavo Seabra.
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
