Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-30 Thread Gustavo Seabra
Sure, here it is:

1. The question:

> I noticed that compounds that differ only in cis-trans isomerization
> around an sp2 nitrogen get the same InChI Key from RDKit. For example:
>
> >>> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> >>> inchi_cis
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
> >>> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> >>> inchi_trans
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
> >>> inchi_cis == inchi_trans
> True
>
> I wonder if this is a limitation of the InChI Key definition, or an
> implementation issue.


The answer to the question, in the end, was that the InChI Keys were
behaving as intended, by design, as pointed out by Igor Pletnev:

> Though InChI is not perfect, in this case it behaves as intended.
> Please see below.
>
> The discussed molecules contain a substituted guanidine fragment
> (RHN)C(=NMe)(NHR')
>
> It is subject to tautomerism, and in different tautomers different C-N
> bonds are double:
> (RHN)C(=NMe)(NHR')
> (RHN)C(NHMe)(=NR')
> (RN=)C(NHMe)(NHR')
>
> You generated Standard InChI, as evidenced by the "InChI=1S/" prefix in
> the examples.
> Standard InChI is specifically designed to produce the same identifier for
> all tautomers (by indicating that two hydrogens are shared by three
> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>
> As the tautomer-invariant Std InChI does not know which C-N bond is
> actually double, there is only one option for treating the stereo: to
> ignore it completely as a drawing artifact.
>
> All in all:
> Standard InChI means that the exact tautomeric form is unknown ==> all
> tautomers are mapped to the same generic representation ==> the exact C-N
> double bond placement in this generic form is unspecified ==> C-N double
> bond stereo is ignored ==> the generated Std InChI and Std InChIKey are the
> same for cis/trans forms that only look different as initially drawn.
>
> Once again, this behavior is by design; it is intended for maximal
> interoperability when comparing different drawings of the "same" compound.
>
> If, for any reason, you would like to treat your examples as definite,
> resolvable structures, each with its own identifier, just use
> non-Standard InChI.
> The InChI that preserves the exact positions of tautomeric H's and double
> bonds ("as drawn") is produced by simply specifying the /FixedH option
> upon generation.
>
> More on this may be found in InChI FAQ:
> https://www.inchi-trust.org/technical-faq-2/
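The two keys in the question can be read field by field to see this. Below is a minimal sketch, assuming the InChIKey layout described in the InChI documentation: a 14-letter hash of the connectivity, a hyphen, an 8-letter hash of the remaining layers followed by a standard/non-standard flag (`S`/`N`) and a version letter, a hyphen, and a protonation flag. `parse_inchikey` and `is_standard_inchi` are hypothetical helper names, not RDKit functions.

```python
import re
from typing import NamedTuple

# Assumed InChIKey layout:  XXXXXXXXXXXXXX-YYYYYYYYFV-P
#   X: hash of the connectivity (skeleton) layers
#   Y: hash of the remaining layers (stereo, isotopes, fixed H, ...)
#   F: 'S' = Standard InChI, 'N' = non-standard
#   V: version letter ('A' for InChI version 1)
#   P: protonation flag ('N' = neutral)
_KEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InchiKeyParts(NamedTuple):
    skeleton: str
    other_layers: str
    standard: bool
    version: str
    protonation: str


def parse_inchikey(key: str) -> InchiKeyParts:
    """Split an InChIKey into its blocks (hypothetical helper)."""
    m = _KEY_RE.match(key)
    if m is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other, flag, version, proton = m.groups()
    return InchiKeyParts(skeleton, other, flag == "S", version, proton)


def is_standard_inchi(inchi: str) -> bool:
    """Standard InChI starts with 'InChI=1S/'; non-standard with 'InChI=1/'."""
    return inchi.startswith("InChI=1S/")


if __name__ == "__main__":
    parts = parse_inchikey("AQIXAKUUQRKLND-UHFFFAOYSA-N")
    print(parts.skeleton, parts.other_layers, parts.standard)
```

Both keys from the question carry the `S` flag (Standard) and the same second block, `UHFFFAOY`. That is the value commonly seen when an InChI has no stereo or isotopic layers, which fits with the C=N stereo having been dropped.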


The only question remaining was how to use this "/FixedH" option in RDKit,
and that was answered by Paolo Tosco:

> you can pass InChI options to the underlying InChI API through the
> options parameter of Chem.inchi.MolToInchi() and Chem.inchi.MolToInchiKey();
> e.g.:
> inchi.MolToInchi(mol, options="/FixedH")
> Source:
> https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html?highlight=inchi#rdkit.Chem.inchi.MolBlockToInchi


And this is what I'm using now to remove duplicate molecules from my
database. I'm using a Pandas DataFrame and, with the more recent versions
of Pandas, the following works fine:

> from rdkit import Chem
> from tqdm import tqdm
> tqdm.pandas()  # registers Series.progress_apply
> df['InChI Key'] = df[mol_col].progress_apply(Chem.MolToInchiKey, options="/FixedH")
> df.drop_duplicates(subset=['InChI Key'], keep='first', inplace=True)
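The `drop_duplicates(keep='first')` step only compares the key column, so the key function alone decides what counts as a duplicate. Below is a minimal pure-Python sketch of the same keep-first logic with the key passed in. `dedupe_first` is a hypothetical name, not part of RDKit or pandas.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def dedupe_first(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item for each distinct key, preserving input order.

    Behaves like DataFrame.drop_duplicates(subset=[col], keep="first")
    where col holds key(item) (hypothetical helper).
    """
    seen = set()
    kept: List[T] = []
    for item in items:
        k = key(item)
        if k in seen:
            continue  # later items whose key was already seen are dropped
        seen.add(k)
        kept.append(item)
    return kept


if __name__ == "__main__":
    # Toy key: case-insensitive comparison stands in for a molecular key.
    print(dedupe_first(["CCO", "cco", "CCN"], key=str.lower))
```

With RDKit molecules, the key could be something like `lambda m: Chem.MolToInchiKey(m, options="/FixedH")`. Per the thread, that keeps the cis/trans pair apart, while the default Standard key merges them. Which key is right depends on whether tautomers should count as the same compound in your database.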

All the best,
--
Gustavo Seabra.



Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-30 Thread Adelene LAI
Hi Gustavo,


Looks like you found a solution for your deduplication task. Would you mind 
sharing it with us? (Seems some emails in the chain are missing.)


I'm curious - returning to your original question, did we figure out why the 
same InChIKey was given for the stereoisomers?


Adelene

Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai







Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Gustavo Seabra
Aha! Fantastic!

Thanks a lot!!
Gustavo.

--
Gustavo Seabra


From: Paolo Tosco 
Sent: Thursday, October 29, 2020 5:13:33 PM
To: Gustavo Seabra 
Cc: Igor Pletnev ; RDKit Discuss 

Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi Gustavo,

you can pass InChI options to the underlying InChI API through the options 
parameter of Chem.inchi.MolToInchi() and Chem.inchi.MolToInchiKey(); e.g.:

inchi.MolToInchi(mol, options="/FixedH")

Source: 
https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html?highlight=inchi#rdkit.Chem.inchi.MolBlockToInchi

Cheers,
p.

On Thu, Oct 29, 2020 at 9:42 PM Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
Ok, thanks!
--
Gustavo Seabra.


On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev <igor.plet...@gmail.com> wrote:
>  Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in 
> the docs).

Sorry, I am not so proficient in RDKit and cannot answer exactly. Anyway, this 
option is available in InChI API calls, and I am pretty sure that it is also 
available in RDKit.

I recall that a couple of years ago, at an InChI event, Greg Landrum somewhat 
surprised me by saying that he himself often uses non-Standard InChI instead of 
the Standard one, precisely to distinguish tautomers.
So I guess Greg can explain how this is arranged in RDKit.

Regards,
Igor






Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Paolo Tosco

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Gustavo Seabra

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-29 Thread Gustavo Seabra
That does make sense, I understand it now, thanks!

Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in
the docs).

Thanks,
--
Gustavo Seabra.


On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev  wrote:

> Hi Gustavo,
>
> >  ... I was generating the InChI Keys to get a unique hash for each
> compound, thinking it would be better than SMILES (guaranteed to be
> unique), but is clearly not the case. On the bright side, I won't lose time
> generating InChIs...
>
> Though InChI is not perfect, in this case it behaves as intended.
> Please see below.
>
> The discussed molecules contain a substituted guanidine fragment
> (RHN)C(=NMe)(NHR')
>
> It is subject to tautomerism, and in different tautomers different C-N
> bonds are double:
> (RHN)C(=NMe)(NHR')
> (RHN)C(NHMe)(=NR')
> (RN=)C(NHMe)(NHR')
>
> You generated Standard InChI, as evidenced by the "InChI=1S/" prefix in
> the examples.
> Standard InChI is specifically designed to produce the same identifier for
> all tautomers (by indicating that two hydrogens are shared by three
> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>
> As the tautomer-invariant Std InChI does not know which C-N bond is
> actually double, there is only one option for treating the stereo: to
> ignore it completely as a drawing artifact.
>
> All in all:
> Standard InChI means that the exact tautomeric form is unknown ==> all
> tautomers are mapped to the same generic representation ==> the exact C-N
> double bond placement in this generic form is unspecified ==> C-N double
> bond stereo is ignored ==> the generated Std InChI and Std InChIKey are the
> same for cis/trans forms that only look different as initially drawn.
>
> Once again, this behavior is by design; it is intended for maximal
> interoperability when comparing different drawings of the "same" compound.
>
> If, for any reason, you would like to treat your examples as definite,
> resolvable structures, each with its own identifier, just use
> non-Standard InChI.
> The InChI that preserves the exact positions of tautomeric H's and double
> bonds ("as drawn") is produced by simply specifying the /FixedH option
> upon generation.
>
> More on this may be found in InChI FAQ:
> https://www.inchi-trust.org/technical-faq-2/
>
> Hope this helps.
>
> Regards,
> Igor
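Igor's point above can be made concrete with a toy model: Standard InChI records which nitrogens share the mobile hydrogens, not where the C=N bond sits. The sketch below is a deliberate simplification and not how the InChI library works internally. Each tautomer is recorded only as the H count on each of the three guanidine nitrogens, using hypothetical labels `N_R`, `N_Me`, `N_R'` that follow Igor's notation.

```python
from typing import Dict, FrozenSet, Tuple

# H counts on the three guanidine nitrogens for each tautomer Igor listed.
# In these drawings the nitrogen with no H is the one in the C=N bond.
TAUTOMERS: Dict[str, Dict[str, int]] = {
    "(RHN)C(=NMe)(NHR')": {"N_R": 1, "N_Me": 0, "N_R'": 1},
    "(RHN)C(NHMe)(=NR')": {"N_R": 1, "N_Me": 1, "N_R'": 0},
    "(RN=)C(NHMe)(NHR')": {"N_R": 0, "N_Me": 1, "N_R'": 1},
}


def double_bonded_n(h_counts: Dict[str, int]) -> str:
    """Return the nitrogen carrying no H, i.e. the =N- in this tautomer."""
    (n,) = [atom for atom, h in h_counts.items() if h == 0]
    return n


def mobile_h_group(h_counts: Dict[str, int]) -> Tuple[FrozenSet[str], int]:
    """Tautomer-invariant summary: which atoms share the H's, and how many."""
    return frozenset(h_counts), sum(h_counts.values())


if __name__ == "__main__":
    groups = {mobile_h_group(h) for h in TAUTOMERS.values()}
    positions = {double_bonded_n(h) for h in TAUTOMERS.values()}
    print(len(groups), sorted(positions))
```

The summary (three nitrogens sharing two H) is identical for all three tautomers, while the double-bonded nitrogen differs. A cis/trans label tied to one particular C=N bond therefore has nowhere to attach in the invariant form.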

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-26 Thread Gustavo Seabra
Thanks a lot Peter and Adelene,

Yes, it looks like canonical SMILES is the way to go, and I have no problem
sticking with RDKit. I was generating the InChI Keys to get a unique hash
for each compound, thinking it would be better than SMILES (guaranteed to
be unique), but that is clearly not the case. On the bright side, I won't lose
time generating InChIs...

Can I trust that the same molecule will always get the same canonical
SMILES from RDKit, independent of how it is read? (Different SDF files,
geometries, atom orders, etc.?)

All the best,
Gustavo.


--
Gustavo Seabra.



Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-25 Thread Peter S. Shenkin
Canonical SMILES is probably the way to go, but you might also be able to
use the InChIKey and the InChI auxiliary information together as a compound
hash key.

-P.

> --------------
> *From:* Gustavo Seabra 
> *Sent:* Sunday, October 25, 2020 2:27:15 PM
> *To:* Adelene LAI
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Actually, I was trying to generate all stereoisomers for molecules in a
> database, and filter duplicate molecules by using the InChI Key to detect
> duplicates. But it gives cis/trans isomers on sp2-N the same Key.
>
> Gustavo.
>
> --
> Gustavo Seabra
>
> ------
> *From:* Adelene LAI 
> *Sent:* Sunday, October 25, 2020 1:44:01 AM
> *To:* Gustavo Seabra 
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
>
> Hi Gustavo,
>
>
> It occurred to me while swimming yesterday - was there a reason you
> pointed out the hybridisation state of N in your original subject text?
>
>
> Was it just to specify which N to focus on, or did you expect something
> special about sp2 hybridisation wrt InChIKey?
>
>
> Adelene
> --
> *From:* Gustavo Seabra 
> *Sent:* Saturday, October 24, 2020 5:37:09 AM
> *To:* RDKit Discuss; Adelene LAI
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Thanks for looking into it. I'm happy to see it wasn't just a mistake by
> me ;-)
>
> I hope we can find what's wrong there.
>
> Best,
> Gustavo.
>
> --
> Gustavo Seabra
>
> --
> *From:* Adelene LAI 
> *Sent:* Friday, October 23, 2020 11:28:55 PM
> *To:* Gustavo Seabra ; RDKit Discuss <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
>
> Hi Gustavo,
>
>
> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>
>
> In the gist above, I tried doing some further investigating.
>
>
> It seems that for the example you gave, the RDKit functions indeed give the
> same InChIKey and InChI, but different aux info.
>
>
> Why this different aux info doesn't translate into different
> InChIKeys/InChIs, I'm not sure.
>
>
> Adelene
>
> --
> *From:* Gustavo Seabra 
> *Sent:* Friday, October 23, 2020 6:43:07 PM
> *To:* RDKit Discuss
> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Hi all,
>
> I ran into an issue here, and I'd appreciate your input. I noticed that
> compounds that differ only in the cis/trans configuration around an sp2
> nitrogen get the same InChI Key from RDKit. For example:
>
> > from rdkit import Chem
> > inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> > inchi_cis
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>
> > inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> > inchi_trans
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>
> > inchi_cis == inchi_trans
> True
>
> I wonder if this is a limitation of the InChI Key definition, or an
> implementation issue.
>
> Thanks a lot,
> --
> Gustavo Seabra.
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
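For the use case described above (enumerate all stereoisomers, then drop duplicates), a minimal sketch using RDKit's `EnumerateStereoisomers` with canonical isomeric SMILES as the duplicate key might look like this. It assumes a recent RDKit build with InChI support. The helper name `unique_stereoisomers` and the stereo-unspecified parent SMILES (the thread's molecule with the `/` and `\` markers removed) are illustrative choices, not something from the thread. Whether the enumerator flips this particular C=N bond may depend on the RDKit version, so no specific counts are claimed here.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers


def unique_stereoisomers(smiles):
    """Enumerate stereoisomers of `smiles`, deduplicated on canonical SMILES.

    Canonical isomeric SMILES keeps the C=N configuration that Standard
    InChI drops for mobile-H (tautomeric) fragments such as this guanidine.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    unique = {}
    for isomer in EnumerateStereoisomers(mol):
        key = Chem.MolToSmiles(isomer, isomericSmiles=True)
        unique.setdefault(key, isomer)  # keep the first isomer per SMILES
    return unique


# Hypothetical input: the thread's molecule with the C=N configuration unspecified.
isomers = unique_stereoisomers("CN=C(NC#N)NCCSCc1nc[nH]c1C")
inchikeys = {Chem.MolToInchiKey(m) for m in isomers.values()}
print(len(isomers), "unique by canonical SMILES")
print(len(inchikeys), "unique by Standard InChIKey")
```

Because every key in the second set comes from Standard InChI, that count can be smaller than the first whenever isomers differ only in the C=N configuration of a mobile-H fragment.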
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-25 Thread Adelene LAI
Hi Gustavo,


(Sorry, forgot to reply all before...)


Your deduplication task is very familiar to me; it's something I do a lot
in my own work ;)


Can I suggest deduplicating using Canonical SMILES?


It doesn't solve your InChIKey issue, but it is a solution for now.


I updated my gist to show that it is feasible:


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


Adelene
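A minimal sketch of the canonical-SMILES deduplication suggested here, assuming RDKit with InChI support. The `dedupe` helper is hypothetical: it simply keeps the first input seen for each key. Running the two SMILES from the original post through both key functions shows the difference.

```python
from rdkit import Chem

# The two inputs from the original post; they differ only in the
# configuration of the C=N bond.
smiles_list = [
    "C/N=C(/NC#N)NCCSCc1nc[nH]c1C",
    "C/N=C(\\NC#N)NCCSCc1nc[nH]c1C",
]


def dedupe(smiles_iter, key_func):
    """Keep the first SMILES seen for each key; skip unparsable input."""
    seen = {}
    for smi in smiles_iter:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        seen.setdefault(key_func(mol), smi)
    return list(seen.values())


by_inchikey = dedupe(smiles_list, Chem.MolToInchiKey)
by_smiles = dedupe(smiles_list, lambda m: Chem.MolToSmiles(m, isomericSmiles=True))
print(len(by_inchikey), "entries kept by InChIKey")
print(len(by_smiles), "entries kept by canonical SMILES")
```

Per the thread, the InChIKey run collapses the two inputs into one entry while the canonical-SMILES run keeps both. One trade-off: canonical SMILES are toolkit-specific and RDKit's canonicalization has changed between releases, so keys compared against each other should come from the same RDKit version.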


From: Gustavo Seabra 
Sent: Sunday, October 25, 2020 2:27:15 PM
To: Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Actually, I was trying to generate all stereoisomers for the molecules in a
database and filter out duplicates using the InChI Key. But it gives
cis/trans isomers around an sp2 N the same Key.

Gustavo.



Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-23 Thread Gustavo Seabra
Thanks for looking into it. I'm happy to see it wasn't just a mistake by me ;-)

I hope we can find what's wrong there.

Best,
Gustavo.

--
Gustavo Seabra


Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-23 Thread Adelene LAI
Hi Gustavo,


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


In the gist above, I tried doing some further investigating.


It seems that, for the example you gave, the RDKit functions indeed give the
same InChIKey and InChI, but different AuxInfo.


I'm not sure why the different AuxInfo doesn't translate into different
InChIKeys/InChIs.


Adelene
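To reproduce the comparison described here, a short sketch using `MolToInchiAndAuxInfo` from `rdkit.Chem.inchi`, which returns the InChI string together with its AuxInfo string. The InChIKey is a hash of the InChI string alone, so whatever differs in AuxInfo cannot change the key. Variable names are illustrative, and RDKit must be built with InChI support.

```python
from rdkit import Chem
from rdkit.Chem import inchi

# The two SMILES from the original post (the C=N configuration differs).
mol_a = Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C")
mol_b = Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C")

# MolToInchiAndAuxInfo returns an (InChI, AuxInfo) pair of strings.
inchi_a, aux_a = inchi.MolToInchiAndAuxInfo(mol_a)
inchi_b, aux_b = inchi.MolToInchiAndAuxInfo(mol_b)

print("same InChI:  ", inchi_a == inchi_b)
print("same AuxInfo:", aux_a == aux_b)
print(inchi_a)
```

The `InChI=1S/` prefix in the printed string marks it as Standard InChI, which is the tautomer-invariant form that treats this C=N stereo as unspecified.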


[Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-23 Thread Gustavo Seabra
Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that
compounds that differ only in the cis/trans configuration around an sp2
nitrogen get the same InChI Key from RDKit. For example:

> from rdkit import Chem
> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an
implementation issue.

Thanks a lot,
--
Gustavo Seabra.
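One way to narrow down the "definition vs. implementation" question is to look inside the key itself. The sketch below is plain Python and assumes the published InChIKey layout: a 14-character hash of the connectivity layers, then an 8-character hash of the remaining layers (stereo, isotopic, ...) followed by a standard/non-standard flag and a version letter, then a final protonation character. `split_inchikey` and `EMPTY_LAYERS_HASH` are hypothetical names used only for illustration.

```python
def split_inchikey(key):
    """Split a 27-character InChIKey into its documented parts."""
    blocks = key.split("-")
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    skeleton, second, protonation = blocks
    return {
        "skeleton_hash": skeleton,        # hash of the connectivity layers
        "other_layers_hash": second[:8],  # hash of stereo, isotopic, ... layers
        "standard": second[8] == "S",     # 'S' = standard, 'N' = non-standard
        "version": second[9],             # 'A' = InChI version 1
        "protonation": protonation,       # 'N' = no (de)protonation
    }


# Value of the 8-character block when there are no stereo, isotopic
# or other such layers to hash.
EMPTY_LAYERS_HASH = "UHFFFAOY"

parts = split_inchikey("AQIXAKUUQRKLND-UHFFFAOYSA-N")
print(parts)
print("no stereo layer:", parts["other_layers_hash"] == EMPTY_LAYERS_HASH)
```

For the key reported here, the second block is the empty-layer value, which suggests the Standard InChI string already carried no stereo layer before hashing; the `S` flag confirms it is a Standard InChIKey. In other words, the information is dropped when the InChI is generated, not when it is hashed into a key.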