That does make sense, I understand it now, thanks!

Is "/FixedH" an option in RDKit? How do I use it? (I don't see it in
the docs.)

Thanks,
--
Gustavo Seabra.
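
A minimal sketch of how /FixedH might be passed through RDKit, assuming an InChI-enabled build: `Chem.MolToInchi` takes an `options` string that is handed to the InChI code, with each option prefixed by `/` or `-`. The helper name `inchi_and_key` is invented for this example; the SMILES are the two from the original post, labelled cis/trans as they were there.

```python
from rdkit import Chem

# The two SMILES from the original post (labels as used in that post).
SMILES_CIS = "C/N=C(/NC#N)NCCSCc1nc[nH]c1C"
SMILES_TRANS = "C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"


def inchi_and_key(smiles, options=""):
    """Hypothetical helper: return (InChI, InChIKey) for a SMILES string.

    `options` is forwarded unchanged to the InChI library by RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    inchi = Chem.MolToInchi(mol, options=options)
    return inchi, Chem.InchiToInchiKey(inchi)


# Default options: Standard InChI, tautomer-invariant.
std_cis, std_key_cis = inchi_and_key(SMILES_CIS)
std_trans, std_key_trans = inchi_and_key(SMILES_TRANS)
print(std_key_cis == std_key_trans)

# Non-standard InChI with fixed hydrogens, as suggested by Igor.
fix_cis, fix_key_cis = inchi_and_key(SMILES_CIS, options="/FixedH")
fix_trans, fix_key_trans = inchi_and_key(SMILES_TRANS, options="/FixedH")
print(fix_cis)
print(fix_trans)
print(fix_key_cis != fix_key_trans)
```

With the default options both drawings give the same Standard InChIKey, as reported at the start of the thread. With /FixedH the InChI is non-standard (prefix "InChI=1/") and, per Igor's explanation, the two drawings are expected to get different identifiers; the last line prints whether they do.
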


On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev <igor.plet...@gmail.com> wrote:

> Hi Gustavo,
>
> > ... I was generating the InChI Keys to get a unique hash for each
> compound, thinking it would be better than SMILES (guaranteed to be
> unique), but that is clearly not the case. On the bright side, I won't
> lose time generating InChIs...
>
> Though InChI is not perfect, in this case it behaves as intended.
> Please see below.
>
> The molecules under discussion contain a substituted guanidine fragment,
> (RHN)C(=NMe)(NHR').
>
> It is subject to tautomerism, and in different tautomers a different C-N
> bond is the double bond:
> (RHN)C(=NMe)(NHR')
> (RHN)C(NHMe)(=NR')
> (RN=)C(NHMe)(NHR')
>
> You generated Standard InChI, as evidenced by the "InChI=1S/" prefix in
> the examples.
> Standard InChI is specifically designed to produce the same identifier for
> all tautomers (by indicating that two hydrogens are shared by three
> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>
> Because the tautomer-invariant Standard InChI does not know which C-N bond
> is actually double, there is only one option for treating the stereo: to
> ignore it completely, as a drawing artifact.
>
> All in all:
> Standard InChI means that the exact tautomeric form is unknown ==> all
> tautomers are mapped to the same generic representation ==> the exact C-N
> double bond placement in this generic representation is unspecified ==>
> C-N double bond stereo is ignored ==> the generated Standard InChI and
> Standard InChIKey are the same for cis/trans forms that look different
> as drawn.
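
The chain above can be read off the identifiers themselves. A small sketch in plain Python, assuming the usual InChIKey layout: a 14-letter skeleton block, then a 10-letter block made of an 8-letter hash of the remaining layers (stereo among them), a standard/non-standard flag and a version letter, then a single protonation letter. The function and constant names are invented for this example.

```python
# Second-block hash commonly seen on Standard InChIKeys when no stereo,
# isotopic or other extra layers were present (an assumption worth
# checking against the InChI documentation).
NO_EXTRA_LAYERS = "UHFFFAOY"


def is_standard_inchi(inchi):
    """True if the string carries the Standard InChI prefix."""
    return inchi.startswith("InChI=1S/")


def split_inchikey(key):
    """Split an InChIKey into its blocks (hypothetical helper)."""
    blocks = key.split("-")
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"unexpected InChIKey layout: {key!r}")
    skeleton, second, protonation = blocks
    return {
        "skeleton": skeleton,          # hash of the connectivity layers
        "other_layers": second[:8],    # hash of stereo and other layers
        "standard": second[8] == "S",  # "S" standard, "N" non-standard
        "version": second[9],
        "protonation": protonation,
    }


# The key reported for both drawings in the original post.
parts = split_inchikey("AQIXAKUUQRKLND-UHFFFAOYSA-N")
print(parts["standard"])
print(parts["other_layers"] == NO_EXTRA_LAYERS)
```

For the key from the original post, the flag letter marks it as a Standard InChIKey, and the second block is the value usually seen when nothing beyond connectivity was hashed, which fits the point that the C-N double bond stereo never reached the identifier.
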
>
> Once again, this behavior is by design; it is intended for maximal
> interoperability when comparing different drawings of the "same" compound.
>
> If, for any reason, you would like to treat your examples as definite,
> distinguishable structures, each with its own identifier, just use
> non-Standard InChI.
> The InChI that preserves the exact positions of the tautomeric H's and the
> double bond ("as drawn") is produced by specifying the /FixedH option at
> generation time.
>
> More on this may be found in the InChI FAQ:
> https://www.inchi-trust.org/technical-faq-2/
>
> Hope this helps.
>
> Regards,
> Igor
>
>
>
> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra <gustavo.sea...@gmail.com>
> wrote:
>
>> Thanks a lot Peter and Adelene,
>>
>> Yes, it looks like canonical SMILES is the way to go, and I have no
>> problem sticking with RDKit. I was generating the InChI Keys to get a
>> unique hash for each compound, thinking it would be better than SMILES
>> (guaranteed to be unique), but that is clearly not the case. On the
>> bright side, I won't lose time generating InChIs...
>>
>> Can I trust that the same molecule will always get the same canonical
>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>> geometries, atom orders, etc.?)
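
One way to probe this question, as a sketch only: canonicalize a few SMILES with RDKit and compare the results. It checks a handful of cases and does not prove the general claim; inputs that differ in tautomer, charge state or specified stereo would still give different strings. The helper names `canonical_smiles` and `deduplicate` are invented for this example, and `isomericSmiles=True` is passed explicitly so that stereo is kept regardless of the RDKit version's default.

```python
from rdkit import Chem


def canonical_smiles(smiles):
    """Hypothetical helper: RDKit canonical SMILES, stereo included."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol, isomericSmiles=True)


def deduplicate(smiles_list):
    """Keep the first input seen for each canonical SMILES."""
    seen = {}
    for smi in smiles_list:
        seen.setdefault(canonical_smiles(smi), smi)
    return list(seen.values())


# Ethanol written with three different atom orders.
ethanol_forms = ["OCC", "CCO", "C(O)C"]
print({canonical_smiles(s) for s in ethanol_forms})

# The two isomers from the original post stay distinct.
cis = canonical_smiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C")
trans = canonical_smiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C")
print(cis != trans)

print(deduplicate(ethanol_forms))
```

The three ethanol inputs collapse to one entry, while the two drawings from the original post are kept apart, which is the behavior wanted for the deduplication task described in the thread.
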
>>
>> All the best,
>> Gustavo.
>>
>>
>> --
>> Gustavo Seabra.
>>
>>
>> On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin <shen...@gmail.com>
>> wrote:
>>
>>> Canonical SMILES is probably the way to go, but you might also be able
>>> to use the InChIKey and the InChI auxiliary information together as a
>>> compound hash key.
>>>
>>> -P.
>>>
>>> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene....@uni.lu> wrote:
>>>
>>>> Hi Gustavo,
>>>>
>>>> (Sorry, forgot to reply all before...)
>>>>
>>>> Your deduplication task is quite familiar to me and something I do
>>>> quite a lot of in my own work ;)
>>>>
>>>> Can I suggest deduplicating using canonical SMILES?
>>>>
>>>> It doesn't solve your InChIKey issue, but it is a solution for now.
>>>>
>>>> I updated my gist to show that it is feasible:
>>>>
>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>
>>>> Adelene
>>>>
>>>> Doctoral Researcher
>>>>
>>>> Environmental Cheminformatics
>>>>
>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>
>>>>
>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>
>>>> 6, avenue du Swing, L-4367 Belvaux
>>>>
>>>> T +356 46 66 44 67 18
>>>>
>>>> GitHub: adelenelai
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>> *Sent:* Sunday, October 25, 2020 2:27:15 PM
>>>> *To:* Adelene LAI
>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>>>> Key
>>>>
>>>> Actually, I was trying to generate all stereoisomers for molecules in
>>>> a database, and filter out duplicate molecules by using the InChI Key.
>>>> But it gives cis/trans isomers on sp2-N the same Key.
>>>>
>>>> Gustavo.
>>>>
>>>> --
>>>> Gustavo Seabra
>>>>
>>>> ------------------------------
>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>> *Sent:* Sunday, October 25, 2020 1:44:01 AM
>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>>>> Key
>>>>
>>>>
>>>> Hi Gustavo,
>>>>
>>>> It occurred to me while swimming yesterday - was there a reason you
>>>> pointed out the hybridisation state of N in your original subject text?
>>>>
>>>> Was it just to specify which N to focus on, or did you expect something
>>>> special about sp2 hybridisation with respect to the InChIKey?
>>>>
>>>> Adelene
>>>>
>>>> ------------------------------
>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>> *Sent:* Saturday, October 24, 2020 5:37:09 AM
>>>> *To:* RDKit Discuss; Adelene LAI
>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>>>> Key
>>>>
>>>> Thanks for looking into it. I'm happy to see it wasn't just a mistake
>>>> by me ;-)
>>>>
>>>> I hope we can find what's wrong there.
>>>>
>>>> Best,
>>>> Gustavo.
>>>>
>>>> --
>>>> Gustavo Seabra
>>>>
>>>> ------------------------------
>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>> *Sent:* Friday, October 23, 2020 11:28:55 PM
>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <
>>>> rdkit-discuss@lists.sourceforge.net>
>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>>>> Key
>>>>
>>>>
>>>> Hi Gustavo,
>>>>
>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>
>>>> In the gist above, I tried doing some further investigating.
>>>>
>>>> It seems that for the example you gave, the RDKit functions indeed give
>>>> the same InChIKey and InChI, but different aux info.
>>>>
>>>> Why this different aux info doesn't translate into different
>>>> InChIKeys/InChIs, I'm not sure.
>>>>
>>>> Adelene
>>>>
>>>> ------------------------------
>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>> *Sent:* Friday, October 23, 2020 6:43:07 PM
>>>> *To:* RDKit Discuss
>>>> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>
>>>> Hi all,
>>>>
>>>> I ran into an issue here, and I'd appreciate your input. I noticed that
>>>> compounds that differ only in the cis/trans configuration around an sp2
>>>> nitrogen get the same InChI Key from RDKit. For example:
>>>>
>>>> > from rdkit import Chem
>>>> > inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
>>>> > inchi_cis
>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>
>>>> > inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
>>>> > inchi_trans
>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>
>>>> > inchi_cis == inchi_trans
>>>> True
>>>>
>>>> I wonder if this is a limitation of the InChI Key definition, or an
>>>> implementation issue.
>>>>
>>>> Thanks a lot,
>>>> --
>>>> Gustavo Seabra.
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
