Ok, thanks!
--
Gustavo Seabra.

On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev <igor.plet...@gmail.com> wrote:

> >  Is this "/FixedH" an option in RDKit? How do I use it? (I don't see it
> in the docs).
>
> Sorry, I am not very proficient in RDKit and cannot answer exactly. Anyway,
> this option is available in InChI API calls, and I am fairly sure that it
> is also available in RDKit.
>
> I recall that a couple of years ago, at an InChI event, Greg Landrum
> somewhat surprised me by saying that he himself often uses non-Standard
> InChI instead of the Standard one, precisely to distinguish tautomers.
> So I guess Greg can explain how this is handled in RDKit.
>
> Regards,
> Igor
>
>
>
>
>
> On Thu, 29 Oct 2020 at 23:03, Gustavo Seabra <gustavo.sea...@gmail.com>
> wrote:
>
>> That does make sense, I understand it now, thanks!
>>
>> Is this "/FixedH" an option in RDKit? How do I use it? (I don't see it in
>> the docs).
>>
>> Thanks,
>> --
>> Gustavo Seabra.
>>
>>
>> On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev <igor.plet...@gmail.com>
>> wrote:
>>
>>> Hi Gustavo,
>>>
>>> >  ... I was generating the InChI Keys to get a unique hash for each
>>> compound, thinking it would be better than SMILES (guaranteed to be
>>> unique), but that is clearly not the case. On the bright side, I won't lose time
>>> generating InChIs...
>>>
>>> Though InChI is not perfect, in this case it behaves as intended.
>>> Please see below.
>>>
>>> The discussed molecules contain a substituted guanidine fragment
>>> (RHN)C(=NMe)(NHR')
>>>
>>> It is subject to tautomerism, and in different tautomers a different C-N
>>> bond is the double one:
>>> (RHN)C(=NMe)(NHR')
>>> (RHN)C(NHMe)(=NR')
>>> (RN=)C(NHMe)(NHR')
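The tautomer set described above can be listed with RDKit's tautomer enumerator. This is a minimal sketch, assuming an RDKit build that provides `rdMolStandardize.TautomerEnumerator`; for this molecule it will also report imidazole tautomers, not only the three guanidine forms, and the exact count depends on the enumerator's built-in rules.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# The "cis" drawing from the original post (cimetidine)
mol = Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C")

enumerator = rdMolStandardize.TautomerEnumerator()
tautomers = enumerator.Enumerate(mol)  # iterable of RDKit Mol objects

# Canonical SMILES de-duplicate the listing and give a stable order
tautomer_smiles = sorted({Chem.MolToSmiles(t) for t in tautomers})
for smi in tautomer_smiles:
    print(smi)
```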
>>>
>>> You generated Standard InChI, as evidenced by the "InChI=1S/" prefix
>>> in the examples.
>>> Standard InChI is specifically designed to produce the same identifier
>>> for all tautomers (by indicating that two hydrogens are shared by three
>>> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>>>
>>> As the tautomer-invariant Standard InChI does not know which C-N bond is
>>> actually double, the only option for treating its stereo is to ignore it
>>> completely as a drawing artifact.
>>>
>>> All in all:
>>> Standard InChI means that the exact tautomeric form is unknown ==> all
>>> tautomers are mapped to the same generic representation ==> the exact C-N
>>> double bond placement in this generic form is unspecified ==> C-N double
>>> bond stereo is ignored ==> the generated Standard InChI and InChIKey are
>>> the same for cis/trans forms that look different as drawn.
>>>
>>> Once again, this behavior is by design; it is intended for maximal
>>> interoperability when comparing different drawings of the "same" compound.
>>>
>>> If, for any reason, you would like to treat your examples as definite,
>>> resolvable structures, each with its own identifier, just use
>>> non-Standard InChI.
>>> The InChI that preserves the exact positions of the tautomeric H's and the
>>> double bond ("as drawn") is produced simply by specifying the /FixedH
>>> option at generation.
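In RDKit, InChI options can be passed through the `options` argument of `Chem.MolToInchi` and `Chem.MolToInchiKey`. A minimal sketch, assuming that option string is handed to the InChI library as-is: it is written here as "-FixedH", on the assumption that the dash prefix is the portable form while "/FixedH" (as in the thread) is the Windows-style spelling. Following the explanation above, the fixed-H strings are expected to carry the as-drawn C=N configuration; printing both lets you confirm that on your own RDKit/InChI build.

```python
from rdkit import Chem

cis = Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C")
trans = Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C")

# Non-standard InChI with the fixed-H layer (prefix "InChI=1/" instead of "InChI=1S/")
fixed_cis = Chem.MolToInchi(cis, options="-FixedH")
fixed_trans = Chem.MolToInchi(trans, options="-FixedH")
print(fixed_cis)
print(fixed_trans)
print("distinct:", fixed_cis != fixed_trans)

# The same option string can be used when generating keys
key_cis = Chem.MolToInchiKey(cis, options="-FixedH")
key_trans = Chem.MolToInchiKey(trans, options="-FixedH")
print(key_cis, key_trans)
```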
>>>
>>> More on this can be found in the InChI FAQ:
>>> https://www.inchi-trust.org/technical-faq-2/
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Igor
>>>
>>>
>>>
>>> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra <gustavo.sea...@gmail.com>
>>> wrote:
>>>
>>>> Thanks a lot Peter and Adelene,
>>>>
>>>> Yes, it looks like canonical SMILES is the way to go, and I have no
>>>> problem sticking with RDKit. I was generating the InChI Keys to get a
>>>> unique hash for each compound, thinking it would be better than SMILES
>>>> (guaranteed to be unique), but that is clearly not the case. On the bright side,
>>>> I won't lose time generating InChIs...
>>>>
>>>> Can I trust that the same molecule will always get the same canonical
>>>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>>>> geometries, atom orders, etc.?)
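One part of that question, atom order, is easy to test empirically: shuffle the atom order and check that the canonical SMILES does not change. This is a sketch with a hypothetical helper name (`canonical_is_order_invariant`); it says nothing about other input differences (explicit vs. implicit H, charge states, tautomers, or stereo perceived from 3D coordinates in SDF files), which need consistent handling before canonicalization. Also keep in mind that RDKit's canonical SMILES are only guaranteed to be consistent within a given RDKit version, so recording `rdkit.__version__` alongside stored keys is prudent.

```python
import random

from rdkit import Chem


def canonical_is_order_invariant(smiles, n_trials=10, seed=0):
    """Hypothetical helper: True if shuffled atom orders all give the same canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    reference = Chem.MolToSmiles(mol)
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    for _ in range(n_trials):
        order = list(range(mol.GetNumAtoms()))
        rng.shuffle(order)
        # RenumberAtoms returns a new molecule with the atoms in the given order
        shuffled = Chem.RenumberAtoms(mol, order)
        if Chem.MolToSmiles(shuffled) != reference:
            return False
    return True


print(canonical_is_order_invariant("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
```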
>>>>
>>>> All the best,
>>>> Gustavo.
>>>>
>>>>
>>>> --
>>>> Gustavo Seabra.
>>>>
>>>>
>>>> On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin <shen...@gmail.com>
>>>> wrote:
>>>>
>>>>> Canonical SMILES is probably the way to go, but you might also be able
>>>>> to use the InChIKey and the InChI auxiliary information together as a
>>>>> compound hash key.
>>>>>
>>>>> -P.
>>>>>
>>>>> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene....@uni.lu>
>>>>> wrote:
>>>>>
>>>>>> Hi Gustavo,
>>>>>>
>>>>>>
>>>>>> (Sorry, forgot to reply all before...)
>>>>>>
>>>>>>
>>>>>> Your deduplication task is quite familiar to me and something I do
>>>>>> quite a lot of in my own work ;)
>>>>>>
>>>>>>
>>>>>> Can I suggest deduplicating using Canonical SMILES?
>>>>>>
>>>>>>
>>>>>> It doesn't solve your InChIKey issue, but it is a solution for now.
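Deduplicating on canonical SMILES amounts to using `Chem.MolToSmiles` output as a dictionary key. A minimal sketch with a hypothetical helper name (`dedupe_by_canonical_smiles`); it is not Adelene's gist, only an illustration of the same idea. One consequence worth knowing: canonical SMILES does not merge tautomers, so two drawings that differ only in H position are kept as separate entries.

```python
from rdkit import Chem


def dedupe_by_canonical_smiles(smiles_list):
    """Hypothetical helper: keep the first input string for each canonical SMILES."""
    first_seen = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # unparsable input: skip it rather than crash
            continue
        first_seen.setdefault(Chem.MolToSmiles(mol), smi)
    # dicts preserve insertion order, so the result follows the input order
    return list(first_seen.values())


inputs = [
    "C/N=C(/NC#N)NCCSCc1nc[nH]c1C",   # "cis" drawing from the thread
    "C/N=C(\\NC#N)NCCSCc1nc[nH]c1C",  # "trans" drawing from the thread
    "CCO",                            # ethanol
    "OCC",                            # ethanol again, different atom order
]
unique = dedupe_by_canonical_smiles(inputs)
print(unique)
```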
>>>>>>
>>>>>>
>>>>>> I updated my gist to show that it is feasible:
>>>>>>
>>>>>>
>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>
>>>>>>
>>>>>> Adelene
>>>>>>
>>>>>>
>>>>>>
>>>>>> Doctoral Researcher
>>>>>>
>>>>>> Environmental Cheminformatics
>>>>>>
>>>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>>>
>>>>>>
>>>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>>>
>>>>>> 6, avenue du Swing
>>>>>> <https://www.google.com/maps/search/6,+avenue+du+Swing?entry=gmail&source=g>,
>>>>>> L-4367 Belvaux
>>>>>>
>>>>>> T +356 46 66 44 67 18
>>>>>>
>>>>>> [image: github.png] adelenelai
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Sent:* Sunday, October 25, 2020 2:27:15 PM
>>>>>> *To:* Adelene LAI
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>> InChI Key
>>>>>>
>>>>>> Actually, I was trying to generate all stereoisomers for molecules
>>>>>> in a database and filter out duplicates by using the InChIKey to
>>>>>> detect them. But it gives cis/trans isomers on sp2 N the same key.
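The workflow described above (enumerate stereoisomers, then deduplicate) can be sketched with RDKit's `EnumerateStereoisomers`, counting distinct structures under both candidate keys. This is a sketch only: whether the enumerator treats this particular C=N bond as a stereo bond may depend on the RDKit version. If it does, the behavior described in this thread suggests two isomers, two canonical SMILES, but only one Standard InChIKey.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)

# The thread's molecule with the C=N configuration left unspecified
mol = Chem.MolFromSmiles("CN=C(NC#N)NCCSCc1nc[nH]c1C")
options = StereoEnumerationOptions(unique=True)
isomers = list(EnumerateStereoisomers(mol, options=options))

# Count distinct structures under each candidate deduplication key
by_smiles = {Chem.MolToSmiles(m) for m in isomers}
by_inchikey = {Chem.MolToInchiKey(m) for m in isomers}
print(len(isomers), len(by_smiles), len(by_inchikey))
```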
>>>>>>
>>>>>> Gustavo.
>>>>>>
>>>>>> --
>>>>>> Gustavo Seabra
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>> *Sent:* Sunday, October 25, 2020 1:44:01 AM
>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>> InChI Key
>>>>>>
>>>>>>
>>>>>> Hi Gustavo,
>>>>>>
>>>>>>
>>>>>> It occurred to me while swimming yesterday: was there a reason you
>>>>>> pointed out the hybridisation state of N in your original subject text?
>>>>>>
>>>>>>
>>>>>> Was it just to specify which N to focus on, or did you expect
>>>>>> something special about sp2 hybridisation with respect to InChIKey?
>>>>>>
>>>>>>
>>>>>> Adelene
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Sent:* Saturday, October 24, 2020 5:37:09 AM
>>>>>> *To:* RDKit Discuss; Adelene LAI
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>> InChI Key
>>>>>>
>>>>>> Thanks for looking into it. I'm happy to see it wasn't just a
>>>>>> mistake on my part ;-)
>>>>>>
>>>>>> I hope we can find what's wrong there.
>>>>>>
>>>>>> Best,
>>>>>> Gustavo.
>>>>>>
>>>>>> --
>>>>>> Gustavo Seabra
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>> *Sent:* Friday, October 23, 2020 11:28:55 PM
>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <
>>>>>> rdkit-discuss@lists.sourceforge.net>
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>> InChI Key
>>>>>>
>>>>>>
>>>>>> Hi Gustavo,
>>>>>>
>>>>>>
>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>
>>>>>>
>>>>>> In the gist above, I tried doing some further investigating.
>>>>>>
>>>>>>
>>>>>> It seems that, for the example you gave, the RDKit functions do indeed
>>>>>> give the same InChIKey and InChI, but different aux info.
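This observation can be reproduced with `rdkit.Chem.inchi.MolToInchiAndAuxInfo`, which returns the InChI and its AuxInfo string as a pair. A likely explanation: the InChIKey is hashed from the InChI string alone, and AuxInfo is auxiliary output that is not part of the identifier. Its /N: layer also records the input atom numbering, so AuxInfo can change with atom order even for the same compound, which makes it a fragile deduplication key on its own.

```python
from rdkit import Chem
from rdkit.Chem import inchi

cis = Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C")
trans = Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C")

# Each call returns (InChI string, AuxInfo string)
inchi_cis, aux_cis = inchi.MolToInchiAndAuxInfo(cis)
inchi_trans, aux_trans = inchi.MolToInchiAndAuxInfo(trans)

print("same InChI:", inchi_cis == inchi_trans)
print(aux_cis)
print(aux_trans)
```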
>>>>>>
>>>>>>
>>>>>> Why this different aux info doesn't translate into different
>>>>>> InChIKeys/InChIs, I'm not sure.
>>>>>>
>>>>>>
>>>>>> Adelene
>>>>>> ------------------------------
>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Sent:* Friday, October 23, 2020 6:43:07 PM
>>>>>> *To:* RDKit Discuss
>>>>>> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>>>>>> Key
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I ran into an issue here, and I'd appreciate your input. I noticed
>>>>>> that compounds that differ only in cis/trans isomerism around an
>>>>>> sp2 nitrogen get the same InChI Key from RDKit. For example:
>>>>>>
>>>>>> > inchi_cis =
>>>>>> Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
>>>>>> > inchi_cis
>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>
>>>>>> > inchi_trans =
>>>>>> Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
>>>>>> > inchi_trans
>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>
>>>>>> > inchi_cis == inchi_trans
>>>>>> True
>>>>>>
>>>>>> I wonder whether this is a limitation of the InChI Key definition or an
>>>>>> implementation issue.
>>>>>>
>>>>>> Thanks a lot,
>>>>>>
>>>>>> --
>>>>>> Gustavo Seabra.
>>>>>> _______________________________________________
>>>>>> Rdkit-discuss mailing list
>>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>>>
>>>