Ok, thanks!

--
Gustavo Seabra.

On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev <igor.plet...@gmail.com> wrote:

> > Is this "/FixedH" an option in RDKit? How to use that? (I don't see it
> > in the docs).
>
> Sorry, I am not so proficient in RDKit and cannot answer exactly. Anyway,
> this option is available in InChI API calls, and I am pretty sure that it
> is also available in RDKit.
>
> I recall that a couple of years ago, at an InChI event, Greg Landrum
> somewhat surprised me by saying that he himself often uses non-Standard
> InChI instead of the Standard one, precisely to distinguish tautomers.
> So I guess Greg can explain how this is set up in RDKit.
>
> Regards,
> Igor
>
> On Thu, 29 Oct 2020 at 23:03, Gustavo Seabra <gustavo.sea...@gmail.com>
> wrote:
>
>> That does make sense, I understand it now, thanks!
>>
>> Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in
>> the docs).
>>
>> Thanks,
>> --
>> Gustavo Seabra.
>>
>> On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev <igor.plet...@gmail.com>
>> wrote:
>>
>>> Hi Gustavo,
>>>
>>> > ... I was generating the InChI Keys to get a unique hash for each
>>> > compound, thinking it would be better than SMILES (guaranteed to be
>>> > unique), but that is clearly not the case. On the bright side, I won't
>>> > lose time generating InChIs...
>>>
>>> Though InChI is not perfect, in this case it behaves as intended.
>>> Please see below.
>>>
>>> The molecules under discussion contain a substituted guanidine fragment
>>> (RHN)C(=NMe)(NHR')
>>>
>>> It is subject to tautomerism, and in different tautomers a different C-N
>>> bond is the double bond:
>>> (RHN)C(=NMe)(NHR')
>>> (RHN)C(NHMe)(=NR')
>>> (RN=)C(NHMe)(NHR')
>>>
>>> You generated Standard InChI, as shown by the "InChI=1S/" prefix
>>> in the examples.
>>> Standard InChI is specifically designed to produce the same identifier
>>> for all tautomers (by indicating that two hydrogens are shared by three
>>> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>>>
>>> As the tautomer-invariant Std InChI does not know which C-N bond is
>>> actually the double bond, the only option for treating stereo is to
>>> ignore it completely as a drawing artifact.
>>>
>>> All in all:
>>> Standard InChI means that the exact tautomeric form is unknown ==> all
>>> tautomers are mapped to the same generic representation ==> the exact C-N
>>> double bond placement in this generic form is unspecified ==> C-N double
>>> bond stereo is ignored ==> the generated StdInChI and Std InChIKey are the
>>> same for cis/trans forms that look different as initially drawn.
>>>
>>> Once again, this behavior is by design; it is intended for maximal
>>> interoperability while comparing different drawings of the "same" compound.
>>>
>>> If, for any reason, you would like to consider your examples as
>>> definite and resolvable structures, each having its own identifier, just
>>> use non-Standard InChI.
>>> The InChI which preserves the exact positions of the tautomeric H's and
>>> the double bond ("as drawn") is produced by specifying the /FixedH option
>>> at generation time.
>>>
>>> More on this may be found in the InChI FAQ:
>>> https://www.inchi-trust.org/technical-faq-2/
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Igor
>>>
>>> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra <gustavo.sea...@gmail.com>
>>> wrote:
>>>
>>>> Thanks a lot Peter and Adelene,
>>>>
>>>> Yes, it looks like canonical SMILES is the way to go, and I have no
>>>> problem sticking with RDKit. I was generating the InChI Keys to get a
>>>> unique hash for each compound, thinking it would be better than SMILES
>>>> (guaranteed to be unique), but that is clearly not the case. On the
>>>> bright side, I won't lose time generating InChIs...
>>>>
>>>> Can I trust that the same molecule will always get the same canonical
>>>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>>>> geometries, atom orders, etc.?)
>>>>
>>>> All the best,
>>>> Gustavo.
>>>>
>>>> --
>>>> Gustavo Seabra.
>>>>
>>>> On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin <shen...@gmail.com>
>>>> wrote:
>>>>
>>>>> Canonical SMILES is probably the way to go, but you might also be able
>>>>> to use the InChIKey and the InChI auxiliary information together as a
>>>>> compound hash key.
>>>>>
>>>>> -P.
>>>>>
>>>>> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene....@uni.lu>
>>>>> wrote:
>>>>>
>>>>>> Hi Gustavo,
>>>>>>
>>>>>> (Sorry, forgot to reply all before...)
>>>>>>
>>>>>> Your deduplication task is quite familiar to me and something I do
>>>>>> quite a lot of in my own work ;)
>>>>>>
>>>>>> Can I suggest deduplicating using Canonical SMILES?
>>>>>>
>>>>>> It doesn't solve your InChIKey issue, but it is a solution for now.
>>>>>>
>>>>>> I updated my gist to show that it is feasible:
>>>>>>
>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>
>>>>>> Adelene
>>>>>>
>>>>>> Doctoral Researcher
>>>>>> Environmental Cheminformatics
>>>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>>> 6, avenue du Swing, L-4367 Belvaux
>>>>>> T +356 46 66 44 67 18
>>>>>> GitHub: adelenelai
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Sent:* Sunday, October 25, 2020 2:27:15 PM
>>>>>> *To:* Adelene LAI
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>
>>>>>> Actually, I was trying to generate all stereoisomers for molecules
>>>>>> in a database, and filter duplicate molecules by using the InChI Key to
>>>>>> detect duplicates.
>>>>>> But it gives cis/trans isomers on sp2-N the same Key.
>>>>>>
>>>>>> Gustavo.
>>>>>>
>>>>>> --
>>>>>> Gustavo Seabra
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>> *Sent:* Sunday, October 25, 2020 1:44:01 AM
>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>
>>>>>> Hi Gustavo,
>>>>>>
>>>>>> It occurred to me while swimming yesterday - was there a reason you
>>>>>> pointed out the hybridisation state of N in your original subject text?
>>>>>>
>>>>>> Was it just to specify which N to focus on, or did you expect
>>>>>> something special about sp2 hybridisation wrt InChIKey?
>>>>>>
>>>>>> Adelene
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Sent:* Saturday, October 24, 2020 5:37:09 AM
>>>>>> *To:* RDKit Discuss; Adelene LAI
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>
>>>>>> Thanks for looking into it. I'm happy to see it wasn't just a
>>>>>> mistake by me ;-)
>>>>>>
>>>>>> I hope we can find what's wrong there.
>>>>>>
>>>>>> Best,
>>>>>> Gustavo.
>>>>>>
>>>>>> --
>>>>>> Gustavo Seabra
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>> *Sent:* Friday, October 23, 2020 11:28:55 PM
>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>
>>>>>> Hi Gustavo,
>>>>>>
>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>
>>>>>> In the gist above, I did some further investigating.
>>>>>>
>>>>>> It seems that for the example you gave, the RDKit functions indeed give
>>>>>> the same InChIKey and InChI, but different aux info.
>>>>>>
>>>>>> Why this different aux info doesn't translate into different
>>>>>> InChIKeys/InChIs, I'm not sure.
>>>>>>
>>>>>> Adelene
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>> *Sent:* Friday, October 23, 2020 6:43:07 PM
>>>>>> *To:* RDKit Discuss
>>>>>> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I ran into an issue here, and I'd appreciate your input. I noticed
>>>>>> that compounds that differ only in the cis-trans isomerization around an
>>>>>> sp2 nitrogen get the same InChI Key from RDKit.
>>>>>> For example:
>>>>>>
>>>>>> > from rdkit import Chem
>>>>>>
>>>>>> > inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
>>>>>> > inchi_cis
>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>
>>>>>> > inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
>>>>>> > inchi_trans
>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>
>>>>>> > inchi_cis == inchi_trans
>>>>>> True
>>>>>>
>>>>>> I wonder if this is a limitation of the InChI Key definition, or an
>>>>>> implementation issue.
>>>>>>
>>>>>> Thanks a lot,
>>>>>>
>>>>>> --
>>>>>> Gustavo Seabra.
>>>>>> _______________________________________________
>>>>>> Rdkit-discuss mailing list
>>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
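
For readers following up on the "/FixedH" question above, here is a minimal sketch of how the two suggestions in this thread (non-Standard InChI with FixedH, and canonical SMILES) might be compared side by side. It assumes an RDKit build with InChI support and relies on the `options` argument of `MolToInchi`, which forwards option strings to the InChI library. The helper name `identifiers` and the dictionary keys are made up for illustration. Whether the FixedH keys actually separate a given pair of isomers should be checked on the printed output rather than taken for granted.

```python
from rdkit import Chem
from rdkit.Chem import inchi

# The two SMILES from the original post; they differ only in the
# stereo marker on the C=N double bond.
SMILES_A = "C/N=C(/NC#N)NCCSCc1nc[nH]c1C"
SMILES_B = "C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"


def identifiers(smiles):
    """Hypothetical helper: collect several identifiers for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Non-Standard InChI: ask the InChI library to keep mobile H "as drawn".
    fixedh_inchi = inchi.MolToInchi(mol, options="/FixedH")
    return {
        # Standard InChIKey, as generated in the original post.
        "std_key": inchi.MolToInchiKey(mol),
        "fixedh_inchi": fixedh_inchi,
        "fixedh_key": inchi.InchiToInchiKey(fixedh_inchi),
        # RDKit canonical SMILES (includes double-bond stereo by default).
        "can_smiles": Chem.MolToSmiles(mol),
    }


ids_a = identifiers(SMILES_A)
ids_b = identifiers(SMILES_B)

# Print each identifier for both isomers and whether they coincide.
for name in ids_a:
    same = ids_a[name] == ids_b[name]
    print(f"{name}: same={same}")
    print(f"  A: {ids_a[name]}")
    print(f"  B: {ids_b[name]}")
```

If the goal is only deduplication inside a single RDKit-based pipeline, the `can_smiles` entry is the one Adelene and Peter recommend above; the FixedH identifiers are the alternative when an InChI-based key is still wanted.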