Thanks for your helpful answer. I learned a lot.

I have a few more questions:

1. How do you generate a non-standard InChI? Is that available in RDKit?

2. What are the 15T and KET options?

3. Can your solution be applied systematically? As a systematic solution I tried:

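from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()

for smi in my_smi_list:
    m = Chem.MolFromSmiles(smi)
    m = enumerator.Canonicalize(m)  # pick the canonical tautomer
    # rdinchi.MolToInchi returns a tuple; element 0 is the InChI string
    inchi = Chem.rdinchi.MolToInchi(m)[0]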

The problem with this solution is that for very big molecules (for example,
macrocycles) I get a 'MemoryError'.

4. In another case (not related to tautomers), I can't tell whether the InChI
output is correct or not:

C[N+]1=C(\C=C\C2=CNC=C2)C=CC2=CC=CC=C12
C[N+]1=C(\C=C/C2=CNC=C2)C=CC2=CC=CC=C12

Usually, when I enter two E/Z stereoisomers, I get two different InChIs (and
the difference is in the /b or /t layers, as it should be). However, this time
(in both RDKit and OpenBabel) I get:

InChI=1S/C16H14N2/c1-18-15(8-6-13-10-11-17-12-13)9-7-14-4-2-3-5-16(14)18/h2-12H,1H3/p+1
InChI=1S/C16H14N2/c1-18-15(8-6-13-10-11-17-12-13)9-7-14-4-2-3-5-16(14)18/h2-12H,1H3/p+1

Only if I remove the charge (hydrogen instead of carbon on the methylquinoline)
or modify the pyrrole group on the other side do I get different InChIs. Why?
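
Regarding the MemoryError in question 3, one thing that may be worth trying is
capping how much work the tautomer enumerator does per molecule. The sketch
below assumes an RDKit version whose TautomerEnumerator exposes
SetMaxTautomers and SetMaxTransforms; the helper name
canonical_tautomer_inchi and the limit values are hypothetical and chosen only
for illustration. Whether the caps actually avoid the MemoryError for
macrocycles has not been verified here.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical_tautomer_inchi(smiles, max_tautomers=200, max_transforms=200):
    """Return the Standard InChI of the canonical tautomer, or None.

    The limits are illustrative values (an assumption), not recommendations.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Unparseable SMILES: signal failure instead of raising.
        return None

    enumerator = rdMolStandardize.TautomerEnumerator()
    # Bound the enumeration so that large molecules cannot grow without limit.
    enumerator.SetMaxTautomers(max_tautomers)
    enumerator.SetMaxTransforms(max_transforms)

    canonical = enumerator.Canonicalize(mol)
    return Chem.MolToInchi(canonical)


if __name__ == "__main__":
    for smi in ["Oc1ccccn1", "O=c1cccc[nH]1"]:
        print(smi, canonical_tautomer_inchi(smi))
```

With lower limits the enumeration may stop before all tautomers are seen, so
the chosen "canonical" form can depend on the caps; the same limits should be
used for every molecule that is going to be compared.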

Thanks a lot,
Benny


From: Markus Sitzmann [mailto:markus.sitzm...@gmail.com]
Sent: Tuesday, July 21, 2020 2:47 PM
To: Da'Adoosh Binyamin <daado...@tauex.tau.ac.il>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] RDKit/tautomers

Hi Benny,

that is purely an InChI problem (not an RDKit one). Back when the Standard
InChI was defined, the 15T and KET options for the InChI calculation were
either not yet available or still experimental (I don't remember :-)), so they
didn't make it into the set of options used for the Standard InChI calculation.
Hence it isn't too surprising that this tautomer pair doesn't produce the
same Standard InChI (InChI isn't/wasn't particularly strong regarding
tautomerism outside rings). You could use a (non-standard) InChI and switch the
15T and KET options on; that should fix your particular case.
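
A minimal sketch of how this could look from RDKit, assuming the InChI library
bundled with your RDKit build accepts the KET and 15T options. Chem.MolToInchi
takes an options string in which each option is prefixed with "-" or "/". The
helper name inchi_with_options is hypothetical, and the SMILES are the
tautomer pair from the original question. The result for this pair has not
been checked here, so the script only prints the strings for comparison.

```python
from rdkit import Chem

# The tautomer pair from the original question.
TAUTOMER_PAIR = [
    "C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12",
    "C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12",
]


def inchi_with_options(smiles, options=""):
    """Return the InChI for a SMILES, passing `options` through to InChI.

    An empty options string gives the Standard InChI.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return Chem.MolToInchi(mol, options=options)


if __name__ == "__main__":
    for smi in TAUTOMER_PAIR:
        print("standard    :", inchi_with_options(smi))
        # KET (keto-enol) and 15T (1,5-tautomerism) are non-standard options.
        print("with KET/15T:", inchi_with_options(smi, "-KET -15T"))
```

InChIs generated with these options are non-standard, so they should only be
compared with other InChIs generated with the same option set.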

In general there are still ongoing efforts to make InChI stronger regarding 
tautomerism: https://pubmed.ncbi.nlm.nih.gov/32043883/

Markus


On Tue, Jul 21, 2020 at 12:11 PM Da'Adoosh Binyamin
<daado...@tauex.tau.ac.il> wrote:
Hi,

I have a question about RDKit/tautomers.

Let's say I have the following SMILES as input:

C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12
C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12

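Now, if I run this code for each input:

from rdkit import Chem

m = Chem.MolFromSmiles(smi)  # smi is one of the SMILES above
inchi = Chem.rdinchi.MolToInchi(m)[0]  # element 0 of the returned tuple is the InChI string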

I get different InChIs:

InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,13-15H,2-4H2,1H3
InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,12-14H,2-4H2,1H3

My question is: why is this happening? Usually, if I enter two tautomers, they
have the same InChI (as they are supposed to, according to the literature).
What is different in this example?

Thanks,
Benny

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss