Good catch, thank you Diogo!

Recognising the difficulties of tautomer enumeration: for my own purposes,
the ideal behaviour would be to get the set of all three plausible
tautomers of 'mol1' no matter what the input SMILES is. It looks like there's
already a GitHub issue open (https://github.com/rdkit/rdkit/issues/5937), but
I can open a new one if this turns out to have a different cause.

thanks all
Lewis
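
A minimal sketch of one possible workaround, under stated assumptions. It canonicalises each input before enumerating, so two spellings of the same tautomer (like the two SMILES in Diogo's message) are enumerated identically. It seeds the enumeration with every known tautomer of the compound (for tautobase, both members of the pair) and keeps re-enumerating from newly found tautomers until no new canonical SMILES appear. `tautomer_closure` and `rdkit_enumerate` are hypothetical helper names, not RDKit API. This cannot add the missing sulfur transform. It only collects forms reachable from at least one seed with the existing transforms.

```python
from typing import Callable, Iterable, List, Set


def tautomer_closure(
    seeds: Iterable[str],
    enumerate_fn: Callable[[str], Iterable[str]],
    max_rounds: int = 10,
) -> Set[str]:
    """Hypothetical helper: fixed-point closure over tautomer enumeration.

    enumerate_fn maps a SMILES string to canonical SMILES of its tautomers
    (RDKit's enumerator includes the input tautomer itself). Each round
    enumerates only the tautomers first seen in the previous round.
    """
    found: Set[str] = set()
    frontier: List[str] = list(seeds)
    for _ in range(max_rounds):
        next_frontier: List[str] = []
        for smi in frontier:
            for taut in enumerate_fn(smi):
                if taut not in found:
                    found.add(taut)
                    next_frontier.append(taut)
        if not next_frontier:  # nothing new: closure reached
            break
        frontier = next_frontier
    return found


def rdkit_enumerate(smiles: str) -> List[str]:
    """Hypothetical adapter around RDKit's default TautomerEnumerator.

    The input is canonicalised first so that different SMILES spellings of
    the same tautomer produce the same enumeration.
    """
    # Imported lazily so tautomer_closure itself has no RDKit dependency.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(Chem.CanonSmiles(smiles))
    enumerator = rdMolStandardize.TautomerEnumerator()
    return [Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)]
```

For example, `tautomer_closure(['Sc1cc2ccccc2cn1', 'S=c1cc2ccccc2c[nH]1'], rdkit_enumerate)` seeds with both SMILES of the pair from the original post. `max_rounds` is only a guard against very large tautomer spaces, not a tuned value.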



On Tue, Feb 6, 2024 at 7:23 AM Diogo Martins <diogo.stm...@gmail.com> wrote:

> Hello,
>
> I think it's a bug because the tautomers depend on how the input SMILES is
> written. Both represent mol1:
>
> Sc1ncc2c(c1)cccc2
> Sc1cc2ccccc2cn1
>
> However the resulting tautomers differ depending on which is used as input.
>
> Best regards,
> Diogo
>
> On Mon, 5 Feb 2024 at 11:38, Lewis Martin <lewis.marti...@gmail.com>
> wrote:
>
>> Thank you very much for the detective work, Wim! This is helpful.
>>
>> It looks like the _reverse_ transition is possible, though. If I start by
>> generating tautomers of "mol2", then "mol1" is recovered, which indicates
>> this is an allowed transform. Is it possible that one direction is allowed
>> but not the reverse?
>>
>> Failing a solution there, does anyone know if it is possible to add
>> SMIRKS to the allowed tautomer transforms through the Python interface?
>> Thanks,
>> Lewis
>>
>> On Mon, Feb 5, 2024 at 9:52 PM Wim Dehaen <wimdeh...@gmail.com> wrote:
>>
>>> hi lewis,
>>> if i am not mistaken this is because the tautomer transform "1,3 aromatic
>>> heteroatom H shift" does not account for chalcogens other than oxygen, so
>>> no selenium, tellurium or sulfur.
>>> you can find the list of transforms here:
>>> https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
>>> (pointing to the line with the relevant transform).
>>> best wishes
>>> wim
>>>
>>> On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin <lewis.marti...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>> I'm looking at scoring tautomers, and using the 'tautobase' dataset
>>>> used by Wieder et al. at:
>>>>
>>>> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>>>>
>>>> This dataset has pairs of tautomers with experimental logK values that
>>>> indicate which tautomer is preferred.
>>>>
>>>> In at least one case, depending on which tautomer you use as the
>>>> 'entry' point, the tautomers enumerated by RDKit either do or don't
>>>> include both members of the input pair. *I'm hoping there's a way to
>>>> recover the same, full set of possible tautomers from any input
>>>> tautomer.*
>>>>
>>>> Here's a code example:
>>>>
>>>> from rdkit import Chem
>>>> from rdkit.Chem import Draw
>>>> from rdkit.Chem.Draw import IPythonConsole
>>>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>>>
>>>> IPythonConsole.drawOptions.addStereoAnnotation = True
>>>>
>>>> # Same result if you don't set any of these params.
>>>> tautomer_params = rdMolStandardize.CleanupParameters()
>>>> tautomer_params.tautomerRemoveSp3Stereo = False
>>>> tautomer_params.tautomerRemoveBondStereo = False
>>>> tautomer_params.tautomerRemoveIsotopicHs = False
>>>> tautomer_params.tautomerReassignStereo = False
>>>> tautomer_params.doCanonical = True
>>>>
>>>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>>>
>>>> smi1 = 'Sc1cc2ccccc2cn1'       # thiol form ("mol1")
>>>> smi2 = 'S=c1cc2ccccc2c[nH]1'   # thione form ("mol2")
>>>> mol1 = Chem.MolFromSmiles(smi1)
>>>> mol2 = Chem.MolFromSmiles(smi2)
>>>>
>>>> # Choose mol1 or mol2 as the source of tautomers.
>>>> # With mol1 as the source, mol2 is not among the enumerated tautomers!
>>>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
>>>>          for m in enumerator.Enumerate(mol1)]
>>>>
>>>> Draw.MolsToGridImage(
>>>>     [mol1, mol2] + tauts,
>>>>     legends=['mol1', 'mol2 (not present in tauts!)']
>>>>             + [f'taut{i}' for i in range(len(tauts))],
>>>>     molsPerRow=4)
>>>>
>>>> And a picture of this in a notebook for an at-a-glance view:
>>>> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>>>>
>>>> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>>>>
>>>> Thank you!
>>>> Lewis
>>>>
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>
>
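
On the question raised in the thread about adding transforms through the Python interface: one route that may work is the `tautomerTransformsFile` attribute of `rdMolStandardize.CleanupParameters`, which points `TautomerEnumerator` at a transforms file instead of the built-in catalog. The sketch below assumes that file uses tab-separated `name<TAB>SMARTS` lines like the transform list Wim linked. It also assumes that a custom file replaces rather than extends the defaults, so the full default list (`default_catalog`, your own copy of it) must be included. The two sulfur SMARTS and all helper names are hypothetical and untested against RDKit; check the enumerated tautomers before relying on them.

```python
import os
import tempfile
from typing import Iterable, Tuple

# HYPOTHETICAL sulfur analogues of the "1,3 aromatic heteroatom H shift"
# pair; illustrative SMARTS only, not taken from RDKit's catalog.
EXTRA_TRANSFORMS = [
    ("1,3 aromatic heteroatom H shift S f", "[#7!H0]-[#6R1]=[#16X1]"),
    ("1,3 aromatic heteroatom H shift S r", "[#16!H0]-[#6R1]=[#7X2]"),
]


def build_catalog(default_catalog: str, extra: Iterable[Tuple[str, str]]) -> str:
    """Append name<TAB>SMARTS lines to existing catalog text (assumed format)."""
    lines = default_catalog.splitlines()
    lines += [f"{name}\t{smarts}" for name, smarts in extra]
    return "\n".join(lines) + "\n"


def write_catalog(text: str) -> str:
    """Write the catalog text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".in")
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    return path


def make_enumerator(catalog_path: str):
    """Build a TautomerEnumerator that reads its transforms from catalog_path."""
    # Lazy import: only this function needs RDKit.
    from rdkit.Chem.MolStandardize import rdMolStandardize

    params = rdMolStandardize.CleanupParameters()
    params.tautomerTransformsFile = catalog_path
    return rdMolStandardize.TautomerEnumerator(params)
```

Usage would be along the lines of `make_enumerator(write_catalog(build_catalog(default_catalog, EXTRA_TRANSFORMS)))`, followed by enumerating from mol1 and checking whether mol2 now appears.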