Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

Lewis Martin Mon, 05 Feb 2024 11:38:30 -0800

Thank you very much for the detective work, Wim! This is helpful.

It looks like the _reverse_ transition is possible, though. If I start by
generating tautomers of "mol2", then "mol1" is recovered, which indicates
this is an allowed transform. Is it possible that one direction is allowed
but not the reverse?
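
A minimal sketch of how this one-way behaviour could be checked programmatically, rather than by eye in a grid image. The helper names `one_way_pairs` and `rdkit_reach` are hypothetical (they are not part of RDKit); the sketch assumes default `TautomerEnumerator` settings and compares canonical SMILES:

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple


def one_way_pairs(reach: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where enumerating from `source` yields
    `target`, but enumerating from `target` does not yield `source`."""
    return sorted(
        (src, dst)
        for src, dst in permutations(reach, 2)
        if dst in reach[src] and src not in reach[dst]
    )


def rdkit_reach(smiles_list: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each input (as canonical SMILES) to the canonical SMILES of the
    tautomers RDKit enumerates from it. Hypothetical helper for illustration."""
    # Imported lazily so one_way_pairs stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    enumerator = rdMolStandardize.TautomerEnumerator()  # default settings
    reach: Dict[str, Set[str]] = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {smi!r}")
        # Round-trip each tautomer through SMILES, as in the original snippet.
        reach[Chem.MolToSmiles(mol)] = {
            Chem.MolToSmiles(Chem.MolFromSmiles(Chem.MolToSmiles(t)))
            for t in enumerator.Enumerate(mol)
        }
    return reach
```

Calling `one_way_pairs(rdkit_reach(['Sc1cc2ccccc2cn1', 'S=c1cc2ccccc2c[nH]1']))` should, if the behaviour described in this thread holds for the installed RDKit version, list the thione-to-thiol direction as the only one-way pair.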


Failing a solution there, does anyone know whether it is possible to add custom
SMIRKS to the allowed tautomer transforms through the Python interface?
Thanks,
Lewis

On Mon, Feb 5, 2024 at 9:52 PM Wim Dehaen <[email protected]> wrote:

> hi lewis,
> if i am not mistaken this is because the tautomer transform "1,3 aromatic
> heteroatom H shift" does not account for other chalcogens than oxygen, so
> no selenium, tellurium or sulfur.
> you can find the list of transforms here:
> https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
> (pointing to the line with the relevant transform).
> best wishes
> wim
>
> On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin <[email protected]>
> wrote:
>
>> Hi all,
>> I'm looking at scoring tautomers, and using the 'tautobase' dataset used
>> by Wieder et al.* at:
>>
>> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>>
>> This dataset has pairs of tautomers with experimental logK values to
>> determine the preferred tautomer.
>>
>> In at least one case, depending on which tautomer you use as the 'entry'
>> point, the tautomers enumerated by RDKit either do or don't include both
>> members of the input pair. *I'm hoping there's a way to recover the same
>> full set of possible tautomers starting from any input tautomer.*
>>
>> Here's a code example:
>>
>> from rdkit import Chem
>> from rdkit.Chem import Draw
>> from rdkit.Chem.Draw import IPythonConsole
>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>
>> IPythonConsole.drawOptions.addStereoAnnotation = True
>>
>> # Same result if you don't set any of these params.
>> tautomer_params = rdMolStandardize.CleanupParameters()
>> tautomer_params.tautomerRemoveSp3Stereo = False
>> tautomer_params.tautomerRemoveBondStereo = False
>> tautomer_params.tautomerRemoveIsotopicHs = False
>> tautomer_params.tautomerReassignStereo = False
>> tautomer_params.doCanonical = True
>>
>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>
>> smi1 = 'Sc1cc2ccccc2cn1'
>> smi2 = 'S=c1cc2ccccc2c[nH]1'
>> mol1 = Chem.MolFromSmiles(smi1)
>> mol2 = Chem.MolFromSmiles(smi2)
>>
>> # Choose mol1 or mol2 to be the source of tautomers.
>> # Starting from mol1, note that mol2 isn't among the tautomers!
>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
>>          for m in enumerator.Enumerate(mol1)]
>>
>> Draw.MolsToGridImage(
>>     [mol1, mol2] + tauts,
>>     legends=['mol1', 'mol2 (not present in tauts!)']
>>             + [f'taut{i}' for i in range(len(tauts))],
>>     molsPerRow=4)
>>
>> And a picture of this in a notebook for an at-a-glance view:
>> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>>
>> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>>
>> Thank you!
>> Lewis
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
