Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers
Good catch, thank you Diogo!
Recognising the difficulties of tautomer enumeration: for my own purposes,
the ideal behaviour would be to get the set of all three plausible
tautomers of 'mol1' no matter how the input SMILES is written. It looks like
there's already a GitHub issue open (https://github.com/rdkit/rdkit/issues/5937),
but I can add this case if it turns out to have a different cause.
Thanks all,
Lewis
___
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers
Hello,
I think it's a bug because the tautomers depend on how the input SMILES is
written. Both represent mol1:
Sc1ncc2c(c1)2
Sc1cc2c2cn1
However the resulting tautomers differ depending on which is used as input.
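
One way to probe this is to enumerate tautomers from several randomized SMILES of the same molecule and compare the resulting sets. The sketch below is only an illustration under a few assumptions: the helper names `tautomer_smiles` and `is_order_invariant` are made up here, the enumerator runs with default settings, and 2-hydroxypyridine is a stand-in input rather than the molecule from this thread.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def tautomer_smiles(smiles):
    """Return the canonical SMILES of every tautomer enumerated from `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()  # default settings
    return frozenset(Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol))


def is_order_invariant(smiles, n_variants=10, seed=42):
    """True if randomized SMILES of one molecule all give the same tautomer set."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    reference = tautomer_smiles(smiles)
    # Different atom orderings of the same molecule, written as SMILES.
    variants = Chem.MolToRandomSmilesVect(mol, n_variants, randomSeed=seed)
    return all(tautomer_smiles(v) == reference for v in variants)


# Stand-in molecule (2-hydroxypyridine), not the one discussed in this thread.
print(sorted(tautomer_smiles("Oc1ccccn1")))
print(is_order_invariant("Oc1ccccn1"))
```

A `False` from `is_order_invariant` would be a compact reproducer to attach to a bug report, since it shows the dependence on atom ordering without any hand-written SMILES pairs.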
Best regards,
Diogo
Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers
Thank you very much for the detective work, Wim! This is helpful.
It looks like the _reverse_ transition is possible, though. If I start by
generating tautomers of "mol2", then "mol1" is recovered, which indicates
this is an allowed transform. Is it possible that one direction is allowed
but not the reverse?
Failing a solution there, does anyone know if it is possible to add SMIRKS
to the allowed tautomer transforms through the Python interface?
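
If one direction of a transform fires but the reverse does not, a possible workaround (a sketch, not a fix for the underlying behaviour) is to re-enumerate from every tautomer found until no new structures appear, optionally seeding the search with every known member of a pair. The names `canonical` and `tautomer_closure` and the `max_rounds` cap are invented for this example, and 2-hydroxypyridine / 2-pyridone are stand-in inputs rather than the molecules from this thread.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical(smiles):
    """Canonical SMILES for `smiles`, raising if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


def tautomer_closure(seed_smiles, max_rounds=5):
    """Union of tautomers reachable by repeated enumeration from the seeds."""
    enumerator = rdMolStandardize.TautomerEnumerator()  # default settings
    seen = set()
    frontier = {canonical(s) for s in seed_smiles}
    for _ in range(max_rounds):
        if not frontier:
            break  # nothing new was found in the previous round
        seen |= frontier
        discovered = set()
        for smi in frontier:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                continue  # skip anything that does not round-trip
            discovered.update(
                Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)
            )
        frontier = discovered - seen
    return seen | frontier


# Seed with both members of a known pair (stand-in molecules).
print(sorted(tautomer_closure(["Oc1ccccn1", "O=c1cccc[nH]1"])))
```

Each round runs a full enumeration per newly found tautomer, so this can get slow for molecules with many tautomers; the `max_rounds` cap is there to keep it bounded.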
Thanks,
Lewis
Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers
hi lewis,
if i am not mistaken this is because the tautomer transform "1,3 aromatic
heteroatom H shift" does not account for chalcogens other than oxygen, so
no sulfur, selenium or tellurium.
you can find the list of transforms here:
https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
(pointing to the line with the relevant transform).
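
To get a feel for how many inputs in a dataset might be touched by a gap like this, one could flag molecules that carry S, Se or Te next to a ring in either a thiol-like or a thione-like form. The two SMARTS patterns below are rough assumptions for such a screen; they are not copied from tautomerTransforms.in, and `has_heavy_chalcogen_site` is a made-up helper name.

```python
from rdkit import Chem

# Assumed screening patterns (NOT taken from the RDKit transform catalog):
#   thiol-like:  S/Se/Te carrying at least one H, single-bonded to an aromatic carbon
#   thione-like: terminal S/Se/Te double-bonded to a ring carbon
THIOL_LIKE = Chem.MolFromSmarts("[#16,#34,#52;!H0]-[#6;a]")
THIONE_LIKE = Chem.MolFromSmarts("[#16,#34,#52;X1]=[#6;R]")


def has_heavy_chalcogen_site(smiles):
    """True if the molecule matches either screening pattern above."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol.HasSubstructMatch(THIOL_LIKE) or mol.HasSubstructMatch(THIONE_LIKE)


# Stand-in molecules: a thiol, its thione form, and the oxygen analogue.
for smi in ["Sc1ccccn1", "S=c1cccc[nH]1", "Oc1ccccn1"]:
    print(smi, has_heavy_chalcogen_site(smi))
```

A hit only means the molecule is worth checking by enumerating from both tautomeric forms; it says nothing about whether the enumerator actually misses a tautomer for it.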
best wishes
wim
On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin wrote:
> Hi all,
> I'm looking at scoring tautomers, and using the 'tautobase' dataset used
> by Wieder et al. at:
>
> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>
> This dataset has pairs of tautomers with experimental logK values to
> determine the preferred tautomer.
>
> In at least one case, depending on which tautomer you use as the 'entry'
> point, the tautomers enumerated by RDKit either do or don't include both
> members of the input pair. *I'm hoping there's a way to recover the same
> full set of possible tautomers from any input tautomer.*
>
> Here's a code example:
>
> from rdkit import Chem
> from rdkit.Chem import Draw
> from rdkit.Chem.Draw import IPythonConsole
> from rdkit.Chem.MolStandardize import rdMolStandardize
>
> IPythonConsole.drawOptions.addStereoAnnotation = True
>
> # Same result if you leave all of these params at their defaults.
> tautomer_params = rdMolStandardize.CleanupParameters()
> tautomer_params.tautomerRemoveSp3Stereo = False
> tautomer_params.tautomerRemoveBondStereo = False
> tautomer_params.tautomerRemoveIsotopicHs = False
> tautomer_params.tautomerReassignStereo = False
> tautomer_params.doCanonical = True
>
> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>
> smi1 = 'Sc1cc2c2cn1'
> smi2 = 'S=c1cc2c2c[nH]1'
> mol1 = Chem.MolFromSmiles(smi1)
> mol2 = Chem.MolFromSmiles(smi2)
>
> # Choose mol1 or mol2 as the source of tautomers.
> # Starting from mol1, note that mol2 isn't among the tautomers!
> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
>          for m in enumerator.Enumerate(mol1)]
>
> Draw.MolsToGridImage(
>     [mol1, mol2] + tauts,
>     legends=['mol1', 'mol2 (not present in tauts!)']
>             + [f'taut{i}' for i in range(len(tauts))],
>     molsPerRow=4)
>
> And a picture of this in a notebook for an at-a-glance view:
> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>
> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>
> Thank you!
> Lewis

