Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

2024-02-05 Thread Lewis Martin
Good catch, thank you Diogo!

Recognising the difficulties of tautomer enumeration: for my own purposes,
the ideal behaviour would be to get the set of all three plausible
tautomers of 'mol1' no matter what the input SMILES. It looks like there's
already a GitHub issue open (https://github.com/rdkit/rdkit/issues/5937), but
I can add this case if it has a different cause.

thanks all
Lewis
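
One possible workaround for getting the same set regardless of the input, sketched here under stated assumptions, is to re-enumerate from every tautomer found until nothing new appears. The helper name `tautomer_closure` and the `max_rounds` cap are hypothetical, and 2-hydroxypyridine is used only as a stand-in molecule that parses cleanly. This can only help when the missing tautomer is reachable through some chain of existing transforms; it would not compensate for a transform that is absent altogether.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def tautomer_closure(smiles, max_rounds=5):
    """Re-enumerate from each tautomer found until no new SMILES appear.

    Returns a set of canonical SMILES. max_rounds is a safety cap
    (hypothetical default) so the loop always terminates.
    """
    start = Chem.MolFromSmiles(smiles)
    if start is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    enumerator = rdMolStandardize.TautomerEnumerator()
    seen = {Chem.MolToSmiles(start)}
    frontier = set(seen)

    for _ in range(max_rounds):
        found = set()
        for smi in frontier:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                continue
            for taut in enumerator.Enumerate(mol):
                found.add(Chem.MolToSmiles(taut))
        # Only tautomers not seen before need another round.
        frontier = found - seen
        if not frontier:
            break
        seen |= frontier
    return seen


for smi in sorted(tautomer_closure("Oc1ccccn1")):
    print(smi)
```

Comparing `tautomer_closure(a) == tautomer_closure(b)` for two members of a tautomer pair is then a quick way to see whether both entry points end up at the same set.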



___
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

2024-02-05 Thread Diogo Martins
Hello,

I think it's a bug because the tautomers depend on how the input SMILES is
written. Both represent mol1:

Sc1ncc2c(c1)2
Sc1cc2c2cn1

However the resulting tautomers differ depending on which is used as input.

Best regards,
Diogo
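
A way to test this kind of input dependence systematically, as a rough sketch rather than a definitive check: write the same molecule as several equivalent SMILES (one per starting atom, via the `rootedAtAtom` argument of `Chem.MolToSmiles`) and group the inputs by the tautomer set each one yields. The function name `tautomer_sets_by_input` is hypothetical, and 2-hydroxypyridine is an assumed stand-in molecule. More than one group in the result would mean the enumeration depends on how the SMILES is written.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def tautomer_sets_by_input(smiles):
    """Group equivalent input SMILES by the tautomer set each one produces."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()

    # One SMILES per starting atom: same molecule, different atom order.
    variants = {Chem.MolToSmiles(mol, rootedAtAtom=i)
                for i in range(mol.GetNumAtoms())}
    variants.add(smiles)

    groups = {}
    for variant in sorted(variants):
        vmol = Chem.MolFromSmiles(variant)
        tauts = frozenset(Chem.MolToSmiles(t)
                          for t in enumerator.Enumerate(vmol))
        groups.setdefault(tauts, []).append(variant)
    return groups


groups = tautomer_sets_by_input("Oc1ccccn1")
print(f"{len(groups)} distinct tautomer set(s)")
for tauts, inputs in groups.items():
    print(sorted(tauts), "<-", inputs)
```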



Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

2024-02-05 Thread Lewis Martin
Thank you very much for the detective work, Wim! This is helpful.

It looks like the _reverse_ transition is possible, though. If I start by
generating tautomers of "mol2", then "mol1" is recovered, which indicates
this is an allowed transform. Is it possible that one direction is allowed
but not the reverse?

Failing a solution there, does anyone know if it is possible to add SMIRKS
to the allowed tautomer transforms through the Python interface?
Thanks,
Lewis
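
The one-direction-only observation can be checked without drawing anything by comparing canonical SMILES in both directions. This is a minimal sketch: the helper names `enumerated_smiles` and `reaches` are hypothetical, and the 2-hydroxypyridine / 2-pyridone pair is an assumed stand-in for 'mol1' and 'mol2'.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def enumerated_smiles(smiles, enumerator):
    """Canonical SMILES of every tautomer enumerated from the input."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return {Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)}


def reaches(source, target, enumerator):
    """True if target appears among the tautomers enumerated from source."""
    target_mol = Chem.MolFromSmiles(target)
    if target_mol is None:
        raise ValueError(f"could not parse SMILES: {target!r}")
    return Chem.MolToSmiles(target_mol) in enumerated_smiles(source, enumerator)


enumerator = rdMolStandardize.TautomerEnumerator()
a, b = "Oc1ccccn1", "O=c1cccc[nH]1"
print("a -> b:", reaches(a, b, enumerator))
print("b -> a:", reaches(b, a, enumerator))
```

If the two printed values differ for a pair, the enumeration is asymmetric for that pair, which is the behaviour described above.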



Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

2024-02-05 Thread Wim Dehaen
hi lewis,
if i am not mistaken this is because the tautomer transform "1,3 aromatic
heteroatom H shift" does not account for chalcogens other than oxygen, so
no selenium, tellurium or sulfur.
you can find the list of transforms here:
https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
(pointing to the line with the relevant transform).
best wishes
wim

On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin 
wrote:

> Hi all,
> I'm looking at scoring tautomers, and using the 'tautobase' dataset used
> by Wieder et al. at:
>
> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>
> This dataset has pairs of tautomers with experimental logK values to
> determine the preferred tautomer.
>
> In at least one case, depending on which tautomer you use as the 'entry'
> point, the tautomers enumerated by RDKit either do or don't include both
> molecules of the input pair. *I'm hoping there's a way to recover the same
> full set of possible tautomers from any input tautomer.*
>
> Here's a code example:
>
> from rdkit import Chem
> from rdkit.Chem import Draw
> from rdkit.Chem.Draw import IPythonConsole
> from rdkit.Chem.MolStandardize import rdMolStandardize
>
> IPythonConsole.drawOptions.addStereoAnnotation = True
>
> # same result if you don't set any of these params
> tautomer_params = rdMolStandardize.CleanupParameters()
> tautomer_params.tautomerRemoveSp3Stereo = False
> tautomer_params.tautomerRemoveBondStereo = False
> tautomer_params.tautomerRemoveIsotopicHs = False
> tautomer_params.tautomerReassignStereo = False
> tautomer_params.doCanonical = True
>
> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>
> smi1 = 'Sc1cc2c2cn1'
> smi2 = 'S=c1cc2c2c[nH]1'
> mol1 = Chem.MolFromSmiles(smi1)
> mol2 = Chem.MolFromSmiles(smi2)
>
> # choose mol1 or mol2 to be the source of tautomers:
> # choose mol1, and look at the tautomers. Note that mol2 isn't present!
> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
>          for m in enumerator.Enumerate(mol1)]
>
> Draw.MolsToGridImage(
>     [mol1, mol2] + tauts,
>     legends=['mol1', 'mol2 (not present in tauts!)']
>             + [f'taut{i}' for i in range(len(tauts))],
>     molsPerRow=4)
>
> And a picture of this in a notebook for an at-a-glance view:
> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>
> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>
> Thank you!
> Lewis