Good catch, thank you Diogo! I recognise that tautomer enumeration is difficult, but for my own purposes the ideal behaviour would be to get the same set of all three plausible tautomers of 'mol1' no matter how the input SMILES is written. It looks like there's already a GitHub issue open (https://github.com/rdkit/rdkit/issues/5937), but I can add this case if it has a different cause.
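In the meantime, here is a rough sketch of a workaround I might try: canonicalise the input first, then keep re-enumerating from every tautomer found until no new canonical SMILES appear. The helper names (closure, canonical_smiles, rdkit_tautomer_smiles, demo) are my own and not part of RDKit, and I am assuming a default TautomerEnumerator. This does not fix the underlying issue: if a transform only fires in one direction, tautomers that are only reachable "backwards" will still be missed.

```python
from typing import Callable, Iterable, Set


def closure(seed: str, neighbours: Callable[[str], Iterable[str]]) -> Set[str]:
    """Expand `seed` with `neighbours` until no new items appear (a fixed point)."""
    seen = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for item in neighbours(current):
            if item not in seen:
                seen.add(item)
                frontier.append(item)
    return seen


def canonical_smiles(smiles: str) -> str:
    """Canonical SMILES for one input, so the seed does not depend on how it was written."""
    # RDKit is imported lazily so `closure` can be used without it installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


def rdkit_tautomer_smiles(smiles: str) -> Set[str]:
    """Canonical SMILES of the tautomers RDKit enumerates in a single pass."""
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()  # default parameters (assumption)
    return {Chem.MolToSmiles(taut) for taut in enumerator.Enumerate(mol)}


def demo() -> None:
    """Compare a single enumeration pass with the closure for each input SMILES."""
    inputs = ["Sc1ncc2c(c1)cccc2", "Sc1cc2ccccc2cn1", "S=c1cc2ccccc2c[nH]1"]
    for smi in inputs:
        single_pass = rdkit_tautomer_smiles(smi)
        closed = closure(canonical_smiles(smi), rdkit_tautomer_smiles)
        print(smi)
        print("  single pass:", sorted(single_pass))
        print("  closure:    ", sorted(closed))
```

Calling demo() in an environment with RDKit installed prints both sets for the two spellings of mol1 and for mol2, which makes it easy to see whether the results still depend on the entry point.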
thanks all
Lewis

On Tue, Feb 6, 2024 at 7:23 AM Diogo Martins <diogo.stm...@gmail.com> wrote:

> Hello,
>
> I think it's a bug because the tautomers depend on how the input SMILES is
> written. Both represent mol1:
>
> Sc1ncc2c(c1)cccc2
> Sc1cc2ccccc2cn1
>
> However the resulting tautomers differ depending on which is used as input.
>
> Best regards,
> Diogo
>
> On Mon, 5 Feb 2024 at 11:38, Lewis Martin <lewis.marti...@gmail.com>
> wrote:
>
>> Thank you very much for the detective work, Wim! This is helpful.
>>
>> It looks like the _reverse_ transition is possible, though. If I start by
>> generating tautomers of "mol2", then "mol1" is recovered, which indicates
>> this is an allowed transform. Is it possible that one direction is allowed
>> but not the reverse?
>>
>> Failing a solution there, does anyone know if it is possible to add
>> SMIRKS to the allowed tautomers through the python interface?
>> Thanks,
>> Lewis
>>
>> On Mon, Feb 5, 2024 at 9:52 PM Wim Dehaen <wimdeh...@gmail.com> wrote:
>>
>>> hi lewis,
>>> if i am not mistaken this is because the tautomer transform "1,3 aromatic
>>> heteroatom H shift" does not account for other chalcogens than oxygen, so
>>> no selenium, tellurium or sulfur.
>>> you can find the list of transforms here:
>>> https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
>>> (pointing to the line with the relevant transform).
>>> best wishes
>>> wim
>>>
>>> On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin <lewis.marti...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>> I'm looking at scoring tautomers, and using the 'tautobase' dataset
>>>> used by Weider et al* at:
>>>>
>>>> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>>>>
>>>> This dataset has pairs of tautomers with experimental logK values to
>>>> determine the preferred tautomer.
>>>>
>>>> In at least one case, depending on which tautomer you use as the
>>>> 'entry' point, the enumerated tautomers by RDKit either do or don't include
>>>> both of the pair of input molecules. *I'm hoping there's a way to
>>>> uniquely recover the full set of possible tautomers from using any input
>>>> tautomer.*
>>>>
>>>> Here's a code example:
>>>>
>>>>> from rdkit import Chem
>>>>> from rdkit.Chem import Draw
>>>>> from rdkit.Chem.Draw import IPythonConsole
>>>>> IPythonConsole.drawOptions.addStereoAnnotation = True
>>>>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>>>>
>>>>> #same result if you don't do any of these params.
>>>>> tautomer_params = Chem.MolStandardize.rdMolStandardize.CleanupParameters()
>>>>> tautomer_params.tautomerRemoveSp3Stereo = False
>>>>> tautomer_params.tautomerRemoveBondStereo = False
>>>>> tautomer_params.tautomerRemoveIsotopicHs = False
>>>>> tautomer_params.tautomerReassignStereo = False
>>>>> tautomer_params.doCanonical = True
>>>>>
>>>>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>>>>
>>>>> smi1 = 'Sc1cc2ccccc2cn1'
>>>>> smi2 = 'S=c1cc2ccccc2c[nH]1'
>>>>> mol1 = Chem.MolFromSmiles(smi1)
>>>>> mol2 = Chem.MolFromSmiles(smi2)
>>>>>
>>>>> #choose mol1 or mol2 to be source of tautomers:
>>>>> #choose mol1, and look at the tautomers. Note that mol2 isn't present!
>>>>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m)) for m in enumerator.Enumerate(mol1)]
>>>>>
>>>>> Draw.MolsToGridImage([mol1, mol2]+tauts,
>>>>>     legends=['mol1', 'mol2 (not present in tauts!)'] + [f'taut{i}' for i in range(len(tauts))],
>>>>>     molsPerRow=4)
>>>>
>>>> And a picture of this in a notebook for an at-a-glance view:
>>>> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>>>>
>>>> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>>>>
>>>> Thank you!
>>>> Lewis
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss