Hi Lewis,

If I am not mistaken, this is because the tautomer transform "1,3 aromatic heteroatom H shift" does not account for chalcogens other than oxygen, so it does not cover selenium, tellurium or sulfur. You can find the list of transforms here (the link points to the line with the relevant transform):
https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46

Best wishes,
Wim
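One way to check this kind of gap is to test whether each member of a tautomer pair shows up when enumerating from the other. The following is a minimal sketch, not a definitive diagnostic: the helper names `tautomer_smiles` and `mutually_reachable` are made up for illustration, and the oxygen analogue of the pair is swapped in here purely as a comparison case (it is not from the original post). It uses the default `TautomerEnumerator` and compares canonical SMILES.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def tautomer_smiles(smiles, enumerator):
    """Return canonical SMILES for every tautomer RDKit enumerates from `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return {Chem.MolToSmiles(taut) for taut in enumerator.Enumerate(mol)}


def mutually_reachable(smi_a, smi_b, enumerator=None):
    """Return (b found when starting from a, a found when starting from b)."""
    enumerator = enumerator or rdMolStandardize.TautomerEnumerator()
    # Canonicalise the inputs so they can be compared with the enumerated SMILES.
    canon_a = Chem.MolToSmiles(Chem.MolFromSmiles(smi_a))
    canon_b = Chem.MolToSmiles(Chem.MolFromSmiles(smi_b))
    return (
        canon_b in tautomer_smiles(smi_a, enumerator),
        canon_a in tautomer_smiles(smi_b, enumerator),
    )


pairs = {
    # Pair from the original post.
    "thiol/thione": ("Sc1cc2ccccc2cn1", "S=c1cc2ccccc2c[nH]1"),
    # Hypothetical comparison case: same scaffold with O in place of S.
    "oxygen analogue": ("Oc1cc2ccccc2cn1", "O=c1cc2ccccc2c[nH]1"),
}
for label, (smi_a, smi_b) in pairs.items():
    print(label, mutually_reachable(smi_a, smi_b))
```

If the explanation above is right, the two pairs should give different results; running this on a whole dataset of pairs would flag every case where enumeration depends on the entry tautomer.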
On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin <lewis.marti...@gmail.com> wrote:

> Hi all,
> I'm looking at scoring tautomers, and using the 'tautobase' dataset used
> by Weider et al* at:
>
> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>
> This dataset has pairs of tautomers with experimental logK values to
> determine the preferred tautomer.
>
> In at least one case, depending on which tautomer you use as the 'entry'
> point, the enumerated tautomers by RDKit either do or don't include both of
> the pair of input molecules. *I'm hoping there's a way to uniquely
> recover the full set of possible tautomers from using any input tautomer.*
>
> Here's a code example:
>
>> from rdkit import Chem
>> from rdkit.Chem import Draw
>> from rdkit.Chem.Draw import IPythonConsole
>> IPythonConsole.drawOptions.addStereoAnnotation = True
>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>
>> #same result if you don't do any of these params.
>> tautomer_params = Chem.MolStandardize.rdMolStandardize.CleanupParameters()
>> tautomer_params.tautomerRemoveSp3Stereo = False
>> tautomer_params.tautomerRemoveBondStereo = False
>> tautomer_params.tautomerRemoveIsotopicHs = False
>> tautomer_params.tautomerReassignStereo = False
>> tautomer_params.doCanonical = True
>>
>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>
>> smi1 = 'Sc1cc2ccccc2cn1'
>> smi2 = 'S=c1cc2ccccc2c[nH]1'
>> mol1 = Chem.MolFromSmiles(smi1)
>> mol2 = Chem.MolFromSmiles(smi2)
>>
>> #choose mol1 or mol2 to be source of tautomers:
>> #choose mol1, and look at the tautomers. Note that mol2 isn't present!
>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
>>          for m in enumerator.Enumerate(mol1)]
>>
>> Draw.MolsToGridImage(
>>     [mol1, mol2] + tauts,
>>     legends=['mol1', 'mol2 (not present in tauts!)']
>>             + [f'taut{i}' for i in range(len(tauts))],
>>     molsPerRow=4)
>
> And a picture of this in a notebook for an at-a-glance view:
> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>
> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>
> Thank you!
> Lewis
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss