Hi all, I am trying to get my head around the tautomer function. Here are a few issues. Could you help me get in touch with Tim, or with someone else who can help with the tautomerization functionality?
First, the canonical tautomer function normalizes less than I would wish for. The following molecules are not normalized correctly: I get two different tautomeric forms. The test file and the debugging log files are attached. In both cases I would expect:

CC(=O)Cc1ccccc1

Second, the full enumeration is not working. When "-c" is NOT given, I would expect to get all possible tautomers, as passed to the Functor class

class Functor : public OpenBabel::TautomerFunctor

but I always get just one structure. Any clues as to what goes wrong?

Third, I would highly recommend that we replace the tautomerization framework with an alternative solution, e.g. the SMIRKS enumeration from Markus Sitzmann. The SMIRKS patterns are part of his publication:

Article (sin10)
Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
Tautomerism in large databases
J Comput Aided Mol Des, 2010, 24, 521-551
DOI 10.1007/s10822-010-9346-4
PMID 20512400

In other words, given the SMIRKS and the ranking rules, all we need is to apply the transforms recursively, store the unique canonical SMILES, rank them, and take the highest-scoring one as the tautomeric SMILES.

Or we should at least put the interfaces in place to allow users to plug in their tautomerization framework of choice.

Thoughts?

P.S.: Is there anyone who can take this on?

Cheers
/.Joerg
https://plus.google.com/116731043002877336055/
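To make the third proposal concrete, here is a minimal sketch of the "apply recursively, keep the unique canonical SMILES, rank, take the best" loop. It is not Open Babel code and does not use any Open Babel API. The transform and scoring callables are hypothetical placeholders: in a real implementation each transform would apply one SMIRKS rule and return canonical SMILES, and the score would follow the published ranking rules. The toy transform below is just a lookup table between the two forms from the test case.

```python
from typing import Callable, Iterable, List, Set

# A transform maps one canonical SMILES to the set of canonical SMILES it
# produces. In a real implementation this would wrap one SMIRKS rule
# (assumption: SMIRKS application and canonicalization come from a toolkit).
Transform = Callable[[str], Set[str]]


def enumerate_tautomers(start: str, transforms: Iterable[Transform]) -> Set[str]:
    """Apply all transforms until no new structure appears."""
    rules: List[Transform] = list(transforms)
    seen: Set[str] = {start}
    todo: List[str] = [start]
    while todo:
        current = todo.pop()
        for rule in rules:
            for product in rule(current):
                if product not in seen:  # store each unique SMILES once
                    seen.add(product)
                    todo.append(product)
    return seen


def canonical_tautomer(start: str, transforms: Iterable[Transform],
                       score: Callable[[str], int]) -> str:
    """Rank all enumerated tautomers and return the highest-scoring one."""
    candidates = enumerate_tautomers(start, transforms)
    # sorted() makes ties resolve to the lexicographically smallest SMILES.
    return max(sorted(candidates), key=score)


# Toy stand-ins (hypothetical, for illustration only).
KETO = "CC(=O)Cc1ccccc1"
ENOL = "CC(=Cc1ccccc1)O"


def toy_keto_enol(smiles: str) -> Set[str]:
    # Lookup table instead of real SMIRKS matching.
    return {KETO: {ENOL}, ENOL: {KETO}}.get(smiles, set())


def toy_score(smiles: str) -> int:
    # Crude text-based proxy for "+2 per C=O double bond".
    return 2 * smiles.count("=O")


if __name__ == "__main__":
    for smi in (KETO, ENOL):
        print(smi, "->", canonical_tautomer(smi, [toy_keto_enol], toy_score))
```

Because the result depends only on the enumerated set and not on the input form, both starting structures end up at the same SMILES, which is the behaviour I would expect from the canonical tautomer function.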
Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Other 7: Hybridized 8: Acceptor 9: Other
Bond Types: 0: Unassigned 1: Unassigned 2: Unassigned 3: Unassigned 4: Unassigned 5: Unassigned 6: Assigned 7: Assigned 8: Unassigned 9: Assigned
Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Other 7: Hybridized 8: Unassigned 9: Other
EnumerateRecursive
Assigned 8 Acceptor
-> Rule 5: Assign 7-8 Double
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
Change? 8 A
8 A
Backtrack... 8
CC(=O)Cc1ccccc1

Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Hybridized 7: Hybridized 8: Donor 9: Other
Bond Types: 0: Unassigned 1: Unassigned 2: Unassigned 3: Unassigned 4: Unassigned 5: Unassigned 6: Unassigned 7: Unassigned 8: Unassigned 9: Assigned
Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Hybridized 7: Hybridized 8: Unassigned 9: Other
EnumerateRecursive
Assigned 8 Donor
-> Rule 1: Assign 7-8 Single
-> Rule 5: Assign 6-7 Double
-> Rule 4: Assign 0-6 Single
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
Change? 8 D
Change 8 to Acceptor
-> Rule 5: Assign 1-2 Double
-> Rule 5: Assign 7-8 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 6-7 Single
-> Rule 5: Assign 3-4 Double
-> Rule 5: Assign 0-6 Double
-> Rule 4: Assign 5-0 Single
-> Rule 4: Assign 4-5 Single
invalid Acceptor/Hybridized 1
8 A
Backtrack... 8
CC(=Cc1ccccc1)O
test4.sdf
Description: Binary data
Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Other 7: Hybridized 8: Acceptor 9: Other
Bond Types: 0: Unassigned 1: Unassigned 2: Unassigned 3: Unassigned 4: Unassigned 5: Unassigned 6: Assigned 7: Assigned 8: Unassigned 9: Assigned
Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Other 7: Hybridized 8: Unassigned 9: Other
EnumerateRecursive
Assigned 8 Acceptor
-> Rule 5: Assign 7-8 Double
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
CC(=O)Cc1ccccc1
Change? 8 A
8 A
Backtrack... 8

Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Hybridized 7: Hybridized 8: Donor 9: Other
Bond Types: 0: Unassigned 1: Unassigned 2: Unassigned 3: Unassigned 4: Unassigned 5: Unassigned 6: Unassigned 7: Unassigned 8: Unassigned 9: Assigned
Atom Types: 0: Hybridized 1: Hybridized 2: Hybridized 3: Hybridized 4: Hybridized 5: Hybridized 6: Hybridized 7: Hybridized 8: Unassigned 9: Other
EnumerateRecursive
Assigned 8 Donor
-> Rule 1: Assign 7-8 Single
-> Rule 5: Assign 6-7 Double
-> Rule 4: Assign 0-6 Single
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
C/C(=C\c1ccccc1)/O
Change? 8 D
Change 8 to Acceptor
-> Rule 5: Assign 1-2 Double
-> Rule 5: Assign 7-8 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 6-7 Single
-> Rule 5: Assign 3-4 Double
-> Rule 5: Assign 0-6 Double
-> Rule 4: Assign 5-0 Single
-> Rule 4: Assign 4-5 Single
invalid Acceptor/Hybridized 1
8 A
Backtrack... 8
#Article (sin10)
#Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
#Tautomerism in large databases
#J Comput Aided Mol Des, 2010, 24, 521-551
#DOI 10.1007/s10822-010-9346-4
#PMID 20512400

Rule 1: 1,3 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]

Rule 2: 1,5 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[Cz1H0:2][C:5]=[C:6][CX4z0,NX3:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][Cz1:2]=[C:5][C:6]=[Cz0,N:3]

Rule 4: special imine
[Cz0R0X3:1]([C:5])=[C:2][Nz0:3][#1:4]>>[#1:4][Cz0R0X4:1]([C:5])[c:2]=[nz0:3]

Rule 5: 1,3 aromatic heteroatom H shift
[#1:4][N:1][C;e6:2]=[O,NX2:3]>>[NX2,nX2:1]=[C,c;e6:2][O,N:3][#1:4]

Rule 6: 1,3 heteroatom H shift
[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]

Rule 7: 1,5 (aromatic) heteroatom H shift (1)
[nX2,NX2,S,O,Se,Te:1]=[C,c,nX2,NX2:6][C,c:5]=[C,c,nX2:2][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][C,c,nX2,NX2:6]=[C,c:5][C,c,nX2:2]=[NX2,S,O,Se,Te:3]

Rule 8: 1,5 aromatic heteroatom H shift (2)
[n,s,o:1]=[c,n:6][c:5]=[c,n:2][n,s,o:3][#1:4]>>[#1:4][n,s,o:1][c,n:6]=[c:5][c,n:2]=[n,s,o:3]

Rule 9: 1,7 (aromatic) heteroatom H shift
[nX2,NX2,S,O,Se,Te,Cz0X3:1]=[c,C,NX2,nX2:6][C,c:5]=[C,c,NX2,nX2:2][C,c,NX2,nX2:7]=[C,c,NX2,nX2:8][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te,Cz0X4:1][C,c,NX2,nX2:6]=[C,c:5][C,c,NX2,nX2:2]=[C,c,NX2,nX2:7][C,c,NX2,nX2:8]=[NX2,S,O,Se,Te:3][C,c,NX2,nX2:8]=[NX2,S,O,Se,Te:3]

Rule 10: 1,9 (aromatic) heteroatom H shift
[#1:1][n,N,O:2][c,nX2,C:3]=[c,nX2,C:4][c,nX2:5]=[c,nX2:6][c,nX2:7]=[c,nX2:8][c,nX2,C:9]=[n,N,O:10]>>[N,n,O:2]=[C,c,nX2:3][c,nX2:4]=[c,nX2:5][c,nX2:6]=[c,nX2:7][c,nX2:8]=[c,nX2:9][n,O:10][#1:1]

Rule 11: 1,11 (aromatic) heteroatom H shift
[#1:1][n,N,O:2][c,nX2,C:3]=[c,nX2,C:4][c,nX2:5]=[c,C,nX2:6][c,C,nX2:7]=[c,C,nX2:8][c,nX2,C:9]=[c,C,nX2:10][c,C,nX2:11]=[nX2,NX2,O:12]>>[NX2,nX2,O:2]=[C,c,nX2:3][c,C,nX2:4]=[c,C,nX2:5][c,C,nX2:6]=[c,C,nX2:7][c,C,nX2:8]=[c,C,nX2:9][c,C,nX2:10]=[c,C,nX2:11][nX2,O:12][#1:1]

Rule 12: furanones
[#1:1][O,S,N:2][c,C;z2;r5:3]=[C,c;r5:4][c,C;r5:5]>>[O,S,N:2]=[Cz2r5:3][C&r5R{0-2}:4]([#1:1])[C,c;r5:5]

Rule 13: keten/ynol exchange
[O,S,Se,Te;X1:1]=[C:2]=[C:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][C:2]#[C:3]

Rule 14: ionic nitro/aci-nitro
[#1:1][C:2][N?:3]([O:5])=[O:4]>>[C:2]=[N?:3]([O-:5])[O:4][#1:1] checkcharges

Rule 15: pentavalent nitro/aci-nitro
[#1:1][C:2][N:3](=[O:5])=[O:4]>>[C:2]=[N:3](=[O:5])[O:4][#1:1]

Rule 16: oxim/nitroso
[#1:1][O:2][Nz1:3]=[C:4]>>[O:2]=[Nz1:3][C:4][#1:1]

Rule 16: oxim/nitroso
[#1:1][O:2][Nz1:3]=[C:4]>>[O:2]=[Nz1:3][C:4][#1:1]

Rule 18: cyanic/iso-cyanic acids
[#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]

Rule 19: formamidinesulfinic acids
[#1:1][O,N:2][C:3]=[S,Se,Te:4]=[O:5]>>[O,N:2]=[C:3][S,Se,Te:4][O:5][#1:1]

Rule 20: isocyanides
[#1:1][C0:2]#[N0:3]>>[C:2]#[N?:3][#1:1] checkcharges checkaro

Rule 21: phosphonic acids
[#1:1][O:2][P:3]>>[O:2]=[P:3][#1:1]

#Scoring
Structure fragment: Scoring points
Each carbocyclic aromatic ring: +150
Each aromatic ring: +100
Each benzoquinones (including imine and thio analogs, [C]1([C]=[C][C]([C]=[C]1)=,:[N,S,O])=,:[N,S,O], penalize cyclohexanetetrone-like structures): +25
Each oxim group (C=N[OH]): +4
Each double bond between a carbon atom (C) and an oxygen atom (O): +2
Each double bond between a nitrogen atom (N) and an oxygen atom (O): +2
Each double bond between a phosphorus atom (P) and an oxygen atom (O): +2
Each non-aromatic double bond between a carbon atom (C) and a heteroatom (X): +1
Each methyl group (penalize structures with terminal double bonds): +1
Each guanidine group with a double bond on the terminal nitrogen atom (NC(=N)[N][!H]): +1
Each guanidine group with an endocyclic double bond ([N;R][C;R]([N])=[N;R]): +2
Each P-H, S-H, Se-H and Te-H bond: -1
Each aci-nitro group (C=N(=O)[OH]): -4
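
As a rough illustration of how the scoring table above could be applied, the sketch below sums points over fragment counts. The point values are taken from the table; the dictionary keys and the function name are made up for this example, and the fragment counts are supplied by hand rather than derived by substructure matching, which a real implementation would need a toolkit for.

```python
from typing import Dict

# Points per fragment, copied from the scoring table.
# The key names are hypothetical labels chosen for this sketch.
FRAGMENT_POINTS: Dict[str, int] = {
    "carbocyclic_aromatic_ring": 150,
    "aromatic_ring": 100,
    "benzoquinone": 25,
    "oxim_group": 4,
    "c_o_double_bond": 2,
    "n_o_double_bond": 2,
    "p_o_double_bond": 2,
    "nonaromatic_c_heteroatom_double_bond": 1,
    "methyl_group": 1,
    "guanidine_terminal_double_bond": 1,
    "guanidine_endocyclic_double_bond": 2,
    "p_s_se_te_h_bond": -1,
    "aci_nitro_group": -4,
}


def score_fragments(counts: Dict[str, int]) -> int:
    """Sum points * count over all fragments; unknown keys raise KeyError."""
    return sum(FRAGMENT_POINTS[name] * n for name, n in counts.items())


# Hand-supplied, illustrative counts for the two forms in the test case
# (assumption: not produced by real substructure matching).
keto_counts = {
    "carbocyclic_aromatic_ring": 1,
    "c_o_double_bond": 1,
    "nonaromatic_c_heteroatom_double_bond": 1,
    "methyl_group": 1,
}
enol_counts = {
    "carbocyclic_aromatic_ring": 1,
    "methyl_group": 1,
}

if __name__ == "__main__":
    print("keto:", score_fragments(keto_counts))
    print("enol:", score_fragments(enol_counts))
```

With these counts the keto form scores higher than the enol form, which matches the CC(=O)Cc1ccccc1 result I would expect for both inputs.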
_______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel