Hi all,

I am trying to get my head around the tautomer function and have run into a
few issues. Could you help me get in touch with Tim, or with someone else
who can help with the tautomerization functionality?

First, the canonical tautomer function normalizes fewer tautomers than I would
wish for. The following molecules are not normalized to the same structure; I
get two different tautomeric versions. The test file and debugging log files
are attached.
In both cases I would expect: CC(=O)Cc1ccccc1

Second, the full enumeration is not working. When "-c" is NOT
defined, I would expect to get all possible tautomers, as defined
by the Functor class
class Functor : public OpenBabel::TautomerFunctor
but I always get just one structure. Any clues as to what goes wrong?

Third, I would highly recommend that we replace the tautomerization
framework with an alternative solution, e.g. the SMIRKS enumeration
from Markus Sitzmann. The SMIRKS patterns are part of his publication:
Article (sin10)
Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
Tautomerism in large databases
J Comput Aided Mol Des, 2010, 24, 521-551
DOI 10.1007/s10822-010-9346-4
PMID 20512400

In other words, with the transformations and the ranking defined by the
SMIRKS and scoring rules, we just need to apply the rules recursively, store
the unique canonical SMILES, rank them, and take the highest-scoring one as
the canonical tautomer SMILES.
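
To make the idea concrete, here is a minimal Python sketch of that loop. It is
not Open Babel code: the transforms and the scoring function are passed in as
plain callables, and toy_transform, toy_score and TOY_PAIRS are hypothetical
stand-ins (a lookup table instead of a real SMIRKS engine, a string count
instead of the real scoring rules). It also assumes every SMILES string it
sees is already canonical.

```python
from typing import Callable, Iterable, Optional, Set

# A transform maps one canonical SMILES to zero or more product SMILES.
Transform = Callable[[str], Iterable[str]]


def enumerate_tautomers(smiles: str,
                        transforms: Iterable[Transform],
                        seen: Optional[Set[str]] = None) -> Set[str]:
    """Recursively apply all transforms and collect the unique SMILES."""
    if seen is None:
        seen = set()
    if smiles in seen:          # already visited, stop the recursion here
        return seen
    seen.add(smiles)
    transforms = list(transforms)
    for transform in transforms:
        for product in transform(smiles):
            enumerate_tautomers(product, transforms, seen)
    return seen


def canonical_tautomer(smiles: str,
                       transforms: Iterable[Transform],
                       score: Callable[[str], int]) -> str:
    """Rank all enumerated tautomers and return the highest scoring one."""
    candidates = enumerate_tautomers(smiles, transforms)
    # Ties are broken by the SMILES string itself so the result is stable.
    return max(candidates, key=lambda s: (score(s), s))


# Hypothetical stand-in for a SMIRKS rule: a fixed keto/enol lookup table.
TOY_PAIRS = {
    "CC(=O)Cc1ccccc1": "CC(=Cc1ccccc1)O",
    "CC(=Cc1ccccc1)O": "CC(=O)Cc1ccccc1",
}


def toy_transform(smiles: str) -> Iterable[str]:
    return [TOY_PAIRS[smiles]] if smiles in TOY_PAIRS else []


def toy_score(smiles: str) -> int:
    # Toy scoring only: +2 per "=O" substring, nothing else is considered.
    return 2 * smiles.count("=O")
```

With these toy stand-ins, both input forms from my test file end up at the
same structure, which is the behaviour I would expect from the canonical
tautomer function.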

Or we should at least put the interfaces in place to allow users to
plug in their tautomerization framework of choice.

Thoughts?

P.S.: Anyone who can take this on?

Cheers
/.Joerg

https://plus.google.com/116731043002877336055/
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Other
  7: Hybridized
  8: Acceptor
  9: Other
Bond Types:
  0: Unassigned
  1: Unassigned
  2: Unassigned
  3: Unassigned
  4: Unassigned
  5: Unassigned
  6: Assigned
  7: Assigned
  8: Unassigned
  9: Assigned
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Other
  7: Hybridized
  8: Unassigned
  9: Other
EnumerateRecursive
  Assigned 8 Acceptor
    -> Rule 5: Assign 7-8 Double
    -> Rule 5: Assign 5-0 Double
    -> Rule 5: Assign 1-2 Double
    -> Rule 4: Assign 2-3 Single
    -> Rule 4: Assign 4-5 Single
    -> Rule 5: Assign 3-4 Double
  --> LeafNode reached...
Change?
8 
A 
8 
A 
  Backtrack... 8
CC(=O)Cc1ccccc1 
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Hybridized
  7: Hybridized
  8: Donor
  9: Other
Bond Types:
  0: Unassigned
  1: Unassigned
  2: Unassigned
  3: Unassigned
  4: Unassigned
  5: Unassigned
  6: Unassigned
  7: Unassigned
  8: Unassigned
  9: Assigned
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Hybridized
  7: Hybridized
  8: Unassigned
  9: Other
EnumerateRecursive
  Assigned 8 Donor
    -> Rule 1: Assign 7-8 Single
    -> Rule 5: Assign 6-7 Double
    -> Rule 4: Assign 0-6 Single
    -> Rule 5: Assign 5-0 Double
    -> Rule 5: Assign 1-2 Double
    -> Rule 4: Assign 2-3 Single
    -> Rule 4: Assign 4-5 Single
    -> Rule 5: Assign 3-4 Double
  --> LeafNode reached...
Change?
8 
D 
  Change 8 to Acceptor
    -> Rule 5: Assign 1-2 Double
    -> Rule 5: Assign 7-8 Double
    -> Rule 4: Assign 2-3 Single
    -> Rule 4: Assign 6-7 Single
    -> Rule 5: Assign 3-4 Double
    -> Rule 5: Assign 0-6 Double
    -> Rule 4: Assign 5-0 Single
    -> Rule 4: Assign 4-5 Single
invalid Acceptor/Hybridized 1
8 
A 
  Backtrack... 8
CC(=Cc1ccccc1)O 

Attachment: test4.sdf
Description: Binary data

Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Other
  7: Hybridized
  8: Acceptor
  9: Other
Bond Types:
  0: Unassigned
  1: Unassigned
  2: Unassigned
  3: Unassigned
  4: Unassigned
  5: Unassigned
  6: Assigned
  7: Assigned
  8: Unassigned
  9: Assigned
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Other
  7: Hybridized
  8: Unassigned
  9: Other
EnumerateRecursive
  Assigned 8 Acceptor
    -> Rule 5: Assign 7-8 Double
    -> Rule 5: Assign 5-0 Double
    -> Rule 5: Assign 1-2 Double
    -> Rule 4: Assign 2-3 Single
    -> Rule 4: Assign 4-5 Single
    -> Rule 5: Assign 3-4 Double
  --> LeafNode reached...
CC(=O)Cc1ccccc1 
Change?
8 
A 
8 
A 
  Backtrack... 8
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Hybridized
  7: Hybridized
  8: Donor
  9: Other
Bond Types:
  0: Unassigned
  1: Unassigned
  2: Unassigned
  3: Unassigned
  4: Unassigned
  5: Unassigned
  6: Unassigned
  7: Unassigned
  8: Unassigned
  9: Assigned
Atom Types:
  0: Hybridized
  1: Hybridized
  2: Hybridized
  3: Hybridized
  4: Hybridized
  5: Hybridized
  6: Hybridized
  7: Hybridized
  8: Unassigned
  9: Other
EnumerateRecursive
  Assigned 8 Donor
    -> Rule 1: Assign 7-8 Single
    -> Rule 5: Assign 6-7 Double
    -> Rule 4: Assign 0-6 Single
    -> Rule 5: Assign 5-0 Double
    -> Rule 5: Assign 1-2 Double
    -> Rule 4: Assign 2-3 Single
    -> Rule 4: Assign 4-5 Single
    -> Rule 5: Assign 3-4 Double
  --> LeafNode reached...
C/C(=C\c1ccccc1)/O      
Change?
8 
D 
  Change 8 to Acceptor
    -> Rule 5: Assign 1-2 Double
    -> Rule 5: Assign 7-8 Double
    -> Rule 4: Assign 2-3 Single
    -> Rule 4: Assign 6-7 Single
    -> Rule 5: Assign 3-4 Double
    -> Rule 5: Assign 0-6 Double
    -> Rule 4: Assign 5-0 Single
    -> Rule 4: Assign 4-5 Single
invalid Acceptor/Hybridized 1
8 
A 
  Backtrack... 8
#Article (sin10)
#Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
#Tautomerism in large databases
#J Comput Aided Mol Des, 2010, 24, 521-551
#DOI 10.1007/s10822-010-9346-4
#PMID 20512400 
Rule 1: 1,3 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]
Rule 2: 1,5 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[Cz1H0:2][C:5]=[C:6][CX4z0,NX3:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][Cz1:2]=[C:5][C:6]=[Cz0,N:3]
Rule 4: special imine
[Cz0R0X3:1]([C:5])=[C:2][Nz0:3][#1:4]>>[#1:4][Cz0R0X4:1]([C:5])[c:2]=[nz0:3]
Rule 5: 1,3 aromatic heteroatom H shift
[#1:4][N:1][C;e6:2]=[O,NX2:3]>>[NX2,nX2:1]=[C,c;e6:2][O,N:3][#1:4]
Rule 6: 1,3 heteroatom H shift
[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
Rule 7: 1,5 (aromatic) heteroatom H shift (1)
[nX2,NX2,S,O,Se,Te:1]=[C,c,nX2,NX2:6][C,c:5]=[C,c,nX2:2][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][C,c,nX2,NX2:6]=[C,c:5][C,c,nX2:2]=[NX2,S,O,Se,Te:3]
Rule 8: 1,5 aromatic heteroatom H shift (2)
[n,s,o:1]=[c,n:6][c:5]=[c,n:2][n,s,o:3][#1:4]>>[#1:4][n,s,o:1][c,n:6]=[c:5][c,n:2]=[n,s,o:3]
Rule 9: 1,7 (aromatic) heteroatom H shift
[nX2,NX2,S,O,Se,Te,Cz0X3:1]=[c,C,NX2,nX2:6][C,c:5]=[C,c,NX2,nX2:2][C,c,NX2,nX2:7]=[C,c,NX2,nX2:8][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te,Cz0X4:1][C,c,NX2,nX2:6]=[C,c:5][C,c,NX2,nX2:2]=[C,c,NX2,nX2:7][C,c,NX2,nX2:8]=[NX2,S,O,Se,Te:3]
Rule 10: 1,9 (aromatic) heteroatom H shift
[#1:1][n,N,O:2][c,nX2,C:3]=[c,nX2,C:4][c,nX2:5]=[c,nX2:6][c,nX2:7]=[c,nX2:8][c,nX2,C:9]=[n,N,O:10]>>[N,n,O:2]=[C,c,nX2:3][c,nX2:4]=[c,nX2:5][c,nX2:6]=[c,nX2:7][c,nX2:8]=[c,nX2:9][n,O:10][#1:1]
Rule 11: 1,11 (aromatic) heteroatom H shift
[#1:1][n,N,O:2][c,nX2,C:3]=[c,nX2,C:4][c,nX2:5]=[c,C,nX2:6][c,C,nX2:7]=[c,C,nX2:8][c,nX2,C:9]=[c,C,nX2:10][c,C,nX2:11]=[nX2,NX2,O:12]>>[NX2,nX2,O:2]=[C,c,nX2:3][c,C,nX2:4]=[c,C,nX2:5][c,C,nX2:6]=[c,C,nX2:7][c,C,nX2:8]=[c,C,nX2:9][c,C,nX2:10]=[c,C,nX2:11][nX2,O:12][#1:1]
Rule 12: furanones
[#1:1][O,S,N:2][c,C;z2;r5:3]=[C,c;r5:4][c,C;r5:5]>>[O,S,N:2]=[Cz2r5:3][C&r5R{0-2}:4]([#1:1])[C,c;r5:5]
Rule 13: keten/ynol exchange
[O,S,Se,Te;X1:1]=[C:2]=[C:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][C:2]#[C:3]
Rule 14: ionic nitro/aci-nitro
[#1:1][C:2][N+:3]([O-:5])=[O:4]>>[C:2]=[N+:3]([O-:5])[O:4][#1:1] checkcharges
Rule 15: pentavalent nitro/aci-nitro
[#1:1][C:2][N:3](=[O:5])=[O:4]>>[C:2]=[N:3](=[O:5])[O:4][#1:1]
Rule 16: oxim/nitroso
[#1:1][O:2][Nz1:3]=[C:4]>>[O:2]=[Nz1:3][C:4][#1:1]
Rule 18: cyanic/iso-cyanic acids
[#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]
Rule 19: formamidinesulfinic acids
[#1:1][O,N:2][C:3]=[S,Se,Te:4]=[O:5]>>[O,N:2]=[C:3][S,Se,Te:4][O:5][#1:1]
Rule 20: isocyanides
[#1:1][C0:2]#[N0:3]>>[C-:2]#[N+:3][#1:1] checkcharges checkaro
Rule 21: phosphonic acids
[#1:1][O:2][P:3]>>[O:2]=[P:3][#1:1]

#Scoring
Structure fragment Scoring points
Each carbocyclic aromatic ring +150
Each aromatic ring +100
Each benzoquinone (including imine and thio analogs, [C]1([C]=[C][C]([C]=[C]1)=,:[N,S,O])=,:[N,S,O], penalize cyclohexanetetrone-like structures) +25
Each oxim group (C=N[OH]) +4
Each double bond between a carbon atom (C) and an oxygen atom (O) +2
Each double bond between a nitrogen atom (N) and an oxygen atom (O) +2
Each double bond between a phosphorus atom (P) and an oxygen atom (O) +2
Each non-aromatic double bond between a carbon atom (C) and a heteroatom (X) +1
Each methyl group (penalize structures with terminal double bonds) +1
Each guanidine group with a double bond on the terminal nitrogen atom (NC(=N)[N][!H]) +1
Each guanidine group with an endocyclic double bond ([N;R][C;R]([N])=[N;R]) +2
Each P-H, S-H, Se-H and Te-H bond -1
Each aci-nitro group (C=N(=O)[OH]) -4
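
As a worked example of the arithmetic, the scoring table can be treated as a
weighted sum over fragment counts. The sketch below is only that: the
dictionary keys are names made up for illustration, the fragment counts are
assigned by hand instead of by substructure matching, and it assumes that the
benzene ring counts under both ring rows and that a C=O bond counts under both
the C=O row and the C=X row. The ring assumption cancels out when the two
tautomers from the test file are compared.

```python
# Weights copied from the scoring table; the key names are hypothetical.
WEIGHTS = {
    "carbocyclic_aromatic_ring": 150,
    "aromatic_ring": 100,
    "benzoquinone": 25,
    "oxime": 4,
    "c_o_double": 2,
    "n_o_double": 2,
    "p_o_double": 2,
    "c_x_double_nonaromatic": 1,
    "methyl": 1,
    "guanidine_terminal_double": 1,
    "guanidine_endocyclic_double": 2,
    "p_s_se_te_h_bond": -1,
    "aci_nitro": -4,
}


def score(counts):
    """Weighted sum of fragment counts, one term per table row."""
    return sum(WEIGHTS[name] * n for name, n in counts.items())


# Hand-assigned counts (assumptions, not computed from the structures).
# Keto form CC(=O)Cc1ccccc1
keto_counts = {
    "carbocyclic_aromatic_ring": 1,
    "aromatic_ring": 1,
    "c_o_double": 1,
    "c_x_double_nonaromatic": 1,
    "methyl": 1,
}
# Enol form CC(=Cc1ccccc1)O
enol_counts = {
    "carbocyclic_aromatic_ring": 1,
    "aromatic_ring": 1,
    "methyl": 1,
}

print(score(keto_counts), score(enol_counts))
```

Under these assumptions the keto form scores 3 points higher than the enol
form, so a ranking of this kind would select CC(=O)Cc1ccccc1.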
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
