I've been investigating some of the odd SMILES strings distributed by eMolecules that can't be read into other cheminformatics packages. A significant fraction appear to be molecules with nonsensical formal charges on aromatic atoms, which then fail to Kekulize because of the mismatched valence states. Two examples:

    c1ccc2c(c1)[n+2](c(CO)c(CO)[n+]2[O-])O

    Fc1c(F)c(F)[c+7](c(c1F)F)[Ti]1234([C+6]5C=CC=C5)([C+6]5[C+7]3=[C+7]2[C+7]1=[C+7]45)[c+7]1c(F)c(F)c(c(c1F)F)F 44386258

The second example is complete nonsense because, as explained in my "can't break the laws of physics" blog post, a carbon can't possibly have a formal charge of +7 with only six protons. Given this brokenness, these atoms shouldn't be getting marked as aromatic.

Digging deeper into where these atoms might be getting classified as aromatic led me to OpenBabel's aromatic.txt, and indeed the current trunk version of openbabel will blindly transform "c1ccc2c(c1)[N+](=C(C(=[N+2]2O)CO)CO)[O-]" into the first string above.

The problem appears to be that the current SMARTS patterns in aromatic.txt are too forgiving, allowing an atom with any formal charge to be accepted as aromatic. I suspect the patterns' author may have assumed that, as in SMILES, not specifying a formal charge implies no charge, whereas in SMARTS an unspecified charge places no constraint on the match. Indeed, the OpenSMILES specification implicitly repeats this by listing the SMILES but not the SMARTS.

The attached patch resolves the issue by tightening these SMARTS patterns. The guiding principle is "first do no harm": a ring system shouldn't be considered aromatic unless we can be certain we can correctly Kekulize it back at some point in the future. "[n+2]", if it did exist and were allowed (it isn't in Daylight SMILES, cf. daycgi/depict), should be isoelectronic with boron: three-valent, potentially aromatic, but contributing zero pi-electrons.

I've also noticed that "genheaders.sh" hasn't been run since some of the most recent changes to the data/*.txt files, meaning some of the data/*.h files are out of sync.
If this proposed patch gets accepted, running genheader.sh to regenerate aromatic.h would also address this.

Please let me know what you think.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cambridge CB4 0EY
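
For illustration, the proton-count argument above can be turned into a quick sanity check. The following is only a minimal Python sketch: the function name, the small element table, and the regular expressions are my own assumptions (nothing here is Open Babel code), and it only understands simple bracket atoms of the kind seen in the two examples.

```python
import re

# Atomic numbers for a handful of elements (illustrative subset, not complete).
ATOMIC_NUMBER = {"B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "P": 15,
                 "S": 16, "Ti": 22, "As": 33, "Se": 34}

# Bracket atom: optional isotope, element symbol, then the remaining fields.
BRACKET_ATOM = re.compile(r"\[(\d*)(se|as|[A-Z][a-z]?|[a-z])([^\]]*)\]")
# Positive charge written either as "+n" or as a run of "+" signs.
POSITIVE_CHARGE = re.compile(r"\+(\d+)|(\++)")


def impossible_charges(smiles):
    """Return (atom_text, charge) for bracket atoms whose positive formal
    charge exceeds the number of protons of the element."""
    bad = []
    for atom in BRACKET_ATOM.finditer(smiles):
        symbol = atom.group(2).capitalize()  # aromatic "c" -> "C"
        protons = ATOMIC_NUMBER.get(symbol)
        if protons is None:
            continue  # element not in the illustrative table
        found = POSITIVE_CHARGE.search(atom.group(3))
        if found is None:
            continue  # neutral or negatively charged
        charge = int(found.group(1)) if found.group(1) else len(found.group(2))
        if charge > protons:
            bad.append((atom.group(0), charge))
    return bad


if __name__ == "__main__":
    example = ("Fc1c(F)c(F)[c+7](c(c1F)F)[Ti]1234([C+6]5C=CC=C5)"
               "([C+6]5[C+7]3=[C+7]2[C+7]1=[C+7]45)[c+7]1c(F)c(F)c(c(c1F)F)F")
    for atom_text, charge in impossible_charges(example):
        print(atom_text, charge)
```

A filter like this only catches the physically impossible cases; it says nothing about "[n+2]", which needs the valence reasoning described above.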
##############################################################################
#                                                                            #
#    Open Babel file: aromatic.txt                                           #
#                                                                            #
#                                                                            #
#  Copyright (c) 1998-2001 by OpenEye Scientific Software, Inc.              #
#  Some portions Copyright (c) 2001-2005 Geoffrey R. Hutchison               #
#  Part of the Open Babel package, under the GNU General Public License (GPL)#
#                                                                            #
#  SMARTS patterns with minimum and maximum pi-electrons contributed to an   #
#  aromatic system (used by typer.cpp:OBAromaticTyper)                       #
#  The LAST PATTERN MATCHED is used to assign values, so that patterns should #
#  be ordered from more general to more specific                             #
#                                                                            #
##############################################################################

#PATTERN                MIN     MAX

#carbon patterns
[#6rD2+0]               1       1
# exo ketone or alcohol -- don't know which
[#6rD3+0]~!@[#8]        0       1
[#6rD2+,#6rD3+]         1       1
[#6r+0]=@*              1       1
[#6rD3+0]=!@*           1       1
# external double bonds to hetero atoms contribute no electrons to the
# aromatic systems -- quinoid systems are non-aromatic, e.g. 1,4-benzoquinone
[#6rD3+0]=!@[!#6]       0       0
[#6rD3-]                2       2

#nitrogen patterns
[#7rD2+0]               1       2
[#7rD3+0]               1       2
[#7r+0](-@*)-@*         1       2
[#7rD2+0]=@*            1       1
[#7rD3+]                1       1
[#7rD3+0]=O             1       1
[#7rD2-]                2       2

#oxygen patterns
[#8r+0]                 2       2
[#8r+]                  1       1

#sulfur patterns
[#16rD2+0]              2       2
[#16rD2+]               1       1
[#16rD3+0]=!@O          2       2

#other misc patterns
# Accounts Chem Res 1978 11 p. 153
# phosphole, phosphabenzene (not v. aromatic)
[#15rD3+0]              2       2
# selenophene
[#34rD2+0]              2       2
# arsabenzene, etc. (*really* not v. aromatic)
#[#33rD3+0]             2       2
# tellurophene, etc. (*really* not v. aromatic)
#[#52rD2+0]             2       2
# stilbabenzene, etc. (very little aromatic character)
#[#51rD3+0]             2       2
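
To guard against the same slip recurring, a quick check over a table in this format might look like the Python sketch below. It assumes the three-column PATTERN MIN MAX layout shown above and flags any pattern whose leading bracket atom carries no charge primitive. The function names are hypothetical, and the SMARTS handling is a crude string split rather than a real SMARTS parser, so treat it as a lint aid only.

```python
import re


def parse_aromatic_table(text):
    """Yield (pattern, min_pi, max_pi) for each non-comment, non-blank line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or commented-out pattern
        fields = line.split()
        if len(fields) < 3:
            continue  # not a PATTERN MIN MAX row
        yield fields[0], int(fields[1]), int(fields[2])


def charge_is_constrained(atom_expr):
    """Crude test: some ';'-joined part has a '+' or '-' in every ',' branch."""
    for conjunct in atom_expr.split(";"):
        if all(re.search(r"[+-]", branch) for branch in conjunct.split(",")):
            return True
    return False


def unconstrained_patterns(text):
    """Return patterns whose first bracket atom does not pin a formal charge."""
    flagged = []
    for pattern, _min_pi, _max_pi in parse_aromatic_table(text):
        first_atom = re.match(r"\[([^\]]*)\]", pattern)
        if first_atom and not charge_is_constrained(first_atom.group(1)):
            flagged.append(pattern)
    return flagged


if __name__ == "__main__":
    # Hypothetical table: the second row deliberately omits the charge.
    sample = """
    #PATTERN   MIN MAX
    [#6rD2+0]  1   1
    [#7rD3]    1   2
    """
    print(unconstrained_patterns(sample))
```

Only the first atom is checked because that is the ring atom being typed; neighbouring atoms such as the "[#8]" in the exo ketone pattern are intentionally left alone.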
Attachment: aromatic.txt.patch (Description: Binary data)
_______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel