The recent discussion of SSSR bugs prompted me to dig back through my emails to one I wrote on 20 November 2007 to the BlueObelisk mailing list. Here it is in its entirety.
Craig -------------------------------- Andrew suggested an algorithmic description of aromaticity. I think this is a really good idea. I propose, for argument's sake, that we should be using a LSSR, "Largest Set of Smallest Rings." It would go like this: A breadth-first search for rings, which terminates when all ring atoms have been included in at least one ring, but only after ALL cyclic paths of a given size have been enumerated. Since the LSSR is computed specifically for aromaticity detection, it would stop at some reasonable size, say 8-atom rings. For example, cubane's LSSR would have six (not five or four) 4-atom rings, because when you get the the four-atom-ring step of the algorithm, you don't stop until you've added all four-atom cyclic paths. Similarly, the LSSR for a prism (three square faces joined in a triangle) would have five rings (two 3-atom rings and three 4-atom rings, whereas the SSSR for that structure would only have four rings). Unlike SSSR, this is a very easy algorithm to code, and easy to explain. It's computation time is O(R) where R is the number of rings, and there is no arbitrariness to the LSSR; no matter how you compute it, you always get the same answer. For most "ordinary" molecules, the LSSR is the same as the SSSR. The LSSR and SSSR only differ with "cage" structures, where the rings themselves form rings, i.e. when there exists a set of R rings that "covers" all atoms, but R is fewer than the number of bonds that must be broken to make the structure acyclic. And in these cases, the choice of rings for the SSSR is arbitrary, whereas the LSSR is not. For aromaticity detection, this gives you an algorithmic description of rings that is very suitable for the aromaticity definition. It would go something like this: Compute the LSSR Sort the LSSR list smallest-to-largest for each atom in the molecule that isn't marked "aromatic" yet { for each ring in the LSSR (ordered smallest to biggest) that atom is in { if that ring is aromatic { mark all of its atoms "aromatic" } else { for each fused neighbor ring in the LSSR { if the two rings together are aromatic { mark all atoms in the two rings aromatic } else { for each ring of the LSSR fused to either of the pair { if the three rings together are aromatic { mark all atoms in all three rings aromatic } } } } } } I think that's about it. If I understand the chemistry, you never have to go to four rings (more than 14 atoms) to discover aromaticity. Or we could just decide that was the OpenSMILES rule, even if there are obscure, rare cases for which it isn't true. In extreme cases (fullerenes), this algorithm is still fast, O(nrings), because you stop at three rings. Something like 760 steps per ring in a fullerene, but it's still linear, not polynomial or exponential. And it's only something like 760 steps if the thing turns out to be completely non-aromatic. In a real fullerene, it would decide all the atoms were aromatic in the first pass of the outer loop, and be finished. Craig ------------------------------------------------------------------------------ ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo _______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel