The recent discussion of SSSR bugs prompted me to dig back through my emails to
one I wrote on 20 November 2007 to the BlueObelisk mailing list. Here it is in
its entirety.
Craig
--------------------------------
Andrew suggested an algorithmic description of aromaticity. I think this is a
really good idea.
I propose, for argument's sake, that we should be using a LSSR, "Largest Set of
Smallest Rings." It would go like this: A breadth-first search for rings,
which terminates when all ring atoms have been included in at least one ring,
but only after ALL cyclic paths of a given size have been enumerated. Since
the LSSR is computed specifically for aromaticity detection, it would stop at
some reasonable size, say 8-atom rings.
For example, cubane's LSSR would have six (not five or four) 4-atom rings,
because when you get the the four-atom-ring step of the algorithm, you don't
stop until you've added all four-atom cyclic paths. Similarly, the LSSR for a
prism (three square faces joined in a triangle) would have five rings (two
3-atom rings and three 4-atom rings, whereas the SSSR for that structure would
only have four rings).
Unlike SSSR, this is a very easy algorithm to code, and easy to explain. It's
computation time is O(R) where R is the number of rings, and there is no
arbitrariness to the LSSR; no matter how you compute it, you always get the
same answer.
For most "ordinary" molecules, the LSSR is the same as the SSSR. The LSSR and
SSSR only differ with "cage" structures, where the rings themselves form rings,
i.e. when there exists a set of R rings that "covers" all atoms, but R is fewer
than the number of bonds that must be broken to make the structure acyclic.
And in these cases, the choice of rings for the SSSR is arbitrary, whereas the
LSSR is not.
For aromaticity detection, this gives you an algorithmic description of rings
that is very suitable for the aromaticity definition. It would go something
like this:
Compute the LSSR
Sort the LSSR list smallest-to-largest
for each atom in the molecule that isn't marked "aromatic" yet {
for each ring in the LSSR (ordered smallest to biggest) that atom is in {
if that ring is aromatic {
mark all of its atoms "aromatic"
}
else {
for each fused neighbor ring in the LSSR {
if the two rings together are aromatic {
mark all atoms in the two rings aromatic
}
else {
for each ring of the LSSR fused to either of the pair {
if the three rings together are aromatic {
mark all atoms in all three rings aromatic
}
}
}
}
}
}
I think that's about it. If I understand the chemistry, you never have to go
to four rings (more than 14 atoms) to discover aromaticity. Or we could just
decide that was the OpenSMILES rule, even if there are obscure, rare cases for
which it isn't true.
In extreme cases (fullerenes), this algorithm is still fast, O(nrings), because
you stop at three rings. Something like 760 steps per ring in a fullerene, but
it's still linear, not polynomial or exponential. And it's only something like
760 steps if the thing turns out to be completely non-aromatic. In a real
fullerene, it would decide all the atoms were aromatic in the first pass of the
outer loop, and be finished.
Craig
------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
lucky parental unit. See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
OpenBabel-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel