The recent discussion of SSSR bugs prompted me to dig back through my emails to 
one I wrote on 20 November 2007 to the BlueObelisk mailing list.  Here it is in 
its entirety.

Craig

--------------------------------


Andrew suggested an algorithmic description of aromaticity.  I think this is a 
really good idea.

I propose, for argument's sake, that we should be using a LSSR, "Largest Set of 
Smallest Rings."  It would go like this: A breadth-first search for rings, 
which terminates when all ring atoms have been included in at least one ring, 
but only after ALL cyclic paths of a given size have been enumerated.  Since 
the LSSR is computed specifically for aromaticity detection, it would stop at 
some reasonable size, say 8-atom rings.

For example, cubane's LSSR would have six (not five or four) 4-atom rings, 
because when you get the the four-atom-ring step of the algorithm, you don't 
stop until you've added all four-atom cyclic paths.  Similarly, the LSSR for a 
prism (three square faces joined in a triangle) would have five rings (two 
3-atom rings and three 4-atom rings, whereas the SSSR for that structure would 
only have four rings).

Unlike SSSR, this is a very easy algorithm to code, and easy to explain. It's 
computation time is O(R) where R is the number of rings, and there is no 
arbitrariness to the LSSR; no matter how you compute it, you always get the 
same answer.

For most "ordinary" molecules, the LSSR is the same as the SSSR.  The LSSR and 
SSSR only differ with "cage" structures, where the rings themselves form rings, 
i.e. when there exists a set of R rings that "covers" all atoms, but R is fewer 
than the number of bonds that must be broken to make the structure acyclic.  
And in these cases, the choice of rings for the SSSR is arbitrary, whereas the 
LSSR is not.

For aromaticity detection, this gives you an algorithmic description of rings 
that is very suitable for the aromaticity definition.  It would go something 
like this:

  Compute the LSSR
  Sort the LSSR list smallest-to-largest
  for each atom in the molecule that isn't marked "aromatic" yet {
    for each ring in the LSSR (ordered smallest to biggest) that atom is in {
      if that ring is aromatic {
        mark all of its atoms "aromatic"
      }
      else {
       for each fused neighbor ring in the LSSR {
         if the two rings together are aromatic {
           mark all atoms in the two rings aromatic
         }
         else {
           for each ring of the LSSR fused to either of the pair {
             if the three rings together are aromatic {
               mark all atoms in all three rings aromatic
             }
           }
         }
       }
     }
   }


I think that's about it.  If I understand the chemistry, you never have to go 
to four rings (more than 14 atoms) to discover aromaticity.  Or we could just 
decide that was the OpenSMILES rule, even if there are obscure, rare cases for 
which it isn't true.

In extreme cases (fullerenes), this algorithm is still fast, O(nrings), because 
you stop at three rings.  Something like 760 steps per ring in a fullerene, but 
it's still linear, not polynomial or exponential.  And it's only something like 
760 steps if the thing turns out to be completely non-aromatic.  In a real 
fullerene, it would decide all the atoms were aromatic in the first pass of the 
outer loop, and be finished.

Craig

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Reply via email to