On Mon, Jun 21, 2010 at 7:24 PM, Tim Vandermeersch
<tim.vandermeer...@gmail.com> wrote:
> On Mon, Jun 21, 2010 at 6:43 PM, Craig A. James <cja...@emolecules.com> wrote:
>> The recent discussion of SSSR bugs prompted me to dig back through my emails 
>> to one I wrote on 20 November 2007 to the BlueObelisk mailing list.  Here it 
>> is in its entirety.
>>
>> Craig
>>
>> --------------------------------
>>
>>
>> Andrew suggested an algorithmic description of aromaticity.  I think this is 
>> a really good idea.
>>
>> I propose, for argument's sake, that we should be using a LSSR, "Largest Set 
>> of Smallest Rings."  It would go like this: A breadth-first search for 
>> rings, which terminates when all ring atoms have been included in at least 
>> one ring, but only after ALL cyclic paths of a given size have been 
>> enumerated.  Since the LSSR is computed specifically for aromaticity 
>> detection, it would stop at some reasonable size, say 8-atom rings.
>>
>> For example, cubane's LSSR would have six (not five or four) 4-atom rings, 
>> because when you get the the four-atom-ring step of the algorithm, you don't 
>> stop until you've added all four-atom cyclic paths.  Similarly, the LSSR for 
>> a prism (three square faces joined in a triangle) would have five rings (two 
>> 3-atom rings and three 4-atom rings, whereas the SSSR for that structure 
>> would only have four rings).
>
> Implementing this shouldn't be to hard. Changing the last loop in
> OBRingSearch should be all. It currently checks if a ring is contained
> in the union of all other rings. Checking with a single ring should
> give the behavior you describe (assuming we don't check frj).  The
> SSSR should still be available though. Is there a reference for the
> LSSR or is it a suggestion for a name? If there is no reference there
> might be some unforeseen cases. For example, in cubane there are 3
> 6-atom-rings not fully overlapping with a single 4-atom-ring. How do
> we decide if the ring should not go in the set? (note: if you check
> all 4-rings, the 6th 4-ring wouldn't be found) This may not be
> important for aromaticity though.

There seems to be an easy way to obtain the LSSR you describe:

* Compute the rings (same as for SSSR, the SSSR is selected from this)
* Use alternative RemoveRedundant: When checking rings of size n, only
consider rings of sizes n-1 and lower

I've tested this with tetrahedron, cube (should work for other
platonic solids too) and some bridged rings. A test set with
structures is needed to further test this though. I can make this an
option (SSSR or LSSR, without breaking ABI) and commit it to trunk...

Since we can't use frère jacques number anymore to check the number of
rings, the best way to test this would be to reorder atoms (cfr
Canonical Code testing) and check the resulting set. The resulting
ring set will always be the same (i.e. canonical, using unique ids for
example). This alone is a valuable test but comparing the found rings
to manually assigned rings would also help to make sure the canonical
LSSR rings are the ones we expect. I can provide test structures for
substructure search. I don't really know much about the aromaticity
requirements though. Craig: Do you have test cases for aromaticity?

>> Unlike SSSR, this is a very easy algorithm to code, and easy to explain. 
>> It's computation time is O(R) where R is the number of rings, and there is 
>> no arbitrariness to the LSSR; no matter how you compute it, you always get 
>> the same answer.
>>
>> For most "ordinary" molecules, the LSSR is the same as the SSSR.  The LSSR 
>> and SSSR only differ with "cage" structures, where the rings themselves form 
>> rings, i.e. when there exists a set of R rings that "covers" all atoms, but 
>> R is fewer than the number of bonds that must be broken to make the 
>> structure acyclic.  And in these cases, the choice of rings for the SSSR is 
>> arbitrary, whereas the LSSR is not.

I don't follow the part about number of bonds that must be broken to
make the structure acyclic. Do you mean closure bonds?  Regardless, we
probably mean the same thing with LSSR.

>>
>> For aromaticity detection, this gives you an algorithmic description of 
>> rings that is very suitable for the aromaticity definition.  It would go 
>> something like this:
>>
>>  Compute the LSSR
>>  Sort the LSSR list smallest-to-largest
>>  for each atom in the molecule that isn't marked "aromatic" yet {
>>    for each ring in the LSSR (ordered smallest to biggest) that atom is in {
>>      if that ring is aromatic {
>>        mark all of its atoms "aromatic"
>>      }
>>      else {
>>       for each fused neighbor ring in the LSSR {
>>         if the two rings together are aromatic {
>>           mark all atoms in the two rings aromatic
>>         }
>>         else {
>>           for each ring of the LSSR fused to either of the pair {
>>             if the three rings together are aromatic {
>>               mark all atoms in all three rings aromatic
>>             }
>>           }
>>         }
>>       }
>>     }
>>   }
>>
>>
>> I think that's about it.  If I understand the chemistry, you never have to 
>> go to four rings (more than 14 atoms) to discover aromaticity.  Or we could 
>> just decide that was the OpenSMILES rule, even if there are obscure, rare 
>> cases for which it isn't true.
>>
>> In extreme cases (fullerenes), this algorithm is still fast, O(nrings), 
>> because you stop at three rings.  Something like 760 steps per ring in a 
>> fullerene, but it's still linear, not polynomial or exponential.  And it's 
>> only something like 760 steps if the thing turns out to be completely 
>> non-aromatic.  In a real fullerene, it would decide all the atoms were 
>> aromatic in the first pass of the outer loop, and be finished.
>>
>> Craig
>>
>> ------------------------------------------------------------------------------
>> ThinkGeek and WIRED's GeekDad team up for the Ultimate
>> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
>> lucky parental unit.  See the prize list and enter to win:
>> http://p.sf.net/sfu/thinkgeek-promo
>> _______________________________________________
>> OpenBabel-Devel mailing list
>> OpenBabel-Devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>
>

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Reply via email to