On Mon, Oct 11, 2010 at 6:24 PM, Geoffrey Hutchison
<ge...@geoffhutchison.net> wrote:
>> In general, performance is good enough for release now. The
>> canonconsistent test completes in 30 seconds here. Converting
>> cansmi-roundtrip.smi to can takes 16 seconds and to smi takes 8, so
>> canonical coding takes about 50% of the total time in a file conversion.
>
> I've added some revisions to the kekule.cpp code, including a timeout. If the 
> timeout triggers, then the ring-fragment-based LSSR analysis kicks in, which 
> is imperfect, but much faster.
>
> At the moment, I'm using a 15-second timeout, since I mainly want to prevent 
> "endless" recursion in graphene, nanotubes, big fullerenes, etc. I don't want 
> to set it too short, since the LSSR analysis has some clear bugs on 
> meaningful structures like c60, porphyrins, etc.

I think it should not be too hard to fix the LSSR issues. I'll have a
look after the canonical coding is done.
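
For reference, the timeout-then-fallback pattern described above can be
sketched roughly as follows. This is only an illustration in Python, not
the kekule.cpp code: the names with_fallback, SearchTimeout, slow_exact,
fast_exact and approximate are all hypothetical, and the exact search is
assumed to check the deadline cooperatively as it runs.

```python
import time


class SearchTimeout(Exception):
    """Raised by the exact search when it runs past its deadline."""


def with_fallback(exact, approximate, timeout_s):
    """Try the exact search; use the faster, imperfect one on timeout."""
    deadline = time.monotonic() + timeout_s
    try:
        return exact(deadline)
    except SearchTimeout:
        return approximate()


def slow_exact(deadline):
    # Hypothetical stand-in for a search that never finishes in time.
    # It checks the deadline on every step, as a recursive search would.
    while True:
        if time.monotonic() > deadline:
            raise SearchTimeout()


def fast_exact(deadline):
    # Hypothetical stand-in for a search that finishes immediately.
    return "exact"


def approximate():
    # Hypothetical stand-in for the faster fallback analysis.
    return "approximate"


print(with_fallback(fast_exact, approximate, 1.0))
print(with_fallback(slow_exact, approximate, 0.05))
```

The choice of timeout is the trade-off mentioned in the quoted text: too
short and the imperfect fallback runs on structures the exact search
could have handled.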

> The "metallocene hack" sounds about right. I think the key test for Craig was:
> C12[Pr]3456789%10%11%12%13(C1C3C%13C24)(C1C9C%10C7C%121)C1C5C%11C8C61
>
> This has 3 identical rings, but I would think finding >8 neighbors with 
> identical symmetry classes would trigger the optimization you discuss.

Yes, there are 15 neighbors with the same symmetry class here. The
same structure with two rings (10 nbrs -> 10!) already takes about
500MB of memory and doesn't complete within the 5-second timeout.
With the optimization, this would be reduced to milliseconds with no
significant memory usage.
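
To put numbers on the "10 nbrs -> 10!" remark: assuming the unoptimized
code has to consider every ordering of the neighbors in one symmetry
class, the count grows factorially. A small Python sketch (the function
name orderings is just for illustration):

```python
import math


def orderings(n_equivalent_neighbors: int) -> int:
    """Number of orderings of n neighbors sharing one symmetry class."""
    return math.factorial(n_equivalent_neighbors)


# 5, 10 and 15 neighbors correspond to one, two and three rings.
for n in (5, 10, 15):
    print(n, orderings(n))
```

10! is 3,628,800 and 15! is 1,307,674,368,000, so the three-ring case is
several hundred thousand times larger than the two-ring case that already
times out.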

I'm going to get something to eat first and after that I'll implement
this normalization, test 1 million molecules and commit.

Even without these metallocene optimizations, I can already process
215 SMILES/second, shuffling each one 20x (average over the first
200000 eMolecules, using 5 processes, 15 minutes total). That is a huge
improvement compared to a few days ago, when Craig posted:

"Using the latest code from this morning (-r4157), I ran four
processes all day long for a total of about 280,000 SMILES."

Note: among these 200000 molecules there are 17 errors, but these are
metallocene timeouts that the optimization would resolve.
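
As a quick consistency check on the figures above, assuming the 215
SMILES/second is the combined rate over all 5 processes (my reading of
the numbers, not something stated separately):

```python
molecules = 200_000      # first 200000 eMolecules
rate_per_second = 215    # reported average, assumed to be the combined rate
errors = 17              # reported metallocene timeouts

elapsed_minutes = molecules / rate_per_second / 60
error_percent = errors / molecules * 100

print(f"elapsed: {elapsed_minutes:.1f} minutes")
print(f"errors:  {error_percent:.4f}% of molecules")
```

This gives roughly 15.5 minutes, in line with the "15 minutes total", and
an error rate of 0.0085%.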

Tim

> -Geoff
>
> P.S. I can see that we've already fixed some of Craig's issues, e.g.:
> c12=c(nn2)ssnc1S        6137697
>
>

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel