On Mon, Oct 11, 2010 at 6:24 PM, Geoffrey Hutchison <ge...@geoffhutchison.net> wrote:
>> In general, performance is good for release now. The canonconsistent
>> test completes in 30 seconds here. Converting cansmi-roundtrip.smi to
>> can takes 16 seconds and to smi takes 8 seconds, so canonical coding
>> takes about 50% of the total time when doing a file conversion.
>
> I've added some revisions to the kekule.cpp code, including a timeout. If the
> timeout triggers, then the ring-fragment-based LSSR analysis kicks in, which
> is imperfect, but much faster.
>
> At the moment, I'm using a 15-second timeout, since I mainly want to prevent
> "endless" recursion in graphene, nanotubes, big fullerenes, etc. I don't want
> to set it too short, since the LSSR analysis has some clear bugs on
> meaningful structures like c60, porphyrins, etc.
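A minimal sketch of the timeout-with-fallback pattern described above for kekule.cpp: run an exhaustive backtracking search for a Kekulé-style bond assignment (a perfect matching), check a wall-clock deadline at every recursion step, and switch to a cheaper approximate method once the deadline passes. All names here are hypothetical, and the greedy fallback is only a stand-in for the ring-fragment-based LSSR analysis, not an implementation of it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sketch, NOT Open Babel's kekule.cpp code.
using Clock = std::chrono::steady_clock;
using Graph = std::vector<std::vector<int>>;  // adjacency list of aromatic atoms

enum class Status { Found, NotFound, TimedOut };

struct Deadline {
  Clock::time_point end;
  explicit Deadline(std::chrono::milliseconds budget)
      : end(Clock::now() + std::chrono::duration_cast<Clock::duration>(budget)) {}
  bool expired() const { return Clock::now() >= end; }
};

// Exhaustive backtracking search for a perfect matching (each atom gets
// exactly one double-bond partner). mate[v] == -1 means v is unmatched.
Status backtrack(const Graph& g, std::vector<int>& mate, const Deadline& dl) {
  if (dl.expired()) return Status::TimedOut;
  // Pick the first unmatched atom; if there is none, everything is paired.
  int v = -1;
  for (std::size_t i = 0; i < g.size(); ++i) {
    if (mate[i] == -1) { v = static_cast<int>(i); break; }
  }
  if (v == -1) return Status::Found;
  for (int w : g[v]) {
    if (mate[w] != -1) continue;
    mate[v] = w;
    mate[w] = v;
    Status s = backtrack(g, mate, dl);
    if (s != Status::NotFound) return s;  // Found or TimedOut: stop searching
    mate[v] = -1;                         // undo and try the next neighbor
    mate[w] = -1;
  }
  return Status::NotFound;
}

// Cheap, imperfect stand-in for the approximate fallback: greedily pair
// each unmatched atom with its first unmatched neighbor.
std::vector<int> greedyFallback(const Graph& g) {
  std::vector<int> mate(g.size(), -1);
  for (std::size_t v = 0; v < g.size(); ++v) {
    if (mate[v] != -1) continue;
    for (int w : g[v]) {
      if (mate[w] == -1 && w != static_cast<int>(v)) {
        mate[v] = w;
        mate[w] = static_cast<int>(v);
        break;
      }
    }
  }
  return mate;
}

struct Result {
  std::vector<int> mate;
  bool usedFallback;
};

// Try the exact search first; use the fallback only if the deadline expires.
Result assignBonds(const Graph& g, std::chrono::milliseconds budget) {
  std::vector<int> mate(g.size(), -1);
  Deadline dl(budget);
  Status s = backtrack(g, mate, dl);
  if (s == Status::TimedOut) return {greedyFallback(g), true};
  return {mate, false};  // complete if Found; all -1 if NotFound
}
```

The deadline is checked on every call rather than only at the top level, so even a search that would otherwise recurse "endlessly" (graphene, nanotubes) returns promptly once the budget is spent.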
It should not be too hard to fix the LSSR issues, I think. I'll have a look
after the canonical coding is done.

> The "metallocene hack" sounds about right. I think the key test for Craig was:
> C12[Pr]3456789%10%11%12%13(C1C3C%13C24)(C1C9C%10C7C%121)C1C5C%11C8C61
>
> This has 3 identical rings, but I would think finding >8 neighbors with
> identical symmetry classes would trigger the optimization you discuss.

Yes, there are 15 neighbors with the same symmetry class here. The same
structure with two rings (10 neighbors, so 10! orderings) already takes about
500 MB of memory and doesn't complete within the 5-second timeout. With the
optimization, this would drop to milliseconds with no significant memory
usage.

I'm going to get something to eat first, and after that I'll implement this
normalization, test 1 million molecules, and commit.

Without these metallocene optimizations, I can already shuffle (20x) 215
SMILES/second (average over the first 200,000 eMolecules, using 5 processes,
15 minutes total), which is a huge improvement compared to a few days ago,
when Craig posted:

"Using the latest code from this morning (-r4157), I ran four processes all
day long for a total of about 280,000 SMILES."

Note: in these 200,000 molecules there are 17 errors, but these are all
metallocene timeouts that the optimization would resolve.

Tim

> -Geoff
>
> P.S. I can see that we've already fixed some of Craig's issues, e.g.:
> c12=c(nn2)ssnc1S 6137697

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
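To make the scaling in the message above concrete: assuming a naive canonical coder tries every ordering of neighbors that share a symmetry class, the work grows as the product of the factorials of the tied-class sizes (15! for the three-ring Pr structure, 10! for the two-ring variant). The sketch below uses hypothetical helper names, is not Open Babel code, and only counts orderings and applies the ">8 tied neighbors" trigger suggested in the quoted text; it does not implement the normalization itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helpers, NOT Open Babel API.
// n! in unsigned 64-bit arithmetic; 20! is the largest factorial that fits.
std::uint64_t factorial(unsigned n) {
  std::uint64_t r = 1;
  for (unsigned k = 2; k <= n; ++k) r *= k;
  return r;
}

// Count how many neighbors fall into each symmetry class.
std::map<int, unsigned> classSizes(const std::vector<int>& neighborClasses) {
  std::map<int, unsigned> sizes;
  for (int c : neighborClasses) ++sizes[c];
  return sizes;
}

// Orderings a naive tie-breaking search would try: the product of
// (class size)! over all classes. Can overflow for very large totals.
std::uint64_t naiveOrderings(const std::vector<int>& neighborClasses) {
  std::uint64_t total = 1;
  for (const auto& entry : classSizes(neighborClasses)) total *= factorial(entry.second);
  return total;
}

// True if any symmetry class has more than `threshold` tied neighbors.
bool shouldUseShortcut(const std::vector<int>& neighborClasses, unsigned threshold = 8) {
  for (const auto& entry : classSizes(neighborClasses)) {
    if (entry.second > threshold) return true;
  }
  return false;
}
```

For example, 10 tied neighbors give 3,628,800 orderings and 15 give 1,307,674,368,000, which is why a per-class shortcut matters far more than micro-optimizing the search.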