Over the past couple of days I've spent some time tuning the RDKit's SMILES parser.
I made a couple of minor changes here and there and saw some improvement before making a change in the YACC grammar used to generate the parser. This made the parser source a bit more difficult to read, but had a pretty significant impact on performance.

To measure only the performance of the SMILES parser, I ran a benchmark using ~560K molecules from ZINC where I generated a molecule from SMILES without any sanitization. Here are the timings on my Linux box for that benchmark:

  RDKit_2011_06_1: 50.6s
  RDKit_2012_03_1: 49.6s
  RDKit_2012_06_1: 57.6s  [ <- I'm not sure I understand this outlier]
  svn:             30.6s

I'm pretty pleased about that last number. :-)

For those who are interested, here's the commit:
https://sourceforge.net/p/rdkit/code/2159/
and the specific grammar changes that made the difference:
https://sourceforge.net/p/rdkit/code/2159/tree//trunk/Code/GraphMol/SmilesParse/smiles.yy?diff=502dda6571b75b41b4b10063:2158

-greg

_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel
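A parse-only timing run of the kind described in this message could be sketched roughly as follows. The actual benchmark script is not shown in the post, so this is only an illustration under stated assumptions: the helper names `time_parse`, `speedup`, and `parse_unsanitized` are hypothetical, the three sample SMILES are placeholders rather than the ZINC set, and the RDKit import is guarded so the timing helpers still work where the RDKit is not installed.

```python
import time


def time_parse(smiles_list, parse):
    """Call parse() on every SMILES; return (elapsed_seconds, n_failed).

    `parse` is any callable that returns None when parsing fails.
    """
    n_failed = 0
    start = time.perf_counter()
    for smi in smiles_list:
        if parse(smi) is None:
            n_failed += 1
    return time.perf_counter() - start, n_failed


def speedup(old_seconds, new_seconds):
    """Ratio of an older timing to a newer one (values > 1 mean faster)."""
    return old_seconds / new_seconds


try:
    from rdkit import Chem

    def parse_unsanitized(smi):
        # sanitize=False skips sanitization, so mostly the parser is timed.
        return Chem.MolFromSmiles(smi, sanitize=False)
except ImportError:
    # Assumption: the RDKit may be unavailable; the helpers above still work.
    parse_unsanitized = None


if __name__ == "__main__" and parse_unsanitized is not None:
    # Placeholder input; the post used ~560K molecules from ZINC.
    sample = ["CCO", "c1ccccc1", "CC(=O)O"]
    elapsed, failed = time_parse(sample, parse_unsanitized)
    print(f"{len(sample)} SMILES in {elapsed:.4f}s, {failed} failed")
    # Timings taken from the message: RDKit_2012_03_1 vs. svn.
    print(f"speedup vs. RDKit_2012_03_1: {speedup(49.6, 30.6):.2f}x")
```

Applied to the timings in the message, 49.6s / 30.6s works out to roughly a 1.6x speedup over RDKit_2012_03_1, and ~560K molecules in 30.6s is on the order of 18,000 SMILES parsed per second.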