On Fri, Jun 29, 2012 at 7:45 AM, Greg Landrum <greg.land...@gmail.com> wrote:
>
> The test I devised was the following:
>
> 1) Read a molecule from the sdf
> 2) generate canonical smiles csmi
> 3) Parse csmi to give a new molecule
> 4) generate a new canonical smiles and make sure it matches csmi
> 5) Pick 5 random atoms in the molecule and, for each one:
>   5a) generate a non-canonical smiles rooted at that atom
>   5b) parse that non-canonical smiles to give a new molecule
>   5c) generate a new canonical smiles from that and make sure it matches csmi
>
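For readers who want to try the round-trip test quoted above, here is a minimal sketch using the RDKit Python API. The function name `roundtrip_smiles_ok` is hypothetical, the random atoms are drawn without replacement (the original does not say), and the usage lines start from a SMILES string rather than an sdf to keep the example self-contained.

```python
import random

from rdkit import Chem


def roundtrip_smiles_ok(mol, n_roots=5, seed=None):
    """Return True if the canonical SMILES of mol survives every round trip."""
    rng = random.Random(seed)

    # Steps 2-4: canonical SMILES -> new molecule -> canonical SMILES again.
    csmi = Chem.MolToSmiles(mol, isomericSmiles=True)
    m2 = Chem.MolFromSmiles(csmi)
    if m2 is None or Chem.MolToSmiles(m2, isomericSmiles=True) != csmi:
        return False

    # Step 5: non-canonical SMILES rooted at a few randomly chosen atoms.
    n_atoms = mol.GetNumAtoms()
    for root in rng.sample(range(n_atoms), min(n_roots, n_atoms)):
        smi = Chem.MolToSmiles(mol, isomericSmiles=True,
                               rootedAtAtom=root, canonical=False)
        m3 = Chem.MolFromSmiles(smi)
        if m3 is None or Chem.MolToSmiles(m3, isomericSmiles=True) != csmi:
            return False
    return True


# Usage: one chiral centre and one stereo double bond.
for smiles in ("C[C@H](N)C(=O)O", "F/C=C/Cl"):
    print(smiles, roundtrip_smiles_ok(Chem.MolFromSmiles(smiles), seed=0))
```
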
<snip>
> If anyone has recommendations for alternate test methodologies or test
> sets, please let me know. These tests aren't exactly super fast, so
> I'd like to avoid something like "just run the {pubchem, emolecules,
> full ZINC} set", but if people are convinced that's necessary, I can
> set it up and run it.

Yesterday I successfully ran the same test across 500K compounds
randomly selected from the ZINC "Drugs Now" set
(http://zinc.docking.org/subsets/drugs-now).

I also created a second testing approach:

1) Read a molecule m1 from the sdf
2) generate canonical smiles csmi
3) Parse csmi to give a new molecule m2
4) make sure all chiral centers in m1 and m2 have the same CIP code and
   that all double bonds where stereochemistry is indicated have the
   same stereochemistry.
5) Pick 5 random atoms in the molecule and, for each one:
  5a) generate a non-canonical smiles rooted at that atom
  5b) parse that non-canonical smiles to give a new molecule m3
  5c) make sure all chiral centers in m1 and m3 have the same CIP code
      and that all double bonds where stereochemistry is indicated have
      the same stereochemistry.

This ran without failures across the 500K ZINC compounds.

I've got some confidence now that the code is correct, so I'm merging
it onto the trunk this morning.

-greg

_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel
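The second testing approach in the message above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that was actually run: the helper names are hypothetical, atoms are matched through the `_smilesAtomOutputOrder` property that `MolToSmiles` records on the molecule, the molecule is assumed to have no explicit hydrogen atoms in its graph (so parsed atom indices follow the output order), and RDKit's default (legacy) stereo perception is assumed so that double bonds are labelled E/Z. The check is also slightly stricter than the description, since it fails if the reparsed molecule gains a stereo label as well as if it loses or changes one.

```python
import random
import re

from rdkit import Chem


def stereo_labels(mol):
    """Return ({atom idx: CIP code}, {bonded atom pair: bond stereo}) for mol."""
    mol = Chem.Mol(mol)  # work on a copy so the caller's molecule is untouched
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    cip = {a.GetIdx(): a.GetProp("_CIPCode")
           for a in mol.GetAtoms() if a.HasProp("_CIPCode")}
    ez = {frozenset((b.GetBeginAtomIdx(), b.GetEndAtomIdx())): b.GetStereo()
          for b in mol.GetBonds()
          if b.GetStereo() != Chem.BondStereo.STEREONONE}
    return cip, ez


def write_and_reparse(mol, root=-1, canonical=True):
    """Write SMILES, parse it back, and return (new mol, atom output order)."""
    smi = Chem.MolToSmiles(mol, isomericSmiles=True,
                           rootedAtAtom=root, canonical=canonical)
    # order[k] is the index in mol of the k-th atom written to the SMILES.
    order = [int(i) for i in
             re.findall(r"\d+", mol.GetProp("_smilesAtomOutputOrder"))]
    return Chem.MolFromSmiles(smi), order


def stereo_preserved(mol, n_roots=5, seed=None):
    """Return True if CIP codes and double-bond stereo survive each round trip."""
    rng = random.Random(seed)
    ref_cip, ref_ez = stereo_labels(mol)
    n_atoms = mol.GetNumAtoms()
    # Root -1 is the canonical round trip (m2); the others give the m3 checks.
    roots = [-1] + rng.sample(range(n_atoms), min(n_roots, n_atoms))
    for root in roots:
        new, order = write_and_reparse(mol, root, canonical=(root < 0))
        if new is None:
            return False
        new_cip, new_ez = stereo_labels(new)
        # Translate the new molecule's indices back to the original numbering.
        mapped_cip = {order[k]: code for k, code in new_cip.items()}
        mapped_ez = {frozenset(order[k] for k in pair): stereo
                     for pair, stereo in new_ez.items()}
        if mapped_cip != ref_cip or mapped_ez != ref_ez:
            return False
    return True


# Usage: a molecule with both a chiral centre and a stereo double bond.
m1 = Chem.MolFromSmiles("C[C@H](N)C(=O)O/C=C/Cl")
print(stereo_preserved(m1, seed=0))
```

Comparing per-atom CIP codes rather than SMILES strings keeps this check independent of the canonicalization code, which is presumably why it is useful as a second test.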