On May 27, 2011, at 6:01 AM, Greg Landrum wrote:

> And now a more philosophical point about this.
...
> The idea of the MACCS keys is simple: a limited set of structural keys
> that can be used to speed up substructure searches and which have
> since been (ab)used for chemical similarity. It seems like it would be
> a lot more helpful to the community if we had a set of keys like this
> that is based on a truly open definition.
...
> What do you think Andrew? Want to work together on this?
Sure! I've been working on my PubChem-like substructure keys all this week. The pattern definitions are available at

  http://code.google.com/p/chem-fingerprints/source/browse/chemfp/substruct.patterns

Validation is the hardest part, since I mostly only have the PubChem substructure bits as an oracle of what I'm supposed to get. I think I'm down to differences in how CACTVS does aromaticity (lots of mismatches because of that!) and lack of support for PubChem's PUBCHEM_NONSTANDARDBOND bond definitions, which are described in

  ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_sdtags.txt

I've got implementations for OpenBabel, RDKit, and OEChem. If you want to try it out, run this after you've installed the package:

  rdkit2fps --substruct $STRUCTURE_FILENAME

I've also converted RDKit's MACCS patterns into my definition format, at

  http://code.google.com/p/chem-fingerprints/source/browse/chemfp/rdmaccs.patterns

which I've used in part as a cross-test to make sure my implementation using RDKit matches RDKit's own implementation. I've been calling it "rdmaccs". Any problems with that? Want another name?

My hope is to get this out in a 0.95 (or perhaps "1.0 alpha"?) build today and announce it. What's mostly lacking is:

  - full validation (very hard, given aromaticity differences)
  - a solid test suite (that's amazingly hard to do)
  - documentation

Oh yeah, and write up some sort of paper on what I did.

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
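The two uses of structural keys described in the quoted message, speeding up substructure searches and (ab)using the keys for similarity, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not chemfp's or RDKit's code. It assumes each fingerprint is a Python int with bit i set when key pattern i matched. It also assumes every key is a "presence" pattern, meaning that if the query has it, any molecule containing the query has it too; otherwise the screen is not safe. The helper names `keys_to_fp`, `may_contain`, `popcount`, and `tanimoto` are hypothetical.

```python
from typing import Iterable


def keys_to_fp(matches: Iterable[bool]) -> int:
    """Pack per-key match results into an int: bit i is set if key i matched."""
    fp = 0
    for i, hit in enumerate(matches):
        if hit:
            fp |= 1 << i
    return fp


def may_contain(query_fp: int, target_fp: int) -> bool:
    """Substructure screen (assumes presence-only keys): the target can only
    contain the query if every key set in the query is also set in the target.
    True means "run the full substructure match"; False means "no match"."""
    return (query_fp & ~target_fp) == 0


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B|. Returns 0.0 when both
    fingerprints are empty, a convention chosen for this sketch."""
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / union


# Hypothetical key results for four patterns, e.g. from SMARTS matches.
query = keys_to_fp([True, False, True, False])   # bits 0 and 2
target = keys_to_fp([True, True, True, False])   # bits 0, 1 and 2
print(may_contain(query, target))  # worth running the full match
print(may_contain(target, query))  # screened out without a full match
print(tanimoto(query, target))     # 2 shared bits out of 3 set bits
```

In practice the per-key booleans would come from matching each pattern in a definition file such as substruct.patterns or rdmaccs.patterns with one of the toolkits. The aromaticity differences mentioned above would show up as different bits here.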