Dear Gonzalo,

On Tue, Aug 7, 2012 at 11:50 AM, Gonzalo Colmenarejo-Sanchez <[email protected]> wrote:
>
> I have a vendor fragmentation algorithm and I want to evaluate the presence
> of the fragments/substructures in a list of molecules with the RDKit C++
> API. In order to avoid a slow SubstructMatch comparison of n fragments x m
> molecules I first SmilesToMol the fragment, generate a fingerprint,
> calculate the Tversky similarity with the molecule fingerprint, and only if
> the value is high a SubstructMatch is run. This makes the process extremely
> fast.

An even more efficient approach may be to use the function
AllProbeBitsMatch(). I say "may" because, though this is definitely what you
want to be doing when using a substructure fingerprint, the current
implementation of that function may be slow.

I'm not sure which fingerprint you are using, but the best-performing
substructure fingerprint that the RDKit currently provides is accessible
using this function:
http://www.rdkit.org/docs/cppapi/namespaceRDKit.html#a10ca25c3dedc67b66d3fd0abc7af3133

Here's the call that's used in the PostgreSQL cartridge:

    RDKit::LayeredFingerprintMol2(*mol,RDKit::substructLayers,1,4,1024);

Note that at the moment, despite what the docs say, the minPath and maxPath
arguments are ignored.

> The problem I observe is that for many SmilesToMol of the substructures I’m
> getting exceptions like
>
> [10:34:46] Can't kekulize mol
>
> [10:34:46] non-ring atom 4 marked aromatic
>
> Is there a way to “force” SmilesToMol to accept the fragment SMILES so that
> a fingerprint of the fragment graph can be generated (btw, I don’t
> understand why a kekulization is performed) even when the fragment is not a
> complete molecule?

Sure, you just need to skip sanitization when you build the molecule and then
do a partial sanitization yourself. Something like this (not tested) should
probably be fine for what you're trying:

    #include <GraphMol/SmilesParse/SmilesParse.h>
    #include <GraphMol/MolOps.h>
    // assumes "using namespace RDKit;"; third argument false = no sanitization
    unsigned int opThatFailed;
    RWMol *m = SmilesToMol(smi,false,false);
    // partial sanitization: no kekulization or aromaticity perception
    MolOps::sanitizeMol(*m,opThatFailed,
        MolOps::SANITIZE_CLEANUP|MolOps::SANITIZE_PROPERTIES|MolOps::SANITIZE_SYMMRINGS);

To answer your question about why kekulization is performed: the sanitization
step converts all input molecules to kekule form and then does aromaticity
perception based on that. This ensures that the molecule is in a consistent
state.

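To make the AllProbeBitsMatch() suggestion concrete: with a substructure fingerprint, the screen is a subset test rather than a similarity threshold. If any bit set in the fragment fingerprint is missing from the molecule fingerprint, the fragment can't match and the SubstructMatch can be skipped. Here is a minimal sketch of that test using std::bitset instead of the RDKit bit vector classes. The names allProbeBitsSet, fromOnBits, and kNumBits are made up for illustration and are not part of the RDKit API, and this is not how the RDKit function itself is implemented.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-in for a 1024-bit substructure fingerprint.
constexpr std::size_t kNumBits = 1024;
using Fingerprint = std::bitset<kNumBits>;

// Returns true when every bit set in `probe` (the fragment) is also set in
// `target` (the molecule). Only in that case is the more expensive
// substructure match worth running.
inline bool allProbeBitsSet(const Fingerprint &probe, const Fingerprint &target) {
  return (probe & target) == probe;
}

// Illustration-only helper: builds a fingerprint from a list of on-bit indices.
inline Fingerprint fromOnBits(std::initializer_list<std::size_t> bits) {
  Fingerprint fp;
  for (std::size_t b : bits) {
    assert(b < kNumBits);  // indices must fit in the fingerprint
    fp.set(b);
  }
  return fp;
}
```

In the n fragments x m molecules loop, the idea would be to compute each fingerprint once, call the screen for every pair, and run SubstructMatch only for the pairs that pass it.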
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

