Dear Gonzalo,

On Tue, Aug 7, 2012 at 11:50 AM, Gonzalo Colmenarejo-Sanchez
<[email protected]> wrote:
>
> I have a vendor fragmentation algorithm and I want to evaluate the presence
> of the fragments/substructures in a list of molecules with the RDKit C++
> API. In order to avoid a slow SubstructMatch comparison of n fragments x m
> molecules I first SmilesToMol the fragment, generate a fingerprint,
> calculate the Tversky similarity with the molecule fingerprint, and only if
> the value is high a SubstructMatch is run. This makes the process extremely
> fast.

An even more efficient approach may be to use the function
AllProbeBitsMatch(): a fragment can only be a substructure of a
molecule if every bit set in the fragment's fingerprint is also set in
the molecule's. I say "may" because, though this is definitely what you
want to be doing when using a substructure fingerprint, the current
implementation of that function may be slow.
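
To make the idea behind that kind of screen concrete, here is a minimal
sketch that uses std::bitset in place of the RDKit bit vector classes.
The names kNumBits, Fingerprint, makeFingerprint and allProbeBitsSet are
made up for illustration and are not part of the RDKit API; the 1024-bit
width is only an assumption chosen to match the fingerprint size used
elsewhere in this message.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Assumed fingerprint width (illustrative only).
constexpr std::size_t kNumBits = 1024;
using Fingerprint = std::bitset<kNumBits>;

// Hypothetical helper: build a fingerprint from a list of "on" bit indices.
Fingerprint makeFingerprint(std::initializer_list<std::size_t> onBits) {
  Fingerprint fp;
  for (std::size_t bit : onBits) {
    assert(bit < kNumBits);  // guard against out-of-range indices
    fp.set(bit);
  }
  return fp;
}

// True when every bit set in `probe` is also set in `target`.
// A fragment that fails this test cannot be a substructure of the
// molecule, so the expensive substructure match can be skipped.
bool allProbeBitsSet(const Fingerprint &probe, const Fingerprint &target) {
  return (probe & target) == probe;
}
```

Only the fragment/molecule pairs for which allProbeBitsSet(fragment,
molecule) is true would then need to go on to the full SubstructMatch.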

I'm not sure which fingerprint you are using, but the best performing
substructure fingerprint that the RDKit currently provides is
accessible using this function:
http://www.rdkit.org/docs/cppapi/namespaceRDKit.html#a10ca25c3dedc67b66d3fd0abc7af3133
Here's the call that's used in the PostgreSQL cartridge:
    RDKit::LayeredFingerprintMol2(*mol,RDKit::substructLayers,1,4,1024);
Note that at the moment, despite what the docs say, the minPath and
maxPath arguments are ignored.

>
>
> The problem I observe is that for many SmilesToMol of the substructures I’m
> getting exceptions like
>
>
>
> [10:34:46] Can't kekulize mol
>
> [10:34:46] non-ring atom 4 marked aromatic
>
>
>
> Is there a way to “force” SmilesToMol to accept the fragment SMILES so that
> a fingerprint of the fragment graph can be generated (btw, I don’t
> understand why a kekulization is performed) even when the fragment is not a
> complete molecule?

Sure, you just need to skip sanitization when you build the molecule
and then do a partial sanitization yourself. Something like this (not
tested) should probably be fine for what you're trying:

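  unsigned int opThatFailed;
  // parse without sanitization (debugParse=false, sanitize=false)
  RWMol *m = SmilesToMol(smi,false,false);
  // run only the sanitization steps that do not require kekulization
  MolOps::sanitizeMol(*m,opThatFailed,
                      MolOps::SANITIZE_CLEANUP|MolOps::SANITIZE_PROPERTIES|
                      MolOps::SANITIZE_SYMMRINGS);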

To answer your question about why kekulization is performed: the
sanitization step converts all input molecules to kekule form and then
does aromaticity perception based on that. This ensures that the
molecule is in a consistent state.


-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
