Hi Scott,
On Jun 12, 2013, at 4:17 PM, Scott Dickerson <scott.h.dicker...@gsk.com> wrote:
> Hi
>
> I’ve recently started using RDKit and have been impressed with this open
> source toolkit. In particular, I’ve used the Morgan fingerprints in my
> research with great success.
I'm glad to hear it. It's always nice to hear from people actually using the
code. :-)
> I’d like to use rdkit topological fingerprints but the on bit density seems
> high. For example, for Tykerb (admittedly not a tiny molecule), with
> fpSize:1024, maxPath:7 & nBitsPerHash:2 , the bit density is ~80% (with size
> of 2048 it’s still ~50%). Using ChemAxon tools I get ~34% with the same
> settings.
>
> I’m interested to know what might cause rdkit fingerprints to be darker than,
> say, those from chemaxon and if there is some parameter that I am
> overlooking. I’m thinking, perhaps incorrectly, that these fingerprints
> would be more useful if they were less dense. Any insights or guidance would
> be greatly appreciated.
The bit density of the RDKit fingerprints isn't as high as it used to be when
nBitsPerHash defaulted to 4, but it is still pretty high. Having said that, 80%
is definitely not typical. I did an analysis a while ago of the average
densities of the various fingerprints; I need to dig it out.
You can easily drop the density by close to a factor of two by changing
nBitsPerHash down to 1. This will, of course, lead to a drop in average
similarities as well. An experiment worth doing would be to apply the new
benchmarking platform we just published to see how these changes affect
similarity search performance... Another one for the todo list.
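To make the effect of nBitsPerHash concrete, here is a rough sketch rather than
a description of the actual implementation. It assumes an idealized model in
which every path sets nBitsPerHash bits chosen uniformly and independently, so
the expected density is 1 - (1 - 1/fpSize)^(nPaths * nBitsPerHash). The helper
names and the path count of 800 are made up for illustration. The second
function measures the real density and needs RDKit installed.

```python
def expected_density(n_paths, fp_size, bits_per_hash):
    """Expected fraction of on bits under the idealized model.

    Assumption: each of n_paths distinct paths sets bits_per_hash bits
    picked uniformly and independently from fp_size positions.
    """
    n_sets = n_paths * bits_per_hash
    return 1.0 - (1.0 - 1.0 / fp_size) ** n_sets


def measured_density(smiles, fp_size=2048, max_path=7, bits_per_hash=2):
    """Fraction of on bits in the RDKit topological fingerprint of a molecule."""
    from rdkit import Chem  # imported here so the model above runs without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    fp = Chem.RDKFingerprint(mol, maxPath=max_path, fpSize=fp_size,
                             nBitsPerHash=bits_per_hash)
    return float(fp.GetNumOnBits()) / fp.GetNumBits()


# Hypothetical path count, chosen only to land in the high-density regime.
for bits_per_hash in (2, 1):
    density = expected_density(800, 1024, bits_per_hash)
    print("nBitsPerHash=%d  expected density=%.2f" % (bits_per_hash, density))
```

In this model the drop only approaches a factor of two for sparse fingerprints.
When the fingerprint is already close to saturated, halving nBitsPerHash
removes a smaller share of the on bits, so it is worth checking the real
numbers with measured_density() on your own molecules.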
Hope this helps,
-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss