Hi Lewis,

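The Morgan atom environments are hashed into an unsigned 32-bit int, so the
maximum value is 2^32 - 1 (4294967295). That is about twice the largest
signed 32-bit value (2^31 - 1), which matches what you saw.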
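
To make the arithmetic concrete, here is a minimal sketch in plain Python. It
is not RDKit code: the constant and function names are made up for this
example, and the two's-complement reinterpretation is only an assumption about
how signed identifiers like those in the ECFP paper could relate to unsigned
ones, not a description of what RDKit does internally.

```python
# Illustrative only: names below are hypothetical, not part of RDKit.

UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value
INT32_MAX = 2**31 - 1   # largest signed 32-bit value


def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit value as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed (two's complement)."""
    return value - 2**32 if value > INT32_MAX else value


# Columns needed if every possible unfolded bit ID is used directly as a
# column index in a sparse matrix.
N_COLUMNS = UINT32_MAX + 1

if __name__ == "__main__":
    print("unsigned max:", UINT32_MAX)
    print("signed max:  ", INT32_MAX)
    print("ratio:       ", UINT32_MAX / INT32_MAX)
    print("columns:     ", N_COLUMNS)
```

With bit IDs used directly as column indices, a sparse matrix would need
2^32 columns to cover every possible ID. Only the nonzero entries are stored,
so that width does not by itself cost memory.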

-greg

On Fri, Oct 9, 2020 at 1:18 AM Lewis Martin <lewis.marti...@gmail.com>
wrote:

> Hi all,
> Felt sure this would have been asked but I can't find it. What is the
> 'largest' possible bit in an unfolded Morgan fingerprint? Asked another
> way, what type of number are the substructure identities hashed into?
>
> The Rogers and Hahn ECFP paper says that they hash into a 32-bit integer,
> and in the paper they use negative and positive values.
>
> Since hashing generates bits with mostly uniform density, I tried sampling
> some fingerprints. Testing a few hundred thousand molecules, the largest
> bit I found was suspiciously close to twice the maximum value of a signed
> 32-bit integer. So I guess that, to be consistent with Rogers and Hahn,
> the bits are hashed into 32-bit integers but then shifted to be positive?
> Is that correct?
>
> Thanks :) hope the UGM went well.
> Lewis
>
> PS context is I saw a weird result where prediction scores kept getting
> higher when I used larger fingerprints. At size 8192 I ran out of memory,
> so I'm moving to sparse representation (possibly unfolded) but I don't know
> how big the sparse matrix should be.
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>