On Wed, Jul 16, 2014 at 6:40 PM, Nicholas Firth <nicholas.fi...@icr.ac.uk>
wrote:

> Hi RDKitters,
>
> I might be being stupid here, but I'm trying to marry up the bitinfo from
> a hashed fingerprint to the actual fingerprint and I can't seem to do it.
>
>
> from rdkit import Chem, DataStructs
> from rdkit.Chem import rdMolDescriptors as rdMD
> info = {}
> mol = Chem.MolFromSmiles('CCCCC')
> print rdMD.GetHashedMorganFingerprint(mol, radius=2, nBits = 1024, bitInfo
> = info).GetNonzeroElements()
> print '\n',info
>
>
> {33: 2, 294: 2, 591: 2, 80: 3, 887: 1, 794: 2, 381: 1}
>
> {2246728737: ((0, 0), (4, 0)), 3542456614: ((0, 1), (4, 1)), 1685248591: ((1, 
> 2), (3, 2)), 2245384272: ((1, 0), (2, 0), (3, 0)), 1510461303: ((2, 1),), 
> 1173125914: ((1, 1), (3, 1)), 2738269565: ((2, 2),)}
>
>
>
>
> The indices on the bitinfo appear to be the unhashed values. What I'd
> expect to see is something similar to the bit vector version of this code
>

Sure enough, that's a bug.

The values are the indices for the non-hashed (really non-folded, but it's
too late to rename that function now) version of the fingerprint:

In [7]: info = {}

In [8]: print rdMD.GetMorganFingerprint(mol, radius=2, bitInfo =
info).GetNonzeroElements()
{2246728737: 2, 3542456614: 2, 1685248591: 2, 2245384272: 3, 1510461303: 1,
1173125914: 2, 2738269565: 1}

In [9]: print '\n',info

{2246728737: ((0, 0), (4, 0)), 3542456614: ((0, 1), (4, 1)), 1685248591:
((1, 2), (3, 2)), 2245384272: ((1, 0), (2, 0), (3, 0)), 1510461303: ((2,
1),), 1173125914: ((1, 1), (3, 1)), 2738269565: ((2, 2),)}


Fortunately it's easy to fix this. The bits are hashed/folded into the
smaller fingerprint using integer modulo:

In [10]: info = {}

In [11]: print rdMD.GetHashedMorganFingerprint(mol, radius=2, nBits = 1024,
bitInfo = info).GetNonzeroElements()
{33: 2, 294: 2, 591: 2, 80: 3, 887: 1, 794: 2, 381: 1}

In [12]: for k,v in info.items(): print k%1024,v
33 ((0, 0), (4, 0))
294 ((0, 1), (4, 1))
591 ((1, 2), (3, 2))
80 ((1, 0), (2, 0), (3, 0))
887 ((2, 1),)
794 ((1, 1), (3, 1))
381 ((2, 2),)
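
For anyone who wants to reuse the modulo workaround above, here is a minimal
Python 3 sketch (the session above is Python 2). It is not part of the RDKit:
the helper name fold_bit_info is hypothetical, and the dictionary is copied
from the output shown above so that the snippet runs without the RDKit
installed. Since two different unfolded IDs can fold onto the same index, the
sketch concatenates the atom environments rather than overwriting them.

```python
from collections import defaultdict


def fold_bit_info(bit_info, n_bits):
    """Re-key an unfolded bitInfo dict using bit_id % n_bits.

    Hypothetical helper, not an RDKit function. Values are tuples of
    (atom index, radius) pairs, as in the output above.
    """
    folded = defaultdict(tuple)
    for bit_id, environments in bit_info.items():
        # Distinct unfolded IDs may collide after folding, so append.
        folded[bit_id % n_bits] += tuple(environments)
    return dict(folded)


# bitInfo copied from the 'CCCCC', radius=2 example above.
info = {
    2246728737: ((0, 0), (4, 0)),
    3542456614: ((0, 1), (4, 1)),
    1685248591: ((1, 2), (3, 2)),
    2245384272: ((1, 0), (2, 0), (3, 0)),
    1510461303: ((2, 1),),
    1173125914: ((1, 1), (3, 1)),
    2738269565: ((2, 2),),
}

# Counts copied from GetNonzeroElements() on the hashed fingerprint above.
counts = {33: 2, 294: 2, 591: 2, 80: 3, 887: 1, 794: 2, 381: 1}

folded = fold_bit_info(info, 1024)
for idx in sorted(folded):
    print(idx, folded[idx])

# Each count should equal the number of environments on that folded index.
consistent = all(len(folded[idx]) == n for idx, n in counts.items())
print(consistent)
```

The final check compares the folded keys against the counts from the hashed
fingerprint, which is a quick way to confirm that the two line up for a given
molecule.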


I'll fix the bug, but this workaround should hopefully cover things in the
short term.

-greg