Hi,

By default the new fingerprint generators do "count simulation": they set
extra bits in a bit-vector fingerprint so that bit-vector similarities
more closely approximate count-vector similarities.
You can turn this off by passing the useCountSimulation=False argument to
GetMorganGenerator().
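
A rough sketch of what that looks like, using the same molecule and settings
as your example. One assumption to flag: the keyword is spelled
useCountSimulation as written above, but I believe some RDKit releases call
it countSimulation instead, so the hypothetical helper morgan_generator()
below simply tries both spellings. Treat it as an illustration, not as the
definitive way to call the API.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator


def morgan_generator(count_simulation, fp_size=256, radius=2):
    """Build a Morgan generator with count simulation switched on or off.

    Hypothetical helper, not part of RDKit.
    """
    # Assumption: the keyword name depends on the RDKit version, so try
    # the spelling from this email first and then the alternative.
    for name in ("useCountSimulation", "countSimulation"):
        try:
            return rdFingerprintGenerator.GetMorganGenerator(
                radius=radius, fpSize=fp_size, **{name: count_simulation})
        except Exception:  # an unknown keyword is rejected by the wrapper
            continue
    raise RuntimeError("no count-simulation keyword was accepted")


# diazepam, as in the original question
mol = Chem.MolFromSmiles('CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3')

fp_sim = morgan_generator(True).GetFingerprint(mol)
fp_plain = morgan_generator(False).GetFingerprint(mol)

print('on bits with count simulation:   ', fp_sim.GetNumOnBits())
print('on bits without count simulation:', fp_plain.GetNumOnBits())
```

Both settings are passed explicitly here so that the comparison does not
depend on what the default happens to be in the installed version.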

Two comments about your sample code:
1) 256 bits is really not very many for a Morgan fingerprint. Maybe you
were just using a small number for this question, but if you really are
using fingerprints that short, be aware that you are going to have a lot
of bit collisions (blog post on this here:
http://rdkit.blogspot.com/2016/02/colliding-bits-iii.html).
2) In case you aren't aware of it: you can calculate similarities and do
fingerprint stats a lot more simply with built-in code like the
GetNumOnBits() method on bit vectors and the similarity calculation code
in rdkit.DataStructs. Take a look at DataStructs.DiceSimilarity().
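
To make the second comment concrete, here is a minimal sketch that avoids the
ToBitString()/numpy/scipy round trip. The second molecule (nordazepam) and
the 2048-bit size are my own choices for illustration and are not from your
code.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# A longer fingerprint than 256 bits, to reduce bit collisions.
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

diazepam = Chem.MolFromSmiles('CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3')
# Assumption: nordazepam is used only as an example comparison compound.
nordazepam = Chem.MolFromSmiles('O=C1CN=C(c2ccccc2)c2cc(Cl)ccc2N1')

fp_a = gen.GetFingerprint(diazepam)
fp_b = gen.GetFingerprint(nordazepam)

# Number of set bits, straight from the bit vectors.
print('NumBits a: %d, NumBits b: %d' % (fp_a.GetNumOnBits(), fp_b.GetNumOnBits()))

# Dice similarity computed directly on the bit vectors.
dice = DataStructs.DiceSimilarity(fp_a, fp_b)
print('Dice similarity %.3f, Dice distance %.3f' % (dice, 1 - dice))
```

Note that scipy's distance.dice() returns a dissimilarity, i.e. one minus
the value that DataStructs.DiceSimilarity() gives for the same pair of
fingerprints.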

Hope this helps,
-greg



On Wed, Jul 10, 2019 at 3:53 AM Lewis Martin <lewis.marti...@gmail.com>
wrote:

> Hi all,
> Quick question on truncated fingerprints, any help is really appreciated.
>
>
> I think I've missed a trick on how the new fingerprint generator works. I
> thought the below should produce equivalent fingerprints but they are
> totally different. Has the implementation changed, or maybe I'm getting the
> kwargs incorrect? See below code or this link for a quick visual:
> https://github.com/ljmartin/snippets/blob/master/truncated_fingerprints.ipynb
> Thanks !
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import Draw, AllChem
> from rdkit.Chem import rdFingerprintGenerator
> from rdkit.Chem.Draw import IPythonConsole
> import numpy as np
> from scipy.spatial import distance
>
> mol = Chem.MolFromSmiles('CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3')
> #diazepam
>
> gen_mo = rdFingerprintGenerator.GetMorganGenerator(fpSize=256, radius=2)
> a = gen_mo.GetFingerprint(mol)
> b = AllChem.GetMorganFingerprintAsBitVect(mol,2,256,useFeatures=False)
> a_f = [int(i) for i in a.ToBitString()]
> b_f = [int(i) for i in b.ToBitString()]
> print('NumBits a: %s, NumBits b: %s' % (np.sum(a_f), np.sum(b_f)))
> print('Dice Distance %s' % distance.dice(a_f,b_f))
>
>
> NumBits a: 47, NumBits b: 38
> Dice Distance 0.9058823529411765
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>