Wojtek,
You can use GetNonzeroElements() to convert the sparse fingerprint to a
Python dict mapping each hash to its count.
Cheers,
Gareth
In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
In [9]: elements = fp.GetNonzeroElements();
In [10]: elements
Out[10]:
{10565946: 2,
348155210: 1,
476388586: 1,
540046244: 1,
553412256: 1,
864942730: 2,
909857231: 1,
1100037548: 1,
1333761024: 1,
1512818157: 1,
1981181107: 1,
2030573601: 1,
2041434490: 1,
2092489639: 3,
2246728737: 3,
2370996728: 1,
2877515035: 1,
2971716993: 1,
2975126068: 2,
3140581776: 1,
3217380708: 4,
3218693969: 1,
3462333187: 1,
3657471097: 3,
3796970912: 1}
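To relate the dict above to the collision question, here is a minimal sketch in plain Python (no RDKit needed). It checks how many bits the IDs in this particular output occupy and gives a rough birthday-style estimate of colliding pairs. The function name expected_colliding_pairs and the value of n_distinct are hypothetical, chosen only for illustration, and the estimate assumes a uniformly distributed hash, which is an assumption and not something stated in this thread.

```python
# Hash -> count dict copied from the caffeine output above.
elements = {
    10565946: 2, 348155210: 1, 476388586: 1, 540046244: 1, 553412256: 1,
    864942730: 2, 909857231: 1, 1100037548: 1, 1333761024: 1, 1512818157: 1,
    1981181107: 1, 2030573601: 1, 2041434490: 1, 2092489639: 3, 2246728737: 3,
    2370996728: 1, 2877515035: 1, 2971716993: 1, 2975126068: 2, 3140581776: 1,
    3217380708: 4, 3218693969: 1, 3462333187: 1, 3657471097: 3, 3796970912: 1,
}

# Number of bits needed to hold the largest ID in this output.
max_bits = max(elements).bit_length()
print("bits needed for largest ID:", max_bits)


def expected_colliding_pairs(n_distinct, hash_bits):
    """Birthday-style estimate: expected number of pairs of distinct
    environments sharing a hash, ASSUMING a uniform hash over 2**hash_bits."""
    return n_distinct * (n_distinct - 1) / 2 / 2 ** hash_bits


# Hypothetical number of distinct atom environments in a large data set;
# this figure is an assumption for illustration, not a measured value.
n_distinct = 10_000_000
for bits in (32, 64):
    print(bits, "bit hash:", expected_colliding_pairs(n_distinct, bits))
```

Every ID in the output above fits in 32 bits, so under the uniform-hash assumption the estimate for a 32-bit space is the one relevant to these IDs; the 64-bit line is there only for comparison.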
On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
Dear All,
Does anyone have a working example of getting Morgan fingerprints as a
sparse (non-hashed) bit vector in the 64-bit version using Python?
I'm looking into the issue of collisions on the "main hash" on large
(100+ million molecules) data sets.
Thank you very much!
Kindest regards,
Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka <https://fb.com/wojtek.plonka>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss