Hi Wojtek,
From looking at the RDKit code base, my take is that it is currently not
possible to generate 64-bit Morgan fingerprints.
The Python fingerprint generator defaults to a 64-bit fingerprint:
In [36]: fp.GetLength()
Out[36]: 18446744073709551615
Unfortunately, the C++ Morgan fingerprint generator only ever sets the
first 32 bits, even if the fingerprint is 64-bit. If you look at
MorganFingerprints::getConnectivityInvariants and
MorganFingerprints::getFeatureInvariants in
Code/GraphMol/Fingerprints/FingerprintUtil.cpp, the generated invariants
(which are used to set the fingerprint bits) are unsigned 32-bit ints.
Some RDKit development would be needed to template those functions so
that they work with both 32-bit and 64-bit fingerprints.
Cheers,
Gareth
On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
Hi Gareth,
Thank you. I do exactly as you wrote. That's not the issue.
Please note that all the keys in elements are below 2**32, so the
main hash function used is definitely 32-bit.
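The observation above can be checked with plain Python. This is only a sketch: the keys are copied by hand from the GetNonzeroElements() output quoted further down in this thread, and the length is the fp.GetLength() value reported in the reply at the top. A sample of 25 keys is consistent with a 32-bit hash but does not prove it on its own.

```python
# Keys copied from the GetNonzeroElements() output quoted in this thread.
keys = [
    10565946, 348155210, 476388586, 540046244, 553412256,
    864942730, 909857231, 1100037548, 1333761024, 1512818157,
    1981181107, 2030573601, 2041434490, 2092489639, 2246728737,
    2370996728, 2877515035, 2971716993, 2975126068, 3140581776,
    3217380708, 3218693969, 3462333187, 3657471097, 3796970912,
]

# Value reported by fp.GetLength() for the generator's sparse fingerprint.
reported_length = 18446744073709551615

max_key = max(keys)
fits_in_32_bits = max_key < 2**32
length_is_uint64_max = reported_length == 2**64 - 1

print(f"largest key: {max_key} ({max_key.bit_length()} bits)")
print(f"all keys below 2**32: {fits_in_32_bits}")
print(f"reported length equals 2**64 - 1: {length_is_uint64_max}")
```

So the container advertises a 64-bit index space while every key seen here fits in 32 bits.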
According to
https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
exist.
However, with my limited knowledge, I don't know how to access the
64-bit version, and that is my problem.
Kindest regards,
Wojtek
Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka <https://fb.com/wojtek.plonka>
On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <java.jo...@gmail.com> wrote:
Wojtek,
You can use GetNonzeroElements() to convert the sparse fingerprint
to a Python dict mapping each hash to its count.
Cheers,
Gareth
In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
In [9]: elements = fp.GetNonzeroElements();
In [10]: elements
Out[10]:
{10565946: 2,
348155210: 1,
476388586: 1,
540046244: 1,
553412256: 1,
864942730: 2,
909857231: 1,
1100037548: 1,
1333761024: 1,
1512818157: 1,
1981181107: 1,
2030573601: 1,
2041434490: 1,
2092489639: 3,
2246728737: 3,
2370996728: 1,
2877515035: 1,
2971716993: 1,
2975126068: 2,
3140581776: 1,
3217380708: 4,
3218693969: 1,
3462333187: 1,
3657471097: 3,
3796970912: 1}
In [11]:
On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
Dear All
Do any of you have a working example of getting Morgan
fingerprints as a sparse bit vector (non-hashed) in the 64-bit
version using Python?
I'm looking into the issue of collisions on the "main hash" on
large (100+ million molecules) data sets.
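To get a rough feel for why the hash width matters at this scale, here is a birthday-bound estimate in plain Python. It rests on assumptions that are not from this thread: the hash is treated as uniformly distributed, and n is taken to be the number of distinct atom environments being hashed, set to 100 million purely for illustration (the real count for a given data set is unknown). The function names are hypothetical helpers, not RDKit calls.

```python
import math


def expected_colliding_pairs(n_items: int, bits: int) -> float:
    """Expected number of item pairs sharing a hash, for a uniform `bits`-bit hash."""
    return n_items * (n_items - 1) / 2 / 2**bits


def prob_any_collision(n_items: int, bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return -math.expm1(-expected_colliding_pairs(n_items, bits))


# Assumption: 100 million distinct environments, chosen only for illustration.
n = 100_000_000

for bits in (32, 64):
    pairs = expected_colliding_pairs(n, bits)
    prob = prob_any_collision(n, bits)
    print(f"{bits}-bit hash: ~{pairs:.3g} colliding pairs, P(any collision) ~ {prob:.3g}")
```

Under these assumptions a 32-bit hash gives on the order of a million colliding pairs, while a 64-bit hash gives an expected count well below one.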
Thank you very much!
Kindest regards,
Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka <https://fb.com/wojtek.plonka>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
<mailto:Rdkit-discuss@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
<https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>