Hi Gareth,

Your findings are a bit contrary to mine, so the truth must be somewhere in between :) I downloaded the RDKit sources, and some support for 64-bit Morgan fingerprints seems to be there:
Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of 661 searched)

C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
    Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
    Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
    Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
    Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,

I will have a closer look at that. I don't need to write my code in Python; C++ (with Google's help) is fine too, as long as I can compile it with Linux tools or MSVC Community Edition.

Maybe the 64-bit support is simply incomplete, or not yet interfaced to Python?

Thanks!

Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka

On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones <java.jo...@gmail.com> wrote:

> Hi Wojtek,
>
> From looking at the RDKit code base, my take is that it is currently not
> possible to generate 64-bit Morgan fingerprints.
>
> The Python fingerprint generator defaults to 64-bit:
>
> In [36]: fp.GetLength()
> Out[36]: 18446744073709551615
>
> Unfortunately, the C++ Morgan fingerprint generator only ever sets the
> first 32 bits, even if the fingerprint is 64-bit. If you look at
> MorganFingerprints::getConnectivityInvariants and
> MorganFingerprints::getFeatureInvariants in
> Code/GraphMol/Fingerprints/FingerprintUtil.cpp, the generated invariants
> (that are used to set the fingerprint bits) are unsigned 32-bit ints.
>
> Some RDKit development would be needed to template those functions so that
> they would work with both 32-bit and 64-bit fingerprints.
>
> Cheers,
>
> Gareth
>
> On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
>
> Hi Gareth,
>
> Thank you. I do exactly as you wrote. That's not the issue.
> Please note that all the keys in elements are below 2**32; the
> main hash function used is definitely 32-bit.
>
> According to
> https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
> both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
> and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
> exist.
>
> However, with my limited knowledge I don't know how to access the 64-bit
> version, and that is my problem.
>
> Kindest regards,
>
> Wojtek
>
> On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <java.jo...@gmail.com> wrote:
>
>> Wojtek,
>>
>> You can use GetNonzeroElements() to convert the sparse fingerprint to a
>> Python dict of hash to count.
>>
>> Cheers,
>> Gareth
>>
>> In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
>>
>> In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
>>
>> In [9]: elements = fp.GetNonzeroElements();
>>
>> In [10]: elements
>> Out[10]:
>> {10565946: 2,
>>  348155210: 1,
>>  476388586: 1,
>>  540046244: 1,
>>  553412256: 1,
>>  864942730: 2,
>>  909857231: 1,
>>  1100037548: 1,
>>  1333761024: 1,
>>  1512818157: 1,
>>  1981181107: 1,
>>  2030573601: 1,
>>  2041434490: 1,
>>  2092489639: 3,
>>  2246728737: 3,
>>  2370996728: 1,
>>  2877515035: 1,
>>  2971716993: 1,
>>  2975126068: 2,
>>  3140581776: 1,
>>  3217380708: 4,
>>  3218693969: 1,
>>  3462333187: 1,
>>  3657471097: 3,
>>  3796970912: 1}
>>
>> On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
>>
>> Dear All,
>>
>> Do any of you have a working example of getting Morgan fingerprints as a
>> sparse bit vector (non-hashed) in the 64-bit version using Python?
>> I'm looking into the issue of collisions on the "main hash" on large
>> (100+ million molecules) data sets.
>>
>> Thank you very much!
>> Kindest regards,
>>
>> Wojtek Plonka
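
A minimal sketch of the two points discussed in this thread, in plain Python with no RDKit dependency. The key list is copied from the GetNonzeroElements() output quoted above. The helper names (fits_in_bits, expected_colliding_pairs) are made up for illustration. The collision estimate uses the standard birthday approximation and assumes uniformly distributed hashes and a hypothetical count of 100 million distinct atom environments, which is not the same thing as 100 million molecules, so treat the numbers as an order-of-magnitude guide only.

```python
# Keys copied from the GetNonzeroElements() output quoted in this thread.
observed_keys = [
    10565946, 348155210, 476388586, 540046244, 553412256,
    864942730, 909857231, 1100037548, 1333761024, 1512818157,
    1981181107, 2030573601, 2041434490, 2092489639, 2246728737,
    2370996728, 2877515035, 2971716993, 2975126068, 3140581776,
    3217380708, 3218693969, 3462333187, 3657471097, 3796970912,
]


def fits_in_bits(keys, bits):
    """Return True if every key fits in an unsigned integer of `bits` bits."""
    return all(0 <= key < 2 ** bits for key in keys)


def expected_colliding_pairs(n_items, bits):
    """Birthday approximation: expected number of colliding pairs among
    n_items values hashed uniformly into a space of 2**bits values."""
    return n_items * (n_items - 1) / 2 / 2 ** bits


if __name__ == "__main__":
    # Every observed key is below 2**32, consistent with a 32-bit main hash.
    print("all keys < 2**32:", fits_in_bits(observed_keys, 32))

    # ASSUMPTION: 100 million distinct environments, uniform hashing.
    n_environments = 100_000_000
    for width in (32, 64):
        pairs = expected_colliding_pairs(n_environments, width)
        print(f"{width}-bit hash: ~{pairs:.3g} expected colliding pairs")
```

Under those assumptions a 32-bit hash space gives a very large expected number of colliding pairs, while a 64-bit space gives far fewer than one, which is why a true 64-bit invariant would matter at this scale.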
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss