Hi Gareth,

I'm a bit lost now. In the C++ test code, C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp, the test function void testMorganFP() (line 615) seems to use only FingerprintGenerator<std::uint32_t> *morganGenerator; as if the 64 bit version was not maintained.
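One quick sanity check on the Python side, as a rough sketch rather than anything taken from the RDKit code base: take the dict returned by GetNonzeroElements() and test whether every key fits in 32 bits. The keys below are copied from the output quoted further down in this thread; the helper names fits_in_32_bits and max_bits_used are my own and purely hypothetical. Note that this particular dict came from AllChem.GetMorganFingerprint, so to say anything about the 64 bit generator the same check would have to be run on a dict produced by that generator, ideally over many molecules.

```python
# Keys and counts copied from the GetNonzeroElements() output quoted in this thread.
elements = {
    10565946: 2, 348155210: 1, 476388586: 1, 540046244: 1, 553412256: 1,
    864942730: 2, 909857231: 1, 1100037548: 1, 1333761024: 1, 1512818157: 1,
    1981181107: 1, 2030573601: 1, 2041434490: 1, 2092489639: 3, 2246728737: 3,
    2370996728: 1, 2877515035: 1, 2971716993: 1, 2975126068: 2, 3140581776: 1,
    3217380708: 4, 3218693969: 1, 3462333187: 1, 3657471097: 3, 3796970912: 1,
}

UINT32_MAX = 2**32 - 1


def fits_in_32_bits(keys):
    """Return True if every key is a non-negative integer below 2**32."""
    return all(0 <= int(k) <= UINT32_MAX for k in keys)


def max_bits_used(keys):
    """Return the number of bits needed to represent the largest key."""
    return max(int(k).bit_length() for k in keys)


# Hypothetical helpers, not part of RDKit: they only inspect the dict keys.
print(fits_in_32_bits(elements), max_bits_used(elements))
```

If a fingerprint really used the upper half of a 64 bit key space, I would expect fits_in_32_bits to return False for at least some molecules once enough of them are checked. A single molecule passing the check proves nothing by itself.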
Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka

On Thu, Apr 22, 2021 at 9:57 PM Gareth Jones <java.jo...@gmail.com> wrote:

> Hi Wojtek,
>
> Our findings are the same. There is a Morgan fingerprint generator for 64
> bits, which Python uses by default. When you call it, the functions that
> actually set the bits in the 64 bit fingerprint
> (MorganFingerprints::getConnectivityInvariants and
> MorganFingerprints::getFeatureInvariants) will only ever set the first 32
> bits.
>
> So you have a 64 bit fingerprint, but only the first 32 bits are ever set.
>
> On 4/22/2021 12:20 PM, Wojtek Plonka wrote:
>
> Hi Gareth,
>
> Your findings are a bit contrary to mine, so the truth must be somewhere
> in between :)
> I downloaded the RDKit sources and some support for 64 bit Morgan
> Fingerprints seems to be there:
>
> Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of 661 searched)
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
>   Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
>   Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
>   Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
>   Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,
>
> I will have a closer look at that.
> I don't need to write my code in Python, C++ (with Google's help) is fine,
> too, as long as I can compile it with Linux tools or MSVC Community Edition.
> Maybe simply 64 bit stuff is not complete or not interfaced to Python yet?
> Thanks!
>
> Wojtek Plonka
> +48885756652
> wojtekplonka.com <http://www.wojtekplonka.com>
> fb.com/wojtek.plonka
>
> On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones <java.jo...@gmail.com> wrote:
>
>> Hi Wojtek,
>>
>> From looking at the RDKit code base my take is that it is currently not
>> possible to generate 64 bit Morgan fingerprints.
>>
>> The Python fingerprint generator defaults to 64bit:
>>
>> In [36]: fp.GetLength()
>> Out[36]: 18446744073709551615
>>
>> Unfortunately, the C++ Morgan fingerprint generator only ever sets the
>> first 32 bits even if the fingerprint is 64bit. If you look at
>> MorganFingerprints::getConnectivityInvariants and
>> MorganFingerprints::getFeatureInvariants in
>> Code/GraphMol/Fingerprints/FingerprintUtil.cpp the generated invariants
>> (that are used to set the fingerprint bits) are unsigned 32 bit ints.
>>
>> Some RDKit development would be needed to template those functions so
>> that they would work with both 32 and 64 bit fingerprints.
>>
>> Cheers,
>>
>> Gareth
>>
>> On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
>>
>> Hi Gareth,
>>
>> Thank you. I do exactly as you wrote. That's not the issue.
>> Please note that all the keys in elements are in range of 2**32 - the
>> main hash function used is definitely 32 bit
>>
>> According to
>> https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
>> both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
>> and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
>> exist.
>>
>> However with my limited knowledge I don't know how to access the 64 bit
>> version and that is my problem.
>> Kindest regards,
>>
>> Wojtek
>>
>> Wojtek Plonka
>> +48885756652
>> wojtekplonka.com <http://www.wojtekplonka.com>
>> fb.com/wojtek.plonka
>>
>> On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <java.jo...@gmail.com>
>> wrote:
>>
>>> Wojtek,
>>>
>>> You can use GetNonzeroElements() to convert the sparse fingerprint to a
>>> Python dict of hash to count.
>>>
>>> Cheers,
>>> Gareth
>>>
>>> In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
>>>
>>> In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
>>>
>>> In [9]: elements = fp.GetNonzeroElements();
>>>
>>> In [10]: elements
>>> Out[10]:
>>> {10565946: 2,
>>>  348155210: 1,
>>>  476388586: 1,
>>>  540046244: 1,
>>>  553412256: 1,
>>>  864942730: 2,
>>>  909857231: 1,
>>>  1100037548: 1,
>>>  1333761024: 1,
>>>  1512818157: 1,
>>>  1981181107: 1,
>>>  2030573601: 1,
>>>  2041434490: 1,
>>>  2092489639: 3,
>>>  2246728737: 3,
>>>  2370996728: 1,
>>>  2877515035: 1,
>>>  2971716993: 1,
>>>  2975126068: 2,
>>>  3140581776: 1,
>>>  3217380708: 4,
>>>  3218693969: 1,
>>>  3462333187: 1,
>>>  3657471097: 3,
>>>  3796970912: 1}
>>>
>>> On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
>>>
>>> Dear All
>>>
>>> Do any of you have a working example of getting Morgan Fingerprints, as
>>> sparse bit vector (non-hashed) in the 64 bit version using Python?
>>> I'm looking into the issue of collisions on the "main hash" on large
>>> (100+ million molecules) data
>>> Thank you very much!
>>> Kindest regards,
>>>
>>> Wojtek Plonka
>>> +48885756652
>>> wojtekplonka.com <http://www.wojtekplonka.com>
>>> fb.com/wojtek.plonka
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss