Hi Gareth,
I'm a bit lost now...
If you look into the CPP testing code
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp
the test function void testMorganFP() (line 615) seems to use only
FingerprintGenerator<std::uint32_t> *morganGenerator;
as if the 64 bit version were not maintained.
Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka
On Thu, Apr 22, 2021 at 9:57 PM Gareth Jones <[email protected]> wrote:
> Hi Wojtek,
>
> Our findings are the same. There is a Morgan fingerprint generator for 64
> bits, which Python uses by default. When you call it, the functions that
> actually set the bits in the 64 bit fingerprint
> (MorganFingerprints::getConnectivityInvariants and
> MorganFingerprints::getFeatureInvariants) will only ever set the first 32
> bits.
>
> So you have a 64 bit fingerprint, but only the first 32 bits are ever set.
>
> On 4/22/2021 12:20 PM, Wojtek Plonka wrote:
>
> Hi Gareth,
>
> Your findings are a bit contrary to mine, so the truth must be somewhere
> in between :)
> I downloaded the RDKit sources and some support for 64 bit Morgan
> Fingerprints seems to be there:
>
> Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of 661 searched)
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
>   Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
>   Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
>   Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
> C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
>   Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,
>
> I will have a closer look at that.
> I don't need to write my code in Python; C++ (with Google's help) is fine,
> too, as long as I can compile it with Linux tools or MSVC Community Edition.
> Maybe the 64 bit support is simply not complete, or not exposed to Python yet?
> Thanks!
>
> Wojtek Plonka
>
>
>
> On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones <[email protected]> wrote:
>
>>
>> Hi Wojtek,
>>
>> From looking at the RDKit code base, my take is that it is currently not
>> possible to generate 64 bit Morgan fingerprints.
>>
>> The Python fingerprint generator defaults to 64bit:
>>
>> In [36]: fp.GetLength()
>> Out[36]: 18446744073709551615
>>
>> Unfortunately, the C++ Morgan fingerprint generator only ever sets the
>> first 32 bits even if the fingerprint is 64bit. If you look at
>> MorganFingerprints::getConnectivityInvariants and
>> MorganFingerprints::getFeatureInvariants in
>> Code/GraphMol/Fingerprints/FingerprintUtil.cpp the generated invariants
>> (that are used to set the fingerprint bits) are unsigned 32 bit ints.
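A toy model of the effect described above, using plain Python integers rather than RDKit code: an unsigned 32 bit value stored in a 64 bit slot leaves the upper 32 bits at zero, so only 2**32 of the 2**64 possible keys can ever occur. The names MASK32, widen_uint32 and upper_32_bits are made up for this illustration; the length value is the one quoted from fp.GetLength() in this thread.

```python
MASK32 = 0xFFFFFFFF

# Value reported by fp.GetLength() earlier in this thread.
REPORTED_LENGTH = 18446744073709551615


def widen_uint32(value32):
    """Model storing an unsigned 32 bit value in a 64 bit slot (zero extension)."""
    if not 0 <= value32 <= MASK32:
        raise ValueError("not an unsigned 32 bit value")
    return value32  # the upper 32 bits are implicitly zero


def upper_32_bits(value64):
    """Return the upper half of a 64 bit value."""
    return (value64 >> 32) & MASK32


# The reported length is the largest unsigned 64 bit value.
print(REPORTED_LENGTH == 2**64 - 1)

# A key taken from the GetNonzeroElements() output quoted in this thread:
# after widening, nothing above bit 31 is set.
print(upper_32_bits(widen_uint32(3796970912)))
```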
>>
>> Some RDKit development would be needed to template those functions so
>> that they would work with both 32 and 64 bit fingerprints.
>> Cheers,
>>
>> Gareth
>>
>>
>> On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
>>
>> Hi Gareth,
>>
>> Thank you. I do exactly as you wrote. That's not the issue.
>> Please note that all the keys in elements are below 2**32, so the
>> main hash function used is definitely 32 bit.
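The observation about the key range can be checked directly on the output quoted in this thread. The sketch below does not call RDKit; it only inspects the keys copied from the GetNonzeroElements() result, and the helper names max_bit_length and fits_in_bits are hypothetical.

```python
# Keys and counts copied from the GetNonzeroElements() output in this thread.
elements = {
    10565946: 2, 348155210: 1, 476388586: 1, 540046244: 1, 553412256: 1,
    864942730: 2, 909857231: 1, 1100037548: 1, 1333761024: 1, 1512818157: 1,
    1981181107: 1, 2030573601: 1, 2041434490: 1, 2092489639: 3, 2246728737: 3,
    2370996728: 1, 2877515035: 1, 2971716993: 1, 2975126068: 2, 3140581776: 1,
    3217380708: 4, 3218693969: 1, 3462333187: 1, 3657471097: 3, 3796970912: 1,
}


def max_bit_length(sparse_counts):
    """Number of bits needed to represent the largest key."""
    return max(key.bit_length() for key in sparse_counts)


def fits_in_bits(sparse_counts, width):
    """True if every key is an unsigned integer below 2**width."""
    return all(0 <= key < 2**width for key in sparse_counts)


print(max_bit_length(elements))
print(fits_in_bits(elements, 32))
```

The same two helpers could be applied to the dictionary returned for any other fingerprint to see whether a key above 2**32 ever appears.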
>>
>> According to
>> https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
>> both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
>> and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
>> exist.
>>
>> However with my limited knowledge I don't know how to access the 64 bit
>> version and that is my problem.
>> Kindest regards,
>>
>> Wojtek
>>
>>
>>
>>
>> On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <[email protected]>
>> wrote:
>>
>>> Wojtek,
>>>
>>> You can use GetNonzeroElements() to convert the sparse fingerprint to a
>>> Python Dict of hash to count.
>>>
>>> Cheers,
>>> Gareth
>>>
>>>
>>> In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
>>>
>>> In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
>>>
>>> In [9]: elements = fp.GetNonzeroElements();
>>>
>>> In [10]: elements
>>> Out[10]:
>>> {10565946: 2,
>>> 348155210: 1,
>>> 476388586: 1,
>>> 540046244: 1,
>>> 553412256: 1,
>>> 864942730: 2,
>>> 909857231: 1,
>>> 1100037548: 1,
>>> 1333761024: 1,
>>> 1512818157: 1,
>>> 1981181107: 1,
>>> 2030573601: 1,
>>> 2041434490: 1,
>>> 2092489639: 3,
>>> 2246728737: 3,
>>> 2370996728: 1,
>>> 2877515035: 1,
>>> 2971716993: 1,
>>> 2975126068: 2,
>>> 3140581776: 1,
>>> 3217380708: 4,
>>> 3218693969: 1,
>>> 3462333187: 1,
>>> 3657471097: 3,
>>> 3796970912: 1}
>>>
>>>
>>> On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
>>>
>>> Dear All
>>>
>>> Do any of you have a working example of getting Morgan Fingerprints, as
>>> sparse bit vector (non-hashed) in the 64 bit version using Python?
>>> I'm looking into the issue of collisions on the "main hash" on large
>>> (100+ million molecules) data
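For a rough sense of scale, the birthday approximation gives the expected number of colliding pairs when n distinct items are hashed uniformly into 2**bits values. This is a sketch under stated assumptions: the hash is treated as uniform, and n below is a hypothetical count of distinct atom environments, not a figure measured on any data set.

```python
import math


def expected_colliding_pairs(n_items, hash_bits):
    """Expected number of colliding pairs: n(n-1)/2 divided by 2**hash_bits."""
    return n_items * (n_items - 1) / 2 / 2**hash_bits


def prob_any_collision(n_items, hash_bits):
    """Birthday approximation: 1 - exp(-expected colliding pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))


# Hypothetical number of distinct environments (assumption for illustration).
n = 100_000_000

for bits in (32, 64):
    pairs = expected_colliding_pairs(n, bits)
    prob = prob_any_collision(n, bits)
    print(f"{bits} bit: expected colliding pairs {pairs:.3g}, P(any) {prob:.3g}")
```

Under these assumptions a 32 bit hash gives on the order of a million colliding pairs, while a 64 bit hash gives an expectation well below one.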
>>> Thank you very much!
>>> Kindest regards,
>>>
>>> Wojtek Plonka
>>>
>>>
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>>
>>
>
>
>