Hi Gareth,

I'm a bit lost now...
If you look at the C++ test code in
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp
the test function void testMorganFP() (line 615) seems to use only
    FingerprintGenerator<std::uint32_t> *morganGenerator;
as if the 64-bit version were not maintained.

Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka



On Thu, Apr 22, 2021 at 9:57 PM Gareth Jones <java.jo...@gmail.com> wrote:

> Hi Wojtek,
>
> Our findings are the same.  There is a 64-bit Morgan fingerprint generator,
> which Python uses by default.  When you call it, the functions that
> actually set the bits in the 64-bit fingerprint
> (MorganFingerprints::getConnectivityInvariants and
> MorganFingerprints::getFeatureInvariants) only ever set the first 32
> bits.
>
> So you have a 64-bit fingerprint, but only the first 32 bits are ever set.
>
> On 4/22/2021 12:20 PM, Wojtek Plonka wrote:
>
> Hi Gareth,
>
> Your findings are a bit contrary to mine, so the truth must be somewhere
> in between :)
> I downloaded the RDKit sources and some support for 64 bit Morgan
> Fingerprints seems to be there:
>
> Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of 661 searched)
>   C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
>     Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
>   C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
>     Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>     Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>     Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>     Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
>   C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
>     Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
>   C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
>     Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,
>
> I will have a closer look at that.
> I don't need to write my code in Python; C++ (with Google's help) is fine,
> too, as long as I can compile it with Linux tools or MSVC Community Edition.
> Maybe the 64-bit support is simply incomplete, or not yet interfaced to Python?
> Thanks!
>
> Wojtek Plonka
>
>
>
> On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones <java.jo...@gmail.com> wrote:
>
>>
>> Hi Wojtek,
>>
>> From looking at the RDKit code base, my take is that it is currently not
>> possible to generate 64-bit Morgan fingerprints.
>>
>> The Python fingerprint generator defaults to 64bit:
>>
>> In [36]: fp.GetLength()
>> Out[36]: 18446744073709551615
>>
>> Unfortunately, the C++ Morgan fingerprint generator only ever sets the
>> first 32 bits, even if the fingerprint is 64-bit.  If you look at
>> MorganFingerprints::getConnectivityInvariants and
>> MorganFingerprints::getFeatureInvariants in
>> Code/GraphMol/Fingerprints/FingerprintUtil.cpp, the generated invariants
>> (that are used to set the fingerprint bits) are unsigned 32-bit ints.
>>
>> Some RDKit development would be needed to template those functions so
>> that they would work with both 32- and 64-bit fingerprints.
>>
>> Cheers,
>>
>> Gareth
>>
>>
>> On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
>>
>> Hi Gareth,
>>
>> Thank you. I do exactly as you wrote; that's not the issue.
>> Please note that all the keys in elements are below 2**32, so the
>> main hash function used is definitely 32-bit.
>>
>> According to
>> https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
>> both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
>> and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
>> exist.
>>
>> However with my limited knowledge I don't know how to access the 64 bit
>> version and that is my problem.
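
A minimal sketch of one way to reach the generator-based API from Python and to check how wide the resulting keys actually are. It assumes that RDKit is installed and that rdFingerprintGenerator.GetMorganGenerator hands back the 64-bit generator by default, as reported elsewhere in this thread; the helper names sparse_morgan_counts and max_key_bits are made up for illustration. The sample keys are copied from the GetNonzeroElements() output quoted further down in this thread.

```python
def sparse_morgan_counts(smiles, radius=2):
    """Return {key: count} for an unfolded Morgan count fingerprint.

    Assumption: RDKit is installed and GetMorganGenerator() returns the
    64-bit generator by default. RDKit is imported lazily so the rest of
    this sketch runs without it.
    """
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius)
    return generator.GetSparseCountFingerprint(mol).GetNonzeroElements()


def max_key_bits(elements):
    """Number of bits needed to store the largest key in the dict."""
    return max(elements).bit_length() if elements else 0


# A few keys from the caffeine fingerprint quoted later in this thread.
sample = {10565946: 2, 2092489639: 3, 3217380708: 4, 3796970912: 1}
print(max_key_bits(sample))  # 32
```

Running max_key_bits(sparse_morgan_counts('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')) against a given RDKit build would show whether any key needs more than 32 bits there.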
>> Kindest regards,
>>
>> Wojtek
>>
>>
>>
>>
>> On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <java.jo...@gmail.com>
>> wrote:
>>
>>> Wojtek,
>>>
>>> You can use GetNonzeroElements() to convert the sparse fingerprint to a
>>> Python dict mapping hash to count.
>>>
>>> Cheers,
>>> Gareth
>>>
>>>
>>> In [6]: from rdkit import Chem; from rdkit.Chem import AllChem
>>>
>>> In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
>>>
>>> In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
>>>
>>> In [9]: elements = fp.GetNonzeroElements()
>>>
>>> In [10]: elements
>>> Out[10]:
>>> {10565946: 2,
>>>  348155210: 1,
>>>  476388586: 1,
>>>  540046244: 1,
>>>  553412256: 1,
>>>  864942730: 2,
>>>  909857231: 1,
>>>  1100037548: 1,
>>>  1333761024: 1,
>>>  1512818157: 1,
>>>  1981181107: 1,
>>>  2030573601: 1,
>>>  2041434490: 1,
>>>  2092489639: 3,
>>>  2246728737: 3,
>>>  2370996728: 1,
>>>  2877515035: 1,
>>>  2971716993: 1,
>>>  2975126068: 2,
>>>  3140581776: 1,
>>>  3217380708: 4,
>>>  3218693969: 1,
>>>  3462333187: 1,
>>>  3657471097: 3,
>>>  3796970912: 1}
>>>
>>> On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
>>>
>>> Dear All
>>>
>>> Does anyone have a working example of generating Morgan fingerprints as a
>>> sparse (non-hashed) bit vector in the 64-bit version using Python?
>>> I'm looking into the issue of collisions in the "main hash" on large
>>> data sets (100+ million molecules).
>>> Thank you very much!
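
For a rough sense of scale, here is a back-of-the-envelope birthday estimate. It assumes the hash spreads keys uniformly over the key space, which a real fingerprint hash need not do, and it counts distinct hashed environments rather than molecules. The figure of 10**8 distinct environments is purely illustrative, and the function names are hypothetical.

```python
import math


def expected_colliding_pairs(n_keys, bits):
    """Expected number of colliding pairs among n_keys uniform random hashes."""
    return n_keys * (n_keys - 1) / (2 * 2 ** bits)


def prob_any_collision(n_keys, bits):
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_keys, bits))


n = 10 ** 8  # illustrative count of distinct environments (an assumption)
for bits in (32, 64):
    print(bits, expected_colliding_pairs(n, bits), prob_any_collision(n, bits))
```

Under these assumptions a 32-bit key space gives on the order of a million colliding pairs, while a 64-bit key space gives an expected count well below one.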
>>> Kindest regards,
>>>
>>> Wojtek Plonka
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
