Hi Gareth,

Your findings are a bit contrary to mine, so the truth must be somewhere in
between :)
I downloaded the RDKit sources, and some support for 64-bit Morgan
fingerprints does seem to be there:

Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of 661 searched)
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
    Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
    Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
    Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
    Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,
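A search like the one listed here can be reproduced on any checkout with a few lines of Python. This is a minimal sketch: count_hits is a hypothetical helper, and restricting the search to .cpp/.h files is an assumption (the filter used by the original search tool isn't known). It counts matching lines per file.

```python
import os
import sys
from collections import Counter


def count_hits(root, needle, suffixes=(".cpp", ".h")):
    """Count lines containing `needle` in each source file under `root`.

    Returns (hits_per_file, files_searched). Only files whose names end
    with one of `suffixes` are searched; that filter is an assumption.
    """
    hits = Counter()
    searched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            searched += 1
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if needle in line:
                        hits[path] += 1
    return hits, searched


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_hits.py C:\RDKit\rdkit\Code
    found, n_searched = count_hits(sys.argv[1], "getMorganGenerator<std::uint64_t>")
    print(f"{sum(found.values())} hits in {len(found)} files of {n_searched} searched")
    for path, n in sorted(found.items()):
        print(f"  {path} ({n} hits)")
```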

I will have a closer look at that.
I don't have to write my code in Python; C++ (with Google's help) is fine
too, as long as I can compile it with the Linux tools of MSVC Community Edition.
Maybe the 64-bit support is simply incomplete, or not yet exposed to Python?
Thanks!

Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka



On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones <java.jo...@gmail.com> wrote:

>
> Hi Wojtek,
>
> From looking at the RDKit code base, my take is that it is currently not
> possible to generate 64-bit Morgan fingerprints.
>
> The Python fingerprint generator defaults to 64-bit:
>
> In [36]: fp.GetLength()
> Out[36]: 18446744073709551615
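The value printed here is 2**64 - 1, the largest unsigned 64-bit integer, so the declared key space is 64 bits wide. A small sketch to read the width off any such length value follows; declared_bit_width is a hypothetical helper, not an RDKit function, and it only describes the declared range, not which bits the keys actually use.

```python
def declared_bit_width(length: int) -> int:
    """Return n if `length` == 2**n - 1, the largest n-bit unsigned value."""
    bits = length.bit_length()
    if length != (1 << bits) - 1:
        raise ValueError(f"{length} is not of the form 2**n - 1")
    return bits


print(declared_bit_width(18446744073709551615))  # 64
print(declared_bit_width(4294967295))            # 32
```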
>
> Unfortunately, the C++ Morgan fingerprint generator only ever sets the
> lower 32 bits, even if the fingerprint is 64-bit. If you look at
> MorganFingerprints::getConnectivityInvariants and
> MorganFingerprints::getFeatureInvariants in
> Code/GraphMol/Fingerprints/FingerprintUtil.cpp the generated invariants
> (that are used to set the fingerprint bits) are unsigned 32 bit ints.
>
> Some RDKit development would be needed to template those functions so that
> they would work with both 32 and 64 bit fingerprints.
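To illustrate why the width of the invariant type matters, here is a toy hash combiner parameterized by bit width. It is an assumption-laden sketch, not RDKit's code: the constant and shifts are the textbook boost::hash_combine recipe, and whether FingerprintUtil.cpp combines invariants exactly this way is not confirmed here. The point it shows is that with 32-bit arithmetic the result can never reach 2**32, however wide the container it is later stored in; only widening the arithmetic itself (which is what templating those functions would do) produces keys above 2**32.

```python
def hash_combine(seed: int, value: int, bits: int) -> int:
    """Boost-style hash_combine emulated at a fixed unsigned width.

    Python ints are unbounded, so every intermediate is masked to
    `bits` bits to mimic C++ unsigned overflow.
    """
    mask = (1 << bits) - 1
    seed &= mask
    value &= mask
    seed ^= (value + 0x9E3779B9 + ((seed << 6) & mask) + (seed >> 2)) & mask
    return seed


def combine_all(values, bits: int) -> int:
    """Fold a sequence of (hypothetical) invariants into one hash of the given width."""
    seed = 0
    for v in values:
        seed = hash_combine(seed, v, bits)
    return seed


for width in (32, 64):
    h = combine_all([1, 2, 3], width)
    print(width, h, h < 2**32)
```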
> Cheers,
>
> Gareth
>
>
> On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
>
> Hi Gareth,
>
> Thank you. I'm doing exactly what you wrote; that's not the issue.
> Please note that all the keys in elements are below 2**32: the
> main hash function used is definitely 32-bit.
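That claim is easy to check on any dict returned by GetNonzeroElements(). A minimal sketch with hypothetical helper names follows. Assuming a well-mixed 64-bit hash, a single key would land below 2**32 with probability only 2**-32, so finding every key below 2**32 is strong evidence that only 32 bits are in use.

```python
def max_key_bits(elements) -> int:
    """Bits needed for the largest key of a {hash: count} dict."""
    return max((key.bit_length() for key in elements), default=0)


def fits_in_bits(elements, bits: int = 32) -> bool:
    """True if every key is a valid unsigned `bits`-bit value."""
    return all(0 <= key < (1 << bits) for key in elements)


# A few of the keys from the caffeine example in this thread
sample = {10565946: 2, 2092489639: 3, 3796970912: 1}
print(max_key_bits(sample), fits_in_bits(sample))  # 32 True
```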
>
> According to
> https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
> both the rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
> and rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64 classes
> exist.
>
> However, with my limited knowledge, I don't know how to access the 64-bit
> version, and that is my problem.
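The MorganWrapper.cpp hit in the source search registers the Python-level GetMorganGenerator with getMorganGenerator<std::uint64_t>, which suggests that rdFingerprintGenerator.GetMorganGenerator() already returns the 64-bit generator with no extra call needed. The sketch below uses a hypothetical helper, summarize_sparse_fp, to report both the declared width (from GetLength()) and the width the keys actually use. It assumes an RDKit build that ships rdFingerprintGenerator and skips the demo if RDKit is not installed.

```python
def summarize_sparse_fp(fp):
    """Report declared key width vs. the widest key actually present.

    Works with any object offering GetLength() and GetNonzeroElements(),
    e.g. RDKit's sparse int vectors.
    """
    elements = fp.GetNonzeroElements()
    return {
        "declared_bits": fp.GetLength().bit_length(),
        "used_bits": max((k.bit_length() for k in elements), default=0),
        "n_features": len(elements),
    }


def demo():
    try:
        from rdkit import Chem
        from rdkit.Chem import rdFingerprintGenerator
    except ImportError:
        print("RDKit is not installed; skipping the demo")
        return
    mol = Chem.MolFromSmiles("Cn1cnc2n(C)c(=O)n(C)c(=O)c12")  # caffeine
    gen = rdFingerprintGenerator.GetMorganGenerator(2)  # radius 2
    fp = gen.GetSparseCountFingerprint(mol)  # unfolded count fingerprint
    print(type(gen).__name__, type(fp).__name__, summarize_sparse_fp(fp))


if __name__ == "__main__":
    demo()
```

If used_bits stays at or below 32 while declared_bits is 64, that matches Gareth's reading of FingerprintUtil.cpp.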
> Kindest regards,
>
> Wojtek
>
>
>
>
> On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <java.jo...@gmail.com> wrote:
>
>> Wojtek,
>>
>> You can use GetNonzeroElements() to convert the sparse fingerprint to a
>> Python dict mapping hash to count.
>>
>> Cheers,
>> Gareth
>>
>>
>> In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
>>
>> In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
>>
>> In [9]: elements = fp.GetNonzeroElements();
>>
>> In [10]: elements
>> Out[10]:
>> {10565946: 2,
>>  348155210: 1,
>>  476388586: 1,
>>  540046244: 1,
>>  553412256: 1,
>>  864942730: 2,
>>  909857231: 1,
>>  1100037548: 1,
>>  1333761024: 1,
>>  1512818157: 1,
>>  1981181107: 1,
>>  2030573601: 1,
>>  2041434490: 1,
>>  2092489639: 3,
>>  2246728737: 3,
>>  2370996728: 1,
>>  2877515035: 1,
>>  2971716993: 1,
>>  2975126068: 2,
>>  3140581776: 1,
>>  3217380708: 4,
>>  3218693969: 1,
>>  3462333187: 1,
>>  3657471097: 3,
>>  3796970912: 1}
>>
>> In [11]:
>> On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
>>
>> Dear All
>>
>> Does anyone have a working example of getting Morgan fingerprints as a
>> sparse (non-hashed) bit vector in the 64-bit version using Python?
>> I'm looking into the issue of collisions in the "main hash" on large
>> datasets (100+ million molecules).
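For a rough sense of scale, the standard birthday-problem estimate gives the expected number of colliding pairs when n distinct items are hashed uniformly into 2**bits values. This sketch treats n as the number of distinct atom environments (an assumption: 100+ million molecules may yield more or fewer distinct environments) and assumes the hash output is uniform.

```python
import math


def expected_colliding_pairs(n: int, bits: int) -> float:
    """Expected number of colliding pairs: C(n, 2) / 2**bits."""
    return n * (n - 1) / 2 / 2**bits


def prob_any_collision(n: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-C(n, 2) / 2**bits)."""
    return -math.expm1(-expected_colliding_pairs(n, bits))


for bits in (32, 64):
    print(bits, expected_colliding_pairs(10**8, bits), prob_any_collision(10**8, bits))
```

With n = 10**8 this gives roughly 1.2 million expected colliding pairs at 32 bits, versus about 3e-4 at 64 bits.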
>> Thank you very much!
>> Kindest regards,
>>
>> Wojtek Plonka
>>
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>