Hi Wojtek,
Yes. I don't want to speak for the developer(s) of the Morgan
fingerprint code, but I don't think that 64 bit support is there.
If you add the function below to testFingerprintGenerators.cpp and step
through it in a debugger, you can see that you create a 64 bit fingerprint
but only end up setting the first 32 bits through the Morgan invariant
functions. The same thing happens in Python, where 64 bit fingerprints are
created by default.
void testMorgan64FP() {
  BOOST_LOG(rdErrorLog) << "-------------------------------------"
                        << std::endl;
  BOOST_LOG(rdErrorLog) << " Test Morgan 64 Fingerprints." << std::endl;
  auto mol = SmilesToMol("Cn1cnc2n(C)c(=O)n(C)c(=O)c12");
  auto morganGenerator =
      MorganFingerprint::getMorganGenerator<std::uint64_t>(3);
  auto fp = morganGenerator->getSparseCountFingerprint(*mol);
  fp->getNonzeroElements();
  delete fp;
  delete morganGenerator;
  delete mol;
}
On 4/22/2021 2:06 PM, Wojtek Plonka wrote:
Hi Gareth,
I'm a bit lost now...
If you look into the CPP testing code
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp
the testing function void testMorganFP() (line 615) seems to use only the
FingerprintGenerator<std::uint32_t> *morganGenerator;
as if the 64 bit version was not maintained.
Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka <https://fb.com/wojtek.plonka>
On Thu, Apr 22, 2021 at 9:57 PM Gareth Jones <java.jo...@gmail.com> wrote:
Hi Wojtek,
Our findings are the same. There is a Morgan fingerprint
generator for 64 bits, which Python uses by default. When you
call it the functions that actually set the bits in the 64 bit
fingerprint (MorganFingerprints::getConnectivityInvariants and
MorganFingerprints::getFeatureInvariants) will only ever set the
first 32 bits.
So you have a 64 bit fingerprint, but only the first 32 bits are
ever set.
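To make the point above concrete, here is a minimal Python sketch of what happens when a 32 bit value is stored in a 64 bit key. This is not RDKit code: invariant32 and widen_to_64 are hypothetical names, and invariant32 is only a stand-in for the real invariant functions, on the assumption (as described above) that they produce values below 2**32.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def invariant32(seed: int) -> int:
    """Hypothetical stand-in for a 32 bit invariant (not the RDKit hash)."""
    # Multiplicative hashing, truncated to 32 bits.
    return (seed * 2654435761) & MASK32


def widen_to_64(value: int) -> int:
    """Store a 32 bit value in a 64 bit key: this only zero-extends it."""
    return value & MASK64


keys = [widen_to_64(invariant32(seed)) for seed in range(1, 1001)]

# Collect the upper 32 bits of every 64 bit key.
upper_halves = {key >> 32 for key in keys}
print(upper_halves)  # prints {0}
```

However wide the container is, the upper half of each key stays zero unless the hash itself produces more than 32 bits.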
On 4/22/2021 12:20 PM, Wojtek Plonka wrote:
Hi Gareth,
Your findings are a bit contrary to mine, so the truth must be
somewhere in between :)
I downloaded the RDKit sources and some support for 64 bit Morgan
Fingerprints seems to be there:
Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of 661 searched)
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
    Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
    Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
    Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
  C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
    Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,
I will have a closer look at that.
I don't need to write my code in Python. C++ (with Google's help)
is fine, too, as long as I can compile it with Linux tools
or MSVC Community Edition.
Maybe the 64 bit support is simply not complete, or not exposed to
Python yet?
Thanks!
Wojtek Plonka
On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones <java.jo...@gmail.com> wrote:
Hi Wojtek,
From looking at the RDKit code base, my take is that it is
currently not possible to generate 64 bit Morgan fingerprints.
The Python fingerprint generator defaults to 64bit:
In [36]: fp.GetLength()
Out[36]: 18446744073709551615
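For context, a fingerprint like the fp above can be obtained roughly as follows. This is a sketch, assuming an RDKit build that ships the rdFingerprintGenerator module; morgan_sparse_counts is a hypothetical helper name, not part of RDKit.

```python
def morgan_sparse_counts(smiles, radius=2):
    """Return (length, {key: count}) for a sparse count Morgan fingerprint."""
    # Imported lazily so that the module loads even where RDKit is absent.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # The Python wrapper hands back the 64 bit generator by default.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius)
    fp = generator.GetSparseCountFingerprint(mol)
    return fp.GetLength(), dict(fp.GetNonzeroElements())
```

Calling morgan_sparse_counts('Cn1cnc2n(C)c(=O)n(C)c(=O)c12') returns the fingerprint length together with the key to count dictionary, so the size of the keys can be compared with the length.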
Unfortunately, the C++ Morgan fingerprint generator only ever
sets the first 32 bits even if the fingerprint is 64bit.
If you look at MorganFingerprints::getConnectivityInvariants and
MorganFingerprints::getFeatureInvariants in
Code/GraphMol/Fingerprints/FingerprintUtil.cpp, the generated
invariants (that are used to set the fingerprint bits) are
unsigned 32 bit ints.
Some RDKit development would be needed to template those
functions so that they would work with both 32 and 64 bit
fingerprints.
Cheers,
Gareth
On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
Hi Gareth,
Thank you. I do exactly as you wrote. That's not the issue.
Please note that all the keys in elements are below 2**32, so the
main hash function used is definitely 32 bit.
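The observation about the key range can be checked directly. The sketch below uses only the keys from the GetNonzeroElements() output quoted further down in this thread and needs no RDKit installation; the variable names are illustrative.

```python
# Keys copied from the GetNonzeroElements() output quoted in this thread.
keys = [
    10565946, 348155210, 476388586, 540046244, 553412256,
    864942730, 909857231, 1100037548, 1333761024, 1512818157,
    1981181107, 2030573601, 2041434490, 2092489639, 2246728737,
    2370996728, 2877515035, 2971716993, 2975126068, 3140581776,
    3217380708, 3218693969, 3462333187, 3657471097, 3796970912,
]

max_key = max(keys)
fits_in_32_bits = all(0 <= key < 2**32 for key in keys)

# The largest key needs exactly 32 bits, and none needs more.
print(max_key, max_key.bit_length(), fits_in_32_bits)
```

One molecule cannot prove that the hash is 32 bit, but a largest key that needs exactly 32 bits, with none above 2**32, is what a 32 bit hash would give.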
According to
https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
exist.
However with my limited knowledge I don't know how to access
the 64 bit version and that is my problem.
Kindest regards,
Wojtek
Wojtek Plonka
On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones <java.jo...@gmail.com> wrote:
Wojtek,
You can use GetNonzeroElements() to convert the sparse
fingerprint to a Python dict that maps each hash to its count.
Cheers,
Gareth
In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')
In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)
In [9]: elements = fp.GetNonzeroElements()
In [10]: elements
Out[10]:
{10565946: 2,
348155210: 1,
476388586: 1,
540046244: 1,
553412256: 1,
864942730: 2,
909857231: 1,
1100037548: 1,
1333761024: 1,
1512818157: 1,
1981181107: 1,
2030573601: 1,
2041434490: 1,
2092489639: 3,
2246728737: 3,
2370996728: 1,
2877515035: 1,
2971716993: 1,
2975126068: 2,
3140581776: 1,
3217380708: 4,
3218693969: 1,
3462333187: 1,
3657471097: 3,
3796970912: 1}
In [11]:
On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
Dear All
Do any of you have a working example of getting Morgan
Fingerprints, as sparse bit vector (non-hashed) in the
64 bit version using Python?
I'm looking into the issue of collisions on the "main
hash" on large (100+ million molecules) data sets.
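A rough way to size the collision problem is the birthday approximation: if n distinct items are hashed uniformly into 2**b values, the expected number of colliding pairs is about n(n-1)/2 divided by 2**b. The sketch below assumes a uniform hash and uses n = 100 million purely as a hypothetical count of distinct atom environments; the real count for such a data set is not known here, and expected_colliding_pairs is an illustrative name.

```python
def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Birthday approximation for the expected number of colliding pairs."""
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs / 2**hash_bits


# Hypothetical number of distinct environments, for illustration only.
n_environments = 100_000_000

pairs_32 = expected_colliding_pairs(n_environments, 32)
pairs_64 = expected_colliding_pairs(n_environments, 64)

print(f"32 bit hash: about {pairs_32:.3g} colliding pairs expected")
print(f"64 bit hash: about {pairs_64:.3g} colliding pairs expected")
```

Under these assumptions a 32 bit hash gives on the order of a million colliding pairs, while a 64 bit hash gives well under one.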
Thank you very much!
Kindest regards,
Wojtek Plonka
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss