Hi Wojtek,

Yes.  I don't want to speak for the developer(s) of the Morgan fingerprint code, but I don't think that 64 bit support is there.

If you add the function below to testFingerprintGenerators.cpp and step through it in a debugger, you can see that you create a 64 bit fingerprint but only ever set the first 32 bits through the Morgan invariant functions.  The same thing happens in Python, where 64 bit fingerprints are created by default.


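// Debugging aid: build a 64 bit Morgan generator and look at the keys it sets.
void testMorgan64FP() {
  BOOST_LOG(rdErrorLog) << "-------------------------------------" << std::endl;
  BOOST_LOG(rdErrorLog) << "    Test Morgan 64 Fingerprints." << std::endl;

  auto mol = SmilesToMol("Cn1cnc2n(C)c(=O)n(C)c(=O)c12");
  auto morganGenerator = MorganFingerprint::getMorganGenerator<std::uint64_t>(3);
  auto fp = morganGenerator->getSparseCountFingerprint(*mol);

  // Break here and inspect the keys: none of them uses more than 32 bits.
  const auto &elements = fp->getNonzeroElements();
  BOOST_LOG(rdErrorLog) << "    Nonzero elements: " << elements.size() << std::endl;

  // Raw pointers are returned, so clean up by hand.
  delete fp;
  delete morganGenerator;
  delete mol;
}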
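
For a quick check on the Python side without a debugger, here is a minimal sketch in plain Python (no RDKit needed).  It assumes the keys below, copied from the GetNonzeroElements() output quoted further down in this thread, are representative; that output came from AllChem.GetMorganFingerprint rather than from the 64 bit generator, and uses_upper_32_bits is just a hypothetical helper name for illustration.

```python
# Subset of the hash -> count pairs from the GetNonzeroElements() output
# quoted later in this thread (caffeine, radius 2).
elements = {
    10565946: 2,
    348155210: 1,
    2092489639: 3,
    3217380708: 4,
    3796970912: 1,  # largest key in that output
}


def uses_upper_32_bits(keys):
    """Return True if any key has a bit set above bit 31 (hypothetical helper)."""
    return any(key >> 32 for key in keys)


largest = max(elements)
print(largest.bit_length())          # 32
print(uses_upper_32_bits(elements))  # False
```

With a real fingerprint you would pass fp.GetNonzeroElements() instead of the hard-coded dict; a True result would mean that the upper 32 bits are actually in use.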


On 4/22/2021 2:06 PM, Wojtek Plonka wrote:
Hi Gareth,

I'm a bit lost now...
If you look into the CPP testing code
C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp
the testing function void testMorganFP() (line 615) seems to use only the
    FingerprintGenerator<std::uint32_t> *morganGenerator;
as if the 64 bit version was not maintained.

Wojtek Plonka
+48885756652
wojtekplonka.com <http://www.wojtekplonka.com>
fb.com/wojtek.plonka <https://fb.com/wojtek.plonka>



On Thu, Apr 22, 2021 at 9:57 PM Gareth Jones <java.jo...@gmail.com <mailto:java.jo...@gmail.com>> wrote:

    Hi Wojtek,

    Our findings are the same.  There is a Morgan fingerprint
    generator for 64 bits, which Python uses by default.  When you
    call it, the functions that actually set the bits in the 64 bit
    fingerprint (MorganFingerprints::getConnectivityInvariants and
    MorganFingerprints::getFeatureInvariants) will only ever set the
    first 32 bits.

    So you have a 64 bit fingerprint, but only the first 32 bits are
    ever set.

    On 4/22/2021 12:20 PM, Wojtek Plonka wrote:
    Hi Gareth,

    Your findings are a bit contrary to mine, so the truth must be
    somewhere in between :)
    I downloaded the RDKit sources and some support for 64 bit Morgan
    Fingerprints seems to be there:

    Search "getMorganGenerator<std::uint64_t>" (7 hits in 4 files of
    661 searched)
    C:\RDKit\rdkit\Code\GraphMol\Fingerprints\catch_tests.cpp (1 hit)
      Line 152: MorganFingerprint::getMorganGenerator<std::uint64_t>(radius));
    C:\RDKit\rdkit\Code\GraphMol\Fingerprints\FingerprintGenerator.cpp (4 hits)
      Line 461: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
      Line 497: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
      Line 533: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
      Line 569: generator = MorganFingerprint::getMorganGenerator<std::uint64_t>(2);
    C:\RDKit\rdkit\Code\GraphMol\Fingerprints\testFingerprintGenerators.cpp (1 hit)
      Line 2387: MorganFingerprint::getMorganGenerator<std::uint64_t>(2),
    C:\RDKit\rdkit\Code\GraphMol\Fingerprints\Wrap\MorganWrapper.cpp (1 hit)
      Line 78: "GetMorganGenerator", getMorganGenerator<std::uint64_t>,

    I will have a closer look at that.
    I don't need to write my code in Python; C++ (with Google's help)
    is fine, too, as long as I can compile it with Linux tools
    or MSVC Community Edition.
    Maybe the 64 bit stuff is simply not complete or not interfaced to
    Python yet?
    Thanks!

    Wojtek Plonka



    On Thu, Apr 22, 2021 at 7:17 PM Gareth Jones
    <java.jo...@gmail.com <mailto:java.jo...@gmail.com>> wrote:


        Hi Wojtek,

        From looking at the RDKit code base, my take is that it is
        currently not possible to generate 64 bit Morgan fingerprints.

        The Python fingerprint generator defaults to 64bit:

        In [36]: fp.GetLength()
        Out[36]: 18446744073709551615

        Unfortunately, the C++ Morgan fingerprint generator only ever
        sets the first 32 bits even if the fingerprint is 64bit.  If
        you look at MorganFingerprints::getConnectivityInvariants and
        MorganFingerprints::getFeatureInvariants in
        Code/GraphMol/Fingerprints/FingerprintUtil.cpp the generated
        invariants (that are used to set the fingerprint bits) are
        unsigned 32 bit ints.

        Some RDKit development would be needed to template those
        functions so that they would work with both 32 and 64 bit
        fingerprints.

        Cheers,

        Gareth


        On 4/21/2021 10:10 PM, Wojtek Plonka wrote:
        Hi Gareth,

        Thank you. I do exactly as you wrote. That's not the issue.
        Please note that all the keys in elements are below 2**32,
        so the main hash function used is definitely 32 bit.

        According to
        https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html
        both class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator32
        and class rdkit.Chem.rdFingerprintGenerator.FingerprintGenerator64
        exist.

        However with my limited knowledge I don't know how to access
        the 64 bit version and that is my problem.
        Kindest regards,

        Wojtek

        Wojtek Plonka



        On Thu, Apr 22, 2021 at 1:27 AM Gareth Jones
        <java.jo...@gmail.com <mailto:java.jo...@gmail.com>> wrote:

            Wojtek,

            You can use GetNonzeroElements() to convert the sparse
            fingerprint to a Python dict mapping hash to count.

            Cheers,
            Gareth


            In [7]: mol = Chem.MolFromSmiles('Cn1cnc2n(C)c(=O)n(C)c(=O)c12')

            In [8]: fp = AllChem.GetMorganFingerprint(mol, 2)

            In [9]: elements = fp.GetNonzeroElements();

            In [10]: elements
            Out[10]:
            {10565946: 2,
             348155210: 1,
             476388586: 1,
             540046244: 1,
             553412256: 1,
             864942730: 2,
             909857231: 1,
             1100037548: 1,
             1333761024: 1,
             1512818157: 1,
             1981181107: 1,
             2030573601: 1,
             2041434490: 1,
             2092489639: 3,
             2246728737: 3,
             2370996728: 1,
             2877515035: 1,
             2971716993: 1,
             2975126068: 2,
             3140581776: 1,
             3217380708: 4,
             3218693969: 1,
             3462333187: 1,
             3657471097: 3,
             3796970912: 1}

            In [11]:

            On 4/21/2021 5:44 AM, Wojtek Plonka wrote:
            Dear All

            Do any of you have a working example of getting Morgan
            fingerprints as a sparse (non-hashed) bit vector in the
            64 bit version using Python?
            I'm looking into the issue of collisions on the "main
            hash" on large (100+ million molecules) data sets.
            Thank you very much!
            Kindest regards,

            Wojtek Plonka



            _______________________________________________
            Rdkit-discuss mailing list
            Rdkit-discuss@lists.sourceforge.net
            https://lists.sourceforge.net/lists/listinfo/rdkit-discuss