Hello Francois,

I am trying to replicate some of the functionality of
CreateDifferenceFingerprintForReaction [Ref 1] to understand how the code
works. The function supports three fingerprint types for the difference
fingerprint: AtomPair, Morgan, and TopologicalTorsion [Ref 2]. All three are
count vectors [Ref 3], and the function lets you set the size of the output
fingerprint.

I was following this post [Ref 4], which describes how to create reaction
difference fingerprints from different fingerprint types. Using the code from
the post I can create reaction difference fingerprints with either Morgan or
AtomPair. However, compared with the output of the code from the post
[Ref 4], CreateDifferenceFingerprintForReaction returns a fingerprint of a
different size, with different values and a different density. I am assuming
this is due to the count vector being folded down to the default fingerprint
size of 2048.
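
To make the folding assumption concrete, here is a minimal sketch of one way a
sparse count vector could be folded: each index is mapped to index modulo the
target size, and counts that collide are summed. This only illustrates the
idea; I do not know whether RDKit does this internally. fold_counts is a
hypothetical helper that works on the plain dict returned by
GetNonzeroElements(), and the input values are the rxfp1 elements from the
example in this message.

```python
def fold_counts(nonzero, n_bits=2048):
    """Fold a sparse count vector (index -> count) down to n_bits slots.

    Assumption: fold by index modulo n_bits and sum counts that collide.
    """
    folded = {}
    for idx, count in nonzero.items():
        slot = idx % n_bits
        folded[slot] = folded.get(slot, 0) + count
    # Drop slots whose counts cancelled out to zero.
    return {slot: c for slot, c in folded.items() if c != 0}


# Nonzero elements of the unfolded difference fingerprint (rxfp1).
rxfp1_elements = {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
folded_rxfp1 = fold_counts(rxfp1_elements)
print(sorted(folded_rxfp1.items()))
```

Summing on collision keeps the result a count vector, whereas the bitwise OR
used when folding bit vectors would discard the counts.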

Example code snippet:

# The defs below are from the post
# https://sourceforge.net/p/rdkit/mailman/message/35240736/
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import copy

def _createFP(mol, maxSize, fpType='AP'):
    # strict=False: update the property cache without raising on odd valences.
    mol.UpdatePropertyCache(False)
    if fpType == 'AP':
        return AllChem.GetAtomPairFingerprint(mol, minLength=1,
                                              maxLength=maxSize)
    else:
        # Initialize ring information before computing the Morgan fingerprint.
        Chem.GetSSSR(mol)
        return AllChem.GetMorganFingerprint(mol, radius=maxSize)

def getSumFps(fps):
    summedFP = copy.deepcopy(fps[0])
    for fp in fps[1:]:
        summedFP += fp
    return summedFP

def buildReactionFP(rxn, maxSize=3, fpType='AP'):
    reactants = rxn.GetReactants()
    products = rxn.GetProducts()
    rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
                     for mol in reactants])
    pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
                     for mol in products])
    # Difference fingerprint: product counts minus reactant counts.
    return pFP - rFP
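
Stripped of the RDKit types, the helpers above sum the per-molecule count
vectors on each side of the reaction and subtract reactants from products. A
minimal pure-Python sketch of that arithmetic, using made-up toy counts and a
hypothetical helper name (difference_counts):

```python
from collections import Counter


def difference_counts(reactant_fps, product_fps):
    """Sum product counts, subtract reactant counts, keep nonzero entries."""
    total = Counter()
    for fp in product_fps:
        total.update(fp)      # adds counts
    for fp in reactant_fps:
        total.subtract(fp)    # subtracts counts, keeping negative values
    return {idx: c for idx, c in total.items() if c != 0}


# Toy sparse count vectors (index -> count); the values are illustrative only.
toy_reactants = [{1: 2, 2: 1}]
toy_products = [{1: 2, 3: 1}]
toy_diff = difference_counts(toy_reactants, toy_products)
print(toy_diff)
```

Features shared by both sides cancel, so only the features that change in the
reaction remain, with negative counts for those lost from the reactants.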

>>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1CCCCC1>>[N:1]C1CCCCC1',
...                                   useSmiles=True)
>>> rxfp1 = buildReactionFP(rxn1,maxSize=2)

>>> rxfp1.GetNonzeroElements()
{558114: -2, 574497: -1, 1066050: 2, 1066081: 1}

>>> rxfp1.GetLength()
8388608


# Same reaction now using CreateDifferenceFingerprintForReaction
>>> rxn1_fp = AllChem.CreateDifferenceFingerprintForReaction(rxn1)

>>> rxn1_fp.GetNonzeroElements()
{1048: 10,
 1310: -20,
 1325: 20,
 1372: -10,
 1390: 20,
 1692: -10,
 1757: -20,
 1772: 10}

>>> print(rxn1_fp.GetLength(),rxfp1.GetLength())
2048 8388608
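
By density I mean the fraction of positions in the vector that hold a nonzero
count. A small sketch using the lengths and nonzero-element counts printed
above (density is a hypothetical helper, not an RDKit function):

```python
def density(n_nonzero, length):
    """Fraction of positions in a vector that hold a nonzero count."""
    return n_nonzero / length


# rxfp1 (code from the post): 4 nonzero elements in a vector of length 8388608.
density_post = density(4, 8388608)
# rxn1_fp (CreateDifferenceFingerprintForReaction): 8 nonzero elements in 2048.
density_rdkit = density(8, 2048)
print(density_post, density_rdkit)
```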

References
1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction
2. https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html
3. https://www.rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
4. https://sourceforge.net/p/rdkit/mailman/message/35240736/

v/r,

Ben

On Mon, Nov 18, 2019 at 10:13 PM Francois Berenger <mli...@ligand.eu> wrote:

> On 19/11/2019 03:34, Benjamin Datko wrote:
> > Hello all,
> >
> > I am curious about how to fold a count vector fingerprint. I understand
> > that the most common way to fold a bit vector is to split the vector in
> > half and apply a bitwise OR to the two halves. I think this is how the
> > function rdkit.DataStructs.FoldFingerprint works in RDKit; correct me
> > if I am wrong.
> >
> > How does RDKit fold count vectors such as AtomPair, Morgan, and
> > topological torsion, and/or what is the appropriate way to do so?
>
> Can you give us some context? Why do you want to do that?
>
> Maybe, you can use the following in order to create
> shorter "fingerprints" for which the Tanimoto distance is
> still computable (despite becoming approximate then):
>
> ---
> Shrivastava, A. (2016).
> Simple and efficient weighted minwise hashing.
> In Advances in Neural Information Processing Systems (pp. 1498-1506).
>
>
> https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
> ---
>
> Regards,
> F.
>
> > I thought about turning the fingerprint into a bit vector using their
> > respective "AsBitVect" methods and then folding with
> > rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
> > have an "AsBitVect" method
> > [https://www.rdkit.org/docs/GettingStartedInPython.html].
> >
> > For an explicit example using AtomPair fingerprint we can see the
> > fingerprint is extremely sparse. Could this AtomPair fingerprint be
> > folded to increase the density?
> >
> > >>> from rdkit import Chem
> > >>> from rdkit.Chem import AllChem
> > >>> mol = Chem.MolFromSmiles('CC1CCCCC1')
> > >>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1,
> > ...                                        maxLength=3)
> > >>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())
> > >>> print((ap_fp.GetLength(), number_of_nonzero_elements))
> > (8388608, 9)
> >
> > Very Respectfully,
> >
> > Ben
> > _______________________________________________
> > Rdkit-discuss mailing list
> > Rdkit-discuss@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>