Hi Francois,

I agree with your suggestion. I am also CCing Greg on this response.

I have looked around on Google for the source code of the
CreateDifferenceFingerprintForReaction method, but the most relevant pages I
can find describing what the code does are
[here](https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html)
and
[here](https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction).

I don't mind if the source is only in C++, but where can I find it? If I can
view the source code, I can understand how folding a count vector was
implemented. Right now I am assuming the implementation is similar to
folding a bit vector, just applying a SUM instead of a logical OR.
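The SUM-instead-of-OR assumption can be made concrete. Below is a minimal sketch, assuming a count vector is held as a dict of its nonzero elements (the shape GetNonzeroElements() returns) and that folding maps element i to bin i % new_length, adding counts that collide. When the original length is a multiple of the new one (8388608 = 2^23 and 2048 = 2^11 here), this is the same as splitting the vector into equal chunks and adding them element-wise. fold_count_vector is a hypothetical helper, not RDKit code; whether RDKit folds this way is exactly the open question.

```python
def fold_count_vector(counts, new_length):
    """Fold a sparse count vector by summing counts that land in the same bin.

    counts: dict mapping element index -> count (e.g. from GetNonzeroElements()).
    new_length: target fingerprint length.
    Assumption: bin = index % new_length (the count analogue of split-and-OR).
    """
    folded = {}
    for idx, count in counts.items():
        bin_idx = idx % new_length
        folded[bin_idx] = folded.get(bin_idx, 0) + count
    # Drop bins whose counts cancelled out (possible with signed difference fingerprints)
    return {k: v for k, v in folded.items() if v != 0}


print(fold_count_vector({5: 1, 13: 2, 6: 3}, 8))  # {5: 3, 6: 3}
```

With signed difference fingerprints, colliding counts can cancel to zero, so a folded difference fingerprint can end up with fewer nonzero bins than the unfolded one.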

v/r,

Ben

On Wed, Nov 20, 2019 at 3:23 AM Francois Berenger <mli...@ligand.eu> wrote:

> On 20/11/2019 02:00, Benjamin Datko wrote:
> > Hello Francois,
> >
> > I am trying to replicate some of the functionality of
> > CreateDifferenceFingerprintForReaction [Ref 1] for my own
> > understanding of how the code works. The function
> > CreateDifferenceFingerprintForReaction supports three fingerprint
> > representations of the molecules: AtomPair, Morgan, and
> > TopologicalTorsion [Ref 2]. All three are count vectors [Ref 3], and
> > the function allows a variable fingerprint size for the output.
>
> Personally, I wouldn't try to fold a count vector.
> They are sparse vectors, so they don't take a lot of memory.
> Also, they lose less information than binary fingerprints.
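To illustrate the point that sparse count vectors can be used as they are: a count fingerprint kept as a dict of its nonzero elements can be compared directly with a min/max generalization of Tanimoto, with no folding and no collisions. This is a plain-Python sketch under that assumption; count_tanimoto is a hypothetical helper, not RDKit's own similarity code, and the element indices below are made up.

```python
def count_tanimoto(a, b):
    """Generalized (min/max) Tanimoto for sparse, non-negative count vectors.

    a, b: dicts mapping element index -> non-negative count.
    Returns sum(min) / sum(max) over all indices; 0.0 if both are empty.
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


# Made-up sparse fingerprints; large indices cost nothing extra in a dict
fp1 = {101: 2, 2050: 1, 8000001: 3}
fp2 = {101: 1, 2050: 1, 4096: 2}
print(count_tanimoto(fp1, fp2))  # 0.25
```

Memory scales with the number of nonzero elements, not with the nominal length (8388608 for AtomPair), which is why folding is not needed just to save space.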
>
> But maybe Greg has a workaround, if you are really forced to do this.
>
> > I was following this post [Ref 4] describing how to create reaction
> > difference fingerprints using different fingerprint representations.
> > Using the code from the post I can create reaction difference
> > fingerprints using either Morgan or AtomPair, but comparing the output
> > from the post [Ref 4] with CreateDifferenceFingerprintForReaction
> > gives fingerprints of different sizes, with different values and
> > different densities. I am assuming this is due to folding the count
> > vector down to the default fingerprint size of 2048.
> >
> >
> > Example code snippet:
> >
> > # The definitions below are from the post
> > # https://sourceforge.net/p/rdkit/mailman/message/35240736/
> >
> > from rdkit import Chem
> > from rdkit.Chem import AllChem
> > from rdkit import DataStructs
> > import copy
> >
> > def _createFP(mol, maxSize, fpType='AP'):
> >     mol.UpdatePropertyCache(False)
> >     if fpType == 'AP':
> >         return AllChem.GetAtomPairFingerprint(mol, minLength=1,
> >                                               maxLength=maxSize)
> >     else:
> >         Chem.GetSSSR(mol)
> >         rinfo = mol.GetRingInfo()
> >         return AllChem.GetMorganFingerprint(mol, radius=maxSize)
> >
> > def getSumFps(fps):
> >     summedFP = copy.deepcopy(fps[0])
> >     for fp in fps[1:]:
> >         summedFP += fp
> >     return summedFP
> >
> > def buildReactionFP(rxn, maxSize=3, fpType='AP'):
> >     reactants = rxn.GetReactants()
> >     products = rxn.GetProducts()
> >     rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
> >                      for mol in reactants])
> >     pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
> >                      for mol in products])
> >     return pFP-rFP
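The arithmetic in getSumFps and buildReactionFP above can be mirrored on plain dicts (the shape GetNonzeroElements() returns) to show why a difference fingerprint holds negative values: features present only in the reactants go below zero, and features the reaction leaves unchanged cancel out. This is a stdlib-only sketch with made-up indices; sum_counts and difference_fp are hypothetical helpers, not the RDKit implementation.

```python
def sum_counts(fps):
    """Element-wise sum of sparse count vectors given as dicts."""
    total = {}
    for fp in fps:
        for idx, count in fp.items():
            total[idx] = total.get(idx, 0) + count
    return total


def difference_fp(reactant_fps, product_fps):
    """Summed products minus summed reactants; elements that cancel to zero are dropped."""
    diff = sum_counts(product_fps)
    for idx, count in sum_counts(reactant_fps).items():
        diff[idx] = diff.get(idx, 0) - count
    return {k: v for k, v in diff.items() if v != 0}


reactant = {10: 2, 20: 1, 30: 4}  # made-up element indices and counts
product = {10: 2, 25: 1, 30: 3}
print(difference_fp([reactant], [product]))  # {25: 1, 30: -1, 20: -1}
```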
> >
> > >>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1CCCCC1>>[N:1]C1CCCCC1', useSmiles=True)
> >
> > >>> rxfp1 = buildReactionFP(rxn1, maxSize=2)
> >
> > >>> rxfp1.GetNonzeroElements()
> > {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
> >
> > >>> rxfp1.GetLength()
> > 8388608
> >
> > # Same reaction, now using CreateDifferenceFingerprintForReaction
> > >>> rxn1_fp = AllChem.CreateDifferenceFingerprintForReaction(rxn1)
> >
> > >>> rxn1_fp.GetNonzeroElements()
> > {1048: 10,
> >  1310: -20,
> >  1325: 20,
> >  1372: -10,
> >  1390: 20,
> >  1692: -10,
> >  1757: -20,
> >  1772: 10}
> >
> > >>> print(rxn1_fp.GetLength(), rxfp1.GetLength())
> > 2048 8388608
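One way to test the folding assumption with the numbers already printed above is to fold rxfp1's nonzero elements down to 2048 by summing counts modulo 2048 (an assumed fold, not necessarily what RDKit does) and compare the bins with rxn1_fp. With these values none of the folded bins coincide with rxn1_fp's, and the magnitudes differ too (±1/±2 versus ±10/±20). That suggests, for this one example only, that CreateDifferenceFingerprintForReaction does not simply fold the buildReactionFP vector; it may hash features into its 2048 bins differently and/or weight them.

```python
# Nonzero elements of rxfp1 as printed above (8388608-element vector)
rxfp1_elements = {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
# Nonzero elements of rxn1_fp from CreateDifferenceFingerprintForReaction (2048 elements)
rxn1_fp_elements = {1048: 10, 1310: -20, 1325: 20, 1372: -10,
                    1390: 20, 1692: -10, 1757: -20, 1772: 10}

fold_size = 2048
folded = {}
for idx, count in rxfp1_elements.items():
    # Assumed fold: element idx goes to bin idx % fold_size, colliding counts add up
    folded[idx % fold_size] = folded.get(idx % fold_size, 0) + count

print(folded)                               # {1058: -2, 1057: -1, 1090: 2, 1121: 1}
print(set(folded) & set(rxn1_fp_elements))  # set()
```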
> >
> > References
> > 1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction
> > 2. https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html
> > 3. https://www.rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
> > 4. https://sourceforge.net/p/rdkit/mailman/message/35240736/
> >
> > v/r,
> >
> > Ben
> >
> > On Mon, Nov 18, 2019 at 10:13 PM Francois Berenger <mli...@ligand.eu> wrote:
> >
> >> On 19/11/2019 03:34, Benjamin Datko wrote:
> >>> Hello all,
> >>>
> >>> I am curious about how to fold a count vector fingerprint. I
> >>> understand that when folding bit vectors the most common way is to
> >>> split the vector in half and apply a bitwise OR operation. I think
> >>> this is how the function rdkit.DataStructs.FoldFingerprint works in
> >>> RDKit; correct me if I am wrong.
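The split-and-OR fold described above can be sketched on a plain Python list of bits. This illustrates the technique only; it is not RDKit's FoldFingerprint code, and fold_bits is a hypothetical name. Mapping bit i to bin i % new_length is equivalent to OR-ing the equal-length chunks together, so the result only records whether any bit in a bin was set, not how many.

```python
def fold_bits(bits, fold_factor=2):
    """Fold a bit vector by OR-ing together its fold_factor equal-length chunks.

    bits: list of 0/1 values whose length is divisible by fold_factor.
    """
    if len(bits) % fold_factor:
        raise ValueError("length must be divisible by fold_factor")
    new_len = len(bits) // fold_factor
    folded = [0] * new_len
    for i, bit in enumerate(bits):
        if bit:
            folded[i % new_len] = 1  # OR: any set bit landing in a bin sets it
    return folded


print(fold_bits([1, 0, 0, 1, 0, 0, 1, 1]))  # [1, 0, 1, 1]
```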
> >>>
> >>> How does RDKit fold count vectors such as AtomPair, Morgan, and
> >>> Topological torsion, and/or what is the appropriate way to do so?
> >>
> >> Can you give us some context? Why do you want to do that?
> >>
> >> Maybe you can use the following in order to create
> >> shorter "fingerprints" for which the Tanimoto distance is
> >> still computable (though only approximately):
> >>
> >> ---
> >> Shrivastava, A. (2016).
> >> Simple and efficient weighted minwise hashing.
> >> In Advances in Neural Information Processing Systems (pp. 1498-1506).
> >> https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
> >> ---
> >>
> >> Regards,
> >> F.
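The idea behind the suggestion is to replace each count vector with a short signature whose agreement rate approximates a weighted (min/max) Tanimoto. Below is a minimal sketch of that idea using a simpler technique than the cited paper: each element with integer count c is expanded into c distinct set items and ordinary MinHash is applied, so the fraction of matching minima estimates sum(min)/sum(max). This naive expansion is only practical for small non-negative integer counts; the paper proposes a more efficient scheme. The function names and the SHA-1-based hashing are hypothetical, illustrative choices.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item) built from SHA-1."""
    digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def weighted_minhash(counts, num_hashes=128):
    """Sketch a non-negative integer count vector into num_hashes minima.

    Element idx with count c is expanded into the set items (idx, 0) .. (idx, c - 1);
    MinHash over the expanded set estimates the min/max weighted Jaccard.
    """
    items = [(idx, j) for idx, c in counts.items() for j in range(c)]
    if not items:
        raise ValueError("cannot sketch an empty vector")
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions whose minima agree (ignoring hash ties)."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


fp1 = {101: 2, 2050: 1, 8000001: 3}  # made-up sparse count fingerprints
fp2 = {101: 1, 2050: 1, 4096: 2}
print(estimated_similarity(weighted_minhash(fp1), weighted_minhash(fp2)))
```

For these two made-up vectors the exact min/max similarity is 2/8 = 0.25, so the printed estimate should land near that value; more hashes give a tighter estimate at the cost of a longer signature.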
> >>
> >>> I thought about turning the fingerprint into a bit vector using the
> >>> respective "AsBitVect" method and then folding it with
> >>> rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
> >>> have an "AsBitVect" method
> >>> [https://www.rdkit.org/docs/GettingStartedInPython.html].
> >>>
> >>> For an explicit example using the AtomPair fingerprint, we can see
> >>> that the fingerprint is extremely sparse. Could this AtomPair
> >>> fingerprint be folded to increase its density?
> >>>
> >>> >>> from rdkit import Chem
> >>> >>> from rdkit.Chem import AllChem
> >>> >>> mol = Chem.MolFromSmiles('CC1CCCCC1')
> >>> >>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=3)
> >>> >>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements())
> >>> >>> print((ap_fp.GetLength(), number_of_nonzero_elements))
> >>> (8388608, 9)
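To put numbers on "extremely sparse": with the 9 nonzero elements reported above, the density of the 8388608-element vector is about 1.1e-6. Folding to a shorter length maps each nonzero element to exactly one bin, so the folded density can at best reach 9 divided by the new length; collisions only lower that ceiling. Plain arithmetic, no RDKit needed; the target lengths are arbitrary examples.

```python
# Values reported above for CC1CCCCC1 (AtomPair, minLength=1, maxLength=3)
nbits_unfolded = 8388608
nonzero = 9

print(nonzero / nbits_unfolded)  # roughly 1.07e-06

# Upper bound on density after folding: at most `nonzero` bins can be set
for target in (2048, 1024, 512):
    print(target, nonzero / target)
# 2048 0.00439453125
# 1024 0.0087890625
# 512 0.017578125
```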
> >>>
> >>> Very Respectfully,
> >>>
> >>> Ben
> >>> _______________________________________________
> >>> Rdkit-discuss mailing list
> >>> Rdkit-discuss@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>