Hello Francois,

I am trying to replicate some of the functionality of CreateDifferenceFingerprintForReaction [Ref 1] to understand how the code works. The function supports three fingerprint representations of the molecules when building the difference fingerprint: AtomPair, Morgan, and TopologicalTorsion [Ref 2]. All three are count vectors [Ref 3], and the function allows the size of the output fingerprint to be varied.
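To make "folding a count vector" concrete, here is a minimal sketch in plain Python. It assumes the simplest possible scheme: every index is reduced modulo the target size, and counts that collide are summed. The helper name fold_counts is hypothetical, and this is not confirmed to be what RDKit does internally. The input values are the nonzero elements of the unfolded AtomPair difference fingerprint from the example later in this message.

```python
from collections import defaultdict


def fold_counts(nonzero, size):
    """Fold a sparse count vector (index -> count) down to `size` entries.

    Assumption: plain modulo folding with summed counts. This is an
    illustrative scheme, not RDKit's documented behaviour.
    """
    folded = defaultdict(int)
    for idx, count in nonzero.items():
        # Indices that share the same remainder collide and are summed.
        folded[idx % size] += count
    # Drop entries whose positive and negative counts cancelled out.
    return {i: c for i, c in sorted(folded.items()) if c != 0}


# Nonzero elements of the unfolded (length 8388608) difference fingerprint.
unfolded = {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
folded = fold_counts(unfolded, 2048)
print(folded)  # {1057: -1, 1058: -2, 1090: 2, 1121: 1}
```

Because difference fingerprints carry signed counts, summing on collision can cancel entries completely, which is why the sketch drops zeros after folding.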
I was following this post [Ref 4], which describes how to create reaction difference fingerprints from different fingerprint representations. Using the code from the post I can create reaction difference fingerprints with either Morgan or AtomPair, but the result differs from the output of CreateDifferenceFingerprintForReaction: the fingerprints have different sizes, different values, and different densities. I assume this is due to folding the count vector down to the default fingerprint size of 2048.

Example code snippet:

# The below defs are from the post https://sourceforge.net/p/rdkit/mailman/message/35240736/
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import copy

def _createFP(mol,maxSize,fpType='AP'):
    # Unfolded count fingerprint for one molecule: AtomPair or Morgan
    mol.UpdatePropertyCache(False)
    if fpType == 'AP':
        return AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=maxSize)
    else:
        Chem.GetSSSR(mol)
        rinfo = mol.GetRingInfo()
        return AllChem.GetMorganFingerprint(mol, radius=maxSize)

def getSumFps(fps):
    # Sum a list of count fingerprints without modifying the first one
    summedFP = copy.deepcopy(fps[0])
    for fp in fps[1:]:
        summedFP += fp
    return summedFP

def buildReactionFP(rxn, maxSize=3, fpType='AP'):
    # Difference fingerprint: summed products minus summed reactants
    reactants = rxn.GetReactants()
    products = rxn.GetProducts()
    rFP = getSumFps([_createFP(mol,maxSize,fpType=fpType) for mol in reactants])
    pFP = getSumFps([_createFP(mol,maxSize,fpType=fpType) for mol in products])
    return pFP-rFP

>>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1CCCCC1>>[N:1]C1CCCCC1', useSmiles=True)
>>> rxfp1 = buildReactionFP(rxn1,maxSize=2)
>>> rxfp1.GetNonzeroElements()
{558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
>>> rxfp1.GetLength()
8388608

# Same reaction now using CreateDifferenceFingerprintForReaction
>>> rxn1_fp = AllChem.CreateDifferenceFingerprintForReaction(rxn1)
>>> rxn1_fp.GetNonzeroElements()
{1048: 10, 1310: -20, 1325: 20, 1372: -10, 1390: 20, 1692: -10, 1757: -20, 1772: 10}
>>> print(rxn1_fp.GetLength(),rxfp1.GetLength())
2048 8388608

References
1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction
2. https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html
3. https://www.rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
4. https://sourceforge.net/p/rdkit/mailman/message/35240736/

v/r,
Ben

On Mon, Nov 18, 2019 at 10:13 PM Francois Berenger <mli...@ligand.eu> wrote:
> On 19/11/2019 03:34, Benjamin Datko wrote:
> > Hello all,
> >
> > I am curious on how to fold a count vector fingerprint. I understand
> > when folding bit vectors the most common way is to split the vector in
> > half, and apply a bitwise OR operation. I think this is how the
> > function rdkit.DataStructs.FoldFingerprint works in RDKit, correct me
> > if I am wrong.
> >
> > How does RDKit and/or what is the appropriate way to fold count
> > vectors such as AtomPair, Morgan, and Topological torsion?
>
> Can you give us some context? Why do you want to do that?
>
> Maybe, you can use the following in order to create
> shorter "fingerprints" for which the Tanimoto distance is
> still computable (despite becoming approximate then):
>
> ---
> Shrivastava, A. (2016).
> Simple and efficient weighted minwise hashing.
> In Advances in Neural Information Processing Systems (pp. 1498-1506).
>
> https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
> ---
>
> Regards,
> F.
>
> > I thought about turning the fingerprint into a bit vector using their
> > respective "AsBitVect" method then folding using
> > rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
> > have an "AsBitVect" method
> > [https://www.rdkit.org/docs/GettingStartedInPython.html].
> >
> > For an explicit example using AtomPair fingerprint we can see the
> > fingerprint is extremely sparse. Could this AtomPair fingerprint be
> > folded to increase the density?
> >
> > >>> from rdkit import Chem
> > >>> from rdkit.Chem import AllChem
> > >>> mol = Chem.MolFromSmiles('CC1CCCCC1')
> > >>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=3)
> > >>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())
> > >>> print((ap_fp.GetLength(),number_of_nonzero_elements))
> > (8388608,9)
> >
> > Very Respectfully,
> >
> > Ben
> > _______________________________________________
> > Rdkit-discuss mailing list
> > Rdkit-discuss@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss