Dear Nadine,

Thank you for your reply with the code examples. I now understand the reason for the low similarity in my code. Your mail was very informative.
Best regards,
Takayuki

On Wed, 27 Jul 2016 at 3:34, Nadine Schneider <nadine.schneider....@gmail.com> wrote:

> Hi Takayuki
>
> The reason why this happens is that the
> CreateDifferenceFingerprintForReaction function takes the whole structure
> of the molecules of a reaction into account. This means it generates
> AtomPair FPs with a path length up to 30 bonds for the reactants and
> products and then builds the difference of those. Therefore you get this
> low similarity. If you would like to capture the transformation only, you
> should use a more local version of the FPs, like an AP FP with a
> path length up to 3 bonds or a Morgan FP with a radius of 1. Unfortunately
> this isn’t possible with the function above, but please find an example
> below that allows doing this.
> I hope that helps.
>
> Best,
> Nadine
>
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit import DataStructs
> import copy
>
>
> def _createFP(mol, maxSize, fpType='AP'):
>     # Non-strict property update so the template molecule can be
>     # fingerprinted.
>     mol.UpdatePropertyCache(False)
>     if fpType == 'AP':
>         return AllChem.GetAtomPairFingerprint(mol, minLength=1,
>                                               maxLength=maxSize)
>     else:
>         # Compute ring information before building the Morgan FP.
>         Chem.GetSSSR(mol)
>         return AllChem.GetMorganFingerprint(mol, radius=maxSize)
>
>
> def getSumFps(fps):
>     # Sum the count fingerprints of all molecules on one side.
>     summedFP = copy.deepcopy(fps[0])
>     for fp in fps[1:]:
>         summedFP += fp
>     return summedFP
>
>
> def buildReactionFP(rxn, maxSize=3, fpType='AP'):
>     reactants = rxn.GetReactants()
>     products = rxn.GetProducts()
>     rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
>                      for mol in reactants])
>     pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
>                      for mol in products])
>     # Difference fingerprint: products minus reactants.
>     return pFP - rFP
>
>
> # Your examples
>
> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1CCCCC1>>[N:1]C1CCCCC1')
> rxn2 = AllChem.ReactionFromSmarts('[C:1]C1CCCNC1>>[N:1]C1CCCNC1')
> rxn3 = AllChem.ReactionFromSmarts('[C:1]c1ccccc1>>[N:1]c1ccccc1')
>
> rxfp1 = buildReactionFP(rxn1, maxSize=3)
> rxfp2 = buildReactionFP(rxn2, maxSize=3)
> rxfp3 = buildReactionFP(rxn3, maxSize=3)
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print(tc12, tc13, tc23)
> # output: (0.6666666666666666, 0.0, 0.0)
>
> # Try a smaller path length
>
> rxfp1 = buildReactionFP(rxn1, maxSize=2)
> rxfp2 = buildReactionFP(rxn2, maxSize=2)
> rxfp3 = buildReactionFP(rxn3, maxSize=2)
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print(tc12, tc13, tc23)
> # output: (1.0, 0.0, 0.0)
>
> # Finally use Morgan with radius 1
>
> rxfp1 = buildReactionFP(rxn1, maxSize=1, fpType='Morgan')
> rxfp2 = buildReactionFP(rxn2, maxSize=1, fpType='Morgan')
> rxfp3 = buildReactionFP(rxn3, maxSize=1, fpType='Morgan')
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print(tc12, tc13, tc23)
> # output: (1.0, 0.2, 0.2)
>
>
> 2016-07-25 15:44 GMT+02:00 Taka Seri <serit...@gmail.com>:
>
>> Dear rdkitters,
>> I want to analyse reactions or matched molecular pairs (molecular
>> transformations) and build prediction models for them.
>>
>> I found a new function named CreateDifferenceFingerprintForReaction, so I
>> tried to use it for this. But I was confused by the following result.
>>
>> I defined three reactions that transform C to N.
>> I expected the Tanimoto similarities to be the same, but the Tanimoto
>> similarities of the reactions were quite different.
>> My code is as follows:
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> from rdkit import rdBase
>> from rdkit.Chem import rdChemReactions
>> from rdkit import DataStructs
>>
>> rdBase.rdkitVersion  # => '2016.03.1'
>>
>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1CCCCC1>>[N:1]C1CCCCC1')
>> rxn2 = AllChem.ReactionFromSmarts('[C:1]C1CCCNC1>>[N:1]C1CCCNC1')
>> rxn3 = AllChem.ReactionFromSmarts('[C:1]c1ccccc1>>[N:1]c1ccccc1')
>>
>> rxfp1 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn1)
>> rxfp2 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn2)
>> rxfp3 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn3)
>>
>> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
>> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
>> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>>
>> print(tc12, tc13, tc23)
>>
>> # I got the following scores. Why are the 2nd and 3rd similarities zero?
>> # 0.7142857142857143 0.0 0.0
>>
>> Any advice and suggestions will be greatly appreciated.
>> Best regards,
>> Takayuki
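
For readers following the thread, the effect of the path length can be sketched without RDKit. The example below is only an illustration under stated assumptions: pair_counts, difference_fp and tanimoto are hypothetical helpers, the features are plain (element, element, distance) triples rather than RDKit's real atom-pair encoding, and the similarity on absolute counts is an assumed formula that may differ from what DataStructs.TanimotoSimilarity does internally. The two toy graphs mirror rxn1 and rxn2 (the aromatic rxn3 is left out).

```python
from collections import Counter, deque


def pair_counts(symbols, bonds, max_len):
    """Count (element, element, distance) pairs up to max_len bonds apart.

    Simplified stand-in for an atom-pair fingerprint (assumption: real
    atom-pair codes also encode neighbour counts and pi electrons).
    """
    neighbours = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    counts = Counter()
    for start in range(len(symbols)):
        # Breadth-first search gives shortest topological distances.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if dist[cur] == max_len:
                continue  # do not look beyond max_len bonds
            for nxt in neighbours[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for end, d in dist.items():
            if end > start:  # count each unordered pair once
                pair = tuple(sorted((symbols[start], symbols[end])))
                counts[pair + (d,)] += 1
    return counts


def difference_fp(reactant, product, bonds, max_len):
    """Signed difference: product counts minus reactant counts."""
    diff = Counter(pair_counts(product, bonds, max_len))
    diff.subtract(pair_counts(reactant, bonds, max_len))
    return diff


def tanimoto(a, b):
    """Assumed count-vector Tanimoto on absolute values: sum(min)/sum(max)."""
    keys = set(a) | set(b)
    shared = sum(min(abs(a[k]), abs(b[k])) for k in keys)
    total = sum(max(abs(a[k]), abs(b[k])) for k in keys)
    return shared / total if total else 0.0


# Atom 0 is the substituent; atoms 1-6 form the six-membered ring.
RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
rxn1 = (list("CCCCCCC"), list("NCCCCCC"))  # like [C:1]C1CCCCC1>>[N:1]C1CCCCC1
rxn2 = (list("CCCCCNC"), list("NCCCCNC"))  # like [C:1]C1CCCNC1>>[N:1]C1CCCNC1

sim_local = tanimoto(difference_fp(*rxn1, RING, 2), difference_fp(*rxn2, RING, 2))
sim_wider = tanimoto(difference_fp(*rxn1, RING, 3), difference_fp(*rxn2, RING, 3))
print(sim_local, sim_wider)
```

With a path length of 2 the ring nitrogen of the second reaction is out of reach of the changed atom, so the two difference vectors coincide. At a path length of 3 it comes into range and the similarity drops, which is the same qualitative behaviour Nadine describes for the real fingerprints.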
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss