Dear Nadine,

Thank you for your reply and the code examples.
I now understand the reason for the low similarity in my code. Your mail was
very informative.

Best regards,
Takayuki

On Wed, Jul 27, 2016 at 3:34, Nadine Schneider <nadine.schneider....@gmail.com> wrote:

> Hi Takayuki
>
> The reason this happens is that the
> CreateDifferenceFingerprintForReaction function takes the whole structure
> of the molecules in a reaction into account. It generates AtomPair FPs
> with a path length of up to 30 bonds for the reactants and products and
> then takes the difference of the two. That is why you get this low
> similarity. If you want to capture only the transformation, you should
> use a more local version of the FPs, such as an AP FP with a path length
> of up to 3 bonds or a Morgan FP with a radius of 1. Unfortunately this
> isn’t possible with the function above, but please find an example below
> that allows you to do it.
> I hope that helps.
>
> Best,
> Nadine
>
>
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit import DataStructs
> import copy
>
>
> def _createFP(mol,maxSize,fpType='AP'):
>     # Template molecules are not sanitized: compute properties non-strictly
>     mol.UpdatePropertyCache(False)
>     if fpType == 'AP':
>         return AllChem.GetAtomPairFingerprint(mol, minLength=1,
>                                               maxLength=maxSize)
>     else:
>         # Initialize ring information before building the Morgan FP
>         Chem.GetSSSR(mol)
>         return AllChem.GetMorganFingerprint(mol, radius=maxSize)
>
> def getSumFps(fps):
>     summedFP = copy.deepcopy(fps[0])
>     for fp in fps[1:]:
>         summedFP += fp
>     return summedFP
>
> def buildReactionFP(rxn, maxSize=3, fpType='AP'):
>     reactants = rxn.GetReactants()
>     products = rxn.GetProducts()
>     rFP = getSumFps([_createFP(mol,maxSize,fpType=fpType)
>                      for mol in reactants])
>     pFP = getSumFps([_createFP(mol,maxSize,fpType=fpType)
>                      for mol in products])
>     return pFP-rFP
>
> # Your examples
>
> rxn1 = AllChem.ReactionFromSmarts( '[C:1]C1CCCCC1>>[N:1]C1CCCCC1' )
> rxn2 = AllChem.ReactionFromSmarts( '[C:1]C1CCCNC1>>[N:1]C1CCCNC1' )
> rxn3 = AllChem.ReactionFromSmarts( '[C:1]c1ccccc1>>[N:1]c1ccccc1' )
>
> rxfp1 = buildReactionFP(rxn1,maxSize=3)
> rxfp2 = buildReactionFP(rxn2,maxSize=3)
> rxfp3 = buildReactionFP(rxn3,maxSize=3)
>
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print(tc12,tc13,tc23)
>
> >> (0.6666666666666666, 0.0, 0.0)
>
> # Try a smaller path length
>
> rxfp1 = buildReactionFP(rxn1,maxSize=2)
> rxfp2 = buildReactionFP(rxn2,maxSize=2)
> rxfp3 = buildReactionFP(rxn3,maxSize=2)
>
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print(tc12,tc13,tc23)
>
> >> (1.0, 0.0, 0.0)
>
> # Finally use Morgan with radius 1
>
> rxfp1 = buildReactionFP(rxn1,maxSize=1,fpType='Morgan')
> rxfp2 = buildReactionFP(rxn2,maxSize=1,fpType='Morgan')
> rxfp3 = buildReactionFP(rxn3,maxSize=1,fpType='Morgan')
>
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print(tc12,tc13,tc23)
>
> >> (1.0, 0.2, 0.2)
>
>
>
> 2016-07-25 15:44 GMT+02:00 Taka Seri <serit...@gmail.com>:
>
>> Dear rdkitters,
>> I want to analyse reactions or matched molecular pairs (molecular
>> transformations) and build prediction models for them.
>>
>> I found a new function named CreateDifferenceFingerprintForReaction, so I
>> tried using it for this. But I am confused by the following result.
>>
>> I defined three reactions that transform C to N.
>> I expected the Tanimoto similarities to be the same, but the Tanimoto
>> similarities of the reactions were quite different.
>> My code is as follows:
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> from rdkit import rdBase
>> from rdkit.Chem import rdChemReactions
>> from rdkit.Chem import DataStructs
>>
>> rdBase.rdkitVersion =>'2016.03.1'
>>
>> rxn1 = AllChem.ReactionFromSmarts( '[C:1]C1CCCCC1>>[N:1]C1CCCCC1' )
>>
>> rxn2 = AllChem.ReactionFromSmarts( '[C:1]C1CCCNC1>>[N:1]C1CCCNC1' )
>>
>> rxn3 = AllChem.ReactionFromSmarts( '[C:1]c1ccccc1>>[N:1]c1ccccc1' )
>>
>> rxfp1 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn1)
>>
>> rxfp2 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn2)
>>
>> rxfp3 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn3)
>>
>> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
>>
>> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
>>
>> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>>
>> print( tc12,tc13, tc23 )
>>
>> # I got the following scores. Why are the 2nd and 3rd similarities zero?
>>
>> 0.7142857142857143 0.0 0.0
>>
>> Any advice and suggestions will be greatly appreciated
>> Best regards,
>> Takayuki
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
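
The effect Nadine describes (features shared by reactant and product cancel in the difference, while long-range features tie the fingerprint to the rest of the molecule) can be illustrated without RDKit. The following is only a toy sketch: the feature tuples, the helper names difference_fp and tanimoto, and the count-based Tanimoto on absolute values are assumptions made for illustration, not RDKit's actual atom-pair codes or implementation.

```python
from collections import Counter

# Hypothetical features: (mapped-atom type, partner type, bond distance).
CC1, NC1, CN1 = ("C", "C", 1), ("N", "C", 1), ("C", "N", 1)
CC3, NC3, CN3, NN3 = ("C", "C", 3), ("N", "C", 3), ("C", "N", 3), ("N", "N", 3)

def difference_fp(reactant_feats, product_feats, max_dist):
    """Product counts minus reactant counts, using features up to max_dist bonds."""
    def keep(feats):
        return Counter(f for f in feats if f[2] <= max_dist)
    diff = keep(product_feats)
    diff.subtract(keep(reactant_feats))
    # Features present equally on both sides cancel and are dropped.
    return {feat: n for feat, n in diff.items() if n != 0}

def tanimoto(fp_a, fp_b):
    """Count-based Tanimoto on absolute values (an assumption of this sketch)."""
    keys = set(fp_a) | set(fp_b)
    num = sum(min(abs(fp_a.get(k, 0)), abs(fp_b.get(k, 0))) for k in keys)
    den = sum(max(abs(fp_a.get(k, 0)), abs(fp_b.get(k, 0))) for k in keys)
    return num / den if den else 1.0

# Toy reaction A: all-carbon ring; toy reaction B: ring containing one N.
ring_a = [CC1] * 5
ring_b = [CC1] * 3 + [CN1] * 2
reactant_a, product_a = ring_a + [CC1, CC3, CC3], ring_a + [NC1, NC3, NC3]
reactant_b, product_b = ring_b + [CC1, CC3, CN3], ring_b + [NC1, NC3, NN3]

# Long-range features see the ring nitrogen, so the two reactions differ.
sim_full = tanimoto(difference_fp(reactant_a, product_a, max_dist=3),
                    difference_fp(reactant_b, product_b, max_dist=3))
# Local features capture only the C -> N change, which is identical.
sim_local = tanimoto(difference_fp(reactant_a, product_a, max_dist=1),
                     difference_fp(reactant_b, product_b, max_dist=1))

print(sim_full, sim_local)  # 0.5 1.0
```

With only distance-1 features the two toy reactions look identical, mirroring the move from a long path length to a short one in the example above.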