Jurgens

You should be aware that Tc=1 does not guarantee that two compounds are 
identical, only that they could be identical. Because the fingerprints used 
in the comparison have a finite length, non-identical structures can end up 
with exactly the same bits set, especially if you fold the fingerprints.
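
As a rough illustration of the folding effect, here is a minimal sketch in plain Python 3. It does not use Open Babel at all: fingerprints are modelled as sets of "on" bit positions, the helper names `fold` and `tanimoto` are made up for this example, and the bit positions are invented, so treat it as a toy model rather than how any real fingerprint is computed.

```python
def fold(bits, length):
    """Fold a set of bit positions into a fingerprint of `length` bits.

    Hypothetical helper: each position is wrapped around with modulo,
    which is one simple way folding can be modelled.
    """
    return frozenset(b % length for b in bits)


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


# Invented bit positions for two non-identical structures.
bits_a = {3, 17, 40, 1029}
bits_b = {3, 17, 40, 5}

# Before folding the fingerprints differ, so Tc < 1.
unfolded = tanimoto(frozenset(bits_a), frozenset(bits_b))

# After folding to 1024 bits, position 1029 wraps onto position 5,
# both fingerprints become identical, and Tc == 1.
folded = tanimoto(fold(bits_a, 1024), fold(bits_b, 1024))

print(unfolded, folded)  # 0.6 1.0
```

The shorter the folded length, the more often different positions wrap onto the same bit, and the more often this kind of collision can happen.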

To put this another way: identical compounds must have the same fingerprints, 
but compounds with the same fingerprints are not necessarily identical.
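
One way to act on this is to treat Tc = 1 only as a filter for candidate duplicates and then confirm each candidate with a canonical identifier. The sketch below is plain Python 3 with made-up data: the `records` list, the `find_duplicates` helper and the `"canonical"` field are all hypothetical, and the fingerprints are invented so that a collision occurs. With pybel, the canonical string could plausibly come from something like `mol.write("can")` or `mol.write("inchi")`, but that is an assumption about your setup, not something tested here.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def find_duplicates(records):
    """Return (candidates, confirmed) lists of name pairs.

    candidates: pairs whose fingerprints give Tc == 1.
    confirmed:  candidates whose canonical identifiers also match.
    """
    candidates, confirmed = [], []
    # combinations() visits each unordered pair exactly once.
    for a, b in combinations(records, 2):
        # The Tc test sits inside the loop, so every pair is checked.
        if tanimoto(a["fp"], b["fp"]) == 1.0:
            candidates.append((a["name"], b["name"]))
            if a["canonical"] == b["canonical"]:
                confirmed.append((a["name"], b["name"]))
    return candidates, confirmed


# Toy data: the fingerprints are invented, not computed from the structures.
records = [
    {"name": "mol1", "fp": frozenset({1, 4, 9}), "canonical": "CCO"},
    {"name": "mol2", "fp": frozenset({1, 4, 9}), "canonical": "CCO"},
    {"name": "mol3", "fp": frozenset({1, 4, 9}), "canonical": "COC"},
    {"name": "mol4", "fp": frozenset({2, 4, 9}), "canonical": "CCN"},
]

candidates, confirmed = find_duplicates(records)
print(len(candidates), len(confirmed))  # 3 1
```

Here three pairs share a fingerprint, but only mol1/mol2 are confirmed as the same compound. Note also that in the code as quoted below, the `if Tc == 1` line appears at the indentation of the inner `for`, so it would run only once per outer iteration; that may just be an artifact of email formatting, but it is worth checking.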

Marc


On 29/03/2011, at 10:43 PM, Jurgens de Bruin <[email protected]> wrote:

> Hi All,
> 
> I do hope some light can be shed on the following...
> 
> I have an .sdf file that contains 2483 molecules. When I run the command 
> "babel in.sdf out.sdf --unique", it finds 255 duplicates, which is 
> plausible. 
> 
> When I try to do the same in Python by calculating the Tanimoto 
> coefficient between two compounds (Tc = 1 would indicate a duplicate), I don't 
> find any duplicates. How is this possible? 
> Python code below:
> 
> import openbabel
> import pybel
> import csv
> from pybel import *
> 
> 
> def createFPS():
>     
>     before = 0
>     Phytochemicals = []
>     
>     for phyto in readfile("sdf","./phyto3000.sdf"):
>         Phytochemical = {}
>         before += 1
>         fps = phyto.calcfp()
>         Phytochemical["Name"] = phyto.title
>         Phytochemical["FPS"] = fps
>         Phytochemicals.append(Phytochemical)
>     
>     print "Phytochemicals in original sdf:",before
>     
>     return Phytochemicals
>         
>         
> def fDuplicated(Phytochemicals):
>     
>     stop = len(Phytochemicals)
>     count = 0
>     for x in range(0, stop):
>         for z in range(0, stop):
>             if x != z:
>                 Tc = Phytochemicals[x]['FPS'] | Phytochemicals[z]['FPS']
>         if Tc == 1:
>             print "Tc equal to 1"
>             count += 1
>             
>     print "Total Tc equal to 1",count
>     
> 
> Phytochemicals = createFPS()
> fDuplicated(Phytochemicals)
> 
> -- 
> Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
> distinti saluti/siong/duì yú/привет
> 
> Jurgens de Bruin
> ------------------------------------------------------------------------------
> Enable your software for Intel(R) Active Management Technology to meet the
> growing manageability and security demands of your customers. Businesses
> are taking advantage of Intel(R) vPro (TM) technology - will your software 
> be a part of the solution? Download the Intel(R) Manageability Checker 
> today! http://p.sf.net/sfu/intel-dev2devmar
> _______________________________________________
> OpenBabel-scripting mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/openbabel-scripting
