Jurgens,

You should be aware that Tc = 1 does not guarantee that two compounds are identical, only that they could be identical. Because the fingerprints used in the comparison have a finite length, the same bits can be set for non-identical structures, especially if you fold the fingerprints.
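A toy sketch of how folding can give Tc = 1 for different inputs. It is not Open Babel's fingerprint code: a fingerprint is modelled here as a Python set of "on" bit positions, and the bit positions, the 64-bit fold length, and the helper names `tanimoto` and `fold` are all hypothetical, chosen only for illustration.

```python
# Toy model, not Open Babel's fingerprint implementation:
# a fingerprint is represented as the set of its "on" bit positions.

def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two bit sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def fold(bits, length):
    """Fold a long fingerprint to `length` bits by OR-ing bit i into i % length."""
    return {i % length for i in bits}


# Two hypothetical, different fingerprints (made-up bit positions).
fp_a = {3, 70, 515}
fp_b = {67, 6, 515}

print(tanimoto(fp_a, fp_b))                      # → 0.2 (only bit 515 shared)
print(tanimoto(fold(fp_a, 64), fold(fp_b, 64)))  # → 1.0 (both fold to {3, 6})
```

After folding, both inputs set exactly the same bits, so the coefficient is 1 even though the unfolded fingerprints differ.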
To put it another way: identical compounds must have the same fingerprints, but compounds with the same fingerprints are not necessarily identical.

Marc

On 29/03/2011, at 10:43 PM, Jurgens de Bruin <[email protected]> wrote:

> Hi All,
>
> I do hope some light can be shed on the following...
>
> I have a .sdf file that contains 2483 molecules. When I run the command
> "babel in.sdf out.sdf --unique", it finds 255 duplicates, which is
> possible.
>
> When I try to do the same in Python by calculating the Tanimoto
> coefficient between two compounds (Tc = 1 would indicate a duplicate), I
> don't find any duplicates. How is this possible?
> Python code below:
>
> import openbabel
> import pybel
> import csv
> from pybel import *
>
>
> def createFPS():
>
>     before = 0
>     Phytochemicals = []
>
>     for phyto in readfile("sdf", "./phyto3000.sdf"):
>         Phytochemical = {}
>         before += 1
>         fps = phyto.calcfp()
>         Phytochemical["Name"] = phyto.title
>         Phytochemical["FPS"] = fps
>         Phytochemicals.append(Phytochemical)
>
>     print "Phytochemicals in original sdf:", before
>
>     return Phytochemicals
>
>
> def fDuplicated(Phytochemicals):
>
>     stop = len(Phytochemicals)
>     count = 0
>     for x in range(0, stop):
>         for z in range(0, stop):
>             if x != z:
>                 # pybel's Fingerprint "|" operator returns the Tanimoto coefficient
>                 Tc = Phytochemicals[x]['FPS'] | Phytochemicals[z]['FPS']
>                 if Tc == 1:
>                     print "Tc equal to 1"
>                     count += 1
>
>     print "Total Tc equal to 1", count
>
>
> Phytochemicals = createFPS()
> fDuplicated(Phytochemicals)
>
> --
> Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
> distinti saluti/siong/duì yú/привет
>
> Jurgens de Bruin
> _______________________________________________
> OpenBabel-scripting mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/openbabel-scripting
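For exact duplicates, such as those reported by the `babel --unique` run above, comparing a canonical identifier is more direct than a fingerprint threshold. The sketch below assumes each molecule has already been reduced to a hypothetical `(name, identifier)` pair, where the identifier is a canonical string such as canonical SMILES or InChI (with pybel, `mol.write("can")` might be one source, untested here). It also counts Tc = 1 fingerprint pairs with `itertools.combinations`, so each unordered pair is checked once; the quoted double loop over `x` and `z` visits every pair twice. Fingerprints are again modelled as sets of on-bit positions, and all example data are made up.

```python
from collections import defaultdict
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bit positions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_tc1_pairs(fingerprints):
    """Count unordered pairs with Tc == 1, visiting each pair exactly once."""
    return sum(
        1 for fp1, fp2 in combinations(fingerprints, 2) if tanimoto(fp1, fp2) == 1.0
    )


def group_exact_duplicates(records):
    """Map each identifier shared by two or more records to those records' names.

    `records` is an iterable of hypothetical (name, identifier) pairs; the
    identifier is assumed to be canonical already (e.g. canonical SMILES).
    """
    groups = defaultdict(list)
    for name, identifier in records:
        groups[identifier].append(name)
    return {ident: names for ident, names in groups.items() if len(names) > 1}


# Made-up example data, not taken from the phytochemical file.
fingerprints = [{1, 2, 5}, {1, 2, 5}, {3, 7}]
records = [("A", "CCO"), ("B", "c1ccccc1"), ("C", "CCO"), ("D", "CCN")]

print(count_tc1_pairs(fingerprints))    # → 1 (the x/z double loop would count 2)
print(group_exact_duplicates(records))  # → {'CCO': ['A', 'C']}
```

Grouping by a canonical string finds exact duplicates in one pass, whereas a Tc = 1 hit only says two molecules *might* be identical.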
