Hi All,

I do hope some light can be shed on the following...

I have an .sdf file that contains 2483 molecules. When I run the command
"babel in.sdf out.sdf --unique", it finds 255 duplicates, which is
plausible.

When I try to do the same in Python by calculating the Tanimoto
coefficient between pairs of compounds (Tc = 1 would indicate a
duplicate), I don't find any duplicates. How is this possible?
The Python code is below:

import openbabel
import pybel
import csv
from pybel import *


def createFPS():

    before = 0
    Phytochemicals = []

    for phyto in readfile("sdf","./phyto3000.sdf"):
        Phytochemical = {}
        before += 1
        fps = phyto.calcfp()
        Phytochemical["Name"] = phyto.title
        Phytochemical["FPS"] = fps
        Phytochemicals.append(Phytochemical)

    print "Phytochemicals in original sdf:",before

    return Phytochemicals


def fDuplicated(Phytochemicals):

    stop = len(Phytochemicals)
    count = 0
    for x in range(0, stop):
        for z in range(0, stop):
            if x != z:
                Tc = Phytochemicals[x]['FPS'] | Phytochemicals[z]['FPS']
        if Tc == 1:
            print "Tc equalto 1"
            count += 1

    print "Total Tc equal to 1",count


Phytochemicals = createFPS()
fDuplicated(Phytochemicals)
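
To make the intent concrete, here is a minimal sketch of the comparison I am
after, written without pybel so that it runs standalone. It assumes each
fingerprint is simply a set of on-bit indices, and the names tanimoto and
count_duplicate_pairs are made up for illustration; they are not part of
Open Babel. As far as I understand, the "|" operator on pybel fingerprints
returns the same ratio. Each unordered pair is visited once and the Tc test
is applied to every pair as it is computed.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Assumption made for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(bits_a & bits_b) / float(union)


def count_duplicate_pairs(fingerprints):
    """Count unordered pairs whose Tanimoto coefficient is exactly 1."""
    count = 0
    for fp_a, fp_b in combinations(fingerprints, 2):
        # The test sits inside the loop, so every pair is checked.
        if tanimoto(fp_a, fp_b) == 1.0:
            count += 1
    return count


# Hypothetical toy data: the first two fingerprints are identical.
example = [{1, 5, 9}, {1, 5, 9}, {1, 5}, {2, 7}]
print(count_duplicate_pairs(example))
```

Note that identical fingerprints do not guarantee identical molecules, so a
count obtained this way need not match what "--unique" reports.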

-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin
_______________________________________________
OpenBabel-scripting mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-scripting