Hi All,
I do hope some light can be shed on the following...
I have an .sdf file containing 2483 molecules. When I run the command
"babel in.sdf out.sdf --unique", it finds 255 duplicates, which is
plausible.
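For context, `--unique` drops any molecule whose identifier has already been seen. If I read the Open Babel documentation correctly, the default identifier is the InChI. Conceptually, that is a "keep the first occurrence" filter on a canonical key. Below is a minimal sketch that assumes the keys are already computed. The function name `unique_by_key` and the key strings are hypothetical placeholders, not real InChIs or Open Babel calls.

```python
def unique_by_key(records):
    """Keep the first record for each key; return (kept, number_removed).

    Each record is a (title, key) pair. The key stands in for a canonical
    identifier such as an InChI string (assumption: computed elsewhere).
    """
    seen = set()
    kept = []
    removed = 0
    for title, key in records:
        if key in seen:
            removed += 1  # a later copy of a structure that was already kept
        else:
            seen.add(key)
            kept.append((title, key))
    return kept, removed


# Hypothetical records: mol2 and mol4 repeat mol1's key.
records = [("mol1", "KEY-A"), ("mol2", "KEY-A"), ("mol3", "KEY-B"), ("mol4", "KEY-A")]
kept, removed = unique_by_key(records)
print(len(kept), removed)  # 2 2
```

In this scheme the reported number is the count of molecules removed, not the count of duplicate pairs.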
When I try to do the same in Python by calculating the Tanimoto
coefficient between each pair of compounds (Tc = 1 would indicate a
duplicate), I don't find any duplicates. How is this possible?
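If fingerprints A and B are treated as sets of on-bits, Tc = |A ∩ B| / |A ∪ B|. Tc is therefore 1 exactly when the two bit sets are identical. The reverse does not hold, because different molecules can share a fingerprint. Here is a small stand-alone sketch that uses plain Python sets in place of pybel fingerprint objects. The bit positions are made up. On Python sets `|` means set union, whereas on pybel Fingerprint objects it returns the Tanimoto coefficient.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = a | b  # set union here, NOT pybel's Tanimoto operator
    if not union:
        return 0.0  # convention chosen for this sketch when both are empty (assumption)
    return float(len(a & b)) / len(union)


fp1 = {1, 5, 9, 42}
fp2 = {1, 5, 9, 42}
fp3 = {1, 5, 7}
print(tanimoto(fp1, fp2))  # 1.0
print(tanimoto(fp1, fp3))  # 0.4
```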
The Python code is below:
import pybel  # Open Babel 2.x; with Open Babel 3.x use: from openbabel import pybel

def createFPS():
    """Read every molecule and store its title and default (FP2) fingerprint."""
    before = 0
    Phytochemicals = []
    for phyto in pybel.readfile("sdf", "./phyto3000.sdf"):
        before += 1
        Phytochemicals.append({"Name": phyto.title, "FPS": phyto.calcfp()})
    print("Phytochemicals in original sdf: %d" % before)
    return Phytochemicals

def fDuplicated(Phytochemicals):
    """Count pairs of molecules whose fingerprints have a Tanimoto coefficient of 1."""
    stop = len(Phytochemicals)
    count = 0
    for x in range(stop):
        # start at x + 1 so each pair is compared once, not twice
        for z in range(x + 1, stop):
            # on pybel Fingerprint objects, | returns the Tanimoto coefficient
            Tc = Phytochemicals[x]["FPS"] | Phytochemicals[z]["FPS"]
            if Tc == 1.0:
                print("Tc equal to 1: %s %s" % (Phytochemicals[x]["Name"],
                                                Phytochemicals[z]["Name"]))
                count += 1
    print("Total Tc equal to 1: %d" % count)

Phytochemicals = createFPS()
fDuplicated(Phytochemicals)
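The two numbers also measure different things. `--unique` reports how many molecules it dropped, while a pairwise loop counts pairs. With three identical molecules, a first-occurrence filter drops two, but there are three identical pairs. If both (x, z) and (z, x) are visited, each match is counted twice, which gives six. Below is a stand-alone sketch that compares each unordered pair once and reports both counts. It uses made-up set fingerprints and hypothetical helper names rather than pybel objects.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = a | b  # set union on plain Python sets
    return float(len(a & b)) / len(union) if union else 0.0


def count_identical(fps):
    """Return (identical pairs, molecules whose fingerprint matches an earlier one)."""
    pairs = 0
    repeats = set()
    for i, j in combinations(range(len(fps)), 2):  # each unordered pair once, i < j
        if tanimoto(fps[i], fps[j]) == 1.0:
            pairs += 1
            repeats.add(j)  # j duplicates something earlier in the list
    return pairs, len(repeats)


# Hypothetical fingerprints: entries 0, 1 and 3 are identical.
fps = [{1, 2, 3}, {1, 2, 3}, {4, 5}, {1, 2, 3}]
print(count_identical(fps))  # (3, 2)
```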
--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет
Jurgens de Bruin
_______________________________________________
OpenBabel-scripting mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-scripting