Hi all,
I'm using Pybel to compute the similarity between a query molecule and a
database of molecules.
While running some simple tests, I found the following:

Let's say that my query is compound 44968246 from PubChem. Its SMILES
string is:
'CC(C1=CC=C(O1)C2=CC3=C(C=C2)N=CN=C3NCCC4=CN=CN4)O'

My program returns a list of similar compounds and, out of curiosity, I wanted
to check that the Tanimoto values are the same when I compute them one by one
using Pybel. One of the hits is compound 44968247 from PubChem, whose SMILES
string is:
'C1=CC2=C(C=C1C3=CC=C(O3)CO)C(=NC=N2)NCC4=NC=CN4'

Computing the Tanimoto coefficient on the MACCS fingerprints gives me 0.822:
>>> import pybel
>>> mols = ['CC(C1=CC=C(O1)C2=CC3=C(C=C2)N=CN=C3NCCC4=CN=CN4)O',
...         'C1=CC2=C(C=C1C3=CC=C(O3)CO)C(=NC=N2)NCC4=NC=CN4']
>>> molec = [pybel.readstring("smi", x) for x in mols]
>>> fps = [x.calcfp('MACCS') for x in molec]
>>> print fps[0] | fps[1]
0.822222222222
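
As far as I understand, the | operator on two Pybel fingerprints returns the
Tanimoto coefficient, i.e. the number of bits set in both fingerprints divided
by the number of bits set in either. A minimal sketch of that definition in
plain Python, working on lists of set bits (the function name `tanimoto` is my
own and not part of Pybel, and returning 0.0 for two empty fingerprints is an
assumed convention):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as lists of set bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Assumed convention: two empty fingerprints have similarity 0.0.
        return 0.0
    # float() keeps the division exact under both Python 2 and Python 3.
    return len(a & b) / float(len(union))


# Toy example: 2 shared bits out of 4 distinct bits in total.
print(tanimoto([1, 2, 3], [2, 3, 4]))  # 0.5
```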

But when I save molecule 44968247 to an SDF file (attached) and
read the molecule from that file using
>>> mol2 = pybel.readfile("sdf",
...     "/mmb/data/Medicahead/WP2/activePubchemCompound/44968247.sdf").next()
the computation gives me
>>> fp2 = mol2.calcfp("MACCS")
>>> print fps[0] | fp2
0.711538461538

I have compared the lists of set bits obtained from 1/ the SMILES string and
2/ the SDF file, and they are definitely different:
1/ [8, 11, 38, 54, 57, 62, 65, 72, 77, 79, 80, 82, 83, 96, 100, 104, 105,
109, 111, 120, 121, 131, 132, 133, 135, 137, 138, 139, 142, 151, 152, 153,
155, 156, 157, 158, 159, 161, 162, 164, 165]
2/ [8, 11, 38, 54, 57, 62, 65, 72, 75, 77, 79, 80, 82, 83, 96, 100, 104,
105, 109, 111, 112, 120, 121, 122, 126, 131, 132, 133, 135, 137, 138, 139,
142, 144, 148, 150, 151, 152, 153, 155, 156, 157, 158, 159, 161, 162, 164,
165]
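
To make the difference explicit, here is a minimal sketch in plain Python that
compares the two lists above. The variable names are my own, and the lists are
copied by hand from this message rather than read from Pybel:

```python
# Set bits of compound 44968247 when read from the SMILES string (list 1/).
bits_smiles = [8, 11, 38, 54, 57, 62, 65, 72, 77, 79, 80, 82, 83, 96, 100,
               104, 105, 109, 111, 120, 121, 131, 132, 133, 135, 137, 138,
               139, 142, 151, 152, 153, 155, 156, 157, 158, 159, 161, 162,
               164, 165]

# Set bits of the same compound when read from the SDF file (list 2/).
bits_sdf = [8, 11, 38, 54, 57, 62, 65, 72, 75, 77, 79, 80, 82, 83, 96, 100,
            104, 105, 109, 111, 112, 120, 121, 122, 126, 131, 132, 133, 135,
            137, 138, 139, 142, 144, 148, 150, 151, 152, 153, 155, 156, 157,
            158, 159, 161, 162, 164, 165]

# Bits present in one list but not in the other.
only_in_sdf = sorted(set(bits_sdf) - set(bits_smiles))
only_in_smiles = sorted(set(bits_smiles) - set(bits_sdf))

print(only_in_sdf)     # [75, 112, 122, 126, 144, 148, 150]
print(only_in_smiles)  # []
```

So every bit set for the SMILES string is also set for the SDF file, and the
SDF file sets seven additional bits (48 bits in total versus 41).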

So, is this a problem with how Open Babel reads the SDF file?
Am I reading the file incorrectly?
Or does PubChem provide a SMILES string and an SDF file that do not
match?

I would be glad if someone could help me with that.

Regards,
Floriane

Attachment: 44968247.sdf
Description: StarMath document

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss