Hi everybody,

This post is a follow-up to a previous one called "similarity search".
I was having trouble with similarity searches when the molecule used as
the query is not present in the database.
I got good advice from Chris, but in the meantime, while doing some
"experiments" (trying different ways of running the similarity search), I
noticed something odd: the Tanimoto values differ depending on the method
used. It seems that nobody has reported this before, so maybe my methods
are wrong.
Here are a few commands and the results I obtained:

1/ using .smi format and -aa option:

obabel activeCpds12012011.fs -O babar.smi -aa -at0.7 -s"FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1" -xfMACCS

FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1    49830252 0.926829
n1(C(C)C)c2ncnc(NCc3ccccc3)c2nc1    10730551 0.829268
Fc1ccc(Cc2cc3n(C(C)C)cnc3c(NCc3ccccc3)c2)cc1    44143083 0.813953
FC(F)(F)c1cc(n2c3ncnc(NCc4cccnc4)c3c(c2)c2ccccc2)ccc1    1287650 0.809524
n1(C(C)C)c2nc(nc(NCc3ccccc3)c2nc1)NCc1ccccc1    10523324 0.795455
Fc1ccc(NCc2nc3n(c2)c(ccc3)C)cc1    1263260 0.790698
n1(C(C)C)c2c(nc1)cc(NCc1ncccc1)cc2    1842164 0.785714
Fc1cc(n2c3nc(nc(NCC)c3nc2)C#N)cc(F)c1    44143091 0.777778
FC(F)(F)c1nc(NCC)c2[nH]cnc2n1    4574429 0.777778
FC(F)CNc1nc(nc2n(c3cc(F)cc(F)c3)cnc12)C#N    44460082 0.772727


2/ using .fpt format (the query is in the database anyway)

obabel activeCpds12012011.fs -O babar.fpt -at0.7 -s"FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1" -xfMACCS

>49830252
>10730551   Tanimoto from 49830252 = 0.902439
>44143083   Tanimoto from 49830252 = 0.883721
>1287650   Tanimoto from 49830252 = 0.880952
>10523324   Tanimoto from 49830252 = 0.863636
>1263260   Tanimoto from 49830252 = 0.860465
>1842164   Tanimoto from 49830252 = 0.857143
>44143091   Tanimoto from 49830252 = 0.844444
>4574429   Tanimoto from 49830252 = 0.844444
>44460082   Tanimoto from 49830252 = 0.840909
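
To line up the two outputs by compound ID, a small parser along these lines could be used. It is only a sketch: the line layouts are assumed from the output pasted above (first three hits only), and the names parse_fpt and parse_smi are made up for illustration, not part of Open Babel.

```python
import re

# First lines of the .fpt output of method 2, copied from above.
FPT_OUTPUT = """\
>49830252
>10730551   Tanimoto from 49830252 = 0.902439
>44143083   Tanimoto from 49830252 = 0.883721
"""

# Matching lines of the .smi output of method 1: SMILES, title, score.
SMI_OUTPUT = """\
FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1    49830252 0.926829
n1(C(C)C)c2ncnc(NCc3ccccc3)c2nc1    10730551 0.829268
Fc1ccc(Cc2cc3n(C(C)C)cnc3c(NCc3ccccc3)c2)cc1    44143083 0.813953
"""


def parse_fpt(text):
    """Map title -> Tanimoto for '>title   Tanimoto from X = value' lines."""
    pattern = re.compile(r"^>(\S+)\s+Tanimoto from \S+ = ([0-9.]+)$")
    scores = {}
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:  # the bare '>49830252' reference line carries no score
            scores[match.group(1)] = float(match.group(2))
    return scores


def parse_smi(text):
    """Map title -> score for 'SMILES title score' lines."""
    scores = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            scores[fields[1]] = float(fields[2])
    return scores


fpt_scores = parse_fpt(FPT_OUTPUT)
smi_scores = parse_smi(SMI_OUTPUT)

# Print the method 1 and method 2 scores side by side for each shared hit.
for title in sorted(fpt_scores):
    print(title, smi_scores[title], fpt_scores[title])
```

Keying on the title rather than on line order means the comparison still works if the two runs return the hits in a different order.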


3/ Using the Pybel library and directly computing the Tanimoto coefficient
between two molecules (query against 10730551 and query against 44143083)

>>> import pybel
>>> # query molecule
>>> mol1 = pybel.readstring("smi", "FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1")
>>> fp1 = mol1.calcfp("MACCS")
>>> # first molecule found in the db that is not the query itself (10730551)
>>> mol2 = pybel.readstring("smi", "n1(C(C)C)c2ncnc(NCc3ccccc3)c2nc1")
>>> fp2 = mol2.calcfp("MACCS")
>>> fp1 | fp2
0.86842105263157898
>>> # second molecule (44143083)
>>> mol3 = pybel.readstring("smi", "Fc1ccc(Cc2cc3n(C(C)C)cnc3c(NCc3ccccc3)c2)cc1")
>>> fp3 = mol3.calcfp("MACCS")
>>> fp1 | fp3
0.84999999999999998
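
For cross-checking, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either one. The sketch below recomputes it from lists of set-bit indices, which I assume could be taken from the `bits` attribute of a Pybel fingerprint. The function name `tanimoto` and the toy bit lists are made up for illustration and are not real MACCS bits.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as set-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    # float() keeps true division under Python 2 as well as Python 3
    return len(a & b) / float(len(union))


# Toy bit lists (hypothetical values, not real MACCS fingerprints)
query_bits = [1, 5, 9, 42]
other_bits = [1, 5, 9, 77]

print(tanimoto(query_bits, query_bits))  # identical fingerprints give 1.0
print(tanimoto(query_bits, other_bits))  # 3 shared bits out of 5 gives 0.6
```

By this definition a fingerprint compared with itself always scores 1.0, so a lower self-score means the two sides of the comparison were not the same fingerprint.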

Another notable fact is that including or omitting -xfMACCS doesn't change
the results for either of the first two methods. With Pybel the choice of
fingerprint does change the results, but they still differ from those of
both previous methods.

Am I doing something wrong in my command lines? The .fs database was also
built with MACCS fingerprints.
Which figures can be trusted? And why, in the first case, is the Tanimoto
score of the query against itself not 1.0 (first line of the output)?

Thanks in advance,

Floriane Montanari
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
