Hi everybody,
This post is a follow-up to a previous one called "similarity
search". I was having trouble with similarity searches when the
molecule used as the query is not present in the database.
I got good advice from Chris, but in the meantime, while doing some
"experiments" (trying different ways of running the similarity search), I
noticed something odd: the Tanimoto values differ depending on the method
that is used. It seems that nobody has reported this before, so maybe my
methods are wrong.
I have pasted a few commands and their results below:
1/ using .smi format and -aa option:
obabel activeCpds12012011.fs -O babar.smi -aa -at0.7 -s"FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1" -xfMACCS
FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1 49830252 0.926829
n1(C(C)C)c2ncnc(NCc3ccccc3)c2nc1 10730551 0.829268
Fc1ccc(Cc2cc3n(C(C)C)cnc3c(NCc3ccccc3)c2)cc1 44143083 0.813953
FC(F)(F)c1cc(n2c3ncnc(NCc4cccnc4)c3c(c2)c2ccccc2)ccc1 1287650 0.809524
n1(C(C)C)c2nc(nc(NCc3ccccc3)c2nc1)NCc1ccccc1 10523324 0.795455
Fc1ccc(NCc2nc3n(c2)c(ccc3)C)cc1 1263260 0.790698
n1(C(C)C)c2c(nc1)cc(NCc1ncccc1)cc2 1842164 0.785714
Fc1cc(n2c3nc(nc(NCC)c3nc2)C#N)cc(F)c1 44143091 0.777778
FC(F)(F)c1nc(NCC)c2[nH]cnc2n1 4574429 0.777778
FC(F)CNc1nc(nc2n(c3cc(F)cc(F)c3)cnc12)C#N 44460082 0.772727
2/ using .fpt format (the query is in the database anyway)
obabel activeCpds12012011.fs -O babar.fpt -at0.7 -s"FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1" -xfMACCS
>49830252
>10730551 Tanimoto from 49830252 = 0.902439
>44143083 Tanimoto from 49830252 = 0.883721
>1287650 Tanimoto from 49830252 = 0.880952
>10523324 Tanimoto from 49830252 = 0.863636
>1263260 Tanimoto from 49830252 = 0.860465
>1842164 Tanimoto from 49830252 = 0.857143
>44143091 Tanimoto from 49830252 = 0.844444
>4574429 Tanimoto from 49830252 = 0.844444
>44460082 Tanimoto from 49830252 = 0.840909
3/ using the Pybel library to compute the Tanimoto coefficient directly
between two molecules (query against 10730551 and query against 44143083):
>>> import pybel
>>> # query molecule
>>> mol1 = pybel.readstring("smi", "FC(F)(F)c1cc(CNc2ncnc3n(C(C)C)cnc23)ccc1")
>>> fp1 = mol1.calcfp("MACCS")
>>> # first molecule found in the db that is not the query itself (10730551)
>>> mol2 = pybel.readstring("smi", "n1(C(C)C)c2ncnc(NCc3ccccc3)c2nc1")
>>> fp2 = mol2.calcfp("MACCS")
>>> fp1 | fp2  # in Pybel, | between two fingerprints gives the Tanimoto coefficient
0.86842105263157898
>>> # second molecule (44143083)
>>> mol3 = pybel.readstring("smi", "Fc1ccc(Cc2cc3n(C(C)C)cnc3c(NCc3ccccc3)c2)cc1")
>>> fp3 = mol3.calcfp("MACCS")
>>> fp1 | fp3
0.84999999999999998
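For reference, here is a minimal sketch of the Tanimoto coefficient as I understand it: the number of bits set in both fingerprints divided by the number of bits set in either. The function name and the bit positions are made up for illustration (they are not real MACCS bits), and I am not claiming this is how Open Babel implements it internally.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of set-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    # float() keeps the division exact under both Python 2 and Python 3
    return float(len(a & b)) / len(union)


# Hypothetical bit positions, for illustration only
query = [1, 5, 9, 42, 77]
other = [1, 5, 9, 42, 80, 91]

print(tanimoto(query, other))  # 4 shared bits out of 7 distinct bits
print(tanimoto(query, query))  # identical fingerprints
```

With this definition a fingerprint compared with itself always scores 1.0, which is why the 0.926829 on the first line of method 1 surprises me. If I read the Pybel documentation correctly, the set bits of a fingerprint are available as fp1.bits, so the same function could be used to cross-check the values above.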
Another notable point is that passing -xfMACCS or not makes no difference
to the results of either of the first two methods. With Pybel the choice of
fingerprint does change the values, but they are still different from those
of both previous methods.
Am I doing something wrong in my command lines? The .fs database was built
with MACCS fingerprints as well.
Which figures can be trusted? And why, in the first case, is the Tanimoto
score of the query against itself not 1.0 (first line of the output)?
Thanks in advance,
Floriane Montanari