Hi all,

I'm using Pybel to compute the similarity between a query molecule and a database of molecules. While running some simple tests, I found the following.

Let's say that my query is compound 44968246 from PubChem. Its SMILES string is:

    'CC(C1=CC=C(O1)C2=CC3=C(C=C2)N=CN=C3NCCC4=CN=CN4)O'

My program finds a list of similar compounds, and out of curiosity I wanted to check that the Tanimoto values are the same when I compute them one by one using Pybel. One of the hits is compound 44968247 from PubChem, whose SMILES string is:

    'C1=CC2=C(C=C1C3=CC=C(O3)CO)C(=NC=N2)NCC4=NC=CN4'

Computing the Tanimoto coefficient on MACCS fingerprints from the two SMILES strings gives me 0.822:

    >>> import pybel
    >>> mols = ['CC(C1=CC=C(O1)C2=CC3=C(C=C2)N=CN=C3NCCC4=CN=CN4)O',
    ...         'C1=CC2=C(C=C1C3=CC=C(O3)CO)C(=NC=N2)NCC4=NC=CN4']
    >>> molec = [pybel.readstring("smi", x) for x in mols]
    >>> fps = [x.calcfp('MACCS') for x in molec]
    >>> print fps[0] | fps[1]
    0.822222222222

But when I save molecule 44968247 to an SDF file (attached) and read it back from that file, the same computation gives me 0.712:

    >>> mol2 = pybel.readfile("sdf", "/mmb/data/Medicahead/WP2/activePubchemCompound/44968247.sdf").next()
    >>> fp2 = mol2.calcfp("MACCS")
    >>> print fps[0] | fp2
    0.711538461538

I have compared the lists of on bits for compound 44968247 obtained from (1) the SMILES string and (2) the SDF file, and they are definitely different:

1/ [8, 11, 38, 54, 57, 62, 65, 72, 77, 79, 80, 82, 83, 96, 100, 104, 105, 109, 111, 120, 121, 131, 132, 133, 135, 137, 138, 139, 142, 151, 152, 153, 155, 156, 157, 158, 159, 161, 162, 164, 165]

2/ [8, 11, 38, 54, 57, 62, 65, 72, 75, 77, 79, 80, 82, 83, 96, 100, 104, 105, 109, 111, 112, 120, 121, 122, 126, 131, 132, 133, 135, 137, 138, 139, 142, 144, 148, 150, 151, 152, 153, 155, 156, 157, 158, 159, 161, 162, 164, 165]

So:

- Is it a problem with Open Babel reading the SDF file?
- Is it a problem with how I am reading it?
- Is it a problem with PubChem providing a SMILES string and an SDF file that do not match?

I would be glad if someone could help me with this.

Regards,
Floriane
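To make the difference between the two fingerprints easier to see, the two on-bit lists quoted above can be compared directly. The following is a minimal sketch in plain Python that does not call Open Babel at all; the variable names and the `tanimoto` helper are illustrative, and the helper simply applies the usual set definition of the Tanimoto coefficient (shared on bits divided by the union of on bits).

```python
# On-bit lists copied from the message (same compound 44968247, two input formats).
bits_from_smiles = set([
    8, 11, 38, 54, 57, 62, 65, 72, 77, 79, 80, 82, 83, 96, 100, 104, 105,
    109, 111, 120, 121, 131, 132, 133, 135, 137, 138, 139, 142, 151, 152,
    153, 155, 156, 157, 158, 159, 161, 162, 164, 165,
])
bits_from_sdf = set([
    8, 11, 38, 54, 57, 62, 65, 72, 75, 77, 79, 80, 82, 83, 96, 100, 104,
    105, 109, 111, 112, 120, 121, 122, 126, 131, 132, 133, 135, 137, 138,
    139, 142, 144, 148, 150, 151, 152, 153, 155, 156, 157, 158, 159, 161,
    162, 164, 165,
])


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on bits: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return float(len(a & b)) / len(union)


# Bits set in one fingerprint but not in the other.
only_in_sdf = sorted(bits_from_sdf - bits_from_smiles)
only_in_smiles = sorted(bits_from_smiles - bits_from_sdf)
self_similarity = tanimoto(bits_from_smiles, bits_from_sdf)

print("only in SDF fingerprint:    %s" % only_in_sdf)
print("only in SMILES fingerprint: %s" % only_in_smiles)
print("Tanimoto between the two:   %.3f" % self_similarity)
```

With the lists as quoted, every bit from the SMILES fingerprint is also set in the SDF fingerprint, and the SDF fingerprint sets seven additional bits (75, 112, 122, 126, 144, 148, 150), so the two fingerprints of the same compound have a Tanimoto coefficient of 41/48, about 0.854.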
44968247.sdf
_______________________________________________ OpenBabel-discuss mailing list OpenBabel-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-discuss