[Open Babel] problem with conversion from smi to fs followed by similarity search

Chris Mayne Tue, 11 Oct 2011 10:32:21 -0700

I'm having a problem converting certain SMILES structures to the
fastsearch format and then running similarity searches against the
resulting index.  I'm currently working with a large dataset (~160,000
compounds), and some compounds that I know are in the FS file do not
appear in the search output.  Here is a representative example.


Take the following SMILES string:
c1cccc(c1)C(C(OC(=O)C)C[N+](=O)[O-])c1ccccc1
As far as I can tell, it is a valid SMILES string.  It came from a babel
conversion of an SDF file to a SMI file, and it pastes into ChemDraw just
fine.  If I put it into a tab-delimited file, test.smi, as:

c1cccc(c1)C(C(OC(=O)C)C[N+](=O)[O-])c1ccccc1    cmpd001

and then generate the fastsearch index:

>babel -ismi test.smi -ofs test.fs

The index appears to build without error (i.e. 1 molecule converted).
However, if I then probe that fs index with the exact input string:

>babel test.fs hitlist.smi -s'c1cccc(c1)C(C(OC(=O)C)C[N+](=O)[O-])c1ccccc1' -at0.85

I get 0 molecules converted.  I have also tried leaving the input in the
original SDF format, but I get the same problem.  I'm currently using Open
Babel 2.3.0.  I thought that the development version might contain some bug
fixes, but I have not yet been successful in compiling it from source (which
is a nightmare, btw).
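
To spell out why a hit is expected here: assuming -at0.85 applies a Tanimoto
cutoff to the fingerprints stored in the index, a query that is the same
structure as an indexed molecule should have the same fingerprint and
therefore score 1.0, well above 0.85.  A minimal Python sketch of that
arithmetic follows.  The bit positions are made up for illustration and are
not real Open Babel fingerprint bits, and the tanimoto helper is hypothetical,
not part of Open Babel.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical on-bit positions; NOT real Open Babel fingerprint bits.
query_bits = {3, 17, 42, 101, 256, 511}
indexed_bits = set(query_bits)          # same structure, so same fingerprint
near_miss_bits = {3, 17, 42, 101, 900}  # shares 4 of 7 distinct bits with the query

THRESHOLD = 0.85  # the value passed via -at0.85 in the command above

identical_score = tanimoto(query_bits, indexed_bits)
near_miss_score = tanimoto(query_bits, near_miss_bits)

# An identical fingerprint passes the cutoff; the near miss does not.
print("identical passes:", identical_score >= THRESHOLD)
print("near miss passes:", near_miss_score >= THRESHOLD)
```

Under that assumption, getting zero hits for the exact input string suggests
that either the stored fingerprint or the query fingerprint is not what it
should be, rather than the threshold being too strict.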

Any ideas?
Chris


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
