Dear Bob,

On Sun, Jul 25, 2010 at 7:43 PM, bob-bates <[email protected]> wrote:
> I created a database using CambridgeSoft ChemFinder (version 12) and the
> pubchem.200.sdf data file.  Using the smiles query 'O=CNC' ChemFinder
> returns 125 hits and DbCLI returns 73 hits.  This doesn't seem like a
> difficult query; any ideas why DbCLI is missing so many structures?

At least part of this behavior is due to a bug in the code that's used
to generate fingerprints to speed up the substructure searching. The
bug has already been fixed in subversion so it will be in the next
release.
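
For anyone wondering how a fingerprint bug can lose hits, here is a toy sketch of fingerprint screening in general. It is not the RDKit or DbCLI code; the names, bit patterns, and the "exact match" stand-in are all made up for illustration. The idea is that a target is only sent to the full substructure match if every bit set in the query fingerprint is also set in the target fingerprint, so a target fingerprint with a missing bit gets rejected before the real match is ever tried.

```python
def passes_screen(query_fp, target_fp):
    """True if every bit set in the query is also set in the target."""
    return (query_fp & target_fp) == query_fp


def search(query_fp, targets, exact_match, use_screen=True):
    """Return the names of targets that match, optionally pre-screening by fingerprint."""
    hits = []
    for name, target_fp in targets.items():
        if use_screen and not passes_screen(query_fp, target_fp):
            continue  # rejected by the screen; the exact match is never run
        if exact_match(name):
            hits.append(name)
    return hits


# Hypothetical data: integers stand in for bit-vector fingerprints.
query_fp = 0b0011
targets = {
    "mol_a": 0b1011,  # correct fingerprint, contains both query bits
    "mol_b": 0b1001,  # simulated bug: one query bit is missing
    "mol_c": 0b0100,  # genuinely does not contain the query
}
true_matches = {"mol_a", "mol_b"}  # stand-in for the real substructure test


def is_match(name):
    return name in true_matches


with_screen = search(query_fp, targets, is_match, use_screen=True)
without_screen = search(query_fp, targets, is_match, use_screen=False)
print(with_screen)
print(without_screen)
```

With the screen on, mol_b is dropped even though it is a true match; with the screen off it comes back, at the cost of running the slow match on every target.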

In the near term, if you're feeling brave, you can grab a copy of the
code from svn and rebuild. Otherwise, the easiest fix is to edit the
file $RDBASE/Projects/DbCLI/SearchDb.py and change line 280 from:
      if os.path.exists(fpDbName):
to:
      if 0 and os.path.exists(fpDbName):
That will disable the use of substructure fingerprints and get you to
103 hits. I would guess that the remaining 22 mismatches are due to
differences in the search semantics between ChemFinder and the RDKit.
The SMILES query O=CNC only matches in the RDKit if the corresponding
bonds in the target molecule have the same bond orders as in the query
(a double C=O bond and single C-N bonds); aromatic bonds will not match.
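
A quick way to see this bond-matching behavior from Python is sketched below. The two example molecules (N-methylacetamide and 1-methyl-2-pyridone) and the looser SMARTS pattern are my own choices for illustration and are not taken from the pubchem.200.sdf set; whether ChemFinder counts structures like the pyridone as hits is an assumption, not something verified here. The import is guarded only so the snippet still loads where the RDKit is not installed.

```python
try:
    from rdkit import Chem
except ImportError:  # lets the snippet load without the RDKit
    Chem = None


def has_match(target_smiles, query):
    """True if the query molecule/pattern is a substructure of the target."""
    target = Chem.MolFromSmiles(target_smiles)
    return target.HasSubstructMatch(query)


if Chem is not None:
    # Query parsed as SMILES: bond orders must match (C=O double, C-N single).
    smiles_query = Chem.MolFromSmiles("O=CNC")
    # Illustrative SMARTS alternative: '~' accepts any bond, including aromatic.
    smarts_query = Chem.MolFromSmarts("[#8]=[#6]~[#7]~[#6]")

    examples = {
        "N-methylacetamide": "CC(=O)NC",       # plain amide, single C-N bonds
        "1-methyl-2-pyridone": "Cn1ccccc1=O",  # ring C-N bonds are aromatic
    }
    for name, smi in examples.items():
        print(name,
              has_match(smi, smiles_query),
              has_match(smi, smarts_query))
```

The amide should match both queries, while the pyridone should only match the SMARTS version, because its ring C-N bond is aromatic rather than single.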

Best Regards,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
