Dear all,

I just checked in some changes that allow SMARTS to be used for
substructure queries with the PostgreSQL cartridge.

Because SMARTS-based queries can be substantially slower than
SMILES-based ones, and SMILES is sufficient much of the time, queries
are still done with SMILES by default. To get a SMARTS-based query, you
have to ask for it explicitly by creating a query molecule (a qmol).

So, for example, this is a SMILES query:

chembl=# select count(*) from mols where m@>'c1ccc2c(c1)CCN2';
 count
-------
  2880
(1 row)

Time: 2855.536 ms

Here's the same query done with SMARTS by casting to a qmol:

chembl=# select count(*) from mols where m@>'c1ccc2c(c1)CCN2'::qmol;
 count
-------
  2880
(1 row)

Time: 20788.329 ms

Notice that the results are (this time) the same, but the SMARTS
version takes roughly seven times as long (about 20.8 s versus 2.9 s).
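To see where the extra time goes, the two forms of the query can be
compared with PostgreSQL's standard EXPLAIN ANALYZE. This is only a
sketch: it assumes the chembl example database used above (table mols
with molecule column m and the molidx index), and I have not recorded
the plans it produces.

```sql
-- Assumes the chembl example database from this message:
-- table "mols", mol column "m", GiST index "molidx".
\timing on

-- SMILES-based query (the default behaviour)
explain analyze
select count(*) from mols where m@>'c1ccc2c(c1)CCN2';

-- SMARTS-based query, requested by casting the pattern to qmol
explain analyze
select count(*) from mols where m@>'c1ccc2c(c1)CCN2'::qmol;
```

Comparing the two plans shows whether the index is used in each case
and how many candidate rows have to be rechecked after the index scan.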

An example using query features (here the atom list [c,n], which
matches either an aromatic carbon or an aromatic nitrogen):

chembl=# select count(*) from mols where m@>'c1[c,n]cc2c(c1)CCN2'::qmol;
 count
-------
  2885
(1 row)

Time: 21717.863 ms

While constructing your queries, don't forget that the semantics of
SMARTS and SMILES are different, as this example illustrates:

chembl=# select count(*) from mols where m@>'C1=CC=CC=C1';
 count
--------
 191900
(1 row)

Time: 12085.102 ms
chembl=# select count(*) from mols where m@>'C1=CC=CC=C1'::qmol;
 count
-------
     0
(1 row)

Time: 12946.744 ms
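
The difference comes from how the pattern is read. As SMILES,
C1=CC=CC=C1 is parsed as a molecule and perceived as aromatic benzene;
as SMARTS, an uppercase C matches only aliphatic carbon, so the same
string cannot match an aromatic ring. A sketch of two follow-up queries
that should make this visible is below. It assumes the same mols table
as above; I have not run these, so no counts are shown.

```sql
-- Assumes the chembl example table "mols" with mol column "m".

-- Aromatic form of the pattern as SMARTS: lowercase c is aromatic carbon,
-- so this should match benzene rings, unlike 'C1=CC=CC=C1'::qmol.
select count(*) from mols where m@>'c1ccccc1'::qmol;

-- A few molecules that match the Kekule pattern as SMILES
-- but not when the same string is interpreted as SMARTS.
select m from mols
 where m@>'C1=CC=CC=C1'
   and not (m@>'C1=CC=CC=C1'::qmol)
 limit 5;
```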

If you try out the new code, you will need to rebuild any molecule
indices. Here's an example of doing that from the chembl example
(http://code.google.com/p/rdkit/wiki/DatabaseCreation):
  drop index molidx;
  create index molidx on mols using gist(m);

Along the way I made some changes to the layered fingerprinting code
so that it is more effective for molecules containing query features.
Fingerprints generated with the new code are not compatible with those
from the older version, so if you have layered fingerprints stored
anywhere, they should be rebuilt.

I'm not completely happy with the performance of the SMARTS-based
queries, so I will continue to tweak the fingerprint, but I thought it
was important to at least provide the option to do these queries.

For those who don't want to build from subversion, these changes will
be in the Q4 release.

Best Regards,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
