Hi RDKitters,

I'm facing a performance issue with the RDKit cartridge.
The database contains roughly 170k small molecules. I'm using cartridge
version 0.20.0 on PostgreSQL 8.4.7, with tanimoto_threshold set to 0.5.
A simple similarity search takes at least 30 seconds to complete.
The database was vacuumed recently.
Any hints are most welcome!

Cheers,

Grégori


                                             Table "public.test_db"
 Column |         Type          |                      Modifiers                       | Storage  | Description
--------+-----------------------+------------------------------------------------------+----------+-------------
 rid    | integer               | not null default nextval('test_db_id_seq'::regclass) | plain    |
 smi    | mol                   |                                                      | extended |
Indexes:
    "test_db_pkey" PRIMARY KEY, btree (rid)
    "ididx" btree (rid)
    "molidx" gist (smi)
Referenced by:
    TABLE "test_db_fingerprints" CONSTRAINT "test_db_fingerprints_rid_fkey" FOREIGN KEY (rid) REFERENCES test_db(rid)
Has OIDs: no

           Table "public.test_db_fingerprints"
  Column   |  Type   | Modifiers | Storage  | Description
-----------+---------+-----------+----------+-------------
 rid       | integer |           | plain    |
 pairbv    | bfp     |           | extended |
 torsionbv | bfp     |           | extended |
 morganbv2 | bfp     |           | extended |
Indexes:
    "apbvidx" gist (pairbv)
    "morganbvidx" gist (morganbv2)
    "rididx" btree (rid)
    "torsbvidx" gist (torsionbv)
Foreign-key constraints:
    "test_db_fingerprints_rid_fkey" FOREIGN KEY (rid) REFERENCES test_db(rid)
Has OIDs: no


explain analyze
select test_db.rid, test_db.smi,
       tanimoto_sml(atompairbv_fp('CN1C=NC2=C1C(=O)N(C(=O)N2C)C'), pairbv) sml
from test_db_fingerprints
right join test_db on test_db.rid = test_db_fingerprints.rid
where atompairbv_fp('CN1C=NC2=C1C(=O)N(C(=O)N2C)C') % pairbv
order by sml desc
limit 20;


                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=2037.62..2037.67 rows=20 width=837) (actual time=37990.369..37990.406 rows=11 loops=1)
   ->  Sort  (cost=2037.62..2038.05 rows=172 width=837) (actual time=37990.365..37990.379 rows=11 loops=1)
         Sort Key: (tanimoto_sml('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp, test_db_fingerprints.pairbv))
         Sort Method:  quicksort  Memory: 22kB
         ->  Nested Loop  (cost=98.53..2033.05 rows=172 width=837) (actual time=37726.008..37990.284 rows=11 loops=1)
               ->  Bitmap Heap Scan on test_db_fingerprints  (cost=98.53..713.44 rows=172 width=222) (actual time=37686.483..37806.422 rows=11 loops=1)
                     Recheck Cond: ('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp % pairbv)
                     ->  Bitmap Index Scan on apbvidx  (cost=0.00..98.49 rows=172 width=0) (actual time=37661.723..37661.723 rows=11 loops=1)
                           Index Cond: ('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp % pairbv)
               ->  Index Scan using test_db_pkey on test_db  (cost=0.00..7.63 rows=1 width=623) (actual time=16.634..16.639 rows=1 loops=11)
                     Index Cond: (test_db.rid = test_db_fingerprints.rid)
 Total runtime: 37990.523 ms
(12 rows)
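
To make the intent of the query above explicit, here is a minimal Python sketch of what it is meant to compute: keep the rows whose Tanimoto similarity to the query fingerprint passes the threshold, sort by similarity in descending order, and return the top 20. This is an illustration only and not how the cartridge is implemented. Fingerprints are modeled as plain sets of on-bit indices, the names tanimoto and similarity_search are hypothetical, the toy rows are made up, and the ">=" comparison against the threshold is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_search(query_fp, rows, threshold=0.5, limit=20):
    """Return up to `limit` (rid, similarity) pairs, most similar first.

    Mirrors the shape of the SQL query: filter by threshold (the `%`
    condition), order by similarity descending, then apply the limit.
    The >= comparison is an assumption made for this sketch.
    """
    hits = []
    for rid, fp in rows:
        sml = tanimoto(query_fp, fp)
        if sml >= threshold:
            hits.append((rid, sml))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


# Toy data standing in for the (rid, pairbv) rows; not real fingerprints.
rows = [
    (1, {1, 2, 3, 4}),
    (2, {1}),
    (3, {7, 8}),
    (4, {1, 2, 3, 9}),
]
query = {1, 2, 3, 4}

for rid, sml in similarity_search(query, rows):
    print(rid, sml)
```

The sketch scans every row, which is exactly what the GiST index on pairbv is supposed to avoid in the database.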