Hi RDKitters,
I'm facing a performance issue with the RDKit cartridge.
The database contains roughly 170k small molecules. I use cartridge
version 0.20.0 on PostgreSQL 8.4.7, and tanimoto_threshold is set to 0.5.
A simple similarity search takes at least 30 seconds to complete.
The database has been recently vacuumed.
Any hints are most welcome!
Cheers,
Grégori
Table "public.test_db"
 Column |  Type   |                      Modifiers                       | Storage  | Description
--------+---------+------------------------------------------------------+----------+-------------
 rid    | integer | not null default nextval('test_db_id_seq'::regclass) | plain    |
 smi    | mol     |                                                      | extended |
Indexes:
"test_db_pkey" PRIMARY KEY, btree (rid)
"ididx" btree (rid)
"molidx" gist (smi)
Referenced by:
TABLE "test_db_fingerprints" CONSTRAINT "test_db_fingerprints_rid_fkey" FOREIGN KEY (rid) REFERENCES test_db(rid)
Has OIDs: no
Table "public.test_db_fingerprints"
  Column   |  Type   | Modifiers | Storage  | Description
-----------+---------+-----------+----------+-------------
 rid       | integer |           | plain    |
 pairbv    | bfp     |           | extended |
 torsionbv | bfp     |           | extended |
 morganbv2 | bfp     |           | extended |
Indexes:
"apbvidx" gist (pairbv)
"morganbvidx" gist (morganbv2)
"rididx" btree (rid)
"torsbvidx" gist (torsionbv)
Foreign-key constraints:
"test_db_fingerprints_rid_fkey" FOREIGN KEY (rid) REFERENCES test_db(rid)
Has OIDs: no
explain analyze
select test_db.rid, test_db.smi,
       tanimoto_sml(atompairbv_fp('CN1C=NC2=C1C(=O)N(C(=O)N2C)C'), pairbv) sml
from test_db_fingerprints
right join test_db on test_db.rid = test_db_fingerprints.rid
where atompairbv_fp('CN1C=NC2=C1C(=O)N(C(=O)N2C)C') % pairbv
order by sml desc
limit 20;
QUERY PLAN
---
Limit (cost=2037.62..2037.67 rows=20 width=837) (actual time=37990.369..37990.406 rows=11 loops=1)
  -> Sort (cost=2037.62..2038.05 rows=172 width=837) (actual time=37990.365..37990.379 rows=11 loops=1)
       Sort Key: (tanimoto_sml('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp, test_db_fingerprints.pairbv))
       Sort Method: quicksort Memory: 22kB
       -> Nested Loop (cost=98.53..2033.05 rows=172 width=837) (actual time=37726.008..37990.284 rows=11 loops=1)
            -> Bitmap Heap Scan on test_db_fingerprints (cost=98.53..713.44 rows=172 width=222) (actual time=37686.483..37806.422 rows=11 loops=1)
                 Recheck Cond: ('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp % pairbv)
                 -> Bitmap Index Scan on apbvidx (cost=0.00..98.49 rows=172 width=0) (actual time=37661.723..37661.723 rows=11 loops=1)
                      Index Cond: ('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp % pairbv)
            -> Index Scan using test_db_pkey on test_db (cost=0.00..7.63 rows=1 width=623) (actual time=16.634..16.639 rows=1 loops=11)
                 Index Cond: (test_db.rid = test_db_fingerprints.rid)
Total runtime: 37990.523 ms
(12 rows)
--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss