Re: [Rdkit-discuss] RDKit cartridge speed issue

2013-04-23 Thread Greg Landrum
Hi Gregori.

On Tue, Apr 23, 2013 at 1:57 PM, Gerebtzoff, Gregori wrote:
> Hi RDKitters,
>
> I'm facing a performance issue with the RDKit cartridge.
> The database contains roughly 170k small molecules, I use cartridge
> version 0.20.0 on PostgreSQL 8.4.7, and tanimoto_threshold is set to 0.5.
> A simple similarity search takes at least 30 seconds to complete.
> The database was vacuumed recently.
> Any hints are most welcome!

That's a pretty ancient version of both PostgreSQL and the cartridge.
Any chance you could switch to a more up-to-date version of at least
the cartridge? The newer version may have better performance and will
certainly make it easier for me to help you track down the problem.

As a reference point, here's the timing for a basic similarity search
on a table of 100k fingerprints:

fptest=# select count(*) from fps where atompairbv_fp('O=C(NCCc1cccs1)c1cc(Cl)sc1Cl')%pairbv;
 count
-------
    31
(1 row)

Time: 328.503 ms

That's running on a VM, using PostgreSQL 8.4.16 and a recent version
of the cartridge.
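
For context on the % operator used in the query: it compares the Tanimoto
similarity of two fingerprints against the tanimoto_threshold setting (0.5 in
the original post). The following is a minimal pure-Python sketch of that
test, assuming fingerprints are represented as sets of on-bit indices and
that the comparison is inclusive (>=). The bit sets and function names are
made up for illustration; this is not the cartridge's implementation.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch only
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar(a, b, threshold=0.5):
    """Assumed behaviour of the similarity test: similarity at or above threshold."""
    return tanimoto(a, b) >= threshold


# Hypothetical toy fingerprints, not real atom-pair fingerprints.
query = {1, 4, 7, 9}
close_hit = {1, 4, 7, 12}          # 3 shared bits out of 5 distinct bits
far_miss = {1, 20, 21, 22, 23}     # 1 shared bit out of 8 distinct bits

for name, fp in (("close_hit", close_hit), ("far_miss", far_miss)):
    print(name, round(tanimoto(query, fp), 3), similar(query, fp))
```

With a threshold of 0.5, only the first toy fingerprint would pass the test;
raising the threshold shrinks the result set further.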

-greg

--
Try New Relic Now & We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app, & servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDKit cartridge speed issue

2013-04-23 Thread Gerebtzoff, Gregori
Hi RDKitters,

I'm facing a performance issue with the RDKit cartridge.
The database contains roughly 170k small molecules, I use cartridge
version 0.20.0 on PostgreSQL 8.4.7, and tanimoto_threshold is set to 0.5.
A simple similarity search takes at least 30 seconds to complete.
The database was vacuumed recently.
Any hints are most welcome!

Cheers,

Grégori


                                        Table "public.test_db"
 Column |  Type   |                      Modifiers                       | Storage  | Description
--------+---------+------------------------------------------------------+----------+-------------
 rid    | integer | not null default nextval('test_db_id_seq'::regclass) | plain    |
 smi    | mol     |                                                      | extended |
Indexes:
    "test_db_pkey" PRIMARY KEY, btree (rid)
    "ididx" btree (rid)
    "molidx" gist (smi)
Referenced by:
    TABLE "test_db_fingerprints" CONSTRAINT "test_db_fingerprints_rid_fkey" FOREIGN KEY (rid) REFERENCES test_db(rid)
Has OIDs: no

           Table "public.test_db_fingerprints"
  Column   |  Type   | Modifiers | Storage  | Description
-----------+---------+-----------+----------+-------------
 rid       | integer |           | plain    |
 pairbv    | bfp     |           | extended |
 torsionbv | bfp     |           | extended |
 morganbv2 | bfp     |           | extended |
Indexes:
    "apbvidx" gist (pairbv)
    "morganbvidx" gist (morganbv2)
    "rididx" btree (rid)
    "torsbvidx" gist (torsionbv)
Foreign-key constraints:
    "test_db_fingerprints_rid_fkey" FOREIGN KEY (rid) REFERENCES test_db(rid)
Has OIDs: no


explain analyze
select test_db.rid, test_db.smi,
       tanimoto_sml(atompairbv_fp('CN1C=NC2=C1C(=O)N(C(=O)N2C)C'), pairbv) sml
from test_db_fingerprints
right join test_db on test_db.rid = test_db_fingerprints.rid
where atompairbv_fp('CN1C=NC2=C1C(=O)N(C(=O)N2C)C') % pairbv
order by sml desc
limit 20;


                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=2037.62..2037.67 rows=20 width=837) (actual time=37990.369..37990.406 rows=11 loops=1)
   ->  Sort  (cost=2037.62..2038.05 rows=172 width=837) (actual time=37990.365..37990.379 rows=11 loops=1)
         Sort Key: (tanimoto_sml('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp, test_db_fingerprints.pairbv))
         Sort Method:  quicksort  Memory: 22kB
         ->  Nested Loop  (cost=98.53..2033.05 rows=172 width=837) (actual time=37726.008..37990.284 rows=11 loops=1)
               ->  Bitmap Heap Scan on test_db_fingerprints  (cost=98.53..713.44 rows=172 width=222) (actual time=37686.483..37806.422 rows=11 loops=1)
                     Recheck Cond: ('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp % pairbv)
                     ->  Bitmap Index Scan on apbvidx  (cost=0.00..98.49 rows=172 width=0) (actual time=37661.723..37661.723 rows=11 loops=1)
                           Index Cond: ('\\340\\377\\377\\377\\000\\010\\000\\0002\\000\\000\\000\\010\\204D"\\022\\004*\\014\\004\\020\\024\\002\\020,\\016\\000\\020\\030\\036>\\000\\020\\272\\004\\336B\\034\\036\\200h\\272\\245\\000BP8>\\000\\022\\354\\204\\000:@Bq\\002\\004\\012.\\000>\\245\\002'::bfp % pairbv)
               ->  Index Scan using test_db_pkey on test_db  (cost=0.00..7.63 rows=1 width=623) (actual time=16.634..16.639 rows=1 loops=11)
                     Index Cond: (test_db.rid = test_db_fingerprints.rid)
 Total runtime: 37990.523 ms
(12 rows)
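
To see where the time goes in the plan above, it can help to tabulate each
node's share of the total runtime. The sketch below does that arithmetic in
Python with the figures copied by hand from the EXPLAIN ANALYZE output. Two
things to keep in mind: the "actual time" of a node includes its child nodes,
and it is reported per loop, so the sketch multiplies by the loop count. The
variable and function names are made up for illustration.

```python
# (node, actual end time in ms per loop, loops), copied from the plan above.
NODES = [
    ("Bitmap Index Scan on apbvidx", 37661.723, 1),
    ("Bitmap Heap Scan on test_db_fingerprints", 37806.422, 1),
    ("Index Scan using test_db_pkey on test_db", 16.639, 11),
    ("Nested Loop", 37990.284, 1),
    ("Sort", 37990.379, 1),
    ("Limit", 37990.406, 1),
]
TOTAL_MS = 37990.523  # "Total runtime" line of the plan


def share_of_total(ms_per_loop, loops=1, total_ms=TOTAL_MS):
    """Percentage of the total runtime spent in a node, children included."""
    return 100.0 * ms_per_loop * loops / total_ms


for name, ms, loops in NODES:
    elapsed = ms * loops
    print(f"{name:42s} {elapsed:10.3f} ms  {share_of_total(ms, loops):5.1f}%")
```

On these numbers the GiST index scan on apbvidx alone accounts for roughly 99%
of the runtime, while the join back to test_db (11 loops of about 16.6 ms)
contributes under 1%.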