[Rdkit-discuss] Substructure search

groberts Sat, 23 Apr 2016 12:07:08 -0700

Hello,

Very nice work on this project!


Sorry if this is a known issue. I searched the mailing list archives and
didn't find this problem reported.

Substructure searches with the PostgreSQL cartridge work perfectly and are
incredibly fast more than 99% of the time. Occasionally, however, a query
never returns a result, even after many hours on a small dataset. A good
example is this:

select count(substance_id) from substance
where rdkmol @> 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBr';

(rdkmol is a column of type mol, and it is indexed.)

The only way to stop the query is to restart PostgreSQL.

Interestingly though, the following returns the count rather quickly:

select count(substance_id) from substance
where rdkmol @> 'CCCCCCCCCCCCCCBr';

I've hit the same problem with other queries that contain repeated atoms
or components, such as the O's in this example:

select count(substance_id) from substance
where rdkmol @> 'O.O.O.O.O.O.O.O.O.O.OS(O)(=O)=O';
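A rough back-of-the-envelope sketch of why a query like this might blow up. It assumes (hypothetically; this is not a description of RDKit's actual matcher) a naive backtracking search that tries every one-to-one assignment of the k isolated O atoms in the query to the n O atoms of a target molecule before discovering that the rest of the query cannot match. The number of assignments such a search would visit is n!/(n-k)!, which Python's `math.perm` computes. The helper name `naive_assignments` is made up for illustration.

```python
from math import perm


def naive_assignments(n_target: int, k_query: int) -> int:
    """Hypothetical cost model: count the one-to-one assignments of k
    identical, unconnected query atoms onto n distinct target atoms.
    A naive search with no symmetry pruning treats each ordering as a
    separate candidate, so the count is n! / (n - k)!.
    Returns 0 when k > n (no assignment is possible)."""
    return perm(n_target, k_query)


if __name__ == "__main__":
    # k = 10 matches the ten isolated O atoms in the query above.
    for n in (10, 12, 16, 20):
        print(f"{n} target O atoms: {naive_assignments(n, 10):,} assignments")
```

Under this assumption, a target with 12 oxygens already gives about 2.4 × 10^8 assignments, and one with 20 oxygens about 6.7 × 10^11. A matcher that recognized the query atoms as interchangeable would only need to consider combinations (`math.comb`), which grow far more slowly. `math.perm` requires Python 3.8 or later.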

I'd like to be able to run this on an internal web server. When the query
hangs, CPU usage sits at ~100%. Unfortunately, setting the PostgreSQL
statement_timeout parameter does not help in this case.

Any suggestions on how to improve the query, or how to kill it after a set
amount of time without restarting PostgreSQL?

Thanks a lot,

Greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss