Hi Greg,
Thank you very much for your quick reply and for taking the time to
look into this.
As a crude workaround, if I split the dot-disconnected SMILES into its
unique components and include each of them in the WHERE clause, the
query returns results quickly:
select * from rdk.mols where m@>'O' and m@>'OS(O)(=O)=O'
  and m@>'O.O.O.O.O.O.O.O.O.OS(O)(=O)=O' limit 10;
I suppose this won't help in every case, but it helps.
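For anyone who wants to automate this, here is a minimal Python sketch
of the same idea. It is only an illustration and not part of RDKit: the
function names unique_components and prefilter_clause are made up, the
fragments are deduplicated by plain string comparison (no
canonicalization), and it assumes a database driver that accepts
psycopg2-style %s placeholders.

```python
def unique_components(smiles):
    """Split a dot-disconnected SMILES and return its distinct fragments in order.

    Assumption: no ring-closure digits are shared across a '.', so a plain
    string split gives chemically meaningful pieces.
    """
    seen = []
    for frag in smiles.split("."):
        # Plain string comparison: 'OC' and 'CO' would count as different.
        if frag and frag not in seen:
            seen.append(frag)
    return seen


def prefilter_clause(smiles, column="m"):
    """Build a WHERE clause that tests each unique fragment plus the full query.

    Returns the clause text with %s placeholders and the matching parameters.
    """
    params = unique_components(smiles)
    if smiles not in params:
        # Keep the original query so the final answer is unchanged.
        params.append(smiles)
    clause = " and ".join(f"{column}@>%s" for _ in params)
    return clause, params


clause, params = prefilter_clause("O.O.O.O.O.O.O.O.O.OS(O)(=O)=O")
print(clause)
print(params)
```

The clause and parameters could then be passed to something like
cursor.execute("select * from rdk.mols where " + clause + " limit 10",
params). The fragments are listed before the full query only to mirror
the statement above; I have not checked whether the order matters to
the planner.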
Best regards,
Greg
On 2016-04-24 04:47, Greg Landrum wrote:
> On Sun, Apr 24, 2016 at 11:28 AM, Greg Landrum wrote:
>
>> Here's my guess: The highly redundant query is getting hung up on
>> one large molecule where there are a large number of possible
>> matches. The substructure engine is taking a long time to determine
>> whether or not that particular molecule has a match. PostgreSQL can
>> only interrupt the query when that call returns (the substructure
>> engine itself has no built-in timeout). This one is easy, though
>> time consuming, to track down. I'll see if I can do so.
>
> And there it is. Ironically it is the first molecule in my chembl_20
> structure table:
>
> chembl_20=# select * from rdk.mols limit 1;
>  molregno | m
> ----------+---
>     23681 | O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O
> (1 row)
>
> chembl_20=# select
> 'O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O'::mol@>'O.O.O.O.O.O.O.O.O.OS(O)(=O)=O';
> ERROR: canceling statement due to statement timeout
> Time: 35996.985 ms
>
> Here's the same thing from Python:
>
> In [3]: m = Chem.MolFromSmiles('O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O')
>
> In [4]: p = Chem.MolFromSmiles('O.O.O.O.O.O.O.O.O.OS(O)(=O)=O')
>
> In [5]: t1=time.time();m.HasSubstructMatch(p);t2=time.time();print(t2-t1)
> 36.09873843193054
>
> Here's the GitHub issue: https://github.com/rdkit/rdkit/issues/880 [1]
>
> So now my task is to figure out why this substructure query is taking
> so long (there's clearly something pathological going on here since
> that molecule doesn't have a single S in it) and to explore adding a
> timeout to the substructure searching code.
>
> Thanks for reporting this!
> -greg
>
>
>
> Links:
> --
> [1] https://github.com/rdkit/rdkit/issues/880