Re: [Rdkit-discuss] strict parsing in java

2016-04-25 Thread Greg Landrum
Hi Tim,

That should always be handled in the new version.

The check in the code is here:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/MolFileParser.cpp#L1105
and that's not part of the strictParsing coverage.

Are you sure you're using the 2016.03 release (or something built from
github master)?
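
For older builds where upper-case labels such as CL are still rejected, one possible workaround is to fix the case of the element symbols before the mol block reaches the parser. The following is only a sketch in Python (the same column arithmetic carries over to Java). It assumes a V2000 mol block, where the atom count sits in the first three columns of the fourth line and each atom's symbol occupies columns 32-34 of its atom line. The function name `normalize_element_case` and the `TWO_LETTER` set are hypothetical and not part of RDKit, and the set is deliberately a small, illustrative subset.

```python
# Illustrative subset of two-letter element symbols (an assumption; extend as needed).
TWO_LETTER = {"Cl", "Br", "Si", "Na", "Li", "Mg", "Al", "Ca", "Fe", "Zn", "Cu", "Se"}


def normalize_element_case(molblock: str) -> str:
    """Rewrite labels like 'CL' to 'Cl' in the atom block of a V2000 mol block.

    Hypothetical helper, not an RDKit function.
    """
    lines = molblock.split("\n")
    # The counts line is the 4th line; its first 3 columns hold the atom count.
    n_atoms = int(lines[3][0:3])
    for i in range(4, 4 + n_atoms):
        line = lines[i]
        symbol = line[31:34].strip()   # 3-character symbol field
        fixed = symbol.capitalize()    # 'CL' -> 'Cl', 'C' -> 'C'
        # Only touch labels that become a known two-letter element, so that
        # special labels (for example query or R-group labels) are left alone.
        if symbol != fixed and fixed in TWO_LETTER:
            lines[i] = line[:31] + fixed.ljust(3) + line[34:]
    return "\n".join(lines)
```

The returned string could then be handed to the parser in place of the original mol block.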


On Mon, Apr 25, 2016 at 6:46 PM, Tim Dudgeon wrote:

> I've got molfiles that have element labels all in upper case (e.g. CL
> instead of Cl).
> Parsing these fails (Element 'CL' not found).
> I notice that in the C++ and Python APIs there is a 'strictParsing'
> option that I'm hoping makes RDKit tolerant of this, but this option
> does not seem to be present in the Java API, and, guess what, that's the
> one I'm using!
> Is this possible from Java?
>
> Tim
>
>
> --
> Find and fix application performance issues faster with Applications
> Manager
> Applications Manager provides deep performance insights into multiple
> tiers of
> your business applications. It resolves application problems quickly and
> reduces your MTTR. Get your free trial!
> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Substructure search

2016-04-25 Thread groberts
Hi Greg,

Thank you very much for your quick reply and taking the time to look 
into this.

As a crude workaround, if I split the dot-disconnected string into its
unique individual components and include each of them in the WHERE clause,
the query returns the result rapidly:

select * from rdk.mols where m@>'O' and m@>'OS(O)(=O)=O' and 
m@>'O.O.O.O.O.O.O.O.O.OS(O)(=O)=O' limit 10;

I suppose this won't help in every case, but it helps.
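
The workaround above can be sketched as a small helper that builds the extra conditions automatically. This is an illustration only: the function name `build_where_clause` is hypothetical, the column name `m` and the `@>` operator are taken from the query above, and the SMILES is pasted straight into the SQL text purely for readability (real code should use bound parameters instead).

```python
def build_where_clause(query_smiles: str, column: str = "m") -> str:
    """Build 'col@>frag and ... and col@>full_query' for a dot-disconnected query.

    Hypothetical helper; string interpolation is used only to keep the sketch short.
    """
    # dict.fromkeys drops duplicate fragments while keeping first-seen order.
    fragments = list(dict.fromkeys(query_smiles.split(".")))
    # Add the full query last, unless it is already the only fragment.
    patterns = fragments if query_smiles in fragments else fragments + [query_smiles]
    return " and ".join(f"{column}@>'{p}'" for p in patterns)


where = build_where_clause("O.O.O.O.O.O.O.O.O.OS(O)(=O)=O")
sql = f"select * from rdk.mols where {where} limit 10;"
```

Whether the cheaper conditions are actually evaluated first is up to the query planner, so this is a heuristic rather than a guarantee.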

Best regards,
Greg



On 2016-04-24 04:47, Greg Landrum wrote:
> On Sun, Apr 24, 2016 at 11:28 AM, Greg Landrum wrote:
> 
>> Here's my guess: The highly redundant query is getting hung up on
>> one large molecule where there are a large number of possible
>> matches. The substructure engine is taking a long time to determine
>> whether or not that particular molecule has a match. PostgreSQL can
>> only interrupt the query when that call returns (the substructure
>> engine itself has no built-in timeout). This one is easy, though
>> time consuming, to track down. I'll see if I can do so.
> 
> And there it is. Ironically, it is the first molecule in my chembl_20
> structure table:
> 
> chembl_20=# select * from rdk.mols limit 1;
>  molregno | m
> ----------+---
>     23681 | O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O
> (1 row)
> 
> chembl_20=# select
> 'O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O'::mol@>'O.O.O.O.O.O.O.O.O.OS(O)(=O)=O';
> ERROR:  canceling statement due to statement timeout
> Time: 35996.985 ms
> 
> Here's the same thing from Python:
> 
> In [3]: m = Chem.MolFromSmiles('O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O')
> 
> In [4]: p = Chem.MolFromSmiles('O.O.O.O.O.O.O.O.O.OS(O)(=O)=O')
> 
> In [5]: t1=time.time();m.HasSubstructMatch(p);t2=time.time();print(t2-t1)
> 36.09873843193054
> 
> Here's the github issue: https://github.com/rdkit/rdkit/issues/880 [1]
> 
> So now my task is to figure out why this substructure query is taking
> so long (there's clearly something pathological going on here since
> that molecule doesn't have a single S in it) and to explore adding a
> timeout to the substructure searching code.
> 
> Thanks for reporting this!
> -greg
> 
> 
> 
> Links:
> --
> [1] https://github.com/rdkit/rdkit/issues/880
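
The observation that the molecule contains no S at all suggests a cheap pre-check that could be run before the expensive search: if the query needs more atoms of some element than the molecule has, no match is possible. The sketch below is a rough illustration of that idea on SMILES strings only. It is not how the RDKit substructure engine works, the names `element_counts` and `could_match` are hypothetical, and the regular expression only covers bracket atoms and the usual organic-subset symbols. It also only makes sense for plain SMILES queries, not for SMARTS with wildcards.

```python
import re
from collections import Counter

# Bracket atoms such as [C@@H] or [13C], or bare organic-subset atoms
# (two-letter symbols are tried first). Deliberately crude.
ATOM_RE = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]]*\]|(Cl|Br|[BCNOSPFI]|[bcnosp])")


def element_counts(smiles: str) -> Counter:
    """Count heavy atoms per element symbol in a SMILES string (rough sketch)."""
    counts = Counter()
    for match in ATOM_RE.finditer(smiles):
        symbol = match.group(1) or match.group(2)
        counts[symbol.capitalize()] += 1  # aromatic 'c' is counted as 'C'
    return counts


def could_match(mol_smiles: str, query_smiles: str) -> bool:
    """Return False when the molecule lacks enough atoms of some query element."""
    mol = element_counts(mol_smiles)
    query = element_counts(query_smiles)
    return all(mol[element] >= n for element, n in query.items())


mol = ("O[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)"
       "[C@@H](O)[C@@H](O)[C@@H]2O)[C@H]1O")
query = "O.O.O.O.O.O.O.O.O.OS(O)(=O)=O"
can_match = could_match(mol, query)  # the molecule has no S, so this is False
```

A True result from such a check says nothing about whether a match exists; it only means the full search is still needed.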




[Rdkit-discuss] strict parsing in java

2016-04-25 Thread Tim Dudgeon
I've got molfiles that have element labels all in upper case (e.g. CL 
instead of Cl).
Parsing these fails (Element 'CL' not found).
I notice that in the C++ and Python APIs there is a 'strictParsing' 
option that I'm hoping makes RDKit tolerant of this, but this option 
does not seem to be present in the Java API, and, guess what, that's the
one I'm using!
Is this possible from Java?

Tim
