[Rdkit-discuss] Re: Why this substructure query by SMARTS failed

Zhenting Gao Sat, 25 Apr 2020 05:27:40 -0700

Hi Greg and Andreas,


Thanks for your help looking into my question.
However, I was wrong about the mismatch between the SMARTS and the SMILES
structure. They actually match! Sorry for the mistaken question.


The problem stems from my test of the PAINS KNIME workflow
(http://www.myexperiment.org/workflows/4748.html). I wanted a Python script
that does the same thing as the workflow.
After the KNIME test, I tried to implement the PAINS filter with RDKit in
Python.
I exported the PAINS SMARTS patterns and the query SMILES to CSV files and
wrote a short Python script.
However, the script identified only 350 SMILES that match PAINS, far fewer
than the 753 reported by the KNIME workflow (824 when I run it with
KNIME 4.1.2).
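To find which molecules account for a gap like 350 vs. 753, one option is to compare the two hit lists directly and inspect the molecules that only one tool flags. A minimal sketch, assuming each tool's hits have been exported as a list of molecule identifiers; `compare_hits` and the IDs below are hypothetical placeholders, not data from this thread.

```python
def compare_hits(knime_ids, rdkit_ids):
    """Split two collections of hit identifiers into shared and tool-specific sets."""
    knime_set, rdkit_set = set(knime_ids), set(rdkit_ids)
    return {
        "both": knime_set & rdkit_set,
        "knime_only": knime_set - rdkit_set,  # candidates to inspect one by one
        "rdkit_only": rdkit_set - knime_set,
    }


# Placeholder identifiers, for illustration only.
diff = compare_hits(["mol1", "mol2", "mol3"], ["mol2"])
print(sorted(diff["knime_only"]))  # ['mol1', 'mol3']
```

Picking a single molecule from `knime_only` and testing it against its pattern in isolation gives exactly the kind of minimal reproducer discussed further down this thread.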


After a discussion with my colleague, we found that the difference comes down
to whether hydrogens are added to the query molecule before matching. Greg
actually does this in his test script
(https://github.com/rdkit/rdkit/blob/master/Data/Pains/test_data/run_tests.py).
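A minimal sketch of why hydrogen addition matters, assuming that some of the PAINS SMARTS spell out hydrogens as explicit `[#1]` atoms. RDKit stores implicit hydrogens as atom properties rather than as atoms in the graph, so a `[#1]` query atom only has something to match after `Chem.AddHs()` turns them into real atoms. The aniline pattern below is a hypothetical illustration, not an actual PAINS entry.

```python
from rdkit import Chem

# Hypothetical pattern (NOT a real PAINS entry): an aromatic ring carbon
# bonded to an NH2 whose two hydrogens are written as explicit [#1] atoms.
pattern = Chem.MolFromSmarts("c1ccccc1-[#7](-[#1])-[#1]")

aniline = Chem.MolFromSmiles("Nc1ccccc1")  # hydrogens are implicit here
aniline_h = Chem.AddHs(aniline)            # hydrogens become real atoms

# Without explicit H atoms there is nothing for [#1] to map onto.
match_implicit = aniline.HasSubstructMatch(pattern)
# After AddHs, the [#1] query atoms can map onto the added hydrogens.
match_explicit = aniline_h.HasSubstructMatch(pattern)

print(match_implicit, match_explicit)  # False True
```

Patterns that only use H-count primitives such as `!H0` (like the one in the original question) are not affected, because those primitives count implicit hydrogens too.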


In short, we now have a Python script that does the same thing as the KNIME
workflow, and we'd like to share it with the community:
https://github.com/zhentg/GShare/blob/master/CADD/PAINS_filter.py
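For readers who want the general shape of such a filter, here is a minimal sketch of a CSV-driven SMARTS filter with an `add_hs` switch. It is not the script linked above: the column names `name` and `smarts`, the helpers `load_patterns` and `pains_hits`, and the demo pattern are all hypothetical. `csv.DictReader` handles quoted fields, which real PAINS SMARTS need because many of them contain commas.

```python
import csv
import io

from rdkit import Chem


def load_patterns(handle):
    """Read (name, query) pairs from a CSV with hypothetical 'name'/'smarts' columns."""
    patterns = []
    for row in csv.DictReader(handle):
        query = Chem.MolFromSmarts(row["smarts"])
        if query is None:  # skip patterns RDKit cannot parse
            continue
        patterns.append((row["name"], query))
    return patterns


def pains_hits(smiles_list, patterns, add_hs=True):
    """Return {smiles: [matching pattern names]} for molecules with at least one hit."""
    hits = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip unparsable SMILES
            continue
        if add_hs:
            # Make hydrogens explicit so [#1] query atoms have something to match.
            mol = Chem.AddHs(mol)
        names = [name for name, query in patterns if mol.HasSubstructMatch(query)]
        if names:
            hits[smi] = names
    return hits


# Hypothetical demo data: one pattern with explicit [#1] atoms.
pattern_csv = io.StringIO("name,smarts\ndemo_NH2,c1ccccc1-[#7](-[#1])-[#1]\n")
patterns = load_patterns(pattern_csv)
smiles = ["Nc1ccccc1", "c1ccccc1"]
print(len(pains_hits(smiles, patterns, add_hs=False)))  # 0
print(len(pains_hits(smiles, patterns, add_hs=True)))   # 1
```

Toggling `add_hs` on the same inputs is a quick way to see how much of a hit-count difference comes from explicit-hydrogen patterns.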


Python 3.7.6
RDKit 2019.09.3


Best regards
Zhenting
4/25/2020

------------------ Original message ------------------
From: "Greg Landrum" <greg.land...@gmail.com>;
Sent: Friday, April 24, 2020, 2:29
To: "Zhenting Gao" <183310...@qq.com>;
Cc: "Rdkit-discuss" <rdkit-discuss@lists.sourceforge.net>;
Subject: Re: [Rdkit-discuss] Why this substructure query by SMARTS failed



Hi Zhenting,

This works fine for me with both the 2020.03 release:


In [6]: print(rdkit.__version__)
2020.03.1

In [7]: from rdkit import Chem

In [8]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')

In [9]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')

In [10]: m.HasSubstructMatch(p)
Out[10]: True


and the 2019.09 release:
In [1]: import rdkit

In [2]: print(rdkit.__version__)
2019.09.3

In [3]: from rdkit import Chem

In [4]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')

In [5]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')

In [6]: m.HasSubstructMatch(p)
Out[6]: True


Can you please share a code snippet that shows the problem?


-greg


On Thu, Apr 23, 2020 at 7:26 PM Zhenting Gao <183310...@qq.com> wrote:

 Hi there,
 
 I'm trying to filter a compound list with the PAINS filters.
 With the SMARTS query
 
 'c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]'
 
 KNIME identifies the following SMILES as a match:
 'N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C'
 
 But I can't get a match for the same SMILES with RDKit 2019.09.3.
 I suspect the difference is an aromaticity mismatch. Could you help?
 
 Best regards
 Zhenting
 _______________________________________________
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
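To test the aromaticity guess from the original question directly, one can list the atoms RDKit perceives as aromatic and the atoms the pattern maps onto. A minimal sketch using the SMARTS and SMILES from this thread; consistent with Greg's sessions, the match succeeds without adding hydrogens, since the pattern's `!H0` primitive counts implicit hydrogens.

```python
from rdkit import Chem

# SMARTS from the original question (split over two literals for readability).
smarts = ("c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),"
          "$([#7](-[#6&X4])-[#6&X4])]")
pattern = Chem.MolFromSmarts(smarts)
mol = Chem.MolFromSmiles("N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C")

# Atoms RDKit perceives as aromatic after sanitization (the benzene ring).
aromatic = [a.GetIdx() for a in mol.GetAtoms() if a.GetIsAromatic()]

# One molecule atom index per pattern atom, or an empty tuple if no match.
match = mol.GetSubstructMatch(pattern)

print(len(aromatic), len(match))  # 6 9
```

All six benzene carbons are aromatic, so the lowercase `c` atoms in the SMARTS can map onto them, and the nine matched indices correspond to the nine atoms in the pattern.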
