Hi Greg and Andreas,
Thanks for helping to look into my question.
However, I misjudged the mismatch between the SMARTS and the
SMILES structure.
They actually match well! Sorry for the mistaken question.
The problem stems from my test of the PAINS KNIME
workflow (http://www.myexperiment.org/workflows/4748.html). I want to have a
Python script that does the same as the workflow.
After the KNIME test, I tried to implement the PAINS filter with RDKit in
Python.
I exported the PAINS SMARTS patterns and query SMILES to CSV files, and wrote a
short Python script.
However, the script only identified 350 SMILES that match PAINS, which is very
different from the 753 (824 when I run it with KNIME 4.1.2) reported by the
KNIME workflow.
After discussing this with my colleague, we found that the difference
depends on whether hydrogens are added to the query molecule before matching.
This is actually implemented by Greg in his test
script (https://github.com/rdkit/rdkit/blob/master/Data/Pains/test_data/run_tests.py).
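A minimal sketch of why hydrogen addition can change the match count. This is not the script linked in this message; the function name `count_flagged` and the toy pattern are made up for illustration, and the toy pattern is not a real PAINS entry. It assumes that some patterns contain explicit hydrogen atoms (`[#1]`), which can only match once the hydrogens exist as real atoms in the molecule.

```python
from rdkit import Chem


def count_flagged(smiles_list, smarts_list, add_hs=True):
    """Count molecules that match at least one SMARTS pattern.

    Hypothetical helper for illustration only.
    """
    patterns = [Chem.MolFromSmarts(s) for s in smarts_list]
    flagged = 0
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Skip SMILES that RDKit cannot parse.
            continue
        if add_hs:
            # Turn implicit hydrogens into explicit atoms so that
            # [#1] atoms in a pattern have something to match.
            mol = Chem.AddHs(mol)
        if any(mol.HasSubstructMatch(p) for p in patterns):
            flagged += 1
    return flagged


# Toy pattern with an explicit hydrogen atom (assumption: not a PAINS entry).
toy_smarts = ["[#6]-[#8]-[#1]"]
toy_smiles = ["CCO", "CCOC"]  # ethanol has an O-H, the ether does not

print(count_flagged(toy_smiles, toy_smarts, add_hs=False))  # 0
print(count_flagged(toy_smiles, toy_smarts, add_hs=True))   # 1
```

Without `Chem.AddHs` the molecule graph contains no hydrogen atoms, so any pattern written with `[#1]` silently fails to match, which would lower the count of flagged molecules.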
In short, we now have a Python script that does the same thing as the KNIME
workflow, and we'd like to share it with the community:
https://github.com/zhentg/GShare/blob/master/CADD/PAINS_filter.py
Python=3.7.6
RDKit=2019.09.3
Best regards
Zhenting
4/25/2020
------------------ Original Message ------------------
From: "Greg Landrum"<greg.land...@gmail.com>;
Sent: April 24, 2020 (Friday) 2:29
To: "Zhenting Gao"<183310...@qq.com>;
Cc: "Rdkit-discuss"<rdkit-discuss@lists.sourceforge.net>;
Subject: Re: [Rdkit-discuss] Why this substructure query by SMARTS failed
Hi Zhenting,
This works fine for me with both the 2020.03 release:
In [6]: print(rdkit.__version__)
2020.03.1
In [7]: from rdkit import Chem
In [8]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')
In [9]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')
In [10]: m.HasSubstructMatch(p)
Out[10]: True
and the 2019.09 release:
In [1]: import rdkit
In [2]: print(rdkit.__version__)
2019.09.3
In [3]: from rdkit import Chem
In [4]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')
In [5]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')
In [6]: m.HasSubstructMatch(p)
Out[6]: True
Can you please share a code snippet that shows the problem?
-greg
On Thu, Apr 23, 2020 at 7:26 PM Zhenting Gao <183310...@qq.com> wrote:
Hi there,
I'm trying to filter a compound list with the PAINS filter.
With the SMARTS query
'c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]'
KNIME identifies the following SMILES as a match:
'N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C'
But I can't match the same SMILES with RDKit 2019.09.3.
I guess the difference is an aromaticity mismatch. Could you help?
Best regards
Zhenting
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss