Hello,
I'm seeing unexpected results when trying to match a search query encoded
as an MDL Molfile. It looks like I'm not getting any matches when the
oxygens of a quinone are replaced with placeholder atoms in an otherwise
identical structure.
That is, if I take the molfile for quinone, copy it and only change the 'O'
atoms to '*' atoms, the query doesn't work, possibly due to aromaticity
issues:
>>> import rdkit
>>> from rdkit import Chem
>>> print(rdkit.__version__)
2015.03.1
>>> m = Chem.MolFromMolFile("quinone_test.sdf")
>>> q = Chem.MolFromMolFile("quinone_stub.sdf")
>>> m.HasSubstructMatch(q)
False
>>> Chem.MolToSmiles(m)
'O=C1C=CC(=O)C=C1'
>>> Chem.MolToSmiles(q)
'[*]=c1ccc(=[*])cc1'
>>> Chem.MolToSmarts(m)
'[#8]=[#6]1-[#6]=[#6]-[#6](-[#6]=[#6]-1)=[#8]'
>>> Chem.MolToSmarts(q)
'*=[#6]1:[#6]:[#6]:[#6](:[#6]:[#6]:1)=*'
Note I still have issues even if I load the query as a SMILES string:
>>> q2 = Chem.MolFromSmiles("[*]=C1-C=C-C(=[*])-C=C1")
>>> m.HasSubstructMatch(q2)
False
>>> Chem.MolToSmiles(q2)
'[*]=c1ccc(=[*])cc1'
But not when I load it as a SMARTS string:
>>> q3 = Chem.MolFromSmarts("[*]=C1-C=C-C(=[*])-C=C1")
>>> m.HasSubstructMatch(q3)
True
>>> Chem.MolToSmiles(q3)
'[*]=C1C=CC(=[*])C=C1'
As using SMARTS strings is not really feasible for what I'm doing, is there
something I'm doing wrong with respect to loading query molecules from
Molfiles? The structure is already single/double Kekulized in the molfile,
so is there some flag or other loading function I should be using to avoid
spurious aromatization? (Hopefully, one that's general enough that I won't
have issues when loading and matching truly aromatic molecules.)
Thanks,
-Rocco
P.S. My end usage will actually be using the C++ API, if that makes a
difference for recommendations.
~~~~
## quinone_test.sdf, for completeness (quinone_stub.sdf is identical,
except for "*" instead of the two "O"):
quinone
comment 1
comment 2
 12 12  0  0  0  0  0  0  0  0999 V2000
    1.0263   -0.0278   -0.3487 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2087   -0.0217   -0.0369 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9446    1.2428    0.1576 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2373    1.2490    0.4999 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9841   -0.0093    0.6981 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1658   -0.0035    1.0123 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.2483   -1.2741    0.5019 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9564   -1.2801    0.1598 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3826    2.1566    0.0087 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.7914    2.1678    0.6465 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.8110   -2.1878    0.6502 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.4019   -2.1992    0.0122 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  8  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  3  9  1  0  0  0  0
  4  5  1  0  0  0  0
  4 10  1  0  0  0  0
  5  7  1  0  0  0  0
  5  6  2  0  0  0  0
  7  8  2  0  0  0  0
  7 11  1  0  0  0  0
  8 12  1  0  0  0  0
M  END
$$$$
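For anyone wanting to reproduce this, the stub file can be derived from the test file with a small script along these lines. This is only a sketch: it assumes the standard fixed-width V2000 layout (atom count in the first three columns of the counts line, element symbol in columns 32-34 of each atom line), and the function names are my own. The two-atom block is a cut-down stand-in for the full file.

```python
def atom_line(x, y, z, symbol):
    """Format one V2000 atom line with fixed-width columns."""
    return "%10.4f%10.4f%10.4f %-3s%2d" % (x, y, z, symbol, 0) + "  0" * 11


def make_stub_molblock(molblock, element="O", placeholder="*"):
    """Return a copy of a V2000 molblock with every `element` atom
    symbol replaced by `placeholder`; everything else is untouched."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])          # counts line: atom count in cols 1-3
    for i in range(4, 4 + n_atoms):       # atom block follows the counts line
        symbol = lines[i][31:34].strip()  # element symbol in cols 32-34
        if symbol == element:
            lines[i] = lines[i][:31] + placeholder.ljust(3) + lines[i][34:]
    return "\n".join(lines) + "\n"


# Cut-down example: just the first C=O pair from the quinone file.
example = "\n".join([
    "carbonyl fragment",
    "comment 1",
    "comment 2",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    atom_line(1.0263, -0.0278, -0.3487, "O"),
    atom_line(2.2087, -0.0217, -0.0369, "C"),
    "  1  2  2  0  0  0  0",
    "M  END",
]) + "\n"

stub = make_stub_molblock(example)
print(stub)
```

Only the symbol columns are rewritten, so the coordinates and the bond block stay identical between the two files.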