Hello,

I'm seeing unexpected results when running a substructure search with a
query loaded from an MDL Molfile. I get no match when the oxygens of a
quinone are replaced with placeholder atoms in an otherwise identical
structure.

That is, if I take the molfile for quinone, copy it, and change only the 'O'
atoms to '*' atoms, the resulting query no longer matches the original
molecule, possibly because its ring is being perceived as aromatic:

>>> import rdkit
>>> from rdkit import Chem
>>> print rdkit.__version__
2015.03.1
>>> m = Chem.MolFromMolFile("quinone_test.sdf")
>>> q = Chem.MolFromMolFile("quinone_stub.sdf")
>>> m.HasSubstructMatch(q)
False
>>> Chem.MolToSmiles(m)
'O=C1C=CC(=O)C=C1'
>>> Chem.MolToSmiles(q)
'[*]=c1ccc(=[*])cc1'
>>> Chem.MolToSmarts(m)
'[#8]=[#6]1-[#6]=[#6]-[#6](-[#6]=[#6]-1)=[#8]'
>>> Chem.MolToSmarts(q)
'*=[#6]1:[#6]:[#6]:[#6](:[#6]:[#6]:1)=*'

Note that the match still fails if I load the query from a SMILES string:

>>> q2 = Chem.MolFromSmiles("[*]=C1-C=C-C(=[*])-C=C1")
>>> m.HasSubstructMatch(q2)
False
>>> Chem.MolToSmiles(q2)
'[*]=c1ccc(=[*])cc1'

But it succeeds when I load the same string as SMARTS:

>>> q3 = Chem.MolFromSmarts("[*]=C1-C=C-C(=[*])-C=C1")
>>> m.HasSubstructMatch(q3)
True
>>> Chem.MolToSmiles(q3)
'[*]=C1C=CC(=[*])C=C1'

As using SMARTS strings is not really feasible for what I'm doing, is there
something I'm doing wrong when loading query molecules from Molfiles? The
structure is already Kekulized (explicit single and double bonds) in the
molfile, so is there some flag or other loading function I should be using
to avoid spurious aromatization? (Hopefully, one that's general enough that
I won't have issues when loading and matching truly aromatic molecules.)

Thanks,
-Rocco

P.S. My end usage will actually be using the C++ API, if that makes a
difference for recommendations.

~~~~

## quinone_test.sdf, for completeness (quinone_stub.sdf is identical,
except for "*" instead of the two "O"):

quinone
comment 1
comment 2
 12 12  0  0  0  0  0  0  0  0999 V2000
    1.0263   -0.0278   -0.3487 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2087   -0.0217   -0.0369 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9446    1.2428    0.1576 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2373    1.2490    0.4999 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9841   -0.0093    0.6981 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1658   -0.0035    1.0123 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.2483   -1.2741    0.5019 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9564   -1.2801    0.1598 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3826    2.1566    0.0087 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.7914    2.1678    0.6465 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.8110   -2.1878    0.6502 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.4019   -2.1992    0.0122 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  8  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  3  9  1  0  0  0  0
  4  5  1  0  0  0  0
  4 10  1  0  0  0  0
  5  7  1  0  0  0  0
  5  6  2  0  0  0  0
  7  8  2  0  0  0  0
  7 11  1  0  0  0  0
  8 12  1  0  0  0  0
M  END
$$$$
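
For anyone who wants to reproduce quinone_stub.sdf from the file above, here
is a minimal sketch of the substitution in plain Python (no RDKit needed). It
assumes the fixed-column V2000 layout shown above, where the atom symbol
occupies columns 32-34 of each atom line. The function name `make_stub` and
its parameters are just illustrative, not part of any library.

```python
def make_stub(molblock, element="O", placeholder="*"):
    """Return a copy of a V2000 mol block with `element` atoms replaced by `placeholder`.

    Assumes the standard layout: three header lines, a counts line, then one
    fixed-width line per atom with the symbol in characters 31..33 (0-based).
    """
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])          # first 3 chars of the counts line
    first_atom = 4                        # atom block starts after the counts line
    out = lines[:first_atom]
    for line in lines[first_atom:first_atom + n_atoms]:
        symbol = line[31:34].strip()
        if symbol == element:
            # Keep the field width at 3 so the remaining columns stay aligned.
            line = line[:31] + placeholder.ljust(3) + line[34:]
        out.append(line)
    # Bond block, "M  END" and any trailing "$$$$" are copied unchanged.
    out.extend(lines[first_atom + n_atoms:])
    return "\n".join(out) + "\n"
```

Reading quinone_test.sdf as text, passing the contents through `make_stub`,
and writing the result to quinone_stub.sdf should give the query file used in
the session above; only the two 'O' lines change.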