Hi Rocco,

Apologies for the slow reply; the RDKit UGM last week consumed all my attention.
What you are observing is a consequence of the RDKit's aromaticity model (http://rdkit.org/docs/RDKit_Book.html#aromaticity). The exocyclic double bonds to O in quinone "steal" the electrons of the ring carbons they are attached to, so the 6-ring has only four pi electrons and is not aromatic. The dummy atoms in the query molecule, on the other hand, do not perturb the aromaticity of the ring, so in the query the ring is perceived as aromatic. The query's aromatic ring bonds then do not match the single and double bonds of the quinone ring, which is why the substructure match fails.

This is one of those edge cases that is currently not straightforward to solve without either using SMARTS or adding query bonds ("single/aromatic" and "double/aromatic") to the stub query.

-greg

On Thu, Sep 3, 2015 at 11:45 PM, Rocco Moretti <[email protected]> wrote:
> Hello,
>
> I'm seeing unexpected results when trying to match a search query encoded
> as an MDL Molfile. It looks like I'm not getting any matches when the
> oxygens of a quinone are replaced with placeholder atoms in an otherwise
> identical structure.
>
> That is, if I take the molfile for quinone, copy it and only change the
> 'O' atoms to '*' atoms, the query doesn't work, possibly due to aromaticity
> issues:
>
> >>> from rdkit import Chem
> >>> print rdkit.__version__
> 2015.03.1
> >>> m = Chem.MolFromMolFile("quinone_test.sdf")
> >>> q = Chem.MolFromMolFile("quinone_stub.sdf")
> >>> m.HasSubstructMatch(q)
> False
> >>> Chem.MolToSmiles(m)
> 'O=C1C=CC(=O)C=C1'
> >>> Chem.MolToSmiles(q)
> '[*]=c1ccc(=[*])cc1'
> >>> Chem.MolToSmarts(m)
> '[#8]=[#6]1-[#6]=[#6]-[#6](-[#6]=[#6]-1)=[#8]'
> >>> Chem.MolToSmarts(q)
> '*=[#6]1:[#6]:[#6]:[#6](:[#6]:[#6]:1)=*'
>
> Note I still have issues even if I load the query as a SMILES string:
>
> >>> q2 = Chem.MolFromSmiles("[*]=C1-C=C-C(=[*])-C=C1")
> >>> m.HasSubstructMatch(q2)
> False
> >>> Chem.MolToSmiles(q2)
> '[*]=c1ccc(=[*])cc1'
>
> But not when I load it as a SMARTS string:
>
> >>> q3 = Chem.MolFromSmarts("[*]=C1-C=C-C(=[*])-C=C1")
> >>> m.HasSubstructMatch(q3)
> True
> >>> Chem.MolToSmiles(q3)
> '[*]=C1C=CC(=[*])C=C1'
>
> As using SMARTS strings is not really feasible for what I'm doing, is
> there something I'm doing wrong with respect to
> loading query molecules
> from Molfiles? The structure is already single/double Kekulized in the
> molfile, so is there some flag or other loading function I should be using
> to avoid spurious aromatization? (Hopefully, one that's general enough that
> I won't have issues when loading and matching truly aromatic molecules.)
>
> Thanks,
> -Rocco
>
> P.S. My end usage will actually be using the C++ API, if that makes a
> difference for recommendations.
>
> ~~~~
>
> ## quinone_test.sdf, for completeness (quinone_stub.sdf is identical,
> except for "*" instead of the two "O"):
>
> quinone
> comment 1
> comment 2
>  12 12  0  0  0  0  0  0  0  0999 V2000
>     1.0263   -0.0278   -0.3487 O   0  0  0  0  0  0  0  0  0  0  0  0
>     2.2087   -0.0217   -0.0369 C   0  0  0  0  0  0  0  0  0  0  0  0
>     2.9446    1.2428    0.1576 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.2373    1.2490    0.4999 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.9841   -0.0093    0.6981 C   0  0  0  0  0  0  0  0  0  0  0  0
>     6.1658   -0.0035    1.0123 O   0  0  0  0  0  0  0  0  0  0  0  0
>     4.2483   -1.2741    0.5019 C   0  0  0  0  0  0  0  0  0  0  0  0
>     2.9564   -1.2801    0.1598 C   0  0  0  0  0  0  0  0  0  0  0  0
>     2.3826    2.1566    0.0087 H   0  0  0  0  0  0  0  0  0  0  0  0
>     4.7914    2.1678    0.6465 H   0  0  0  0  0  0  0  0  0  0  0  0
>     4.8110   -2.1878    0.6502 H   0  0  0  0  0  0  0  0  0  0  0  0
>     2.4019   -2.1992    0.0122 H   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  2  0  0  0  0
>   2  8  1  0  0  0  0
>   2  3  1  0  0  0  0
>   3  4  2  0  0  0  0
>   3  9  1  0  0  0  0
>   4  5  1  0  0  0  0
>   4 10  1  0  0  0  0
>   5  7  1  0  0  0  0
>   5  6  2  0  0  0  0
>   7  8  2  0  0  0  0
>   7 11  1  0  0  0  0
>   8 12  1  0  0  0  0
> M  END
> $$$$
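To make the aromaticity argument at the top of this reply concrete, here is a toy Hückel count for an all-carbon six-membered ring. It is a simplified model written for illustration, not RDKit's implementation: each ring carbon is described only by where its double bond points (`"ring"` for an endocyclic C=C, otherwise the exocyclic partner's symbol), and the set of "electronegative" partners is an illustrative assumption.

```python
from typing import Sequence

# Assumption: illustrative subset of partners that "steal" the ring carbon's
# electron in an exocyclic double bond. Not RDKit's actual element list.
ELECTRONEGATIVE = {"N", "O"}


def ring_pi_electrons(double_bond_partners: Sequence[str]) -> int:
    """Count pi electrons for a ring made only of sp2 carbons.

    Each entry says where one ring carbon's double bond points:
    "ring" for an endocyclic C=C, otherwise the exocyclic partner symbol.
    """
    total = 0
    for partner in double_bond_partners:
        if partner in ELECTRONEGATIVE:
            # Exocyclic C=O (or C=N): the carbon contributes no electron.
            continue
        # Endocyclic C=C, or exocyclic C=* to a dummy atom: one electron.
        total += 1
    return total


def satisfies_huckel(n_electrons: int) -> bool:
    """True if n_electrons equals 4n + 2 for some integer n >= 0."""
    return n_electrons >= 2 and (n_electrons - 2) % 4 == 0


quinone = ["O", "ring", "ring", "O", "ring", "ring"]
stub = ["*", "ring", "ring", "*", "ring", "ring"]
print(ring_pi_electrons(quinone), satisfies_huckel(ring_pi_electrons(quinone)))
print(ring_pi_electrons(stub), satisfies_huckel(ring_pi_electrons(stub)))
```

For quinone the count is 4, which fails the 4n+2 rule; replacing the two O atoms with dummies gives 6, which passes. That is consistent with the lowercase (aromatic) SMILES RDKit produced for the stub in the quoted session.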
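The query-bond workaround mentioned above can be expressed directly in a V2000 molfile, since the MDL CTfile format defines bond type 6 as "single or aromatic" and bond type 7 as "double or aromatic". The helper `relax_bond_orders` below is hypothetical (it is not an RDKit function): it rewrites every single (1) or double (2) bond between non-hydrogen atoms to 6 or 7. It is a sketch assuming a well-formed fixed-width V2000 block (no V3000), and relaxing every heavy-atom bond is blunter than editing only the ring bonds.

```python
# Hypothetical helper, not part of RDKit: turn plain single/double bonds in a
# V2000 molblock into MDL query bonds so the stub matches Kekule or aromatic
# forms. MDL bond codes: 1 single, 2 double, 6 single-or-aromatic,
# 7 double-or-aromatic.
RELAXED = {1: 6, 2: 7}


def relax_bond_orders(molblock: str) -> str:
    """Return a copy of a V2000 molblock using query bond types 6 and 7.

    Assumes the standard fixed-width layout: three header lines, a counts
    line, the atom block (symbol in columns 32-34), then the bond block.
    Bonds that involve hydrogen are left unchanged.
    """
    lines = molblock.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    symbols = [lines[4 + i][31:34].strip() for i in range(n_atoms)]

    first_bond = 4 + n_atoms
    for k in range(first_bond, first_bond + n_bonds):
        line = lines[k]
        a1, a2, btype = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if "H" in (symbols[a1 - 1], symbols[a2 - 1]):
            continue  # keep explicit-H bonds as plain single bonds
        if btype in RELAXED:
            # Overwrite only the three-character bond-type field.
            lines[k] = line[:6] + f"{RELAXED[btype]:>3}" + line[9:]
    return "\n".join(lines) + "\n"
```

The rewritten block would then be loaded with `Chem.MolFromMolBlock(...)` (or `RDKit::MolBlockToMol` in C++) and passed to `HasSubstructMatch`. Because the change lives in the molfile itself, it should carry over to the C++ API, but how sanitization treats these query bonds has not been checked here, so verify it against the quinone example before relying on it.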
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

