Thanks Ivan -- very helpful. Is there any consensus on idioms for identifying multiple moieties in the same fragment? Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as a selector and then run some kind of graph traversal to check whether any of the matches are covalently connected?
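For concreteness, here is a rough sketch of the kind of thing I have in mind. It splits the molecule into covalently connected fragments with Chem.GetMolFrags and counts matches in each fragment separately, which might avoid a hand-written graph traversal. The helper names (matches_per_fragment, has_repeated_moiety) are made up for illustration, and it assumes the pattern is a single connected query, so that each match lies entirely within one fragment:

```python
from rdkit import Chem


def matches_per_fragment(mol, patt):
    """Count matches of patt in each covalently connected fragment of mol.

    Hypothetical helper: assumes patt is a single connected query.
    """
    # asMols=True returns each connected component as its own molecule.
    frags = Chem.GetMolFrags(mol, asMols=True)
    return [len(frag.GetSubstructMatches(patt)) for frag in frags]


def has_repeated_moiety(mol, patt):
    """True if any single fragment contains more than one match of patt."""
    return any(n > 1 for n in matches_per_fragment(mol, patt))


# Single-atom amine query taken from the Daylight example in the thread.
amine = Chem.MolFromSmarts("[NX3;H2,H1;!$(NC=O)]")

diamine_salt = Chem.MolFromSmiles("NCCN.Cl")   # two amines in one fragment
two_monoamines = Chem.MolFromSmiles("NCC.NCC")  # one amine per fragment

print(has_repeated_moiety(diamine_salt, amine))
print(has_repeated_moiety(two_monoamines, amine))
```

With this, a molecule would be rejected only when one fragment carries the moiety more than once; salts or mixtures where each fragment has a single instance would pass.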

On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Curt,
>
> According to
> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
> it's not supported:
>
>> Here’s the (hopefully complete) list of SMARTS features that are *not*
>> supported:
>>
>> - Non-tetrahedral chiral classes
>> - the @? operator
>> - explicit atomic masses (though isotope queries are supported)
>> - component level grouping requiring matches in different components,
>>   i.e. (C).(C)
>
> OK, the way it's worded it sounds like (C.C) might be supported (since
> that would be requiring matches in the same component), but as you've seen,
> it isn't supported either...
>
> Ivan
>
> On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer <curt.r.fisc...@gmail.com>
> wrote:
>
>> Hi rdkit fiends!
>>
>> The [Daylight SMARTS example page](https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
>> gives several examples for "multiple group" smarts, including these strings:
>>
>> ([Cl!$(Cl~c)].[c!$(c~Cl)])
>> ([Cl]).([c])
>> ([Cl].[c])
>> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>>
>> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>>
>> For example, Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
>> this error message:
>>
>> ```
>> [13:01:41] SMARTS Parse Error: syntax error while parsing:
>> ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
>> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS
>> '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
>> ```
>>
>> My understanding of SMARTS is that the outermost parentheses in this
>> SMARTS string are required to force the chlorine and the aromatic carbon to
>> be somewhere in the same covalently connected fragment. E.g. this pattern
>> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
>> hydrochloride salt of aniline Cl.Nc1ccccc1.
>>
>> What am I getting wrong? Is there a way to write rdkit-parsable SMARTS
>> that achieves this? (I want to filter out molecules that contain more than
>> one of certain moieties, while allowing molecules that have one (or zero)
>> such moieties. But salts or covalently disconnected fragments that each
>> contain one instance of the moiety should be fine.)
>>
>> Details on my setup:
>>
>> - RDKit Version: 2019.09.3
>> - Operating system: macOS 10.15.2
>> - Python version (if relevant): 3.6
>> - Are you using conda? yes
>> - If you are using conda, which channel did you install the rdkit from?
>>   `conda-forge`
>> - If you are not using conda: how did you install the RDKit?
>>
>> Curt
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>