Here is one version I came up with. It assumes "query" is a molecule built from SMARTS that you want to occur at most once in any single fragment of each molecule in a set:
    from rdkit.Chem import rdmolops

    def hasMultiSubstructPerFrag(mol, query):
        """
        Determines whether mol has more than one match to query in a single
        covalently connected fragment.
        """
        # Cheap pre-check: with no match anywhere, no fragment can have two.
        if not mol.HasSubstructMatch(query):
            return False
        # Split mol into covalently connected fragments and count the
        # matches in each one; every code path returns a bool.
        return any(
            len(frag.GetSubstructMatches(query)) > 1
            for frag in rdmolops.GetMolFrags(mol, asMols=True)
        )

On Sat, Mar 7, 2020 at 4:02 PM Curt Fischer <curt.r.fisc...@gmail.com> wrote:

> Thanks Ivan -- very helpful.
>
> Is there any consensus on idioms for identifying multiple moieties in the
> same fragment? Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
> some kind of selector and then do some kind of graph traversal routine to
> see if any of the matches are covalently connected?
>
> On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi Curt,
>>
>> According to
>> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
>> it's not supported:
>>
>>> Here’s the (hopefully complete) list of SMARTS features that are *not*
>>> supported:
>>>
>>>    - Non-tetrahedral chiral classes
>>>    - the @? operator
>>>    - explicit atomic masses (though isotope queries are supported)
>>>    - component level grouping requiring matches in different
>>>      components, i.e. (C).(C)
>>
>> OK, the way it's worded it sounds like (C.C) might be supported (since
>> that would be requiring matches in the same component), but as you've seen,
>> it isn't supported either...
>>
>> Ivan
>>
>>
>> On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer <curt.r.fisc...@gmail.com> wrote:
>>
>>> Hi rdkit fiends!
>>>
>>> The [Daylight SMARTS example page](https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
>>> gives several examples for "multiple group" smarts, including these strings:
>>>
>>> ([Cl!$(Cl~c)].[c!$(c~Cl)])
>>> ([Cl]).([c])
>>> ([Cl].[c])
>>> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>>>
>>> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>>>
>>> For example, Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
>>> this error message:
>>>
>>> ```
>>> [13:01:41] SMARTS Parse Error: syntax error while parsing: ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
>>> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
>>> ```
>>>
>>> My understanding of SMARTS is that the outermost parentheses in this
>>> SMARTS string are required to force the chlorine and the aromatic carbon to
>>> be somewhere in the same covalently connected fragment. E.g. this pattern
>>> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
>>> hydrochloride salt of aniline Cl.Nc1ccccc1.
>>>
>>> What am I getting wrong? Is there a way to write rdkit-parsable SMARTS
>>> that achieves this? (I want to filter out molecules that contain more than
>>> one of certain moieties, while allowing molecules that have one (or zero)
>>> such moieties. But salts or covalently disconnected fragments that each
>>> contain one instance of the moiety should be fine.)
>>>
>>> Details on my setup:
>>>
>>>    - RDKit Version: 2019.09.3
>>>    - Operating system: macOS 10.15.2
>>>    - Python version (if relevant): 3.6
>>>    - Are you using conda? yes
>>>    - If you are using conda, which channel did you install the rdkit from?
>>>      `conda-forge`
>>>    - If you are not using conda: how did you install the RDKit?
>>>
>>> Curt
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
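For the two cases raised in the quoted thread (requiring both groups to sit in the same fragment, and counting repeats of one group per fragment), a rough sketch of a workaround that avoids component-level SMARTS entirely is to split the molecule with rdmolops.GetMolFrags and match ordinary single-component patterns against each fragment. The helper names frags_matching_all and max_matches_per_frag are made up for this illustration and are not part of RDKit, and the simplified patterns [Cl] and [c] stand in for the Daylight examples:

```python
from rdkit import Chem
from rdkit.Chem import rdmolops


def frags_matching_all(mol, patterns):
    """Hypothetical helper: return the covalently connected fragments of
    mol (as Mol objects) that match every pattern in patterns."""
    frags = rdmolops.GetMolFrags(mol, asMols=True)
    return [f for f in frags if all(f.HasSubstructMatch(p) for p in patterns)]


def max_matches_per_frag(mol, query):
    """Hypothetical helper: largest number of matches to query found in
    any single fragment of mol (0 if nothing matches)."""
    frags = rdmolops.GetMolFrags(mol, asMols=True)
    return max((len(f.GetSubstructMatches(query)) for f in frags), default=0)


# Stand-in for ([Cl].[c]): both patterns must hit the same fragment.
chlorine = Chem.MolFromSmarts("[Cl]")
aromatic_c = Chem.MolFromSmarts("[c]")

benzyl_chloride = Chem.MolFromSmiles("ClCc1ccccc1")
aniline_hcl = Chem.MolFromSmiles("Cl.Nc1ccccc1")

# Benzyl chloride is one fragment containing both Cl and aromatic C.
print(bool(frags_matching_all(benzyl_chloride, [chlorine, aromatic_c])))  # True
# In the salt, Cl and the ring are in different fragments.
print(bool(frags_matching_all(aniline_hcl, [chlorine, aromatic_c])))      # False

# Amine pattern taken from the Daylight example in the thread.
amine = Chem.MolFromSmarts("[NX3;H2,H1;!$(NC=O)]")
diamine = Chem.MolFromSmiles("NCCN")          # two amines, one fragment
two_amines = Chem.MolFromSmiles("NCC.NCC")    # one amine per fragment

print(max_matches_per_frag(diamine, amine))     # 2
print(max_matches_per_frag(two_amines, amine))  # 1
```

A filter like the one asked about would then keep molecules where max_matches_per_frag(mol, query) <= 1. Note that GetSubstructMatches removes symmetry-equivalent duplicate matches by default, which is what you want when counting moieties.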