Here is one version I came up with. It assumes "query" is a SMARTS-derived
molecule that should occur at most once in any single fragment of each
molecule in a set:

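from rdkit.Chem import rdmolops


def hasMultiSubstructPerFrag(mol, query):
    """
    Determines whether mol has more than one match to query in a single
    covalently connected fragment.
    """
    # Cheap pre-check: with no match at all, no fragment can match twice.
    if not mol.HasSubstructMatch(query):
        return False
    # Split mol into covalently connected fragments and count the matches
    # in each one. Always returns a bool (the original fell through to None
    # when the query matched but no fragment matched more than once).
    return any(len(frag.GetSubstructMatches(query)) > 1
               for frag in rdmolops.GetMolFrags(mol, asMols=True))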
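
On the graph-traversal question quoted below: a hand-written traversal is
probably not needed, because GetMolFrags returns the atom indices of each
fragment by default. The following is only a sketch under stated
assumptions, not an established RDKit idiom. The helper names
(atom_to_fragment, matches_per_fragment, both_in_same_fragment) are
hypothetical, each query is assumed to be a single connected pattern, and
the last helper only approximates the intent of a Daylight "(A.B)" query
(it does not check that the two matches use different atoms).

```python
from collections import Counter

from rdkit import Chem


# All three helper names are hypothetical; they are not part of RDKit.
def atom_to_fragment(mol):
    """Map each atom index to the index of its covalently connected fragment."""
    return {
        atom_idx: frag_idx
        for frag_idx, atom_ids in enumerate(Chem.GetMolFrags(mol))
        for atom_idx in atom_ids
    }


def matches_per_fragment(mol, query):
    """Count matches of a connected query in each fragment of mol."""
    frag_of = atom_to_fragment(mol)
    # Assumption: query is connected, so a match cannot span fragments and
    # its first atom is enough to locate it.
    return Counter(frag_of[match[0]] for match in mol.GetSubstructMatches(query))


def both_in_same_fragment(mol, query_a, query_b):
    """Approximate '(A.B)': True if A and B both match within one fragment."""
    frag_of = atom_to_fragment(mol)
    frags_a = {frag_of[m[0]] for m in mol.GetSubstructMatches(query_a)}
    frags_b = {frag_of[m[0]] for m in mol.GetSubstructMatches(query_b)}
    return bool(frags_a & frags_b)
```

With the queries Chem.MolFromSmarts('[Cl]') and Chem.MolFromSmarts('c'),
both_in_same_fragment should accept benzyl chloride (ClCc1ccccc1) and reject
aniline hydrochloride (Cl.Nc1ccccc1), and a molecule can be filtered out when
max(matches_per_fragment(mol, query).values(), default=0) exceeds 1.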


On Sat, Mar 7, 2020 at 4:02 PM Curt Fischer <curt.r.fisc...@gmail.com>
wrote:

> Thanks Ivan -- very helpful.
>
> Is there any consensus on idioms for identifying multiple moieties in the
> same fragment?  Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
> some kind of selector and then do some kind of graph traversal routine to
> see if any of the matches are covalently connected?
>
> On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi Curt,
>>
>> According to
>> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
>> it's not supported:
>>
>>    Here’s the (hopefully complete) list of SMARTS features that are *not*
>>    supported:
>>
>>    - Non-tetrahedral chiral classes
>>    - the @? operator
>>    - explicit atomic masses (though isotope queries are supported)
>>    - component level grouping requiring matches in different
>>      components, i.e. (C).(C)
>>
>> OK, the way it's worded it sounds like (C.C) might be supported (since
>> that would be requiring matches in the same component), but as you've seen,
>> it isn't supported either...
>>
>> Ivan
>>
>>
>> On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer <curt.r.fisc...@gmail.com>
>> wrote:
>>
>>> Hi rdkit fiends!
>>>
>>> The [Daylight SMARTS example page](
>>> https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
>>> gives several examples for "multiple group" smarts, including these strings:
>>>
>>> ([Cl!$(Cl~c)].[c!$(c~Cl)])
>>> ([Cl]).([c])
>>> ([Cl].[c])
>>> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>>>
>>> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>>>
>>> For example,  Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
>>> this error message:
>>>
>>> ```
>>> [13:01:41] SMARTS Parse Error: syntax error while parsing:
>>> ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
>>> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS
>>> '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
>>> ```
>>> My understanding of SMARTS is that the outermost parentheses in this
>>> SMARTS string are required to force the chlorine and the aromatic carbon to
>>> be somewhere in the same covalently connected fragment.  E.g. this pattern
>>> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
>>> hydrochloride salt of aniline Cl.Nc1ccccc1.
>>>
>>> What am I getting wrong?  Is there a way to write rdkit-parsable SMARTS
>>> that achieves this?  (I want to filter out molecules that contain more than
>>> one of certain moieties, while allowing molecules that have one (or zero)
>>> such moieties.  But salts or covalently disconnected fragments that each
>>> contain one instance of the moiety should be fine.)
>>>
>>> Details on my setup:
>>>
>>> - RDKit Version: 2019.09.3
>>> - Operating system: macOS 10.15.2
>>> - Python version (if relevant): 3.6
>>> - Are you using conda? yes
>>> - If you are using conda, which channel did you install the rdkit from?
>>> `conda-forge`
>>> - If you are not using conda: how did you install the RDKit?
>>>
>>> Curt
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>