First: the reason the RDKit does not parse things like:

In [2]: p = Chem.MolFromSmarts('([Cl-].[Na+])')


[05:58:01] SMARTS Parse Error: syntax error while parsing: ([Cl-].[Na+])
[05:58:01] SMARTS Parse Error: Failed parsing SMARTS '([Cl-].[Na+])' for
input: '([Cl-].[Na+])'

In [3]: p = Chem.MolFromSmarts('([Cl-]).([Na+])')


[05:59:16] SMARTS Parse Error: syntax error while parsing: ([Cl-]).([Na+])
[05:59:16] SMARTS Parse Error: Failed parsing SMARTS '([Cl-]).([Na+])' for
input: '([Cl-]).([Na+])'


is because those query types are not supported by the substructure search
engine. Rather than accepting the input and then doing the wrong thing, we've
opted not to accept it.


On Sun, Mar 8, 2020 at 1:03 AM Curt Fischer <curt.r.fisc...@gmail.com>
wrote:

>
> Is there any consensus on idioms for identifying multiple moieties in the
> same fragment?  Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
> some kind of selector and then do some kind of graph traversal routine to
> see if any of the matches are covalently connected?
>

My standard answer: if you want to find multiple entities in the same
fragment, you can use:

In [4]: p = Chem.MolFromSmarts('O.N')



 and then either make sure that your molecules have a single fragment *or*
that the matches you get back are contained in a single fragment. Here's one
way of doing that:

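In [18]: def fragsearch(m,p):
    ...:     # atom indices of each covalently connected fragment
    ...:     frags = [set(x) for x in Chem.GetMolFrags(m)]
    ...:     matches = [set(x) for x in m.GetSubstructMatches(p)]
    ...:     for frag in frags:
    ...:         for match in matches:
    ...:             # keep the first match that lies entirely in one fragment
    ...:             if match.issubset(frag):
    ...:                 return match
    ...:     return False

In [21]: m1 = Chem.MolFromSmiles('OCCCN.CCC')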



In [22]: m2 = Chem.MolFromSmiles('OCCC.CCCN')



In [23]: m1.HasSubstructMatch(p)


Out[23]: True

In [24]: m2.HasSubstructMatch(p)


Out[24]: True

In [25]: fragsearch(m1,p)


Out[25]: {0, 4}

In [26]: fragsearch(m2,p)


Out[26]: False
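
The containment check in fragsearch can also be written on plain index
tuples, which makes it easy to try without a molecule at hand. This is only
a sketch: the helper name matches_within_one_fragment is made up for
illustration, it returns every qualifying match rather than just the first,
and the atom indices are written out by hand from the two SMILES above
instead of coming from the RDKit.

```python
def matches_within_one_fragment(matches, fragments):
    """Return the matches whose atoms all lie in a single fragment.

    matches:   atom-index tuples, e.g. from m.GetSubstructMatches(p)
    fragments: atom-index tuples, e.g. from Chem.GetMolFrags(m)
    """
    fragment_sets = [set(frag) for frag in fragments]
    kept = []
    for match in matches:
        atoms = set(match)
        # keep the match only if one fragment contains all of its atoms
        if any(atoms.issubset(frag) for frag in fragment_sets):
            kept.append(tuple(match))
    return kept


# Hand-written indices (an assumption, not RDKit output).
# 'OCCCN.CCC': O is atom 0 and N is atom 4, both in the first fragment.
m1_frags = [(0, 1, 2, 3, 4), (5, 6, 7)]
m1_matches = [(0, 4)]
# 'OCCC.CCCN': O is atom 0 and N is atom 7, in different fragments.
m2_frags = [(0, 1, 2, 3), (4, 5, 6, 7)]
m2_matches = [(0, 7)]

print(matches_within_one_fragment(m1_matches, m1_frags))  # [(0, 4)]
print(matches_within_one_fragment(m2_matches, m2_frags))  # []
```

With real molecules you would pass m.GetSubstructMatches(p) and
Chem.GetMolFrags(m) as the two arguments.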


Do you really have a use case where you have molecules containing multiple
fragments that you can't separate into pieces and you want to do this
kind of search?
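
If the goal is the filter from the original question (reject a molecule
when one fragment carries more than one copy of a moiety, but allow one copy
per fragment), counting matches per fragment may be enough. The following is
a rough sketch under some assumptions: the moiety pattern is connected (no
"." in the SMARTS), so each match sits inside exactly one fragment; the
name max_moieties_per_fragment is hypothetical; and the indices are written
by hand for an amine pattern on 'NCCN' and 'NCC.NCC'.

```python
from collections import Counter


def max_moieties_per_fragment(matches, fragments):
    """Largest number of matches found inside any single fragment."""
    # map every atom index to the index of the fragment that contains it
    atom_to_frag = {}
    for frag_idx, frag in enumerate(fragments):
        for atom_idx in frag:
            atom_to_frag[atom_idx] = frag_idx

    counts = Counter()
    for match in matches:
        # assumption: the pattern is connected, so the first matched atom
        # is enough to identify the fragment holding the whole match
        counts[atom_to_frag[match[0]]] += 1
    return max(counts.values(), default=0)


# 'NCCN': two amine N atoms (0 and 3) in one fragment
print(max_moieties_per_fragment([(0,), (3,)], [(0, 1, 2, 3)]))
# 'NCC.NCC': one amine N per fragment (atoms 0 and 3)
print(max_moieties_per_fragment([(0,), (3,)], [(0, 1, 2), (3, 4, 5)]))
```

A molecule would then be rejected when the returned value is greater than 1.
As before, the two arguments would come from m.GetSubstructMatches(patt) and
Chem.GetMolFrags(m).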

Best,
-greg

> On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi Curt,
>>
>> According to
>> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
>> it's not supported:
>>
>>> Here’s the (hopefully complete) list of SMARTS features that are *not*
>>> supported:
>>>
>>>    - Non-tetrahedral chiral classes
>>>    - the @? operator
>>>    - explicit atomic masses (though isotope queries are supported)
>>>    - component level grouping requiring matches in different
>>>    components, i.e. (C).(C)
>>
>> OK, the way it's worded it sounds like (C.C) might be supported (since
>> that would be requiring matches in the same component), but as you've seen,
>> it isn't supported either...
>>
>> Ivan
>>
>>
>> On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer <curt.r.fisc...@gmail.com>
>> wrote:
>>
>>> Hi rdkit fiends!
>>>
>>> The [Daylight SMARTS example page](
>>> https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
>>> gives several examples for "multiple group" smarts, including these strings:
>>>
>>> ([Cl!$(Cl~c)].[c!$(c~Cl)])
>>> ([Cl]).([c])
>>> ([Cl].[c])
>>> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>>>
>>> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>>>
>>> For example,  Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
>>> this error message:
>>>
>>> ```
>>> [13:01:41] SMARTS Parse Error: syntax error while parsing:
>>> ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
>>> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS
>>> '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
>>> ```
>>> My understanding of SMARTS is that the outermost parentheses in this
>>> SMARTS string are required to force the chlorine and the aromatic carbon to
>>> be somewhere in the same covalently connected fragment.  E.g. this pattern
>>> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
>>> hydrochloride salt of aniline Cl.Nc1ccccc1.
>>>
>>> What am I getting wrong?  Is there a way to write rdkit-parsable SMARTS
>>> that achieves this?  (I want to filter out molecules that contain more than
>>> one of certain moieties, while allowing molecules that have one (or zero)
>>> such moieties.  But salts or covalently disconnected fragments that each
>>> contain one instance of the moiety should be fine.)
>>>
>>> Details on my setup:
>>>
>>> - RDKit Version: 2019.09.3
>>> - Operating system: macOS 10.15.2
>>> - Python version (if relevant): 3.6
>>> - Are you using conda? yes
>>> - If you are using conda, which channel did you install the rdkit from?
>>> `conda-forge`
>>> - If you are not using conda: how did you install the RDKit?
>>>
>>> Curt
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>