Hi Thomas,

I believe what you want can be done using recursive SMARTS and disconnected
SMARTS. For example,

In [7]: mol = Chem.MolFromSmiles('CCC=C')

In [8]: mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))

Out[8]: ((0, 1, 2, 3),)

Recursive SMARTS let you match a single atom while constraining its
environment. [$(C=*)] matches one atom, provided it is an aliphatic carbon
with a double bond to any other atom. Importantly, that "other atom" is not
"consumed", so it can still be matched elsewhere in the SMARTS.
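To make the "not consumed" point concrete, here is a small sketch comparing the plain pattern C=* with its recursive form on the same molecule. The variable names are just mine for illustration, and I have only tried to reason about this toy case:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCC=C')

# Plain SMARTS: the pattern has two atoms, so each match has two indices.
plain_matches = mol.GetSubstructMatches(Chem.MolFromSmarts('C=*'))

# Recursive SMARTS: the pattern has one atom; the neighbour is only
# checked as context and does not appear in the match.
recursive_matches = mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C=*)]'))

print(plain_matches)
print(recursive_matches)
```

Each match of the plain pattern reports both atoms of the double bond, while each match of the recursive pattern reports a single atom, which is what leaves the neighbour free for another fragment.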

The SMARTS above doesn't by itself guarantee that there are no gaps, but
you can check that independently: the number of atoms in the molecule
should equal the number of atoms in the SMARTS.
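A rough sketch of that check follows, together with one possible way to recover which molecule atoms belong to which fragment (for the coloring). Using Chem.GetMolFrags on the query to split it into its dot-separated parts is my assumption about a convenient way to do this, and names like covers_all and parts are made up for the example:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCC=C')
query = Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]')

matches = mol.GetSubstructMatches(query)

# A match maps every query atom to a distinct molecule atom, so equal
# atom counts mean the match leaves no atom of the molecule uncovered.
covers_all = bool(matches) and query.GetNumAtoms() == mol.GetNumAtoms()

# match[i] is the molecule atom matched by query atom i. Grouping the
# query atoms by disconnected component gives the atoms of each fragment.
match = matches[0]
parts = [[match[i] for i in frag] for frag in Chem.GetMolFrags(query)]

print(covers_all)
print(parts)
```

Note that this only checks atoms. If there can be several matches, you may still want to look at each one and confirm that the bonds between fragments are the ones you expect.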

Hope this helps,
Ivan



On Fri, Mar 5, 2021 at 7:36 AM Thomas <odioidenti...@gmail.com> wrote:

> Is it possible to search for a fragment that is not a valid structure
> itself, but part of a structure?
>
> Problem: "Given a structure, and a decomposition of the structure,
> highlight each part with a different color"
> The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS
> The "smiles fragments" are noted with an asterisk in the "connection
> bonds".
>
> For example:
> mol: CCC=C
> decomposition:  C*   CC    *=C
>
> For a human it takes nothing to spot "who is who", but how would you
> approach it?
>
> - I cannot match the SMARTS "C=": it's not a valid SMARTS
> - I cannot match it without the broken bonds: I would lose the difference
> between C* and C=*
> - I cannot match it as it is: the asterisks will match the first atom of
> the other fragment. (Is there perhaps a way to find out which part matched
> which atoms? In that case I could remove the atom matching the asterisk...)
>
> Maybe there is an easy way to represent this pattern 'C=' in SMARTS, but
> the Daylight manual is not clear about it. Or maybe I'm just too lazy to
> get it....
>
> In other words: is it possible to write n SMARTS that together match the
> whole structure (all the atoms and all the bonds, with no overlapping and
> no gaps)? Because if the SMARTS must be a complete structure (without
> "unbonded" bonds), that's actually not possible.
> Thank you
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>