Hi

I've always regarded it as dangerous to rely on the use of explicit
hydrogens in search queries and pattern matches. I think it's generally
safer to use H-count properties in your SMARTS. In your example case this
will require the use of recursive SMARTS to allow matching of the CH3 and
CH2Cn fragments you're interested in. The SMARTS
"[CH](=O)O[$([CH3]),$([CH2]C)]" should do what you want. The [CH] forces it
to only match formate esters. The recursive SMARTS [$([CH3]),$([CH2]C)] can
be interpreted as 'an atom which is EITHER an aliphatic carbon with 3
hydrogens OR an aliphatic carbon with two hydrogens and an attached
aliphatic carbon'. Note that the comma means OR; a semicolon would mean AND,
and no atom can be both a CH3 and a CH2. It's possible to build very powerful
queries using this kind of approach, and it's not necessary to add explicit
Hs to make it work.

>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>>> pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")
>>> [mol.HasSubstructMatch(pat1) for mol in mols]
[True, True, True]
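
As a quick sanity check, here is a minimal sketch that runs an H-count query of this kind (written with a comma for OR inside the atom brackets) against a few extra molecules. The negative controls (methyl acetate, isopropyl formate, formic acid) and the names `pattern`, `smiles` and `results` are my own illustrative choices, not part of the original question.

```python
from rdkit import Chem

# H-count query: a formate carbon ([CH]), then an ester oxygen bonded to
# either a CH3 or a CH2 that itself carries another aliphatic carbon.
pattern = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

# Hypothetical test set: two expected hits and three negative controls.
smiles = {
    "methyl formate": "C(=O)OC",
    "ethyl formate": "C(=O)OCC",
    "methyl acetate": "CC(=O)OC",        # carbonyl carbon has no H
    "isopropyl formate": "C(=O)OC(C)C",  # O-bound carbon is a CH, not CH3/CH2
    "formic acid": "C(=O)O",             # no carbon on the hydroxyl oxygen
}

results = {}
for name, smi in smiles.items():
    mol = Chem.MolFromSmiles(smi)  # hydrogens stay implicit; no Chem.AddHs call
    results[name] = mol.HasSubstructMatch(pattern)

for name, hit in results.items():
    print(f"{name}: {hit}")
```

Only the two simple formate esters should be reported as hits, and none of the molecules needs Chem.AddHs for the query to work.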

All the best,
Chris

On Sat, 8 Feb 2020 at 20:29, Andrew Dalke <da...@dalkescientific.com> wrote:

> On Feb 8, 2020, at 17:55, Janusz Petkowski <jjpet...@mit.edu> wrote:
> >
> > If not how can I match cases where in a given position there can be C or
> > H with rdkit?
>
> I believe you should use #1 instead of H.
>
>
> >>> from rdkit import Chem
> >>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
> >>> hmols = [Chem.AddHs(mol) for mol in mols]
>
>
>   Your pattern:
>
> >>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
> >>> [mol.HasSubstructMatch(pat1) for mol in hmols]
> [False, True, True]
>
>   Using #1 instead of H:
>
> >>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
> >>> [mol.HasSubstructMatch(pat2) for mol in hmols]
> [True, True, True]
>
>
> "H" has an odd interpretation.
> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:
>
>     Note that atomic primitive H can have two meanings,
>     implying a property or the element itself. [H] means
>     hydrogen atom. [*H2] means any atom with exactly
>     two hydrogens attached
>
> I believe the goal of having [H] match a hydrogen atom is to allow a
> SMILES, when interpreted as a SMARTS, to be able to match the SMILES when
> interpreted as a molecule. I'm not sure about that though.
>
> Cheers,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>