Sorry - tried to type this too early in the morning and introduced some
errors transcribing the SMARTS pattern!

It should have been "[CH](=O)O[$([CH3]),$([CH2]C)]", as in
pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

Best regards,
Chris
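One way to sanity-check the corrected pattern is to run it on the three formates from the thread plus a few near misses. The last three molecules below are illustrative negative cases chosen for this sketch and do not come from the thread. The sketch assumes a reasonably current RDKit.

```python
from rdkit import Chem

# Corrected pattern: formyl C-H, ester O, then an O-bound carbon that is
# either CH3 or a CH2 carrying a further aliphatic carbon (',' = OR).
pat = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

# Expected outcome per SMILES; the last three are illustrative negatives
# chosen for this sketch, not taken from the thread.
expected = {
    "C(=O)OC": True,          # methyl formate: O-CH3
    "C(=O)OCC": True,         # ethyl formate: O-CH2-C
    "C(=O)OCCC": True,        # propyl formate: O-CH2-C
    "C(=O)OC(C)C": False,     # isopropyl formate: O-bound carbon is CH
    "CC(=O)OC": False,        # methyl acetate: carbonyl C carries no H
    "C(=O)Oc1ccccc1": False,  # phenyl formate: O-bound atom is aromatic c
}

# No AddHs() call: the H counts in the SMARTS use the implicit hydrogens.
results = {smi: Chem.MolFromSmiles(smi).HasSubstructMatch(pat) for smi in expected}
print(results == expected)  # True
```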

On Sun, 9 Feb 2020 at 08:28, Chris Earnshaw <cgearns...@gmail.com> wrote:

> Hi
>
> I've always regarded it as dangerous to rely on the use of explicit
> hydrogens in search queries and pattern matches. I think it's generally
> safer to use H-count properties in your SMARTS. In your example case this
> will require the use of recursive SMARTS to allow matching of the CH3 and
> CH2Cn fragments you're interested in. The SMARTS
> "[CH](=O)O[$(CH3);$([CH2]C)]" should do what you want. The [CH] forces it
> to only match formate esters. The recursive SMARTS [$(CH3);$([CH2]C)] can
> be interpreted as 'an atom which is EITHER an aliphatic carbon with 3
> hydrogens OR an aliphatic carbon with two hydrogens and an attached
> aliphatic carbon'. It's possible to build very powerful queries using this
> kind of approach, and it's not necessary to add explicit Hs to make it work.
>
> from rdkit import Chem
> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
> pat1 = Chem.MolFromSmarts("[CH](=O)O[$(CH3);$([CH2]C)]")
> [mol.HasSubstructMatch(pat1) for mol in mols]
> [True, True, True]
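In SMARTS, `,` means OR and `;` is a low-precedence AND. The `;` in the pattern above therefore asks for an atom that is CH3 and CH2 at the same time, which can never match. The unbracketed `CH3` is also not valid atom syntax, because H counts only appear inside brackets. The comparison below uses bracketed atoms in both patterns and assumes a reasonably current RDKit.

```python
from rdkit import Chem

mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]

# ',' is OR: the O-bound carbon may be CH3 or CH2-C.
or_pat = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")
# ';' is low-precedence AND: the carbon would have to be CH3 and CH2 at once.
and_pat = Chem.MolFromSmarts("[CH](=O)O[$([CH3]);$([CH2]C)]")

or_hits = [m.HasSubstructMatch(or_pat) for m in mols]
and_hits = [m.HasSubstructMatch(and_pat) for m in mols]
print(or_hits)   # [True, True, True]
print(and_hits)  # [False, False, False]
```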
>
> All the best,
> Chris
>
> On Sat, 8 Feb 2020 at 20:29, Andrew Dalke <da...@dalkescientific.com> wrote:
>
>> On Feb 8, 2020, at 17:55, Janusz Petkowski <jjpet...@mit.edu> wrote:
>> >
>> > If not, how can I match cases where in a given position there can be C
>> or H with rdkit?
>>
>> I believe you should use #1 instead of H.
>>
>>
>> >>> from rdkit import Chem
>> >>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>> >>> hmols = [Chem.AddHs(mol) for mol in mols]
>>
>>
>>   Your pattern:
>>
>> >>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
>> >>> [mol.HasSubstructMatch(pat1) for mol in hmols]
>> [False, True, True]
>>
>>   Using #1 instead of H:
>>
>> >>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
>> >>> [mol.HasSubstructMatch(pat2) for mol in hmols]
>> [True, True, True]
>>
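The `[#1]` pattern only works because the hydrogens have been added as real atoms, which is the reliance on explicit hydrogens that Chris warns about. The sketch below compares it with the H-count pattern from Chris's correction on the same three formates, once without and once with `Chem.AddHs()`. The last check assumes that RDKit's H-count primitive also counts explicit hydrogen neighbours. That is assumed here and is not shown in the thread.

```python
from rdkit import Chem

smiles = ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]
mols = [Chem.MolFromSmiles(s) for s in smiles]  # hydrogens implicit
hmols = [Chem.AddHs(m) for m in mols]           # hydrogens as explicit atoms

# Hydrogens written as explicit [#1] atoms in the query.
explicit_h = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
# Hydrogens expressed as H-count properties (Chris's corrected pattern).
h_count = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

explicit_plain = [m.HasSubstructMatch(explicit_h) for m in mols]
explicit_added = [m.HasSubstructMatch(explicit_h) for m in hmols]
count_plain = [m.HasSubstructMatch(h_count) for m in mols]
count_added = [m.HasSubstructMatch(h_count) for m in hmols]

print(explicit_plain)  # [False, False, False]: no H atoms to match
print(explicit_added)  # [True, True, True]
print(count_plain)     # [True, True, True]
print(count_added)     # [True, True, True] (assumes explicit H neighbours are counted)
```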
>>
>> "H" has an odd interpretation.
>> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:
>>
>>     Note that atomic primitive H can have two meanings,
>>     implying a property or the element itself. [H] means
>>     hydrogen atom. [*H2] means any atom with exactly
>>     two hydrogens attached
>>
>> I believe the goal of having [H] match a hydrogen atom is to allow a
>> SMILES, when interpreted as a SMARTS, to be able to match the SMILES when
>> interpreted as a molecule. I'm not sure about that though.
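You can see the two meanings of H described in the Daylight text by counting single-atom matches on ethanol, once with hydrogens left implicit and once with them added as atoms. The sketch assumes that RDKit follows the Daylight reading, in which a bare `[H]` is a hydrogen atom. Andrew's `pat1` result above suggests that it does.

```python
from rdkit import Chem

ethanol = Chem.MolFromSmiles("CCO")  # hydrogens implicit
ethanol_h = Chem.AddHs(ethanol)      # hydrogens as explicit atoms


def n_matches(mol, smarts):
    """Count matches of a single-atom SMARTS query, i.e. matching atoms."""
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


# For each query: (matches without added Hs, matches with added Hs)
counts = {
    smarts: (n_matches(ethanol, smarts), n_matches(ethanol_h, smarts))
    for smarts in ["[H]", "[#1]", "[*H2]"]
}
print(counts)  # {'[H]': (0, 6), '[#1]': (0, 6), '[*H2]': (1, 1)}
```

Here `[H]` and `[#1]` both select hydrogen atoms, so they find nothing until hydrogens are added. `[*H2]` is a property query that picks out the CH2 carbon in either form.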
>>
>> Cheers,
>>
>>                                 Andrew
>>                                 da...@dalkescientific.com
>>
>>
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>