Hi,

I've always regarded it as dangerous to rely on explicit hydrogens in search queries and pattern matches. I think it's generally safer to use H-count properties in your SMARTS. In your example case this requires recursive SMARTS to allow matching of the CH3 and CH2Cn fragments you're interested in.

The SMARTS "[CH](=O)O[$([CH3]),$([CH2]C)]" should do what you want. The [CH] forces it to only match formate esters. The recursive SMARTS [$([CH3]),$([CH2]C)] can be interpreted as 'an atom which is EITHER an aliphatic carbon with 3 hydrogens OR an aliphatic carbon with two hydrogens and an attached aliphatic carbon'. Note that the two recursive terms are joined with "," (OR); joining them with ";" (AND) would require one atom to be both CH3 and CH2, which can never match.

It's possible to build very powerful queries using this kind of approach, and it's not necessary to add explicit Hs to make it work.
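To illustrate the point about not needing explicit Hs, here is a minimal sketch that runs the H-count pattern against molecules both with implicit hydrogens and after Chem.AddHs. The negative controls (isopropyl formate, methyl acetate, formic acid) and the helper name `matches` are my own additions for illustration, not part of the original question. It assumes RDKit's H primitive counts all attached hydrogens, whether implicit or explicit.

```python
from rdkit import Chem

# H-count based query: formate ester whose O-carbon is CH3 or CH2-C.
PATTERN = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

# The first three come from the thread; the rest are illustrative controls.
SMILES = {
    "methyl formate": "C(=O)OC",
    "ethyl formate": "C(=O)OCC",
    "propyl formate": "C(=O)OCCC",
    "isopropyl formate": "C(=O)OC(C)C",  # O-carbon has only one H
    "methyl acetate": "CC(=O)OC",        # carbonyl carbon has no H
    "formic acid": "C(=O)O",             # no carbon on the ester oxygen
}


def matches(smiles, add_hs=False):
    """Return True if PATTERN matches, optionally after adding explicit Hs."""
    mol = Chem.MolFromSmiles(smiles)
    if add_hs:
        mol = Chem.AddHs(mol)
    return mol.HasSubstructMatch(PATTERN)


# For each molecule: (match with implicit Hs, match with explicit Hs)
results = {name: (matches(smi), matches(smi, add_hs=True))
           for name, smi in SMILES.items()}

for name, (implicit, explicit) in results.items():
    print(f"{name:18s} implicit-H: {implicit!s:5s} explicit-H: {explicit}")
```

If the assumption about H counting holds, each molecule should give the same answer in both columns, with only the three formate esters from the thread matching.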
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>>> pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")
>>> [mol.HasSubstructMatch(pat1) for mol in mols]
[True, True, True]

All the best,
Chris

On Sat, 8 Feb 2020 at 20:29, Andrew Dalke <da...@dalkescientific.com> wrote:

> On Feb 8, 2020, at 17:55, Janusz Petkowski <jjpet...@mit.edu> wrote:
> >
> > If not how can I match cases where in a given position there can be C or
> > H with rdkit?
>
> I believe you should use #1 instead of H.
>
> >>> from rdkit import Chem
> >>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
> >>> hmols = [Chem.AddHs(mol) for mol in mols]
>
> Your pattern:
>
> >>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
> >>> [mol.HasSubstructMatch(pat1) for mol in hmols]
> [False, True, True]
>
> Using #1 instead of H:
>
> >>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
> >>> [mol.HasSubstructMatch(pat2) for mol in hmols]
> [True, True, True]
>
> "H" has an odd interpretation.
> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:
>
>   Note that atomic primitive H can have two meanings,
>   implying a property or the element itself. [H] means
>   hydrogen atom. [*H2] means any atom with exactly
>   two hydrogens attached
>
> I believe the goal of having [H] match a hydrogen atom is to allow a
> SMILES, when interpreted as a SMARTS, to be able to match the SMILES when
> interpreted as a molecule. I'm not sure about that though.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss