On Feb 8, 2020, at 17:55, Janusz Petkowski <jjpet...@mit.edu> wrote:
>
> If not how can I match cases where in a given position there can be C or H
> with rdkit?

I believe you should use #1 instead of H.

>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>>> hmols = [Chem.AddHs(mol) for mol in mols]

Your pattern:

>>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
>>> [mol.HasSubstructMatch(pat1) for mol in hmols]
[False, True, True]

Using #1 instead of H:

>>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
>>> [mol.HasSubstructMatch(pat2) for mol in hmols]
[True, True, True]

"H" has an odd interpretation.
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:

  Note that atomic primitive H can have two meanings, implying a
  property or the element itself. [H] means hydrogen atom. [*H2]
  means any atom with exactly two hydrogens attached

I believe the goal of having [H] match a hydrogen atom is to allow a SMILES,
when interpreted as a SMARTS, to be able to match the SMILES when interpreted
as a molecule. I'm not sure about that though.

Cheers,

                                Andrew
                                da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
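
A small sketch that may make the two meanings of H easier to see: instead of matching the whole ester pattern, it runs one-atom SMARTS queries against methyl formate (the first molecule in the session above) and lists which elements each query selects. The helper name matched_symbols is made up for this illustration and is not part of RDKit.

```python
from rdkit import Chem

# Methyl formate with explicit hydrogens, as in the session above.
mol = Chem.AddHs(Chem.MolFromSmiles("C(=O)OC"))


def matched_symbols(smarts):
    """Return element symbols of the atoms matched by a one-atom SMARTS.

    Hypothetical helper for illustration only.
    """
    query = Chem.MolFromSmarts(smarts)
    return [mol.GetAtomWithIdx(match[0]).GetSymbol()
            for match in mol.GetSubstructMatches(query)]


# [H] on its own: the hydrogen element.
h_only = matched_symbols("[H]")
# [C,H]: aliphatic carbon, OR any atom with exactly one hydrogen attached.
c_or_h = matched_symbols("[C,H]")
# [C,#1]: aliphatic carbon, OR an atom with atomic number 1.
c_or_h1 = matched_symbols("[C,#1]")

print("[H]    ->", h_only)
print("[C,H]  ->", c_or_h)
print("[C,#1] ->", c_or_h1)
```

If this behaves as the results above suggest, [C,H] should select only carbons here, since a hydrogen atom has no hydrogens attached to it. That would explain why the first pattern cannot use the third methyl hydrogen, while [C,#1] can.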