On Feb 8, 2020, at 17:55, Janusz Petkowski <jjpet...@mit.edu> wrote:
> 
> If not how can I match cases where in a given position there can be C or H 
> with rdkit?

I believe you should use #1 (the atomic number of hydrogen) instead of H.


>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>>> hmols = [Chem.AddHs(mol) for mol in mols]


  Your pattern:

>>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
>>> [mol.HasSubstructMatch(pat1) for mol in hmols]
[False, True, True]

  Using #1 instead of H:

>>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
>>> [mol.HasSubstructMatch(pat2) for mol in hmols]
[True, True, True]


In SMARTS, "H" has an odd interpretation. 
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:

    Note that atomic primitive H can have two meanings,
    implying a property or the element itself. [H] means
    hydrogen atom. [*H2] means any atom with exactly
    two hydrogens attached
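
To make the two meanings concrete, here is a minimal sketch that counts
single-atom SMARTS matches on water with explicit hydrogens. The helper
count_matches is a name made up for this illustration, not part of RDKit,
and the expected counts in the comments assume RDKit follows the Daylight
rule quoted above (as the [C,H] result earlier suggests it does).

```python
from rdkit import Chem


def count_matches(mol, smarts):
    """Hypothetical helper: number of matches of a SMARTS pattern in mol."""
    pattern = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(pattern))


# Water with explicit hydrogens: one oxygen atom and two hydrogen atoms.
water = Chem.AddHs(Chem.MolFromSmiles("O"))

n_bare_h = count_matches(water, "[H]")      # bare [H]: hydrogen atoms, expect 2
n_num_1 = count_matches(water, "[#1]")      # atomic number 1, expect 2
n_two_h = count_matches(water, "[*H2]")     # atoms with two Hs attached, expect 1
n_o_or_h = count_matches(water, "[O,H]")    # H read as "one H attached", expect 1
n_o_or_1 = count_matches(water, "[O,#1]")   # O or a hydrogen atom, expect 3

print(n_bare_h, n_num_1, n_two_h, n_o_or_h, n_o_or_1)
```

Inside an "or" expression such as [O,H], the H is read as the
hydrogen-count property, so the hydrogen atoms themselves are not matched;
[O,#1] matches all three atoms.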

I believe the goal of having [H] match a hydrogen atom is to allow a SMILES, 
when interpreted as a SMARTS, to be able to match the SMILES when interpreted 
as a molecule. I'm not sure about that though.

Cheers,

                                Andrew
                                da...@dalkescientific.com
