Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-07 Thread David Cosgrove
Glad it works for you. As Greg pointed out to someone else today, it’s marginally more efficient to do [#6] than [C,c] and likewise for nitrogen. But it’s always a trade-off between speed and legibility/maintainability. If speed is of the essence and you’re running on millions of compounds it
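To see that the two forms select the same atoms, one can compare their matches on a test molecule. This is only a minimal sketch, assuming RDKit is installed; the SMILES for para-chlorotoluene is written by hand for illustration and is not taken from the thread. It says nothing about the speed difference, only about the equivalence of the matches.

```python
from rdkit import Chem

# para-chlorotoluene, written by hand as an illustrative test molecule
mol = Chem.MolFromSmiles("Cc1ccc(Cl)cc1")

by_atomic_number = Chem.MolFromSmarts("[#6]")  # any carbon, by atomic number
by_symbol = Chem.MolFromSmarts("[C,c]")        # aliphatic OR aromatic carbon

matches_num = mol.GetSubstructMatches(by_atomic_number)
matches_sym = mol.GetSubstructMatches(by_symbol)

# Both queries should pick out the same carbon atoms.
print(len(matches_num), len(matches_sym))
```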

Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-07 Thread Adam Moyer
Ahh! Thank you so much, to both of you. Yes, the different meaning of H in the various contexts was tripping me up. Also, DescribeQuery() was definitely a function that I needed for debugging this solo. Thank you. I will keep that in mind in the future. I found that this SMILES (S4) is exactly

Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-01 Thread Ivan Tubert-Brohman
A minor correction: [H] by itself *is* valid and means a hydrogen atom. The Daylight docs say as much in section 4.1. But in other contexts it means a hydrogen count, so to be safe, always using #1 to mean a hydrogen atom can be a good practice. If you are ever in doubt about how RDKit is
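The point about #1 can be checked with a small experiment. The sketch below assumes RDKit is installed and again uses a hand-written SMILES for para-chlorotoluene as a stand-in molecule. It counts hydrogen atoms before and after Chem.AddHs, and prints what DescribeQuery() reports for a few patterns, without assuming what that text looks like.

```python
from rdkit import Chem

# Illustrative molecule (not from the thread): para-chlorotoluene
mol = Chem.MolFromSmiles("Cc1ccc(Cl)cc1")
mol_h = Chem.AddHs(mol)  # turn implicit hydrogens into real atoms

hydrogen = Chem.MolFromSmarts("[#1]")  # hydrogen atom, by atomic number
bare_h = Chem.MolFromSmarts("[H]")     # the form discussed above

# [#1] can only match hydrogens that exist as atoms in the graph.
n_implicit = len(mol.GetSubstructMatches(hydrogen))
n_explicit = len(mol_h.GetSubstructMatches(hydrogen))
n_bare = len(mol_h.GetSubstructMatches(bare_h))
print(n_implicit, n_explicit, n_bare)

# Ask RDKit how it interpreted each pattern's first atom.
for smarts in ("[#1]", "[H]", "[CH3]"):
    query = Chem.MolFromSmarts(smarts)
    print(smarts, "->", query.GetAtomWithIdx(0).DescribeQuery())
```

Comparing the printed descriptions for [H] and [CH3] is a quick way to see which meaning of H the parser chose in each context.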

Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-01 Thread David Cosgrove
Hi Adam, there are a number of issues here. The key one, I think, is a misunderstanding about the meaning of H in SMARTS. It means "a single attached hydrogen" and is a qualifier for another atom; it cannot be used by itself. So [*H] is valid, [H] isn't. See the table at
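The use of H as a hydrogen-count qualifier can be illustrated with a few patterns. This is a minimal sketch, assuming RDKit is installed, on a hand-written SMILES for para-chlorotoluene (an illustrative choice, not the molecule file from the original question). No explicit hydrogens are added, because the count qualifier also sees implicit hydrogens.

```python
from rdkit import Chem

# Illustrative molecule: para-chlorotoluene, hydrogens left implicit
mol = Chem.MolFromSmiles("Cc1ccc(Cl)cc1")

patterns = (
    "[CH3]",  # aliphatic carbon with exactly three hydrogens
    "[cH]",   # aromatic carbon with exactly one hydrogen
    "[cH0]",  # aromatic carbon with no hydrogens
    "[*H]",   # any atom with exactly one hydrogen
)

# Number of matching atoms for each single-atom pattern
counts = {
    smarts: len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))
    for smarts in patterns
}
print(counts)
```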

[Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-02-28 Thread Adam Moyer
Hello, I have a baffling case where I am trying to match substructures on two ligands with the goal of aligning them. I have two ligands: one is a 6-chloroindole (6CI) and the other is a para-chloro toluene (PCT). I am attempting to use the following SMARTS (S1) to match them: