Glad it works for you. As Greg pointed out to someone else today, it’s marginally more efficient to do [#6] than [C,c] and likewise for nitrogen. But it’s always a trade off between speed and legibility/maintainability. If speed is of the essence and you’re running on millions of compounds it might be worth trying.
On Mon, 7 Mar 2022 at 20:45, Adam Moyer <atom.mo...@gmail.com> wrote: > Ahh! Thank you so much, to both of you. > > Yes, the different meaning of H in the various contexts was tripping me up. > > Also, DescribeQuery() was definitely a function that I needed for > debugging this solo. Thank you. I will keep that in mind in the future. > > I found that this smiles (S4) is exactly what I > needed: '[C,c]1(Cl)[C,c][C,c]([N,n,C,c,#1])[C,c]([C,c])[C,c]([#1])[C,c]1'. > > Cheers, > Adam > > On Tue, Mar 1, 2022 at 4:32 AM Ivan Tubert-Brohman < > ivan.tubert-broh...@schrodinger.com> wrote: > >> A minor correction: [H] by itself *is* valid and means a hydrogen atom. >> The Daylight docs say as much in section 4.1. But in other contexts it >> means a hydrogen count, so to be safe, always using #1 to mean a hydrogen >> atom can be a good practice. >> >> If you are ever in doubt about how RDKit is interpreting a SMARTS, >> I recommend making use of the DescribeQuery function which provides a tree >> representation of a query atom or bond. For example (comments added): >> >> >>> mol = Chem.MolFromSmarts('[H][N,H][N,#1]') >> >> >> >>> print(mol.GetAtomWithIdx(0).DescribeQuery()) # [H] >> >> AtomAtomicNum 1 = val # [H] interpreted as a hydrogen atom >> >> >>> print(mol.GetAtomWithIdx(1).DescribeQuery()) # [N,H] >> >> AtomOr >> AtomType 7 = val >> AtomHCount 1 = val # H interpreted as a hydrogen count >> # Overall query atom means "an aliphatic nitrogen OR (any atom with one >> hydrogen)! >> >> >>> print(mol.GetAtomWithIdx(2).DescribeQuery()) # [N,#1] >> >> AtomOr >> AtomType 7 = val >> AtomAtomicNum 1 = val # "#1" is atomic number, therefore a hydrogen >> atom. >> # Overall query atom means "an aliphatic nitrogen OR a hydrogen" >> >> One non-obvious convention in the DescribeQuery output is that AtomType >> implies aliphatic when the value is a normal atomic number, or aromatic if >> the atomic number is offset by 1000. For example, [n] is "AtomType 1007". >> >> Hope you find this approach useful in the future. >> >> Ivan >> >> On Tue, Mar 1, 2022 at 6:33 AM David Cosgrove <davidacosgrov...@gmail.com> >> wrote: >> >>> Hi Adam >>> There are a number of issues here. The key one, I think, is a >>> misunderstanding about the meaning of H in SMARTS. It means "a single >>> attached hydrogen", and is a qualifier for another atom, it cannot be used >>> by itself. So [*H] is valid, [H] isn't. See the table at >>> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html. If you >>> want to refer to an explicit hydrogen, you have to use [#1]. However, that >>> will only match an explicit hydrogen in the molecule, not an implicit one. >>> Thus c[#1] doesn't match anything in c1ccccc1. If you have read in a >>> molecule from a molfile, for example, that has explicit hydrogens then you >>> will be ok. >>> >>> Further to that, your SMARTS strings, at least as they have appeared in >>> gmail, which may have garbled them, are incorrect. In S1, the brackets >>> round [N,n,H] make it a substituent, so it will not match the indole >>> nitrogen. Also, it would probably be better as [N,n;H], which would be >>> read as "(aliphatic nitrogen OR aromatic nitrogen) AND 1 attached >>> hydrogen." The [N,n,H] will match a methylated indole nitrogen which I >>> imagine is not what you want. Similar remarks apply to S2. A SMARTS that >>> matches both 6CI and PCT >>> is [C,c]1(Cl)[C,c][C,c;H][C,c]([C,c])[C,c;H][C,c]1, but that won't match >>> the H atoms themselves if you want to use them in the overlay, and it also >>> won't work in the aliphatic case of, for example, ClC1CCC(C)CC1 because >>> there the carbon atoms have 2 attached hydrogens. If you really do want >>> it to match aliphatic cases as well, then you will need something >>> like >>> [C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1 >>> which is quite a mouthful. The carbons at the 2,3,5 and 6 positions on the >>> ring are specified as either [CH2] or [cH]. >>> >>> Jupyter notebook can be really useful for debugging SMARTS patterns like >>> this. The one I used was variations of >>> ``` >>> from rdkit import Chem >>> from IPython.display import SVG >>> mol = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2C)Cl') >>> qmol = Chem.MolFromSmarts('[C,c]1(Cl)[C,c][C,c][C,c]([C,c])[C,c][C,c]1') >>> print(mol.GetSubstructMatches(qmol)) >>> mol >>> ``` >>> which prints the numbers of the matching atoms and also draws the >>> molecule with the match highlighted. >>> Regards, >>> Dave >>> >>> >>> On Tue, Mar 1, 2022 at 1:43 AM Adam Moyer <atom.mo...@gmail.com> wrote: >>> >>>> Hello, >>>> >>>> I have a baffling case where I am trying to match substructures on two >>>> ligands for the goal of aligning them. >>>> >>>> I have two ligands; one is a 6-chloroindole (6CI) and the other is a >>>> para-chloro toluene (PCT). >>>> >>>> I am attempting to use the following SMARTS (S1) to match >>>> them: '[C,c]1(Cl)[C,c][C,c]*([N,n,H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'. >>>> For some reason S1 only finds a match in 6CI. >>>> >>>> When I use the following SMARTS (S2) I only match to PCT as expected: >>>> '[C,c]1(Cl)[C,c][C,c]*([H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'. >>>> >>>> How can S1 not match PCT? S1 is strictly a superset of S2 because I am >>>> using the "or" operation. Do I have a misunderstanding of how explicit >>>> hydrogens work in RDKit/SMARTS? >>>> >>>> Lastly when I use the last SMARTS (S3) I am able to match to both, but >>>> I cannot use that smarts due to other requirements in my >>>> project: '[C,c]1(Cl)[C,c][C,c][C,c]([C,c,H])[C,c]([H])[C,c]1' >>>> >>>> Thanks! >>>> Adam >>>> _______________________________________________ >>>> Rdkit-discuss mailing list >>>> Rdkit-discuss@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >>>> >>> >>> >>> -- >>> David Cosgrove >>> Freelance computational chemistry and chemoinformatics developer >>> http://cozchemix.co.uk >>> >>> _______________________________________________ >>> Rdkit-discuss mailing list >>> Rdkit-discuss@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >>> >> -- David Cosgrove Freelance computational chemistry and chemoinformatics developer http://cozchemix.co.uk
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss