Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Start with your benzene molecule

    m = Chem.MolFromSmiles('c1ccccc1')

Make a pattern using Peter's example, with three aromatic atoms connected by two aromatic bonds

    patt = Chem.MolFromSmarts('a:a:a')

and it's a match:

    m.HasSubstructMatch(patt)
    >True

Kekulize your mol, and the pattern doesn't match

    Chem.rdmolops.Kekulize(m)
    m.HasSubstructMatch(patt)
    >False

but if you change the SMARTS pattern to match aromatic atoms connected by kekulized bonds, it matches

    patt2 = Chem.MolFromSmarts('[a]=[a]-[a]')
    m.HasSubstructMatch(patt2)
    >True

Your original SMARTS query doesn't match because C in a SMARTS string is specifically an aliphatic carbon. Change it to c and it will match.

It would have worked if you had removed the aromatic flags when kekulizing

    m = Chem.MolFromSmiles('c1ccccc1')
    Chem.rdmolops.Kekulize(m, clearAromaticFlags = True)
    patt = Chem.MolFromSmarts('[C]=[C]-[C]')
    m.HasSubstructMatch(patt)
    >True

So when you kekulize without the clearAromaticFlags option, the atoms are still flagged as aromatic and will match 'a' but not 'A', while the bonds will match '=' or '-' but not ':' (they will also match '@' or '~', but that's beside the point here).

As Peter mentions, by default if you read in a kekulized SMILES string, the mol you create will be aromatic rather than kekulized, but it sounds like you are intentionally kekulizing before doing substructure matching.

Jason Biggs

On Fri, Sep 8, 2017 at 5:19 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDkit, but I can not figure
> out how to (or if it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
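For anyone who wants to try the three cases from the message above side by side, here is a minimal sketch. It assumes RDKit is installed and that the benzene SMILES is c1ccccc1; the helper name `matches` and the variable names are made up for this illustration and are not part of RDKit. It only prints what your RDKit version reports, so compare the output against the results quoted in the message.

```python
from rdkit import Chem


def matches(mol, smarts):
    """Hypothetical helper: True if the SMARTS pattern occurs in mol."""
    patt = Chem.MolFromSmarts(smarts)
    return mol.HasSubstructMatch(patt)


# Case 1: benzene as parsed, with aromatic atoms and aromatic bonds.
aromatic = Chem.MolFromSmiles('c1ccccc1')

# Case 2: kekulized in place, aromatic flags left on the atoms (the default).
kekulized = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(kekulized)

# Case 3: kekulized in place, aromatic flags cleared.
cleared = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(cleared, clearAromaticFlags=True)

queries = ['a:a:a', '[a]=[a]-[a]', '[C]=[C]-[C]']
cases = [('aromatic', aromatic), ('kekulized', kekulized), ('cleared', cleared)]

# Print one row per molecule state and one column per query.
for name, mol in cases:
    row = ', '.join(f'{q}: {matches(mol, q)}' for q in queries)
    print(f'{name:10s} {row}')
```

Building three separate molecules keeps the cases independent, since Kekulize modifies the molecule in place.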
Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Hi,

In SMARTS, 'a' matches an aromatic atom. So you would match your molecule with the pattern 'aaa', or if you wanted to restrict yourself to carbons, 'ccc'. This would match whether you created the molecule from a Kekulized or an aromatic SMILES. Remember that it's the molecular recognition code, not the form of the input SMILES, that determines whether a molecule is aromatic.

-P.

On Fri, Sep 8, 2017 at 6:19 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDkit, but I can not figure
> out how to (or if it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
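The point above about the input form not mattering can be checked with a short sketch. It assumes RDKit is installed with its default sanitization on parsing, and that benzene is written as c1ccccc1 (aromatic) or C1=CC=CC=C1 (Kekule); the variable names are only for illustration.

```python
from rdkit import Chem

# The same molecule written two ways: aromatic and Kekule SMILES.
from_aromatic = Chem.MolFromSmiles('c1ccccc1')
from_kekule = Chem.MolFromSmiles('C1=CC=CC=C1')

# Aromaticity is perceived when the SMILES is parsed, so the atoms read
# from the Kekule form are flagged aromatic as well.
all_aromatic = all(atom.GetIsAromatic() for atom in from_kekule.GetAtoms())

# Both inputs give the same canonical SMILES.
same_canonical = Chem.MolToSmiles(from_aromatic) == Chem.MolToSmiles(from_kekule)

# The aromatic-carbon pattern from the message is found in both.
patt = Chem.MolFromSmarts('ccc')
print(from_aromatic.HasSubstructMatch(patt), from_kekule.HasSubstructMatch(patt))
print(all_aromatic, same_canonical)
```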
[Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Hello,

Suppose I read in the SMILES of an aromatic molecule, e.g., for benzene

    c1ccccc1

I then want to convert the molecule to a Kekule representation and then perform various SMARTS pattern matches, e.g.

    [C]=[C]-[C]

I have tried various Kekule commands in RDKit, but I cannot figure out how to recognize (or whether it is possible to recognize) a SMARTS pattern for a portion of a molecule which is aromatic, but is currently being stored as a Kekule structure.

Also, is it possible to generate and store more than one Kekule form in RDKit?

Thank you.

Regards,
Jim Metz