Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
Dear Andy,

Thank you for the quick and thorough email. I find it very instructive, although I need to read it a couple more times to digest it.

Cheers,
Chenyang

On Wed, Nov 8, 2017 at 2:27 PM, Andrew Dalke wrote:
> If you only want the first atom of the entire pattern, then put the entire
> pattern in a recursive SMARTS, as in:
>
> [$([CH0;A;X2;!R](=*)=*)]

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
On Nov 8, 2017, at 21:00, Chenyang Shi wrote:
> =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]

The recursive SMARTS notation, which is the term inside the [$(...)], finds a match for the entire pattern and returns the first atom in that pattern.

> For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C=C=O')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> ((1, 0, 2),)
>
> it prints out atomic positions 1, 0, 2 (three positions), but I would expect
> only one position, for the carbon in the middle.

Each $(*) matches its pattern, which is just "*" (here the terminal carbon and the oxygen), and returns that atom. The substructure search returns 3 positions because the first is [CH0;A;X2;!R], the second is the first atom of one "*", and the third is the first atom of the other "*".

If you only want the first atom of the entire pattern, then put the entire pattern in a recursive SMARTS, as in:

[$([CH0;A;X2;!R](=*)=*)]

>>> pat = Chem.MolFromSmarts("[$([CH0;A;X2;!R](=*)=*)]")
>>> mol = Chem.MolFromSmiles('C=C=O')
>>> mol.GetSubstructMatches(pat)
((1,),)

> Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C#C')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> ((0, 1),)
> I would expect two separate positions such as (0,), (1,), indicating two
> triple-bonded carbons (each with a hydrogen).

Since you are only looking for a single atom, try putting the entire pattern in a recursive SMARTS, as in:

[$([CH1;A;X2;!R]#*)]

>>> mol = Chem.MolFromSmiles("C#C")
>>> pat = Chem.MolFromSmarts("[$([CH1;A;X2;!R]#*)]")
>>> mol.GetSubstructMatches(pat)
((0,), (1,))

> Then if I search "CC#CC" using "[CH0;A;X2;!R]#[$(*)]",

I believe you want "[$([CH0;A;X2;!R]#*)]".

Thank you for your clear description of what you expected.
Cheers,
Andrew
da...@dalkescientific.com
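For readers who want to try Andrew's suggestion on all three groups at once, here is a minimal sketch, assuming RDKit is installed. It wraps each of the three patterns from this thread in a recursive SMARTS and runs them on the three example molecules. The names `PATTERNS` and `find_group_atoms` are hypothetical, introduced only for illustration.

```python
# A minimal sketch, assuming RDKit is installed. PATTERNS and
# find_group_atoms are hypothetical names used only for illustration.
from rdkit import Chem

# Each group pattern is wrapped in [$(...)], so the query is a single atom
# and only the first atom of the inner pattern is reported.
PATTERNS = {
    "=C=": "[$([CH0;A;X2;!R](=*)=*)]",
    "#CH": "[$([CH1;A;X2;!R]#*)]",
    "#C-": "[$([CH0;A;X2;!R]#*)]",
}


def find_group_atoms(smiles, smarts):
    """Return the sorted indices of atoms matching a single-atom query."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    # Every match is a 1-tuple because the query has exactly one atom.
    return sorted(match[0] for match in mol.GetSubstructMatches(query))


if __name__ == "__main__":
    for smiles, group in [("C=C=O", "=C="), ("C#C", "#CH"), ("CC#CC", "#C-")]:
        print(smiles, group, find_group_atoms(smiles, PATTERNS[group]))
```

On the thread's examples this should report [1] for C=C=O, [0, 1] for C#C and [1, 2] for CC#CC, consistent with the outputs Andrew shows. Because each query is a single recursive atom, symmetric positions such as the two carbons of C#C come back as separate matches.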
[Rdkit-discuss] SMARTS for =C=, #CH, #C-
Dear RDKitters,

I have a question regarding SMARTS patterns for three simple functional groups: =C=, #CH and #C-. I am new to SMARTS/SMILES, but I did try to guess the patterns. Here are my guesses:

=C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
#CH : [CH1;A;X2;!R]#[$(*)]
#C- : [CH0;A;X2;!R]#[$(*)]

I checked these SMARTS at http://smartsview.zbh.uni-hamburg.de/smartsview/calculate?method=get, and they all seem to make sense. For example, the webpage prints out the following descriptions:

=C=: "aliphatic C with 0 further total connections, with 0 further hydrogen, not in a ring"
#CH: "aliphatic C with 0 further total connections, with 1 further hydrogen, not in a ring"
#C-: "aliphatic C with 1 further total connections, with 0 further hydrogen, not in a ring"

However, when I searched for substructures using these SMARTS, I ran into problems. For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('C=C=O')
>>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
((1, 0, 2),)

it prints out atomic positions 1, 0, 2 (three positions), but I would expect only one position, for the carbon in the middle.

Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('C#C')
>>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
((0, 1),)

I would expect two separate positions such as (0,), (1,), indicating two triple-bonded carbons (each with a hydrogen).

Then if I search "CC#CC" using "[CH0;A;X2;!R]#[$(*)]",

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('CC#CC')
>>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R]#[$(*)]"))
((1, 2),)

Again, I would expect two separate positions such as (1,), (2,), indicating two triple-bonded carbons.

I think the problem might be that my SMARTS for these three groups are not specific enough. I would appreciate everyone's help on this.
Cheers,
Chenyang
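The single match for C#C in the question has a second cause worth noting: `GetSubstructMatches()` defaults to `uniquify=True`, which keeps only one match per set of matched atoms, so the orderings (0, 1) and (1, 0) collapse into one. The sketch below, assuming RDKit is installed, shows an alternative to the recursive-SMARTS fix that was not proposed in the thread: keep the original pattern, pass `uniquify=False`, and collect the first atom of every match. The helper name `first_atoms` is hypothetical.

```python
# A minimal sketch, assuming RDKit is installed; first_atoms is a
# hypothetical helper, not part of RDKit.
from rdkit import Chem


def first_atoms(smiles, smarts, uniquify=True):
    """Return the distinct atom indices bound to the query's first atom."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    # With uniquify=False, RDKit reports every atom ordering of a match,
    # e.g. both (0, 1) and (1, 0) for the two carbons of acetylene.
    matches = mol.GetSubstructMatches(query, uniquify=uniquify)
    # Keep only the atom matched by the first query atom, without duplicates.
    return sorted({match[0] for match in matches})


if __name__ == "__main__":
    examples = [
        ("C#C", "[CH1;A;X2;!R]#[$(*)]"),
        ("CC#CC", "[CH0;A;X2;!R]#[$(*)]"),
    ]
    for smiles, smarts in examples:
        print(smiles,
              first_atoms(smiles, smarts),                  # default uniquify
              first_atoms(smiles, smarts, uniquify=False))  # all orderings
```

The recursive pattern remains the cleaner fix, since it makes the query itself a single atom; `uniquify=False` can return many permutations for larger or more symmetric patterns.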