Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-

2017-11-08 Thread Chenyang Shi
Dear Andy,

Thank you for a quick and thorough email. I find it very instructive,
although I need to read it a couple more times to digest it.

Cheers,
Chenyang

On Wed, Nov 8, 2017 at 2:27 PM, Andrew Dalke wrote:

> On Nov 8, 2017, at 21:00, Chenyang Shi wrote:
> > =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> The recursive SMARTS notation, [$(...)], matches the entire pattern inside
> the parentheses but reports only the first atom of that pattern.
>
> > For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C=C=O')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> > ((1, 0, 2),)
> >
> > it prints out atomic positions 1, 0, 2--three positions. But I would
> > expect only one position for the Carbon in the middle.
>
> Each $(*) finds its pattern, which is a "*" and here matches one of the
> terminal atoms (the carbon or the oxygen), and returns it. The substructure
> search returns 3 positions because the first is [CH0;A;X2;!R], the second is
> the first atom of "*", and the third is the first atom of the other "*".
>
> If you only want the first atom of the entire pattern, then put the entire
> pattern in a recursive SMARTS, as in:
>
>   [$([CH0;A;X2;!R](=*)=*)]
>
> >>> pat = Chem.MolFromSmarts("[$([CH0;A;X2;!R](=*)=*)]")
> >>> mol = Chem.MolFromSmiles('C=C=O')
> >>> mol.GetSubstructMatches(pat)
> ((1,),)
>
> > Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C#C')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> > ((0, 1),)
> > I would expect two separate positions such as (0,), (1,), indicating
> > there are two carbon triple bonds (with a hydrogen).
>
> Since you are only looking for a single atom, try putting the entire
> pattern in a recursive SMARTS, as in
>
>   [$([CH1;A;X2;!R]#*)]
>
> >>> mol = Chem.MolFromSmiles("C#C")
> >>> pat = Chem.MolFromSmarts("[$([CH1;A;X2;!R]#*)]")
> >>> mol.GetSubstructMatches(pat)
> ((0,), (1,))
>
>
> > Then if I search "CC#CC" using "[CH0;A;X2;!R]#[$(*)]",
>
> I believe you want "[$([CH0;A;X2;!R]#*)]"
>
> Thank you for your clear description of what you expected.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-

2017-11-08 Thread Andrew Dalke
On Nov 8, 2017, at 21:00, Chenyang Shi wrote:
> =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)] 

The recursive SMARTS notation, [$(...)], matches the entire pattern inside the
parentheses but reports only the first atom of that pattern.

> For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]", 
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C=C=O')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> ((1, 0, 2),)
> 
> it prints out atomic positions 1, 0, 2--three positions. But I would expect 
> only one position for the Carbon in the middle.

Each $(*) finds its pattern, which is a "*" and here matches one of the
terminal atoms (the carbon or the oxygen), and returns it. The substructure
search returns 3 positions because the first is [CH0;A;X2;!R], the second is
the first atom of "*", and the third is the first atom of the other "*".
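
The ordering of that tuple follows the query atoms, and a short sketch can
make this visible. It is only an illustration: the variable names are
hypothetical, and the loop simply pairs each query atom with the molecule atom
it matched.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C=C=O")
pat = Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]")

matches = mol.GetSubstructMatches(pat)

# Each match tuple is ordered by query atom index:
# match[i] is the index of the molecule atom matched by query atom i.
for match in matches:
    for query_atom, mol_idx in zip(pat.GetAtoms(), match):
        mol_atom = mol.GetAtomWithIdx(mol_idx)
        print(query_atom.GetIdx(), query_atom.GetSmarts(),
              "->", mol_idx, mol_atom.GetSymbol())
```

The first entry of the tuple is the central carbon; the other two entries come
from the two [$(*)] terms.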

If you only want the first atom of the entire pattern, then put the entire
pattern in a recursive SMARTS, as in:

  [$([CH0;A;X2;!R](=*)=*)]

>>> pat = Chem.MolFromSmarts("[$([CH0;A;X2;!R](=*)=*)]")
>>> mol = Chem.MolFromSmiles('C=C=O')
>>> mol.GetSubstructMatches(pat)
((1,),)

> Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]", 
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C#C')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> ((0, 1),)
> I would expect two separate positions such as (0,), (1,), indicating there
> are two carbon triple bonds (with a hydrogen).

Since you are only looking for a single atom, try putting the entire pattern in 
a recursive SMARTS, as in

  [$([CH1;A;X2;!R]#*)]

>>> mol = Chem.MolFromSmiles("C#C")
>>> pat = Chem.MolFromSmarts("[$([CH1;A;X2;!R]#*)]")
>>> mol.GetSubstructMatches(pat)
((0,), (1,))
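
A likely reason the two-atom pattern gave only ((0, 1),) is that
GetSubstructMatches uniquifies matches by default, so the mappings (0, 1) and
(1, 0) count as one match because they cover the same atoms. A minimal sketch,
assuming that default (uniquify=True), which keeps both mappings and then
collects the first query atom of each:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C#C")
pat = Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]")

# Default: matches covering the same set of atoms are reported once.
unique_matches = mol.GetSubstructMatches(pat)

# uniquify=False keeps every mapping, so both orientations appear.
all_matches = mol.GetSubstructMatches(pat, uniquify=False)

# First query atom of each mapping, i.e. the [CH1;A;X2;!R] carbon.
first_atoms = sorted({match[0] for match in all_matches})
print(unique_matches, all_matches, first_atoms)
```

The fully recursive pattern is usually the simpler choice, since each match
then contains a single atom and nothing needs to be post-processed.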


> Then if I search "CC#CC" using "[CH0;A;X2;!R]#[$(*)]",

I believe you want "[$([CH0;A;X2;!R]#*)]"
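
As a sketch of that pattern on "CC#CC", following the same steps as the
sessions above (the variable names are illustrative), the two inner carbons
should each be reported as a separate one-atom match:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CC#CC")
pat = Chem.MolFromSmarts("[$([CH0;A;X2;!R]#*)]")

# Each match holds one atom: a hydrogen-free, two-connected,
# non-ring aliphatic carbon that has a triple bond.
matches = mol.GetSubstructMatches(pat)
print(matches)
```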

Thank you for your clear description of what you expected.

Cheers,

Andrew
da...@dalkescientific.com


