The Daylight website is a very good resource for SMILES, SMARTS, and
SMIRKS.

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

JW

___________________
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Nov 8, 2017 at 2:52 PM, <rdkit-discuss-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>    1. SMARTS for =C=, #CH, #C- (Chenyang Shi)
>    2. Re: SMARTS for =C=, #CH, #C- (Andrew Dalke)
>    3. Re: SMARTS for =C=, #CH, #C- (Chenyang Shi)
>    4. SMARTS for Joback and Reid method (Chenyang Shi)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 8 Nov 2017 14:00:36 -0600
> From: Chenyang Shi <cs3...@columbia.edu>
> To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID:
>         <CAAj+Mte+mqgznFqFfeLVgL06ZJDbk0pX-uTLpGBk_n_jiWKqgg@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear RDKitters,
>
> I have a question regarding SMARTS patterns for three simple functional
> groups: =C=, #CH and #C-. I am new to SMARTS/SMILES, so I tried to guess
> the patterns myself. Here are my guesses:
>
> =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> #CH : [CH1;A;X2;!R]#[$(*)]
>
> #C- :  [CH0;A;X2;!R]#[$(*)]
>
> I checked these SMARTS at
> http://smartsview.zbh.uni-hamburg.de/smartsview/calculate?method=get; they
> all seem to make sense.
>
> For example, the webpage prints out the following messages:
>
> =C=: it says "aliphatic C with 0 further total connections, with 0 further
> hydrogen, not in a ring".
>
> #CH: "aliphatic C with 0 further total connections, with 1 further
> hydrogen, not in a ring".
>
> #C-: "aliphatic C with 1 further total connections, with 0 further
> hydrogen, not in a ring".
>
> However, when I search subgroups using these SMARTS, I had problems.
>
> For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C=C=O')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> ((1, 0, 2),)
>
> it prints out atomic positions 1, 0, 2--three positions. But I would expect
> only one position for the Carbon in the middle.
>
> Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C#C')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> ((0, 1),)
> I would expect two separate positions such as (0,), (1,), indicating there
> are two carbon triple bonds (with a hydrogen).
>
>
> Then if I search "CC#CC" using " [CH0;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('CC#CC')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts(" [CH0;A;X2;!R]#[$(*)]"))
> ((1, 2),)
> Again, I would expect two separate positions such as (1,), (2,), indicating
> two carbon triple bonds.
>
> I think the problem might be my SMARTS for these three groups are not
> SPECIFIC. I would appreciate everyone's help on this.
>
> Cheers,
> Chenyang
>
> ------------------------------
>
> Message: 2
> Date: Wed, 8 Nov 2017 21:27:29 +0100
> From: Andrew Dalke <da...@dalkescientific.com>
> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID: <8478f1ae-4916-4feb-8e67-e6cf4e52f...@dalkescientific.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Nov 8, 2017, at 21:00, Chenyang Shi <cs3...@columbia.edu> wrote:
> > =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> The recursive SMARTS notation, which is the term inside of the [$(...)],
> finds a match for the entire pattern and returns the first atom in that
> pattern.
>
> > For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C=C=O')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> > ((1, 0, 2),)
> >
> > it prints out atomic positions 1, 0, 2--three positions. But I would
> > expect only one position for the Carbon in the middle.
>
> The $(*) finds the pattern, which is a "*" and in this case matches the
> terminal atoms (the carbon and the oxygen), and returns it. The substructure
> search returns 3 positions because the first is [CH0;A;X2;!R], the second is
> the first atom of "*", and the third is the first atom of the other "*".
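
A minimal sketch of the point above, assuming RDKit is installed: a match tuple holds one molecule atom index per atom in the query, so a three-atom query yields three positions. The variable names are chosen only for illustration.

```python
from rdkit import Chem

# Query from the original question: three query atoms, two of them recursive.
pat = Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]")
mol = Chem.MolFromSmiles("C=C=O")

n_query_atoms = pat.GetNumAtoms()
matches = mol.GetSubstructMatches(pat)

# Pair each query atom index with the molecule atom it was mapped to.
for match in matches:
    for query_idx, mol_idx in enumerate(match):
        symbol = mol.GetAtomWithIdx(mol_idx).GetSymbol()
        print(f"query atom {query_idx} -> molecule atom {mol_idx} ({symbol})")
```

Each tuple has as many entries as the query has atoms, so a query that consists of a single (recursive) atom can only ever report one index per match.
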
>
> If you only want the first atom of the entire pattern, then put the entire
> pattern in a recursive SMARTS, as in:
>
>   [$([CH0;A;X2;!R](=*)=*)]
>
> >>> pat = Chem.MolFromSmarts("[$([CH0;A;X2;!R](=*)=*)]")
> >>> mol = Chem.MolFromSmiles('C=C=O')
> >>> mol.GetSubstructMatches(pat)
> ((1,),)
>
> > Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C#C')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> > ((0, 1),)
> > I would expect two separate positions such as (0,), (1,), indicating
> > there are two carbon triple bonds (with a hydrogen).
>
> Since you are only looking for a single atom, try putting the entire
> pattern in a recursive SMARTS, as in
>
>   [$([CH1;A;X2;!R]#*)]
>
> >>> mol = Chem.MolFromSmiles("C#C")
> >>> pat = Chem.MolFromSmarts("[$([CH1;A;X2;!R]#*)]")
> >>> mol.GetSubstructMatches(pat)
> ((0,), (1,))
>
>
> > Then if I search "CC#CC" using " [CH0;A;X2;!R]#[$(*)]",
>
> I believe you want "[$([CH0;A;X2;!R]#*)]"
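
As a hedged check of the three recursive patterns suggested in this reply, the sketch below runs each one against the corresponding molecule from the question. The helper name `match_atoms` is hypothetical; only `MolFromSmiles`, `MolFromSmarts`, and `GetSubstructMatches` are RDKit calls.

```python
from rdkit import Chem


def match_atoms(smiles, smarts):
    """Return the substructure matches of `smarts` in `smiles` as a tuple of tuples."""
    mol = Chem.MolFromSmiles(smiles)
    pat = Chem.MolFromSmarts(smarts)
    return mol.GetSubstructMatches(pat)


# One call per functional group from the question, each wrapped in $(...).
cumulated_c = match_atoms("C=C=O", "[$([CH0;A;X2;!R](=*)=*)]")  # =C=
alkyne_ch = match_atoms("C#C", "[$([CH1;A;X2;!R]#*)]")          # #CH
alkyne_c = match_atoms("CC#CC", "[$([CH0;A;X2;!R]#*)]")         # #C-

print(cumulated_c)
print(alkyne_ch)
print(alkyne_c)
```

The first two calls reproduce the sessions shown above. The third should report the two triple-bonded carbons of CC#CC (atoms 1 and 2) as separate single-atom matches, which is the result the original question asked for.
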
>
> Thank you for your clear description of what you expected.
>
> Cheers,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 8 Nov 2017 14:42:08 -0600
> From: Chenyang Shi <cs3...@columbia.edu>
> To: Andrew Dalke <da...@dalkescientific.com>
> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID:
>         <CAAj+MtcO39v0rc+EKE+ASejQmV6V1=H6YL8O1xbDLuSRk6qkG
> g...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear Andy,
>
> Thank you for the quick and thorough email. I find it very instructive,
> although I need to read it a couple more times to digest it.
>
> Cheers,
> Chenyang
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 8 Nov 2017 16:52:19 -0600
> From: Chenyang Shi <cs3...@columbia.edu>
> To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: [Rdkit-discuss] SMARTS for Joback and Reid method
> Message-ID:
>         <CAAj+MtdQLoHut6se7bjCWRpyVZd=bqRk-pR4MFZu=DCpdKWMHQ@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi everyone,
>
> I have recently been working on a project that implements the Joback method
> using RDKit (https://en.wikipedia.org/wiki/Joback_method).
>
> I believe the key to the success of this project is representing the 41
> functional groups correctly as SMARTS patterns. I have compiled my own
> patterns (see attachment). I would appreciate it if you could review them
> and let me know if you spot errors.
>
> I think building a robust/well-tested SMARTS database (though small in my
> case) would be helpful to others and other projects.
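
One way such a SMARTS table might be exercised is by counting how many atoms each pattern matches in a test molecule. The sketch below uses only the three recursive patterns discussed earlier in this digest; the dictionary `GROUP_SMARTS`, the function `count_groups`, and the choice of propyne as a test case are assumptions for illustration, and no Joback contribution values are included.

```python
from rdkit import Chem

# Group label -> single-atom recursive SMARTS (patterns taken from this thread).
GROUP_SMARTS = {
    "=C=": "[$([CH0;A;X2;!R](=*)=*)]",
    "#CH": "[$([CH1;A;X2;!R]#*)]",
    "#C-": "[$([CH0;A;X2;!R]#*)]",
}


def count_groups(smiles):
    """Count the atoms matched by each group pattern in one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    counts = {}
    for name, smarts in GROUP_SMARTS.items():
        pat = Chem.MolFromSmarts(smarts)
        # Each match is a one-atom tuple, so the number of matches is the group count.
        counts[name] = len(mol.GetSubstructMatches(pat))
    return counts


propyne_counts = count_groups("CC#C")
print(propyne_counts)
```

Running the same function over a handful of molecules with known group counts is a simple way to test each pattern before relying on it.
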
>
> Thank you,
> Chenyang
>
> PS: The ones highlighted red in the document are robust.
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: SMARTS.docx
> Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
> Size: 16081 bytes
> Desc: not available
>
> ------------------------------
>
>
> End of Rdkit-discuss Digest, Vol 121, Issue 15
> **********************************************
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
