Re: [Rdkit-discuss] Permutation of multiple enumeration

Patrick Walters Wed, 06 Jul 2022 17:23:01 -0700

Here's a simple example showing the enumeration of a 3-component library
based on a reaction:
https://gist.github.com/PatWalters/7439099598b4f08a331a81b209f88baa



On Wed, Jul 6, 2022 at 4:57 PM Andrew Dalke <da...@dalkescientific.com>
wrote:

> Hi Carsten,
>
>   How are the fragments expressed? With attachment points marked with
> "[*:1]", "[*:2]" and "[*:3]" atoms?
>
> One technique is to rewrite the SMILES to use closures. (See
> https://onlinelibrary.wiley.com/doi/10.1002/qsar.200310008 or
> http://www.dalkescientific.com/writings/diary/archive/2005/05/07/attachment_points.html
> ).
>
> For example, if your core SMILES are:
>
> [*:1]c1ncc([*:2])cn1
> CC([*:2])O[*:1]
>
> and your R1 contains
>
> *F
> Cl*
> Br*
>
> and your R2 contains
>
> *CCO
> CO*
>
> then you could rewrite these to use "%91" to connect the [*:1] with the R1
> "*" and use "%92" to connect the [*:2] with the R2 "*", using
> dot-disconnected terms.
>
> For example:
>
>   [*:1]c1ncc([*:2])cn1 + *F + *CCO
>
> can be rewritten as
>
>   c%911ncc%92cn1.F%91.C%92CO
>
> which is parsed and canonicalized to:
>
>   OCCc1cnc(F)nc1
>
> Rewriting the SMILES this way is a bit tricky. I've attached a program
> which does it for you.
>
>
> Running it on the above gives:
>
> % cat core.smi
> [*:1]c1ncc([*:2])cn1
> CC([*:2])O[*:1]
>
> % cat r1.smi
> *F
> Cl*
> Br*
>
> % cat r2.smi
> *CCO
> CO*
>
> % python enumerate.py --R1 r1.smi --R2 r2.smi core.smi
> c1%91ncc%92cn1.F%91.C%92CO -> OCCc1cnc(F)nc1
> c1%91ncc%92cn1.F%91.CO%92 -> COc1cnc(F)nc1
> c1%91ncc%92cn1.Cl%91.C%92CO -> OCCc1cnc(Cl)nc1
> c1%91ncc%92cn1.Cl%91.CO%92 -> COc1cnc(Cl)nc1
> c1%91ncc%92cn1.Br%91.C%92CO -> OCCc1cnc(Br)nc1
> c1%91ncc%92cn1.Br%91.CO%92 -> COc1cnc(Br)nc1
> CC(O%91)%92.F%91.C%92CO -> CC(CCO)OF
> CC(O%91)%92.F%91.CO%92 -> COC(C)OF
> CC(O%91)%92.Cl%91.C%92CO -> CC(CCO)OCl
> CC(O%91)%92.Cl%91.CO%92 -> COC(C)OCl
> CC(O%91)%92.Br%91.C%92CO -> CC(CCO)OBr
> CC(O%91)%92.Br%91.CO%92 -> COC(C)OBr
>
> It also supports --R3 if your core has 3 R-groups, with the third core
> point labeled [*:3].
>
> Best regards
>
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
>
> > On Jul 6, 2022, at 21:00, Carsten Bauer <carsten.ba...@bluewin.ch> wrote:
> >
> > Hello
> >
> > I have a structure with three substituents R1, R2 and R3
> > R1 is an enumeration of 30+ SMILES
> > R2 and R3 each is an enumeration of <5 SMILES
> > Chemical space = 30 x 5 x 5 = 750+ in-silico compounds
> >
> > Can anyone share (i.e., publish in a citable form) RDKit code for this
> > permutation?
> > Is there a textbook example illustrating this everyday lab question,
> > please?
> >
> > I can’t follow
> > https://www.rdkit.org/docs/cppapi/EnumerationStrategyBase_8h_source.html
> >
> > Sorry.
> >
> > Many thanks for getting back.
> > Kindest regards
> > C.
> >
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
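
The ring-closure rewriting described in the quoted message can be sketched in a few lines of plain Python. This is a minimal illustration, not the attached enumerate.py: the names to_closure and enumerate_library are hypothetical, and the sketch assumes single bonds at every attachment point, labels 1 to 9, and that an unlabeled "*" in an R-group file takes the label of that file. It does only the string rewriting, so it runs without RDKit.

```python
import re
from itertools import product

# A wildcard is "[*:n]", "[*]" or a bare "*"; group 1 captures the label n.
WILDCARD = re.compile(r"\[\*(?::(\d+))?\]|\*")
# A wildcard sitting alone in its own branch, e.g. "([*:2])".
BRANCH = re.compile(r"\((?:\[\*(?::(\d+))?\]|\*)\)")
# A bracket atom or an organic-subset atom (enough for this sketch).
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFIbcnops]")


def to_closure(smiles, default_label=None):
    """Replace each attachment point with the ring closure %9n (hypothetical helper)."""

    def closure(match):
        label = match.group(1) or default_label
        if label is None or not 1 <= int(label) <= 9:
            raise ValueError(f"need an attachment label from 1 to 9 in {smiles!r}")
        return f"%9{int(label)}"

    # A leading wildcard has no preceding atom to carry the closure,
    # so the closure goes right after the first real atom instead.
    lead = WILDCARD.match(smiles)
    if lead:
        rest = smiles[lead.end():]
        atom = ATOM.match(rest)
        if atom is None:
            raise ValueError(f"no atom after the leading wildcard in {smiles!r}")
        smiles = atom.group(0) + closure(lead) + rest[atom.end():]

    # "X([*:n])" becomes "X%9n"; any remaining terminal wildcard becomes "%9n".
    smiles = BRANCH.sub(closure, smiles)
    return WILDCARD.sub(closure, smiles)


def enumerate_library(cores, r1, r2):
    """Yield one dot-disconnected SMILES per (core, R1, R2) combination."""
    for core, a, b in product(cores, r1, r2):
        yield ".".join([to_closure(core), to_closure(a, 1), to_closure(b, 2)])


CORES = ["[*:1]c1ncc([*:2])cn1", "CC([*:2])O[*:1]"]
R1 = ["*F", "Cl*", "Br*"]
R2 = ["*CCO", "CO*"]

library = list(enumerate_library(CORES, R1, R2))
for smiles in library:
    print(smiles)
```

With the inputs from the quoted message this prints 12 strings, the first being c%911ncc%92cn1.F%91.C%92CO. Each string can then be passed to RDKit, for example Chem.MolToSmiles(Chem.MolFromSmiles(s)), to obtain the canonical product SMILES. The closures may be placed differently than in the quoted program output (CC%92O%91 rather than CC(O%91)%92), but both forms describe the same molecule.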
