Re: [Rdkit-discuss] Exhaustive Library Enumeration

2018-01-17 Thread Christos Kannas
Hi Andy,

A better option is to sanitize the products of a reaction enumeration
before using them as reactants.
Look at this example from the RDKit "Getting Started" documentation:

Note that the molecules that are produced by the chemical reaction
processing code are not sanitized, as this artificial reaction demonstrates:

>>> rxn = AllChem.ReactionFromSmarts('[C:1]=[C:2][C:3]=[C:4].[C:5]=[C:6]>>[C:1]1=[C:2][C:3]=[C:4][C:5]=[C:6]1')
>>> ps = rxn.RunReactants((Chem.MolFromSmiles('C=CC=C'), Chem.MolFromSmiles('C=C')))
>>> Chem.MolToSmiles(ps[0][0])
'C1=CC=CC=C1'
>>> p0 = ps[0][0]
>>> Chem.SanitizeMol(p0)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(p0)
'c1ccccc1'
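
The same idea can be wrapped in a small helper. This is only a sketch: the
name sanitized_products is made up for illustration and is not part of
RDKit. It runs the reaction, sanitizes each product, and drops any product
that fails sanitization instead of passing it on as a reactant.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def sanitized_products(rxn, reactants):
    """Return sorted unique SMILES of the products that sanitize cleanly.

    Hypothetical helper for illustration, not an RDKit function.
    """
    seen = set()
    for product_set in rxn.RunReactants(reactants):
        for mol in product_set:
            try:
                Chem.SanitizeMol(mol)
            except Exception:
                # Sanitization failed (e.g. a valence problem): skip it.
                continue
            seen.add(Chem.MolToSmiles(mol))
    return sorted(seen)


rxn = AllChem.ReactionFromSmarts(
    '[C:1]=[C:2][C:3]=[C:4].[C:5]=[C:6]>>[C:1]1=[C:2][C:3]=[C:4][C:5]=[C:6]1')
diene = Chem.MolFromSmiles('C=CC=C')
dienophile = Chem.MolFromSmiles('C=C')
print(sanitized_products(rxn, (diene, dienophile)))
```

Going through canonical SMILES also removes the duplicate products that
RunReactants returns when a reactant matches the template in several
symmetry-equivalent ways.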

PS: I forgot that the results of a reaction enumeration are not
sanitised, until I saw the error on the command line.

Best,

Christos

Christos Kannas

Chem[o]informatics Researcher & Software Developer



Re: [Rdkit-discuss] Exhaustive Library Enumeration

2018-01-17 Thread Andy Jennings
Hi Christos,

Many thanks for the reply. I hadn't appreciated that the presence of a
single invalid reagent would bring the entire thing crashing down, rather
than issuing a warning/error and moving onto other molecules in the set.
Good to know, and I'll have to be less lazy in my code ;-)
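
A minimal sketch of the "warn and move on" behaviour, assuming the reagents
arrive as SMILES strings. The helper name parse_valid is hypothetical;
Chem.MolFromSmiles returning None for an unparsable SMILES is standard
RDKit behaviour.

```python
import warnings

from rdkit import Chem


def parse_valid(smiles_list):
    """Parse SMILES, warn about invalid entries, and keep the rest.

    Hypothetical helper for illustration, not an RDKit function.
    """
    mols = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Invalid SMILES: report it and carry on with the next one.
            warnings.warn('skipping invalid SMILES: %s' % smi)
            continue
        mols.append(mol)
    return mols


candidates = [
    'c1ccc(CNN(Cc2ccccc2)Cc2ccccc2)cc1',
    'NN(Cc1ccccc1)(Cc1ccccc1)Cc1ccccc1',  # neutral N with four bonds
]
valid = parse_valid(candidates)
print([Chem.MolToSmiles(m) for m in valid])
```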

Best,
Andy

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Exhaustive Library Enumeration

2018-01-17 Thread Christos Kannas
Hi Andy,

The reason your code breaks is that the second product of the third
iteration ('NN(Cc1ccccc1)(Cc1ccccc1)Cc1ccccc1') is not a valid
molecule: it has a neutral nitrogen with four bonds.
Calling Chem.MolFromSmiles('NN(Cc1ccccc1)(Cc1ccccc1)Cc1ccccc1')
therefore returns None, and passing None as a reactant raises the ValueError.
So you have to filter out the molecules that are not valid.

See this Jupyter Notebook, at cell 5, the 1st line in the while loop.
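
For readers without the notebook, here is a rough sketch of such a loop. It
is not the notebook's code: the function name exhaustive_alkylation and its
max_rounds parameter are assumptions made for illustration. Each round
reacts every valid product with the halide again, discards products that
fail sanitization, and stops when a round yields nothing new.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def exhaustive_alkylation(amine_smiles, halide_smiles, max_rounds=10):
    """Alkylate repeatedly; return one list of product SMILES per round.

    Hypothetical helper for illustration, not an RDKit function.
    """
    rxn = AllChem.ReactionFromSmarts(
        '[#7:1].[#6:2][Cl:3]>>[#6:2][#7:1].[Cl:3]')
    halide = Chem.MolFromSmiles(halide_smiles)
    current = [amine_smiles]
    rounds = []
    for _ in range(max_rounds):
        new_smiles = set()
        for smi in current:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                # Filter out anything that is not a valid molecule.
                continue
            for products in rxn.RunReactants((mol, halide)):
                product = products[0]  # alkylated amine; products[1] is Cl
                try:
                    Chem.SanitizeMol(product)
                except Exception:
                    # e.g. neutral nitrogen with four bonds: drop it.
                    continue
                new_smiles.add(Chem.MolToSmiles(product))
        if not new_smiles:
            break  # nothing left to alkylate
        current = sorted(new_smiles)
        rounds.append(current)
    return rounds


rounds = exhaustive_alkylation('NN', 'ClCc1ccccc1')
for i, smis in enumerate(rounds, start=1):
    print('Reaction %d: %s' % (i, smis))
```

With this neutral-product SMIRKS, hydrazine should stop once both nitrogens
carry two benzyl groups, because any further alkylation gives a product
that fails sanitization. RDKit will still log the valence errors for the
rejected products to stderr.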

Best,

Christos

Christos Kannas

Chem[o]informatics Researcher & Software Developer


On 17 January 2018 at 18:16, Andy Jennings 
wrote:

> Hi RDKitters,
>
> I have a question and an observation on the topic of library enumeration.
>
> First, the question: is there a call within RDKit to trigger the
> exhaustive reaction of reagents? For example, if I have two reagents - a
> primary amine and an alkyl chloride - can I tell RDKit to enumerate the
> reaction as though there were an excess of each reagent? In my case here
> the reaction would continue until the alkylation can no longer occur
> because there are no more valences available on the amine, and the amine
> would be either tri-alkylated for a neutral product or quat-alkylated for
> a positively charged product,
> e.g. CCN + RCl -> CCN(R)(R)R or CC[N+](R)(R)(R)R
>
> This brings me to my observation. When I try to attempt exactly this by
> repeatedly exposing the product to the reagent again I am able to drive it
> to exhaustion *in some cases*.
>
> For example, in the example above where RCl is benzyl chloride and my
> SMIRKS is:
> [#7:1].[#6:2][Cl:3]>>[#6:2][#7:1].[Cl:3]
> I do drive the final product to be exclusively the tri-alkylated amine.
> Success.
>
> However, when I attempt the same thing with an amine with more than one
> reactive nitrogen (e.g. NN) I don't get a single product with 6
> alkylations, I get two unique products, each with three alkylations. One
> product has two alkylations on the first nitrogen and one on the second,
> the other product has three alkylations on the first nitrogen and none on
> the second. Attempting to drive the reaction once again leads to a
> 'reaction called with None reactants' ValueError. My dreadful code is below
> and the output is
> Reaction 1: ['NNCc1ccccc1']
> Reaction 2: ['NN(Cc1ccccc1)Cc1ccccc1', 'c1ccc(CNNCc2ccccc2)cc1']
> Reaction 3: ['c1ccc(CNN(Cc2ccccc2)Cc2ccccc2)cc1',
> 'NN(Cc1ccccc1)(Cc1ccccc1)Cc1ccccc1']
> Reaction 4: ValueError
>
> Any pointers would be great, as would any pre-existing library enumeration
> code. The examples I've found shipped with RDKit don't appear to allow me
> to name the products using a combination of the reagent names (useful for
> tracking library content).
>
> Best,
> Andy
>
> Code snippet
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> amine = Chem.MolFromSmiles('NN')
> acyl = Chem.MolFromSmiles('c1ccccc1CCl')  # benzyl chloride
> rxn = AllChem.ReactionFromSmarts('[#7:1].[#6:2][Cl:3]>>[#6:2][#7:1].[Cl:3]')
>
> # First reaction
> reactantListMols = [amine, acyl]
> prods = AllChem.EnumerateLibraryFromReaction(rxn, [reactantListMols, reactantListMols])
> prods = list(prods)
> smis = list(set([Chem.MolToSmiles(x[0], isomericSmiles=True) for x in prods]))
> print(smis)
> # ['NNCc1ccccc1']
>
> # Now repeat until doom
> for i in range(0, 10):
>     # Re-parse the previous round's SMILES (invalid SMILES come back as None)
>     oldproducts = [Chem.MolFromSmiles(x) for x in smis]
>     reactantListMols = oldproducts + [acyl]
>     prods = AllChem.EnumerateLibraryFromReaction(rxn, [reactantListMols, reactantListMols])
>     prods = list(prods)
>     smis = list(set([Chem.MolToSmiles(x[0], isomericSmiles=True) for x in prods]))
>     print(smis)
>
> End Code