Re: [Rdkit-discuss] question on rdRGroupDecomposition

2018-05-15 Thread Greg Landrum
On Wed, May 16, 2018 at 4:24 AM Patrick Walters wrote:

>
> Don't expend a lot of effort on this.
>

I'm primarily curious to understand why this isn't behaving as we expect it
to. It's either a bug or something that should be documented.


> I ended up writing my own implementation of R-group decomposition.
>

ouch... I'm sorry. You shouldn't need to do that.

-greg



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question on rdRGroupDecomposition

2018-05-15 Thread Patrick Walters
Hi Greg,

Don't expend a lot of effort on this.  I ended up writing my own
implementation of R-group decomposition.

Pat



Re: [Rdkit-discuss] question on rdRGroupDecomposition

2018-05-15 Thread Greg Landrum
Hi Pat,

This one has me stumped.
@Brian: do you understand what's going on here or should I fire up the
debugger?

-greg





[Rdkit-discuss] question on rdRGroupDecomposition

2018-05-13 Thread Patrick Walters
Hi All,

I'm hoping someone can help me with rdRGroupDecomposition.  I'd like to
restrict the decomposition to specific R-group locations AND still match
cases where R=H.  The example below illustrates what I'm talking about.
When RGroupDecompositionParameters.onlyMatchAtRGroups = True, molecules where
R == H are skipped.  I tried putting an explicit hydrogen on the core to
block a position, but it appears that the explicit hydrogen is ignored.

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition,
    RGroupDecompositionParameters,
)

# run an RGroupDecomposition on a set of molecules
def process_r_groups(core_mol, rg_params, mols):
    rg = RGroupDecomposition(core_mol, rg_params)
    # iterate over the argument, not the module-level mol_list
    for mol in mols:
        rg.Add(mol)
    rg.Process()
    return [x for x in rg.GetRGroupsAsRows(asSmiles=True)]


buff = """CCc1ccnc(C)n1
Cc1ncccn1
Cc1cnc(C)nc1"""

mol_list = [Chem.MolFromSmiles(x) for x in buff.split("\n")]
core = Chem.MolFromSmiles("[H]c1cc([2*])nc([1*])n1")
# default parameters, note that 3 R-groups are returned, the
# explicit hydrogen is ignored
params_1 = RGroupDecompositionParameters()
for row in process_r_groups(core, params_1, mol_list):
    print(row)

print()

params_2 = RGroupDecompositionParameters()
params_2.onlyMatchAtRGroups = True
# run with the onlyMatchAtRGroups parameter
# now only one row is returned
for row in process_r_groups(core, params_2, mol_list):
    print(row)

The output from the script above is

{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H][*:2]', 'R3': '[H][*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'}

{'Core': 'c1cc([*:2])nc([*:1])n1', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H]C([H])([H])C([H])([H])[*:2]'}
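
To see at a glance which positions came back as R == H in the default run, the rows can be post-processed with plain Python. This is only a sketch: the helper name hydrogen_positions is hypothetical (not part of RDKit), and it assumes a bare hydrogen substituent is always written as '[H][*:n]', as it is in the output shown here.

```python
import re

# Rows copied from the default-parameter output shown in this message.
rows = [
    {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
     'R2': '[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'},
    {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
     'R2': '[H][*:2]', 'R3': '[H][*:3]'},
    {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
     'R2': '[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'},
]

# Assumption: a hydrogen-only R-group is exactly '[H]' followed by a
# labelled attachment point, e.g. '[H][*:2]'.
H_ONLY = re.compile(r"^\[H\]\[\*:\d+\]$")

def hydrogen_positions(row):
    """Return the sorted R-group keys in `row` whose substituent is just H."""
    return sorted(
        key for key, smiles in row.items()
        if key != 'Core' and H_ONLY.match(smiles)
    )

for row in rows:
    print(hydrogen_positions(row))
```

For the three rows above this reports R2, then R2 and R3, then R3 as hydrogen-only positions.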

I'd like to figure out how to get substituents only at the labeled
positions while still matching molecules where R1 == H or R2 == H.
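
One quick way to check whether a run stayed within the labeled positions is to compare the labels requested on the input core with the labels on the 'Core' that comes back. The sketch below is plain string handling, not an RDKit feature; the function names are hypothetical, and it compares label numbers only, so it does not confirm that a given number ended up on the intended ring position.

```python
import re

def core_input_labels(core_smiles):
    """Isotope-style labels such as [1*] on the core passed to RDKit."""
    return {int(n) for n in re.findall(r"\[(\d+)\*\]", core_smiles)}

def core_output_labels(core_smiles):
    """Atom-map-style labels such as [*:1] on a returned 'Core' SMILES."""
    return {int(n) for n in re.findall(r"\[\*:(\d+)\]", core_smiles)}

# Core SMILES taken from the script and its two outputs in this message.
requested = core_input_labels("[H]c1cc([2*])nc([1*])n1")
default_core = "*c1nc([*:1])nc([*:3])c1[*:2]"
restricted_core = "c1cc([*:2])nc([*:1])n1"

# Labels present in the result that were never requested on the core.
extra_default = core_output_labels(default_core) - requested
extra_restricted = core_output_labels(restricted_core) - requested

print(extra_default)     # {3}
print(extra_restricted)  # set()
```

With the default parameters an unrequested R3 appears, while the onlyMatchAtRGroups run adds no extra labels but returns only one of the three molecules.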

Thanks in advance,

Pat