Re: [Rdkit-discuss] question on rdRGroupDecomposition
On Wed, May 16, 2018 at 4:24 AM Patrick Walters wrote:

> Don't expend a lot of effort on this.

I'm primarily curious to understand why this isn't behaving as we expect it to. It's either a bug or something that should be documented.

> I ended up writing my own implementation of R-group decomposition.

ouch... I'm sorry. You shouldn't need to do that.

-greg

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
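On the question of why the explicit hydrogen in the core seems to be ignored, one thing worth checking (a hedged observation, not a diagnosis of RDKit's R-group code): by default Chem.MolFromSmiles removes explicit hydrogens such as the [H] in [H]c1cc([2*])nc([1*])n1 while parsing, so the core handed to RGroupDecomposition may not contain that atom at all. The sketch below assumes an RDKit release that provides Chem.SmilesParserParams with a removeHs attribute, and only compares atom counts. Whether RGroupDecomposition then treats a retained hydrogen as a blocked position is not shown here and may depend on the RDKit version.

```python
from rdkit import Chem

CORE_SMILES = "[H]c1cc([2*])nc([1*])n1"

# Default parsing: the explicit [H] on the ring carbon is removed during parsing.
default_core = Chem.MolFromSmiles(CORE_SMILES)

# Parsing with removeHs disabled keeps the explicit hydrogen as a real atom.
parser_params = Chem.SmilesParserParams()
parser_params.removeHs = False
core_with_h = Chem.MolFromSmiles(CORE_SMILES, parser_params)

print(default_core.GetNumAtoms())  # 6 ring atoms + 2 dummy atoms
print(core_with_h.GetNumAtoms())   # the same atoms plus the explicit hydrogen
```

If the hydrogen needs to survive, the core would have to be built this way (or with sanitize=False followed by Chem.SanitizeMol) before it is passed to RGroupDecomposition.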
Re: [Rdkit-discuss] question on rdRGroupDecomposition
Hi Greg,

Don't expend a lot of effort on this. I ended up writing my own implementation of R-group decomposition.

Pat

On Tue, May 15, 2018 at 10:00 PM Greg Landrum wrote:

> Hi Pat,
>
> This one has me stumped.
> @Brian: do you understand what's going on here or should I fire up the debugger?
>
> -greg
Re: [Rdkit-discuss] question on rdRGroupDecomposition
Hi Pat,

This one has me stumped.
@Brian: do you understand what's going on here or should I fire up the debugger?

-greg

On Mon, May 14, 2018 at 4:24 AM Patrick Walters wrote:

> Hi All,
>
> I'm hoping someone can help me with rdRGroupDecomposition. I'd like to be able to specify particular R-group locations AND match cases where R=H.
[Rdkit-discuss] question on rdRGroupDecomposition
Hi All,

I'm hoping someone can help me with rdRGroupDecomposition. I'd like to be able to specify particular R-group locations AND match cases where R=H. The example below illustrates what I'm talking about. When RGroupDecompositionParameters.onlyMatchAtRGroups = True, cases where R == H are skipped. I tried putting an explicit hydrogen on the core to block a position, but it appears that the explicit hydrogen is ignored.

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (RGroupDecomposition,
                                              RGroupDecompositionParameters)

# run an RGroupDecomposition on a set of molecules
def process_r_groups(core_mol, rg_params, mols):
    rg = RGroupDecomposition(core_mol, rg_params)
    for mol in mols:
        rg.Add(mol)
    rg.Process()
    return [x for x in rg.GetRGroupsAsRows(asSmiles=True)]


buff = """CCc1ccnc(C)n1
Cc1ncccn1
Cc1cnc(C)nc1"""

mol_list = [Chem.MolFromSmiles(x) for x in buff.split("\n")]
core = Chem.MolFromSmiles("[H]c1cc([2*])nc([1*])n1")
# default parameters: note that 3 R-groups are returned and the
# explicit hydrogen is ignored
params_1 = RGroupDecompositionParameters()
for row in process_r_groups(core, params_1, mol_list):
    print(row)

print()

params_2 = RGroupDecompositionParameters()
params_2.onlyMatchAtRGroups = True
# run with the onlyMatchAtRGroups parameter;
# now only one row is returned
for row in process_r_groups(core, params_2, mol_list):
    print(row)

The output from the script above is

{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H][*:2]', 'R3': '[H][*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'}

{'Core': 'c1cc([*:2])nc([*:1])n1', 'R1': '[H]C([H])([H])[*:1]', 'R2': '[H]C([H])([H])C([H])([H])[*:2]'}

I'd like to figure out how I can get only the substituents at the labeled positions, but still have it match where R1 == H or R2 == H.
Thanks in advance,

Pat
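If a decomposition with default parameters is acceptable as a starting point, one possible workaround is to post-filter its rows: keep only rows in which every R-group outside the positions of interest is a bare hydrogen, then drop those extra columns. The sketch below is plain Python over the row dictionaries printed above; keep_labeled_positions and is_hydrogen are hypothetical helper names, and the hydrogen test assumes R-groups are rendered as [H][*:n], as in this output. It also assumes you have already worked out which output labels correspond to your labeled positions: the ethyl group of the first molecule is reported as R3 in the default run but as R2 with onlyMatchAtRGroups, so the numbering can shift.

```python
import re

# An R-group that is only a hydrogen on a labeled attachment point, e.g. '[H][*:2]'.
# Assumes the explicit-H SMILES style shown in the output above.
HYDROGEN_RGROUP = re.compile(r"^\[H\]\[\*:\d+\]$")


def is_hydrogen(rgroup_smiles):
    """Return True if the R-group SMILES is a bare hydrogen."""
    return HYDROGEN_RGROUP.match(rgroup_smiles) is not None


def keep_labeled_positions(rows, wanted_labels):
    """Keep rows whose R-groups outside wanted_labels are all hydrogen.

    The surviving rows contain only the 'Core' entry and the wanted labels.
    """
    kept = []
    for row in rows:
        extras = [smi for label, smi in row.items()
                  if label != "Core" and label not in wanted_labels]
        if all(is_hydrogen(smi) for smi in extras):
            kept.append({label: smi for label, smi in row.items()
                         if label == "Core" or label in wanted_labels})
    return kept


# Rows copied from the default-parameter run above.
DEFAULT_ROWS = [
    {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
     'R2': '[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'},
    {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
     'R2': '[H][*:2]', 'R3': '[H][*:3]'},
    {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
     'R2': '[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'},
]

# Treat R1 and R3 as the positions of interest (see the numbering caveat above).
for row in keep_labeled_positions(DEFAULT_ROWS, {"R1", "R3"}):
    print(row)
```

With these rows, the third row (whose R2 is a methyl rather than hydrogen) is dropped, while the first two are kept, including the one where R3 is hydrogen.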