Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Paolo,

Thank you very much. I'll test this and get back to you.

Have a nice day.

Best regards,
Quoc-Tuan

> On 10 May 2020 at 13:09, Paolo Tosco paolo.tosco.m...@gmail.com wrote:
>
> Dear Quoc-Tuan,
>
> I think I have come up with a reasonably fast algorithm that seems to be
> more robust:
>
> https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
>
> Cheers,
> p.

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Quoc-Tuan,

I think I have come up with a reasonably fast algorithm that seems to be more robust:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.

On 06/05/2020 09:11, Quoc-Tuan DO wrote:
> Thank you again for your code. [...] It works all fine for monoterpenes
> but not for diterpenes, sesquiterpenes nor triterpenes.
> [...]
> What do you think? Is there something missing in the query?
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Paolo,

Thank you again for your code. Sorry for bothering you again. It works fine for monoterpenes, but not for sesquiterpenes, diterpenes or triterpenes.

pattern: C~C~C(~C)~C

mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C

=> ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))

It should find 4 distinct units.

mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C

=> ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))

It should find 6 distinct units.

I tried a SMARTS version of the pattern, [#6]~[#6]~[#6](~[#6])~[#6], but got the same results as with the SMILES version.

What do you think? Is there something missing in the query?

Thanks for your time,

Best regards,
QT

On 05/05/2020 at 14:52, Paolo Tosco wrote:
> Dear Quoc-Tuan,
>
> this should do what you need:
>
> https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
>
> Cheers,
> p.
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Paolo,

this answers my question as well, but in an unexpected way.

Best,
Jean-Marc

On 05/05/2020 at 14:52, Paolo Tosco wrote:
> Dear Quoc-Tuan,
>
> this should do what you need:
>
> https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France
Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html
http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.

On 05/05/2020 11:52, Quoc-Tuan DO wrote:
> Actually my problem is to find 2 distinct units of isoprene (pattern) in
> the borneol (smiles) as the latter is a monoterpene.
>
> Do you have any idea I can do this ?
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Paolo,

Thank you for your reply. I understand now. At first I did not pass the uniquify option at all, and then I only tried uniquify=True; I thought the default was uniquify=False.

My actual problem is to find the 2 distinct isoprene units (the pattern) in borneol (the SMILES), since the latter is a monoterpene.

Do you have any idea how I can do this?

Thanks in advance for your time.

Best regards,
QT

On 04/05/2020 at 19:53, Paolo Tosco wrote:
> GetSubstructMatches() will report all matches of the pattern against your
> molecule. In your case, there are 35 matches which are all constituted by
> different atom indices.
> [...]
> If you set uniquify=False, you actually get 70 matches, so twice as many
> answers. This time, matches can be constituted by the same indices,
> provided they are in a different permutation.
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Quoc-Tuan,

GetSubstructMatches() tries to find isoprene at all positions where this is possible.

You may want to test your SMARTS and its matching with structures at this great place: https://smartsview.zbh.uni-hamburg.de/

Maybe you would prefer to know whether borneol follows the isoprene rule or not by trying to cover its structure with two unbound isoprene units. I really would like to know how to write that with SMARTS.

Jean-Marc

On 04/05/2020 at 10:10, Greenpharma S.A.S. wrote:
> I expect to have only 2 matches with uniquify=True as I only have 2 units
> of the pattern.
> [...]
> I also expected that there should be 2 "independent" lists but here, there
> is always at least one common atom between each list.
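The covering question raised above can be explored without a special SMARTS by post-processing the matches. The following is only a minimal sketch, not the algorithm in Paolo's gist (whose contents are not reproduced in this thread): it assumes the goal is a largest set of matches that share no atoms, and finds one by exhaustive search with a simple bound. The function name `max_disjoint_matches` and its `max_matches` parameter are hypothetical.

```python
from rdkit import Chem


def max_disjoint_matches(mol, query, max_matches=100000):
    """Return one largest set of matches that share no atoms.

    Hypothetical helper: plain backtracking over all uniquified matches,
    so it may be slow on large molecules.
    """
    matches = [frozenset(m) for m in
               mol.GetSubstructMatches(query, uniquify=True,
                                       maxMatches=max_matches)]
    size = query.GetNumAtoms()
    # Atoms that appear in at least one match; nothing else can be covered.
    coverable = frozenset().union(*matches)
    best = []

    def extend(start, chosen, used):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for i in range(start, len(matches)):
            # Bound: the free atoms can host at most this many more units.
            if len(chosen) + len(coverable - used) // size <= len(best):
                return
            if used.isdisjoint(matches[i]):
                chosen.append(matches[i])
                extend(i + 1, chosen, used | matches[i])
                chosen.pop()

    extend(0, [], frozenset())
    return [tuple(sorted(m)) for m in best]


# Borneol and the isoprene skeleton pattern from the thread (SMARTS form).
mol = Chem.MolFromSmiles("O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C")
pat = Chem.MolFromSmarts("[#6]~[#6]~[#6](~[#6])~[#6]")
units = max_disjoint_matches(mol, pat)
print(len(units), units)
```

Because the helper returns sorted atom sets rather than query-ordered tuples, it answers "how many non-overlapping units fit" but not which atom plays which role in the pattern.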
Re: [Rdkit-discuss] GetSubstructMatches and unique match
Dear Quoc-Tuan,

On 04/05/2020 09:10, Greenpharma S.A.S. wrote:
> pattern='C~C~C(~C)~C'
> smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'
> pat = Chem.MolFromSmiles(pattern)
> mol = Chem.MolFromSmiles(smiles)
> res = mol.GetSubstructMatches(pat, uniquify=True)
> [...]
> I expect to have only 2 matches with uniquify=True as I only have 2 units
> of the pattern.

GetSubstructMatches() will report all matches of the pattern against your molecule. In your case, there are 35 matches, each consisting of a different set of atom indices.

> Furthermore, with or without uniquify, I have the same answers.

If you set uniquify=False, you actually get 70 matches, so twice as many answers. This time, matches can consist of the same indices, provided they are in a different permutation.

I have uploaded a gist here:

https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d

that hopefully will make this clearer.

Cheers,
p.

> I also expected that there should be 2 "independent" lists but here, there
> is always at least one common atom between each list. Is there something
> misunderstood or misused?
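The difference between the two settings can be checked directly. This is a small sketch rather than the content of the linked gist; it uses the SMARTS form of the pattern quoted elsewhere in the thread, and the variable names are only illustrative.

```python
from rdkit import Chem

# Borneol and the isoprene skeleton pattern from the thread.
mol = Chem.MolFromSmiles("O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C")
pat = Chem.MolFromSmarts("[#6]~[#6]~[#6](~[#6])~[#6]")

# uniquify=True (the default): one match per distinct set of atoms.
unique = mol.GetSubstructMatches(pat, uniquify=True)
# uniquify=False: every mapping, including permutations of the same atoms.
every = mol.GetSubstructMatches(pat, uniquify=False)

print(len(unique), len(every))
# Number of distinct atom sets among the non-uniquified matches.
print(len({frozenset(m) for m in every}))
```

The factor of two comes from the pattern's own symmetry: its two terminal atoms on the branching carbon can be swapped without changing the set of matched atoms.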
[Rdkit-discuss] GetSubstructMatches and unique match
Dear All,

Could you please help with the following problem (I could not find an answer in the discussion list)?

    from rdkit import Chem

    pattern='C~C~C(~C)~C'
    smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'
    pat = Chem.MolFromSmiles(pattern)
    mol = Chem.MolFromSmiles(smiles)
    res = mol.GetSubstructMatches(pat, uniquify=True)

The results are:

    ((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
     (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
     (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
     (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
     (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
     (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
     (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
     (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
     (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))

I expect to have only 2 matches with uniquify=True as I only have 2 units of the pattern.

Furthermore, with or without uniquify, I have the same answers.

I also expected that there should be 2 "independent" lists but here, there is always at least one common atom between each list.

Is there something misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,
Quoc-Tuan
Re: [Rdkit-discuss] GetSubstructMatches() as smiles
On Aug 7, 2019, at 13:08, Paolo Tosco wrote:
> You can use
>
> Chem.MolFragmentToSmiles(mol, match)
>
> where match is a tuple of atom indices returned by GetSubstructMatch().

Note however that if only the atom indices are given then Chem.MolFragmentToSmiles() will include all bonds which connect those atoms, even if the original SMARTS does not match those bonds. For example:

    >>> from rdkit import Chem
    >>> pat = Chem.MolFromSmarts("*~*~*~*") # match 4 linear atoms
    >>> mol = Chem.MolFromSmiles("C1CCC1") # ring of size 4
    >>> atom_indices = mol.GetSubstructMatch(pat)
    >>> atom_indices
    (0, 1, 2, 3)
    >>> Chem.MolFragmentToSmiles(mol, atom_indices) # returns the ring
    'C1CCC1'

If this is important, then you need to pass the correct bond indices to MolFragmentToSmiles(). This can be done by using the bonds in the query graph to get the bond indices in the molecule graph. I believe the following is correct:

    def get_match_bond_indices(query, mol, match_atom_indices):
        bond_indices = []
        for query_bond in query.GetBonds():
            # Map both ends of the query bond onto atoms of the molecule.
            atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()]
            atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()]
            # Look up the molecule bond joining those two atoms.
            bond_indices.append(mol.GetBondBetweenAtoms(
                atom_index1, atom_index2).GetIdx())
        return bond_indices

(Does a function like this already exist in RDKit?)

I'll use it to get the bond indices for the *~*~*~* match:

    >>> bond_indices = get_match_bond_indices(pat, mol, atom_indices)
    >>> bond_indices
    [0, 1, 2]

Passing the atom and bond indices gives the expected match SMILES:

    >>> Chem.MolFragmentToSmiles(mol, atom_indices, bond_indices)
    'CCCC'

Cheers,

Andrew
da...@dalkescientific.com
Re: [Rdkit-discuss] GetSubstructMatches() as smiles
Hi Mel,

You can use

    Chem.MolFragmentToSmiles(mol, match)

where match is a tuple of atom indices returned by GetSubstructMatch().

Cheers,
p.

> On 7 Aug 2019, at 11:36, Melissa Adasme wrote:
>
> I wonder if there is a way to directly obtain the SMILES of the found
> substructures instead of the atom indexes or maybe a way to transform the
> indexes to smiles?
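Applied to every match rather than a single one, the suggestion above might look like the sketch below. The aspirin SMILES and the carboxyl/ester SMARTS are arbitrary examples chosen for illustration, not taken from the thread.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, arbitrary example
query = Chem.MolFromSmarts("C(=O)O")               # arbitrary example query

# One fragment SMILES per match; each match is a tuple of atom indices.
fragments = [Chem.MolFragmentToSmiles(mol, match)
             for match in mol.GetSubstructMatches(query)]
print(fragments)
```

Note that only atom indices are passed here, so every bond between the selected atoms ends up in the fragment, whether or not the query matched it.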
[Rdkit-discuss] GetSubstructMatches() as smiles
Dear rdkitters,

I'm trying to find substructures (query molecules built from SMARTS) matching my molecules (SMILES). I found the GetSubstructMatches() method which works pretty well returning the indices of matching atoms in my molecule.

I wonder if there is a way to directly obtain the SMILES of the found substructures instead of the atom indexes or maybe a way to transform the indexes to smiles?

Many thanks in advance!
Mel
Re: [Rdkit-discuss] GetSubstructMatches
Hi Jean-Marc,

The answer is in the error message, once you know how to read it, which isn't really trivial:

On Wed, Dec 14, 2016 at 5:35 PM, Jean-Marc Nuzillard jm.nuzill...@univ-reims.fr wrote:
>
> Traceback (most recent call last):
>   File "glmap.py", line 11, in <module>
>     matches = mol.GetSubstructMatches(skel)
> Boost.Python.ArgumentError: Python argument types in
>     Mol.GetSubstructMatches(Mol, str)
> did not match C++ signature:
>     GetSubstructMatches(class RDKit::ROMol self, class RDKit::ROMol
> query, bool uniquify=True, bool useChirality=False, bool
> useQueryQueryMatches=False, unsigned int maxMatches=1000)
>

It's telling you that Mol.GetSubstructMatches was called with a Mol and a string (the "Mol" is the object you're calling "mol" and the string is the object you are calling "skel"). It expects, however, to be called with a Mol and a Mol. If you convert skel into an RDKit molecule everything should work.

-greg
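The fix described above can be sketched as follows. The original glmap.py is not shown in the thread, so the molecule and the contents of `skel` below are stand-ins, and whether SMARTS or SMILES is the right parser for the real `skel` string is an assumption.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("Oc1ccccc1")   # phenol, a stand-in molecule
skel = "c1ccccc1"                       # still a plain Python string

try:
    mol.GetSubstructMatches(skel)       # wrong: second argument is a str
    failed = False
except Exception as err:                # Boost.Python reports an ArgumentError
    failed = True
    print(type(err).__name__)

query = Chem.MolFromSmarts(skel)        # convert the string into a molecule
matches = mol.GetSubstructMatches(query)
print(matches)
```

The same conversion applies to GetSubstructMatch and HasSubstructMatch, which also expect a molecule as the query.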
Re: [Rdkit-discuss] GetSubstructMatches
Sure, it works!

Thanks, Greg.

Jean-Marc

On 14/12/2016 at 17:43, Greg Landrum wrote:
> It's telling you that Mol.GetSubstructMatches was called with a Mol and a
> string (the "Mol" is the object you're calling "mol" and the string is the
> object you are calling "skel"). It expects, however, to be called with a
> Mol and a Mol. If you convert skel into an RDKit molecule everything
> should work.
[Rdkit-discuss] GetSubstructMatches
Hi all,

I have encountered the following problem:

    Traceback (most recent call last):
      File "glmap.py", line 11, in <module>
        matches = mol.GetSubstructMatches(skel)
    Boost.Python.ArgumentError: Python argument types in
        Mol.GetSubstructMatches(Mol, str)
    did not match C++ signature:
        GetSubstructMatches(class RDKit::ROMol self, class RDKit::ROMol query, bool uniquify=True, bool useChirality=False, bool useQueryQueryMatches=False, unsigned int maxMatches=1000)

trying to find substructure skel in molecule mol.

I use RDKit under Windows, using Anaconda python and packages. Presently, I have rdkitVersion 2016.03.1 and boostVersion 1_56.

I get similar messages with GetSubstructMatch and HasSubstructMatch.

Any idea?

All the best,

Jean-Marc

--
Dr. Jean-Marc Nuzillard
Institute of Molecular Chemistry
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France
Tel : 33 3 26 91 82 10
Fax : 33 3 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/
http://eos.univ-reims.fr/LSD/JmnSoft/
Re: [Rdkit-discuss] GetSubstructMatches() and resonance structures
Thanks to both for your replies. That's more or less what I was thinking of - I just wanted to make sure that there was not something already available before starting coding :-)

I will get back to the list once I have something ready.

Cheers,
p.

On 31 Oct 2014, at 05:17, Greg Landrum greg.land...@gmail.com wrote:

> The reply that Ling forwards has one approach to doing this. It's a bit
> easier for someone who is willing to do some C++ work.[1]
>
> One could imagine writing a function
> prepareForResonanceFormMatching(ROMol m) (or some such thing) that would
> be applied to the *query* molecule that does the following:
> - identifies the groups that need to be resonance-symmetrized
> - changes the resonance bonds to Query bonds that match single or double
>   (possibly also aromatic?)
> - neutralizes any charges on resonating atoms in the group.
>
> The last step is important because the query C(O)O matches the molecule
> C(O)[O-] twice, but the query C([O-])O only matches it once:
>
> In [11]: Chem.MolFromSmiles('C(O)[O-]').GetSubstructMatches(Chem.MolFromSmiles('C(O)O'),uniquify=False)
> Out[11]: ((0, 1, 2), (0, 2, 1))
>
> In [12]: Chem.MolFromSmiles('C(O)[O-]').GetSubstructMatches(Chem.MolFromSmiles('C([O-])O'),uniquify=False)
> Out[12]: ((0, 2, 1),)
>
> I suspect such a function would be useful to multiple people.
>
> For identifying the groups that are resonance symmetrized: though this
> could be done using a set of particular patterns, it may be better to
> think about doing it more generally by having it find resonance
> systems.[2] The flag Bond.getIsConjugated(), set during sanitization, is
> probably useful for this.
>
> -greg
>
> [1] well, to the extent that anything is ever easier in C++
> [2] this would allow finding the substructure matches within molecules
> like C1=C(C)C=CC=CC=C1
>
> On Fri, Oct 31, 2014 at 2:09 AM, S.L. Chan slch...@yahoo.com wrote:
>
>> Dear Paolo,
>>
>> I asked a very similar question last year. This was what Greg said.
>>
>> Ling
>>
>> Re: [Rdkit-discuss] atom equivalence for substructure matching
>> (view on www.mail-archive.com)
>>
>> From: Paolo Tosco paolo.to...@unito.it
>> To: rdkit-discuss@lists.sourceforge.net
>> Sent: Thursday, October 30, 2014 4:26 PM
>> Subject: [Rdkit-discuss] GetSubstructMatches() and resonance structures
>>
>>> I would rather like to get, in both cases, the following output:
>>> ((0, 1, 2),(0, 2, 1))
>>> which would account for the carboxylate group symmetry due to
>>> resonance. [...]
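For a single known group, the effect Greg describes can be imitated by writing the symmetrized query by hand. The sketch below is not the general prepareForResonanceFormMatching() function he proposes (which does not exist under that name as far as this thread shows); it only assumes a carboxylate-type group, and the SMARTS is an illustrative choice.

```python
from rdkit import Chem

# Hand-written query: both C-O bonds may be single or double, and the
# oxygens carry no charge constraint, so the two oxygens are interchangeable.
query = Chem.MolFromSmarts("C(-,=O)-,=O")

results = {}
for smi in ("C([O-])=O", "C(=O)[O-]"):
    mol = Chem.MolFromSmiles(smi)
    # uniquify=False keeps both permutations of the same three atoms.
    results[smi] = mol.GetSubstructMatches(query, uniquify=False)
    print(smi, results[smi])
```

Both resonance forms of formate give the same pair of mappings with this query, which is the output asked for in the original question; a general solution would still need to detect such groups automatically.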
[Rdkit-discuss] GetSubstructMatches() and resonance structures
Dear all,

The following code snippet compares two resonance structures of formate anion:

    import rdkit
    from rdkit import Chem
    mol1=Chem.MolFromSmiles('C([O-])=O')
    mol2=Chem.MolFromSmiles('C(=O)[O-]')
    mol1.GetSubstructMatches(mol2, uniquify = False)
    ((0, 2, 1),)
    mol1.GetSubstructMatches(mol1, uniquify = False)
    ((0, 1, 2),)

I would rather like to get, in both cases, the following output:

    ((0, 1, 2),(0, 2, 1))

which would account for the carboxylate group symmetry due to resonance. The same applies to amidinium, guanidinium, etc.

Is that currently feasible within the RDKit API?

Thanks in advance, cheers
Paolo
Re: [Rdkit-discuss] GetSubstructMatches()
Indeed with the latest release there is no problem any more.

Ling

On Thursday, August 22, 2013 6:57 PM, Greg Landrum greg.land...@gmail.com wrote:
> Which version of the RDKit are you using? I cannot reproduce this with the
> most recent release:
> [...]
> In [9]: rdBase.rdkitVersion
> Out[9]: '2013.06.1'
[Rdkit-discuss] GetSubstructMatches()
Good afternoon folks,

I would imagine that if you remove the hydrogens, the resulting molecule would be a substructure of the original molecule. However, when I do the following to the attached MDL mol file, there are no matches.

    from rdkit import Chem
    mol = Chem.MolFromMolFile('temp.mol', removeHs=False)
    mhvy = Chem.RemoveHs(mol)
    matches = mol.GetSubstructMatches(mhvy)

matches turns out to be empty. Is it something to do with the difference between Smarts and Smiles? If so, how can I work around this to obtain the atomic index relationship between the two molecules?

Thank you for your insight.

Ling

Attachment: temp.mol (binary data)
Re: [Rdkit-discuss] GetSubstructMatches()
Dear Ling,

On Thu, Aug 22, 2013 at 11:49 PM, S.L. Chan slch...@yahoo.com wrote:
> I would imagine that if you remove the hydrogens, the resulting molecule
> would be a substructure of the original molecule. However, when I do the
> following to the attached MDL mol file, there is no matches. I would
> expect it to be a substructure.
>
> from rdkit import Chem
> mol = Chem.MolFromMolFile('temp.mol', removeHs=False)
> mhvy = Chem.RemoveHs(mol)
> matches = mol.GetSubstructMatches(mhvy)
>
> matches turns out to be empty.

Which version of the RDKit are you using? I cannot reproduce this with the most recent release:

    In [5]: m = Chem.MolFromMolFile('temp.mol',removeHs=False)
    In [6]: mhvy = Chem.RemoveHs(m)
    In [7]: len(m.GetSubstructMatches(mhvy))
    Out[7]: 1
    In [8]: from rdkit import rdBase
    In [9]: rdBase.rdkitVersion
    Out[9]: '2013.06.1'

-greg
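The atom index relationship asked about in the original question can be read straight off the match. Since temp.mol is not available here, this sketch substitutes ethanol built from SMILES with hydrogens added; that substitution is an assumption made only for illustration.

```python
from rdkit import Chem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))  # ethanol with explicit hydrogens
mhvy = Chem.RemoveHs(mol)                     # heavy atoms only

# match[i] is the index in mol of atom i of mhvy.
match = mol.GetSubstructMatch(mhvy)
for heavy_idx, full_idx in enumerate(match):
    print(heavy_idx, "->", full_idx)
```

For a symmetric molecule GetSubstructMatch() returns just one of several equivalent mappings, so GetSubstructMatches() with uniquify=False may be the better choice when all of them are needed.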