Re: [Rdkit-discuss] aromatic bonds and graph edit distance
On 21/08/2019 05:06, Andrew Dalke wrote:
> Hi all,
>
> Someone asked me recently about finding the graph edit distance of two small (<= 14 atom) fragments.
>
> I figured this was something that could be brute forced. Following SmallWorld's example at https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment, incrementally delete terminals (except the "*" connection point atom), and ring bonds.

Unless RDKit has something, I think graph edit distance is the kind of thing for which you have to rely on a good graph library. Also, maybe the string edit distance between the two canonical SMILES is a good enough proxy.

> For chain bonds, and non-aromatic bonds, it's easy to delete the bond and add the correct number of hydrogens to either side.
>
> But, what should I do when I cut an aromatic bond?
>
> For something like the first "co" in "c1cocn1", I want the result to be C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.
>
> For something like "c1cnncn1", breaking on the "nn", I think I would like to get both 'N=CC=NC=N' and 'NC=CN=CN' because the "nn" can be a single or a double bond, depending on the Kekule representation, as in:
>
> >>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
> 'c1cnncn1'
> >>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
> 'N=CC=NC=N'
> >>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
> 'c1cnncn1'
> >>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
> 'NC=CN=CN'
>
> Problem is, I don't know how to figure out if a given aromatic bond must be a "-" or "=", or can be both. (Well, I could brute-force enumerate all 2**n possible aromatic bond assignments, then canonicalize, and see if both assignments are possible for a given bond.)
>
> As a non-chemist, I also ask if I'm even on a chemically meaningful track.
>
> Andrew
> da...@dalkescientific.com

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
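The string-edit-distance proxy suggested in the reply above can be sketched in plain Python. This is only an illustration, not something from the thread: the function name `levenshtein` is my own, it works character by character (so a two-letter symbol such as "Cl" counts as two characters), and how well it tracks a true graph edit distance depends entirely on how the SMILES were canonicalized.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# The two cut products quoted in the original message.
print(levenshtein("N=CC=NC=N", "NC=CN=CN"))
```

Only one row of the table is kept at a time, so memory stays proportional to the shorter dimension; for fragments of 14 atoms or fewer that hardly matters, but it keeps the sketch short.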
Re: [Rdkit-discuss] SAR matrices
This looks great, Greg--thank you so much for your help!! There's definitely more than enough for me to work with here. I'll be sure to update you on where this goes...

Looking forward to the blog post!

-Ken

On Tue, Aug 20, 2019, at 7:00 PM, Greg Landrum wrote:
> Ok, here's an initial proof-of-concept implementation that, I think, does the basics of what you're looking for.
> Hopefully there's enough there to get you started:
> https://gist.github.com/greglandrum/f447708cbdb71f2193ca147ca503934d
>
> I will likely play around with this a bit more and turn it into a blog post...
>
> -greg
>
> On Tue, Aug 20, 2019 at 11:36 AM Greg Landrum wrote:
>> I actually had a bit of inspiration while waiting for a connecting flight and think I will have a little demo of this ready in a day or so.
>>
>> -greg
>>
>> On Tue, 20 Aug 2019 at 03:29, Greg Landrum wrote:
>>> This is a great problem, but it's certainly not a trivial one.
>>>
>>> It's a bit of a triviality, but here's at least a demo of how to draw the R groups with the dummies as "attachment points":
>>> https://gist.github.com/greglandrum/f7e310045542ab71447351a8043bbf3f
>>>
>>> -greg
>>>
>>> On Sun, Aug 18, 2019 at 2:43 PM ken wrote:
>>>> Hello,
>>>>
>>>> I am trying to build a 2-D R-group grid (or table, or spreadsheet), where the row headers contain R1 values and the column headers contain R2 values (or vice versa). Compounds that have given R1 and R2 groups would be represented on the table as a filled cell that intersects those R1 and R2. For example, the input could be an SD file containing the following three compounds:
>>>>
>>>> The desired output grid from the sd file would look something like this ("Y" can be replaced with cell formatting or some other indicator):
>>>>
>>>> The closest thing to this that I have been able to find is the "SAR Matrix" (https://f1000research.com/articles/3-113/v2), but the code that was used to generate the matrices does not appear to be available. Does anyone happen to have such code or know how I can generate it? I imagine the first step would be to perform an R-group decomposition, but I'm not sure what to do from there.
>>>>
>>>> I started to see if I could build the program from scratch, but then I thought that someone must've done this before and I shouldn't needlessly reinvent it. I've been (re)learning Python for the past year or so and I *think* I have a pretty good handle on the language, but I wouldn't mind putting said learning to the test on a "real" project, so if anyone has a solution that outputs something that even vaguely resembles the desired grid/matrix, maybe I can modify it to fit my needs.
>>>>
>>>> At some point, I would need the grid to be editable in Word, but I'll cross that bridge when I get to it...
>>>>
>>>> Thank you in advance for your help,
>>>> Ken
>
> *Attachments:*
> * r_matrix example01.png
> * r_matrix example02.png
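For the step Ken asks about, what to do once an R-group decomposition is in hand, here is a minimal sketch in plain Python. It is not the code from Greg's gist. It assumes the decomposition (for instance from RDKit's rdRGroupDecomposition module) has already been reduced to (compound id, R1, R2) records; the function names, the compound ids, and the R-group SMILES below are all made up for illustration.

```python
from collections import defaultdict


def build_rgroup_grid(rows):
    """Pivot (compound_id, R1, R2) records into an R1 x R2 grid.

    Returns (r1_labels, r2_labels, grid), where grid[r1][r2] is the list of
    compound ids carrying that R1/R2 pair (an empty list if there are none).
    """
    cells = defaultdict(list)
    for compound_id, r1, r2 in rows:
        cells[(r1, r2)].append(compound_id)
    r1_labels = sorted({r1 for _, r1, _ in rows})
    r2_labels = sorted({r2 for _, _, r2 in rows})
    grid = {r1: {r2: cells.get((r1, r2), []) for r2 in r2_labels}
            for r1 in r1_labels}
    return r1_labels, r2_labels, grid


def format_grid(r1_labels, r2_labels, grid):
    """Render the grid as tab-separated text, with "Y" marking filled cells."""
    lines = ["R1\\R2\t" + "\t".join(r2_labels)]
    for r1 in r1_labels:
        marks = ["Y" if grid[r1][r2] else "" for r2 in r2_labels]
        lines.append(r1 + "\t" + "\t".join(marks))
    return "\n".join(lines)


# Hypothetical decomposition output: (compound id, R1 SMILES, R2 SMILES).
rows = [
    ("cpd1", "[*:1]C", "[*:2]F"),
    ("cpd2", "[*:1]C", "[*:2]Cl"),
    ("cpd3", "[*:1]CC", "[*:2]F"),
]
r1_labels, r2_labels, grid = build_rgroup_grid(rows)
print(format_grid(r1_labels, r2_labels, grid))
```

Keeping the compound ids in each cell, instead of a bare "Y", leaves room to swap the marker for an activity value or a depiction later. Tab-separated text is used here because it should paste into a spreadsheet or be convertible to a table in a word processor.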
Re: [Rdkit-discuss] SAR matrices
Ok, here's an initial proof-of-concept implementation that, I think, does the basics of what you're looking for. Hopefully there's enough there to get you started:
https://gist.github.com/greglandrum/f447708cbdb71f2193ca147ca503934d

I will likely play around with this a bit more and turn it into a blog post...

-greg
[Rdkit-discuss] aromatic bonds and graph edit distance
Hi all,

Someone asked me recently about finding the graph edit distance of two small (<= 14 atom) fragments.

I figured this was something that could be brute forced. Following SmallWorld's example at https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment, incrementally delete terminals (except the "*" connection point atom), and ring bonds.

For chain bonds, and non-aromatic bonds, it's easy to delete the bond and add the correct number of hydrogens to either side.

But, what should I do when I cut an aromatic bond?

For something like the first "co" in "c1cocn1", I want the result to be C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.

For something like "c1cnncn1", breaking on the "nn", I think I would like to get both 'N=CC=NC=N' and 'NC=CN=CN' because the "nn" can be a single or a double bond, depending on the Kekule representation, as in:

>>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
'c1cnncn1'
>>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
'N=CC=NC=N'
>>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
'c1cnncn1'
>>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
'NC=CN=CN'

Problem is, I don't know how to figure out if a given aromatic bond must be a "-" or "=", or can be both. (Well, I could brute-force enumerate all 2**n possible aromatic bond assignments, then canonicalize, and see if both assignments are possible for a given bond.)

As a non-chemist, I also ask if I'm even on a chemically meaningful track.

Andrew
da...@dalkescientific.com
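The brute-force idea in the parenthetical above can be sketched without any toolkit. This is an illustration under stated assumptions, not RDKit's Kekulization code: the function name `kekule_bond_orders` is mine, and the per-atom "needs one double bond" flags are assigned by hand (1 for the aromatic carbons and pyridine-like nitrogens, 0 for the furan-like oxygen) where a real implementation would derive them from valence rules. The check against the atom flags stands in for the canonicalization step.

```python
from itertools import product


def kekule_bond_orders(bonds, needs_double):
    """Brute-force every single/double assignment of the aromatic bonds.

    bonds        -- list of (i, j) atom-index pairs for the aromatic bonds
    needs_double -- needs_double[i] is 1 if atom i must carry exactly one
                    double bond in a Kekule form, 0 if it must carry none
    Returns {bond: set of orders (1 and/or 2) seen in valid assignments}.
    """
    seen = {bond: set() for bond in bonds}
    for orders in product((1, 2), repeat=len(bonds)):  # 2**n candidates
        doubles = [0] * len(needs_double)
        for (i, j), order in zip(bonds, orders):
            if order == 2:
                doubles[i] += 1
                doubles[j] += 1
        if doubles == list(needs_double):  # a valid Kekule structure
            for bond, order in zip(bonds, orders):
                seen[bond].add(order)
    return seen


# c1cocn1: atoms 0=c 1=c 2=o 3=c 4=n (hand-assigned flags, see above)
oxazole_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
oxazole = kekule_bond_orders(oxazole_bonds, [1, 1, 0, 1, 1])

# c1cnncn1: atoms 0=c 1=c 2=n 3=n 4=c 5=n
triazine_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
triazine = kekule_bond_orders(triazine_bonds, [1] * 6)

print(oxazole[(1, 2)])   # the first "co" bond: single only
print(triazine[(2, 3)])  # the "nn" bond: single or double
```

A bond whose set is {1} or {2} has a forced order, so cutting it gives one product; a bond whose set is {1, 2} gives two, matching the two cases described in the message. With at most 14 atoms the 2**n loop stays small.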
[Rdkit-discuss] Don't let RDKit add hydrogens to sanitize a fragment
Dear colleagues,

I'm working with coordination compounds and using .mol (V3000) files to describe the immediate coordination environment of my molecules. This is an example of a cobalt complex (just the coordination environment):

Mrv1827 08101911143D
0 0 0 0 0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 6 5 0 0 0
M  V30 BEGIN ATOM
M  V30 1 Co 0.7663 2.1605 10.185 0
M  V30 2 Cl 2.423 1.0205 11.4441 0
M  V30 3 P 2.0121 2.3511 8.3115 0
M  V30 4 P 0.1302 0.2072 9.1724 0
M  V30 5 P -0.781 2.1773 11.8292 0
M  V30 6 P 1.3423 4.1551 11.1519 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 9 3 1
M  V30 3 9 4 1
M  V30 4 9 5 1
M  V30 5 9 6 1
M  V30 END BOND
M  V30 END CTAB
M  END

The problem is that RDKit adds hydrogens so that the fragment makes chemical sense, and I'd like to avoid that.

Using Chem.RemoveHs(mol) does not work either.

Any help is much appreciated.

--
Henrique C. S. Junior
Re: [Rdkit-discuss] SAR matrices
Hi Greg,

Very nice demo!

I’d like to ask whether we can set the size of the “elements” in a molecular graph rather than the figure size. It is easy to set the width and height when drawing a compound. But when two compounds are drawn at the same figure size, e.g. 200*150, they may actually appear at different scales from a chemist's point of view, because in a chemist's mind the size of an element (such as a ring, a bond, or the font size of an atom symbol) should stay the same. So can we make the size of a molecular drawing dynamic while keeping the element size fixed, which means a complex molecule would get a larger figure than a simple one?

In the example mentioned by Ken, the TOC graphic in https://pubs.acs.org/doi/full/10.1021/ci300206e, the substituents might be 100*100 or 100*120, and the scaffolds are about 300*150.

I am not sure whether this thread is the right place to ask, but I think this should be considered when “drawing” such R-group tables.

Best,
Hongbin Yang

杨弘宾, Ph.D.
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
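The fixed-element-size idea in Hongbin's message comes down to choosing the canvas from the molecule's extent at a constant scale. Below is a minimal sketch in plain Python, assuming 2D depiction coordinates are already available. The function name `canvas_size`, the scale of 30 pixels per coordinate unit, the 20-pixel padding, and the coordinates are all made-up illustration values, and nothing here uses RDKit's drawing API.

```python
def canvas_size(coords, pixels_per_unit=30.0, padding=20):
    """Canvas (width, height) in pixels for 2D coords drawn at a fixed scale.

    coords          -- iterable of (x, y) atom positions in depiction units
    pixels_per_unit -- hypothetical fixed scale shared by every molecule
    padding         -- hypothetical margin in pixels added on each side
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    width = (max(xs) - min(xs)) * pixels_per_unit + 2 * padding
    height = (max(ys) - min(ys)) * pixels_per_unit + 2 * padding
    return round(width), round(height)


# Made-up coordinates: a small substituent and a larger scaffold.
substituent = [(0.0, 0.0), (1.5, 0.0)]
scaffold = [(0.0, 0.0), (1.5, 0.0), (3.0, 1.0), (4.5, 0.0), (6.0, 2.5)]
print(canvas_size(substituent))  # (85, 40)
print(canvas_size(scaffold))     # (220, 115)
```

Because the scale is shared, a bond of the same coordinate length takes up the same number of pixels in both drawings, and only the canvas grows with the molecule.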
Re: [Rdkit-discuss] SAR matrices
I actually had a bit of inspiration while waiting for a connecting flight and think I will have a little demo of this ready in a day or so.

-greg