Re: [Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-20 Thread Francois Berenger

On 21/08/2019 05:06, Andrew Dalke wrote:

> Hi all,
>
> Someone asked me recently about finding the graph edit distance of
> two small (<= 14 atom) fragments.
>
> I figured this was something that could be brute forced. Following
> SmallWorld's example at
> https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment,
> incrementally delete terminals (except the "*" connection point atom),
> and ring bonds.


Unless RDKit has something for this, I think graph edit distance is the kind
of thing for which you have to rely on a good graph library.

Also, maybe the string edit distance between the two canonical SMILES is
a good enough proxy.
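To illustrate the string-distance proxy, here is a minimal sketch of the standard Levenshtein distance (insert, delete and substitute each cost 1) in plain Python. It assumes both inputs are already canonical SMILES strings, for example from RDKit's Chem.CanonSmiles, which is not called here. The function name is hypothetical. For a true graph edit distance, NetworkX's graph_edit_distance is one example of the kind of graph library meant above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; insert/delete/substitute each cost 1.

    Intended here for two canonical SMILES strings, e.g. the output of
    rdkit.Chem.CanonSmiles (not called here, so this runs without RDKit).
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; keep the DP row short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("c1ccccc1", "c1ccncc1"))  # benzene vs pyridine: 1
```

One caveat with this proxy is that canonical atom ordering can change after a small structural edit, so a one-atom change can sometimes produce a much larger string distance.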



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] SAR matrices

2019-08-20 Thread Ken
This looks great, Greg--thank you so much for your help!! There's definitely 
more than enough for me to work with here. I'll be sure to update you on where 
this goes...

Looking forward to the blog post!
-Ken


On Tue, Aug 20, 2019, at 7:00 PM, Greg Landrum wrote:
> Ok, here's an initial proof-of-concept implementation that, I think, does the 
> basics of what you're looking for.
> Hopefully there's enough there to get you started:
> https://gist.github.com/greglandrum/f447708cbdb71f2193ca147ca503934d 
> 
> I will likely play around with this a bit more and turn it into a blog post...
> 
> -greg
> 
> 
> 
> On Tue, Aug 20, 2019 at 11:36 AM Greg Landrum  wrote:
>> I actually had a bit of inspiration while waiting for a connecting flight 
>> and think I will have a little demo of this ready in a day or so.
>> 
>> -greg
>> 
>> On Tue, 20 Aug 2019 at 03:29, Greg Landrum  wrote:
>>> This is a great problem, but it's certainly not a trivial one.
>>> 
>>> It's a bit of a triviality, but here's at least a demo of how to draw the R 
>>> groups with the dummies as "attachment points":
>>> https://gist.github.com/greglandrum/f7e310045542ab71447351a8043bbf3f
>>> 
>>> 
>>> -greg
>>> 
>>> 
>>> On Sun, Aug 18, 2019 at 2:43 PM ken  wrote:
>>>> Hello,
>>>>
>>>> I am trying to build a 2-D R-group grid (or table, or spreadsheet), where
>>>> the row headers contain R1 values and the column headers contain R2 values
>>>> (or vice versa). Compounds that have given R1 and R2 groups would be
>>>> represented on the table as a filled cell that intersects those R1 and R2.
>>>> For example, the input could be an SD file containing the following three
>>>> compounds:
>>>>
>>>> The desired output grid from the sd file would look something like this
>>>> ("Y" can be replaced with cell formatting or some other indicator):
>>>>
>>>> The closest thing to this that I have been able to find is the "SAR
>>>> Matrix" (https://f1000research.com/articles/3-113/v2), but the code that
>>>> was used to generate the matrices does not appear to be available. Does
>>>> anyone happen to have such code or know how I can generate it? I imagine
>>>> the first step would be to perform an R-group decomposition, but I'm not
>>>> sure what to do from there.
>>>>
>>>> I started to see if I could build the program from scratch, but then I
>>>> thought that someone must've done this before and I shouldn't needlessly
>>>> reinvent it. I've been (re)learning Python for the past year or so and I
>>>> *think* I have a pretty good handle on the language, but I wouldn't mind
>>>> putting said learning to the test on a "real" project, so if anyone has a
>>>> solution that outputs something that even vaguely resembles the desired
>>>> grid/matrix, maybe I can modify it to fit my needs.
>>>>
>>>> At some point, I would need the grid to be editable in Word, but I'll
>>>> cross that bridge when I get to it...
>>>>
>>>> Thank you in advance for your help,
>>>> Ken
> 
> *Attachments:*
>  * r_matrix example01.png
>  * r_matrix example02.png


Re: [Rdkit-discuss] SAR matrices

2019-08-20 Thread Greg Landrum
Ok, here's an initial proof-of-concept implementation that, I think, does
the basics of what you're looking for.
Hopefully there's enough there to get you started:
https://gist.github.com/greglandrum/f447708cbdb71f2193ca147ca503934d

I will likely play around with this a bit more and turn it into a blog
post...

-greg
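Ken's open question was what to do after the R-group decomposition. Assuming that step (for example with RDKit's rdRGroupDecomposition module) has already produced one dict per compound with R1 and R2 entries, the grid itself is essentially a pivot. The stdlib-only sketch below is independent of Greg's gist; the function names, the "R1"/"R2" keys and the example rows are hypothetical.

```python
import csv
import io


def sar_grid(rows, r1_key="R1", r2_key="R2", mark="Y"):
    """Pivot per-compound R-group rows into an R1 x R2 occupancy grid.

    rows: iterable of dicts, one per compound, e.g. {"R1": "*C", "R2": "*F"}.
    Returns (r1_values, r2_values, grid) where grid[r1][r2] is `mark` if at
    least one compound has that R1/R2 combination, else "".
    """
    rows = list(rows)  # allow generators; several passes are needed
    r1_values = sorted({row[r1_key] for row in rows})
    r2_values = sorted({row[r2_key] for row in rows})
    present = {(row[r1_key], row[r2_key]) for row in rows}
    grid = {
        r1: {r2: (mark if (r1, r2) in present else "") for r2 in r2_values}
        for r1 in r1_values
    }
    return r1_values, r2_values, grid


def grid_to_csv(r1_values, r2_values, grid):
    """Render the grid as CSV text: R2 values across the top, R1 values down the side."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["R1/R2", *r2_values])
    for r1 in r1_values:
        writer.writerow([r1, *(grid[r1][r2] for r2 in r2_values)])
    return buf.getvalue()


# Hypothetical decomposition output for three compounds.
rows = [
    {"R1": "*C", "R2": "*Cl"},
    {"R1": "*C", "R2": "*F"},
    {"R1": "*CC", "R2": "*F"},
]
print(grid_to_csv(*sar_grid(rows)))
```

CSV is just one convenient output that spreadsheet tools can open; drawing the R-group structures into the header cells would still need depiction code such as Greg's gists.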





[Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-20 Thread Andrew Dalke
Hi all,

Someone asked me recently about finding the graph edit distance of two small
(<= 14 atom) fragments.

I figured this was something that could be brute-forced. Following SmallWorld's
example at https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a
fragment, I incrementally delete terminal atoms (except the "*" connection-point
atom) and ring bonds.
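Here is a minimal sketch of that single deletion step in plain Python. The fragment is represented as a hypothetical atom dict plus a set of bonds, not an RDKit molecule. Only connectivity is tracked. Re-adding hydrogens and choosing bond orders, which is the actual question in this post, is left out. Applying the step repeatedly, breadth-first, would give the set of smaller fragments that an edit-distance search walks over.

```python
from collections import deque


def connected(start, goal, bonds):
    """Breadth-first search: is `goal` reachable from `start` via `bonds`?"""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return True
        for nbr in adjacency.get(atom, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False


def one_step_deletions(atoms, bonds):
    """Return every fragment reachable by a single deletion.

    atoms: dict of atom index -> element symbol ("*" marks the attachment point).
    bonds: set of (i, j) index pairs. Only topology is tracked: bond orders,
    aromaticity and hydrogen counts are ignored in this sketch.
    """
    degree = {idx: 0 for idx in atoms}
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    results = []
    # 1. Delete a terminal (degree-1) atom, but never the "*" connection point.
    for idx, symbol in atoms.items():
        if degree[idx] == 1 and symbol != "*":
            new_atoms = {k: s for k, s in atoms.items() if k != idx}
            new_bonds = {bond for bond in bonds if idx not in bond}
            results.append((new_atoms, new_bonds))
    # 2. Delete a ring bond: one whose two ends stay connected without it.
    for bond in bonds:
        remaining = set(bonds) - {bond}
        if connected(bond[0], bond[1], remaining):
            results.append((dict(atoms), remaining))
    return results


# Cyclopropyl with an attachment point: no deletable terminal, three ring bonds.
cyclopropyl = ({0: "*", 1: "C", 2: "C", 3: "C"}, {(0, 1), (1, 2), (2, 3), (1, 3)})
print(len(one_step_deletions(*cyclopropyl)))  # 3
```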

For chain bonds, and non-aromatic bonds, it's easy to delete the bond and add 
the correct number of hydrogens to either side.

But, what should I do when I cut an aromatic bond?

For something like the first "co" in "c1cocn1", I want the result to be 
C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.

For something like "c1cnncn1", breaking on the "nn", I think I would like to 
get both 'N=CC=NC=N' and 'NC=CN=CN' because the "nn" can be a single or a 
double bond, depending on the Kekule representation, as in:

>>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
'c1cnncn1'
>>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
'N=CC=NC=N'

>>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
'c1cnncn1'
>>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
'NC=CN=CN'

Problem is, I don't know how to figure out if a given aromatic bond must be a 
"-" or "=", or can be both.

(Well, I could brute-force enumerate all 2**n possible aromatic bond
assignments, then canonicalize, and see if both assignments are possible for a
given bond.)
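Andrew's brute force enumerates all 2**n single/double assignments. A smaller variant of the same idea, swapped in here, enumerates only the valid Kekulé structures. Each structure is treated as a perfect matching: a set of double bonds that covers every atom needing one exactly once. The sketch then counts, for each bond, how many of those structures make it double.

This is a stdlib sketch under stated assumptions:

- The caller supplies, by hand, which aromatic atoms need a double bond.
- Exocyclic double bonds and charges are ignored.
- The function names are hypothetical.

Within RDKit, Chem.ResonanceMolSupplier with the KEKULE_ALL flag may be another route to enumerating Kekulé forms; check its documentation.

```python
def kekule_structures(bonds, needs_double):
    """Enumerate the Kekule structures of one aromatic system.

    bonds: list of (i, j) aromatic bonds.
    needs_double: atoms that must carry exactly one double bond inside the
    ring system (e.g. ring 'c' or pyridine-type 'n'); atoms left out, such as
    a furan-type 'o' or a pyrrole-type '[nH]', get none.
    Each structure is a perfect matching of `needs_double`; returns a list of
    frozensets of the (sorted) bonds that are double.
    """
    # A bond can only be double if both of its atoms need a double bond.
    candidates = [tuple(sorted(b)) for b in bonds
                  if b[0] in needs_double and b[1] in needs_double]
    found = []

    def extend(unmatched, chosen):
        if not unmatched:
            found.append(frozenset(chosen))
            return
        atom = min(unmatched)  # the lowest free atom must get a partner now
        for i, j in candidates:
            if atom in (i, j):
                other = j if i == atom else i
                if other in unmatched:
                    extend(unmatched - {atom, other}, chosen + [(i, j)])

    extend(frozenset(needs_double), [])
    return found


def classify_bonds(bonds, needs_double):
    """Label each aromatic bond 'single', 'double' or 'either' across all Kekule forms."""
    structures = kekule_structures(bonds, needs_double)
    if not structures:
        raise ValueError("no Kekule structure for this atom assignment")
    labels = {}
    for bond in bonds:
        key = tuple(sorted(bond))
        n_double = sum(key in s for s in structures)
        if n_double == len(structures):
            labels[key] = "double"
        elif n_double == 0:
            labels[key] = "single"
        else:
            labels[key] = "either"
    return labels


# c1cocn1 (atoms 0-4 in SMILES order); the 'o' (atom 2) takes no double bond.
# The two c-o bonds, (1, 2) and (2, 3), come out 'single'.
print(classify_bonds([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)], {0, 1, 3, 4}))
# c1cnncn1: every ring atom needs a double bond; the n-n bond is (2, 3).
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(classify_bonds(ring, set(range(6)))[(2, 3)])  # either
```

At each step the search always matches the lowest unmatched atom first, so each Kekulé structure is generated exactly once. For fragments of at most 14 atoms the search stays small.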

As a non-chemist, I also ask if I'm even on a chemically meaningful track.


Andrew
da...@dalkescientific.com


[Rdkit-discuss] Don't let RDKit add hydrogens to sanitize a fragment

2019-08-20 Thread Henrique Castro
Dear colleagues,
I'm working with coordination compounds and using .mol (v3000) files to 
describe the immediate coordination environment of my molecules. This is an 
example of a cobalt complex (just the coordination environment):


  Mrv1827 08101911143D

  0  0  0     0  0            999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 6 5 0 0 0
M  V30 BEGIN ATOM
M  V30 1 Co 0.7663 2.1605 10.185 0
M  V30 2 Cl 2.423 1.0205 11.4441 0
M  V30 3 P 2.0121 2.3511 8.3115 0
M  V30 4 P 0.1302 0.2072 9.1724 0
M  V30 5 P -0.781 2.1773 11.8292 0
M  V30 6 P 1.3423 4.1551 11.1519 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 9 3 1
M  V30 3 9 4 1
M  V30 4 9 5 1
M  V30 5 9 6 1
M  V30 END BOND
M  V30 END CTAB
M  END

The problem is that RDKit adds hydrogens to make the fragment chemically
sensible, and I'd like to avoid that:
[inline image not preserved in the archive]

Using Chem.RemoveHs(mol) doesn't help either.
Any help is much appreciated.

--
Henrique C. S. Junior
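In a V3000 BOND line, the second field is the bond type, and type 9 is a coordination (dative) bond. So in the block above, the Co-Cl bond is an ordinary single bond, while the four bonds run from each P to Co as coordination bonds. RDKit counts a dative bond toward the valence of its end atom only, which plausibly explains why the P atoms get implicit hydrogens.

The small stdlib parser below makes the bond types explicit. It is a hypothetical helper, not an RDKit API. On the RDKit side, the `sanitize` and `removeHs` arguments of `Chem.MolFromMolBlock` are the documented parsing options worth checking. Whether either one suppresses the extra hydrogens here is an open question.

```python
# Bond-type codes from the CTfile V3000 BOND block; 9 is a coordination bond.
V3000_BOND_TYPES = {1: "single", 2: "double", 3: "triple", 4: "aromatic",
                    9: "coordination"}


def parse_v3000_ctab(molblock):
    """Return (element symbols, bonds) from the V3000 CTAB lines of a mol block.

    Bonds are (atom1, atom2, bond type name) with 1-based atom indices as in
    the file. Continuation lines (a trailing '-') are not handled.
    """
    atoms, bonds, section = [], [], None
    for line in molblock.splitlines():
        if not line.startswith("M  V30 "):
            continue
        fields = line[len("M  V30 "):].split()
        if fields[0] == "BEGIN":
            section = fields[1]
        elif fields[0] == "END":
            section = None
        elif section == "ATOM":
            atoms.append(fields[1])  # fields: index, type, x, y, z, aamap
        elif section == "BOND":
            btype, a1, a2 = (int(f) for f in fields[1:4])  # index, type, atom1, atom2
            bonds.append((a1, a2, V3000_BOND_TYPES.get(btype, f"type {btype}")))
    return atoms, bonds


block = """\
M  V30 BEGIN CTAB
M  V30 COUNTS 6 5 0 0 0
M  V30 BEGIN ATOM
M  V30 1 Co 0.7663 2.1605 10.185 0
M  V30 2 Cl 2.423 1.0205 11.4441 0
M  V30 3 P 2.0121 2.3511 8.3115 0
M  V30 4 P 0.1302 0.2072 9.1724 0
M  V30 5 P -0.781 2.1773 11.8292 0
M  V30 6 P 1.3423 4.1551 11.1519 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 9 3 1
M  V30 3 9 4 1
M  V30 4 9 5 1
M  V30 5 9 6 1
M  V30 END BOND
M  V30 END CTAB
M  END
"""
atoms, bonds = parse_v3000_ctab(block)
print(atoms)  # ['Co', 'Cl', 'P', 'P', 'P', 'P']
print(bonds)  # [(1, 2, 'single'), (3, 1, 'coordination'), ... (6, 1, 'coordination')]
```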



Re: [Rdkit-discuss] SAR matrices

2019-08-20 Thread Hongbin Yang
Hi Greg,


Very nice demo!


I’d like to ask whether we can set the size of the “elements” in a molecule
drawing rather than the figure size.

It is easy to set the width and height when drawing a compound. But when two
compounds are drawn at the same size, e.g. 200x150, they may actually appear at
different scales to a chemist, who expects each element (a ring, a bond, or the
font size of an atom symbol) to be the same size. So can we make the image size
dynamic while keeping the element size the same? That would mean a complex
molecule gets a larger image than a simple one.
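The arithmetic behind this request is simple: fix the number of pixels per bond and let the canvas grow with the molecule's 2D extent. The minimal stdlib sketch below assumes two things. First, 2D coordinates already exist, for instance from RDKit's rdDepictor.Compute2DCoords. Second, the drawing code then scales the molecule to fill the canvas. The function name and default numbers are hypothetical.

```python
def canvas_size(coords, px_per_bond=25.0, bond_length=1.5, padding=20):
    """Choose a canvas (width, height) in pixels so one bond spans ~px_per_bond pixels.

    coords: (x, y) 2D depiction coordinates in the drawing's own units.
    bond_length: typical bond length in those units (1.5 is an assumption
    about the depiction; measure it from your own coordinates if unsure).
    padding: fixed margin in pixels added on every side for atom labels.
    """
    if not coords:
        raise ValueError("need at least one atom position")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    scale = px_per_bond / bond_length  # pixels per coordinate unit, same for every molecule
    width = round((max(xs) - min(xs)) * scale) + 2 * padding
    height = round((max(ys) - min(ys)) * scale) + 2 * padding
    return width, height


# A single bond versus a larger hypothetical layout: same scale, different canvas.
print(canvas_size([(0.0, 0.0), (1.5, 0.0)]))              # (65, 40)
print(canvas_size([(0.0, 0.0), (3.0, 1.5), (6.0, 0.0)]))  # (140, 65)
```

Because drawing code usually adds its own margins, element sizes would only be approximately equal across images produced this way.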


In the example mentioned by Ken (the TOC graphic of
https://pubs.acs.org/doi/full/10.1021/ci300206e), the substituents might be
100x100 or 100x120, and the scaffolds are about 300x150.


I am not sure whether this thread is the right place to ask, but I think it is
worth considering when drawing such R-group tables.


Best,


Hongbin Yang 杨弘宾, Ph.D.
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology 




Re: [Rdkit-discuss] SAR matrices

2019-08-20 Thread Greg Landrum
I actually had a bit of inspiration while waiting for a connecting flight
and think I will have a little demo of this ready in a day or so.

-greg
