Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-16 Thread Greg Landrum
One difference, if you care, is that the SMILES-based approach doesn’t
preserve any information about coordinates.

Andrew is providing further support for integrating this into the RDKit
itself. :-)

On Mon, 16 Apr 2018 at 08:30, Andrew Dalke 
wrote:

> On Apr 16, 2018, at 05:37, Patrick Walters  wrote:
> >
> > Thanks Andrew, the SMILES approach seemed to have quite a few edge cases
> so I wrote something to work directly on a molecule.
>
> That's the approach I started with, until I figured out that it doesn't
> preserve chirality.
>
> If I change the end of your code to:
>
> ==
> from mmpdblib import smiles_syntax
>
> def weld_dalke(core, r_groups):
> s1 = smiles_syntax.convert_labeled_wildcards_to_closures(core)
> s2 = smiles_syntax.convert_labeled_wildcards_to_closures(r_groups)
> return Chem.CanonSmiles(s1+"."+s2)
>
> if __name__ == "__main__":
> mol_to_weld = Chem.MolFromSmiles(
> "[*:1][C@](F)(Cl)O.N[*:1]")
> welded_mol = weld_r_groups(mol_to_weld)
> print("Expected  :", Chem.CanonSmiles("N[C@](F)(Cl)O"))
> print("Direct:", Chem.MolToSmiles(welded_mol, isomericSmiles=True))
> print("Via SMILES:", weld_dalke("[*:1][C@](F)(Cl)O", "N[*:1]"))
> ==
>
> These should print identical SMILES strings, but instead give:
>
> Expected  : N[C@](O)(F)Cl
> Direct: N[C@@](O)(F)Cl
> Via SMILES: N[C@](O)(F)Cl
>
>
> If chirality preservation isn't a concern, then there's no problem.
>
> BTW, your current code assumes there will only be one attachment point on
> an atom. For example, the input
>  [*:1][C@]([*:2])(Cl)O.N[*:1].F[*:2]
> create the output
>  N.O[C](F)Cl
>
> It's not hard to fix, and I think more of a d'oh! issue.
>
>
> In a quick benchmark I put together just now, I found that my SMILES
> syntax manipulation approach was about twice as fast to turn the two
> core/R-group SMILES strings into a molecule.
>
> Cheers,
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-16 Thread Andrew Dalke
On Apr 16, 2018, at 05:37, Patrick Walters  wrote:
> 
> Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so I 
> wrote something to work directly on a molecule. 

That's the approach I started with, until I figured out that it doesn't 
preserve chirality.

If I change the end of your code to:

==
from mmpdblib import smiles_syntax

def weld_dalke(core, r_groups):
s1 = smiles_syntax.convert_labeled_wildcards_to_closures(core)
s2 = smiles_syntax.convert_labeled_wildcards_to_closures(r_groups)
return Chem.CanonSmiles(s1+"."+s2)

if __name__ == "__main__":
mol_to_weld = Chem.MolFromSmiles(
"[*:1][C@](F)(Cl)O.N[*:1]")
welded_mol = weld_r_groups(mol_to_weld)
print("Expected  :", Chem.CanonSmiles("N[C@](F)(Cl)O"))
print("Direct:", Chem.MolToSmiles(welded_mol, isomericSmiles=True))
print("Via SMILES:", weld_dalke("[*:1][C@](F)(Cl)O", "N[*:1]"))
==

These should print identical SMILES strings, but instead give:

Expected  : N[C@](O)(F)Cl
Direct: N[C@@](O)(F)Cl
Via SMILES: N[C@](O)(F)Cl


If chirality preservation isn't a concern, then there's no problem.

BTW, your current code assumes there will only be one attachment point on an 
atom. For example, the input
 [*:1][C@]([*:2])(Cl)O.N[*:1].F[*:2]
create the output
 N.O[C](F)Cl

It's not hard to fix, and I think more of a d'oh! issue.


In a quick benchmark I put together just now, I found that my SMILES syntax 
manipulation approach was about twice as fast to turn the two core/R-group 
SMILES strings into a molecule.

Cheers,


Andrew
da...@dalkescientific.com



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-15 Thread Greg Landrum
Ah, I'm replying too late... sorry about that.

There's a command line utility in rdkit.Chem.ChemUtils.TemplateExpand that
could also be used for this purpose. The big difference is that it uses
isotope labels on the dummy atoms to do the matching.

A style point: you can now create an RWMol (instead of an EditableMol) in
order to get an editable molecule in Python. this would be:
em = Chem.RWMol(new_mol)
instead of
em = EditableMol(new_mol)

The advantage is that you don't need to call GetMol() at the end in order
to end up with a "normal" molecule.

It certainly wouldn't be bad if this were easier. That's maybe something
for the next release.

Best,
-greg




On Mon, Apr 16, 2018 at 5:37 AM, Patrick Walters 
wrote:

> Thanks Andrew, the SMILES approach seemed to have quite a few edge cases
> so I wrote something to work directly on a molecule.
>
>
> #!/usr/bin/env python
>
> import sys
> from rdkit import Chem
> from collections import defaultdict
> from rdkit.Chem.rdchem import EditableMol
>
>
> # Thanks to steeveslab-blog for example of how to edit RDKit molecules
> # http://asteeves.github.io/blog/2015/01/14/editing-in-rdkit/
> # Thanks to Andrew Dalke for the function name
>
>
> def weld_r_groups(input_mol):
> # First pass loop over atoms and find the atoms with an AtomMapNum
> join_dict = defaultdict(list)
> for atom in input_mol.GetAtoms():
> map_num = atom.GetAtomMapNum()
> if map_num > 0:
> join_dict[map_num].append(atom)
>
> # Second pass, transfer the atom maps to the neighbor atoms
> for idx, atom_list in join_dict.items():
> if len(atom_list) == 2:
> atm_1, atm_2 = atom_list
> nbr_1 = [x.GetOtherAtom(atm_1) for x in atm_1.GetBonds()][0]
> nbr_1.SetAtomMapNum(idx)
> nbr_2 = [x.GetOtherAtom(atm_2) for x in atm_2.GetBonds()][0]
> nbr_2.SetAtomMapNum(idx)
>
> # Nuke all of the dummy atoms
> new_mol = Chem.DeleteSubstructs(input_mol, Chem.MolFromSmarts('[#0]'))
>
> # Third pass - arrange the atoms with AtomMapNum, these will be
> connected
> bond_join_dict = defaultdict(list)
> for atom in new_mol.GetAtoms():
> map_num = atom.GetAtomMapNum()
> if map_num > 0:
> bond_join_dict[map_num].append(atom.GetIdx())
>
> # Make an editable molecule and add bonds between atoms with
> correspoing AtomMapNum
> em = EditableMol(new_mol)
> for idx, atom_list in bond_join_dict.items():
> if len(atom_list) == 2:
> start_atm, end_atm = atom_list
> em.AddBond(start_atm, end_atm, order=Chem.rdchem.BondType.
> SINGLE)
>
> final_mol = em.GetMol()
>
> # remove the AtomMapNum values
> for atom in final_mol.GetAtoms():
> atom.SetAtomMapNum(0)
> final_mol = Chem.RemoveHs(final_mol)
>
> return final_mol
>
>
> if __name__ == "__main__":
> mol_to_weld = Chem.MolFromSmiles(
> "CN(C)CC(Br)c1cc([*:2])c([*:1])cn1.[H]C([H])([H])[*:1].[H][*:2]")
> welded_mol = weld_r_groups(mol_to_weld)
> print(Chem.MolToSmiles(welded_mol))
>
>
> Best,
>
> Pat
>
>
> On Sun, Apr 15, 2018 at 12:16 PM, Patrick Walters 
> wrote:
>
>> Hi All,
>>
>> I was about to write a function to reassemble a molecule from a core +
>> R-groups, but I thought I'd check and see if such a function already
>> exists.  This is work with the output of rdRGroupDecomposition
>>
>> Gvien a core:
>> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
>>
>> Plus a set of R-groups
>> [H]C([H])([H])[*:1]
>> [H][*:2]
>>
>> Reconnect the pieces to generate a molecule
>> CN(C)CC(Br)c1ccc(C)cn1
>>
>> Thanks,
>>
>> Pat
>>
>>
>>
>>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-15 Thread Patrick Walters
Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so
I wrote something to work directly on a molecule.


#!/usr/bin/env python

import sys
from rdkit import Chem
from collections import defaultdict
from rdkit.Chem.rdchem import EditableMol


# Thanks to steeveslab-blog for example of how to edit RDKit molecules
# http://asteeves.github.io/blog/2015/01/14/editing-in-rdkit/
# Thanks to Andrew Dalke for the function name


def weld_r_groups(input_mol):
# First pass loop over atoms and find the atoms with an AtomMapNum
join_dict = defaultdict(list)
for atom in input_mol.GetAtoms():
map_num = atom.GetAtomMapNum()
if map_num > 0:
join_dict[map_num].append(atom)

# Second pass, transfer the atom maps to the neighbor atoms
for idx, atom_list in join_dict.items():
if len(atom_list) == 2:
atm_1, atm_2 = atom_list
nbr_1 = [x.GetOtherAtom(atm_1) for x in atm_1.GetBonds()][0]
nbr_1.SetAtomMapNum(idx)
nbr_2 = [x.GetOtherAtom(atm_2) for x in atm_2.GetBonds()][0]
nbr_2.SetAtomMapNum(idx)

# Nuke all of the dummy atoms
new_mol = Chem.DeleteSubstructs(input_mol, Chem.MolFromSmarts('[#0]'))

# Third pass - arrange the atoms with AtomMapNum, these will be
connected
bond_join_dict = defaultdict(list)
for atom in new_mol.GetAtoms():
map_num = atom.GetAtomMapNum()
if map_num > 0:
bond_join_dict[map_num].append(atom.GetIdx())

# Make an editable molecule and add bonds between atoms with
correspoing AtomMapNum
em = EditableMol(new_mol)
for idx, atom_list in bond_join_dict.items():
if len(atom_list) == 2:
start_atm, end_atm = atom_list
em.AddBond(start_atm, end_atm,
order=Chem.rdchem.BondType.SINGLE)

final_mol = em.GetMol()

# remove the AtomMapNum values
for atom in final_mol.GetAtoms():
atom.SetAtomMapNum(0)
final_mol = Chem.RemoveHs(final_mol)

return final_mol


if __name__ == "__main__":
mol_to_weld = Chem.MolFromSmiles(
"CN(C)CC(Br)c1cc([*:2])c([*:1])cn1.[H]C([H])([H])[*:1].[H][*:2]")
welded_mol = weld_r_groups(mol_to_weld)
print(Chem.MolToSmiles(welded_mol))


Best,

Pat


On Sun, Apr 15, 2018 at 12:16 PM, Patrick Walters 
wrote:

> Hi All,
>
> I was about to write a function to reassemble a molecule from a core +
> R-groups, but I thought I'd check and see if such a function already
> exists.  This is work with the output of rdRGroupDecomposition
>
> Gvien a core:
> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
>
> Plus a set of R-groups
> [H]C([H])([H])[*:1]
> [H][*:2]
>
> Reconnect the pieces to generate a molecule
> CN(C)CC(Br)c1ccc(C)cn1
>
> Thanks,
>
> Pat
>
>
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-15 Thread Andrew Dalke
Hi Pat,

  I wrote something like this for mmpdb, which is the MMPA code I helped 
develop, at https://github.com/rdkit/mmpdb .

It has one restriction, which I'll get to in a moment.

The general idea is to convert the attachment points to closures, join them 
with a ".", and canonicalize:

>>> from mmpdblib import smiles_syntax
>>> s1 = 
>>> smiles_syntax.convert_labeled_wildcards_to_closures("CN(C)CC(Br)c1cc([*:2])c([*:1])cn1")
>>> s1
'CN(C)CC(Br)c1cc%92c%91cn1'
>>> s2 = 
>>> smiles_syntax.convert_labeled_wildcards_to_closures("[H]C([*:1])([H])[H].[H][*:2]")
>>> s2
'[H]C%91([H])[H].[H]%92'
>>> from rdkit import Chem
>>> Chem.CanonSmiles(s1+"."+s2)
'Cc1ccc(C(Br)CN(C)C)nc1'

The smiles_syntax.py file does not use any of the rest of the code.


The restriction is that the code as-is assumes the wild card atoms like [*:1] 
are either immediately before or after the attachment point. Otherwise it will 
give you (using the R-groups you actually posted):

>>> s2 = 
>>> smiles_syntax.convert_labeled_wildcards_to_closures("[H]C([H])([H])[*:1].[H][*:2]")
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 130, in 
convert_labeled_wildcards_to_closures
return convert_wildcards_to_closures(new_smiles, offsets)
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 98, in 
convert_wildcards_to_closures
new_smiles, new_smiles[wildcard_start-1:])
NotImplementedError: ('intermediate groups not supported', 
'[H]C([H])([H])[*].[H][*]', ')[*].[H][*]')

All this means is I didn't write the code to count the number of intermediate 
branches/matched parentheses between the attachment point a and the wildcard 
atom. ("Count" because I would need to invert any chirality on the base atom if 
there were an odd number of intermediate groups.) Such code wouldn't be hard to 
add.

It's not there because my experience is that RDKit only placed the "*" atoms in 
one of those two locations. However, as I just learned, if you leave the 
hydrogens in then the [H] atoms have priority:

>>> Chem.SanitizeMol(mol,Chem.SANITIZE_ALL^Chem.SANITIZE_CLEANUP^Chem.SANITIZE_PROPERTIES)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(mol)
'[H]C([H])([H])[*:1]'

Then again, explicit [H] atoms aren't important for your end goal, so you could 
just recanonicalize all of your R-groups first, to ensure they are in the RDKit 
form, then use the SMILES rewriter.

For what it's worth, I coined the term "welding" to describe this technique of 
converting the labeled R-groups into ring-closures, then "." (dis)connected 
them to parse them as a single odd-looking SMILES.

Andrew
da...@dalkescientific.com




> On Apr 15, 2018, at 21:16, Patrick Walters  wrote:
> 
> Hi All,
> 
> I was about to write a function to reassemble a molecule from a core + 
> R-groups, but I thought I'd check and see if such a function already exists.  
> This is work with the output of rdRGroupDecomposition
> 
> Gvien a core:
> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
> 
> Plus a set of R-groups
> [H]C([H])([H])[*:1]
> [H][*:2]
> 
> Reconnect the pieces to generate a molecule
> CN(C)CC(Br)c1ccc(C)cn1
> 
> Thanks,
> 
> Pat


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss