Hi Pat,

  I wrote something like this for mmpdb, which is the MMPA code I helped 
develop, at https://github.com/rdkit/mmpdb .

It has one restriction, which I'll get to in a moment.

The general idea is to convert the attachment points to closures, join them 
with a ".", and canonicalize:

>>> from mmpdblib import smiles_syntax
>>> s1 = smiles_syntax.convert_labeled_wildcards_to_closures("CN(C)CC(Br)c1cc([*:2])c([*:1])cn1")
>>> s1
>>> s2 = smiles_syntax.convert_labeled_wildcards_to_closures("[H]C([*:1])([H])[H].[H][*:2]")
>>> s2
>>> from rdkit import Chem
>>> Chem.CanonSmiles(s1+"."+s2)

The smiles_syntax.py file does not use any of the rest of the code.
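
To make the idea concrete, here is a minimal pure-Python sketch of the rewrite step. It is not the mmpdb code: the function names (wildcards_to_closures, weld) and the choice of closure numbers (%91 for [*:1], %92 for [*:2], and so on) are assumptions made for illustration. It only handles a wildcard written directly after its attachment atom, either as "X[*:n]" or "X([*:n])", and raises NotImplementedError for anything else.

```python
import re

# Matches a labeled wildcard such as [*:1]; the label is captured.
_WILDCARD = re.compile(r"\[\*:(\d+)\]")


def wildcards_to_closures(smiles):
    """Replace each labeled wildcard with a ring-closure number.

    Illustrative sketch only. Assumes labels 1..9 and that each wildcard
    sits directly after its attachment atom ("X[*:n]" or "X([*:n])").
    """
    out = []
    pos = 0
    for m in _WILDCARD.finditer(smiles):
        label = int(m.group(1))
        if not 1 <= label <= 9:
            raise ValueError("this sketch only supports labels 1..9")
        closure = f"%{90 + label}"  # assumed numbering: [*:1] -> %91

        start, end = m.span()
        before = smiles[start - 1:start]  # "" at the start of the string
        after = smiles[end:end + 1]       # "" at the end of the string
        if before == "(" and after == ")":
            # Branch form "X([*:n])": drop the parentheses as well.
            owner = smiles[start - 2:start - 1]
            start -= 1
            end += 1
        else:
            # Terminal form "X[*:n]": nothing may follow in this chain.
            owner = before
            if after not in ("", ".", ")"):
                raise NotImplementedError("wildcard is not terminal")
        # The closure must land directly on an atom (or its ring digits).
        if not (owner.isalnum() or owner == "]"):
            raise NotImplementedError("intermediate groups not supported")

        out.append(smiles[pos:start])
        out.append(closure)
        pos = end
    out.append(smiles[pos:])
    return "".join(out)


def weld(core, rgroups):
    """Join the rewritten core and R-groups with "." into one SMILES."""
    parts = [core] + list(rgroups)
    return ".".join(wildcards_to_closures(p) for p in parts)


welded = weld("CN(C)CC(Br)c1cc([*:2])c([*:1])cn1",
              ["[H]C([*:1])([H])[H]", "[H][*:2]"])
print(welded)
```

The resulting string can then be handed to Chem.CanonSmiles, as in the session above, to get a normal canonical SMILES.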

The restriction is that the code as-is assumes the wildcard atoms like [*:1] 
are either immediately before or after the attachment point. Otherwise it will 
give you (using the R-groups you actually posted):

>>> s2 = smiles_syntax.convert_labeled_wildcards_to_closures("[H]C([H])([H])[*:1].[H][*:2]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 130, in 
    return convert_wildcards_to_closures(new_smiles, offsets)
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 98, in 
    new_smiles, new_smiles[wildcard_start-1:])
NotImplementedError: ('intermediate groups not supported', '[H]C([H])([H])[*].[H][*]', ')[*].[H][*]')

All this means is I didn't write the code to count the number of intermediate 
branches/matched parentheses between the attachment point and the wildcard 
atom. ("Count" because I would need to invert any chirality on the base atom if 
there were an odd number of intermediate groups.) Such code wouldn't be hard to 
write.
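
As a rough illustration of the counting step only, the sketch below walks backwards from a wildcard over balanced "(...)" groups and reports how many it skipped. The function name and return value are hypothetical and this is not part of mmpdb. It does no rewriting and no chirality handling; it only finds the count that would decide whether an inversion is needed.

```python
def count_intermediate_branches(smiles, wildcard_start):
    """Count the "(...)" groups sitting directly before the wildcard.

    Returns (count, index), where index is the position of the last
    character of the attachment atom's token. Illustrative sketch only.
    """
    count = 0
    i = wildcard_start - 1
    while i >= 0 and smiles[i] == ")":
        # Scan left to the "(" that matches this ")".
        depth = 0
        while i >= 0:
            if smiles[i] == ")":
                depth += 1
            elif smiles[i] == "(":
                depth -= 1
                if depth == 0:
                    break
            i -= 1
        if i < 0:
            raise ValueError("unbalanced parentheses")
        count += 1
        i -= 1  # step to the character before the matching "("
    return count, i


smiles = "[H]C([H])([H])[*:1]"
count, atom_end = count_intermediate_branches(smiles, smiles.index("[*:1]"))
print(count, smiles[atom_end])
```

For the R-group from the original post this finds two intermediate groups before reaching the carbon.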

It's not there because my experience is that RDKit only placed the "*" atoms in 
one of those two locations. However, as I just learned, if you leave the 
hydrogens in then the [H] atoms have priority:

>>> Chem.MolToSmiles(mol)

Then again, explicit [H] atoms aren't important for your end goal, so you could 
just recanonicalize all of your R-groups first, to ensure they are in the RDKit 
form, then use the SMILES rewriter.

For what it's worth, I coined the term "welding" to describe this technique of 
converting the labeled R-groups into ring closures, then joining the 
(dis)connected pieces with "." so they parse as a single odd-looking SMILES.


> On Apr 15, 2018, at 21:16, Patrick Walters <wpwalt...@gmail.com> wrote:
> Hi All,
> I was about to write a function to reassemble a molecule from a core + 
> R-groups, but I thought I'd check and see if such a function already exists.  
> This is work with the output of rdRGroupDecomposition
> Given a core:
> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
> Plus a set of R-groups
> [H]C([H])([H])[*:1]
> [H][*:2]
> Reconnect the pieces to generate a molecule
> CN(C)CC(Br)c1ccc(C)cn1
> Thanks,
> Pat

Rdkit-discuss mailing list