Hi Pat,
I wrote something like this for mmpdb, which is the MMPA code I helped
develop, at https://github.com/rdkit/mmpdb .
It has one restriction, which I'll get to in a moment.
The general idea is to convert the attachment points to closures, join them
with a ".", and canonicalize:
>>> from mmpdblib import smiles_syntax
>>> s1 = smiles_syntax.convert_labeled_wildcards_to_closures(
...     "CN(C)CC(Br)c1cc([*:2])c([*:1])cn1")
>>> s1
'CN(C)CC(Br)c1cc%92c%91cn1'
>>> s2 = smiles_syntax.convert_labeled_wildcards_to_closures(
...     "[H]C([*:1])([H])[H].[H][*:2]")
>>> s2
'[H]C%91([H])[H].[H]%92'
>>> from rdkit import Chem
>>> Chem.CanonSmiles(s1+"."+s2)
'Cc1ccc(C(Br)CN(C)C)nc1'
The smiles_syntax.py file does not use any of the rest of the code.
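To make the idea concrete, here is a minimal pure-Python sketch of the rewrite
step. It is not the mmpdb implementation: the names wildcards_to_closures and
weld are hypothetical, it assumes labels 1-9 map to closures %91-%99, it only
handles a wildcard directly next to its attachment atom, and it refuses any
SMILES containing "@" rather than try to preserve chirality.

```python
import re

# One SMILES atom token: a bracket atom (but not a wildcard), a two-letter
# organic-subset atom, or a one-letter aliphatic/aromatic atom.
ATOM = r"(?:\[[^\]*]+\]|Cl|Br|[BCNOPSFI]|[bcnops])"
# Ring-closure numbers already attached to that atom.
RINGS = r"(?:%\d\d|\d)*"

# atom([*:N])  ->  atom%9N
BRANCH = re.compile(rf"({ATOM}{RINGS})\(\[\*:([1-9])\]\)")
# atom[*:N] at the end of a chain  ->  atom%9N
TRAILING = re.compile(rf"({ATOM}{RINGS})\[\*:([1-9])\](?=$|[.)])")
# [*:N]atom at the start of a component  ->  atom%9N
LEADING = re.compile(rf"(^|\.)\[\*:([1-9])\]({ATOM})")


def wildcards_to_closures(smiles):
    """Replace each labeled wildcard with a ring closure on its neighbor."""
    if "@" in smiles:
        # Assumption of this sketch: stereochemistry is simply not handled.
        raise NotImplementedError("stereochemistry is outside this sketch")
    out = BRANCH.sub(r"\1%9\2", smiles)
    out = TRAILING.sub(r"\1%9\2", out)
    out = LEADING.sub(r"\1\3%9\2", out)
    if "*" in out:
        # A wildcard survived, so it was not adjacent to its attachment atom.
        raise NotImplementedError("intermediate groups not supported: " + out)
    return out


def weld(core, rgroups):
    """Join the rewritten core and R-groups into one dot-disconnected SMILES."""
    parts = [wildcards_to_closures(core)]
    parts.extend(wildcards_to_closures(r) for r in rgroups)
    return ".".join(parts)


welded = weld("CN(C)CC(Br)c1cc([*:2])c([*:1])cn1", ["C[*:1]", "[H][*:2]"])
print(welded)
```

The welded string still has to go through a SMILES parser, such as
Chem.CanonSmiles in the session above, to produce the final molecule.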
The restriction is that the code as-is assumes each wildcard atom, like [*:1],
comes either immediately before or immediately after its attachment point.
Otherwise it raises an exception (using the R-groups you actually posted):
>>> s2 = smiles_syntax.convert_labeled_wildcards_to_closures(
...     "[H]C([H])([H])[*:1].[H][*:2]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 130, in convert_labeled_wildcards_to_closures
    return convert_wildcards_to_closures(new_smiles, offsets)
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 98, in convert_wildcards_to_closures
    new_smiles, new_smiles[wildcard_start-1:])
NotImplementedError: ('intermediate groups not supported', '[H]C([H])([H])[*].[H][*]', ')[*].[H][*]')
All this means is that I didn't write the code to count the number of
intermediate branches (matched parentheses) between the attachment point and
the wildcard atom. ("Count" because I would need to invert any chirality on the
base atom if there were an odd number of intermediate groups.) Such code
wouldn't be hard to add.
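As a rough sketch of what that counting step could look like (this is not code
from mmpdb; count_intermediate_groups is a hypothetical helper), the function
below walks left from the wildcard and counts the balanced "(...)" groups that
sit between it and the base atom. The parity check at the end follows the rule
described above.

```python
def count_intermediate_groups(smiles, wildcard_start):
    """Count balanced '(...)' groups ending just before index wildcard_start."""
    count = 0
    pos = wildcard_start - 1
    while pos >= 0 and smiles[pos] == ")":
        depth = 0
        # Walk left until the '(' that matches this ')'.
        while pos >= 0:
            if smiles[pos] == ")":
                depth += 1
            elif smiles[pos] == "(":
                depth -= 1
                if depth == 0:
                    break
            pos -= 1
        if pos < 0:
            raise ValueError("unbalanced parentheses in " + smiles)
        count += 1
        pos -= 1  # step to the character before the '('
    return count


smiles = "[H]C([H])([H])[*:1]"
n_groups = count_intermediate_groups(smiles, smiles.index("[*"))
# An odd count would mean flipping the chirality mark on the base atom.
needs_inversion = (n_groups % 2 == 1)
print(n_groups, needs_inversion)
```

For the R-group from the traceback there are two intermediate [H] branches, so
no inversion would be needed in that case.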
It's not there because my experience is that RDKit only placed the "*" atoms in
one of those two locations. However, as I just learned, if you leave the
hydrogens in then the [H] atoms have priority:
>>> Chem.SanitizeMol(mol,Chem.SANITIZE_ALL^Chem.SANITIZE_CLEANUP^Chem.SANITIZE_PROPERTIES)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(mol)
'[H]C([H])([H])[*:1]'
Then again, explicit [H] atoms aren't important for your end goal, so you could
just recanonicalize all of your R-groups first, to ensure they are in the RDKit
form, then use the SMILES rewriter.
For what it's worth, I coined the term "welding" to describe this technique of
converting the labeled R-groups into ring closures, joining the pieces with
"." (dis)connections, and parsing the result as a single odd-looking SMILES.
Andrew
[email protected]
> On Apr 15, 2018, at 21:16, Patrick Walters <[email protected]> wrote:
>
> Hi All,
>
> I was about to write a function to reassemble a molecule from a core +
> R-groups, but I thought I'd check and see if such a function already exists.
> This is work with the output of rdRGroupDecomposition
>
> Given a core:
> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
>
> Plus a set of R-groups
> [H]C([H])([H])[*:1]
> [H][*:2]
>
> Reconnect the pieces to generate a molecule
> CN(C)CC(Br)c1ccc(C)cn1
>
> Thanks,
>
> Pat
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss