Hi Jose Manuel,

the problem is just that the scaffold returned by MurckoScaffold.GetScaffoldForMol() has no explicit hydrogens on the imino N:

for atom in ms.GetAtoms():
    print(atom.GetIdx(), atom.GetAtomicNum(), atom.GetNumExplicitHs(),
          atom.GetNumImplicitHs(), atom.GetIsAromatic())

0 7 1 0 True
1 6 0 1 True
2 7 0 0 True
3 6 0 0 True
4 6 0 0 True
5 7 0 0 False <--
6 7 1 0 True
7 6 0 1 True
8 7 0 0 True
9 6 0 0 True

Therefore, after sanitizing, that nitrogen is set to be a radical:

ms_all.GetAtomWithIdx(5).GetNumRadicalElectrons()
1

and the Unicode bullet operator used to represent the radical cannot be encoded by the latin-1 codec, hence the UnicodeEncodeError.
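
The encoding failure can be reproduced without RDKit. This is a minimal sketch, assuming the radical marker is U+2219 (BULLET OPERATOR); the variable names are only illustrative:

```python
# U+2219 BULLET OPERATOR, assumed here to be the character used for the radical
bullet = "\u2219"

# latin-1 only covers code points U+0000 to U+00FF, so this encode fails
try:
    bullet.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent the same character without trouble
utf8_bytes = bullet.encode("utf-8")

print(latin1_ok, utf8_bytes)
```

So any output path that falls back to latin-1 will fail on this character, while a UTF-8 path will not.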

If you call

ms_all.GetAtomWithIdx(5).SetNumExplicitHs(1)

before sanitizing, your problem will disappear.
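
Here is a rough sketch of the same idea on a small stand-in molecule. The SMILES below is a hypothetical example, not the scaffold from the gist, and parsing it with sanitize=False is only my way of mimicking a molecule whose imino N has no H before sanitization:

```python
from rdkit import Chem

# Hypothetical stand-in: a bracketed imino N carrying no hydrogen
smiles = "[N]=C1CCCCC1"

# Sanitizing as-is: the open valence on N is interpreted as a radical
radical = Chem.MolFromSmiles(smiles, sanitize=False)
Chem.SanitizeMol(radical)
n_radical = radical.GetAtomWithIdx(0).GetNumRadicalElectrons()

# Setting one explicit H before sanitizing fills the valence instead
fixed = Chem.MolFromSmiles(smiles, sanitize=False)
fixed.GetAtomWithIdx(0).SetNumExplicitHs(1)
Chem.SanitizeMol(fixed)
n_fixed = fixed.GetAtomWithIdx(0).GetNumRadicalElectrons()

print(n_radical, n_fixed)
```

In your case the atom index would be 5 rather than 0, as in the listing above.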

Cheers,
p.

On 28/08/2019 13:22, Jose-Manuel Gally wrote:

Dear all,

I noticed some strange behavior when extracting Murcko scaffolds from molecules preprocessed with an in-house standardization protocol.

I made a gist to illustrate the problem:

https://gist.github.com/jose-manuel/04d69dd3ac52cca74449e73d614df42e

This leaves me with several questions:

 1. When working with the standardized molecule, I get a drawing of
    the Murcko scaffold without Hs on the terminal nitrogen.
    Why is that? I would expect either a radical (so with '.') or an
    additional hydrogen. The SMILES does not indicate that the molecule
    is a radical either.

 2. When sanitizing the molecule to update the SMILES, I get a radical
    by default, instead of an H bound to the nitrogen. Why isn't an H
    added instead? If I switch off the FINDRADICALS sanitization flag,
    I do not get an extra hydrogen either...

 3. When I apply the default sanitization to the Murcko scaffold and
    try to display it, I get a UnicodeEncodeError.
    If I manually replace [N] by N in the SMILES and create a new
    molecule from it, I don't get an error anymore. Is there a
    workaround? Interestingly, the function Draw.MolsToGridImage works
    just fine, but I could not find how to change the atom label size
    and bond width.

Am I missing something obvious?

Many thanks in advance as any feedback would be much appreciated!

Cheers,
Jose Manuel

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss