Greg can correct me if I'm wrong(1), but in RDKit there are actually three
"levels" of hydrogens:

* "Physical" hydrogens, which are represented as actual, independent atoms
in the atom graph. ("Physical hydrogens" is what I'm calling them - I don't
know if RDKit has an official term for them.)

* "Explicit" hydrogens, which are represented as a numeric annotation on
their attached heavy atom. (And *not* as a separate atom object.)

* "Implicit" hydrogens, which aren't actually represented anywhere, but are
calculated from the standard valence of the heavy atom minus the valence
already used up by bonds to actual atoms and by explicit hydrogens.
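A quick way to see the three levels side by side is a sketch like the one below, assuming RDKit's Python API (the `rdkit` package) is installed. Pyrrole is just an illustrative choice: the bracketed `[nH]` in its SMILES gives the nitrogen an explicit H count, while each ring carbon's hydrogen is left implicit.

```python
from rdkit import Chem

# Pyrrole: "[nH]" stores one *explicit* H as a count on the nitrogen,
# while each ring carbon gets one *implicit* H computed from its valence.
mol = Chem.MolFromSmiles("c1cc[nH]c1")

for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(),
          "explicit:", atom.GetNumExplicitHs(),
          "implicit:", atom.GetNumImplicitHs(),
          "total:", atom.GetTotalNumHs())

# Only the five heavy atoms exist as atom objects in the graph.
print("atoms before AddHs:", mol.GetNumAtoms())

# AddHs() turns every explicit and implicit H into a "physical" H atom.
mol_h = Chem.AddHs(mol)
print("atoms after AddHs:", mol_h.GetNumAtoms())
```

This should report 5 atoms before AddHs() and 10 after: the four carbon hydrogens and the one nitrogen hydrogen all become separate atoms.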

Generally, except for some coordinate calculations, RDKit seems to be built
around working with molecules with explicit or implicit hydrogens. This is
why, when you read in a molecule, RDKit normally removes any physical
hydrogens. (Note that most file-reading functions have a removeHs parameter
you can set to False to change this behavior and keep the hydrogens listed in
the file as physical hydrogens.)
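Here is a minimal sketch of the removeHs parameter on MOL block reading, assuming the `rdkit` package. To avoid hand-writing a file, it generates a MOL block for methanol with every hydrogen listed as an atom; the molecule is only an illustration.

```python
from rdkit import Chem

# A MOL block that lists all four hydrogens of methanol as separate atoms.
molblock = Chem.MolToMolBlock(Chem.AddHs(Chem.MolFromSmiles("CO")))

# Default reading: the listed hydrogens are removed from the atom graph.
stripped = Chem.MolFromMolBlock(molblock)
print(stripped.GetNumAtoms())

# removeHs=False keeps them as physical hydrogen atoms.
physical = Chem.MolFromMolBlock(molblock, removeHs=False)
print(physical.GetNumAtoms())

# GetTotalNumHs() only counts physical H neighbors when includeNeighbors=True;
# with that flag, the carbon reports three hydrogens either way.
print(stripped.GetAtomWithIdx(0).GetTotalNumHs(includeNeighbors=True),
      physical.GetAtomWithIdx(0).GetTotalNumHs(includeNeighbors=True))
```

The default read should give 2 atoms and the removeHs=False read 6, which is also why a file's hydrogens only show up in a drawing after AddHs() or a removeHs=False read.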

By default "removing hydrogens" means turning them into implicit
hydrogens(2), but the RemoveHs() function has an "updateExplicitCount"
parameter which will cause the removed hydrogens to be turned into explicit
hydrogens instead. The standard MOL file loading code doesn't use this
option, though, so the hydrogens in the molecule are usually converted into
implicit hydrogens when you read things in.

AddHs(), of course, turns explicit and implicit hydrogens into physical
hydrogens. (The "explicitOnly" parameter limits this to the explicit
hydrogens.) It annotates whether each of these physical hydrogens came from
the implicit or the explicit pool, so you can round-trip things through
AddHs() and RemoveHs() appropriately. (There's also an "implicitOnly"
parameter on RemoveHs() which will remove only the hydrogens that came from
the implicit pool.)
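The explicitOnly / implicitOnly round trip can be sketched like this, assuming the `rdkit` package. One assumption to flag loudly: the "isImplicit" atom property checked below is the tag current RDKit versions appear to use for hydrogens added from the implicit pool; it is an internal detail, not a documented contract, and may differ between versions.

```python
from rdkit import Chem

# Pyrrole: one explicit H (on [nH]) and four implicit Hs (one per carbon).
mol = Chem.MolFromSmiles("c1cc[nH]c1")

# explicitOnly=True makes only the explicit-pool hydrogen physical.
only_explicit = Chem.AddHs(mol, explicitOnly=True)
print(only_explicit.GetNumAtoms())  # 5 heavy atoms + 1 H

# A full AddHs() tags the hydrogens that came from the implicit pool.
# ASSUMPTION: "isImplicit" is RDKit's internal property name for that tag.
full = Chem.AddHs(mol)
tagged = [a.GetIdx() for a in full.GetAtoms()
          if a.GetAtomicNum() == 1 and a.HasProp("isImplicit")]
print(len(tagged))

# implicitOnly=True strips only the tagged hydrogens; the N-H stays physical.
partial = Chem.RemoveHs(full, implicitOnly=True)
print(partial.GetNumAtoms())

# A plain RemoveHs() gives back the original representation.
restored = Chem.RemoveHs(full)
print(Chem.MolToSmiles(restored) == Chem.MolToSmiles(mol))
```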

Regards,
-Rocco

(1) I don't think the RDKit hydrogen model has ever been formalized in one
place for user-facing documentation, so this is the understanding I've
gotten from banging my head against various hydrogen-related issues.

(2) There's a special complication here: certain structures, such as
imidazole, need physical or explicit hydrogens on one of the nitrogens in
order to kekulize properly. If that hydrogen is only implicit, the RDKit
sanitizer will choke. Thus, there's special-casing in various Add/RemoveHs
functions to avoid implicit-izing these critical hydrogens.
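The imidazole case can be sketched as follows, assuming the `rdkit` package. It only shows the observable behavior (the H-less SMILES fails to sanitize, and the N-H survives an AddHs()/RemoveHs() round trip as an explicit count); the exact internal rules RDKit uses to decide this are not shown here.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's error log for the deliberately unkekulizable input below.
RDLogger.DisableLog("rdApp.*")

# Imidazole with no H on either ring nitrogen cannot be kekulized,
# so MolFromSmiles() returns None.
bad = Chem.MolFromSmiles("c1cncn1")
print(bad)  # None

# With an explicit [nH], it sanitizes fine.
good = Chem.MolFromSmiles("c1cnc[nH]1")

# After AddHs() and RemoveHs(), the N-H is kept as an explicit count
# on the nitrogen rather than being turned into an implicit hydrogen.
roundtrip = Chem.RemoveHs(Chem.AddHs(good))
for atom in roundtrip.GetAtoms():
    if atom.GetSymbol() == "N":
        print(atom.GetIdx(), "explicit Hs:", atom.GetNumExplicitHs())
```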

On Thu, Sep 8, 2016 at 1:46 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 09/08/2016 10:25 AM, Greg Landrum wrote:
> ...
> > Why do you want 2D drawings that include H atoms?
>
> On the subject of H atoms: when I read in the MOL file that has them, I
> need to explicitly call AddHs() in order to have them drawn.
>
> Question: do they actually get stripped off by the reader and re-added
> by AddHs()? Or are they there "hidden" somehow and AddHs() just
> "unhides" them?
>
> TIA
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>