Sorry this one slipped off my radar. Hopefully a late reply is still
helpful.

On Thu, Oct 12, 2017 at 10:10 PM, Jason Biggs <jasondbi...@gmail.com> wrote:

>
> I'm creating a public-facing data structure that uses the rdkit as the
> back end.  I don't want to expose three different levels of existence for a
> hydrogen - I want them to be actual Atoms or be implied by valence.
>

That shouldn't be too bad.


> What are the consequences of always converting anything the rdkit would
> return via GetNumExplicitHs into an actual atom with the addHs function.
>

Hs are in general left off because they add computational complexity (more
atoms to consider) and take up memory. They also cause practical problems,
like making 2D drawings of molecules crowded and harder to read.
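
To give a sense of the size difference, here is a minimal sketch using the
same 'CCOC' molecule as the examples further down. The variable names
(heavy_count, full_count) are illustrative only.

```python
from rdkit import Chem

# Suppressed-H form: only the heavy atoms are real atoms in the graph.
m = Chem.MolFromSmiles('CCOC')

# After AddHs every hydrogen is an actual Atom object.
mh = Chem.AddHs(m)

heavy_count = m.GetNumAtoms()   # heavy atoms only
full_count = mh.GetNumAtoms()   # heavy atoms plus all Hs

print(heavy_count, full_count)
```

For this small ether the atom count triples, and every per-atom loop or
depiction has correspondingly more to deal with.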


> Are there families of functions in the rdkit that just do not work if any
> hydrogen is instantiated?
>

It depends on what you mean by "do not work". I would expect every function
to run without error, but you are likely to get different values for many
functions. For example, the fingerprinting functions will return different
FPs for molecules with and without Hs:

# Assumes: from rdkit import Chem
#          from rdkit.Chem import rdMolDescriptors
In [10]: m = Chem.MolFromSmiles('CCOC')

In [11]: mh = Chem.AddHs(m)

In [14]: len(rdMolDescriptors.GetMorganFingerprint(m,2).GetNonzeroElements())
Out[14]: 8

In [15]: len(rdMolDescriptors.GetMorganFingerprint(mh,2).GetNonzeroElements())
Out[15]: 12
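
If a data structure stores molecules with all Hs instantiated, one possible
way to keep fingerprints comparable is to strip the Hs before fingerprinting.
The sketch below is an assumption about how that could be done, not something
from the session above: it uses Chem.RemoveHs and the newer
rdFingerprintGenerator module rather than rdMolDescriptors.GetMorganFingerprint,
and morgan_counts is a hypothetical helper name.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Morgan generator with radius 2, matching the radius used above.
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)


def morgan_counts(mol):
    """Return the sparse Morgan count fingerprint as a {bit_id: count} dict."""
    return gen.GetSparseCountFingerprint(mol).GetNonzeroElements()


m = Chem.MolFromSmiles('CCOC')
mh = Chem.AddHs(m)           # Hs as real atoms
m_back = Chem.RemoveHs(mh)   # back to the suppressed-H form

# Fingerprints differ once the Hs are real atoms ...
differs_with_hs = morgan_counts(m) != morgan_counts(mh)
# ... and agree again after the Hs are removed.
same_after_removal = morgan_counts(m) == morgan_counts(m_back)

print(differs_with_hs, same_after_removal)
```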



> What other strategy might I use to reduce the number of hydrogen types to
> 2?
>

Depends on what you're doing. For most applications you should be fine
using the standard suppressed-H representation and then asking each atom
for its number of Hs with the method atom.GetTotalNumHs():

In [20]: m.GetAtomWithIdx(0).GetTotalNumHs()
Out[20]: 3


This approach isolates you from the details of the two different types of
"implicit" Hs.
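
As an illustration of those two types, here is a small sketch using pyrrole,
a molecule chosen for this example and not taken from the thread. The H
written in the SMILES as [nH] is stored as an "explicit" H on the nitrogen,
while the Hs on the carbons are computed from valence; GetTotalNumHs() reports
both the same way.

```python
from rdkit import Chem

# Pyrrole: the nitrogen's H is written in the SMILES, the carbons' Hs are not.
m = Chem.MolFromSmiles('c1cc[nH]c1')

for atom in m.GetAtoms():
    print(atom.GetSymbol(),
          atom.GetNumExplicitHs(),   # Hs specified on the atom itself
          atom.GetNumImplicitHs(),   # Hs inferred from valence
          atom.GetTotalNumHs())      # the sum, hiding the distinction

n_atom = m.GetAtomWithIdx(3)   # the [nH] nitrogen
c_atom = m.GetAtomWithIdx(0)   # an aromatic CH carbon
```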

Note that the same method can also work on molecules that have Hs attached,
but you need to provide an additional argument to have it count neighbors:

In [22]: mh.GetAtomWithIdx(0).GetTotalNumHs()
Out[22]: 0

In [23]: mh.GetAtomWithIdx(0).GetTotalNumHs(includeNeighbors=True)
Out[23]: 3
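
Building on this, a wrapper that always passes includeNeighbors=True should
give the same H count for a heavy atom whether or not the Hs have been
instantiated. A minimal sketch, where total_h_count is a hypothetical helper
name and not part of the RDKit:

```python
from rdkit import Chem


def total_h_count(atom):
    """Count Hs on an atom, whether they are suppressed or real neighbor atoms."""
    return atom.GetTotalNumHs(includeNeighbors=True)


m = Chem.MolFromSmiles('CCOC')
mh = Chem.AddHs(m)

# H counts per heavy atom in the suppressed-H molecule.
suppressed = [total_h_count(a) for a in m.GetAtoms()]

# The same counts from the molecule with Hs attached, skipping the H atoms.
explicit = [total_h_count(a) for a in mh.GetAtoms() if a.GetAtomicNum() > 1]

print(suppressed, explicit)
```

A public-facing wrapper could expose only this count and so present a single
notion of "number of Hs on this atom".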


If you can provide more info about what you're trying to do, I can try to
provide more help, but in general I'd suggest sticking with the
suppressed-H representation whenever possible.

> I found this interesting discussion on the matter,
> https://sourceforge.net/p/rdkit/mailman/message/30200937/, but it looks
> like the branch mentioned there was abandoned.
>

Nope, it wasn't abandoned, but it hasn't really started in earnest yet.
Making that kind of backwards incompatible change is difficult and painful.
It will, eventually, come.

-greg