Hi Rafal,

Before I answer, one important remark: there is almost no real-world
situation where the time it takes to count the Hs actually matters, so
it's not really worth worrying about.

The other method I can think of, one that limits the number of times you
have to cross the C++/Python boundary, is to get the molecular formula and
then use a regex to pull out the number of Hs:

In [12]: formu = rdMolDescriptors.CalcMolFormula(Chem.MolFromSmiles('CCCCCCCNCCO.Cl'))


In [13]: formu

Out[13]: 'C9H22ClNO'

In [14]: re.findall(r'H([0-9]*)',formu)

Out[14]: ['22']


That is faster than your idea:

In [15]: matcher = re.compile(r'H([0-9]*)')


In [23]: armodafinil = Chem.MolFromSmiles('NC(=O)C[S@@](=O)C(c1ccccc1)c1ccccc1')


In [24]: def h_count2(mol):
    ...:     return sum(x.GetTotalNumHs() for x in mol.GetAtoms())
    ...:


In [25]: def h_count1(mol,matcher=matcher):
    ...:     return sum(int(x) for x in matcher.findall(rdMolDescriptors.CalcMolFormula(mol)))
    ...:


In [26]: %timeit h_count2(armodafinil)

31.8 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [27]: %timeit h_count1(armodafinil)

6.22 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
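One caveat with the regex approach: the pattern r'H([0-9]*)' also matches the H in element symbols such as Hg or He, and it captures an empty string when the formula contains a single H with no count (for example 'CHCl3'), which makes int('') raise a ValueError. Below is a minimal sketch of a more forgiving parser. It works on the formula string only, so it does not need RDKit itself; the name h_count_from_formula is just an illustration, and it assumes the formula writes a count of one as a bare symbol.

```python
import re

# 'H' not followed by a lowercase letter, so He, Hf, Hg, Ho and Hs are skipped.
_H_PATTERN = re.compile(r'H(?![a-z])([0-9]*)')


def h_count_from_formula(formula):
    """Count hydrogens in a formula string such as 'C9H22ClNO'.

    Hypothetical helper: assumes a count of one is written as a bare 'H'.
    """
    # An empty capture means a bare 'H', i.e. exactly one hydrogen.
    return sum(int(n) if n else 1 for n in _H_PATTERN.findall(formula))


print(h_count_from_formula('C9H22ClNO'))
print(h_count_from_formula('CHCl3'))
print(h_count_from_formula('HgCl2'))
```

With RDKit imported (from rdkit import Chem; from rdkit.Chem import rdMolDescriptors), this would be called as h_count_from_formula(rdMolDescriptors.CalcMolFormula(mol)). The extra branch and lookahead will probably cost a little of the speed advantage shown above; I have not timed it.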




But, in the end, I think this time difference is really unlikely to
actually be important and I think sum([atom.GetTotalNumHs() for atom in
mol.GetAtoms() ]) is the best way to do it since it makes your intent much
clearer.

-greg


On Wed, Mar 13, 2019 at 11:17 AM Rafal Roszak <rmrmg.c...@gmail.com> wrote:

> Hello group,
>
> Is there any fast way to count all hydrogen atoms in a molecule?
> The only idea I have is:
> sum([atom.GetTotalNumHs() for atom in mol.GetAtoms() ])
> which is probably not optimal because it requires iterating
> over the atoms on the Python side.
> For all other atoms I found this solution:
> from rdkit.Chem import rdqueries
> q = rdqueries.AtomNumEqualsQueryAtom(6)
> len(mol.GetAtomsMatchingQuery(q))
>
> (in mailing-list archive:
> https://sourceforge.net/p/rdkit/mailman/message/34524687/)
> but this is valid only for atoms present in the molecular graph, so is
> there any smart solution for H?
>
> Regards,
>
> Rafal
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>