Hi Rafal,
Before I give something of an answer, one really important remark:
There is almost no imaginable real-world situation where the time it takes
to count the number of Hs actually matters, so it's not really worth
worrying about.
The other method I could think of, one that limits the number of times you
have to cross the C++/Python border, is to get the molecular formula and
then use a regex to pull out the number of Hs:
In [12]: formu = rdMolDescriptors.CalcMolFormula(Chem.MolFromSmiles('CCCCCCCNCCO.Cl'))
In [13]: formu
Out[13]: 'C9H22ClNO'
In [14]: re.findall(r'H([0-9]*)',formu)
Out[14]: ['22']
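One caveat with that pattern, offered as a sketch rather than a tested recipe: H([0-9]*) also matches the H in two-letter symbols such as Hg or Ho, and it captures an empty string when the formula contains exactly one H (as in 'CHCl3'), which int() cannot parse. A slightly stricter pattern, assuming the formula string uses standard element symbols as in the output above, might look like this (the names H_PATTERN and count_h_in_formula are made up for illustration and are not part of RDKit):

```python
import re

# 'H' not followed by a lowercase letter, so Hg, He, Ho, Hf and Hs are skipped;
# the optional digits that follow are the hydrogen count.
H_PATTERN = re.compile(r'H(?![a-z])([0-9]*)')


def count_h_in_formula(formula):
    """Sum the hydrogen counts found in a molecular formula string."""
    # An empty capture means a bare 'H', i.e. a single hydrogen.
    return sum(int(n) if n else 1 for n in H_PATTERN.findall(formula))


print(count_h_in_formula('C9H22ClNO'))  # 22
```

The formula string itself would still come from rdMolDescriptors.CalcMolFormula(mol), as in the session above.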
That is faster than your idea:
In [15]: matcher = re.compile(r'H([0-9]*)')
In [23]: armodafinil = Chem.MolFromSmiles('NC(=O)C[S@@](=O)C(c1ccccc1)c1ccccc1')
In [24]: def h_count2(mol):
    ...:     return sum(x.GetTotalNumHs() for x in mol.GetAtoms())
    ...:
In [25]: def h_count1(mol,matcher=matcher):
    ...:     return sum(int(x) for x in matcher.findall(rdMolDescriptors.CalcMolFormula(mol)))
    ...:
In [26]: %timeit h_count2(armodafinil)
31.8 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [27]: %timeit h_count1(armodafinil)
6.22 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
But, in the end, I think this time difference is really unlikely to
actually be important, and I think
sum([atom.GetTotalNumHs() for atom in mol.GetAtoms()])
is the best way to do it since it makes your intent much clearer.
-greg
On Wed, Mar 13, 2019 at 11:17 AM Rafal Roszak wrote:
> Hello group,
>
> Is there any fast way to count all hydrogen atoms in a molecule?
> The only idea I have is:
> sum([atom.GetTotalNumHs() for atom in mol.GetAtoms() ])
> which is probably not optimal because it requires iterating
> over the atoms on the Python side.
> For all other atoms I found this solution:
> from rdkit.Chem import rdqueries
> q = rdqueries.AtomNumEqualsQueryAtom(6)
> len(mol.GetAtomsMatchingQuery(q))
>
> (in mailing-list archive:
> https://sourceforge.net/p/rdkit/mailman/message/34524687/)
> but this is valid only for atoms present in the molecular graph, so is
> there any smart solution for H?
>
> Regards,
>
> Rafal
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>