Hi Rafal,

Before I provide something of an answer, a really important remark: there is almost no imaginable real-world situation where the time it takes to count the number of Hs is actually important, so it's not really worth worrying about.

The other method I could think of, and one that would limit the number of times you have to cross the C++/Python border, would be to get the molecular formula and then use a regex to pull out the number of Hs:

In [12]: formu = rdMolDescriptors.CalcMolFormula(Chem.MolFromSmiles('CCCCCCCNCCO.Cl'))

In [13]: formu
Out[13]: 'C9H22ClNO'

In [14]: re.findall(r'H([0-9]*)',formu)
Out[14]: ['22']

That is faster than your idea:

In [15]: matcher = re.compile(r'H([0-9]*)')

In [23]: armodafinil = Chem.MolFromSmiles('NC(=O)C[S@@](=O)C(c1ccccc1)c1ccccc1')

In [24]: def h_count2(mol):
    ...:     return sum(x.GetTotalNumHs() for x in mol.GetAtoms())
    ...:

In [25]: def h_count1(mol,matcher=matcher):
    ...:     return sum(int(x) for x in matcher.findall(rdMolDescriptors.CalcMolFormula(mol)))
    ...:

In [26]: %timeit h_count2(armodafinil)
31.8 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [27]: %timeit h_count1(armodafinil)
6.22 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But, in the end, I think this time difference is really unlikely to actually be important, and I think
    sum([atom.GetTotalNumHs() for atom in mol.GetAtoms()])
is the best way to do it since it makes your intent much clearer.

-greg

On Wed, Mar 13, 2019 at 11:17 AM Rafal Roszak <rmrmg.c...@gmail.com> wrote:

> Hello group,
>
> Is there any fast way to count all hydrogen atoms in molecule?
> The only idea which I have is:
> sum([atom.GetTotalNumHs() for atom in mol.GetAtoms() ])
> which probably is not the most optimal because it require iteration
> over atom at python end.
> For all other atoms I found such solution:
> from rdkit.Chem import rdqueries
> q = rdqueries.AtomNumEqualsQueryAtom(6)
> len(mol.GetAtomsMatchingQuery(q))
>
> (in mailing-list archive:
> https://sourceforge.net/p/rdkit/mailman/message/34524687/)
> but this is valid only for atoms present in molecular graph, so is there
> any smart solution for H?
>
> Regards,
>
> Rafal
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
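
One caveat on the regex approach in the reply above: the pattern H([0-9]*) captures an empty string when the formula contains exactly one H (for example 'CHCl3'), so int('') raises a ValueError, and it also matches the H in two-letter symbols such as Hg or Ho. Below is a minimal sketch of a stricter parser that works on the formula string only. The name count_h_in_formula is a hypothetical helper, not part of RDKit, and the sketch assumes a plain formula string of the kind shown above, with no separately listed hydrogen isotopes.

```python
import re

# H followed by an optional count, but not by a lowercase letter,
# so two-letter symbols such as Hg, He, Ho, Hf, or Hs are skipped.
_H_PATTERN = re.compile(r'H(?![a-z])([0-9]*)')


def count_h_in_formula(formula):
    """Count hydrogens in a formula string such as 'C9H22ClNO'.

    Hypothetical helper for illustration only.
    An H with no trailing digits counts as a single hydrogen.
    """
    return sum(int(n) if n else 1 for n in _H_PATTERN.findall(formula))


print(count_h_in_formula('C9H22ClNO'))  # 22
print(count_h_in_formula('CHCl3'))      # 1
print(count_h_in_formula('Cl2Hg'))      # 0
```

With RDKit this would be called on the string returned by rdMolDescriptors.CalcMolFormula(mol). The extra lookahead should cost very little, but it has not been timed here, so the numbers in the session above apply only to the original pattern.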