On Dec 2, 2016, at 5:46 PM, Brian Kelley wrote:
> I hacked a version of RDKit's smiles parser to compute heavy atom count,
> perhaps some version of this could be used to check smiles validity without
> making the actual molecule.

FWIW, here's my regex code for it, which makes the assumption that only
"[H]" and any atom term containing a "*" are not heavy.

import re

_atom_pat = re.compile(r"""
(
 Cl? |              # C or Cl
 Br? |              # B or Br
 [NOSPFIbcnosp] |   # other organic-subset atoms, including aromatic ones
 \[[^]]*\]          # any bracket atom, e.g. [Na+] or [13CH4]
)
""", re.X)

def get_num_heavies(smiles):
    num_atoms = 0
    for m in _atom_pat.finditer(smiles):
        text = m.group()
        # Skip explicit hydrogens and wildcard atoms.
        if text == "[H]" or "*" in text:
            continue
        num_atoms += 1
    return num_atoms

This turns out to be quite a handy piece of functionality.

                                Andrew
                                da...@dalkescientific.com
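
A quick way to see what the stated assumption means in practice is to run the same pattern over a few inputs. The sketch below repeats the function so that it is self-contained; the sample SMILES strings are illustrative choices, not taken from the original message.

```python
import re

# Same pattern as in the message above: organic-subset atoms or any bracket atom.
_atom_pat = re.compile(r"""
(
 Cl? |
 Br? |
 [NOSPFIbcnosp] |
 \[[^]]*\]
)
""", re.X)

def get_num_heavies(smiles):
    num_atoms = 0
    for m in _atom_pat.finditer(smiles):
        text = m.group()
        # Only "[H]" and terms containing "*" are treated as non-heavy.
        if text == "[H]" or "*" in text:
            continue
        num_atoms += 1
    return num_atoms

# Illustrative inputs (assumed examples, not from the original post).
samples = ["CCO", "c1ccccc1", "ClCCBr", "[H]O[H]", "[*]CC", "[Na+].[Cl-]", "[2H]O"]
for smi in samples:
    print(smi, get_num_heavies(smi))
```

Note that "[2H]O" is reported as 2, because only the exact text "[H]" is skipped. Isotope-labelled or charged hydrogens such as "[2H]" or "[H+]" are counted as heavy under this assumption, and no syntax checking is done, so the count is only meaningful for strings that are already valid SMILES.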