On Dec 2, 2016, at 5:46 PM, Brian Kelley wrote:
> I hacked a version of RDKit's smiles parser to compute heavy atom count, 
> perhaps some version of this could be used to check smiles validity without 
> making the actual molecule.

FWIW, here's my regex code for it, which makes the assumption that only "[H]" 
and anything with a "*" are not heavy.

import re

# Matches one SMILES atom token: an organic-subset symbol or a bracket atom.
_atom_pat = re.compile(r"""
(
 Cl? |
 Br? |
 [NOSPFIbcnosp] |
 \[[^]]*\]
)
""", re.X)

def get_num_heavies(smiles):
    num_atoms = 0
    for m in _atom_pat.finditer(smiles):
        text = m.group()
        if text == "[H]" or "*" in text:
            continue
        num_atoms += 1
    return num_atoms

This turns out to be quite a handy piece of functionality.
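
For anyone who wants to try it, here is a small usage sketch. The pattern and function are repeated unchanged so the snippet runs on its own; the example SMILES strings are my own picks for illustration, not from any test suite. Note what the stated assumption implies: an isotopic hydrogen such as "[2H]" is not the literal text "[H]", so it is counted as heavy.

```python
import re

# Same pattern as above, repeated so this snippet is self-contained.
_atom_pat = re.compile(r"""
(
 Cl? |
 Br? |
 [NOSPFIbcnosp] |
 \[[^]]*\]
)
""", re.X)

def get_num_heavies(smiles):
    num_atoms = 0
    for m in _atom_pat.finditer(smiles):
        text = m.group()
        # Only a plain "[H]" and anything containing "*" are skipped.
        if text == "[H]" or "*" in text:
            continue
        num_atoms += 1
    return num_atoms

# Illustrative inputs (hypothetical examples chosen for this sketch).
examples = [
    "CCO",          # ethanol
    "c1ccccc1",     # benzene; ring-closure digits are not atom tokens
    "ClCCBr",       # two-letter symbols are matched as single atoms
    "[H]O[H]",      # explicit hydrogens are skipped
    "C[*]",         # wildcard atom is skipped
    "[Na+].[Cl-]",  # bracket atoms count once each
    "[2H]O[2H]",    # isotopic hydrogens are counted by this heuristic
]

for smiles in examples:
    print(smiles, get_num_heavies(smiles))
```

Since nothing here checks the SMILES syntax, a malformed string still gets a count; this is a quick filter, not a validity test.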


                                Andrew
                                da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss