Brian, Greg, and David,

    Thank you for your suggestions.  I will try to respond to your questions 
and comments:

    I am trying to reproduce results from a literature paper that used 
non-Python, non-RDKit code to identify certain patterns in molecules as part 
of a group contribution scheme for predicting thermodynamic quantities.  I 
have a training set of molecules and the results of calculations for that 
training set (individual counts of groups of atoms and the resulting 
energies).  Hence, my first goal is to reproduce the results reported for 
that training set, but using Python and RDKit.  Since my goal is to reproduce 
the literature results as closely as possible, I am not in a position to 
debate the logic of the original authors in their assignments of 
SMARTS/SMILES matching and counts.

    After this initial goal is met, I might consider alternative pattern 
matching and counting schemes and compare those results to the literature 
results.  In fact, that would be good 

    As I mentioned in my first email on this topic, I do think I have come up 
with a "rule" that will give me the correct answer (I have tried it for 8 
cases using pencil and paper).  My challenge is to code up the "rule" in 
Python.  I am a beginner at Python, so I am struggling to turn this idea into 
functional, bug-free code.  Peter Shenkin's idea/code is getting close to 
what needs to be done, but it doesn't handle all the cases.


    Jim Metz

-----Original Message-----
From: Brian Cole <>
To: James T. Metz <>
Cc: RDKit Discuss <>
Sent: Tue, Nov 7, 2017 7:23 pm
Subject: Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches based 
upon the atom symmetry like this: 

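from rdkit import Chem

def count_unique_substructures(smiles, smarts):
    mol = Chem.MolFromSmiles(smiles)
    # Symmetry classes: equivalent atoms share a rank when ties are not broken.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    pattern = Chem.MolFromSmarts(smarts)
    unique_sets_of_atoms = set()
    for match in mol.GetSubstructMatches(pattern):
        match_ranks = frozenset([ranks[idx] for idx in match])
        # Record the match by its symmetry ranks so equivalent matches collapse.
        unique_sets_of_atoms.add(match_ranks)
    return len(unique_sets_of_atoms)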

However, this returns 1 for each of your cases. It's not clear to me why you 
would want your 2nd case to return 2 as all paths from a chlorine to a chlorine 
through 2 carbons are symmetric. 

>>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>>> smiles1 = 'ClC(Cl)CCl'
>>> smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'

>>> count_unique_substructures(smiles1, SMARTS)
1
>>> count_unique_substructures(smiles2, SMARTS)
1


On Tue, Nov 7, 2017 at 7:38 PM, James T. Metz via Rdkit-discuss 
<> wrote:

RDkit Discussion Group,

    I have written a SMARTS to detect vicinal chlorine groups using RDKit.  
There are 4 atoms involved in a vicinal chlorine group.

SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'

    I am trying to count the number of ("unique") occurrences of this


    For some molecules with symmetry, this results in


    For the molecule smiles1 below, I want to obtain a count of 1, i.e., 
1 tuple of 4 atoms.

    smiles1 = 'ClC(Cl)CCl'


    However, using the SMARTS above, I obtain 2 tuples of 4 atoms.  
Beginning with a MOL file representation of smiles1, I get

    ((1,2,4,3), (0,2,4,3))

    One possible solution is to somehow merge the two tuples according to a 
"rule."  One rule that works is "if 3 of the atom indices are the same, then 
combine into one tuple."
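
The rule just stated might be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the function name merge_by_shared_indices and its drop_position parameter are made up here, and the sketch assumes every match tuple lists its atoms in the same SMARTS order, so that "3 indices the same" means the tuples agree everywhere except at one fixed position.

```python
from collections import defaultdict

def merge_by_shared_indices(matches, drop_position=0):
    """Group match tuples that agree at every position except drop_position.

    Hypothetical helper: assumes all tuples follow the same SMARTS atom order.
    """
    groups = defaultdict(list)
    for match in matches:
        # The key is the tuple with one position removed (3 of the 4 indices).
        key = match[:drop_position] + match[drop_position + 1:]
        groups[key].append(match)
    return list(groups.values())

# The two tuples reported above for smiles1 differ only in the first index.
smiles1_matches = ((1, 2, 4, 3), (0, 2, 4, 3))
merged = merge_by_shared_indices(smiles1_matches, drop_position=0)
print(len(merged))
```

With drop_position=0 the two tuples share the key (2, 4, 3) and fall into one group, which is the desired count of 1 for smiles1.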

    However, the rule needs a bit of modification for more complicated
cases (higher symmetry).


    smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'

    My goal is to get 2 tuples of 4 atoms for smiles2.

    smiles2 is somewhat tricky because there are either 2 groups of 3 
(4 atom) tuples, or 3 groups of 2 (4 atom) tuples, depending on how you 
choose your 3 atom indices.

    Again, if my goal is to get 2 tuples, then I need to somehow pick the 
largest group, i.e., 2 groups of 3 tuples, to do the merge operation, which 
will give me 2 remaining groups (desired).
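
One way the "pick the largest group" idea might look in Python is sketched below. Everything here is an assumption for illustration: the names group_by_dropping and merge_largest_groups are hypothetical, the atom indices for smiles2 are taken from the SMILES atom order (Cl0, C1, Cl2, C3, Cl4, Cl5, Cl6) rather than from a MOL file, and each tuple is assumed to be written in the same SMARTS order (Cl, C, C, Cl). Because the total number of tuples is fixed, the grouping with the largest groups is also the one with the fewest groups.

```python
from collections import defaultdict

def group_by_dropping(matches, drop_position):
    """Group tuples that agree at every position except drop_position."""
    groups = defaultdict(list)
    for match in matches:
        key = match[:drop_position] + match[drop_position + 1:]
        groups[key].append(match)
    return list(groups.values())

def merge_largest_groups(matches, positions=(0, 3)):
    """Try dropping each terminal (chlorine) position; keep the grouping
    that has the fewest, and therefore largest, groups."""
    candidates = [group_by_dropping(matches, p) for p in positions]
    return min(candidates, key=len)

# Assumed match tuples for smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)', SMILES atom order.
smiles2_matches = [
    (0, 1, 3, 4), (0, 1, 3, 5), (0, 1, 3, 6),
    (2, 1, 3, 4), (2, 1, 3, 5), (2, 1, 3, 6),
]
groups = merge_largest_groups(smiles2_matches)
print(len(groups), [len(g) for g in groups])
```

Dropping the first index gives 3 groups of 2 tuples, while dropping the last index gives 2 groups of 3 tuples; the sketch keeps the latter, so the count for smiles2 comes out as 2. Applied to the two smiles1 tuples, the same function gives 1.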

    I have already checked stackoverflow and a few other places for Python 
code to do the necessary merging, but I could not find anything specific and 
appropriate.

    I would be most grateful if anyone has ideas how to do this.  I suspect 
the answer is a few lines of well-written Python code, and not modifying the 
SMARTS (I could be mistaken!).

    Thank you.


    Jim Metz

Check out the vibrant tech community on one of the world's most
engaging tech sites,!
Rdkit-discuss mailing list
