# Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

```
Peter,

Thank you for your suggestions and accompanying code.

I have modified your code slightly and created 3 tuples for testing.
Your code works for match1 and match2, but not for match3.  The code
should return 2 for match3, because there are 2 sets of 3 tuples, each
tuple containing 4 atom indices.  Using my "rule" that "if 3 indices are
the same, the tuples are in one group, and one must form groups of the
largest possible size", one arrives at 2 groups.  The merge function
should then select one tuple from each group, resulting in a count of 2
(the final number of groups).

Keep in mind that I will not know how many groups of tuples will be
created for any given molecule.  Hence, I cannot use hard-coded array
indices.

Any ideas how to modify the code below to obtain the desired result
for match3, and how to deal with tuples of various sizes?

Regards,

Jim Metz

def merge2(matches):
    if len(matches) > 1:
        d = {}
        for match in matches:
            t = (matches[0], matches[1])
            if (matches[0] < matches[1]):
                t = (matches[0], matches[1])
            else:
                t = (matches[1], matches[0])
            d[t] = match
        merged_match = (d[t],)
    else:
        merged_match = matches

    count = len(merged_match)
    return(count)

match1 = ((0,2,3,4),)
match2 = ((0,2,3,4), (1,2,3,4))
match3 = ((0,2,4,5), (1,2,5,6), (2,3,4,5), (2,3,5,6), (0,2,5,6), (1,2,4,5))
matches = match2   # Change the number to test different tuples

output = merge2(matches)
print("Output is   ", output)

-----Original Message-----
From: Peter S. Shenkin <shen...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Nov 7, 2017 7:05 pm
Subject: Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

I think you probably used a slightly different SMILES than the one you showed.
The one you showed should have given ((0,1,3,4),(2,1,3,4)).

The proper merge rule would then be to consider all matches equivalent if the
2nd and 3rd atoms in the match agree, in any order; i.e., the two carbons,
indices 1 and 3 in this case.

So to do this, for each molecule, do something like this:

d = {}
for match in matches:
    # Key each match on its two central atoms, stored in sorted order
    t = (match[1], match[2])
    if match[1] < match[2]:
        t = (match[1], match[2])
    else:
        t = (match[2], match[1])
    d[t] = match

You will wind up with as many dictionary elements as there are unique matches.

-P.

On Tue, Nov 7, 2017 at 7:38 PM, James T. Metz via Rdkit-discuss
<rdkit-discuss@lists.sourceforge.net> wrote:

RDKit Discussion Group,

I have written a SMARTS to detect vicinal chlorine groups using RDKit.
There are 4 atoms involved in a vicinal chlorine group.

SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'

I am trying to count the number of ("unique") occurrences of this
pattern.  For some molecules with symmetry, this results in
over-counting.  For the molecule smiles1 below, I want to obtain
a count of 1, i.e., 1 tuple of 4 atoms.

smiles1 = 'ClC(Cl)CCl'

However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
Beginning with a MOL file representation of smiles1, I get

((1,2,4,3), (0,2,4,3))

One possible solution is to somehow merge the two tuples according
to a "rule."  One rule that works is "if 3 of the atom indices are the same,
then combine into one tuple."

However, the rule needs a bit of modification for more complicated
cases (higher symmetry).

Consider

smiles2 = 'ClC(Cl)CCl(Cl)(Cl)'

My goal is to get 2 tuples of 4 atoms for smiles2.

smiles2 is somewhat tricky because there are either 2 groups of 3
(4-atom) tuples, or 3 groups of 2 (4-atom) tuples, depending on how
you choose your 3 atom indices.

Again, if my goal is to get 2 tuples, then I need to somehow pick the
largest group, i.e., 2 groups of 3 tuples, to do the merge operation,
which will give me 2 remaining groups (desired).

I have already checked Stack Overflow and a few other places for
Python code to do the necessary merging, but I could not find
anything specific and appropriate.

I would be most grateful if anyone has ideas how to do this.  I
suspect the answer is a few lines of well-written Python code,
and not modifying the SMARTS (I could be mistaken!).

Thank you.

Regards,

Jim Metz

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

```
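
The rule described in the thread ("if 3 indices are the same, the tuples are in one group, and one must form groups of the largest possible size") can be sketched in plain Python without hard-coded indices. The function name `count_merged_groups` and the `shared` parameter are hypothetical and do not come from the thread or from RDKit. The approach is a greedy heuristic: it repeatedly takes the largest set of tuples that share `shared` indices. It reproduces the counts requested for match1, match2, and match3, but it is one possible reading of the rule and is not guaranteed to find the best grouping for every input.

```python
from itertools import combinations


def count_merged_groups(matches, shared=3):
    """Greedy sketch: group tuples sharing `shared` indices, largest groups first.

    Hypothetical helper, not part of RDKit. Returns the number of groups.
    """
    remaining = {tuple(m) for m in matches}  # exact duplicates collapse here
    count = 0
    while remaining:
        # Map every `shared`-sized subset of indices to the tuples containing it.
        buckets = {}
        for m in sorted(remaining):
            for key in combinations(sorted(set(m)), shared):
                buckets.setdefault(key, set()).add(m)
        if not buckets:
            # Only tuples shorter than `shared` are left: each stands alone.
            count += len(remaining)
            break
        # Treat the largest bucket as one merged group and remove its members.
        largest = max(buckets.values(), key=len)
        remaining -= largest
        count += 1
    return count


match1 = ((0, 2, 3, 4),)
match2 = ((0, 2, 3, 4), (1, 2, 3, 4))
match3 = ((0, 2, 4, 5), (1, 2, 5, 6), (2, 3, 4, 5),
          (2, 3, 5, 6), (0, 2, 5, 6), (1, 2, 4, 5))

for name, m in (("match1", match1), ("match2", match2), ("match3", match3)):
    print(name, count_merged_groups(m))
# match1 1
# match2 1
# match3 2
```

Because the subsets are built from each tuple's own indices, the same function accepts tuples of other lengths; only `shared` needs to be chosen to suit the pattern.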