RDKit Discussion Group,

I have written a SMARTS pattern to detect vicinal chlorine groups
using RDKit. A vicinal chlorine group involves 4 atoms (Cl-C-C-Cl):
SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
I am trying to count the number of "unique" occurrences of this
pattern. For some molecules with symmetry, simply counting the
substructure matches over-counts.
For the molecule smiles1 below, I want to obtain
a count of 1, i.e., 1 tuple of 4 atoms.
smiles1 = 'ClC(Cl)CCl'
However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
Beginning with a MOL file representation of smiles1, I get
((1,2,4,3), (0,2,4,3))
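For anyone reproducing this, here is a minimal sketch of the matching step with RDKit (it assumes RDKit is installed). It builds smiles1 from SMILES rather than from a MOL file, so the atom indices differ from the tuples above, but the same two matches come back. Note that GetSubstructMatches already drops mirror-image duplicates such as (0,1,3,4) and (4,3,1,0) because uniquify=True is its default, so the extra match is not just the same four atoms read in the other direction.

```python
# Requires RDKit (e.g. pip install rdkit).
from rdkit import Chem

# SMARTS from the post: Cl-C-C-Cl, central C-C bond single, double or aromatic.
patt = Chem.MolFromSmarts('[Cl]-[C,c]-,=,:[C,c]-[Cl]')

# smiles1 parsed from SMILES, so indices follow SMILES order: Cl0 C1 Cl2 C3 Cl4.
mol1 = Chem.MolFromSmiles('ClC(Cl)CCl')

# uniquify=True (the default) removes matches covering the same atom set;
# two distinct atom sets remain: {0, 1, 3, 4} and {1, 2, 3, 4}.
matches1 = mol1.GetSubstructMatches(patt)
print(len(matches1))  # 2
```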
One possible solution is to merge the two tuples according
to a rule. One rule that works here is: "if two tuples share 3 atom
indices, combine them into one tuple."
However, the rule needs a bit of modification for more complicated
cases (higher symmetry).
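To see why the simple rule breaks down, here is a sketch that applies it literally in plain Python. Treating "shares 3 atom indices" as transitive (via union-find) and keeping the first tuple of each merged cluster are my assumptions, and merge_if_three_shared is a hypothetical helper name. For smiles1 it gives one tuple, but for a molecule with two Cl on one carbon and three on the other (pentachloroethane, ClC(Cl)C(Cl)(Cl)Cl, assumed here to be what smiles2 below intends, with hand-built matches) it collapses all six matches into a single cluster.

```python
from itertools import combinations


def merge_if_three_shared(matches):
    """Literal version of the simple rule: any two 4-atom matches sharing
    3 atom indices are merged. Merging is applied transitively (union-find,
    an assumption), and the first match of each merged cluster is kept."""
    parent = list(range(len(matches)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(matches)), 2):
        if len(set(matches[i]) & set(matches[j])) >= 3:
            parent[find(j)] = find(i)

    representatives = {}
    for i, m in enumerate(matches):
        representatives.setdefault(find(i), m)
    return list(representatives.values())


# smiles1 matches as quoted in the post (MOL-file atom indices).
print(merge_if_three_shared([(1, 2, 4, 3), (0, 2, 4, 3)]))  # [(1, 2, 4, 3)]

# Hand-built matches for pentachloroethane, ClC(Cl)C(Cl)(Cl)Cl:
# Cl0 and Cl2 sit on C1; Cl4, Cl5 and Cl6 sit on C3.
penta = [(a, 1, 3, b) for a in (0, 2) for b in (4, 5, 6)]
print(len(merge_if_three_shared(penta)))  # 1, not the desired 2
```

Each pair like (0,1,3,4) and (0,1,3,5) shares 3 atoms, and so does (0,1,3,4) with (2,1,3,4), so the chain links every match together.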
Consider
smiles2 = 'ClC(Cl)C(Cl)(Cl)Cl'
My goal is to get 2 tuples of 4 atoms for smiles2.
smiles2 is somewhat tricky because its 6 matching (4 atom) tuples
can be grouped either into 2 groups of 3 or into 3 groups of 2,
depending on which 3 shared atom indices you group on.
Again, if my goal is to get 2 tuples, then I need to pick the
grouping with the largest groups (2 groups of 3 tuples) and merge
each group into one tuple, which leaves the desired 2 tuples.
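A minimal sketch of this "largest group" merge in plain Python. It assumes each match is ordered (Cl, C, C, Cl) as in the SMARTS, that tuples are only compared with others on the same C-C bond, and that "merging" a group means keeping its first tuple as the representative 4-atom tuple; merge_vicinal_matches is a hypothetical helper name, and the pentachloroethane matches are hand-built for ClC(Cl)C(Cl)(Cl)Cl (assumed to be the intended smiles2).

```python
from collections import defaultdict


def merge_vicinal_matches(matches):
    """Hypothetical helper implementing the 'largest group' rule.

    Assumes each match is (Cl, C, C, Cl) in SMARTS order. Matches are
    collected per C-C bond; within a bond they are grouped either by the Cl
    in position 0 or by the Cl in position 3, whichever grouping has fewer
    (hence larger) groups, and the first match of each group is kept.
    """
    by_bond = defaultdict(list)
    for m in matches:
        m = tuple(m)
        # The SMARTS is symmetric, so orient each match with the
        # lower-indexed carbon first before grouping.
        if m[1] > m[2]:
            m = m[::-1]
        by_bond[(m[1], m[2])].append(m)

    kept = []
    for bond_matches in by_bond.values():
        groupings = []
        for pos in (0, 3):
            groups = defaultdict(list)
            for m in bond_matches:
                groups[m[pos]].append(m)
            groupings.append(list(groups.values()))
        best = min(groupings, key=len)  # ties keep the position-0 grouping
        kept.extend(group[0] for group in best)
    return kept


print(merge_vicinal_matches([(1, 2, 4, 3), (0, 2, 4, 3)]))  # [(1, 2, 4, 3)]

# Hand-built matches for pentachloroethane, ClC(Cl)C(Cl)(Cl)Cl.
penta = [(a, 1, 3, b) for a in (0, 2) for b in (4, 5, 6)]
print(merge_vicinal_matches(penta))  # [(0, 1, 3, 4), (2, 1, 3, 4)]
```

With RDKit, the input would simply be the output of mol.GetSubstructMatches(patt). For a C-C bond carrying m chlorines on one carbon and n on the other, all m*n matches are present and this rule keeps min(m, n) of them: 1 for smiles1 (2 and 1) and 2 for pentachloroethane (2 and 3).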
I have already checked Stack Overflow and a few other places
for Python code to do the necessary merging, but I could not
find anything specific and appropriate.
I would be most grateful if anyone has ideas on how to do this. I
suspect the answer is a few lines of well-written Python code
rather than a modified SMARTS (I could be mistaken!).
Thank you.
Regards,
Jim Metz

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss