Dear All, Firstly, a little disclaimer. The question I have may seem to be rather off-topic on that list, but thanks to RDKit's users community I have learned a lot of things regarding chemoinformatics in general, so I was hoping to find some help on topic not directly related to RDKit itself.
I am trying to implement Lynch-Willett (LW) algorithm for automatic detection of reaction sites [1,2]. Briefly speaking, the algorithm removes, in a stepwise manner, substructures common to a reaction's reactant and product based on extended connectivity indices. My problem lies in understanding how algorithm's stopping condition can be reached if an intermediate reaction diagram contains common (for both reaction's sides) but disconnected fragments. For those interested, I am including the longer and more in-depth explanation of the LW algorithm and my problem below. LW algorithm is based on Morgan algorithm: 1) Initially, each atom is assigned an index i.e. an integer $V^{1}_{i}$ derived from its type and bond pattern (I know, it's rather vague but I'm practically quoting the sources. Using original Morgan approach, node connectivity, will do.) 2) Higher order indices \[ V^{n}_{i} = 2 * V^{n - 1}_{i} + \sum_{k} V^{n-1}_{k} \] where summation is over all atoms adjacent to atom $i$, are calculated simultaneously for reactant and product until there is NO such pair for which $V^{n}_{r} = V^{n}_{p}$ ($r$ enumerates atoms of the reactant, $p$ atoms of the product). 3) All atom pairs, for which $V^{n - 1}_{r} = V^{n - 1}_{p}$ are considered to be in centers of identical circular substructures of radius $(n - 2)$ bonds. Atoms belonging to those substructures are deleted from the reactant and product. 4) The above process is repeated until all substructures common to the reactant and product are removed. The algorithm's idea seems to be relatively straightforward, but I am stuck on the first example from Ref. [1]. Authors consider a reaction which can be written as CC(=O)CC(C)C(CC#N)C(=O)N>>CC(=O)CC(C)C(CC#N)C#N The goal is to identify the change C(=O)N>>C#N. After applying steps 1) through 3) for the first time we end up with an intermediate reaction diagram CC#N.C(=O)N>>CC#N.C#N No problem here, my current implementation of LW algorithm arrives to this point as expected. But according to step 4), we should 'rinse and repeat' to eliminate the remaining common fragment 'CC#N'. And here is exactly my problem. I do not see how the algorithm's stopping condition can be reached if there are two identical, yet disconnected fragments. As I see it, whatever initial values indices we assign to the remaining atoms, the respective atoms on common fragment 'CC#N' will have *always* the same indices if we use the increment formula above. Thus, algorithm will enter an infinite loop, and actually does. I must be missing something obvious, but despite digging in literature and extensive 'googling' I cannot figure it out. Thus, any suggestion what I am missing and/or I got wrong are welcome. [1] MF Lynch and P Willett, J Chem Inf Comput Sci, 18, 154--159 (1978) [2] WL Chen, DZ Chen, and KT Taylor, WIREs Comput Mol Sci, 3, 560--593 (2013) Regards, -- MikoĊaj Kowalik ------------------------------------------------------------------------------ Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss