[Rdkit-discuss] Lynch-Willett algorithm

Mikolaj Kowalik Mon, 30 Dec 2013 16:34:16 -0800

Dear All,

Firstly, a little disclaimer. The question I have may seem to be rather
off-topic on that list, but thanks to RDKit's users community I have learned
a lot of things regarding chemoinformatics in general, so I was hoping to
find some help on topic not directly related to RDKit itself.


I am trying to implement Lynch-Willett (LW) algorithm for automatic detection
of reaction sites [1,2]. Briefly speaking, the algorithm removes, in a
stepwise manner, substructures common to a reaction's reactant and product
based on extended connectivity indices. My problem lies in understanding how
algorithm's stopping condition can be reached if an intermediate reaction
diagram contains common (for both reaction's sides) but disconnected
fragments.

For those interested, I am including the longer and more in-depth explanation
of the LW algorithm and my problem below.  

LW algorithm is based on Morgan algorithm:

1) Initially, each atom is assigned an index i.e. an integer $V^{1}_{i}$
derived from its type and bond pattern (I know, it's rather vague but I'm
practically quoting the sources. Using original Morgan approach, node
connectivity, will do.)

2) Higher order indices
\[
    V^{n}_{i} = 2 * V^{n - 1}_{i} + \sum_{k} V^{n-1}_{k}
\]
where summation is over all atoms adjacent to atom $i$, are calculated
simultaneously for reactant and product until there is NO such pair for
which $V^{n}_{r} = V^{n}_{p}$ ($r$ enumerates atoms of the reactant, $p$
atoms of the product).

3) All atom pairs, for which $V^{n - 1}_{r} = V^{n - 1}_{p}$ are considered
to be in centers of identical circular substructures of radius $(n - 2)$
bonds. Atoms belonging to those substructures are deleted from the reactant
and product.

4) The above process is repeated until all substructures common to the
reactant and product are removed.

The algorithm's idea seems to be relatively straightforward, but I am stuck
on the first example from Ref. [1]. Authors consider a reaction which can be
written as

    CC(=O)CC(C)C(CC#N)C(=O)N>>CC(=O)CC(C)C(CC#N)C#N

The goal is to identify the change C(=O)N>>C#N.

After applying steps 1) through 3) for the first time we end up with an
intermediate reaction diagram

    CC#N.C(=O)N>>CC#N.C#N

No problem here, my current implementation of LW algorithm arrives to this
point as expected. But according to step 4), we should 'rinse and repeat' to
eliminate the remaining common fragment 'CC#N'.

And here is exactly my problem. I do not see how the algorithm's stopping
condition can be reached if there are two identical, yet disconnected
fragments. As I see it, whatever initial values indices we assign to the
remaining atoms, the respective atoms on common fragment 'CC#N' will have
*always* the same indices if we use the increment formula above. Thus,
algorithm will enter an infinite loop, and actually does.

I must be missing something obvious, but despite digging in literature and
extensive 'googling' I cannot figure it out. Thus, any suggestion what I am
missing and/or I got wrong are welcome.


[1] MF Lynch and P Willett, J Chem Inf Comput Sci, 18, 154--159 (1978)
[2] WL Chen, DZ Chen, and KT Taylor, WIREs Comput Mol Sci, 3, 560--593 (2013)


Regards,

-- 
Mikołaj Kowalik

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

[Rdkit-discuss] Lynch-Willett algorithm

Reply via email to