On Tue, Dec 31, 2013 at 1:32 AM, Mikolaj Kowalik <mx...@psu.edu> wrote:

> Dear All,
>
> Firstly, a little disclaimer. The question I have may seem rather
> off-topic for this list, but thanks to RDKit's user community I have
> learned a lot about chemoinformatics in general, so I was hoping to
> find some help on a topic not directly related to RDKit itself.
>

Seems perfectly relevant to me; others can of course pipe up if they find
it distracting.


> I am trying to implement the Lynch-Willett (LW) algorithm for automatic
> detection of reaction sites [1,2]. Briefly, the algorithm removes, in a
> stepwise manner, substructures common to a reaction's reactant and product
> based on extended connectivity indices. My problem lies in understanding
> how the algorithm's stopping condition can be reached if an intermediate
> reaction diagram contains fragments that are common to both sides of the
> reaction but disconnected.
>
> For those interested, I am including a longer and more in-depth
> explanation of the LW algorithm and my problem below.
>
> The LW algorithm is based on the Morgan algorithm:
>
> 1) Initially, each atom is assigned an index, i.e. an integer $V^{1}_{i}$
> derived from its type and bond pattern (I know, it's rather vague, but I'm
> practically quoting the sources; using the original Morgan approach, node
> connectivity, will do).
>
> 2) Higher-order indices
> \[
>     V^{n}_{i} = 2 V^{n - 1}_{i} + \sum_{k} V^{n - 1}_{k},
> \]
> where the summation runs over all atoms $k$ adjacent to atom $i$, are
> calculated simultaneously for the reactant and the product until there is
> NO pair for which $V^{n}_{r} = V^{n}_{p}$ ($r$ enumerates atoms of the
> reactant, $p$ atoms of the product).
>
> 3) All atom pairs for which $V^{n - 1}_{r} = V^{n - 1}_{p}$ are considered
> to be the centers of identical circular substructures of radius $(n - 2)$
> bonds. Atoms belonging to those substructures are deleted from the reactant
> and the product.
>
> 4) The above process is repeated until all substructures common to the
> reactant and product are removed.
>
> The idea behind the algorithm seems relatively straightforward, but I am
> stuck on the first example from Ref. [1]. The authors consider a reaction
> that can be written as
>
>     CC(=O)CC(C)C(CC#N)C(=O)N>>CC(=O)CC(C)C(CC#N)C#N
>
> The goal is to identify the change C(=O)N>>C#N.
>
> After applying steps 1) through 3) for the first time, we end up with the
> intermediate reaction diagram
>
>     CC#N.C(=O)N>>CC#N.C#N
>
> No problem here; my current implementation of the LW algorithm arrives at
> this point as expected. But according to step 4), we should 'rinse and
> repeat' to eliminate the remaining common fragment 'CC#N'.
>
> And here is exactly my problem. I do not see how the algorithm's stopping
> condition can be reached if there are two identical yet disconnected
> fragments. As I see it, whatever initial index values we assign to the
> remaining atoms, the corresponding atoms of the common fragment 'CC#N' will
> *always* have the same indices if we use the update formula above. Thus, the
> algorithm will enter an infinite loop, and in practice it does.
>
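
For reference, a minimal sketch of step 2 exactly as quoted, in plain Python so it runs without RDKit. The adjacency-list graphs, the atom numbering, and the initial index V^1 = 10 * atomic number + degree are assumptions made for illustration, not the definitions used in the LW papers; bond orders are ignored. It reproduces the behaviour described above: the two CC#N fragments are identical graphs, so their atoms receive equal indices at every iteration and the "no equal pair" condition never fires.

```python
# Step 2 as quoted, applied to the intermediate diagram CC#N.C(=O)N>>CC#N.C#N.
# Graphs are plain adjacency lists (an assumed representation, not an RDKit
# API); bond orders are ignored.

REACTANT = {  # CC#N . C(=O)N
    0: [1], 1: [0, 2], 2: [1],   # C-C-N (nitrile fragment)
    3: [4, 5], 4: [3], 5: [3],   # C(=O)N (amide fragment)
}
PRODUCT = {  # CC#N . C#N
    0: [1], 1: [0, 2], 2: [1],   # C-C-N (nitrile fragment)
    3: [4], 4: [3],              # C#N
}
REACTANT_ELEMENTS = {0: 6, 1: 6, 2: 7, 3: 6, 4: 8, 5: 7}
PRODUCT_ELEMENTS = {0: 6, 1: 6, 2: 7, 3: 6, 4: 7}


def initial_indices(adj, elements):
    # Assumption: V^1 = 10 * atomic number + degree, one arbitrary way of
    # folding "type and bond pattern" into a single integer.
    return {i: 10 * elements[i] + len(nbrs) for i, nbrs in adj.items()}


def morgan_step(adj, v):
    # V^n_i = 2 * V^(n-1)_i + sum of V^(n-1)_k over the neighbours k of i
    return {i: 2 * v[i] + sum(v[k] for k in nbrs) for i, nbrs in adj.items()}


def has_common_pair(v_r, v_p):
    # True if some reactant atom and some product atom share an index
    return bool(set(v_r.values()) & set(v_p.values()))


def run_step2(max_iter=50):
    v_r = initial_indices(REACTANT, REACTANT_ELEMENTS)
    v_p = initial_indices(PRODUCT, PRODUCT_ELEMENTS)
    for n in range(2, max_iter + 1):
        v_r, v_p = morgan_step(REACTANT, v_r), morgan_step(PRODUCT, v_p)
        if not has_common_pair(v_r, v_p):
            return n  # the quoted stopping condition was met at iteration n
    return None  # never met: identical fragments keep identical indices


print(run_step2())  # None
```

The cap `max_iter` is only there so the sketch terminates; without it the loop would run forever, exactly as observed.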

If you "rinse" by re-calculating the atom invariants *based on the new bond
patterns*, I think the algorithm should, during the second pass, identify
that the two CC#N fragments are identical and remove them from the diagram.
This should leave you with the intended result.
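
To make "based on the new bond patterns" concrete, here is a minimal sketch in plain Python. The helpers `adjacency` and `induced_subgraph` are hypothetical names, not RDKit functions: the latter drops the deleted atoms and every bond to them, so the initial indices are recomputed from the reduced graph rather than carried over from the full reactant. Atoms are numbered in SMILES order of CC(=O)CC(C)C(CC#N)C(=O)N, bond orders are ignored, and degree is used as the initial index, as in the original Morgan approach.

```python
# Reactant CC(=O)CC(C)C(CC#N)C(=O)N, heavy atoms numbered in SMILES order.
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (4, 6),
         (6, 7), (7, 8), (8, 9), (6, 10), (10, 11), (10, 12)]


def adjacency(bonds):
    # Build an undirected adjacency list from a bond list.
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def induced_subgraph(adj, keep):
    # Keep only the surviving atoms and the bonds between them.
    keep = set(keep)
    return {i: [k for k in nbrs if k in keep]
            for i, nbrs in adj.items() if i in keep}


full = adjacency(BONDS)
# Atoms left after the first pass: CC#N (7, 8, 9) and C(=O)N (10, 11, 12).
reduced = induced_subgraph(full, [7, 8, 9, 10, 11, 12])

old_degree = {i: len(full[i]) for i in reduced}     # degrees in the full reactant
new_degree = {i: len(reduced[i]) for i in reduced}  # degrees after deletion
print(old_degree)  # {7: 2, 8: 2, 9: 1, 10: 3, 11: 1, 12: 1}
print(new_degree)  # {7: 1, 8: 2, 9: 1, 10: 2, 11: 1, 12: 1}
```

The CH2 carbon of CC#N (atom 7) and the amide carbon (atom 10) each lose a neighbour, so their starting indices, and everything derived from them, differ from the first pass.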

It may be that you're expressing step 2 incorrectly. There's a compact
explanation of the algorithm on page 564 of the review by Chen et al.
that includes, in step 6, the key stopping criterion you may be missing: if
the number of unique atom classes does not increase from one iteration to
the next, terminate the iterations.
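
A minimal sketch of that termination rule applied to the intermediate diagram CC#N.C(=O)N>>CC#N.C#N, again in plain Python. Assumptions: distinct values are counted over the reactant and product together, V^1 = 10 * atomic number + degree (atom type is included because, with bond orders ignored and degree alone, the C(=O)N fragment is the same graph as CC#N), and when the count stops growing the previous iteration's indices are kept, mirroring the use of V^{n-1} in step 3. Only the matching of centre atoms is shown; expanding each match to its circular substructure and deleting it is left out.

```python
# Stop when the number of distinct index values no longer increases,
# then pair reactant and product atoms that share a final value.

REACTANT = {0: [1], 1: [0, 2], 2: [1], 3: [4, 5], 4: [3], 5: [3]}  # CC#N.C(=O)N
PRODUCT = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}              # CC#N.C#N
REACTANT_ELEMENTS = {0: 6, 1: 6, 2: 7, 3: 6, 4: 8, 5: 7}
PRODUCT_ELEMENTS = {0: 6, 1: 6, 2: 7, 3: 6, 4: 7}


def initial_indices(adj, elements):
    # Assumption: V^1 = 10 * atomic number + degree.
    return {i: 10 * elements[i] + len(nbrs) for i, nbrs in adj.items()}


def morgan_step(adj, v):
    # V^n_i = 2 * V^(n-1)_i + sum of V^(n-1)_k over the neighbours k of i
    return {i: 2 * v[i] + sum(v[k] for k in nbrs) for i, nbrs in adj.items()}


def n_classes(v_r, v_p):
    # Assumption: classes are counted over both sides of the reaction.
    return len(set(v_r.values()) | set(v_p.values()))


def classify(r_adj, r_el, p_adj, p_el):
    v_r, v_p = initial_indices(r_adj, r_el), initial_indices(p_adj, p_el)
    count = n_classes(v_r, v_p)
    while True:
        new_r, new_p = morgan_step(r_adj, v_r), morgan_step(p_adj, v_p)
        new_count = n_classes(new_r, new_p)
        if new_count <= count:
            # No new classes appeared: keep the previous iteration's indices.
            return v_r, v_p
        v_r, v_p, count = new_r, new_p, new_count


def matched_pairs(v_r, v_p):
    # (reactant atom, product atom) pairs with equal indices
    return sorted((r, p) for r, a in v_r.items()
                  for p, b in v_p.items() if a == b)


v_r, v_p = classify(REACTANT, REACTANT_ELEMENTS, PRODUCT, PRODUCT_ELEMENTS)
print(matched_pairs(v_r, v_p))  # [(0, 0), (1, 1), (2, 2)]
```

Because the class count is bounded by the number of atoms and must strictly increase to continue, the loop always terminates. Here the count goes 4, 7, 8, 8, so the third set of indices is returned and only the CC#N atoms are paired, leaving C(=O)N>>C#N unmatched.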

-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
