On Thu, Jun 7, 2012 at 8:29 AM, My Th <rei4...@gmail.com> wrote:

> Hi!
>
> I was looking at bug #3508648 and it turns out that it segfaults in
> expandKekulize() due to stack overflow. The routine in question uses
> recursive algorithm to assign bond orders for aromatic systems. For
> large molecules (1bxr has almost 50k atoms) it means that the stack can
> grow quite a bit because of the recursive function calls and local
> variables.
>
> At first I started by eliminating some of the local variables which are
> not actually used for anything but getting indexes for atoms in the
> bond:
>     // Get the bond and its atoms
> -    OBBond *bond = mol->GetBond(bond_idx);
> -    OBAtom *atom1 = bond->GetBeginAtom();
> -    OBAtom *atom2 = bond->GetEndAtom();
> -    int idx1 = atom1->GetIdx();
> -    int idx2 = atom2->GetIdx();
> +    size_t idx1 = mol->GetBond(bond_idx)->GetBeginAtom()->GetIdx();
> +    size_t idx2 = mol->GetBond(bond_idx)->GetEndAtom()->GetIdx();
>
> Still crashed for default max stack size. I'm not sure if this kind of
> optimization is useful at all since compiler should realize that bond,
> atom1 and atom2 is used only to get atom indexes. On the other hand we
> are asking to create variables on the stack and once they are there,
> they stay around until they go out of the scope (which is basically at
> the end of kekulization). I'm not sure which of these is true.
>
> Any comments on this?
>

Every decent optimizer since about 1985 should eliminate these variables.
I'd be surprised if rearranging the code as illustrated made any difference
at all.


> Now I got to the real culprits of this issue - previousState and
> previousBondState. They are vectors of integers holding the state of
> each atom and bond in molecule at the beginning of each iteration.
> Putting them on heap allows to process 1bxr.pdb without increasing stack
> size limits.
>
>     // Remember the current state so that we can backtrack if the attempt
> fails
> -    vector<int> previousState         = atomState;     // Backup the atom
> states
> -    vector<int> previousBondState     = bondState;     // ... and the
> bond states
> +    // Keep previous states on heap to avoid stack overflows for large
> molecules
> +    vector<int>* ppreviousState     = new vector<int>(atomState);      //
> Backup the atom states
> +    vector<int>* ppreviousBondState = new vector<int>(bondState);      //
> ... and the bond states
>
> And deleting them just before expandKekulize() returns.
>

This is probably helpful, but we should remember that it's a short-term
Band-Aid.  The fundamental problem is still there, just waiting for an
even-larger molecule or a computer with less memory (as you mention below).

Aromaticity detection shouldn't be this hard. Most of the atoms in a
50,000-atom molecule are probably not even in rings.  And furthermore,
aromaticity detection only has to consider "ring groups" - groups of rings
that share at least one bond.  Once you divide the molecule up into ring
groups, you almost never have to consider more than a few dozen atoms at a
time.  If someone needs to deal with 50,000-atom sheets of graphite,
OpenBabel is probably not the right tool.

Craig


> Performance wise there is no significant difference between previous
> version and this when looking at times required to convert 1bxr or
> fullerene with babel. Since state vectors can be quite large, I think it
> is appropriate to keep them in heap. An alternative could be to store
> only differences in previous state, that could be a STL pair for bond
> and the two atoms, which should be considerably smaller than full state
> vectors.
>
> This solution allows to process a lot larger molecules as before, but
> still somewhere we will hit the stack size limit because of the
> recursive algorithm. Making smarter/iterative algorithm could allow to
> work around this (or just raising the stack size limit). But I'm not
> sure if this would be useful to anyone at the moment.
>
> I don't know of a way to get the stack size limit from C++ program, but
> that would allow for a meaningful error message instead of just
> segfault.
>
> Any suggestions, comments? I'm committing this as is if there are no
> objections.
>
>
> Reinis
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> OpenBabel-Devel mailing list
> OpenBabel-Devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Reply via email to