Re: [OpenBabel-Devel] Use of stack and heap for kekulize

My Th Tue, 12 Jun 2012 08:12:23 -0700

C , 2012-06-07 23:45 +0300, My Th rakstīja:
> C , 2012-06-07 12:02 -0400, Geoffrey Hutchison rakstīja:
> > > +    size_t idx1 = mol->GetBond(bond_idx)->GetBeginAtom()->GetIdx();
> > > +    size_t idx2 = mol->GetBond(bond_idx)->GetEndAtom()->GetIdx();
> > > 
> > > Still crashed for default max stack size. I'm not sure if this kind of
> > > optimization is useful at all since compiler should realize that bond,
> > > atom1 and atom2 is used only to get atom indexes.
> > 
> > This is an obvious compiler optimization, and I'm not sure there's any
> > benefit in terms of code. I have generally heard from compiler
> > developers that this sort of "optimization" in code doesn't change
> > anything.
> 
> Ok, I tried to use gdb to step trough the code and see if bond and atoms
> are kept alive. For -O0 they are all around, for -O1 bond and atom2 gets
> optimized out, but atom1 somehow survives till the next recursive call
> (don't know what happens with it afterwards and I'm not sure it is
> actually on the stack), for -O[23] they all seem to be optimized out,
> tough for the moment their values are set, it is probably working trough
> them to get idx1 and idx2 if I'm interpreting this correctly.
> 
> It looks like a cleaner approach for the stack variables and works the
> same regardless of the optimization level used, but there is no real
> benefit for it (unless compiler does something stupid when generating
> code), tough it makes debug build a bit faster, so I'm still tempted to
> keep it, unless I'm told that it hurts readability ;)
> 
> > 
> > > +    // Keep previous states on heap to avoid stack overflows for large 
> > > molecules
> > > +    vector<int>* ppreviousState     = new vector<int>(atomState);      
> > > // Backup the atom states
> > > +    vector<int>* ppreviousBondState = new vector<int>(bondState);      
> > > // ... and the bond states
> > > 
> > > And deleting them just before expandKekulize() returns.
> > > 
> > > Performance wise there is no significant difference between previous
> > > version and this when looking at times required to convert 1bxr or
> > > fullerene with babel. Since state vectors can be quite large, I think it
> > > is appropriate to keep them in heap. An alternative could be to store
> > > only differences in previous state, that could be a STL pair for bond
> > > and the two atoms, which should be considerably smaller than full state
> > > vectors.
> > 
> > Either approach sounds good to me. Naturally your change results in the
> > smallest difference in code, which is probably useful for a 2.3.x
> > release.
> 
> I'll have a look how intrusive the diff approach would look like, its
> advantage would be overall better memory efficiency not just for the
> stack.


I have compared different implementations using valgrind tool massif to
look at heap and stack memory usage. I had the following
implementations:
A - master
B - only the obvious optimization for idx1 and idx2
C - keep previous states on heap
D - track only differences between states using STL pairs, use heap
E - track only differences in state using struct, use heap

C-E are on top of B.

   heap (kB)  stack (kB)
-O0
A  56456      9376
B  55966      8658
C  54110      6504
D  51864      7940
E  50427      6504
-O2
A  56462      9376
B  56462      9376
C  54801      7222
D  51865      7940
E  51146      7222
-O3
A  56462      9376
B  56462      9376
C  54801      7222
D  52583      8658
E  51146      7222

For my computer default stack size limit was 8192 kB, so it is clear
which implementations still had an overflow with the test molecule 1bxr.
But even the best implementation here was really only marginally below
the limit. Looks like we are due for a rewrite here.

With no optimization there is a small benefit of B, but otherwise there
is no difference.

Turned out that using STL pairs (D) actually increased the stack usage
in comparison with just putting old state in heap (C), and it got worse
with higher optimization level. I guess STL pairs have more state to
track and there were three of them in contrast to just a single pointer.

The best option seems to be to keep difference between the old state in
a struct and put that struct in heap (E). This gives the same stack
space consumption as previous implementation (C) and almost 10% saved
heap space.

E is only marginally more intrusive than C, if there is no problem with
it I'm committing only this one, otherwise I can also commit the minimal
version if it is necessary for 2.3.x. This implementation goes as
follows:

+
+    // Store only differences in states between iterations in kekulization
+    struct stateDiff_t {
+      size_t idx1;
+      int ps1;
+      size_t idx2;
+      int ps2;
+      size_t idxb;
+      int psb;
+    };
+

     // Remember the current state so that we can backtrack if the attempt fails
-    vector<int> previousState         = atomState;     // Backup the atom 
states
-    vector<int> previousBondState     = bondState;     // ... and the bond 
states
+    stateDiff_t* pps = new stateDiff_t;
+    pps->idx1 = idx1;
+    pps->ps1  = atomState[idx1];
+    pps->idx2 = idx2;
+    pps->ps2  = atomState[idx2];
+    pps->idxb = bond_idx;
+    pps->psb  = bondState[bond_idx];

       // Recursively try the next bond
-      if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout))
+      if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout)) {
+        delete pps;
         return true;
+      }

       // If the double bond didn't work, roll back the changes and try a 
single bond.
-      atomState = previousState;
-      bondState = previousBondState;
+      atomState[pps->idx1] = pps->ps1;
+      atomState[pps->idx2] = pps->ps2;
+      bondState[pps->idxb] = pps->psb;

     // Double bond not allowed here, or double bond failed, just recurse with 
a single bond.
     if (DEBUG) {cout << "bond " << bond_idx << " (atoms " << idx1 << " to " << 
idx2 << ") single" << endl;}
-    if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout))
+    if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout)) {
+      delete pps;
       return true;
+    }

     // If it didn't work, roll back the changes we made and return failure.
     if (DEBUG) {cout << "bond " << bond_idx << " single failed, rolling back 
changes" << endl;}
-    atomState = previousState;
-    bondState = previousBondState;
+    atomState[pps->idx1] = pps->ps1;
+    atomState[pps->idx2] = pps->ps2;
+    bondState[pps->idxb] = pps->psb;
+    delete pps;
     return false;


Reinis


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Re: [OpenBabel-Devel] Use of stack and heap for kekulize

Reply via email to