C , 2012-06-07 23:45 +0300, My Th rakstīja: > C , 2012-06-07 12:02 -0400, Geoffrey Hutchison rakstīja: > > > + size_t idx1 = mol->GetBond(bond_idx)->GetBeginAtom()->GetIdx(); > > > + size_t idx2 = mol->GetBond(bond_idx)->GetEndAtom()->GetIdx(); > > > > > > Still crashed for default max stack size. I'm not sure if this kind of > > > optimization is useful at all since compiler should realize that bond, > > > atom1 and atom2 is used only to get atom indexes. > > > > This is an obvious compiler optimization, and I'm not sure there's any > > benefit in terms of code. I have generally heard from compiler > > developers that this sort of "optimization" in code doesn't change > > anything. > > Ok, I tried to use gdb to step trough the code and see if bond and atoms > are kept alive. For -O0 they are all around, for -O1 bond and atom2 gets > optimized out, but atom1 somehow survives till the next recursive call > (don't know what happens with it afterwards and I'm not sure it is > actually on the stack), for -O[23] they all seem to be optimized out, > tough for the moment their values are set, it is probably working trough > them to get idx1 and idx2 if I'm interpreting this correctly. > > It looks like a cleaner approach for the stack variables and works the > same regardless of the optimization level used, but there is no real > benefit for it (unless compiler does something stupid when generating > code), tough it makes debug build a bit faster, so I'm still tempted to > keep it, unless I'm told that it hurts readability ;) > > > > > > + // Keep previous states on heap to avoid stack overflows for large > > > molecules > > > + vector<int>* ppreviousState = new vector<int>(atomState); > > > // Backup the atom states > > > + vector<int>* ppreviousBondState = new vector<int>(bondState); > > > // ... and the bond states > > > > > > And deleting them just before expandKekulize() returns. > > > > > > Performance wise there is no significant difference between previous > > > version and this when looking at times required to convert 1bxr or > > > fullerene with babel. Since state vectors can be quite large, I think it > > > is appropriate to keep them in heap. An alternative could be to store > > > only differences in previous state, that could be a STL pair for bond > > > and the two atoms, which should be considerably smaller than full state > > > vectors. > > > > Either approach sounds good to me. Naturally your change results in the > > smallest difference in code, which is probably useful for a 2.3.x > > release. > > I'll have a look how intrusive the diff approach would look like, its > advantage would be overall better memory efficiency not just for the > stack.
I have compared different implementations using valgrind tool massif to look at heap and stack memory usage. I had the following implementations: A - master B - only the obvious optimization for idx1 and idx2 C - keep previous states on heap D - track only differences between states using STL pairs, use heap E - track only differences in state using struct, use heap C-E are on top of B. heap (kB) stack (kB) -O0 A 56456 9376 B 55966 8658 C 54110 6504 D 51864 7940 E 50427 6504 -O2 A 56462 9376 B 56462 9376 C 54801 7222 D 51865 7940 E 51146 7222 -O3 A 56462 9376 B 56462 9376 C 54801 7222 D 52583 8658 E 51146 7222 For my computer default stack size limit was 8192 kB, so it is clear which implementations still had an overflow with the test molecule 1bxr. But even the best implementation here was really only marginally below the limit. Looks like we are due for a rewrite here. With no optimization there is a small benefit of B, but otherwise there is no difference. Turned out that using STL pairs (D) actually increased the stack usage in comparison with just putting old state in heap (C), and it got worse with higher optimization level. I guess STL pairs have more state to track and there were three of them in contrast to just a single pointer. The best option seems to be to keep difference between the old state in a struct and put that struct in heap (E). This gives the same stack space consumption as previous implementation (C) and almost 10% saved heap space. E is only marginally more intrusive than C, if there is no problem with it I'm committing only this one, otherwise I can also commit the minimal version if it is necessary for 2.3.x. This implementation goes as follows: + + // Store only differences in states between iterations in kekulization + struct stateDiff_t { + size_t idx1; + int ps1; + size_t idx2; + int ps2; + size_t idxb; + int psb; + }; + // Remember the current state so that we can backtrack if the attempt fails - vector<int> previousState = atomState; // Backup the atom states - vector<int> previousBondState = bondState; // ... and the bond states + stateDiff_t* pps = new stateDiff_t; + pps->idx1 = idx1; + pps->ps1 = atomState[idx1]; + pps->idx2 = idx2; + pps->ps2 = atomState[idx2]; + pps->idxb = bond_idx; + pps->psb = bondState[bond_idx]; // Recursively try the next bond - if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout)) + if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout)) { + delete pps; return true; + } // If the double bond didn't work, roll back the changes and try a single bond. - atomState = previousState; - bondState = previousBondState; + atomState[pps->idx1] = pps->ps1; + atomState[pps->idx2] = pps->ps2; + bondState[pps->idxb] = pps->psb; // Double bond not allowed here, or double bond failed, just recurse with a single bond. if (DEBUG) {cout << "bond " << bond_idx << " (atoms " << idx1 << " to " << idx2 << ") single" << endl;} - if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout)) + if (expandKekulize(mol, bond_idx + 1, atomState, bondState, timeout)) { + delete pps; return true; + } // If it didn't work, roll back the changes we made and return failure. if (DEBUG) {cout << "bond " << bond_idx << " single failed, rolling back changes" << endl;} - atomState = previousState; - bondState = previousBondState; + atomState[pps->idx1] = pps->ps1; + atomState[pps->idx2] = pps->ps2; + bondState[pps->idxb] = pps->psb; + delete pps; return false; Reinis ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel