Re: [OpenBabel-Devel] Use of stack and heap for kekulize

My Th Thu, 07 Jun 2012 13:45:52 -0700

C , 2012-06-07 12:02 -0400, Geoffrey Hutchison rakstīja:
> > +    size_t idx1 = mol->GetBond(bond_idx)->GetBeginAtom()->GetIdx();
> > +    size_t idx2 = mol->GetBond(bond_idx)->GetEndAtom()->GetIdx();
> > 
> > Still crashed for default max stack size. I'm not sure if this kind of
> > optimization is useful at all since compiler should realize that bond,
> > atom1 and atom2 is used only to get atom indexes.
> 
> This is an obvious compiler optimization, and I'm not sure there's any
> benefit in terms of code. I have generally heard from compiler
> developers that this sort of "optimization" in code doesn't change
> anything.


Ok, I tried to use gdb to step trough the code and see if bond and atoms
are kept alive. For -O0 they are all around, for -O1 bond and atom2 gets
optimized out, but atom1 somehow survives till the next recursive call
(don't know what happens with it afterwards and I'm not sure it is
actually on the stack), for -O[23] they all seem to be optimized out,
tough for the moment their values are set, it is probably working trough
them to get idx1 and idx2 if I'm interpreting this correctly.

It looks like a cleaner approach for the stack variables and works the
same regardless of the optimization level used, but there is no real
benefit for it (unless compiler does something stupid when generating
code), tough it makes debug build a bit faster, so I'm still tempted to
keep it, unless I'm told that it hurts readability ;)

> 
> > +    // Keep previous states on heap to avoid stack overflows for large 
> > molecules
> > +    vector<int>* ppreviousState     = new vector<int>(atomState);      // 
> > Backup the atom states
> > +    vector<int>* ppreviousBondState = new vector<int>(bondState);      // 
> > ... and the bond states
> > 
> > And deleting them just before expandKekulize() returns.
> > 
> > Performance wise there is no significant difference between previous
> > version and this when looking at times required to convert 1bxr or
> > fullerene with babel. Since state vectors can be quite large, I think it
> > is appropriate to keep them in heap. An alternative could be to store
> > only differences in previous state, that could be a STL pair for bond
> > and the two atoms, which should be considerably smaller than full state
> > vectors.
> 
> Either approach sounds good to me. Naturally your change results in the
> smallest difference in code, which is probably useful for a 2.3.x
> release.

I'll have a look how intrusive the diff approach would look like, its
advantage would be overall better memory efficiency not just for the
stack.

I noticed that there are other functions in kekulize.cpp which are using
the same idiom of state vectors on stack, I'll give them the same
treatment.

> 
> > This solution allows to process a lot larger molecules as before, but
> > still somewhere we will hit the stack size limit because of the
> > recursive algorithm. Making smarter/iterative algorithm could allow to
> > work around this
> 
> My long term plan has been to totally rewrite the aromaticity and
> Kekulization code. Aromaticity can be determined by a path-restricted
> breadth-first-search (i.e., from any atom, do BFS and try to find an
> aromatic cycle back to yourself. Terminate if longer than, say 20
> atoms).
> 
> I have some papers which suggest Kekulization can be done using maximum
> matching graph algorithms (which are order n^3, rather than the current
> exhaustive strategy). e.g.
> http://sourceforge.net/mailarchive/message.php?msg_id=1619718
> 
> One paper for maximum matching suggests that it can be done via matrix
> inversion, which could be done via Eigen in parallel. ;-)
> 
> Cheers,
> -Geoff

Thanks, that is an interesting reference. It might be quite easy to
improve OB in this respect. It is something worth to look into sometime.
But it isn't a priority till we hit the next bug :)

My plan was to get to look at the chain perception, since it is a huge
performance penalty especially if reading from a format which explicitly
numbers residues (like GRO), somehow I think that there should be a
possibility for optimization using that info with graph theory or
otherwise. Also iirc chain perception was tightly coupled with bond
perception so it couldn't be switched off or called selectively, that
should be corrected. I noticed Avogadro list mentioned disabling chain
perception for now, maybe that has a similar reason.


Reinis


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Re: [OpenBabel-Devel] Use of stack and heap for kekulize

Reply via email to