Hi,

On Mon, Jul 2, 2012 at 10:09 PM, ovalerio <omar.valerio-min...@ichec.ie> wrote:
>
>> See test/mmff94validate.cpp and the comments at the top. The easiest
>> to grab is:
>>
>>     MMFF94_dative.mol2
>>     MMFF94_opti.log
>
>
> Okay I will definitely have a look at that one. Thanks.
>
>>
>> surprising. I'll be curious what you find. (You can also use
>> obminimize on _dative.mol2 as well if you want a *lot* of profiling.)
>>
>
> Yes. I want to have a better idea of where the load is in terms of CPU
> usage.  First I tried to use gprof, since it's the tool I am most familiar
> with, but I realized that gprof doesn't work for profiling applications that
> make use of shared libraries.

Most of the time is spent on calculating the non-bonded interactions,
since these scale as n^2, where n is the number of atoms. The
electrostatic interaction is much simpler to compute, so the VDW
calculation is the most CPU intensive.
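
To make the n^2 scaling concrete, here is a minimal C++ sketch of the
pair enumeration a brute-force non-bonded loop performs. The function
name countPairs is hypothetical and not part of Open Babel; it only
counts pairs and does not evaluate any MMFF94 term.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Open Babel: counts the unique atom
// pairs (i < j) that a brute-force non-bonded loop visits.
std::size_t countPairs(std::size_t n)
{
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            ++pairs;  // a real loop would evaluate the pair terms here
    assert(pairs == n * (n - 1) / 2);  // closed form of the double loop
    return pairs;
}
```

countPairs(3000) gives 4498500, and doubling the atom count roughly
quadruples that number.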

> So I switched to OProfile.  I am using CPU_CLK_UNHALTED events to generate
> statistics. The following sample output shows that the VDW energy term,
> together with VectorDivide, VectorSubtract, VectorLength, and floating point
> operations, is the most frequently sampled. This is when running obminimize
> against forcefields.sdf
>
> CPU: Intel Architectural Perfmon, speed 1600 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 12000
> samples  %        image name               symbol name
> -------------------------------------------------------------------------------
> 13316     6.9366  libopenbabel.so.4.0.0    OpenBabel::OBForceField::VectorDivide(double*, double, double*)
> 12735     6.6339  plugin_forcefields.so    void OpenBabel::OBFFVDWCalculationMMFF94::Compute<false>()
> 11612     6.0489  libopenbabel.so.4.0.0    OpenBabel::OBForceField::VectorSubtract(double*, double*, double*)
> 10354     5.3936  libopenbabel.so.4.0.0    OpenBabel::OBForceField::VectorLength(double*)
> 7815      4.0710  libm-2.11.1.so           cos
> 7012      3.6527  libm-2.11.1.so           __ieee754_sqrt
> 5105      2.6593  libm-2.11.1.so           __ieee754_acos
> 4262      2.2202  libm-2.11.1.so           __ieee754_atan2
> 3571      1.8602  libopenbabel.so.4.0.0    OpenBabel::OBForceField::VectorDot(double*, double*)
> 3317      1.7279  libm-2.11.1.so           sqrt
> 3314      1.7263  plugin_forcefields.so    OpenBabel::OBForceField::VectorDistance(double*, double*)
> 3087      1.6081  libopenbabel.so.4.0.0    OpenBabel::OBForceField::VectorCross(double*, double*, double*)
> 3030      1.5784  libopenbabel.so.4.0.0    vector<OpenBabel::OBBond*>::iterator::__normal_iterator(OpenBabel::OBBond** const&)
> 2749      1.4320  plugin_forcefields.so    void OpenBabel::OBFFVDWCalculationMMFF94::Compute<true>()
>
> -- truncated output ---

This is in accordance with what I said above. However, if you really
want to improve performance, a better approach would be to implement
cut-offs using a neighbour list. (There are cut-offs in OBForceField,
but iteration is still done over all n^2 atom pairs.) Using a cut-off
with neighbour lists can reduce the O(n^2) to O(n*log(n)), which makes
a huge difference, more than you could ever achieve by profiling and
optimising the code for the same calculations. Since VDW
interactions drop to 0 over a shorter distance (e.g. 7-8 A) than
electrostatics (e.g. 10-15 A), separate neighbour lists could be used
(or the same one, simply ignoring the longer-range pairs for VDW).
Updating the neighbour list can be done in O(n) time and only needs to
be done every few minimisation steps (e.g. every 10-20). There is
already a NeighborList class in Avogadro that could be reused for this
purpose. It even supports periodic boundary conditions (see
libavogadro/src/neighborlist.[h/cpp] found at
avogadro.openmolecules.net).

Reference for the algorithm used by the NeighborList class:
Mattson, W.; B. M. Rice (1999). "Near-neighbor calculations using a
modified cell-linked list method". Computer Physics Communications,
119: 135.
http://dx.doi.org/10.1016/S0010-4655%2898%2900203-3
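
To illustrate the idea, here is a minimal sketch of a plain cell list
in C++. It is not the modified method from the paper above and not
Avogadro's NeighborList API; all names (Vec3, PairList, dist2,
bruteForcePairs, cellListPairs) are hypothetical, and it assumes no
periodic boundaries. Atoms are binned into cubic cells with an edge
equal to the cut-off, so each atom only needs to be tested against
atoms in its own cell and the 26 surrounding cells.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;
using PairList = std::vector<std::pair<std::size_t, std::size_t>>;

// Squared distance between two points.
double dist2(const Vec3& a, const Vec3& b)
{
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k)
        d2 += (a[k] - b[k]) * (a[k] - b[k]);
    return d2;
}

// Reference implementation: tests every pair i < j, O(n^2).
PairList bruteForcePairs(const std::vector<Vec3>& pos, double cutoff)
{
    PairList out;
    for (std::size_t i = 0; i < pos.size(); ++i)
        for (std::size_t j = i + 1; j < pos.size(); ++j)
            if (dist2(pos[i], pos[j]) <= cutoff * cutoff)
                out.emplace_back(i, j);
    return out;
}

// Cell list: bin atoms into cubic cells with edge length `cutoff`, then
// test only atoms in the same cell or one of its 26 neighbours.
PairList cellListPairs(const std::vector<Vec3>& pos, double cutoff)
{
    assert(cutoff > 0.0);
    PairList out;
    if (pos.empty()) return out;

    // Lower and upper corners of the bounding box.
    Vec3 lo = pos[0], hi = pos[0];
    for (const Vec3& p : pos)
        for (int k = 0; k < 3; ++k) {
            lo[k] = std::min(lo[k], p[k]);
            hi[k] = std::max(hi[k], p[k]);
        }

    // Cell coordinate of a point along axis k.
    auto cell = [&](const Vec3& p, int k) {
        return static_cast<long>(std::floor((p[k] - lo[k]) / cutoff));
    };
    const long nx = cell(hi, 0) + 1, ny = cell(hi, 1) + 1, nz = cell(hi, 2) + 1;
    auto index = [&](long x, long y, long z) {
        return static_cast<std::size_t>((z * ny + y) * nx + x);
    };

    // Binning is a single O(n) pass over the atoms.
    std::vector<std::vector<std::size_t>> cells(
        static_cast<std::size_t>(nx * ny * nz));
    for (std::size_t i = 0; i < pos.size(); ++i)
        cells[index(cell(pos[i], 0), cell(pos[i], 1), cell(pos[i], 2))].push_back(i);

    for (std::size_t i = 0; i < pos.size(); ++i) {
        const long cx = cell(pos[i], 0), cy = cell(pos[i], 1), cz = cell(pos[i], 2);
        for (long z = std::max(0L, cz - 1); z <= std::min(nz - 1, cz + 1); ++z)
            for (long y = std::max(0L, cy - 1); y <= std::min(ny - 1, cy + 1); ++y)
                for (long x = std::max(0L, cx - 1); x <= std::min(nx - 1, cx + 1); ++x)
                    for (std::size_t j : cells[index(x, y, z)])
                        if (j > i && dist2(pos[i], pos[j]) <= cutoff * cutoff)
                            out.emplace_back(i, j);
    }
    std::sort(out.begin(), out.end());  // same order as the brute-force list
    return out;
}
```

For the same input both functions return the same pair list, which
makes the brute-force version a convenient check. A VDW list could be
built with a cut-off of about 8 A and an electrostatics list with a
larger one, as suggested above.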

If you care about large molecules: currently OBForceField::Setup
pre-calculates Calculation objects for all n^2 atom pairs. An
OBFFVDWCalculationMMFF94 uses 228 bytes and an
OBFFElectrostaticCalculationMMFF94 uses 140 bytes. With 3000 atoms,
that is 3000*3000*(228+140) bytes = 3.3 GB, which easily throws an out
of memory exception if you only have 4 GB of RAM. For 5000 atoms,
9.2 GB is needed! (This assumes a 64-bit build with sizeof(double)=8
and sizeof(int)=4.)
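
The arithmetic can be reproduced with a few lines of C++. This is only
a sketch of the estimate above: the two object sizes are the figures
quoted in this message, and the names kVdwBytes, kEleBytes and
pairStorageBytes are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Object sizes as quoted in this message (64-bit build). They are
// assumptions here and depend on compiler and Open Babel version.
const std::uint64_t kVdwBytes = 228;
const std::uint64_t kEleBytes = 140;

// Bytes needed when one VDW and one electrostatic calculation object
// are stored for each of the n*n atom pairs.
std::uint64_t pairStorageBytes(std::uint64_t nAtoms)
{
    assert(nAtoms <= 100000000ULL);  // keeps the product within 64 bits
    return nAtoms * nAtoms * (kVdwBytes + kEleBytes);
}
```

pairStorageBytes(3000) is 3312000000 bytes (about 3.3 GB) and
pairStorageBytes(5000) is 9200000000 bytes (9.2 GB).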

> Martin handed me a bigger file of molecules, which I will profile, but this
> time using obenergy, in order to have a better idea of how the terms from
> MMFF94 are making use of the processor.  I also want to profile
> obconformer using small/medium sized molecules.
>
>
>> I think Tim also started on porting the forcefield.cpp code over to
>> use the Eigen template matrix/vector library:
>> https://github.com/timvdm/OBForceField/tree/master/src
>>
>
> I have to have a look at this too

I started this a while ago but haven't kept it in sync with OB trunk,
so if this is used you should make sure to also check the current SVN
trunk code to get bug fixes and other improvements. It is more modular
though. The force field class should only be concerned with computing
the energy and gradients and provide an API for accessing them.
Minimisation etc. should be placed in different classes.
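
As a rough illustration of that separation (not the code in the
repository above), here is a minimal C++ sketch. All class names are
hypothetical, and ToyQuadratic is a toy stand-in for a real force
field. The minimiser only sees the energy/gradient interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: the force field only reports energy and gradient.
class EnergyFunction {
public:
    virtual ~EnergyFunction() = default;
    virtual double energy(const std::vector<double>& x) const = 0;
    virtual std::vector<double> gradient(const std::vector<double>& x) const = 0;
};

// Toy stand-in for a real force field: E(x) = sum_i (x_i - 1)^2.
class ToyQuadratic : public EnergyFunction {
public:
    double energy(const std::vector<double>& x) const override
    {
        double e = 0.0;
        for (double xi : x) e += (xi - 1.0) * (xi - 1.0);
        return e;
    }
    std::vector<double> gradient(const std::vector<double>& x) const override
    {
        std::vector<double> g(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) g[i] = 2.0 * (x[i] - 1.0);
        return g;
    }
};

// The minimiser is a separate class that sees only the interface above.
class SteepestDescent {
public:
    SteepestDescent(double step, std::size_t maxSteps)
        : step_(step), maxSteps_(maxSteps) {}

    std::vector<double> minimise(const EnergyFunction& f, std::vector<double> x) const
    {
        for (std::size_t s = 0; s < maxSteps_; ++s) {
            const std::vector<double> g = f.gradient(x);
            assert(g.size() == x.size());
            for (std::size_t i = 0; i < x.size(); ++i)
                x[i] -= step_ * g[i];  // fixed step along the negative gradient
        }
        return x;
    }

private:
    double step_;
    std::size_t maxSteps_;
};
```

With this split, another minimiser (e.g. conjugate gradients) or
another energy function can be swapped in without touching the other
class.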

Cheers,
Tim

> Cheers,
> --
> Omar V.M.

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
