On Wed, Apr 22, 2009 at 12:31 AM, David Bateman <dbate...@dbateman.org> wrote:
> Alois Schlögl wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>> As some of you might know, my other pet project besides Octave is
>> BioSig, http://biosig.sf.net. BioSig is designed in such a way that it
>> can be used with both Matlab and Octave. Mostly for performance reasons,
>> we cannot abandon support for Matlab [1,2]. Octave is a viable
>> alternative when computational performance is not important. In
>> order to decide on the future strategy of BioSig, I hope to get answers
>> to the following questions:
>>
>> 1) Core development of Octave:
>>    At the meeting of Octave developers in 2006, the issue was raised
>> that Octave is about 4 to 5 times slower than Matlab [1]. (I
>> repeated the tests just recently; the results are attached below and
>> show a difference of factors up to 13, average ~5.) This issue is most
>> relevant for large computational jobs, where it makes a difference
>> whether a specific task takes 1 day or 5 days. Is anyone working to
>> address this problem? Is there any hope that the performance penalty
>> will become smaller or go away within a reasonable amount of time?
>>
> It's hard to tell what the source of your speed issues is. The flippant
> response would be that with a JIT in Octave, then yes, we could be as
> fast; we just need someone to write it. I suspect something will be done
> here in the future. John's recent change to an evaluator
> class and his statement about adding a profiler in Octave 3.4 mean that the
> machinery needed to add a JIT will be in place.
>
> However, looking at your wackermann code, it's not clear to me that it's
> your for-loop that is taking all of the time in Octave.
> If it is, have you considered rewriting
>
>   for k = 1:size(m2,1),
>     if all(finite(m2(k,:))),
>       L = eig(reshape(m2(k,:), [K,K]));
>       L = L./sum(L);
>       if all(L>0)
>         OMEGA(k) = -sum(L.*log(L));
>       end;
>     end;
>   end;
>
> with something like
>
>   rows_m2 = size(m2, 1);
>   m3 = permute (reshape (m2, [rows_m2, K, K]), [2, 3, 1]);
>   idx = all (finite (m2), 1);
>   t = cellfun (@(x) eig(x), mat2cell (m3 (:, :, idx), K, K, ones(1, rows_m2)),
>                'UniformOutput', false);
>   t = cellfun (@(x) - sum (x .* log (x)),
>                cellfun (@(x) x ./ sum(x), 'UniformOutput', false));
>   t(iscomplex(t)) = NaN;
>   OMEGA(idx) = t;
>
> The code above is of course untested. But with the latest tip it should
> be much faster for Octave, as Jaroslav optimized cellfun recently.
>
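To see what the permute/reshape step in the suggestion above actually does, here is a minimal sketch on toy data (the 2x2 case and the values are hypothetical, not from the original benchmark): each row of m2 holds one flattened K-by-K matrix, and m3(:, :, k) recovers the k-th matrix without any loop.

```matlab
% Toy illustration of the reshape/permute trick (hypothetical data):
% each row of m2 is a K*K matrix flattened column-major.
K = 2;
m2 = [1 3 2 4;        % row 1 flattens [1 2; 3 4] (column-major)
      5 7 6 8];       % row 2 flattens [5 6; 7 8]
rows_m2 = size (m2, 1);
m3 = permute (reshape (m2, [rows_m2, K, K]), [2, 3, 1]);
disp (m3(:, :, 1))    % recovers [1 2; 3 4]
disp (m3(:, :, 2))    % recovers [5 6; 7 8]
```

The point of the permutation [2, 3, 1] is to move the "row of m2" dimension last, so that mat2cell can then slice the 3-D array into a cell array of K-by-K pages for cellfun.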
This is indeed a good shot. I removed typos and tweaked it a little more to produce the following:

  OMEGA = repmat (NaN, size (m0));
  rows_m2 = size (m2, 1);
  m3 = permute (reshape (m2, [rows_m2, K, K]), [2, 3, 1]);
  idx = all (finite (m2), 2);
  t = cellfun (@eig, mat2cell (m3 (:, :, idx), K, K, ones(1, sum(idx))),
               'UniformOutput', false);
  t = [t{:}];
  t = t / diag (sum (t));
  t = -sum (t .* log (t));
  t(iscomplex(t)) = NaN;
  OMEGA(idx) = t;

Trying only the wackermann part of Alois's benchmark, this cuts the time on my computer from 76.1 s down to 12.3 s. The relatively big win here comes from the fact that my code avoids involving the interpreter in any loops. Note that I use @eig rather than your @(x) eig(x), which makes quite a difference (about 60%). For comparison, trying your original code with some bugs fixed, I get 43.11 seconds. So it's better than the original, but still much worse than my version.

The rule of thumb is simple: the slow part of Octave is primarily the interpreter, partly because it does not JIT-compile, and also because so far it was probably written primarily with functionality in mind. I think the inner algorithms are quite on par with Matlab's (at least with the same external libraries); some are faster, some slower (admittedly, the second group is probably bigger).

In general, you must not expect too much from the cellfun optimization. What I effectively did was remove some redundant copying and conjure up a faster (but tricky) approach to collecting the results when UniformOutput=true, so maybe those optimizations didn't even count here. In fact, I think this is where a JIT would best start: optimizing function handles passed into cellfun. At the point of the call, cellfun is very likely to have all the type information it needs, so it could well try to compile the function rather than repeatedly interpreting it.
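The @eig versus @(x) eig(x) difference mentioned above can be checked with a quick micro-benchmark along these lines (a hypothetical sketch, not from the original mail; absolute numbers will vary by machine and Octave version):

```matlab
% Hypothetical timing sketch: the anonymous wrapper @(x) eig(x) adds one
% extra interpreted function call per cell, while the plain handle @eig
% dispatches straight to the built-in.
c = mat2cell (randn (4, 4000), 4, 4 * ones (1, 1000));  % 1000 4x4 matrices
tic; cellfun (@eig,        c, 'UniformOutput', false); t1 = toc;
tic; cellfun (@(x) eig(x), c, 'UniformOutput', false); t2 = toc;
printf ("plain handle: %.3f s, anonymous wrapper: %.3f s\n", t1, t2);
```

On an interpreter without a JIT, that per-call overhead is paid once per cell, which is why it shows up so clearly once everything else is vectorized.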
To be honest, I'd consider it super cool if gcc provided some support for JIT compiling, so that we could just translate Octave code to C++ at runtime and then run it. That would IMHO be by far the most powerful approach, probably allowing us to compile code in an almost unlimited and very flexible manner.

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev