On Tue, Mar 9, 2010 at 11:14 PM, David Bateman <dbate...@dbateman.org> wrote:
> Jaroslav Hajek wrote:
>>
>> I think I looked into those sources briefly (because I also noticed
>> that it implements basic ops), but indeed there seems to be nothing
>> magical (read: no assembler). Some operations we need (such as
>> transpose-multiply or diagonal scaling) were not even there, I think.
>> That doesn't mean it's a bad idea to defer some operations to CXSparse
>> (as you said, it is a dependency already), but I don't expect big
>> gains unless there are some better algorithms.
>>
>> I didn't mean to imply that the current built-in sparse implementation
>> is somehow slow (although perhaps there is room for optimizations).
>> Block sparse storage is simply bound to beat the plain sparse one when
>> the blocks get large enough.
>> It's just that I didn't expect 10x10 blocks to be enough, given the
>> m-code implementation.
>>
>>
>
> Well, it's not hard to reimplement the cs_multiply mex file from CXSparse as
> an oct-file, and I attach a cs_multiply oct-file to test. Running the
> example
>
> n = [1e2, 5e2, 1e3, 1e3, 1e4, 1e4];
> d = [0.01, 0.01, 0.001, 0.01, 0.001, 0.005];
> for i = 1: length (n)
>  fprintf ("N = %i, d = %f\n", n(i), d(i));
>  A = sprand (n(i), n(i), d(i));
>  B = sprand (n(i), n(i), d(i));
>  tic; C0 = A * B;  toc
>  tic; C1 = mymultiply (A, B); toc
> endfor
>
> I see the results
>
> N = 100, d = 0.010000
> Elapsed time is 0.000342958 seconds.
> Elapsed time is 0.00414899 seconds.
> N = 500, d = 0.010000
> Elapsed time is 0.00301409 seconds.
> Elapsed time is 0.00264595 seconds.
> N = 1000, d = 0.001000
> Elapsed time is 0.000446094 seconds.
> Elapsed time is 0.000560952 seconds.
> N = 1000, d = 0.010000
> Elapsed time is 0.0157239 seconds.
> Elapsed time is 0.0209789 seconds.
> N = 10000, d = 0.001000
> Elapsed time is 0.317249 seconds.
> Elapsed time is 0.374798 seconds.
> N = 10000, d = 0.005000
> Elapsed time is 5.38696 seconds.
> Elapsed time is 9.66393 seconds.
>
> so it doesn't appear that using cs_multiply offers an advantage.
>
> D.

Hmm. One thing I don't understand: why the transposes? Isn't CXSparse
using CSC just like Octave?
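
(For reference, by CSC I mean compressed sparse column storage, which
both Octave's sparse matrices and CXSparse's cs struct use, so no
transposition should be needed at the interface. A small illustration
of the layout; the array names here are just for exposition, not
actual struct fields:)

% CSC layout for A = [1 0 2; 0 3 0; 4 0 5]:
%   data = [1 4 3 2 5]   nonzeros, column by column
%   ridx = [0 2 1 0 2]   row index of each nonzero (0-based)
%   cidx = [0 2 3 5]     offset where each column starts in data
A = sparse ([1 0 2; 0 3 0; 4 0 5]);
[r, c, v] = find (A)   % find returns entries in column-major order,
                       % i.e. exactly the CSC traversal order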

I completed the block sparse-sparse multiplication (it's quite simple,
but hacky), so here's a benchmark (again with 10x10 and bigger blocks):

nb = 10; n = 1000;
[i, j] = find (sprand (n, n, 3/n) + speye (n));
a = rand (nb, nb, length (i));
s1 = blksparse (i, j, a, n, n);

[i, j] = find (sprand (n, n, 3/n) + speye (n));
a = rand (nb, nb, length (i));
s2 = blksparse (i, j, a, n, n);

tic;
p1 = s1 * s2;
toc

s1 = sparse (s1);
s2 = sparse (s2);
tic;
p2 = s1 * s2;
toc

norm (sparse(p1) - p2, "fro")

with nb = 10, n = 1000:

Elapsed time is 0.0700891 seconds.
Elapsed time is 0.154243 seconds.
ans =  1.1327e-13

nb = 10, n = 100

Elapsed time is 0.005583 seconds.
Elapsed time is 0.01352 seconds.
ans =  8.7034e-14

nb = 50, n = 100

Elapsed time is 0.1525 seconds.
Elapsed time is 1.456 seconds.
ans =  3.9378e-12

nb = 20, n = 1000

Elapsed time is 0.2465 seconds.
Elapsed time is 1.007 seconds.
ans =  3.9142e-13

so it seems that block multiplication really is a big advantage (with
larger blocks), so much so that it even outweighs the extra work & memory
needed by the m-code. That's not too surprising, considering that
optimized BLAS implementations also get their speed from blocked
operations. Unfortunately, while it's relatively easy to split an
arbitrary dense matrix into blocks, it would be quite hard to do so with
a general sparse (CSC) matrix. That's why a separate class is needed to
take advantage of block structure.
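
For anyone curious, the core idea of the block multiply can be sketched
like this (a naive illustration in block-coordinate form, not the actual
blksparse internals; the function and variable names are mine):

% Multiply two block-sparse matrices given as (i, j, a) triplets, where
% a(:,:,k) is the dense nb-by-nb block at block row i(k), block column
% j(k).  Each block product is a dense nb-by-nb multiply, so the inner
% work runs at (optimized) BLAS speed.
function [ip, jp, ap] = blkmul_sketch (i1, j1, a1, i2, j2, a2)
  nb = size (a1, 1);
  acc = struct ();                    % accumulator keyed by block position
  for k = 1:numel (i1)
    % blocks of the second operand whose block row matches j1(k)
    for l = find (i2 == j1(k))(:)'
      key = sprintf ("b%d_%d", i1(k), j2(l));
      blk = a1(:,:,k) * a2(:,:,l);    % dense block multiply
      if (isfield (acc, key))
        acc.(key) += blk;
      else
        acc.(key) = blk;
      endif
    endfor
  endfor
  % unpack the accumulator back into triplet form
  keys = fieldnames (acc);
  ip = jp = zeros (numel (keys), 1);
  ap = zeros (nb, nb, numel (keys));
  for k = 1:numel (keys)
    t = sscanf (keys{k}, "b%d_%d");
    ip(k) = t(1); jp(k) = t(2);
    ap(:,:,k) = acc.(keys{k});
  endfor
endfunction

The real implementation avoids the struct-based accumulator, of course,
but the point stands: the flops happen in dense block products, which is
where the speedup over entry-wise sparse multiplication comes from.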

regards

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
