Jaroslav Hajek wrote:
> On Tue, Mar 9, 2010 at 11:14 PM, David Bateman <dbate...@dbateman.org> wrote:
>> Jaroslav Hajek wrote:
>>> I think I looked into those sources briefly (because I also noticed
>>> that it implements basic ops), but indeed there seems to be nothing
>>> magical (read: no assembler). Some operations we need (such as
>>> transpose-multiply or diagonal scaling) were not even there, I think.
>>> That doesn't mean it's a bad idea to defer some operations to CXSparse
>>> (as you said, it is a dependency already), but I don't expect big
>>> gains unless there are some better algorithms.
>>>
>>> I didn't mean to imply that the current built-in sparse implementation
>>> is somehow slow (although perhaps there is room for optimizations).
>>> Block sparse storage is simply bound to beat the plain sparse one when
>>> the blocks get large enough.
>>> It's just that I didn't expect 10x10 blocks to be enough, given the
>>> m-code implementation.
>>>
>> Well, it's not hard to reimplement the cs_multiply mex file from CXSparse
>> as Octave code, and I attach a cs_multiply oct-file to test... Running
>> the example
>>
>> n = [1e2, 5e2, 1e3, 1e3, 1e4, 1e4];
>> d = [0.01, 0.01, 0.001, 0.01, 0.001, 0.005];
>> for i = 1: length (n)
>>   fprintf ("N = %i, d = %f\n", n(i), d(i));
>>   A = sprand (n(i), n(i), d(i));
>>   B = sprand (n(i), n(i), d(i));
>>   tic; C0 = A * B; toc
>>   tic; C1 = mymultiply (A, B); toc
>> endfor
>>
>> I see the results
>>
>> N = 100, d = 0.010000
>> Elapsed time is 0.000342958 seconds.
>> Elapsed time is 0.00414899 seconds.
>> N = 500, d = 0.010000
>> Elapsed time is 0.00301409 seconds.
>> Elapsed time is 0.00264595 seconds.
>> N = 1000, d = 0.001000
>> Elapsed time is 0.000446094 seconds.
>> Elapsed time is 0.000560952 seconds.
>> N = 1000, d = 0.010000
>> Elapsed time is 0.0157239 seconds.
>> Elapsed time is 0.0209789 seconds.
>> N = 10000, d = 0.001000
>> Elapsed time is 0.317249 seconds.
>> Elapsed time is 0.374798 seconds.
>> N = 10000, d = 0.005000
>> Elapsed time is 5.38696 seconds.
>> Elapsed time is 9.66393 seconds.
>>
>> so it doesn't appear that using cs_multiply is an advantage.
>>
>> D.
>>
>
> Hmm. One thing I don't understand: why the transposes? Isn't CXSparse
> using CSC just like Octave?
>

The transpose is supposed to be relatively cheap, and it's an easy way to
force the row indices to be ordered without having to make the multiply
operator itself maintain the ordering. Other operations in Octave depend on
the row indices of a sparse matrix being ordered. The existing code instead
doesn't transpose; it calls the Octave sort class for sparse matrices with
many values and just brute-forces the ordering when there are only a few
elements per row.
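To put a rough number on the "relatively cheap" claim, here is a small
timing sketch (the size and density are assumptions picked to resemble the
benchmark above, and it only times two extra transposes of the result, in
the spirit of the re-sorting trick; it is not part of the attached test):

n = 1e4; d = 0.001;
A = sprand (n, n, d);
B = sprand (n, n, d);
tic; C = A * B; t_mul = toc;
tic; Ct = (C.').'; t_tr = toc;   # two transposes, just to time the re-sorting trick
fprintf ("multiply: %g s, double transpose: %g s\n", t_mul, t_tr);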
> I completed the block sparse-sparse multiplication (it's quite simple,
> but hacky), so here's a benchmark (again 10x10 and bigger blocks):
>
> <snip>
>
> so it seems that block multiplication is really a big advantage (with
> larger blocks), so much that it even outweighs the extra work & memory
> needed by the m-code. It's not that surprising, considering that
> optimized BLAS also get their speed from blocked operations.
> Unfortunately, while it's relatively easy to split an arbitrary dense
> matrix into blocks, it would be quite hard to do so with a general
> sparse (CSC) matrix. That's why a separate class is needed to take
> advantage of block structure.
>

Yeah, things like UMFPACK internally block the sparse matrices and treat
the sub-blocks with BLAS (i.e., dense kernels), so it's a well-known trick.
The problem with using this for an arbitrary CSC matrix is that figuring
out the best way to block the matrix for the dense kernels would probably
take as much time as the multiplication itself.

D.
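P.S. To make the blocking idea concrete, here is a minimal m-code sketch
(this is not the actual block sparse class; the cell-array storage and the
name blksp_mul are just assumptions for illustration). Each operand is a
cell array of dense blocks plus a sparse 0/1 pattern recording which blocks
are nonzero, and the product is accumulated block by block with dense
multiplies:

function [C, pat_C] = blksp_mul (A, pat_A, B, pat_B)
  ## A, B: cell arrays of dense blocks; pat_A, pat_B: sparse 0/1 block patterns
  m = rows (pat_A);
  n = columns (pat_B);
  C = cell (m, n);
  pat_C = sparse (m, n);
  for j = 1:n
    for l = find (pat_B(:,j)).'      # block rows of B with a nonzero block in column j
      for i = find (pat_A(:,l)).'    # block rows of A with a nonzero block in column l
        P = A{i,l} * B{l,j};         # dense kernel: this is where BLAS does the work
        if (pat_C(i,j))
          C{i,j} = C{i,j} + P;
        else
          C{i,j} = P;
          pat_C(i,j) = 1;
        endif
      endfor
    endfor
  endfor
endfunction

The inner statement is a plain dense multiply, so once the blocks are large
enough the cost is dominated by optimized BLAS calls; the catch, as noted
above, is that a general CSC matrix gives no cheap way to choose such a
blocking in the first place.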