On Wed, Mar 10, 2010 at 9:52 PM, David Bateman <dbate...@dbateman.org> wrote:
> Jaroslav Hajek wrote:
>>
>> On Tue, Mar 9, 2010 at 11:14 PM, David Bateman <dbate...@dbateman.org>
>> wrote:
>>
>>>
>>> Jaroslav Hajek wrote:
>>>
>>>>
>>>> I think I looked into those sources briefly (because I also noticed
>>>> that it implements basic ops), but indeed there seems to be nothing
>>>> magical (read: no assembler). Some operations we need (such as
>>>> transpose-multiply or diagonal scaling) were not even there, I think.
>>>> That doesn't mean it's a bad idea to defer some operations to CXSparse
>>>> (as you said, it is a dependency already), but I don't expect big
>>>> gains unless there are some better algorithms.
>>>>
>>>> I didn't mean to imply that the current built-in sparse implementation
>>>> is somehow slow (although perhaps there is room for optimizations).
>>>> Block sparse storage is simply bound to beat the plain sparse one when
>>>> the blocks get large enough.
>>>> It's just that I didn't expect 10x10 blocks to be enough, given the
>>>> m-code implementation.
>>>>
>>>>
>>>>
>>>
>>> Well, it's not hard to reimplement the cs_multiply mex-file from CXSparse
>>> as an oct-file, and I attach a cs_multiply oct-file to test. Running the
>>> example
>>>
>>> n = [1e2, 5e2, 1e3, 1e3, 1e4, 1e4];
>>> d = [0.01, 0.01, 0.001, 0.01, 0.001, 0.005];
>>> for i = 1:length (n)
>>>   fprintf ("N = %i, d = %f\n", n(i), d(i));
>>>   A = sprand (n(i), n(i), d(i));
>>>   B = sprand (n(i), n(i), d(i));
>>>   tic; C0 = A * B; toc
>>>   tic; C1 = mymultiply (A, B); toc
>>> endfor
>>>
>>> I see the results
>>>
>>> N = 100, d = 0.010000
>>> Elapsed time is 0.000342958 seconds.
>>> Elapsed time is 0.00414899 seconds.
>>> N = 500, d = 0.010000
>>> Elapsed time is 0.00301409 seconds.
>>> Elapsed time is 0.00264595 seconds.
>>> N = 1000, d = 0.001000
>>> Elapsed time is 0.000446094 seconds.
>>> Elapsed time is 0.000560952 seconds.
>>> N = 1000, d = 0.010000
>>> Elapsed time is 0.0157239 seconds.
>>> Elapsed time is 0.0209789 seconds.
>>> N = 10000, d = 0.001000
>>> Elapsed time is 0.317249 seconds.
>>> Elapsed time is 0.374798 seconds.
>>> N = 10000, d = 0.005000
>>> Elapsed time is 5.38696 seconds.
>>> Elapsed time is 9.66393 seconds.
>>>
>>> so it doesn't appear that using cs_multiply is an advantage.
>>>
>>> D.
>>>
>>
>> Hmm. One thing I don't understand: why the transposes? Isn't cxsparse
>> using CSC just like Octave?
>>
>
> The transpose is supposed to be relatively cheap, and it's an easy way to
> force the rows to be ordered without having to make the multiply operator
> itself maintain the ordering. Other operations in Octave depend on the
> sparse rows being ordered. The existing code instead doesn't transpose, but
> calls the Octave sort class for sparse matrices with many values and just
> brute-forces the ordering if there are few elements per row.
>
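For context, the transpose-based ordering is cheap because the standard
two-pass CSC transpose emits each output column with its row indices already
sorted. A minimal sketch (my own illustration with made-up names, not
Octave's or CXSparse's actual code):

```cpp
#include <cassert>
#include <vector>

// Sketch of the standard two-pass (counting/bucket) CSC transpose.
// Because the input is scanned column by column in order, the row
// indices of the result come out sorted within each column for free --
// which is why transposing twice is a cheap way to restore ordering.
struct CSC {
    int nrows, ncols;
    std::vector<int> colptr;   // size ncols + 1
    std::vector<int> rowidx;   // size nnz
    std::vector<double> val;   // size nnz
};

CSC transpose (const CSC& a) {
    CSC t;
    t.nrows = a.ncols;
    t.ncols = a.nrows;
    int nnz = a.colptr[a.ncols];
    t.colptr.assign (a.nrows + 1, 0);
    t.rowidx.resize (nnz);
    t.val.resize (nnz);

    // Pass 1: count entries per row of A (= per column of A').
    for (int k = 0; k < nnz; k++)
        t.colptr[a.rowidx[k] + 1]++;
    for (int r = 0; r < a.nrows; r++)
        t.colptr[r + 1] += t.colptr[r];

    // Pass 2: scatter. Scanning A column by column keeps the row
    // indices of A' sorted within each of its columns.
    std::vector<int> next (t.colptr.begin (), t.colptr.end () - 1);
    for (int j = 0; j < a.ncols; j++)
        for (int k = a.colptr[j]; k < a.colptr[j + 1]; k++) {
            int dest = next[a.rowidx[k]]++;
            t.rowidx[dest] = j;
            t.val[dest] = a.val[k];
        }
    return t;
}
```

Both passes are O(nnz + n), so two transposes cost far less than a general
sort of every column.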

Yeah, I studied your code yesterday. The splitting trick seems neat.
I'm just wondering whether octave_sort is best suited for the job. It is
actually optimized for this kind of sorting, but maybe you could still
save something? I mean, we are merging sorted sequences of known
positions and lengths here (whereas octave_sort needs to find them first).
Perhaps we could just call std::inplace_merge after each "col" loop completes?
In any case it would be neat to convert at least some of those macros
in Sparse-op-defs.h to templates.
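If a column's pattern is accumulated as a sorted prefix plus a newly
appended sorted run, the merge step I have in mind could look like this
(hypothetical helper, not existing Octave code):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical illustration of the std::inplace_merge idea: each time a
// sorted run of row indices is appended to a column's index list, fold
// it into the already-sorted prefix instead of sorting the whole column
// afterwards. idx[0, run_start) and idx[run_start, end) must each be
// sorted; on return the entire vector is sorted.
void merge_run (std::vector<int>& idx, std::size_t run_start)
{
    std::inplace_merge (idx.begin (), idx.begin () + run_start, idx.end ());
}
```

For example, starting from the sorted prefix {2, 5, 9} and appending the
sorted run {1, 7}, merge_run yields {1, 2, 5, 7, 9}. std::inplace_merge
runs in linear time when its temporary buffer allocation succeeds, which
is exactly the case of merging sequences whose boundaries are already known.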


>
> Yeah, things like UMFPACK internally block the sparse matrices and treat the
> sub-blocks with BLAS (i.e., dense kernels), so it's a well-known trick.

Internally? Or can UMFPACK be forced to solve a block sparse matrix
directly (i.e. using block structure you specify)? That would make it
worth considering a real-life block sparse extension.


> The
> problem with using this for an arbitrary CSC matrix is that figuring out the
> best way to block the matrix for the dense kernels is probably going to take
> as much time as the multiplication itself.
>


-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev
