On Wed, Mar 10, 2010 at 9:52 PM, David Bateman <dbate...@dbateman.org> wrote:
> Jaroslav Hajek wrote:
>> On Tue, Mar 9, 2010 at 11:14 PM, David Bateman <dbate...@dbateman.org> wrote:
>>> Jaroslav Hajek wrote:
>>>> I think I looked into those sources briefly (because I also noticed
>>>> that it implements basic ops), but indeed there seems to be nothing
>>>> magical (read: no assembler). Some operations we need (such as
>>>> transpose-multiply or diagonal scaling) were not even there, I think.
>>>> That doesn't mean it's a bad idea to defer some operations to CXSparse
>>>> (as you said, it is a dependency already), but I don't expect big
>>>> gains unless there are some better algorithms.
>>>>
>>>> I didn't mean to imply that the current built-in sparse implementation
>>>> is somehow slow (although perhaps there is room for optimizations).
>>>> Block sparse storage is simply bound to beat the plain sparse one when
>>>> the blocks get large enough. It's just that I didn't expect 10x10
>>>> blocks to be enough, given the m-code implementation.
>>>
>>> Well, it's not hard to reimplement the cs_multiply mex file from
>>> CXSparse as Octave code, and I attach a cs_multiply oct-file to test.
>>> Running the example
>>>
>>> n = [1e2, 5e2, 1e3, 1e3, 1e4, 1e4];
>>> d = [0.01, 0.01, 0.001, 0.01, 0.001, 0.005];
>>> for i = 1:length (n)
>>>   fprintf ("N = %i, d = %f\n", n(i), d(i));
>>>   A = sprand (n(i), n(i), d(i));
>>>   B = sprand (n(i), n(i), d(i));
>>>   tic; C0 = A * B; toc
>>>   tic; C1 = mymultiply (A, B); toc
>>> endfor
>>>
>>> I see the results
>>>
>>> N = 100, d = 0.010000
>>> Elapsed time is 0.000342958 seconds.
>>> Elapsed time is 0.00414899 seconds.
>>> N = 500, d = 0.010000
>>> Elapsed time is 0.00301409 seconds.
>>> Elapsed time is 0.00264595 seconds.
>>> N = 1000, d = 0.001000
>>> Elapsed time is 0.000446094 seconds.
>>> Elapsed time is 0.000560952 seconds.
>>> N = 1000, d = 0.010000
>>> Elapsed time is 0.0157239 seconds.
>>> Elapsed time is 0.0209789 seconds.
>>> N = 10000, d = 0.001000
>>> Elapsed time is 0.317249 seconds.
>>> Elapsed time is 0.374798 seconds.
>>> N = 10000, d = 0.005000
>>> Elapsed time is 5.38696 seconds.
>>> Elapsed time is 9.66393 seconds.
>>>
>>> so it doesn't appear that using cs_multiply is an advantage.
>>>
>>> D.
>>
>> Hmm. One thing I don't understand: why the transposes? Isn't CXSparse
>> using CSC just like Octave?
>
> The transpose is supposed to be relatively cheap, and it's an easy way
> to force the rows to be ordered without having to make the multiply
> operator itself maintain the ordering. Other operations in Octave
> depend on the sparse rows being ordered. The existing code instead
> doesn't transpose, but calls the Octave sort class for sparse matrices
> with many values and just brute-forces the ordering if there are few
> elements per row.
Yeah, I studied your code yesterday. The splitting trick seems neat. I'm
just wondering whether octave_sort is best suited for the job. It is
actually optimized for this kind of sorting, but maybe you could still
save something: here we are merging sorted sequences whose positions and
lengths are known, whereas octave_sort needs to discover them first.
Perhaps we could just call std::inplace_merge after each "col" loop
completes? In any case it would be neat to convert at least some of
those macros in Sparse-op-defs.h to templates.

> Yeah, things like UMFPACK internally block the sparse matrices and
> treat the sub-blocks with BLAS (i.e. dense kernels), so it's a
> well-known trick.

Internally? Or can UMFPACK be forced to solve a block sparse matrix
directly (i.e. using a block structure you specify)? That would make it
worth considering a real-life block sparse extension.

> The problem with using this for an arbitrary CSC matrix is that
> figuring out the best way to block the matrix for the dense kernels is
> probably going to take as much time as the multiplication itself.

--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev