On Tue, Mar 9, 2010 at 6:47 PM, David Bateman <dbate...@dbateman.org> wrote: > Jaroslav Hajek wrote: >> >> hi all, >> >> I contributed a toy implementation of block sparse matrices to the >> linear-algebra package (also attached). >> This is probably not meant for serious business, as all the code is >> m-files only, but is rather a demonstration of what can now be done in >> Octave. >> >> Basic operations (+, -, *, ', .') are supported. No division, of course. >> >> Curiously enough, despite the m-file implementation, already for a >> 1000x1000 matrix with 10x10 blocks and 4 blocks per row >> (corresponds to a FE matrix of 1000-element conforming triangular grid >> with 3-rd order discontinuous polynomial elements), >> this toy implementation can outperform the built-in sparse matrices: >> >> nb = 10; n = 1000; nrhs = 1; >> [i, j] = find (sprand (n, n, 3/n) + speye (n)); >> a = rand (nb, nb, length (i)); >> s = blksparse (i, j, a, n, n); >> >> v = rand (nb*n, nrhs); >> >> tic; >> for it = 1:100/nrhs >> s * v; >> endfor >> toc >> >> s = sparse (s); >> tic; >> for it = 1:100/nrhs >> s * v; >> endfor >> toc >> >> On my computer (Core 2 Duo, g++ -O3 -march=native), I get: >> >> octave:1> ttbs >> Elapsed time is 0.180152 seconds. >> Elapsed time is 0.344266 seconds. >> >> with nrhs=10 (thus reducing interpreter overhead and improving cache >> locality for blksparse), it gets even better: >> >> octave:3> ttbs >> Elapsed time is 0.0802341 seconds. >> Elapsed time is 0.34509 seconds. >> >> Of course, as division is unsupported, the usefulness of this >> implementation is quite limited, but still it can be used e.g. for >> iterative solvers. >> >> Comments? Suggestions? >> Note that this code requires Octave tip as it relies on certain recent >> additions (blkmm, accumdim). >> >> > > Well, I'd wondered about revisiting the * sparse operator in Octave to see > if the CXSPARSE code that matlab now uses is faster. We already have a > dependency on CXSPARSE and so doing this might be a benefit without the cost > of a new dependency. > > D. > >
I think I looked into those sources briefly (because I also noticed that it implements basic ops), but indeed there seems to be nothing magical (read: no assembler). Some operations we need (such as transpose-multiply or diagonal scaling) were not even there, I think. That doesn't mean it's a bad idea to defer some operations to CXSparse (as you said, it is a dependency already), but I don't expect big gains unless there are some better algorithms. I didn't mean to imply that the current built-in sparse implementation is somehow slow (although perhaps there is room for optimizations). Block sparse storage is simply bound to beat the plain sparse one when the blocks get large enough. It's just that I didn't expect 10x10 blocks to be enough, given the m-code implementation. -- RNDr. Jaroslav Hajek, PhD computing expert & GNU Octave developer Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev _______________________________________________ Octave-dev mailing list Octave-dev@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/octave-dev