Hi Chris,

the copy-constructor for compressed_matrix is now implemented:
https://github.com/viennacl/viennacl-dev/commit/0d62d8e0fb9a3eefc37aa225b5eb7195256181c9
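[Editorial sketch for the archive: the idea behind the new copy-constructor plus the value-only refresh discussed in this thread can be illustrated with a plain, stand-alone CSR container. The `csr_matrix` struct and `update_values` helper below are hypothetical names for illustration only, not the ViennaCL types.]

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical CSR container for illustration only (not the ViennaCL type).
// row_starts/cols encode the sparsity pattern; values holds the nonzeros.
struct csr_matrix {
  std::vector<unsigned> row_starts; // size rows + 1
  std::vector<unsigned> cols;       // size nnz
  std::vector<double>   values;     // size nnz
  // The implicitly generated copy-constructor deep-copies all three
  // arrays, which is the behavior the new compressed_matrix
  // copy-constructor provides.
};

// Value-only refresh: with an identical sparsity pattern, updating B
// only requires copying A's nnz numerical values. This mirrors the
// role of the single memory_write() call in the snippet below.
void update_values(csr_matrix &B, const csr_matrix &A) {
  assert(A.values.size() == B.values.size()); // same pattern assumed
  std::memcpy(B.values.data(), A.values.data(),
              sizeof(double) * A.values.size());
}
```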
You should get the desired behavior of just updating the numerical values on the GPU with code similar to the following:

  viennacl::context host_ctx(viennacl::MAIN_MEMORY);
  viennacl::compressed_matrix<T> A(N, N, host_ctx);  // your 'host matrix'
  /* fill A here */

  viennacl::compressed_matrix<T> B(A);               // create copy of A
  viennacl::context gpu_ctx(viennacl::CUDA_MEMORY);
  B.switch_memory_context(gpu_ctx);                  // migrate B to CUDA memory

  // write to B, starting at offset 0, copying 'nnz' elements,
  // using the host data from the nonzero values of A
  viennacl::backend::memory_write(B.handle(), 0, sizeof(T) * A.nnz(),
                                  A.handle().ram_handle().get());

Just repeat the last line every time you need to update the numerical values on the GPU. Please let me know how this turns out.

Best regards,
Karli

On 04/21/2017 09:06 PM, Chris Marsh wrote:
> Karl,
>
> No problem, the copy-constructor sounds like a perfect solution. Thanks
> for doing this.
>
> > How big is your system?
>
> The sparse matrix is approx. 10^10 entries with about 1 million total
> non-zero elements.
>
> > 2.5 min for 5 time steps sounds like a lot to me.
>
> I should have been clearer, sorry. The 2.5 min includes a bunch of
> other routines that are being run for the time step, so it is more
> than just the matrix solve. However, those 12 s are entirely
> attributable to the difference between the STL copy and the operator()
> access. Also, this is running on a single laptop core instead of a
> cluster like it should be!
>
> > However, one still has to compare against the available column indices
>
> Makes sense. In my case, I think I can just say I need the 3rd or 4th
> non-zero row item, as I "know" where things are. But that's a
> non-generic case.
>
> Cheers
> Chris
>
> On 21 April 2017 at 04:34, Karl Rupp <[email protected]> wrote:
>
> > Hi Chris,
> >
> > please excuse my late reply.
> >
> > > > This is a local search operation
> > >
> > > Oh, that isn't at all what I expected.
> > > I assumed with the row, col offset it could just index the CSR
> > > array directly?
> >
> > When you call operator(), you pass the row and column index. The row
> > index jumps to the beginning of the nonzeros for that row in the CSR
> > array. However, one still has to compare against the available
> > column indices to finally pick the correct entry (or create a new
> > one...). Only for dense matrices can you locate the respective entry
> > in the matrix directly.
> >
> > > > By how much does your code slow down?
> > >
> > > The "optimization"? Over 5 time steps or so it was 12 s slower,
> > > out of a total of 2.5 min or so. So enough that when I run it for
> > > 15000 time steps it adds up!
> >
> > So it's 10 percent. How big is your system? 2.5 min for 5 time steps
> > sounds like a lot to me.
> >
> > > > Also, do you fill the CSR matrix by increasing row index, or is
> > > > your code filling rows at random?
> > >
> > > I'm filling the CSR via operator(), and that is by increasing row
> > > index.
> >
> > OK, this should be acceptable in terms of performance.
> >
> > > However, when it is run in parallel with OpenMP, it will
> > > effectively be random.
> >
> > In parallel you should really fill the CSR arrays directly (possibly
> > with the exception of the first time step, where you build the
> > sparsity pattern).
> >
> > > > What are you trying to accomplish?
> > >
> > > With an OpenMP backend, I want to avoid the copy from STL ->
> > > compressed_matrix. So my idea is to pre-allocate A, a
> > > compressed_matrix on the host, regardless of which backend I'm
> > > using (instead of the STL variant). Then I want to either solve
> > > directly using A, or copy A to a GPU and solve it on the GPU if so
> > > configured. The former is currently working well, barring the
> > > operator() issues we are discussing above. The problem arises with
> > > the second case: I could do the context change, but once the
> > > matrix has been copied to the GPU, I have to copy it *back* to
> > > take advantage of the pre-allocated matrix. That is, I'd like to
> > > avoid any additional memory allocations. I would like to just
> > > copy(A, gpu_A) when a GPU is available. However, there is no copy
> > > from compressed_matrix to compressed_matrix.
> >
> > Thanks, that helps me understand the setting better. Let me add a
> > copy-constructor for compressed_matrix for you, so you can avoid the
> > unnecessary copy back to the host. Copying the numerical entries for
> > a fixed sparsity pattern can be done efficiently; I'll send you a
> > code snippet when I'm done with the copy-constructor.
> >
> > Best regards,
> > Karli

_______________________________________________
ViennaCL-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-support
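[Editorial sketch for the archive: the local search in operator() that Karl describes above, jumping to the row's nonzeros and then scanning the stored column indices, can be sketched as follows. The `csr_at` helper is a hypothetical stand-alone function for illustration, not the actual ViennaCL implementation.]

```cpp
#include <vector>

// Look up entry (row, col) in a CSR matrix given by three arrays.
// The row index gives direct access to that row's range of nonzeros,
// but the column must then be searched among the stored column
// indices. This per-access search is what makes repeated operator()
// calls slower than filling the CSR arrays directly.
double csr_at(const std::vector<unsigned> &row_starts,
              const std::vector<unsigned> &cols,
              const std::vector<double>   &values,
              unsigned row, unsigned col) {
  for (unsigned i = row_starts[row]; i < row_starts[row + 1]; ++i)
    if (cols[i] == col)
      return values[i];          // found the stored entry
  return 0.0;                    // entry not stored (a writable
                                 // operator() would insert one here)
}
```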
