Karl,
No problem, the copy-constructor sounds like a perfect solution. Thanks for
doing this.
> How big is your system?
The sparse matrix is approx. 10^10 in size, with about 1 million non-zero
elements in total.
> 2.5min for 5 time steps sounds a lot to me.
I should have been clearer, sorry. The 2.5min includes a bunch of other
routines that are run for the timestep, so it is more than just the
matrix solve. However, that 12s is entirely attributable to the difference
between the STL approach and the copy plus operator() access. Also, this is
running on a single laptop core instead of on a cluster like it should be!
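To illustrate where that per-access cost comes from, here is a minimal sketch of an operator()-style read on a CSR layout (illustration only, not ViennaCL's actual internals): the row pointer gets you to the start of the row's nonzeros, but each access still has to search the column indices within that row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR layout (hypothetical names, for illustration only):
// row_ptr[i]..row_ptr[i+1] delimit the nonzeros of row i in col_idx/values.
struct Csr {
  std::vector<std::size_t> row_ptr;
  std::vector<std::size_t> col_idx;
  std::vector<double> values;

  // An operator()-style read: jump to the row, then compare against the
  // available column indices to find the requested entry.
  double at(std::size_t row, std::size_t col) const {
    for (std::size_t k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
      if (col_idx[k] == col) return values[k];  // search within the row
    return 0.0;  // structural zero
  }
};
```

This per-entry search is what adds up over many operator() calls, compared to writing into the arrays directly.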
> However, one still has to compare against the available column indices
Makes sense. In my case, I think I can just ask for the 3rd or 4th
non-zero item of a row, since I "know" where things are, but that's a
non-generic case.
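For what it's worth, that non-generic shortcut can be sketched like this (plain C++ on the same illustrative CSR layout as above, not ViennaCL API): when the position of an entry among a row's nonzeros is already known, the value array can be indexed directly and the column search skipped entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the k-th nonzero of row `row` in a CSR
// matrix given by row_ptr/values. Because the caller "knows" the position,
// no comparison against column indices is needed -- O(1) access.
double kth_nonzero_in_row(const std::vector<std::size_t>& row_ptr,
                          const std::vector<double>& values,
                          std::size_t row, std::size_t k) {
  return values[row_ptr[row] + k];  // direct index into the CSR value array
}
```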
Cheers
Chris
On 21 April 2017 at 04:34, Karl Rupp <[email protected]> wrote:
> Hi Chris,
>
> please excuse my late reply.
>
>
> This is a local search operation
>>
>>
>> Oh, that isn't at all what I expected. I assumed that with the row/col
>> offsets it could just index the CSR array directly?
>>
>
> When you call operator(), you pass the row and column index. The row index
> jumps to the beginning of the nonzeros for that row in the CSR array.
> However, one still has to compare against the available column indices to
> finally pick the correct entry (or create a new one...). Only for dense
> matrices can you locate the respective entry in the matrix directly.
>
>
>
> By how much does your code slow down?
>>
>>
>> The "optimization"? Over 5 time steps or so it was 12 s slower, out
>> of a total of 2.5min or so. So enough that when I run it for 15000
>> time steps it adds up!
>>
>
> So it's about 10 percent. How big is your system? 2.5min for 5 time steps
> sounds like a lot to me.
>
>
> Also, do you fill the CSR matrix by increasing row index, or is
>> your code filling rows at random?
>>
>>
>> I'm filling the CSR via operator(), and that is by increasing row
>> index.
>>
>
> Ok, this should be acceptable in terms of performance.
>
>
> However, when it is run in parallel with openmp, it will
>> effectively be random.
>>
>
> In parallel you should really fill the CSR array directly (possibly with
> the exception of the first time step, where you build the sparsity pattern).
>
>
> What are you trying to accomplish?
>>
>>
>> With an OpenMP backend, I want to avoid the copy from STL ->
>> compressed_matrix. So my idea is to pre-allocate A, a
>> compressed_matrix on the host, regardless of what backend I'm using
>> (instead of the STL variant). Then I want to either solve directly
>> using A, or I want to copy A to a GPU and solve it on the GPU if
>> configured. For the former, this is currently working well, barring
>> the operator() issues we are discussing above. The problem arises
>> with the 2nd case. I could do the context change, but once it's been
>> copied to the GPU I have to copy it *back* to take advantage of the
>> pre-allocated matrix. That is, I'd like to avoid any additional
>> memory allocations. I would like to just copy(A,gpu_A) when gpu is
>> available. However, there is no copy for compressed_matrix to
>> compressed_matrix.
>>
>
> Thanks, that helps me understand the setting better. Let me add a
> copy-constructor for compressed_matrix for you, so you can avoid the
> unnecessary copy back to the host. Copying the numerical entries for a
> fixed sparsity pattern can be done efficiently; I'll send you a code
> snippet when I'm done with the copy-constructor.
>
> Best regards,
> Karli
>
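[Editor's note: pending Karl's actual snippet, the idea behind the efficient value copy for a fixed sparsity pattern can be sketched generically in plain C++ (not ViennaCL API). When the source and destination share the same row pointers and column indices, only the value array needs to be transferred, as one contiguous copy with no per-entry search.]

```cpp
#include <cassert>
#include <vector>

// Sketch of a value-only update for a fixed sparsity pattern: since the
// structures agree, the value arrays have identical length and ordering,
// so the copy is a single contiguous transfer.
void copy_values_fixed_pattern(const std::vector<double>& src_values,
                               std::vector<double>& dst_values) {
  dst_values.assign(src_values.begin(), src_values.end());
}
```

On a GPU backend the same idea corresponds to transferring just the value buffer to the device, leaving the index buffers untouched.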
_______________________________________________
ViennaCL-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-support