Hi Karl,

I was wondering if you had any thoughts on how I should proceed with the
copy?

Cheers
Chris Lewis

On Wed, 12 Apr 2017 at 18:46 Chris Marsh <[email protected]> wrote:
> Hi Karl,
>
>> This is a local search operation
>
>
> Oh, that isn't at all what I expected. I assumed with the row, col offset
> it could just index the CSR array directly?
>
>> By how much does your code slow down?
>
>
> The "optimization"? Over 5 time steps or so it was 12 s slower, out of a
> total of about 2.5 min. That is enough that it adds up when I run it for
> 15000 time steps!
>
>> Also, do you fill the CSR matrix by increasing row index, or is your code
>> filling rows at random?
>
>
> I'm filling the CSR via operator(), and that is by increasing row index.
> However, when it is run in parallel with OpenMP, it will effectively be
> random.
>
>> What are you trying to accomplish?
>
>
> With an OpenMP backend, I want to avoid the copy from STL ->
> compressed_matrix. So my idea is to pre-allocate A, a compressed_matrix on
> the host, regardless of what backend I'm using (instead of the STL
> variant). Then I want to either solve directly using A, or I want to copy A
> to a GPU and solve it on the GPU if configured. For the former, this is
> currently working well, barring the operator() issues we are discussing
> above. The problem arises with the 2nd case. I could do the context
> change, but once it's been copied to the GPU I have to copy it *back* to
> take advantage of the pre-allocated matrix. That is, I'd like to avoid any
> additional memory allocations. I would like to just copy(A, gpu_A) when a
> GPU is available. However, there is no copy from compressed_matrix to
> compressed_matrix.
>
> Cheers
> Chris
>
> On 12 April 2017 at 04:19, Karl Rupp <[email protected]> wrote:
>
>> Hi Chris,
>>
>>
>>> I'm an earth scientist so the way this code works is I have many time
>>> steps (e.g., 1hr) of observation data (e.g., wind) that I use for
>>> solving, amongst many things, a transport equation. You can imagine it
>>> as a tight loop over the observations where the inside (build the FVM)
>>> is done many, many times. Therefore the copy from STL to
>>> compressed_matrix is showing up in my profiling (using Intel VTune). The
>>> lack of performance increase is in the construction of the linear
>>> system; everything else has remained constant.
>>>
>>> I'm surprised operator(), and by extension entry_proxy, is that much
>>> slower. Where is it incurring the overhead?
>>>
>>
>> With each call to operator(), it needs to look up the respective entry in
>> the system matrix. This is a local search operation, hence takes much more
>> time than 'just working on the CSR arrays directly'. At this point you
>> really pay for the convenience of operator(), and I see no way of
>> completely avoiding those costs.
>>
>> By how much does your code slow down? I see some room for optimizing the
>> existing implementation for the host-based backend. Also, do you fill the
>> CSR matrix by increasing row index, or is your code filling rows at random?
>>
>>
>>
>>> As a point of clarification, compressed_matrix when created like
>>> A(viennacl::context::context(viennacl::MAIN_MEMORY))
>>> really does exist on the host, correct? ALL of the internal code calls
>>> it gpu_matrix...
>>>
>>
>> Yes, it really creates the buffers on the host. The internal use of
>> 'gpu_matrix' is a historic relic from a time when ViennaCL only supported
>> OpenCL.
>>
>>
>>> Lastly, I've run into a bit of a problem. There is no copy for
>>> compressed_matrix (host) -> compressed_matrix (gpu).
>>> Am I missing something?
>>>
>>
>> What are you trying to accomplish? If you just want to shift your data
>> over to CUDA or OpenCL or from CUDA/OpenCL back to the host, use
>> A.switch_memory_context(new_ctx).
>>
>> Best regards,
>> Karli
>>
>
>
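
The lookup cost discussed in this thread (a search on every operator()
call versus writing to the CSR arrays directly) can be illustrated with a
small stand-alone sketch. This is not ViennaCL's implementation: CsrMatrix,
find_entry, set_by_search and set_by_position are hypothetical names
introduced only for illustration, and the sketch assumes the sparsity
pattern is already allocated with column indices sorted within each row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (NOT viennacl::compressed_matrix).
struct CsrMatrix {
    std::vector<std::size_t> row_ptr;  // size = rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index per stored entry, sorted within each row
    std::vector<double>      values;   // one value per stored entry
};

// Find the position of entry (row, col) inside the stored pattern.
// Returns an index into A.values, or A.values.size() if the entry is not stored.
// This is the "local search" paid on every element-wise access.
inline std::size_t find_entry(const CsrMatrix& A, std::size_t row, std::size_t col) {
    assert(row + 1 < A.row_ptr.size());
    auto first = A.col_idx.begin() + static_cast<std::ptrdiff_t>(A.row_ptr[row]);
    auto last  = A.col_idx.begin() + static_cast<std::ptrdiff_t>(A.row_ptr[row + 1]);
    auto it = std::lower_bound(first, last, col);  // binary search within the row
    if (it == last || *it != col) return A.values.size();
    return static_cast<std::size_t>(it - A.col_idx.begin());
}

// Convenience write: searches the row on every call.
inline bool set_by_search(CsrMatrix& A, std::size_t row, std::size_t col, double v) {
    const std::size_t k = find_entry(A, row, col);
    if (k == A.values.size()) return false;  // entry not in the pattern
    A.values[k] = v;
    return true;
}

// Direct write: the caller already knows the position k, so no search is needed.
inline void set_by_position(CsrMatrix& A, std::size_t k, double v) {
    assert(k < A.values.size());
    A.values[k] = v;
}
```

If the sparsity pattern stays fixed across time steps, the positions
returned by find_entry could be computed once and reused with
set_by_position in every subsequent step. Whether that fits a particular
assembly loop is an assumption to check against the actual code.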
_______________________________________________
ViennaCL-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-support