Thanks for the detailed reply.
I'm an earth scientist; this code takes many time steps (e.g., hourly)
of observation data (e.g., wind) and uses them to solve, among other
things, a transport equation. You can picture it as a tight loop over
the observations where the inner step (building the FVM system) is done
many, many times. As a result, the copy from the STL format to
compressed_matrix is showing up in my profiling (using Intel VTune). The
lack of performance increase is in the construction of the linear
system; everything else has remained constant.
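For concreteness, here is the assembly pattern I'm after (build the sparsity once, then only overwrite values each timestep), sketched in stdlib-only C++; the Csr struct and all names here are hypothetical stand-ins, not ViennaCL types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR container (hypothetical names, not ViennaCL types).
struct Csr {
    std::vector<std::size_t> row_ptr;  // size rows+1
    std::vector<std::size_t> col_ind;  // size nnz
    std::vector<double>      val;      // size nnz
};

// Build the sparsity pattern once (here: a 1-D three-point stencil).
Csr make_pattern(std::size_t n) {
    Csr A;
    A.row_ptr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0)     A.col_ind.push_back(i - 1);
        A.col_ind.push_back(i);
        if (i + 1 < n) A.col_ind.push_back(i + 1);
        A.row_ptr.push_back(A.col_ind.size());
    }
    A.val.assign(A.col_ind.size(), 0.0);
    return A;
}

// Each timestep: overwrite values in place. No allocation, no copy,
// and each row's slice [row_ptr[i], row_ptr[i+1]) is independent.
void assemble_values(Csr& A, double wind) {
    for (std::size_t i = 0; i + 1 < A.row_ptr.size(); ++i)
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            A.val[k] = (A.col_ind[k] == i) ? 2.0 + wind : -1.0;
}
```

With the pattern fixed, each row's slice of val can be written by a different OpenMP thread without synchronization.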
I'm surprised operator(), and by extension entry_proxy, is that much
slower. Where is it incurring the overhead?
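My mental model of what a single per-entry access has to do on a CSR matrix, as a stdlib-only sketch (names are mine, not ViennaCL's): a search over the row's column slice, and, if the entry is missing, an insertion that would shift everything after it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// What a single A(i,j) access must do when the CSR pattern already
// contains (i,j): binary-search the row's (sorted) column slice.
// If (i,j) were *not* present, col_ind and val would have to be
// shifted to make room, which is where the real cost explodes.
double* find_entry(std::vector<std::size_t>& row_ptr,
                   std::vector<std::size_t>& col_ind,
                   std::vector<double>& val,
                   std::size_t i, std::size_t j) {
    auto first = col_ind.begin() + row_ptr[i];
    auto last  = col_ind.begin() + row_ptr[i + 1];
    auto it = std::lower_bound(first, last, j);
    if (it == last || *it != j)
        return nullptr;  // missing entry: would trigger an insert/shift
    return &val[static_cast<std::size_t>(it - col_ind.begin())];
}
```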
As a point of clarification: a compressed_matrix created like
A(viennacl::context(viennacl::MAIN_MEMORY))
really does live in host memory, correct? All of the internal code calls
it gpu_matrix...
Lastly, I've run into a bit of a problem: there appears to be no copy()
for compressed_matrix (host) -> compressed_matrix (GPU).
Am I missing something?
Cheers
Chris
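P.S. For reference, the two-pass fill discussed below (first pass sizes the rows, second writes columns and values), sketched stdlib-only from a std::map-per-row representation; the Csr struct and names are mine, not ViennaCL's:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical CSR container, stand-in for ViennaCL's internal arrays.
struct Csr {
    std::vector<std::size_t> row_ptr, col_ind;
    std::vector<double> val;
};

// Pass 1: determine the sparsity pattern (row sizes -> row_ptr).
// Pass 2: write column indices and values.
Csr to_csr(const std::vector<std::map<std::size_t, double>>& rows) {
    Csr A;
    A.row_ptr.resize(rows.size() + 1, 0);
    for (std::size_t i = 0; i < rows.size(); ++i)        // pass 1: counts
        A.row_ptr[i + 1] = A.row_ptr[i] + rows[i].size();
    A.col_ind.resize(A.row_ptr.back());
    A.val.resize(A.row_ptr.back());
    for (std::size_t i = 0; i < rows.size(); ++i) {      // pass 2: fill
        std::size_t k = A.row_ptr[i];
        for (const auto& e : rows[i]) {                  // map is column-sorted
            A.col_ind[k] = e.first;
            A.val[k]     = e.second;
            ++k;
        }
    }
    return A;
}
```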
On 11 April 2017 at 02:29, Karl Rupp <[email protected]> wrote:
> Hi Chris,
>
>
> Ok, this seemed to work very well. I can then modify the element
>> internal vector to zero out the matrix for my finite volume
>> implementation, etc., and preserve the sparsity information so as to use
>> operator() quickly.
>>
>
> it's still best to avoid operator() if you aim for maximum performance;
> instead, work on the CSR arrays directly. Chances are, however, that
> more time is already spent on other parts of your finite volume
> application, in which case there's no need for further optimizing this
> part.
>
>
> The whole point of doing this was to avoid 2 sets of copies from main
>> memory STL format to main memory compressed_matrix format when using
>> OpenMP.
>>
>
> Yes, that's definitely the right way to do it.
>
>
> However I'm not seeing any performance increase, and rather I am
>> seeing a performance decrease!
>>
>> Is this to be expected?
>>
>
> In which part do you see the performance decrease? If it's in the
> assembly, then work on the CSR arrays directly. Or are you referring to
> other parts, e.g. sparse matrix-vector products?
>
> Best regards,
> Karli
>
>
>
> On 7 April 2017 at 10:12, Chris Marsh <[email protected]> wrote:
>>
>> Hi,
>>
>> Right, it's the sparsity pattern that you have no way of knowing a
>> priori during allocation. The parallel insert is then of course an
>> issue without the 2 passes...
>> I have to build a new A and b many, many times (during some
>> timestepping) so 2 passes is probably not much faster than what I'm
>> getting with copy. The sparsity pattern will stay constant. If I
>> initialize the sparsity, then operator() should work, correct? And
>> make my parallel code faster, i.e., not require 2 passes.
>>
>> Following this further: if I use a std::map< ... > sparse
>> representation and copy it to a compressed_matrix, it should set up
>> the sparse structure for me. Then I can use operator() without a
>> slowdown and access it in parallel, since the sparsity will be
>> correctly set up. Is that a reasonable approach for host-only use?
>> For GPU, I will obviously still need to copy. But this approach, if
>> it works, should also reduce code duplication...
>>
>> (I'm trying to avoid learning CSR at the moment, have a time crunch!)
>>
>> Cheers
>> Chris
>>
>>
>> On 7 April 2017 at 00:21, Karl Rupp <[email protected]> wrote:
>>
>> Hey,
>>
>> On 04/06/2017 11:48 PM, Chris Marsh wrote:
>>
>> Unless you are changing only a few entries, this is
>> likely to be too
>> slow.
>>
>> Big time :)
>>
>> Ok, so even though it is pre-allocated for the right number of
>> nnz values, operator() still incurs the cost? Must admit that is
>> not what I'd have expected.
>>
>>
>> Well, this is a sparse matrix. Since operator() deals with a
>> single entry, there is no way this could be fast (note that CSR
>> requires entries from the same row to be located consecutively
>> in memory).
>>
>>
>> When I obtain those CSR buffers, they will be the correct
>> size, and I
>> should be able to insert into them in parallel, correct?
>>
>>
>> Yes, exactly.
>> You may need to populate the matrix in two passes: The first
>> determines the sparsity pattern, the second writes the actual
>> numerical values.
>>
>> Best regards,
>> Karli
>>
>>
>>
>> On 6 April 2017 at 13:13, Karl Rupp <[email protected]> wrote:
>>
>> Hi!
>>
>>
>>
>> On 04/06/2017 06:44 PM, Chris Marsh wrote:
>>
>> Hi,
>>
>> I know the number of non-zero entries for a sparse matrix, so I
>> am trying to pre-allocate it with
>>
>> viennacl::compressed_matrix<vcl_scalar_type> vl_C(row, col, nnz);
>>
>>
>> At this point your matrix is still empty (i.e., no nonzeros). It
>> has only preallocated arrays to hold up to 'nnz' entries.
>>
>>
>> and access it with vl_C.operator().
>>
>>
>> Unless you are changing only a few entries, this is
>> likely to be too
>> slow.
>>
>>
>> I am using a host-only memory context, with ViennaCL 1.7.1 from
>> Homebrew.
>>
>> How should I proceed with this?
>>
>>
>> To fill the CSR format efficiently, have a look here:
>>
>> https://sourceforge.net/p/viennacl/discussion/1143678/thread/325a937c/?limit=25#d6f0
>>
>> For host-based memory, an example of how to get pointers to the
>> three CSR arrays is here:
>>
>> https://github.com/viennacl/viennacl-dev/blob/master/viennacl/linalg/host_based/sparse_matrix_operations.hpp#L115
>>
>> Best regards,
>> Karli
>>
>>
>>
>>
>>
------------------------------------------------------------------------------
_______________________________________________
ViennaCL-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-support