Thanks for the detailed reply.

I'm an earth scientist, so some context on how this code works: I have many
time steps (e.g., hourly) of observation data (e.g., wind) that I use to
solve, amongst other things, a transport equation. You can picture it as a
tight loop over the observations in which the inner part (building the
finite volume system) runs many, many times. As a result, the copy from the
STL format to compressed_matrix shows up in my profiling (Intel VTune). The
missing performance gain is in the construction of the linear system;
everything else has remained constant.

I'm surprised operator(), and by extension entry_proxy, is that much
slower. Where does it incur the overhead?

As a point of clarification: a compressed_matrix created like
A(viennacl::context::context(viennacl::MAIN_MEMORY))
really does exist on the host, correct? ALL of the internal code calls it
gpu_matrix...

Lastly, I've run into a bit of a problem: there is no copy from
compressed_matrix (host) to compressed_matrix (GPU).
Am I missing something?

Cheers
Chris



On 11 April 2017 at 02:29, Karl Rupp <[email protected]> wrote:

> Hi Chris,
>
>
> Ok, this seemed to work very well. I can then modify the element
>> internal vector to zero out the matrix for my finite volume
>> implementation, etc., and preserve the sparsity information so as to use
>> operator() quickly.
>>
>
> it's still best to avoid operator() if you aim for maximum performance,
> and instead work on the CSR arrays directly. Chances are, however, that
> more time is already spent on other parts of your finite volume
> application, in which case there's no need to optimize this
> part further.
>
>
> The whole point of doing this was to avoid 2 sets of copies from main
>> memory STL format to main memory compressed_matrix format when using
>> OpenMP.
>>
>
> Yes, that's definitely the right way to do it.
>
>
> However, I'm not seeing any performance increase; rather, I am
>> seeing a performance decrease!
>>
>> Is this to be expected?
>>
>
> In which part do you see the performance decrease? If it's in the
> assembly, then work on the CSR arrays directly. Or are you referring to
> other parts, e.g. sparse matrix-vector products?
>
> Best regards,
> Karli
>
>
>
> On 7 April 2017 at 10:12, Chris Marsh <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     Hi,
>>
>>     Right, it's the sparsity pattern that you have no way of knowing a
>>     priori during allocation. The parallel insert is then of course an
>>     issue without the 2 passes...
>>     I have to build a new A and b many, many times (during some
>>     timestepping) so 2 passes is probably not much faster than what I'm
>>     getting with copy. The sparsity pattern will stay constant. If I
>>     initialize the sparsity, then operator() should work, correct? And
>>     make my parallel code faster, i.e., not require 2 passes.
>>
>>     Following this further: if I use a std::map< ... > sparse
>>     representation, and copy it to a compressed_matrix, it should set up
>>     the sparse structure for me. Then I can use operator() without a
>>     slowdown, and access it in parallel since the sparsity will be
>>     correctly set up. Is that a reasonable approach for host only? For
>>     GPU, I will obviously still need to copy. But this approach, if it
>>     works, should also reduce code duplication.
>>
>>     (I'm trying to avoid learning CSR at the moment, have a time crunch!)
>>
>>     Cheers
>>     Chris
>>
>>
>>     On 7 April 2017 at 00:21, Karl Rupp <[email protected]
>>     <mailto:[email protected]>> wrote:
>>
>>         Hey,
>>
>>         On 04/06/2017 11:48 PM, Chris Marsh wrote:
>>
>>                 Unless you are changing only a few entries, this is
>>             likely to be too
>>                 slow.
>>
>>             Big time :)
>>
>>             Ok, so even though it is pre-allocated for the right number
>>             of nnz
>>             values, operator() still incurs the cost? Must admit that is
>>             not what
>>             I'd have expected.
>>
>>
>>         Well, this is a sparse matrix. Since operator() deals with a
>>         single entry, there is no way this could be fast (note that CSR
>>         requires entries from the same row to be stored
>>         consecutively in memory).
>>
>>
>>             When I obtain those CSR buffers, they will be the correct
>>             size, and I
>>             should be able to insert into them in parallel, correct?
>>
>>
>>         Yes, exactly.
>>         You may need to populate the matrix in two passes: The first
>>         determines the sparsity pattern, the second writes the actual
>>         numerical values.
>>
>>         Best regards,
>>         Karli
>>
>>
>>
>>             On 6 April 2017 at 13:13, Karl Rupp <[email protected]
>>             <mailto:[email protected]>
>>             <mailto:[email protected]
>>
>>             <mailto:[email protected]>>> wrote:
>>
>>                 Hi!
>>
>>
>>
>>                 On 04/06/2017 06:44 PM, Chris Marsh wrote:
>>
>>                     Hi,
>>
>>                     I know the number of non-zero entries for a sparse
>>             matrix so I
>>                     am trying
>>                     to pre-allocate it with
>>
>>                      viennacl::compressed_matrix<vcl_scalar_type>
>>             vl_C(row, col, nnz);
>>
>>
>>                 At this point your matrix is still empty (i.e. no
>>             nonzeros). It only
>>                 preallocated an array to hold up to 'nnz' entries.
>>
>>
>>                     and access it with vl_C.operator().
>>
>>
>>                 Unless you are changing only a few entries, this is
>>             likely to be too
>>                 slow.
>>
>>
>>                     I am using host only memory context, with ViennaCL
>>             1.7.1 from
>>                     homebrew.
>>
>>                     How should I proceed with this?
>>
>>
>>                 To fill the CSR format efficiently, have a look here:
>>
>>             https://sourceforge.net/p/viennacl/discussion/1143678/thread/325a937c/?limit=25#d6f0
>>
>>                 For host-based memory, an example of how to get pointers
>>             to the
>>                 three CSR arrays is here:
>>
>>             https://github.com/viennacl/viennacl-dev/blob/master/viennacl/linalg/host_based/sparse_matrix_operations.hpp#L115
>>
>>                 Best regards,
>>                 Karli
>>
>>
>>
>>
>>
_______________________________________________
ViennaCL-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-support
