Re: [petsc-dev] Supporting OpenCL matrix assembly

Karl Rupp Tue, 24 Sep 2013 07:08:40 -0700

Hey,

On 09/24/2013 03:53 PM, Jed Brown wrote:

Karl Rupp <[email protected]> writes:

I'm not talking about CSR vs. COO from the SpMV point of view, but
rather on how to store the actual data in global memory without
expensive subsequent sorts.


Sure, but this seems like such a minor detail.  With PetscScalar=double
and PetscInt=int, we have 16 bytes/entry for COO and (nominally) 12
bytes/entry for CSR, and it only needs to go to GPU global memory and
back, not across to the CPU.  I doubt the difference between 12 and 16
bytes/entry during assembly is a bottleneck.

I'm not worried about 12 bytes vs. 16 bytes, but rather about the
ordering of entries as a whole. If one assembles into something
CSR-like, then one can either run the SpMV right away, or merge the
entries in each row of the matrix that have the same column index.
Merging such entries can usually be done in shared memory, so the
memory cost is one read and one write of the matrix nonzero entries in
the worst case.
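
To make the per-row merge concrete, here is a minimal host-side sketch in Python. It is only an illustration, not PETSc or ViennaCL code: the name merge_row_duplicates and the array names are hypothetical, and the per-row dictionary merely stands in for the shared-memory scratch space a GPU work group would use. Emitting each row's columns in sorted order is a choice of this sketch, not something CSR requires.

```python
def merge_row_duplicates(row_ptr, col_idx, values):
    """Sum entries that share a column index within each row.

    Input is CSR-like: row_ptr delimits rows, but inside a row the
    column indices may be unsorted and may repeat.
    """
    new_ptr, new_cols, new_vals = [0], [], []
    for r in range(len(row_ptr) - 1):
        acc = {}  # per-row scratch (stand-in for GPU shared memory)
        for k in range(row_ptr[r], row_ptr[r + 1]):  # one read per entry
            acc[col_idx[k]] = acc.get(col_idx[k], 0.0) + values[k]
        for c in sorted(acc):  # at most one write per entry
            new_cols.append(c)
            new_vals.append(acc[c])
        new_ptr.append(len(new_cols))
    return new_ptr, new_cols, new_vals


# Row 0 holds column 2 twice, row 1 holds column 1 twice.
ptr, cols, vals = merge_row_duplicates(
    [0, 3, 5], [2, 0, 2, 1, 1], [1.0, 2.0, 3.0, 4.0, 5.0]
)
```

Each nonzero is read once and written at most once, which is the worst-case cost mentioned above.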

In contrast, if everything is assembled into a general COO format,
then one needs to sort the triplets by row first before one can even
run SpMVs. The memory transactions required for this are O(N log(N)),
with N being the number of nonzeros. N is larger than 10^6 in almost
all cases, so the log(N) hurts...
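
For comparison, a rough sketch of the COO path in Python, assuming a comparison-based sort of the triplets (the function name coo_to_csr and its arguments are hypothetical, chosen for illustration only). The sort is the O(N log(N)) step; for N = 10^6 the factor log2(N) is about 20. Building the row pointers afterwards is a single linear pass.

```python
def coo_to_csr(num_rows, rows, cols, vals):
    """Convert unordered COO triplets into CSR arrays.

    Duplicates are kept; they end up adjacent after the sort and could
    be merged in a later pass.
    """
    # The expensive part: O(N log N) comparison sort by (row, column).
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))

    # Linear pass: count entries per row, then prefix-sum the counts.
    row_ptr = [0] * (num_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for r in range(num_rows):
        row_ptr[r + 1] += row_ptr[r]

    return row_ptr, [cols[k] for k in order], [vals[k] for k in order]


# Four triplets arriving in arbitrary order for a 3-row matrix.
row_ptr, col_idx, values = coo_to_csr(
    3, [2, 0, 1, 0], [1, 2, 0, 0], [1.0, 2.0, 3.0, 4.0]
)
```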


Best regards,
Karli
