Hi,
On 09/08/2018 16:37, nnunn wrote:
> When using hex elements with order say 4 or 5, the number of non-zeros
> in the GiMMiK kernels gets quite high.
>
> E.g., with n=4, tgradpcoru_upts [hex] appears to get 1819 non-zeros from
> 46875 entries (i.e. about 4% non-zeros)
>
> Just wondering, for high-order 3d runs, what's an appropriate way to
> replace the GiMMiK kernels with more normal matrix multiplication?
>
> PS: even with 1819 non-zeros, the hard-wired (const) GiMMiK kernels
> appear to run fine, but at some point I guess the number of registers
> required must outweigh the cost of loading the const mats from memory?
>
> From class CUDAGiMMiKKernels(CUDAKernelProvider):
>
>     # Check that A is reasonably sparse
>     if np.count_nonzero(a.get()) > self.max_nnz:
>         raise NotSuitableError('Matrix too dense for GiMMiK')
>
> default self.max_nnz: [512]

If you wish to disable GiMMiK you can place the key

    gimmik-max-nnz = 0

into the [backend-<your backend>] section of the config file. This will
cause GiMMiK to raise the NotSuitableError you showed above, and thus
result in the multiplication being handled by dense BLAS.
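
As a rough illustration of that fallback, here is a minimal sketch. It is not PyFR's actual code: the two provider functions and the pick_kernel helper are hypothetical names, the [backend-cuda] section assumes the CUDA backend, and only the max_nnz check mirrors the snippet quoted above. It parses the config key with Python's configparser and shows a limit of 0 pushing a matrix onto the dense path.

```python
import configparser

# Hypothetical config fragment; 'backend-cuda' assumes the CUDA backend
CFG_TEXT = """
[backend-cuda]
gimmik-max-nnz = 0
"""


class NotSuitableError(Exception):
    """Raised by a provider that declines to handle a matrix."""


def count_nonzero(mat):
    # Number of non-zero entries in a list-of-lists matrix
    return sum(1 for row in mat for v in row if v != 0)


def gimmik_provider(mat, max_nnz):
    # Same style of check as the quoted snippet: reject if too many non-zeros
    if count_nonzero(mat) > max_nnz:
        raise NotSuitableError('Matrix too dense for GiMMiK')
    return 'gimmik'


def dense_blas_provider(mat, max_nnz):
    # Stand-in for a dense BLAS kernel; accepts any matrix
    return 'dense-blas'


def pick_kernel(mat, max_nnz):
    # Try each provider in order; the first that does not raise is used
    for provider in (gimmik_provider, dense_blas_provider):
        try:
            return provider(mat, max_nnz)
        except NotSuitableError:
            continue
    raise RuntimeError('No suitable provider')


cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)
# Fall back to 512 (the default mentioned above) if the key is absent
max_nnz = cfg.getint('backend-cuda', 'gimmik-max-nnz', fallback=512)

mat = [[1.0, 0.0], [0.0, 0.5]]
print(pick_kernel(mat, max_nnz))  # limit read from the config (0)
print(pick_kernel(mat, 512))      # default limit
```

With the limit read from the config the example matrix goes to the dense stand-in, while with the default of 512 it stays with the GiMMiK stand-in.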
The point at which GiMMiK becomes unprofitable depends heavily on the
form of the matrices, which backend you are using, and the hardware you
are running on. In retrospect, 512 is probably a tad on the low side --
at least for matrices which are sparse. Although, saying that, for
matrices which are dense, 512 is sometimes on the high side.
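
To put the numbers from the question next to the default limit, here is a quick calculation in plain Python. Nothing in it is PyFR-specific, and the nnz_summary helper is a made-up name for illustration. It only restates the figures quoted above (1819 non-zeros out of 46875 entries, default limit 512) and says nothing about where the real break-even point lies.

```python
def nnz_summary(nnz, entries, max_nnz=512):
    # Summarise how a matrix compares with a non-zero limit
    return {
        'density': nnz / entries,         # fraction of entries that are non-zero
        'ratio_to_limit': nnz / max_nnz,  # how far over (or under) the limit
        'accepted': nnz <= max_nnz,       # would pass a check of the quoted form
    }


# Figures quoted in the question for tgradpcoru_upts [hex] with n=4
s = nnz_summary(1819, 46875)
print(f"density        = {s['density']:.2%}")
print(f"ratio to limit = {s['ratio_to_limit']:.2f}")
print(f"accepted       = {s['accepted']}")
```

So this operator is under 4% dense yet has more than three times as many non-zeros as the default limit, which is why it is rejected unless gimmik-max-nnz is raised.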
Regards, Freddie.