On Aug 8, 2013, at 9:14 AM, Karl Rupp <[email protected]> wrote:
> Hi Michael,
>
>> We have recently been trying to re-align our OpenMP fork
>
>> 2) Nonzero-based thread partitioning:
>> Rather than evenly dividing the number of rows among threads, we can
>> partition the thread ownership ranges according to the number of
>> non-zeros in each row. This balances the work load between threads and
>> thus increases strong scalability due to optimised bandwidth
>> utilisation. In general, this optimisation should integrate well with
>> threadcomms, since it only changes the thread ownership ranges, but it
>> does require some structural changes since nnz is currently not passed
>> to PetscLayoutSetUp. Any thoughts on whether people regard such a scheme
>> as useful would be greatly appreciated.
>
> This is a reasonable optimization; I used a similar strategy for sparse
> matrices on the GPU. Others should comment on whether the interface change to
> PetscLayoutSetUp is acceptable.
I don't think PetscLayoutSetUp() should be complicated in this fashion.
Non-trivial parallel partitioning decisions of this kind, which depend on the
mesh or graph of the problem, are made by DMs.
Barry
>
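
For readers following the thread, the nonzero-based thread partitioning
described in item 2) can be sketched roughly as below. This is only an
illustration under stated assumptions: the matrix is taken to be in CSR form
with a row-pointer array ia (ia[i] is the number of nonzeros before row i),
and partition_rows_by_nnz, ia and start are hypothetical names, not part of
PETSc or of the OpenMP fork discussed here.

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc function): split nrows rows among
 * nthreads threads so that each thread owns roughly the same number of
 * nonzeros rather than the same number of rows.
 *
 * ia    : CSR row pointers, length nrows + 1, with ia[0] == 0
 * start : output, length nthreads + 1; thread t owns rows
 *         start[t] .. start[t+1] - 1 (the range may be empty)
 */
static void partition_rows_by_nnz(int nrows, const int *ia,
                                  int nthreads, int *start)
{
    assert(nrows >= 0 && nthreads > 0);

    long long total = ia[nrows]; /* total number of nonzeros */
    int row = 0;

    start[0] = 0;
    for (int t = 1; t < nthreads; t++) {
        /* Nonzero count that threads 0 .. t-1 should cover together. */
        long long target = (total * t) / nthreads;

        /* Advance to the first row whose preceding nonzero count
         * reaches the target; row never moves backwards, so the
         * resulting ownership ranges are contiguous and ordered. */
        while (row < nrows && ia[row] < target)
            row++;
        start[t] = row;
    }
    start[nthreads] = nrows;
}
```

As a usage example, take six rows holding 8, 4, 1, 1, 1 and 1 nonzeros, so
that ia = {0, 8, 12, 13, 14, 15, 16}. With two threads, the sketch gives
thread 0 the first row only and thread 1 the remaining five rows, 8 nonzeros
each, whereas an even split by rows would assign 13 and 3 nonzeros. A single
very dense row can still leave some thread with an empty range, which a real
implementation would have to handle.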