On Thu, Aug 8, 2013 at 5:37 AM, Michael Lange <[email protected]>wrote:
> Hi, > > We have recently been trying to re-align our OpenMP fork ( > https://bitbucket.org/**ggorman/petsc-3.3-omp<https://bitbucket.org/ggorman/petsc-3.3-omp>) > with petsc/master. Much of our early work has now been superseded by the > threadcomm implementations. Nevertheless, there are still a few algorithmic > differences between the two branches: > > 1) Enforcing MPI latency hiding by using task-based spMV: > If the MPI implementation used does not actually provide truly > asynchronous communication in hardware, performance can be increased by > dedicating a single thread to overlapping MPI communication in PETSc. > However, this is arguably a vendor-specific fix which requires significant > code changes (ie the parallel section needs to be raised up by one level). > So perhaps the strategy should be to give guilty vendors a hard time rather > than messing up the current abstraction. > > 2) Nonzero-based thread partitioning: > Rather than evenly dividing the number of rows among threads, we can > partition the thread ownership ranges according to the number of non-zeros > in each row. This balances the work load between threads and thus increases > strong scalability due to optimised bandwidth utilisation. In general, this > optimisation should integrate well with threadcomms, since it only changes > the thread ownership ranges, but it does require some structural changes > since nnz is currently not passed to PetscLayoutSetUp. Any thoughts on > whether people regard such a scheme as useful would be greatly appreciated. > I think this should be handled by changing the AIJ data structure. Going all the way to "2D" partitions would also allow us to handle power-law matrix graphs. This would keep the thread implementation simple, and at the same time be more flexible. Matt > 3) MatMult_SeqBAIJ not threaded: > Is there a reason why MatMult has not been threaded for BAIJ matrices, or > is somebody already working on this? If not, I would like to prepare a pull > request for this using the same approach as MatMult_SeqAIJ. > > We would welcome any suggestions/feedback on this, in particular the > second point. Up to date benchmarking results for the first two methods, > including BlueGene/Q, can be found in: > http://arxiv.org/abs/1307.4567 > > Kind regards, > > Michael Lange > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
