Re: [petsc-dev] Improving and stabilizing GPU support

Karl Rupp Fri, 19 Jul 2013 15:32:37 -0700

Hi Dave,

> Your list sounds great to me. Glad that you and Paul are working on this together.


> My main interests are in better preconditioner support and better multi-GPU/MPI
> scalability.

This is follow-up work then. There are a couple of 'simple' preconditioners (polynomial preconditioning, maybe some point-block Jacobi) which can also be useful as smoothers and which we can add in the near future. We should just get the 'infrastructure' work done first so that we don't have to unnecessarily adjust too much code later on.

> Is there any progress on Steve Dalton's work on the cusp algebraic multigrid
> preconditioner with PETSc?  I believe Jed said in a previous email that Steve
> was going to be working on adding MPI support for that as well as other
> enhancements.

Yes, Steve is working on this right here at our division. Jed can give a more detailed answer on this.

> Will there be any improvements for GPU preconditioners in ViennaCL 1.5.0?
> When do you expect ViennaCL 1.5.0 to be available in PETSc?

Jed gave me a good hint with respect to D-ILU0, which I'll also add to PETSc. As with other GPU-accelerations using ILU, it will require a proper matrix ordering to give good performance. I'm somewhat tempted to port the SA-AMG implementation in CUSP to OpenCL as well, but this certainly won't be in 1.5.0.

> I'm also interested in trying the PETSc ViennaCL support on the Xeon Phi.
> Do you have a schedule for when that might be ready for friendly testing?

With OpenCL you can already test this now. Just install the Intel OpenCL SDK on your Xeon Phi machine, configure with --download-viennacl, --with-opencl-include=..., --with-opencl-lib=..., and pass the

  -viennacl_device_accelerator

flag in addition to -vec_type viennacl -mat_type aijviennacl when executing.
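
Put together, the steps above might look like the following sketch. It assumes you are in the PETSc source directory; ./my_app is only a placeholder for whatever PETSc executable you want to run, and the ... values are left elided on purpose, since they depend on where the Intel OpenCL SDK is installed on your machine.

```shell
# Configure PETSc with ViennaCL and OpenCL (run from the PETSc source tree).
# The "..." values are intentionally elided: they stand for the include
# directory and library of the Intel OpenCL SDK on your system.
./configure --download-viennacl --with-opencl-include=... --with-opencl-lib=...

# (Build PETSc and your application as usual before running.)

# Run with the ViennaCL vector and matrix types and select the accelerator.
# ./my_app is a hypothetical placeholder for your own PETSc executable.
./my_app -vec_type viennacl -mat_type aijviennacl -viennacl_device_accelerator
```

All three runtime options are ordinary PETSc command-line options, so they can be combined with whatever other options the application already takes.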

Unfortunately, the application memory bandwidth we get on the Xeon Phi is too limited to be useful for off-loaded execution, as is the case with OpenCL: Even the folks at Intel couldn't obtain more than ~95 GB/sec even when filling up the whole MIC with just two vectors for benchmarking a simple copy operation. Thus, I don't think our efforts are currently well spent on trying a fully native execution of PETSc on the MIC, because the trend is going more towards a tighter CPU/accelerator integration on the same die rather than piggy-backing via PCI-Express. Anyway, I'll let you know if there are any updates on this front.


Best regards,
Karli
