Hi Dave,

> Your list sounds great to me. Glad that you and Paul are working on this together.

> My main interests are in better preconditioner support and better multi-GPU/MPI
> scalability.

This is follow-up work then. There are a couple of 'simple' preconditioners (polynomial preconditioning, maybe some point-block Jacobi) which can also be useful as smoothers and which we can add in the near future. We should just get the 'infrastructure' work done first so that we don't have to unnecessarily adjust too much code later on.
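
To make the polynomial preconditioning idea concrete, here is a minimal sketch in plain Python. It is not ViennaCL or PETSc code: the names matvec, neumann_poly_apply and residual_norm are made up for illustration, and the truncated Neumann series with Jacobi scaling is just one common choice of polynomial, assumed here for simplicity. Applying it amounts to a fixed number of damped Jacobi sweeps started from zero, which is also why the same kernel can double as a smoother.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def neumann_poly_apply(A, r, degree=3, omega=1.0):
    """Return z = p(A) r, where p is the truncated Neumann series
    sum_{k=0}^{degree} (I - omega*D^{-1}A)^k * omega*D^{-1}.
    Evaluated as (degree + 1) damped Jacobi sweeps starting from z = 0."""
    scaled_inv_diag = [omega / A[i][i] for i in range(len(A))]
    z = [0.0] * len(r)
    for _ in range(degree + 1):
        Az = matvec(A, z)
        z = [zi + d * (ri - azi)
             for zi, d, ri, azi in zip(z, scaled_inv_diag, r, Az)]
    return z

def residual_norm(A, z, r):
    """Euclidean norm of r - A z."""
    return math.sqrt(sum((ri - azi) ** 2 for ri, azi in zip(r, matvec(A, z))))

# Small 1D Laplacian as a test matrix (illustrative only).
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
r = [1.0, 1.0, 1.0, 1.0]
for degree in (0, 5, 20):
    z = neumann_poly_apply(A, r, degree=degree)
    print(degree, residual_norm(A, z, r))
```

Only matrix-vector products and vector updates are needed, so a preconditioner of this kind maps well onto GPUs; a higher degree gives a better approximation of the inverse at the cost of more matrix-vector products per application.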


> Is there any progress on Steve Dalton's work on the cusp algebraic multigrid
> preconditioner with PETSc?  I believe Jed said in a previous email that Steve
> was going to be working on adding MPI support for that as well as other
> enhancements.

Yes, Steve is working on this right here at our division. Jed can give a more detailed answer on this.


> Will there be any improvements for GPU preconditioners in ViennaCL 1.5.0?
> When do you expect ViennaCL 1.5.0 to be available in PETSc?

Jed gave me a good hint with respect to D-ILU0, which I'll also add to PETSc. As with other GPU-accelerated ILU variants, it will require a proper matrix ordering to perform well. I'm somewhat tempted to port the SA-AMG implementation in CUSP to OpenCL as well, but this certainly won't be in 1.5.0.


> I'm also interested in trying the PETSc ViennaCL support on the Xeon Phi.
> Do you have a schedule for when that might be ready for friendly testing?

With OpenCL you can already test this now. Just install the Intel OpenCL SDK on your Xeon Phi machine, configure with --download-viennacl, --with-opencl-include=..., --with-opencl-lib=..., and pass the
  -viennacl_device_accelerator
flag in addition to -vec_type viennacl -mat_type aijviennacl when executing.

Unfortunately, the memory bandwidth an application actually gets on the Xeon Phi is too limited to be useful for off-loaded execution, as is the case with OpenCL: the folks at Intel couldn't obtain more than ~95 GB/sec for a simple copy benchmark, even when filling up the whole MIC with just two vectors.

Thus, I don't think our efforts are currently well spent on trying a fully native execution of PETSc on the MIC, because the trend is going more towards a tighter CPU/accelerator integration on the same die rather than piggy-backing via PCI-Express. Anyway, I'll let you know if there are any updates on this front.
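
For reference, the arithmetic behind such a copy benchmark is simple. The sketch below assumes double-precision entries and counts one read plus one write per element (so no extra write-allocate traffic). The function names and the problem size in the usage lines are hypothetical, not measurements.

```python
def copy_bandwidth_gb_per_s(n, seconds, bytes_per_entry=8):
    """Effective bandwidth of a vector copy y[:] = x[:] with n entries:
    every entry is read once and written once."""
    return 2 * n * bytes_per_entry / seconds / 1e9

def streaming_time_s(bytes_moved, bandwidth_gb_per_s):
    """Lower bound on the run time of a purely bandwidth-limited kernel."""
    return bytes_moved / (bandwidth_gb_per_s * 1e9)

# Hypothetical example: an axpy (y += a*x) on 10**8 doubles reads x and y
# and writes y, i.e. three memory streams of 8 bytes per entry.
axpy_bytes = 3 * 10**8 * 8
print(streaming_time_s(axpy_bytes, 95.0))  # seconds at ~95 GB/sec
```

Since most vector operations and sparse matrix-vector products are limited by memory bandwidth rather than by floating point throughput, the achievable copy bandwidth is a reasonable first estimate of what to expect from such kernels.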

Best regards,
Karli
