What should we use a for programming model for PETSc on multi-core systems? Currently for conventional multicore we have only have one MPI process per core and for GPU we have subclasses of Vec and Mat with custom CUDA code.
Should we introduce subclasses of Vec and Mat built on pthreads (this is what Bill G recommends, and not to use OpenMP)? Is there a way to have some kind of consistent model between conventional multicore and GPU multi-core? If not the same code. What about this MCUDA stuff? Barry
