Karli, I'm not aware of any polynomial preconditioners for the GPU available in PETSc, with or without the txpetscgpu package. I'd love to try them out if they exist, though, and would be happy to hear that I'm wrong.
Dave

________________________________________
From: petsc-dev-bounces at mcs.anl.gov [petsc-dev-bounces at mcs.anl.gov] on behalf of Karl Rupp [[email protected]]
Sent: Wednesday, May 01, 2013 7:52 PM
To: petsc-dev at mcs.anl.gov
Subject: Re: [petsc-dev] PETSc multi-GPU assembly - current status

Hi Florian,

> This is loosely a follow up to [1]. In this thread a few potential ways
> for making GPU assembly work with PETSc were discussed and to me the two
> most promising appeared to be:
> 1) Create a PETSc matrix from a pre-assembled CSR structure, or
> 2) Preallocate a PETSc matrix and get the handle to pass the row
> pointer, column indices and values array to a custom assembly routine.

I still consider these two the most promising (and most general) approaches. On the other hand, to my knowledge the infrastructure hasn't changed much since then. Some additional functionality from CUSPARSE was added, while I added ViennaCL bindings to branch 'next' (i.e. there are still a few corners to polish). This means that you could technically use the much more JIT-friendly OpenCL (and, as a follow-up, complain to NVIDIA and AMD about the higher latencies compared to CUDA).

> We compute local assembly matrices on the GPU and a crucial requirement
> is that the matrix *only* lives in device memory; we want to avoid any
> host <-> device data transfers.

One of the reasons why this hasn't taken off, despite its attractiveness, is that good preconditioners are typically still required in such a setting. Other than the smoothed aggregation in CUSP, there is not much that does *not* require a copy to the host. Particularly when thinking about multi-GPU, you're entering the regime where a good preconditioner on the CPU will still outperform a GPU assembly with a poor preconditioner.

> So far we have been using CUSP with a custom (generated) assembly into
> our own CUSP-compatible CSR data structure for a single GPU. Since CUSP
> doesn't give us multi-GPU solvers out of the box we'd rather use
> existing infrastructure that works rather than rolling our own.

I guess this is good news for you: Steve Dalton will work with us during the summer to extend the CUSP SA-AMG to distributed memory. Other than that, I think there's currently only the functionality from CUSPARSE and the polynomial preconditioners, available through the txpetscgpu package. Aside from that, I have a couple of plans on that front spinning in my head, but I couldn't find the time to implement them yet.

> At the time of [1] supporting GPU assembly in one form or the other was
> on the roadmap, but the implementation direction seemed to not have been
> finally decided. Was there any progress since then or anything to add to
> the discussion? Is there even (experimental) code we might be able to
> use? Note that we're using petsc4py to interface to PETSc.

Did you have a look at snes/examples/tutorials/ex52? I'm currently converting/extending it to OpenCL, so it serves as a playground for a future interface. Matt might have some additional comments on this.

Best regards,
Karli
