A few things. (1) Our implementation of the LSPP preconditioner is in a PCShell. (2) The algorithm uses a Lanczos algorithm (I think) to compute the polynomial coefficients. However, it is currently limited to SPD matrices; the technique could be extended to nonsymmetric matrices, I believe.
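For concreteness, here is a minimal sketch (not our actual shell code) of what the application side looks like once the coefficients are in hand. The struct LSPPShell, the coefficient array c, and the scratch vector are placeholders; the coefficients c[0..d] are assumed to come from the Lanczos-based setup phase:

/* Sketch of a PCShell apply callback for y = p(A) x, with p of degree d,
 * evaluated by Horner's rule.  Only MatMult and AXPY-type vector updates
 * are used, so it runs wherever those run. */
#include <petscpc.h>

typedef struct {
  Mat          A;      /* operator the polynomial is built from      */
  PetscInt     degree; /* polynomial degree d                        */
  PetscScalar *c;      /* coefficients c[0..d] from the setup phase  */
  Vec          work;   /* scratch vector with the same layout as x   */
} LSPPShell;

static PetscErrorCode PCApply_LSPP(PC pc, Vec x, Vec y)
{
  LSPPShell      *ctx;
  PetscInt        k;
  PetscErrorCode  ierr;

  PetscFunctionBegin;
  ierr = PCShellGetContext(pc, (void **)&ctx);CHKERRQ(ierr);
  ierr = VecCopy(x, y);CHKERRQ(ierr);
  ierr = VecScale(y, ctx->c[ctx->degree]);CHKERRQ(ierr);        /* y = c_d x        */
  for (k = ctx->degree - 1; k >= 0; k--) {
    ierr = MatMult(ctx->A, y, ctx->work);CHKERRQ(ierr);         /* work = A y       */
    ierr = VecWAXPY(y, ctx->c[k], x, ctx->work);CHKERRQ(ierr);  /* y = c_k x + work */
  }
  PetscFunctionReturn(0);
}

Hooking it up is then just PCSetType(pc, PCSHELL), PCShellSetContext(pc, &ctx) and PCShellSetApply(pc, PCApply_LSPP) on the solver's PC.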
It would not be very hard to make LSPP available in PETSc so that it could be used on any piece of hardware, for any matrix. All one needs is an AXPY and a MatMult to do the preconditioner application. The setup phase will require porting our shell code into a PETSc class. I would be happy to share the PCShell and then discuss how to move it into the code.

-Paul

> Hmm, Paul mentioned the following paper a couple of weeks back:
>
> http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6319205&contentType=Conference+Publications
>
> from which I concluded that this is already part of the txpetscgpu package. Paul, this is the case, isn't it?
>
> Best regards,
> Karli
>
>> ________________________________________
>> From: petsc-dev-bounces at mcs.anl.gov [petsc-dev-bounces at mcs.anl.gov] on behalf of Karl Rupp [rupp at mcs.anl.gov]
>> Sent: Wednesday, May 01, 2013 7:52 PM
>> To: petsc-dev at mcs.anl.gov
>> Subject: Re: [petsc-dev] PETSc multi-GPU assembly - current status
>>
>> Hi Florian,
>>
>>> This is loosely a follow-up to [1]. In this thread a few potential ways for making GPU assembly work with PETSc were discussed, and to me the two most promising appeared to be:
>>> 1) Create a PETSc matrix from a pre-assembled CSR structure, or
>>> 2) Preallocate a PETSc matrix and get the handle to pass the row pointer, column indices, and values array to a custom assembly routine.
>>
>> I still consider these two to be the most promising (and general) approaches. On the other hand, to my knowledge the infrastructure hasn't changed a lot since then. Some additional functionality from CUSPARSE was added, while I added ViennaCL bindings to branch 'next' (i.e. there are still a few corners to polish). This means that you could technically use the much more JIT-friendly OpenCL (and, as a follow-up, complain to NVIDIA and AMD about the higher latencies compared to CUDA).
>>
>>> We compute local assembly matrices on the GPU, and a crucial requirement is that the matrix *only* lives in device memory; we want to avoid any host <-> device data transfers.
>>
>> One of the reasons why - despite its attractiveness - this hasn't taken off is that good preconditioners are typically still required in such a setting. Other than the smoothed aggregation in CUSP, there is not much that does *not* require a copy to the host. Particularly when thinking about multi-GPU, you're entering the regime where a good preconditioner on the CPU will still outperform a GPU assembly with a poor preconditioner.
>>
>>> So far we have been using CUSP with a custom (generated) assembly into our own CUSP-compatible CSR data structure for a single GPU. Since CUSP doesn't give us multi-GPU solvers out of the box, we'd rather use existing infrastructure that works than roll our own.
>>
>> I guess this is good news for you: Steve Dalton will work with us during the summer to extend the CUSP-SA-AMG to distributed memory. Other than that, I think there's currently only the functionality from CUSPARSE and polynomial preconditioners, available through the txpetscgpu package.
>>
>> Aside from that, I also have a couple of plans on that front spinning in my head, yet I couldn't find the time to implement them yet.
>>
>>> At the time of [1], supporting GPU assembly in one form or another was on the roadmap, but the implementation direction seemed not to have been finally decided. Was there any progress since then, or anything to add to the discussion? Is there even (experimental) code we might be able to use? Note that we're using petsc4py to interface to PETSc.
>>
>> Did you have a look at snes/examples/tutorials/ex52? I'm currently converting/extending this to OpenCL, so it serves as a playground for a future interface. Matt might have some additional comments on this.
>>
>> Best regards,
>> Karli
>>
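Regarding option (1) in Florian's list above: for host-resident arrays, PETSc can already wrap a pre-assembled CSR structure without copying the values, e.g. via MatCreateSeqAIJWithArrays (MPI variants exist as well). A minimal sketch follows, with placeholder names (WrapCSR, rowptr, colidx, vals) and 0-based indices assumed; note the data stays wherever those pointers point, i.e. on the host, so this does not by itself satisfy the "matrix only lives in device memory" requirement:

/* Sketch: wrap existing 0-based host CSR arrays describing an m-by-n
 * sequential matrix in a PETSc Mat without copying the values.
 * The arrays must outlive the Mat. */
#include <petscmat.h>

static PetscErrorCode WrapCSR(PetscInt m, PetscInt n, PetscInt *rowptr,
                              PetscInt *colidx, PetscScalar *vals, Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, m, n, rowptr, colidx, vals, A);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A device-resident analogue of this call is essentially the piece that is still under discussion in this thread.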
