Hi Xiangdong,

Maybe I am misunderstanding you, but it sounds like you want an exact direct 
solution, so I don't understand why you are using an incomplete factorization 
solver for this. SuperLU_DIST (as Mark has suggested) or MUMPS are two such 
packages that provide MPI-parallel sparse LU factorization. If you need GPU 
support, SuperLU_DIST has such support. I don't know the status of our support 
for using the GPU capabilities of this, though -- I assume another developer 
can chime in regarding this.

Note that the ILU provided by "chowiluviennacl" employs a very different 
algorithm than the standard PCILU in PETSc, and you shouldn't expect to get the 
same incomplete factorization. The algorithm is described in this paper by Chow 
and Patel:

https://www.cc.gatech.edu/~echow/pubs/parilu-sisc.pdf
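
To make the difference concrete, here is a minimal C++ sketch of the fixed-point idea on a scalar tridiagonal matrix. It is not the ViennaCL or PETSc code: the names (Tridiag, Factors, fixed_point_ilu0, max_factor_error) and the initial guess are assumptions made for illustration. Each sweep updates every entry of L and U from the previous sweep's values only, so a small number of sweeps generally does not reproduce the sequential ILU(0) factors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar tridiagonal matrix: sub[i] = a(i,i-1), diag[i] = a(i,i),
// sup[i] = a(i,i+1). sub[0] and sup[n-1] are unused.
struct Tridiag {
    std::vector<double> sub, diag, sup;
};

// Factors on the tridiagonal pattern: L has a unit diagonal and
// sub-diagonal l[i]; U has diagonal u[i] and the super-diagonal of A.
struct Factors {
    std::vector<double> l, u;
};

// Sequential LU without pivoting. For a tridiagonal matrix this equals
// ILU(0), because elimination creates no fill-in.
Factors exact_lu(const Tridiag& a) {
    const std::size_t n = a.diag.size();
    assert(n > 0);
    Factors f{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
    f.u[0] = a.diag[0];
    for (std::size_t i = 1; i < n; ++i) {
        f.l[i] = a.sub[i] / f.u[i - 1];
        f.u[i] = a.diag[i] - f.l[i] * a.sup[i - 1];
    }
    return f;
}

// Fixed-point sweeps: every entry is updated from the previous sweep's
// values only, so all updates within one sweep are independent.
// Initial guess (an assumption of this sketch): u = diag(A), l = sub / diag.
Factors fixed_point_ilu0(const Tridiag& a, int sweeps) {
    const std::size_t n = a.diag.size();
    assert(n > 0);
    Factors f{std::vector<double>(n, 0.0), a.diag};
    for (std::size_t i = 1; i < n; ++i) f.l[i] = a.sub[i] / a.diag[i - 1];
    for (int s = 0; s < sweeps; ++s) {
        const Factors old = f;  // values from the previous sweep
        for (std::size_t i = 1; i < n; ++i) {
            f.l[i] = a.sub[i] / old.u[i - 1];
            f.u[i] = a.diag[i] - old.l[i] * a.sup[i - 1];
        }
    }
    return f;
}

// Largest entrywise difference between the swept and the exact factors.
double max_factor_error(const Tridiag& a, int sweeps) {
    const Factors exact = exact_lu(a);
    const Factors approx = fixed_point_ilu0(a, sweeps);
    double err = 0.0;
    for (std::size_t i = 0; i < a.diag.size(); ++i) {
        err = std::max(err, std::fabs(approx.l[i] - exact.l[i]));
        err = std::max(err, std::fabs(approx.u[i] - exact.u[i]));
    }
    return err;
}
```

For the 8x8 matrix with 2 on the diagonal and -1 on the off-diagonals, max_factor_error(a, 3) is clearly nonzero, while in this sketch the error vanishes once the number of sweeps reaches about twice the matrix dimension, because exact values propagate down one row at a time.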

Best regards,
Richard

On 1/15/20 11:39 AM, Xiangdong wrote:
I just submitted the issue: https://gitlab.com/petsc/petsc/issues/535

What I really want is an exact block tridiagonal solver on the GPU. For a 
block tridiagonal system, ILU0 would be the same as LU, so I tried 
chowiluviennacl, but I found that the default parameters do not produce the 
same ILU0 factorization as the CPU one (PCILU). My guess is that if I increase 
the number of sweeps, e.g. chow_patel_ilu_config.sweeps(3), it may give a 
better result. So the option keys would be helpful.

Since Mark mentioned the Superlu's GPU feature, can I use superlu or hypre's 
GPU functionality through PETSc?

Thank you.

Xiangdong

On Wed, Jan 15, 2020 at 2:22 PM Matthew Knepley wrote:
On Wed, Jan 15, 2020 at 1:48 PM Xiangdong wrote:
The ViennaCL manual
http://viennacl.sourceforge.net/doc/manual-algorithms.html

does expose two parameters:

// configuration of preconditioner:
viennacl::linalg::chow_patel_tag chow_patel_ilu_config;
chow_patel_ilu_config.sweeps(3); // three nonlinear sweeps
chow_patel_ilu_config.jacobi_iters(2); // two Jacobi iterations per triangular 'solve' Rx=r

and mentioned that:
The number of nonlinear sweeps and Jacobi iterations need to be set 
problem-specific for best performance.
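
As an illustration of what a fixed number of Jacobi iterations per triangular 'solve' means, here is a small C++ sketch for a unit lower bidiagonal factor L. The function names and the initial guess y = r are assumptions for this example and are not taken from ViennaCL. Each iteration computes y_new = r - (L - I) * y_old, so with too few iterations the result is only an approximation of the forward-substitution solution.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact solve of L y = r by forward substitution, where L is unit lower
// bidiagonal with sub-diagonal l[i] = L(i,i-1); l[0] is unused.
std::vector<double> forward_substitution(const std::vector<double>& l,
                                         const std::vector<double>& r) {
    assert(l.size() == r.size());
    std::vector<double> y(r);
    for (std::size_t i = 1; i < y.size(); ++i) y[i] = r[i] - l[i] * y[i - 1];
    return y;
}

// Approximate solve by Jacobi iterations: y_new = r - (L - I) * y_old.
// All entries of y_new depend only on y_old, so they can be computed
// independently. The initial guess y = r is an assumption of this sketch.
std::vector<double> jacobi_lower_solve(const std::vector<double>& l,
                                       const std::vector<double>& r,
                                       int iters) {
    assert(l.size() == r.size());
    std::vector<double> y(r);
    for (int k = 0; k < iters; ++k) {
        const std::vector<double> old = y;  // values from the previous iteration
        for (std::size_t i = 1; i < y.size(); ++i) y[i] = r[i] - l[i] * old[i - 1];
    }
    return y;
}

// Largest entrywise difference between two vectors of equal length.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d = std::max(d, std::fabs(a[i] - b[i]));
    return d;
}
```

In this bidiagonal setting, k iterations make the first k+1 entries exact, so n-1 iterations reproduce forward substitution for a system of size n, while two iterations leave a visible error for larger n.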

In the PETSc implementation:

    viennacl::linalg::chow_patel_tag ilu_tag;
    ViennaCLAIJMatrix *mat = (ViennaCLAIJMatrix*)gpustruct->mat;
    ilu->CHOWILUVIENNACL =
        new viennacl::linalg::chow_patel_ilu_precond<
            viennacl::compressed_matrix<PetscScalar> >(*mat, ilu_tag);

The defaults are used. Is it possible to expose these two parameters so that 
users can change them through option keys?

Yes. Do you mind making an issue for it? That way we can better keep track.

https://gitlab.com/petsc/petsc/issues

  Thanks,

    Matt

Thank you.

Xiangdong

On Wed, Jan 15, 2020 at 12:40 PM Matthew Knepley wrote:
On Wed, Jan 15, 2020 at 9:59 AM Xiangdong wrote:
Maybe I was not clear. I want to solve the block tridiagonal system Tx=b a few 
times with the same T but different b. On the CPU, I can do this by applying 
ILU0 and reusing the factorization. Since T is block tridiagonal, ILU0 gives 
the same result as LU.
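
A scalar C++ sketch of this factor-once, solve-many pattern is given here for reference. It is only an illustration under simplifying assumptions: scalar entries instead of blocks, no pivoting, and hypothetical names (TridiagLU, factor, solve). In the block case the divisions would become block inverses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// LU factors of a scalar tridiagonal matrix T. No fill-in occurs, so the
// factors live on the same pattern as T and ILU(0) coincides with LU.
struct TridiagLU {
    std::vector<double> l;    // sub-diagonal of the unit lower factor L
    std::vector<double> u;    // diagonal of U
    std::vector<double> sup;  // super-diagonal of U (identical to that of T)
};

// Factor T once. sub[i] = T(i,i-1), diag[i] = T(i,i), sup[i] = T(i,i+1);
// sub[0] and sup[n-1] are unused. No pivoting is performed.
TridiagLU factor(const std::vector<double>& sub,
                 const std::vector<double>& diag,
                 const std::vector<double>& sup) {
    const std::size_t n = diag.size();
    assert(n > 0 && sub.size() == n && sup.size() == n);
    TridiagLU f{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0), sup};
    f.u[0] = diag[0];
    for (std::size_t i = 1; i < n; ++i) {
        f.l[i] = sub[i] / f.u[i - 1];
        f.u[i] = diag[i] - f.l[i] * sup[i - 1];
    }
    return f;
}

// Reuse the stored factors for any right-hand side b.
std::vector<double> solve(const TridiagLU& f, const std::vector<double>& b) {
    const std::size_t n = b.size();
    assert(n > 0 && f.u.size() == n);
    std::vector<double> x(b);
    // Forward substitution with L.
    for (std::size_t i = 1; i < n; ++i) x[i] -= f.l[i] * x[i - 1];
    // Backward substitution with U.
    x[n - 1] /= f.u[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) x[i] = (x[i] - f.sup[i] * x[i + 1]) / f.u[i];
    return x;
}
```

Calling factor once and then solve for each new b costs O(n) per right-hand side, and the result is the exact solution up to rounding.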

I am trying to do the same thing on the GPU with chowiluviennacl, but found 
that the default factorization does not produce the exact factorization for a 
tridiagonal system. Can we tighten the drop tolerance so that it works as LU 
for a tridiagonal system?

There are no options in our implementation. You could look at the ViennaCL 
manual to see if we missed something.

  Thanks,

    Matt

Thank you.

Xiangdong

On Wed, Jan 15, 2020 at 9:41 AM Matthew Knepley wrote:
On Wed, Jan 15, 2020 at 9:36 AM Xiangdong wrote:
Can chowiluviennacl do ilu0?

I need to solve a tridiagonal system directly. If I apply PCILU, I obtain the 
exact solution with preonly + pcilu. However, preonly + chowiluviennacl does 
not provide the exact solution. Are there any option keys to set the 
CHOWILUVIENNACL fill level or drop tolerance, as with the standard ilu?

No. However, such a scheme makes less sense here. This algorithm spawns 
individual threads for individual elements. A drop tolerance does not mean 
less work, only a sparser factor, and that should not matter for a tridiagonal 
system. Levels are also not applicable since you have only 1 level.

  Thanks,

    Matt

Thank you.

Best,
Xiangdong



On Tue, Jan 14, 2020 at 10:05 PM Matthew Knepley wrote:
On Tue, Jan 14, 2020 at 9:56 PM Xiangdong wrote:
Dear Developers,

I have a quick question about the chowiluviennacl. When I tried to use it, I 
found that it only works for np=1, not np>1. However, in the description of 
chowiluviennacl.cxx, it says "the ViennaCL Chow-Patel parallel ILU 
preconditioner".

By parallel, this means shared memory parallelism on the GPU.

I am wondering whether I am using it correctly. Does chowiluviennacl work for 
np>1?

I do not believe so. I do not see why it could not be extended, but that would 
mean writing some more code.

  Thanks,

    Matt

In addition, are there option keys for the chowiluviennacl one can try?
Thank you.

Best,
Xiangdong


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

