I don't think this has anything to do with the specific solver. You are loading both a vector and a matrix from a file, and because you have -matload_block_size 1 and -vecload_block_size 10, the default parallel layouts they receive are not the same.
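A minimal sketch (not code from the thread; ex10 has its own loading logic) of one way to keep the two layouts consistent: load the matrix first, derive the right-hand-side vector from it with MatCreateVecs() so the row distributions match, and only then call VecLoad(). The file names are simply the ones used in Hong's runs quoted below.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    Vec            b;
    PetscViewer    viewer;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    /* Load the matrix with whatever parallel layout PETSc picks by default. */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "a_react_in_2.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatLoad(A, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    /* Derive b from A so its row distribution matches the matrix rows,
       then load the values into that already-fixed layout. */
    ierr = MatCreateVecs(A, NULL, &b);CHKERRQ(ierr);
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "b_react_in_2.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
    ierr = VecLoad(b, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }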
Remove the -matload_block_size 1 and -vecload_block_size 10; they don't mean anything here anyway. Does this resolve the problem?

Barry

> On May 24, 2017, at 3:06 PM, Danyang Su <danyang...@gmail.com> wrote:
>
> Dear Hong,
>
> I just tested with different numbers of processors for the same matrix. It sometimes got "ERROR: Arguments are incompatible" for different numbers of processors. It works fine using 4, 8, or 24 processors, but failed with "ERROR: Arguments are incompatible" using 16 or 48 processors. The error information is attached. I tested this on my local computer with 6 cores / 12 threads. Any suggestion on this?
>
> Thanks,
> Danyang
>
> On 17-05-24 12:28 PM, Danyang Su wrote:
>> Hi Hong,
>>
>> Awesome. Thanks for testing the case. I will try your options for the code and get back to you later.
>>
>> Regards,
>> Danyang
>>
>> On 17-05-24 12:21 PM, Hong wrote:
>>> Danyang:
>>> I tested your data. Your matrices encountered zero pivots, e.g.
>>>
>>> petsc/src/ksp/ksp/examples/tutorials (master)
>>> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged
>>>
>>> [15]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot
>>> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance 2.22045e-14
>>> ...
>>>
>>> Adding the option '-sub_pc_factor_shift_type nonzero', I got
>>>
>>> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info
>>>
>>> Mat Object: 24 MPI processes
>>>   type: mpiaij
>>>   rows=450000, cols=450000
>>>   total: nonzeros=6991400, allocated nonzeros=6991400
>>>   total number of mallocs used during MatSetValues calls =0
>>>     not using I-node (on process 0) routines
>>>   0 KSP Residual norm 5.849777711755e+01
>>>   1 KSP Residual norm 6.824179430230e-01
>>>   2 KSP Residual norm 3.994483555787e-02
>>>   3 KSP Residual norm 6.085841461433e-03
>>>   4 KSP Residual norm 8.876162583511e-04
>>>   5 KSP Residual norm 9.407780665278e-05
>>> Number of iterations = 5
>>> Residual norm 0.00542891
>>>
>>> Hong
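The '-sub_pc_factor_shift_type nonzero' option Hong adds above can also be set from code. A minimal sketch, assuming the default block-Jacobi/ILU preconditioner; this is not code from the thread, just the programmatic equivalent of pushing the same option into the options database before KSPSetFromOptions() reads it.

  #include <petscksp.h>

  /* Sketch: solve A x = b, asking each local ILU factorization to shift the
     diagonal when it hits a (near-)zero pivot instead of erroring out. */
  static PetscErrorCode SolveWithShiftedILU(Mat A, Vec b, Vec x)
  {
    KSP            ksp;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* NULL = global options database (PETSc >= 3.7 signature). */
    ierr = PetscOptionsSetValue(NULL, "-sub_pc_factor_shift_type", "nonzero");CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }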
>>>
>>> Hi Matt,
>>>
>>> Yes. The matrix is 450000x450000 and sparse. Hypre takes hundreds of iterates, not for all but for most of the timesteps. The matrix is not well conditioned, with nonzero entries ranging from 1.0e-29 to 1.0e2. I also double-checked whether there is anything wrong in the parallel version; however, the matrix is the same as in the sequential version except for some round-off error, which is relatively very small. Usually for such ill-conditioned matrices a direct solver should be faster than an iterative solver, right? But when I use the sequential iterative solver with ILU preconditioning, developed almost 20 years ago by others, the solver converges fast with an appropriate factorization level. In other words, when I use 24 processors with hypre, the speed is almost the same as the old sequential iterative solver using 1 processor.
>>>
>>> I use most of the default configuration for the general case, with pretty good speedup, and I am not sure if I missed something for this problem.
>>>
>>> Thanks,
>>> Danyang
>>>
>>> On 17-05-24 11:12 AM, Matthew Knepley wrote:
>>>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su <danyang...@gmail.com> wrote:
>>>> Hi Matthew and Barry,
>>>>
>>>> Thanks for the quick response. I also tried SuperLU and MUMPS; both work, but they are about four times slower than the ILU(dt) preconditioner through hypre with the 24 processors I have tested.
>>>>
>>>> You mean the total time is 4x? And you are taking hundreds of iterates? That seems hard to believe, unless you are dropping a huge number of elements.
>>>>
>>>> When I look into the convergence information, the method using ILU(dt) still takes 200 to 3000 linear iterations for each Newton iteration. One reason is that this equation is hard to solve. As for the general cases, the same method works awesome and gets very good speedup.
>>>>
>>>> I do not understand what you mean here.
>>>>
>>>> I also doubt whether I use hypre correctly for this case. Is there any way to check this problem, or is it possible to increase the factorization level through hypre?
>>>>
>>>> I don't know.
>>>>
>>>>    Matt
>>>>
>>>> Thanks,
>>>> Danyang
>>>>
>>>> On 17-05-24 04:59 AM, Matthew Knepley wrote:
>>>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su <danyang...@gmail.com> wrote:
>>>>> Dear All,
>>>>>
>>>>> I use PCFactorSetLevels for ILU and PCFactorSetFill for other preconditioning in my code to help solve problems that are hard to solve with the default options. However, I found that the latter one, PCFactorSetFill, does not take effect for my problem. The matrices and rhs as well as the solutions are available from the link below. I obtain the solution using the hypre preconditioner, and it takes 7 and 38 iterations for matrix 1 and matrix 2. However, if I use another preconditioner, the solver just fails on the first matrix. I have tested this matrix using the native sequential solver (not PETSc) with ILU preconditioning. If I set the incomplete factorization level to 0, this sequential solver takes more than 100 iterations. If I increase the factorization level to 1 or more, it takes only several iterations. This reminds me that the PC factorization level for these matrices should be increased. However, when I tried it in PETSc, it just does not work.
>>>>>
>>>>> Matrix and rhs can be obtained from the link below.
>>>>>
>>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R
>>>>>
>>>>> Would anyone help check whether this can be made to work by increasing the PC factor level or fill?
>>>>>
>>>>> We have ILU(k) supported in serial. However, ILU(dt), which takes a tolerance, only works through Hypre:
>>>>>
>>>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
>>>>>
>>>>> I recommend you try SuperLU or MUMPS, which can both be downloaded automatically by configure, and do a full sparse LU.
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>> Thanks and regards,
>>>>>
>>>>> Danyang
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> http://www.caam.rice.edu/~mk51/
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> http://www.caam.rice.edu/~mk51/
>
> <outscreen_p16.txt><outscreen_p48.txt><outscreen_p8.txt>
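Back to the original question about PCFactorSetLevels: in parallel the default preconditioner is block Jacobi with ILU(0) on each block, so the fill level has to reach the sub-preconditioners, which is exactly what the '-sub_' prefix in Hong's options does. A minimal sketch of the programmatic route (not code from the thread; the level value is only an illustration):

  #include <petscksp.h>

  /* Sketch: block-Jacobi ILU(k) in parallel, setting the factorization
     level on every local sub-preconditioner. */
  static PetscErrorCode SolveWithSubdomainILUk(Mat A, Vec b, Vec x, PetscInt level)
  {
    KSP            ksp, *subksp;
    PC             pc, subpc;
    PetscInt       i, nlocal, first;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);                      /* creates the sub-KSPs */

    ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
    for (i = 0; i < nlocal; i++) {
      ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
      ierr = PCFactorSetLevels(subpc, level);CHKERRQ(ierr);  /* ILU(level) on each block */
    }

    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The same thing is available from the command line as '-sub_pc_factor_levels <k>'. Matt's full-LU suggestion corresponds to '-pc_type lu' together with the option that selects MUMPS or SuperLU_DIST as the factorization package (the exact option name differs between PETSc releases).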