I don't think this has anything to do with the specific solver. It happens 
because you are loading both a vector and a matrix from a file, and since you 
have -matload_block_size 1 and -vecload_block_size 10, the default parallel 
layouts chosen for each do not match.

   Remove the -matload_block_size 1 and -vecload_block_size 10 options; they 
don't mean anything here anyway.
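
   If you are loading the right-hand side in your own code rather than through 
ex10, a minimal sketch of a layout-compatible way to do it is below (only a 
sketch, not your code: the file names are placeholders and error checking is 
omitted).

      Mat         A;
      Vec         b;
      PetscViewer viewer;

      /* load the matrix with the default parallel layout */
      PetscViewerBinaryOpen(PETSC_COMM_WORLD,"matrix.bin",FILE_MODE_READ,&viewer);
      MatCreate(PETSC_COMM_WORLD,&A);
      MatSetFromOptions(A);
      MatLoad(A,viewer);
      PetscViewerDestroy(&viewer);

      /* create b with the same row layout as A, then load the values into it */
      MatCreateVecs(A,NULL,&b);
      PetscViewerBinaryOpen(PETSC_COMM_WORLD,"rhs.bin",FILE_MODE_READ,&viewer);
      VecLoad(b,viewer);
      PetscViewerDestroy(&viewer);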

   Does this resolve the problem?

   Barry

> On May 24, 2017, at 3:06 PM, Danyang Su <danyang...@gmail.com> wrote:
> 
> Dear Hong,
> 
> I just tested the same matrix with different numbers of processors. It 
> sometimes gives "ERROR: Arguments are incompatible", depending on the number 
> of processors. It works fine using 4, 8, or 24 processors, but fails with 
> "ERROR: Arguments are incompatible" using 16 or 48 processors. The error 
> information is attached. I tested this on my local computer with 6 cores and 
> 12 threads. Any suggestions on this?
> 
> Thanks,
> Danyang
> 
> On 17-05-24 12:28 PM, Danyang Su wrote:
>> Hi Hong,
>> 
>> Awesome. Thanks for testing the case. I will try your options for the code 
>> and get back to you later.
>> 
>> Regards,
>> 
>> Danyang
>> 
>> On 17-05-24 12:21 PM, Hong wrote:
>>> Danyang :
>>> I tested your data.
>>> Your matrices encountered zero pivots, e.g.
>>> petsc/src/ksp/ksp/examples/tutorials (master)
>>> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin 
>>> -ksp_monitor -ksp_error_if_not_converged
>>> 
>>> [15]PETSC ERROR: Zero pivot in LU factorization: 
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot
>>> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance 2.22045e-14
>>> ...
>>> 
>>> Adding option '-sub_pc_factor_shift_type nonzero', I got
>>> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin 
>>> -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero 
>>> -mat_view ascii::ascii_info
>>> 
>>> Mat Object: 24 MPI processes
>>>   type: mpiaij
>>>   rows=450000, cols=450000
>>>   total: nonzeros=6991400, allocated nonzeros=6991400
>>>   total number of mallocs used during MatSetValues calls =0
>>>     not using I-node (on process 0) routines
>>>   0 KSP Residual norm 5.849777711755e+01
>>>   1 KSP Residual norm 6.824179430230e-01
>>>   2 KSP Residual norm 3.994483555787e-02
>>>   3 KSP Residual norm 6.085841461433e-03
>>>   4 KSP Residual norm 8.876162583511e-04
>>>   5 KSP Residual norm 9.407780665278e-05
>>> Number of iterations =   5
>>> Residual norm 0.00542891
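>>> 
>>> If you want to set this from your code instead of the command line, a rough 
>>> sketch (assuming the default bjacobi/ILU setup, so the shift is set on the 
>>> sub-PCs; ksp is a placeholder for your already-created solver and error 
>>> checking is omitted) is
>>> 
>>>   KSP      *subksp;
>>>   PC       pc, subpc;
>>>   PetscInt i, nlocal, first;
>>> 
>>>   KSPGetPC(ksp,&pc);
>>>   KSPSetUp(ksp);   /* the sub-KSPs exist only after setup */
>>>   PCBJacobiGetSubKSP(pc,&nlocal,&first,&subksp);
>>>   for (i=0; i<nlocal; i++) {
>>>     KSPGetPC(subksp[i],&subpc);
>>>     /* equivalent of -sub_pc_factor_shift_type nonzero */
>>>     PCFactorSetShiftType(subpc,MAT_SHIFT_NONZERO);
>>>   }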
>>> 
>>> Hong
>>> Hi Matt,
>>> 
>>> Yes. The matrix is 450000x450000 and sparse. Hypre takes hundreds of 
>>> iterations, not for all timesteps but for most of them. The matrix is not 
>>> well conditioned, with nonzero entries ranging from 1.0e-29 to 1.0e2. I 
>>> also double-checked whether there is anything wrong in the parallel 
>>> version; however, the matrix is the same as in the sequential version 
>>> except for some round-off error, which is relatively small. Usually, for 
>>> such poorly conditioned matrices, a direct solver should be faster than an 
>>> iterative solver, right? But when I use the sequential iterative solver 
>>> with ILU preconditioning developed almost 20 years ago by others, the 
>>> solver converges fast with an appropriate factorization level. In other 
>>> words, when I use 24 processors with hypre, the speed is almost the same 
>>> as the old sequential iterative solver using 1 processor.
>>> 
>>> I use mostly the default configuration for the general case, with pretty 
>>> good speedup, and I am not sure whether I am missing something for this 
>>> problem.
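>>> 
>>> For reference, by ILU(dt) through hypre I mean roughly the following 
>>> options (only a sketch; the drop tolerance itself is set through the 
>>> -pc_hypre_pilut_* options and the values I actually use are not shown 
>>> here):
>>> 
>>>   -pc_type hypre -pc_hypre_type pilut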
>>> 
>>> Thanks,
>>> 
>>> Danyang
>>> 
>>> On 17-05-24 11:12 AM, Matthew Knepley wrote:
>>>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su <danyang...@gmail.com> wrote:
>>>> Hi Matthew and Barry,
>>>> 
>>>> Thanks for the quick response. 
>>>> I also tried SuperLU and MUMPS; both work, but they are about four times 
>>>> slower than the ILU(dt) preconditioner through hypre with the 24 
>>>> processors I have tested.
>>>> 
>>>> You mean the total time is 4x? And you are taking hundreds of iterates? 
>>>> That seems hard to believe, unless you are dropping
>>>> a huge number of elements. 
>>>> When I look into the convergence information, the method using ILU(dt) 
>>>> still takes 200 to 3000 linear iterations for each Newton iteration. One 
>>>> reason is that this equation is hard to solve. For the general cases, the 
>>>> same method works very well and gets very good speedup.
>>>> 
>>>> I do not understand what you mean here. 
>>>> I also doubt whether I am using hypre correctly for this case. Is there 
>>>> any way to check this, or is it possible to increase the factorization 
>>>> level through hypre?
>>>> 
>>>> I don't know.
>>>> 
>>>>   Matt 
>>>> Thanks,
>>>> 
>>>> Danyang
>>>> 
>>>> On 17-05-24 04:59 AM, Matthew Knepley wrote:
>>>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su <danyang...@gmail.com> wrote:
>>>>> Dear All,
>>>>> 
>>>>> I use PCFactorSetLevels for ILU and PCFactorSetFill for other 
>>>>> preconditioners in my code to help with problems that are hard to solve 
>>>>> with the default options. However, I found that the latter one, 
>>>>> PCFactorSetFill, does not take effect for my problem. The matrices and 
>>>>> rhs as well as the solutions are available from the link below. I obtain 
>>>>> the solution using the hypre preconditioner, and it takes 7 and 38 
>>>>> iterations for matrix 1 and matrix 2, respectively. However, if I use 
>>>>> another preconditioner, the solver just fails on the first matrix. I have 
>>>>> tested this matrix using a native sequential solver (not PETSc) with ILU 
>>>>> preconditioning. If I set the incomplete factorization level to 0, this 
>>>>> sequential solver takes more than 100 iterations. If I increase the 
>>>>> factorization level to 1 or more, it takes only several iterations. This 
>>>>> suggests that the factorization level for these matrices should be 
>>>>> increased. However, when I tried that in PETSc, it just does not work.
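>>>>> 
>>>>> For reference, the calls I am using look roughly like this (only a 
>>>>> sketch, not my exact code; ksp, pc, A, b, and x are placeholders and 
>>>>> error checking is omitted):
>>>>> 
>>>>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
>>>>>   KSPSetOperators(ksp,A,A);
>>>>>   KSPGetPC(ksp,&pc);
>>>>>   PCSetType(pc,PCILU);
>>>>>   PCFactorSetLevels(pc,1);   /* incomplete factorization level k for ILU(k) */
>>>>>   PCFactorSetFill(pc,5.0);   /* expected fill ratio of the factored matrix */
>>>>>   KSPSetFromOptions(ksp);
>>>>>   KSPSolve(ksp,b,x);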
>>>>> 
>>>>> Matrix and rhs can be obtained from the link below.
>>>>> 
>>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R
>>>>> 
>>>>> Would anyone help check whether this can be made to work by increasing 
>>>>> the PC factorization level or fill?
>>>>> 
>>>>> We have ILU(k) supported in serial. However, ILU(dt), which takes a drop 
>>>>> tolerance, only works through Hypre:
>>>>> 
>>>>>   http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
>>>>> 
>>>>> I recommend you try SuperLU or MUMPS, which can both be downloaded 
>>>>> automatically by configure, and
>>>>> do a full sparse LU.
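>>>>> 
>>>>> Once they are installed, a sketch of selecting them from the command line 
>>>>> (the executable and file names below are placeholders; in newer PETSc 
>>>>> releases the option is spelled -pc_factor_mat_solver_type instead of 
>>>>> -pc_factor_mat_solver_package) is
>>>>> 
>>>>>   ./ex10 -f0 matrix.bin -rhs rhs.bin -ksp_type preonly -pc_type lu \
>>>>>          -pc_factor_mat_solver_package mumps
>>>>> 
>>>>> or -pc_factor_mat_solver_package superlu_dist for the parallel SuperLU.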
>>>>> 
>>>>>   Thanks,
>>>>> 
>>>>>     Matt
>>>>>  
>>>>> Thanks and regards,
>>>>> 
>>>>> Danyang
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> What most experimenters take for granted before they begin their 
>>>>> experiments is infinitely more interesting than any results to which 
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
>>>>> 
>>>>> http://www.caam.rice.edu/~mk51/
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> What most experimenters take for granted before they begin their 
>>>> experiments is infinitely more interesting than any results to which their 
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>> 
>>>> http://www.caam.rice.edu/~mk51/
>>> 
>>> 
>> 
> 
> <outscreen_p16.txt><outscreen_p48.txt><outscreen_p8.txt>
