Could even write the "normal" -log_view information before gathering the streams
data, so that an early crash cannot lose it.
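
Very roughly, something like the sketch below (not a real implementation: the
StreamsProbe name, the 10-second threshold, and the per-rank vector length are
placeholders; PetscPushSignalHandler()/PetscPopSignalHandler(), PetscTime(), and
VecWAXPY() are the existing calls a real version could build on):

#include <petscvec.h>

/* Hypothetical handler: record that something went wrong; a real version
   would long-jump back so the -log_view text already written is not lost. */
static PetscErrorCode StreamsSignalHandler(int sig, void *ctx)
{
  *(PetscBool *)ctx = PETSC_TRUE;
  return 0;
}

/* Sketch of a guarded, time-thresholded bandwidth probe that PetscLogView()
   could run after writing its normal output; CPU vectors only, no GPU
   allocation.  The 10 s threshold and the per-rank length are placeholders. */
static PetscErrorCode StreamsProbe(MPI_Comm comm, PetscLogDouble elapsed)
{
  Vec            w, x, y;
  PetscInt       n = 1000000;                 /* per-rank vector length */
  PetscBool      crashed = PETSC_FALSE;
  PetscMPIInt    size;
  PetscLogDouble t0, t1;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (elapsed < 10.0) PetscFunctionReturn(0); /* short run: skip the probe */
  ierr = MPI_Comm_size(comm, &size);CHKERRMPI(ierr);
  ierr = PetscPushSignalHandler(StreamsSignalHandler, &crashed);CHKERRQ(ierr);
  ierr = VecCreateMPI(comm, n, PETSC_DETERMINE, &x);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &w);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = VecSet(y, 2.0);CHKERRQ(ierr);
  ierr = MPI_Barrier(comm);CHKERRMPI(ierr);
  ierr = PetscTime(&t0);CHKERRQ(ierr);
  ierr = VecWAXPY(w, 2.0, x, y);CHKERRQ(ierr); /* triad kernel: w = 2*x + y */
  ierr = MPI_Barrier(comm);CHKERRMPI(ierr);
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  if (!crashed) {
    /* three vectors of PetscScalar are streamed per entry */
    ierr = PetscPrintf(comm, "Approximate achievable memory bandwidth: %g GB/s\n",
                       3.0*n*size*sizeof(PetscScalar)/(t1 - t0)*1.e-9);CHKERRQ(ierr);
  }
  ierr = VecDestroy(&w);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = PetscPopSignalHandler();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}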


> On Feb 27, 2022, at 4:36 PM, Jed Brown <[email protected]> wrote:
> 
> That sounds okay; we just need to be able to guarantee that no system errors
> can prevent us from finishing writing the -log_view.
> 
> Barry Smith <[email protected]> writes:
> 
>>  This would be after the user code is complete and all PETSc memory has been
>> freed, and we can put a signal catch around the code to prevent such crashes.
>> 
>>> On Feb 27, 2022, at 4:24 PM, Jed Brown <[email protected]> wrote:
>>> 
>>> I assume this would be running VecWAXPY on CPU (and GPU) with some empty 
>>> ranks? I'd be mildly concerned about allocating GPU memory because a crash 
>>> here would be really bad.
>>> 
>>> Barry Smith <[email protected]> writes:
>>> 
>>>>  At PetscLogView() the code could see how long the run was; if it was
>>>> greater than n seconds, it could automatically run a few levels of streams
>>>> (presumably taking well under a few seconds) and adjust the output
>>>> accordingly. If the user runs for, say, 10 minutes, they surely don't mind
>>>> 0.5 seconds to get more useful information.
>>>> 
>>>> 
>>>> 
>>>>> On Feb 27, 2022, at 3:41 PM, Jed Brown <[email protected]> wrote:
>>>>> 
>>>>> Probably not implied by -log_view alone, but -streams_view or some such 
>>>>> doing it automatically would save having to context switch elsewhere to 
>>>>> obtain that data.
>>>>> 
>>>>> Barry Smith <[email protected]> writes:
>>>>> 
>>>>>> We should think about having -log_view automatically run streams on
>>>>>> subsets of ranks and use the resulting information to provide guidance
>>>>>> to users on interpreting the -log_view output, instead of expecting
>>>>>> users to run streams themselves on their system and then figure out
>>>>>> what to do.
>>>>>> 
>>>>>>> On Feb 27, 2022, at 9:50 AM, Gong Yujie <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm using GMRES with an ASM preconditioner and an ILU(2) sub-domain
>>>>>>> solver to solve an elasticity problem. First I ran with 16 cores to
>>>>>>> measure the computation time, then ran the same code with the same
>>>>>>> parameters on 32 cores, but I only got about a 10% speedup. From the
>>>>>>> log files I found that the times for KSPSolve() and MatSolve() decrease
>>>>>>> only a little. My PETSc version is 3.16.0, configured with
>>>>>>> --with-debugging=0. The matrix size is about 7*10^6. Some details from
>>>>>>> the logs are shown below:
>>>>>>> 
>>>>>>> 16-cores:
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>>>>>>>                 Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
>>>>>>> MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
>>>>>>> MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
>>>>>>> MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>>>>> KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>> KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
>>>>>>> KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
>>>>>>> PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
>>>>>>> PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
>>>>>>> PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
>>>>>>> PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
>>>>>>> 
>>>>>>> 32-cores:
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>>>>>>>                 Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
>>>>>>> MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
>>>>>>> MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
>>>>>>> MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>>>>> KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>> KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
>>>>>>> KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
>>>>>>> PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
>>>>>>> PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
>>>>>>> PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
>>>>>>> PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
>>>>>>> 
>>>>>>> Hope you can help me!
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Yujie
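
For reference, the speedups implied by the two tables above (ratios of the Max
times, 16 cores vs. 32 cores):

  KSPSolve:  251.68 s / 216.80 s ~ 1.16x  (about 16% faster)
  MatSolve:  198.68 s / 178.00 s ~ 1.12x  (about 12% faster)
  PCApply:   202.96 s / 182.35 s ~ 1.11x  (about 11% faster)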
