We could even write the "normal" -log_view information before gathering the extra data, so that an early crash cannot prevent the summary from being written.
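As a rough illustration of that ordering (a standalone sketch, not PETSc source; the measure_triad_gbps() helper is hypothetical and stands in for whatever extra measurement would run): the summary is printed first, and the measurement is wrapped in a signal catch so a failure during the extra work cannot take the already-written output with it.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static sigjmp_buf bail;

static void crash_handler(int sig)
{
  (void)sig;
  siglongjmp(bail, 1);              /* abandon only the optional measurement */
}

/* Hypothetical stand-in for the extra measurement (STREAM-style triad). */
static double measure_triad_gbps(size_t n)
{
  double *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b), *c = malloc(n * sizeof *c);
  struct timespec t0, t1;
  if (!a || !b || !c) { free(a); free(b); free(c); raise(SIGSEGV); }
  for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (size_t i = 0; i < n; i++) a[i] = b[i] + 3.0 * c[i];   /* triad kernel */
  clock_gettime(CLOCK_MONOTONIC, &t1);
  double sec   = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
  double bytes = 3.0 * n * sizeof(double);                   /* read b, read c, write a */
  double gbps  = bytes / sec / 1e9 + 0.0 * a[n / 2];         /* use a[] so the loop is not optimized away */
  free(a); free(b); free(c);
  return gbps;
}

int main(void)
{
  /* 1. Write the "normal" summary first (placeholder for the -log_view output). */
  printf("normal log summary written here\n");
  fflush(stdout);

  /* 2. Only then attempt the optional measurement, guarded by a signal catch. */
  signal(SIGSEGV, crash_handler);
  signal(SIGBUS, crash_handler);
  if (sigsetjmp(bail, 1) == 0) {
    printf("estimated triad bandwidth: %.1f GB/s\n", measure_triad_gbps((size_t)1 << 23));
  } else {
    printf("optional measurement failed; the summary above is still intact\n");
  }
  return 0;
}

A real implementation inside PetscLogView() would presumably use PETSc's own error- and signal-handling machinery rather than raw signal()/sigsetjmp(); the point here is only the ordering and the guard.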
> On Feb 27, 2022, at 4:36 PM, Jed Brown <[email protected]> wrote:
>
> That sounds okay; we just need to be able to guarantee that no system errors
> can prevent us from finishing writing the -log_view.
>
> Barry Smith <[email protected]> writes:
>
>> This would be after the user code is complete, PETSc memory has all been
>> freed, and we can put a signal catch around the code to prevent such crashes.
>>
>>> On Feb 27, 2022, at 4:24 PM, Jed Brown <[email protected]> wrote:
>>>
>>> I assume this would be running VecWAXPY on CPU (and GPU) with some empty
>>> ranks? I'd be mildly concerned about allocating GPU memory because a crash
>>> here would be really bad.
>>>
>>> Barry Smith <[email protected]> writes:
>>>
>>>> At PetscLogView() the code could see how long the run was; if it was
>>>> greater than n seconds, it could automatically run a few levels of streams
>>>> (taking presumably well less than a few seconds) and adjust the output
>>>> accordingly. If the user's run takes, for example, 10 minutes, they surely
>>>> don't mind 0.5 seconds to get more useful information.
>>>>
>>>>> On Feb 27, 2022, at 3:41 PM, Jed Brown <[email protected]> wrote:
>>>>>
>>>>> Probably not implied by -log_view alone, but -streams_view or some such
>>>>> doing it automatically would save having to context switch elsewhere to
>>>>> obtain that data.
>>>>>
>>>>> Barry Smith <[email protected]> writes:
>>>>>
>>>>>> We should think about having -log_view automatically run streams on
>>>>>> subsets of ranks and using the resulting information to provide guidance
>>>>>> to users on interpreting the -log_view output, instead of expecting
>>>>>> users to run streams themselves on their system and then figure out
>>>>>> what to do.
>>>>>>
>>>>>>> On Feb 27, 2022, at 9:50 AM, Gong Yujie <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm using GMRES with an ASM preconditioner and ILU(2) as the sub-domain
>>>>>>> solver to solve an elasticity problem. First I used 16 cores to test the
>>>>>>> computation time, then I ran the same code with the same parameters on
>>>>>>> 32 cores, but I got only about a 10% speedup. From the log file I found
>>>>>>> that the computation time of KSPSolve() and MatSolve() decreases only a
>>>>>>> little. My PETSc version is 3.16.0, configured with --with-debugging=0.
>>>>>>> The matrix size is about 7*10^6.
>>>>>>> Some details of the log are shown below:
>>>>>>>
>>>>>>> 16 cores:
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>>>>>>>                    Max Ratio  Max        Ratio  Max      Ratio  Mess    AvgLen   Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
>>>>>>> MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
>>>>>>> MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
>>>>>>> MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>>>>> KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>> KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
>>>>>>> KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
>>>>>>> PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
>>>>>>> PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
>>>>>>> PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
>>>>>>> PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
>>>>>>>
>>>>>>> 32 cores:
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>>>>>>>                    Max Ratio  Max        Ratio  Max      Ratio  Mess    AvgLen   Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>>>> MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
>>>>>>> MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
>>>>>>> MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
>>>>>>> MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>>>>> KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>> KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
>>>>>>> KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
>>>>>>> PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
>>>>>>> PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
>>>>>>> PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
>>>>>>> PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
>>>>>>>
>>>>>>> Hope you can help me!
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Yujie
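The log above is consistent with a memory-bandwidth-bound solve: going from 16 to 32 cores, the per-rank flops in MatMult and MatSolve roughly halve, yet their times barely drop, which is exactly the situation the streams discussion above is meant to diagnose automatically. As a rough way to check bandwidth scaling by hand, here is a minimal MPI sketch (illustrative only; PETSc ships its own streams benchmark in the source tree) that runs a STREAM-style triad on every rank and reports the aggregate rate:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  const size_t n = (size_t)1 << 23;   /* 8M doubles (~64 MB) per array per rank */
  int rank, size;
  double *a, *b, *c, t0, t1, gbps, total;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  a = malloc(n * sizeof *a);
  b = malloc(n * sizeof *b);
  c = malloc(n * sizeof *c);
  for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (size_t i = 0; i < n; i++) a[i] = b[i] + 3.0 * c[i];   /* STREAM-style triad */
  t1 = MPI_Wtime();

  /* read b, read c, write a; the a[n/2] term just keeps the loop from being optimized away */
  gbps = 3.0 * n * sizeof(double) / (t1 - t0) / 1e9 + 0.0 * a[n / 2];

  /* rough aggregate: per-rank timing windows overlap but are not identical */
  MPI_Reduce(&gbps, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("%d ranks: aggregate triad bandwidth ~ %.1f GB/s\n", size, total);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

Compile with mpicc and run with, e.g., mpiexec -n 16 and then -n 32 on the same nodes: if the aggregate number barely increases, the extra ranks cannot make MatSolve() or MatMult() much faster regardless of solver settings.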
