Thanks so much, Satish,

On Tue, Dec 15, 2020 at 9:33 AM Satish Balay via petsc-users <petsc-users@mcs.anl.gov> wrote:

> For one - I think using '--log-file=valgrind-%q{HOSTNAME}-%p.log' might
> help [to keep the logs from each process separate]
>
> And I think the TMPDIR recommendation is to have a different value for
> each of the nodes [where the "pid" clash comes from] and perhaps
> "TMPDIR=/tmp" might work

Setting "TMPDIR=/tmp" worked.

Fande

> - as this would be local disk on each node [vs /var/tmp/ - which is
> probably a shared TMP across nodes]
>
> But then - PBS or this MPI requires a shared TMP?
>
> Satish
>
> On Tue, 15 Dec 2020, Yaqi Wang wrote:
>
> > Fande,
> >
> > Did you try setting TMPDIR for valgrind?
> >
> > > On Dec 15, 2020, at 1:23 AM, Barry Smith <bsm...@petsc.dev> wrote:
> > >
> > > No idea. Perhaps petscmpiexec could be modified so it only ran
> > > valgrind on the first 10 ranks? Not clear how to do that. Or valgrind
> > > should get an MR that removes this small arbitrary limitation on the
> > > number of processes. 576 is so 2000 :-)
> > >
> > > Barry
> > >
> > >> On Dec 14, 2020, at 11:59 PM, Fande Kong <fdkong...@gmail.com> wrote:
> > >>
> > >> Hi All,
> > >>
> > >> I tried to use valgrind to check whether the simulation is valgrind
> > >> clean, because I saw some random communication failures during the
> > >> simulation.
> > >>
> > >> I tried this command line:
> > >>
> > >> petscmpiexec -valgrind -n 576 ../../../moose-app-oprof -i input.i -log_view -snes_view
> > >>
> > >> But I got the following error messages:
> > >>
> > >> valgrind: Unable to start up properly. Giving up.
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8c3fabf2
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8cac2243
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_da8d30c0
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_877871f9
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_c098953e
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_aa649f9f
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_097498ec
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_bfc534b5
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_7604c74a
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_a1fd96bb
> > >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> > >> valgrind: Startup or configuration error:
> > >> valgrind: Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> > >> valgrind: Unable to start up properly. Giving up.
> > >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_bc5492bb
> > >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> > >> valgrind: Startup or configuration error:
> > >> valgrind: Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> > >> valgrind: Unable to start up properly. Giving up.
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_b036bdf2
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_105acc43
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_9fb792c0
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_30602bf9
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_21eec73e
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_0b53e99f
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_73e31aec
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_486e8eb5
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_db8c194a
> > >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_839780bb
> > >>
> > >> I did a bit of searching online and found something related:
> > >> https://stackoverflow.com/questions/13707211/what-causes-mkstemp-to-fail-when-running-many-simultaneous-valgrind-processes
> > >>
> > >> But I do not know the right way to fix the issue.
> > >>
> > >> Thanks so much,
> > >>
> > >> Fande,
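
For reference, the two suggestions in this thread (a node-local TMPDIR and a separate valgrind log file per process) can be sketched as a small shell script. This is only a sketch under stated assumptions, not the command that was actually run: the plain `mpiexec ... valgrind` launch line is an assumption standing in for `petscmpiexec -valgrind`, the variable names VALGRIND_LOG and CMD are made up for illustration, and whether TMPDIR and HOSTNAME actually reach every rank depends on the MPI launcher and the batch system. The script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: point valgrind's temp files at node-local storage and
# keep one log file per process. The final command is printed, not run.

# Node-local temp directory. "TMPDIR=/tmp" worked in this thread;
# the PBS default under /var/tmp/pbs.* did not.
TMPDIR=/tmp
export TMPDIR

# valgrind expands %q{HOSTNAME} from the environment, so make sure the
# variable is set and exported (assumption: uname -n gives a usable name).
HOSTNAME=${HOSTNAME:-$(uname -n)}
export HOSTNAME

# %q{HOSTNAME} and %p are expanded by valgrind itself, not by the shell,
# so the single quotes are needed here.
VALGRIND_LOG='valgrind-%q{HOSTNAME}-%p.log'

# Assumed launch line: mpiexec and its -n option are placeholders for
# whatever launcher the site uses; the application path and arguments
# are copied from the original post.
CMD="mpiexec -n 576 valgrind --log-file=$VALGRIND_LOG ../../../moose-app-oprof -i input.i"

echo "TMPDIR=$TMPDIR"
echo "$CMD"
```

Printing the command first makes it easy to check the quoting before submitting a 576-rank job; once it looks right, the last line can be replaced by running the command itself.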