For one, I think using '--log-file=valgrind-%q{HOSTNAME}-%p.log' might help [to keep the logs from each process separate].
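
A rough sketch of how that option could be set up (untested on this cluster). Valgrind expands %p to the process id and %q{HOSTNAME} to the value of the HOSTNAME environment variable, so the variable has to be exported. As far as I know, valgrind refuses to start if it is not set. VG_LOG_OPT and ./app are placeholder names for illustration only:

```shell
# %q{HOSTNAME} is read by valgrind from the environment. Some shells set
# HOSTNAME without exporting it, so export it explicitly and fall back
# to `uname -n` if it is unset or empty.
HOSTNAME="${HOSTNAME:-$(uname -n)}"
export HOSTNAME

# Single quotes keep the shell from touching % and {}; valgrind itself
# expands %q{HOSTNAME} and %p (the process id) when it opens the log.
VG_LOG_OPT='--log-file=valgrind-%q{HOSTNAME}-%p.log'

# Print (rather than run) the per-process command; ./app is a placeholder.
echo valgrind "$VG_LOG_OPT" ./app
```

With the host name in the file name, two processes on different nodes that happen to get the same pid still write to different logs.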

And I think the point of the TMPDIR recommendation is to use a different directory on each node [the shared directory is where the "pid" clash comes from]. Perhaps "TMPDIR=/tmp" might work, as this should be local disk on each node [vs /var/tmp/, which is probably a TMP shared across nodes].

But then, does PBS or this MPI require a shared TMP?

Satish

On Tue, 15 Dec 2020, Yaqi Wang wrote:

> Fande,
>
> Did you try set TMPDIR for valgrind?
>
> Sent from my iPhone
>
> > On Dec 15, 2020, at 1:23 AM, Barry Smith <bsm...@petsc.dev> wrote:
> >
> >
> > No idea. Perhaps petscmpiexec could be modified so it only ran valgrind
> > on the first 10 ranks? Not clear how to do that. Or valgrind should get a
> > MR that removes this small arbitrary limitation on the number of processes.
> > 576 is so 2000 :-)
> >
> >
> > Barry
> >
> >
> >> On Dec 14, 2020, at 11:59 PM, Fande Kong <fdkong...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> I tried to use valgrind to check if the simulation is valgrind clean
> >> because I saw some random communication fails during the simulation.
> >>
> >> I tried this command-line
> >>
> >> petscmpiexec -valgrind -n 576 ../../../moose-app-oprof -i input.i -log_view -snes_view
> >>
> >>
> >> But I got the following error messages:
> >>
> >> valgrind: Unable to start up properly. Giving up.
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8c3fabf2
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8cac2243
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_da8d30c0
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_877871f9
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_c098953e
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_aa649f9f
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_097498ec
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_bfc534b5
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_7604c74a
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_a1fd96bb
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> >> valgrind: Startup or configuration error:
> >> valgrind: Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> >> valgrind: Unable to start up properly. Giving up.
> >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_bc5492bb
> >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> >> valgrind: Startup or configuration error:
> >> valgrind: Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> >> valgrind: Unable to start up properly. Giving up.
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_b036bdf2
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_105acc43
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_9fb792c0
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_30602bf9
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_21eec73e
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_0b53e99f
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_73e31aec
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_486e8eb5
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_db8c194a
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_839780bb
> >>
> >>
> >> I did a bit search online, and found something related
> >>
> >> https://stackoverflow.com/questions/13707211/what-causes-mkstemp-to-fail-when-running-many-simultaneous-valgrind-processes
> >>
> >> But do not know what is the right way to fix the issue.
> >>
> >> Thanks so much,
> >>
> >> Fande,
> >>
> >
>
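
On the TMPDIR idea and Barry's suggestion of running valgrind on only the first few ranks: one possible approach, independent of petscmpiexec, is a small per-rank wrapper that the launcher starts in place of the application. The sketch below is untested on this machine and rests on several assumptions: vg_wrap, VG_MAX_RANK, VG_TMPDIR and DRY_RUN are made-up names, /tmp is assumed to be node-local, and the rank is assumed to be exported as OMPI_COMM_WORLD_RANK (Open MPI) or PMI_RANK (MPICH/Hydra). Other MPI launchers may use a different variable.

```shell
# Hypothetical per-rank wrapper: run the application under valgrind only
# on the lowest ranks, with valgrind's temp files in node-local scratch.
vg_wrap() {
    # Rank as exported by the launcher (assumption: Open MPI or MPICH).
    # If neither variable is set, the process is treated as rank 0.
    rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}"
    max_rank="${VG_MAX_RANK:-10}"   # valgrind on ranks 0..max_rank-1 only

    if [ "$rank" -lt "$max_rank" ]; then
        # valgrind creates its temp files under $TMPDIR; /tmp is assumed
        # to be local disk on each node.
        TMPDIR="${VG_TMPDIR:-/tmp}"
        export TMPDIR
        # Prepend valgrind to the application command line.
        set -- valgrind "$@"
    fi

    # DRY_RUN=1 prints the command instead of running it.
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        exec "$@"
    fi
}
```

Saved as an executable script (say vg_wrap.sh, again a made-up name) whose last line is `vg_wrap "$@"`, it might be launched as `mpiexec -n 576 ./vg_wrap.sh ../../../moose-app-oprof -i input.i -log_view -snes_view`. Whether PBS or this MPI tolerates a per-node TMPDIR is still the open question above.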