On Fri, 2014-01-31 at 16:34 +0000, David Carter wrote: Hello David, > Hi Philippe, > > > > Upgraded to 3.9.0 as you suggested and ran with these options: > > > > -v -v -v -d -d -d --trace-sched=yes > --trace-syscalls=yes --trace-signals=yes --quiet --track-origins=yes > --free-fill=7a --child-silent-after-fork=yes --fair-sched=no > > > > After some time, a bunch of processes went into 'pipe_w' status. > These were single-threaded processes. Their logfiles (which were > enormous - hundreds of gigabytes!) all contained this line: > > > > --23014-- SCHED[3]: TRC: YIELD The above trace is strange: the SCHED[3] indicates that this is valgrind thread id 3 which is doing the trace. That seems to indicate that there was (at least at some point in time) more than one thread in the game. The YIELD scheduler trace is explained as: case VEX_TRC_JMP_YIELD: /* Explicit yield, because this thread is in a spin-lock or something. Only let the thread run for a short while longer. Because swapping to another thread is expensive, we're prepared to let this thread eat a little more CPU before swapping to another. That means that short term spins waiting for hardware to poke memory won't cause a thread swap. */ if (dispatch_ctr > 1000) dispatch_ctr = 1000; break;
> > > > Each of the processes showed only one thread: I have already seen in the past that GDB is not always able to show the various threads (not clear when, I suspect this might happen with a static thread lib ?). To double check the nr of threads, you can do one of the following: from the shell: vgdb -l and then for each reported PIDNR: vgdb --pid=<PIDNR> v.info scheduler That will show the state of the valgrind scheduler, and the list of threads known by valgrind, and ask valgrind to produce a stack trace (guest stack trace) for each thread. Or alternatively: ls /proc/<PIDNR>/task will show the list of threads nr at linux level. > #3 0x00000000380daa98 in vgModuleLocal_sema_down > (sema=0x802001830, as_LL=0 '\000') at m_scheduler/sema.c:109 > > #4 0x0000000038083687 in vgPlain_acquire_BigLock_LL > (tid=1, who=0x80956dde0 "") at m_scheduler/scheduler.c:355 > > #5 vgPlain_acquire_BigLock (tid=1, who=0x80956dde0 > "") at m_scheduler/scheduler.c:277 The above indicates that the thread is trying to acquire the valgrind "big lock". When using --fair-sched=no, the big lock is implemented using a pipe. Writing a character on the pipe releases the lock. Reading the character on the pipe is the lock acquisition. If the process is blocked on reading on the pipe, then it looks like the "lock" character was not written back ? Maybe strace -f valgrind and see what is happening with the lock character. The lock character loops over A .. Z and then back to A. Check that the lock is properly released some short time before the trial to re-acquire it. > > > strace showed the same as before (i.e. read on a high-numbered > filehandle, around 1026 or 1027). Someone has suggested that this > would indicate that valgrind is calling dup2 to create new > filehandles. Evidence from lsof also bears this out, showing only 77 > open files for each process. The fd's not relevant to our application > are: In the below, I guess that 1028 and 1029 is the pipe used for the valgrind lock. > > > > COMMAND PID USER FD TYPE DEVICE SIZE > NODE NAME > > memcheck- 2071 nbezj7v 5r FIFO 0,6 > 297571407 pipe > > memcheck- 2071 nbezj7v 7u sock 0,5 > 297780139 can't identify protocol > > memcheck- 2071 nbezj7v 8w FIFO 0,6 > 297571410 pipe > > memcheck- 2071 nbezj7v 9r CHR 1,3 > 3908 /dev/null > > memcheck- 2071 nbezj7v 10r DIR 253,0 4096 > 2 / > > memcheck- 2071 nbezj7v 1025u REG 253,0 637 > 1114475 /tmp/valgrind_proc_2071_cmdline_ad8659c2 (deleted) > > memcheck- 2071 nbezj7v 1026u REG 253,0 256 > 1114491 /tmp/valgrind_proc_2071_auxv_ad8659c2 (deleted) > > memcheck- 2071 nbezj7v 1028r FIFO 0,6 > 297571563 pipe > > memcheck- 2071 nbezj7v 1029w FIFO 0,6 > 297571563 pipe > > memcheck- 2071 nbezj7v 1030r FIFO 253,0 > 1114706 /tmp/vgdb-pipe-from-vgdb-to-2071-by-USERNAME-on-??? > > > > I tried vgdb, but not a lot of luck. After invoking 'valgrind > --vgdb=yes --vgdb-error=0 /path/to/my/exe', I then got this in another > terminal: > > > > $ gdb /path/to/my/exe > "/path/to/my/exe": not in executable format: File > truncated > > (gdb) target remote > | /apps1/pkgs/valgrind-3.9.0/bin/vgdb --pid=30352 > > Remote debugging using > | /apps1/pkgs/valgrind-3.9.0/bin/vgdb --pid=30352 > > relaying data between gdb and process 30352 > > Remote register badly formatted: > T0506:0000000000000000;07:30f0fffe0f000000;10:700aa05d38000000;thread:7690; > > here: > 00000000;07:30f0fffe0f000000;10:700aa05d38000000;thread:7690; > > Try to load the executable by `file' first, > > you may also check `set/show architecture'. > > > > This also caused the vgdb server to hang up. I tried with the 'file' > command made no difference. The "not in executable format" is totally > expected - we run a optimised lightweight "test shell" process which > loads a bunch of heavy debug so's. The "not in executable format" is not expected by me :). What kind of executable is that ? I thought that gdb should be able to "understand" the executables that are launchable on e.g. red hat 5 (if I guessed the distro properly). In the shell, what is 'file /path/to/my/exe' telling ? Maybe you could download and compile the last gdb version (7.6 or 7.7 if it has just been produced) and see if gdb is now more intelligent ? > > > > What is the next stage, can I try different options? Or perhaps > instrument/change the source code in some way in order to figure out > what is happening? As detailed above, you could: * confirm the list of threads in your process (ls /proc/..., vgdb v.info scheduler) * if v.info scheduler works, you might guess what your application is doing from the stack trace(s) * maybe you could debug using a newer gdb * strace -f valgrind .... /path/to/my/exe might also give some lights on what happens with the valgrind big lock. Philippe ------------------------------------------------------------------------------ WatchGuard Dimension instantly turns raw network data into actionable security intelligence. It gives you real-time visual feedback on key security issues and trends. Skip the complicated setup - simply import a virtual appliance and go from zero to informed in seconds. http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users