I'm using Valgrind 3.8.1 to run fuzzing tests on control software for an embedded system. The tests are run on

  Linux pod 3.11.4-201.fc19.x86_64 #1 SMP Thu Oct 10 14:11:18 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Valgrind itself is the standard Fedora 19 build with the following arguments:

  valgrind -v --leak-check=full --leak-resolution=high --show-possibly-lost=yes --track-fds=yes --num-callers=24 --vgdb=no

Normally all this works fine, but last night I did an overnight run on a new dataset and after
about eight hours valgrind started looping. I wasn't sure what to do about that... there were no
diagnostics either from my software or from valgrind. When I attached to the program with strace
I got this endless series:

  clock_gettime(CLOCK_MONOTONIC, {124051, 458677331}) = 0
  rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
  rt_sigtimedwait([HUP INT ILL BUS FPE KILL SEGV TERM STOP WINCH SYS RTMIN RT_1], 0x4028a5e30, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
  gettid() = 30464
  gettid() = 30464
  write(1029, "A", 1) = 1
  gettid() = 30464
  read(1028, "A", 1) = 1
  gettid() = 30464
  clock_gettime(CLOCK_MONOTONIC, {124051, 469072047}) = 0
  rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
  rt_sigtimedwait([HUP INT ILL BUS FPE KILL SEGV TERM STOP WINCH SYS RTMIN RT_1], 0x4028a5e30, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
  gettid() = 30464
  gettid() = 30464
  write(1029, "B", 1) = 1
  gettid() = 30464
  read(1028, "B", 1) = 1
  gettid()

It loops through the English alphabet, one letter at a time, uppercase A to uppercase Z, and then
repeats. TID 30464 is the PID of the valgrind process. File descriptors 1028 and 1029 appear to be
the opposite ends of a pipe, but I have no idea why the alphabet would be cycling through it. The
sigtimedwait() call is polling for one of those signals and not getting it.
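
For reference, the syscall pattern in the trace can be mimicked in a few lines of Python. This is only a sketch of what strace shows (a zero-timeout signal poll that finds nothing, then a one-byte token written to a pipe and read straight back). It is not a claim about what valgrind does internally. The function names and the choice of watched signals are made up for illustration, and signal.sigtimedwait needs Python 3.3+ on Linux.

```python
import os
import signal
import string


def poll_signals(sigset):
    """Poll for a pending signal with a zero timeout.

    Returns None when nothing in sigset is pending, which corresponds to
    the "-1 EAGAIN" result of rt_sigtimedwait in the trace.
    """
    return signal.sigtimedwait(sigset, 0)


def pass_token(write_fd, read_fd, token):
    """Write a one-byte token into the pipe and read it straight back."""
    os.write(write_fd, token)
    return os.read(read_fd, 1)


def run_iterations(count):
    """Repeat the poll + token round trip, cycling the token through A..Z."""
    # Hypothetical subset of the signals listed in the trace.
    watched = {signal.SIGHUP, signal.SIGTERM, signal.SIGWINCH}
    # Block the watched signals so one that arrives stays pending for the
    # poll instead of triggering its default action.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, watched)
    read_fd, write_fd = os.pipe()
    letters = string.ascii_uppercase.encode()
    seen = []
    try:
        for i in range(count):
            if poll_signals(watched) is not None:
                break  # a watched signal really was pending
            idx = i % len(letters)
            seen.append(pass_token(write_fd, read_fd, letters[idx:idx + 1]))
    finally:
        os.close(read_fd)
        os.close(write_fd)
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return b"".join(seen)
```

Running run_iterations(n) under strace should show the same alternation of rt_sigtimedwait, write and read calls, which at least confirms that each loop iteration makes forward progress rather than blocking on the pipe.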
We have no such calls explicitly in our code, and no RT calls at all, which leads me to think this
is valgrind's internal code, but of course it could still be measuring something within the main
program.

I ran gstack on it but got only:

  gstack 30464
  #0 0x0000000038033cb0 in vgMemCheck_helperc_LOADV64le ()
  #1 0x0000000403290318 in ?? ()
  #2 0x00000004028a5f00 in ?? ()
  #3 0x0000000000000000 in ?? ()

which doesn't tell me a whole lot, and I don't find vgMemCheck_helperc anywhere in the code, so I
guess it's something synthesized at compile time.

My test is scripted so I can re-run it easily (given time), but could use some pointers on how to
approach this and get more information. I imagine that rebuilding valgrind with -g might help with
gstack, but is there a way I can signal or stop Valgrind and have it display a stack trace? It
looks like SIGINT is blocked, and sending TERM merely stops the process and I get nothing.

Thanks for any pointers!

Tim

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users