Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> Normally, if it is the OOM that kills a process, you should find a trace of
> this in the system logs.

I looked in every system log I could find; there was no indication of the OOM killer in any of them.

> I do not understand what you mean by reducing the nr of callers from 12 to 6.
> What are these callers ? Is that some threads of the process you are running
> under valgrind ?

I mean the --num-callers core option to valgrind. By default this is 12, and I didn't specify it. I tried using --num-callers=6 to reduce memory consumption. From the valgrind manual, this option "Specifies the maximum number of entries shown in stack traces that identify program locations." By reducing it to 6 I was hoping to reduce valgrind's memory consumption in case it really was the OOM killer, which I really doubt now.

> And just in case: are you using the last version of Valgrind ?

Yes, I used the latest version of valgrind and many earlier versions.

> You might use "strace" on valgrind to see what is going on at the time
> _exit(0) is called.

I did use 'strace' and dmesg. Neither indicated it was the OOM killer. I did happen to save the strace log when the SIGKILL happened. Here is the part around the _exit(0):

    read(2040, "R", 1) = 1
    gettid() = 3332
    rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
    rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], 8) = 0
    rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
    gettid() = 3332
    write(2041, "S", 1) = 1
    exit(0) = ?
    +++ killed by SIGKILL +++

I don't understand why the strace log shows exit(0) without the underscore; I know for a fact that the code called it with the underscore. The strace log doesn't indicate anything special happening around the _exit(0). When I removed it, the SIGKILL went away.

> You might also start valgrind with some debug trace e.g. -d -d -d -d -v -v
> -v -v

I was not aware of this and didn't try it. I don't have time to try it now.

Regards,
Rob
___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> When running memcheck on a massive monolith embedded executable
> (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> reports are printed. The parent process sees that the return status of
> memcheck is that it was SIGKILLed (status returned in waitpid call is '9').

We found that removing a call to _exit(0) made it so that valgrind is no longer SIGKILLed. Any ideas why removing _exit(0) would stop valgrind from getting SIGKILLed?

Previously exit(0) was called, without the leading underscore, but we changed it to _exit(0) to really make sure no memory was being deallocated. This worked well on a different process, so we carried it over to this one.

Even with exit(0) (no underscore), there is not much deallocation going on in exit handlers in this process, so I have a lot of doubt that valgrind/memcheck was using too much memory and invoking the OOM killer.

Using strace and dmesg while we had _exit(0) in use didn't show that the OOM killer was SIGKILLing valgrind.

I also tried reducing the number of callers from 12 to 6 when using _exit(0), and still got the SIGKILL.

I also tried a system that had an additional 4 GB of memory, and got the SIGKILL there too.

So I have many doubts that valgrind was getting SIGKILLed due to too much memory usage.

I don't know why removing _exit(0) got rid of the SIGKILL. Does anyone have any ideas?

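For readers following the exit(0) versus _exit(0) question, the difference that matters at the C library level is that exit() runs atexit handlers and flushes stdio, while _exit() terminates the process without doing either. The sketch below only demonstrates that difference in a forked child; it says nothing about what valgrind does internally, and the names `atexit_handler_ran`, `on_exit_handler`, and `report_fd` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe the child uses to report that its handler ran. */
static int report_fd = -1;

static void on_exit_handler(void)
{
    /* One byte is enough; write() avoids any stdio buffering. */
    ssize_t ignored = write(report_fd, "H", 1);
    (void)ignored;
}

/*
 * Fork a child that registers an atexit handler and then terminates with
 * either exit(0) or _exit(0). Returns 1 if the handler ran, 0 if it did
 * not, and -1 on a setup error.
 */
int atexit_handler_ran(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(on_exit_handler);
        if (use_underscore_exit)
            _exit(0);               /* skips atexit handlers and stdio flushing */
        exit(0);                    /* runs atexit handlers, then flushes stdio */
    }

    /* parent */
    close(fds[1]);
    char byte;
    ssize_t n = read(fds[0], &byte, 1);   /* 0 means EOF: nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

Calling `atexit_handler_ran(0)` exercises the exit(0) path and `atexit_handler_ran(1)` the _exit(0) path, so the two return values can be compared directly.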
[Valgrind-users] massif doesn't detect 'definite' leaks?
I tried 'massif' on the simple program shown below, which has "definitely lost" leaks. massif doesn't seem to find "definitely lost" leaks. Is this correct?

I tried with both the 3.19.0 and 3.15.0 versions of valgrind/massif, with the same result: "definitely lost" leaks are not found.

I launch massif via:

    valgrind --tool=massif --sigill-diagnostics=no --error-limit=no --massif-out-file=definitely.%p.massif definitely.elf

When I use memcheck it does find these definite leaks, as below:

    ==29917== 60 bytes in 3 blocks are definitely lost in loss record 1 of 1
    ==29917==    at 0x402F67C: malloc (vg_replace_malloc.c:381)
    ==29917==    by 0x80491D1: f2() (definitely.cpp:11)
    ==29917==    by 0x804920F: f1() (definitely.cpp:17)
    ==29917==    by 0x8049262: main (definitely.cpp:25)

But massif doesn't find them at all. Is this correct?

When I use massif on a program with "still reachable" memory it does find the still reachable blocks, but it isn't finding definite leaks. Shouldn't massif also find definite leaks?

The C code for "definitely.elf" is below:

    #include <stdlib.h>  /* for malloc */

    void* f2()
    {
        return malloc(20);
    }

    void f1()
    {
        f2();
    }

    int main()
    {
        for (int i = 1; i <= 3; i++)
        {
            f1();
        }
        return 0;
    }

Thanks,
Rob

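The two leak kinds discussed in this message can be put side by side in one small program. This is a sketch, assuming memcheck's usual definitions: a block is "definitely lost" when no pointer to it can be found at exit, and "still reachable" when a pointer to its start still exists. The function and variable names are illustrative and not taken from the program above.

```c
#include <stdlib.h>

/* A global pointer: blocks stored here can still be found at exit. */
static void *kept_block = NULL;

static void *allocate_block(void)
{
    return malloc(20);
}

/* The pointer is dropped immediately, so nothing refers to the block
 * afterwards: memcheck would be expected to call it "definitely lost". */
void lose_block(void)
{
    (void)allocate_block();
}

/* The block is never freed, but the global still points to it at exit:
 * memcheck would be expected to call it "still reachable". */
void keep_block(void)
{
    kept_block = allocate_block();
}

/* Small helper so a caller can check that keep_block() stored a pointer. */
int kept_block_is_set(void)
{
    return kept_block != NULL;
}
```

Calling `lose_block()` and `keep_block()` from `main` and running the result under `valgrind --leak-check=full --show-leak-kinds=all` should list one record of each kind. Massif, by contrast, is a heap profiler: it records how much heap is in use over time and where it was allocated, and it does not classify blocks by reachability.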
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> > If you want to know for sure who killed it then strace it while it
> > runs and it should show you who sends the signal but my bet is that
> > it's the kernel.

I tried strace -p on my process before I triggered its exit. The strace output ends with "+++ killed by SIGKILL +++", but I don't find anything about who sent it.

> Or possibly watch `dmesg -w` running in another shell.

I tried 'dmesg -w' but it didn't say anything about the SIGKILL. Is there something that has to be configured for dmesg to report the source of the SIGKILL?

Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
Thanks Tom.

Do you think I'd have better luck using the "massif" tool? Would "massif" be able to avoid the OOM killer?

Or is there a way to reduce the amount of memory that memcheck will use?

-Original Message-
From: Tom Hughes
Sent: Friday, August 5, 2022 10:08 AM
To: Bresalier, Rob (Nokia - US/Murray Hill); valgrind-users@lists.sourceforge.net
Subject: Re: memcheck is getting SIGKILLed before leak report is output

On 05/08/2022 14:09, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> When running memcheck on a massive monolith embedded executable (237MB
> stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the “HEAP SUMMARY” but then valgrind dies before any
> leak reports are printed. The parent process sees that the return
> status of memcheck is that it was SIGKILLed (status returned in
> waitpid call is ‘9’). I am 99.9% sure that the parent process is not the one
> sending the SIGKILL.
>
> Is it possible that valgrind SIGKILLs itself? Is there a reason that
> the linux kernel (Wind River Linux) could be sending a SIGKILL to
> valgrind/memcheck? I do not see any messages about Out of Memory/OOM
> killer killing valgrind. Previous experience with this executable is
> that there are almost 3 million leak reports (most of them are “still
> reachable”), could that be occupying too much memory. Any ideas/advice
> to figure out what is going on?

Almost certainly the kernel OOM killed it.

If you want to know for sure who killed it then strace it while it runs and it should show you who sends the signal but my bet is that it's the kernel.

> One thing I see in the logs is about “unhandled ioctl 0xa5 with no
> size/direction hints”. Could this be a trigger for this crash/sigkill?

Not really, no.

Tom

-- 
Tom Hughes (t...@compton.nu)
http://compton.nu/

Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> If finding memory leaks is the only goal (for instance, if you are satisfied
> that
> memcheck has found all the overrun blocks, uninitialized reads, etc.) then
> https://github.com/KDE/heaptrack is the best tool.

Thanks! I didn't know about heaptrack. I will definitely look into that. Does heaptrack also show the 'still reachable' types of leaks that memcheck does?

Any chance that the 'massif' tool would survive the OOM killer? This may be easier for me to get going as I already have valgrind built.

Is there anything that can be done with memcheck to make it consume less memory?

[Valgrind-users] memcheck is getting SIGKILLed before leak report is output
When running memcheck on a massive monolith embedded executable (237MB stripped, 1.8GiB unstripped), after I stop the executable under valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak reports are printed. The parent process sees that the return status of memcheck is that it was SIGKILLed (status returned in waitpid call is '9'). I am 99.9% sure that the parent process is not the one sending the SIGKILL.

Is it possible that valgrind SIGKILLs itself? Is there a reason that the Linux kernel (Wind River Linux) could be sending a SIGKILL to valgrind/memcheck? I do not see any messages about the Out of Memory/OOM killer killing valgrind. Previous experience with this executable is that there are almost 3 million leak reports (most of them are "still reachable"); could that be occupying too much memory? Any ideas/advice to figure out what is going on?

We don't seem to get the SIGKILL if valgrind/memcheck is stopped earlier in the life of this executable. But to find the leak I need it to run past that point.

I've tried many different versions of valgrind that have worked to find leaks on this executable in the past (3.16.1, 3.18.1, 3.19.0), but they all have this same issue of being SIGKILLed before any leaks get printed.

One thing I see in the logs is about "unhandled ioctl 0xa5 with no size/direction hints". Could this be a trigger for this crash/SIGKILL?

Would appreciate any ideas/advice.

Thanks,
Rob

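As a point of reference for the waitpid status mentioned above, a parent normally decodes the status with the POSIX macros WIFSIGNALED and WTERMSIG rather than reading the raw integer. The sketch below forks a throwaway child that signals itself, so it stands in for the real parent and for valgrind only by analogy; `child_termination_signal` is a hypothetical helper name.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that sends itself `sig`, wait for it, and return the signal
 * that terminated it. Returns 0 if the child was not terminated by a
 * signal and -1 on error.
 */
int child_termination_signal(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(sig);     /* for SIGKILL this never returns */
        _exit(0);       /* reached only if the signal did not terminate us */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);   /* e.g. 9 for SIGKILL */
    return 0;                      /* not terminated by a signal */
}
```

This confirms which signal ended the child, but the wait status carries no information about who sent it, which is why the thread turns to strace and dmesg for that part.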
[Valgrind-users] Possible to suppress low block count loss records?
We are trying to track down a suspected place in our code that keeps accumulating memory as 'still reachable'.

When I turn on still reachable reporting, run my process for a few hours, and then stop the process to get the valgrind reports, there are over 2.7 million loss records, most of which are still reachable. It would take forever for valgrind to print this out.

The large majority of the "still reachable" records that I want to ignore allocate just a few blocks. I would like to suppress these and only output "still reachable" records that allocated 100 blocks or more.

The suppression mechanism seems to only suppress particular backtraces. I would like to suppress based on the number of blocks instead, that is, suppress loss records with a small number of blocks.

Is it possible to suppress based on block count without patching valgrind?

If it is not possible without patching valgrind, any hints on where I could patch valgrind to accomplish this?

Thanks,
Rob

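One way to approximate a block-count threshold without patching valgrind is to post-process memcheck's text output. The sketch below is only that: it assumes loss-record header lines look like the "60 bytes in 3 blocks are definitely lost in loss record 1 of 1" line quoted elsewhere in this thread, and that large counts may contain digit-group commas. The function names are hypothetical, and this is not a valgrind feature.

```c
#include <ctype.h>
#include <string.h>

/*
 * Extract the block count from a memcheck loss-record header line such as
 *   "==29917== 60 bytes in 3 blocks are definitely lost in loss record 1 of 1"
 * Digit-group commas (e.g. "1,234 blocks") are skipped.
 * Returns -1 if the line is not a loss-record header.
 */
long loss_record_blocks(const char *line)
{
    const char *p = strstr(line, " bytes in ");
    if (p == NULL || strstr(line, " in loss record ") == NULL)
        return -1;
    p += strlen(" bytes in ");

    long blocks = 0;
    int digits = 0;
    for (; isdigit((unsigned char)*p) || *p == ','; p++) {
        if (*p == ',')
            continue;
        blocks = blocks * 10 + (*p - '0');
        digits++;
    }
    if (digits == 0 || strncmp(p, " blocks", 7) != 0)
        return -1;
    return blocks;
}

/* Hypothetical policy from the question: keep records with >= min_blocks. */
int keep_loss_record(const char *line, long min_blocks)
{
    long blocks = loss_record_blocks(line);
    return blocks >= 0 && blocks >= min_blocks;
}
```

A complete filter would also need to drop the backtrace lines that follow each rejected header. Because it runs after the fact, it shortens the report to read but does not reduce the time or memory valgrind itself spends producing 2.7 million records.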