Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-31 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> Normally, if it is the OOM that kills a process, you should find a trace of 
> this in the system logs.

I looked in every system log I could find, there was no indication of OOM 
killing it in any system log.

> I do not understand what you mean by reducing the nr of callers from 12 to 6.
> What are these callers ? Is that some threads of the process you are running
> under valgrind ?
> 

I mean valgrind's --num-callers core option. The default is 12, and I had not 
specified it. I tried --num-callers=6 to reduce memory consumption. According 
to the valgrind manual, this option "Specifies the maximum number of entries 
shown in stack traces that identify program locations." By reducing it to 6 I 
was hoping to reduce valgrind's memory consumption in case it really was the 
OOM killer, which I now really doubt.

> And just in case: are you using the last version of Valgrind ?

Yes, I used the latest version of valgrind and many earlier versions.

> You might use "strace" on valgrind to see what is going on at the time
> _exit(0) is called.

I did use 'strace' and dmesg. Neither indicated it was OOM killer.

I did happen to save the strace log when the SIGKILL happened. Here is the part 
around the _exit(0):

read(2040, "R", 1) = 1
gettid() = 3332
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
gettid() = 3332
write(2041, "S", 1) = 1
exit(0) = ?
+++ killed by SIGKILL +++

I don't understand why the strace log shows exit(0) without the underscore; I 
know for a fact that the code called it with the underscore.

The strace log doesn't indicate anything special happening around the _exit(0). 
When I removed the _exit(0) call, the SIGKILL went away.

> You might also start valgrind with some debug trace e.g.  -d -d -d -d -v -v 
> -v -v

Was not aware of this and didn't try it. Don't have time to try it now.

Regards,
Rob

___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-31 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> When running memcheck on a massive monolith embedded executable
> (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> reports are printed. The parent process sees that the return status of
> memcheck is that it was SIGKILLed (status returned in waitpid call is '9').

We found that removing a call to _exit(0) made it so that valgrind is no longer 
SIGKILLed.

Any ideas why removing _exit(0) may get rid of valgrind getting SIGKILLed?

Previously exit(0) was called, without the leading underscore, but we changed 
it to _exit(0) to really make sure no memory was being deallocated at exit. 
This worked well on a different process, which is why we carried it over to 
this one.
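
For readers unfamiliar with the distinction, here is a minimal sketch of the 
general difference between exit() and _exit(): exit() runs handlers registered 
with atexit(), while _exit() terminates the process without running them. It 
does not reproduce the SIGKILL discussed in this thread. The names 
atexit_handler_ran and record_handler_ran are hypothetical, chosen only for 
this illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe; set only in the child process. */
static int handler_fd = -1;

/* atexit() handler: reports back to the parent that it ran. */
static void record_handler_ran(void)
{
    if (handler_fd >= 0) {
        ssize_t n = write(handler_fd, "H", 1);
        (void)n;
    }
}

/*
 * Forks a child that registers an atexit() handler and then terminates with
 * either exit(0) or _exit(0). Returns 1 if the handler ran, 0 if it did not,
 * and -1 on a setup error.
 */
int atexit_handler_ran(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: keep only the write end and register the handler. */
        close(fds[0]);
        handler_fd = fds[1];
        atexit(record_handler_ran);
        if (use_underscore_exit) {
            _exit(0);   /* terminates immediately, handler is skipped */
        }
        exit(0);        /* runs atexit() handlers first */
    }

    /* Parent: read() returns 0 (EOF) if the child never wrote anything. */
    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

Calling atexit_handler_ran(0) exercises the exit() path and 
atexit_handler_ran(1) the _exit() path.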

Even with exit(0) (no underscore), there is not much deallocation going on in 
the exit handlers of this process, so I have serious doubts that 
valgrind/memcheck was using too much memory and invoking the OOM killer.

Using strace and dmesg while we had _exit(0) in use didn't show that OOM killer 
was SIGKILLing valgrind.

I also tried reducing --num-callers from 12 to 6 when using _exit(0), and still 
got the SIGKILL.

Also tried using a system that had an additional 4GByte of memory, and also got 
the SIGKILL there.

So I have many doubts that Valgrind was getting SIGKILLed due to too much 
memory usage.

Don't know why removing _exit(0) got rid of the SIGKILL. Was wondering if 
anyone had any ideas?




[Valgrind-users] massif doesn't detect 'definite' leaks?

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
I tried 'massif' on a simple program shown below where there are "definitely 
lost" leaks.

massif doesn't seem to find "definitely lost" leaks, is this correct?

I tried with both the 3.19.0 and 3.15.0 versions of valgrind/massif, with the 
same result: "definitely lost" leaks are not found.

I launch massif via:

valgrind --tool=massif --sigill-diagnostics=no --error-limit=no \
    --massif-out-file=definitely.%p.massif definitely.elf

When I use memcheck it does find these definite leaks as below:

==29917== 60 bytes in 3 blocks are definitely lost in loss record 1 of 1
==29917==    at 0x402F67C: malloc (vg_replace_malloc.c:381)
==29917==    by 0x80491D1: f2() (definitely.cpp:11)
==29917==    by 0x804920F: f1() (definitely.cpp:17)
==29917==    by 0x8049262: main (definitely.cpp:25)

But massif doesn't find them at all? Is this correct?

When I use massif on a program with "still reachable" it does find the still 
reachable, but it isn't finding definite leaks.

Shouldn't massif also find definite leaks?

The C code for "definitely.elf" is below:

#include <stdlib.h>

void*
f2()
{
  return malloc(20);
}

void
f1()
{
  f2();
}

int
main()
{
  for (int i = 1; i <= 3; i++)
  {
    f1();
  }

  return 0;
}
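
For contrast, here is a minimal sketch of a "still reachable" variant of the 
same allocation pattern. It is hypothetical and is not the program that was 
actually tested with massif; the names kept, f1_keep and allocate_kept_blocks 
are made up for this illustration. Because the pointers stay in a global array 
until the process exits, memcheck would be expected to classify these blocks 
as "still reachable" rather than "definitely lost".

```c
#include <stdlib.h>

#define KEPT_BLOCKS 3

/* Global array: the blocks stay referenced until the process exits. */
static void *kept[KEPT_BLOCKS];

static void *f2(void)
{
    return malloc(20);
}

/* Same call chain as the original sample, but the pointer is kept. */
void f1_keep(int slot)
{
    kept[slot] = f2();
}

/* Allocates KEPT_BLOCKS blocks and returns how many allocations succeeded. */
int allocate_kept_blocks(void)
{
    int ok = 0;
    for (int i = 0; i < KEPT_BLOCKS; i++) {
        f1_keep(i);
        if (kept[i] != NULL) {
            ok++;
        }
    }
    return ok;
}
```

Calling allocate_kept_blocks() from main() and returning without freeing gives 
the reachable-at-exit case.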

Thanks,
Rob


Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> > If you want to know for sure who killed it then strace it while it
> > runs and it should show you who sends the signal but my bet is that
> > it's the kernel.
> 

I tried strace -p <pid> on my process before I triggered its exit. The strace 
output ends with "+++ killed by SIGKILL +++", but I can't find anything about 
who sent it.

> Or possibly watch `dmesg -w` running in another shell.
> 

I tried 'dmesg -w' but it didn't say anything about the SIGKILL. Is there 
something that has to be configured for dmesg to say the source of the SIGKILL?



Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
Thanks Tom.

Do you think I'd have better luck using the "massif" tool? Would "massif" be 
able to avoid the OOM killer?

Or is there a way to reduce the amount of memory that memcheck will use?

-----Original Message-----
From: Tom Hughes
Sent: Friday, August 5, 2022 10:08 AM
To: Bresalier, Rob (Nokia - US/Murray Hill); 
valgrind-users@lists.sourceforge.net
Subject: Re: memcheck is getting SIGKILLed before leak report is output

On 05/08/2022 14:09, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> When running memcheck on a massive monolith embedded executable (237MB 
> stripped, 1.8GiB unstripped), after I stop the executable under 
> valgrind I see the “HEAP SUMMARY” but then valgrind dies before any 
> leak reports are printed. The parent process sees that the return 
> status of memcheck is that it was SIGKILLed (status returned in 
> waitpid call is ‘9’). I am 99.9% sure that the parent process is not the one 
> sending the SIGKILL.
> Is it possible that valgrind SIGKILLs itself? Is there a reason that 
> the linux kernel (Wind River Linux) could be sending a SIGKILL to 
> valgrind/memcheck? I do not see any messages about Out of Memory/OOM 
> killer killing valgrind. Previous experience with this executable is 
> that there are almost 3 million leak reports (most of them are “still 
> reachable”); could that be occupying too much memory? Any ideas/advice 
> to figure out what is going on?

Almost certainly the kernel OOM killed it.

If you want to know for sure who killed it then strace it while it runs and it 
should show you who sends the signal but my bet is that it's the kernel.

> One thing I see in the logs is about “unhandled ioctl 0xa5 with no 
> size/direction hints”. Could this be a trigger for this crash/sigkill?

Not really, no.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> If finding memory leaks is the only goal (for instance, if you are satisfied
> that memcheck has found all the overrun blocks, uninitialized reads, etc.)
> then https://github.com/KDE/heaptrack is the best tool.

Thanks! I didn't know about heaptrack. I will definitely look into that. Does 
heaptrack also show the 'still reachable' types of leaks that memcheck does?

Any chance that the 'massif' tool would survive the OOM killer? This may be 
easier for me to get going as I already have valgrind built.

Is there anything that can be done with memcheck to make it consume less memory?



[Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
When running memcheck on a massive monolith embedded executable (237MB 
stripped, 1.8GiB unstripped), after I stop the executable under valgrind I see 
the "HEAP SUMMARY" but then valgrind dies before any leak reports are printed. 
The parent process sees that the return status of memcheck is that it was 
SIGKILLed (status returned in waitpid call is '9'). I am 99.9% sure that the 
parent process is not the one sending the SIGKILL. Is it possible that valgrind 
SIGKILLs itself? Is there a reason that the linux kernel (Wind River Linux) 
could be sending a SIGKILL to valgrind/memcheck? I do not see any messages 
about Out of Memory/OOM killer killing valgrind. Previous experience with this 
executable is that there are almost 3 million leak reports (most of them are 
"still reachable"); could that be occupying too much memory? Any ideas/advice 
to figure out what is going on?
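
As an illustration of how a parent can interpret the value it gets back from 
waitpid(), here is a minimal sketch using the standard W* macros. It assumes 
the usual Linux status encoding, under which a raw status of 9 means 
"terminated by signal 9 (SIGKILL)". The function names terminating_signal and 
describe_wait_status are hypothetical and are not taken from the parent 
process described above.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Returns the terminating signal number if the waitpid() status says the
 * child was killed by a signal, otherwise 0.
 */
int terminating_signal(int status)
{
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);
    }
    return 0;
}

/* Prints a one-line description of a waitpid() status. */
void describe_wait_status(int status)
{
    if (WIFSIGNALED(status)) {
        printf("killed by signal %d\n", WTERMSIG(status));
    } else if (WIFEXITED(status)) {
        printf("exited with code %d\n", WEXITSTATUS(status));
    } else {
        printf("other status 0x%x\n", (unsigned)status);
    }
}
```

The status only says which signal ended the child; it does not say who sent 
it.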

We don't seem to get the sigkill if valgrind/memcheck is stopped earlier in the 
life of this executable. But to find the leak I need it to run past that point.

I've tried many different versions of valgrind that have worked to find leaks 
on this executable in the past (3.16.1, 3.18.1, 3.19.0) but they all have this 
same issue of being sigkilled before any leaks get printed.

One thing I see in the logs is about "unhandled ioctl 0xa5 with no 
size/direction hints". Could this be a trigger for this crash/sigkill?

Would appreciate any ideas/advice.

Thanks,
Rob


[Valgrind-users] Possible to suppress low block count loss records?

2022-07-13 Thread Bresalier, Rob (Nokia - US/Murray Hill)
We are trying to track down a suspected place in our code that keeps 
accumulating memory that is reported as 'still reachable'.

When I turn on still reachable and run my process for a few hours and then stop 
the process to get the valgrind reports there are over 2.7 million loss records 
which are mostly still reachables. It would take forever for valgrind to print 
this out.

The large majority of "still reachable" that I want to ignore allocate just a 
few blocks. I would like to suppress these and only output "still reachables" 
that allocated 100 blocks or more.

The suppression mechanism only seems to support suppressing particular 
backtraces.

But I would like to suppress based on number of blocks instead, suppress loss 
records with a small number of blocks.

Is it possible to suppress based on block count without patching valgrind?

If not possible without patching valgrind, any hints on where I could patch 
valgrind to accomplish this?
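
To make the idea concrete, here is a minimal sketch of filtering by block 
count as a post-processing step over the text of the leak report. It is not a 
valgrind feature or a valgrind patch, and it does not shorten the time 
valgrind takes to print the records. It assumes loss record header lines of 
the form "==PID== N bytes in M blocks are ... in loss record X of Y", and the 
function name loss_record_blocks is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/*
 * Parses the block count from a loss record header line. Returns the count,
 * or -1 if the line does not look like a loss record header. Thousands
 * separators (commas) inside the number are skipped.
 */
long loss_record_blocks(const char *line)
{
    static const char marker[] = " bytes in ";
    const char *p = strstr(line, marker);
    if (p == NULL) {
        return -1;
    }
    p += sizeof marker - 1;
    if (!isdigit((unsigned char)*p)) {
        return -1;
    }

    long blocks = 0;
    while (isdigit((unsigned char)*p) || *p == ',') {
        if (*p != ',') {
            blocks = blocks * 10 + (*p - '0');
        }
        p++;
    }

    /* The number must be followed by the word "blocks". */
    if (strncmp(p, " blocks", 7) != 0) {
        return -1;
    }
    return blocks;
}
```

A filter built on this would keep a record, together with the backtrace lines 
that follow it, only when the returned count is 100 or more.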

Thanks,
Rob