Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-09-01 Thread Tom Hughes via Valgrind-users

On 01/09/2022 01:03, Bresalier, Rob (Nokia - US/Murray Hill) wrote:


> I don't understand why the strace log has exit(0) without the underscore; I know
> for a fact that it was with the underscore.


Because exit() and _exit() are both C library functions that end up making
the SYS_exit system call, and that is what strace shows.

The difference is that _exit doesn't run atexit() handlers or do
any other cleanup before calling SYS_exit.
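The atexit() difference can be seen directly with a small test. This is a minimal sketch, assuming Linux/POSIX. A forked child registers an atexit() handler that writes one byte to a pipe, then leaves through either exit() or _exit(), and the parent checks whether the byte arrived. The names handler_ran() and report_handler_ran() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe; used by the atexit() handler in the child. */
static int g_pipe_wr = -1;

static void report_handler_ran(void)
{
    /* Runs only if the child terminates through exit(), not _exit(). */
    ssize_t n = write(g_pipe_wr, "H", 1);
    (void)n;
}

/* Hypothetical helper: fork a child that registers an atexit() handler
   and then calls exit(0) (use_underscore == 0) or _exit(0) (otherwise).
   Returns 1 if the handler ran, 0 if it did not, -1 on error. */
int handler_ran(int use_underscore)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);                 /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {               /* child */
        close(fds[0]);
        g_pipe_wr = fds[1];
        if (atexit(report_handler_ran) != 0)
            _exit(2);
        if (use_underscore)
            _exit(0);             /* no atexit handlers, no stdio flush */
        exit(0);                  /* atexit handlers run before termination */
    }

    close(fds[1]);                /* parent keeps only the read end */
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);  /* 0 (EOF) if the child wrote nothing */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (n == 1 && c == 'H') ? 1 : 0;
}
```

With this sketch, handler_ran(0) returns 1 and handler_ran(1) returns 0.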

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/


___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-31 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> Normally, if it is the OOM that kills a process, you should find a trace of 
> this in the system logs.

I looked in every system log I could find; there was no indication of the OOM
killer killing it in any of them.

> I do not understand what you mean by reducing the number of callers from 12 to 6.
> What are these callers? Are they some threads of the process you are running
> under valgrind?
> 

I mean the --num-callers core option to valgrind. By default this is 12, and I
didn't specify it. I tried --num-callers=6 to reduce memory consumption. The
valgrind manual describes it as "Specifies the maximum number of entries shown
in stack traces that identify program locations." By reducing it to 6 I was
hoping to reduce valgrind's memory consumption in case it really was the OOM
killer, which I now really doubt.

> And just in case: are you using the latest version of Valgrind?

Yes, I used the latest version of valgrind and many earlier versions.

> You might use "strace" on valgrind to see what is going on at the time
> _exit(0) is called.

I did use 'strace' and dmesg. Neither indicated it was OOM killer.

I did happen to save the strace log when the SIGKILL happened. Here is the part 
around the _exit(0):

read(2040, "R", 1) = 1
gettid() = 3332
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
gettid() = 3332
write(2041, "S", 1) = 1
exit(0) = ?
+++ killed by SIGKILL +++

I don't understand why the strace log has exit(0) without the underscore; I know
for a fact that it was with the underscore.

The strace log doesn't indicate anything special happening around the _exit(0).
When I removed it, the SIGKILL went away.

> You might also start valgrind with some debug trace, e.g. -d -d -d -d -v -v -v -v

I was not aware of this and didn't try it. I don't have time to try it now.

Regards,
Rob



Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-31 Thread Philippe Waroquiers
On Wed, 2022-08-31 at 17:42 +, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> > When running memcheck on a massive monolith embedded executable
> > (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> > valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> > reports are printed. The parent process sees that the return status of
> > memcheck is that it was SIGKILLed (status returned in waitpid call is '9').
> 
> We found that removing a call to _exit(0) made it so that valgrind is no
> longer SIGKILLed.
> 
> Any ideas why using _exit(0) may cause valgrind to get SIGKILLed?
> 
> Previously exit(0) was called (without the leading underscore), but we
> changed it to _exit(0) to make really sure no memory was being deallocated.
> This worked well on a different process, which is why we carried it over to
> this one.
> 
> Even with exit(0) (no underscore), there is not much deallocation going on
> in exit handlers in this process, so I strongly doubt that valgrind/memcheck
> was using too much memory and invoking the OOM killer.
> 
> Using strace and dmesg while we had _exit(0) in use didn't show that the OOM
> killer was SIGKILLing valgrind.
> 
> I also tried reducing the number of callers from 12 to 6 when using
> _exit(0), and still got the SIGKILL.
> 
> I also tried a system with an additional 4 GB of memory, and got the
> SIGKILL there too.
> 
> So I strongly doubt that Valgrind was getting SIGKILLed due to too much
> memory usage.
> 
> I don't know why removing _exit(0) got rid of the SIGKILL. Does anyone have
> any ideas?
Normally, if it is the OOM killer that kills a process, you should find a trace
of this in the system logs.
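Besides the system logs, Linux exposes the score the OOM killer uses when choosing a victim as /proc/<pid>/oom_score. Roughly, the process with the highest score is the preferred victim. Polling this value for the valgrind process while the leak search runs could show whether it is becoming the obvious target. This is a minimal sketch, assuming a Linux /proc filesystem. read_oom_score() is a hypothetical helper, and treating pid 0 as "the calling process" is a choice made only for this sketch.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the kernel's current OOM "badness" score
   for `pid` (0 means the calling process) from /proc/<pid>/oom_score,
   or -1 if the file cannot be opened or parsed. */
long read_oom_score(pid_t pid)
{
    char path[64];
    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/oom_score");
    else
        snprintf(path, sizeof path, "/proc/%ld/oom_score", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long score = -1;
    if (fscanf(f, "%ld", &score) != 1)
        score = -1;
    fclose(f);
    return score;
}
```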

I do not understand what you mean by reducing the number of callers from 12 to 6.
What are these callers? Are they some threads of the process you are
running under valgrind?

And just in case: are you using the latest version of Valgrind?

You might use "strace" on valgrind to see what is going on at the time _exit(0)
is called.
You might also start valgrind with some debug trace, e.g. -d -d -d -d -v -v -v -v

Philippe






Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-31 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> When running memcheck on a massive monolith embedded executable
> (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> reports are printed. The parent process sees that the return status of
> memcheck is that it was SIGKILLed (status returned in waitpid call is '9').

We found that removing a call to _exit(0) made it so that valgrind is no longer
SIGKILLed.

Any ideas why using _exit(0) may cause valgrind to get SIGKILLed?

Previously exit(0) was called (without the leading underscore), but we changed
it to _exit(0) to make really sure no memory was being deallocated. This worked
well on a different process, which is why we carried it over to this one.

Even with exit(0) (no underscore), there is not much deallocation going on in
exit handlers in this process, so I strongly doubt that valgrind/memcheck was
using too much memory and invoking the OOM killer.

Using strace and dmesg while we had _exit(0) in use didn't show that the OOM
killer was SIGKILLing valgrind.

I also tried reducing the number of callers from 12 to 6 when using _exit(0),
and still got the SIGKILL.

I also tried a system with an additional 4 GB of memory, and got the SIGKILL
there too.

So I strongly doubt that Valgrind was getting SIGKILLed due to too much memory
usage.

I don't know why removing _exit(0) got rid of the SIGKILL. Does anyone have any
ideas?




Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-06 Thread Philippe Waroquiers
> 
> > Is there anything that can be done with memcheck to make it consume less 
> > memory?
> 
> No.
In fact, yes :).
Or more precisely, yes, memory can be somewhat reduced :).
See my other mail.

Philippe






Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-06 Thread Philippe Waroquiers
On Fri, 2022-08-05 at 15:34 +, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> > If finding memory leaks is the only goal (for instance, if you are satisfied
> > that memcheck has found all the overrun blocks, uninitialized reads, etc.)
> > then https://github.com/KDE/heaptrack  is the best tool.
> 
> Thanks! I didn't know about heaptrack. I will definitely look into that. Does
> heaptrack also show the 'still reachable' types of leaks that memcheck does?
> 
> Any chance that the 'massif' tool would survive the OOM killer? This may be
> easier for me to get going as I already have valgrind built.
> 
> Is there anything that can be done with memcheck to make it consume less
> memory?
You might be interested in looking at the slides of the FOSDEM presentation
  'Tuning Valgrind for your workload'
https://archive.fosdem.org/2015/schedule/event/valgrind_tuning/attachments/slides/743/export/events/attachments/valgrind_tuning/slides/743/tuning_V_for_your_workload.pdf

There are several things you can do to reduce memcheck memory usage.

Note also that you can run a leak search while your program runs,
either via memcheck client requests or from the shell, using vgdb.
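The client-request route can be sketched as follows. Calling such a checkpoint at a chosen point in the program, before the exit path, would produce a leak report even if the process is later killed. This sketch assumes Valgrind's <valgrind/memcheck.h> header is installed. The fallback macros exist only so the file also builds without it. VALGRIND_DO_LEAK_CHECK, VALGRIND_COUNT_LEAKS and RUNNING_ON_VALGRIND are real Valgrind client-request macros. leak_checkpoint() is a hypothetical name.

```c
#include <stdio.h>

/* Use the real client requests when the Valgrind headers are available. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_MEMCHECK_H 1
#  endif
#endif

#ifndef HAVE_VALGRIND_MEMCHECK_H
/* Fallback no-ops so the sketch compiles without Valgrind installed. */
#  define RUNNING_ON_VALGRIND 0
#  define VALGRIND_DO_LEAK_CHECK do { } while (0)
#  define VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed) \
       do { (leaked) = 0; (dubious) = 0; (reachable) = 0; (suppressed) = 0; } while (0)
#endif

/* Hypothetical helper: ask memcheck for a full leak search right now
   (the report goes to Valgrind's normal log output) and return the
   number of bytes it found definitely or indirectly lost. Outside
   Valgrind the client requests do nothing and this returns 0. */
unsigned long leak_checkpoint(void)
{
    unsigned long leaked = 0, dubious = 0, reachable = 0, suppressed = 0;

    VALGRIND_DO_LEAK_CHECK;
    VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed);

    if (RUNNING_ON_VALGRIND)
        fprintf(stderr, "leak checkpoint: %lu lost, %lu possibly lost, %lu reachable\n",
                leaked, dubious, reachable);
    (void)suppressed;
    return leaked;
}
```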

Philippe






Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-06 Thread Julian Seward




> Is there anything that can be done with memcheck to make it consume less memory?


First of all, figure out whether memcheck got sigkilled because the machine
ran out of space, or because you hit some shell limit/ulimit.  In the former
case, you can then try adding swap space to the machine.  In the latter case
you'll need to mess with the shell's ulimit settings.
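The limits the shell's ulimit builtin sets can also be read from inside a process, which helps tell a limit apart from real memory exhaustion. This is a minimal sketch, assuming Linux/POSIX getrlimit(). print_memory_limits() and print_limit() are hypothetical names. In bash, RLIMIT_AS and RLIMIT_DATA correspond to `ulimit -v` and `ulimit -d`. RLIMIT_CPU is included because, per setrlimit(2), a process that reaches its hard CPU-time limit is sent SIGKILL. That is one possible way a long Valgrind run could be killed without any OOM message.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit; RLIM_INFINITY is what `ulimit` shows as "unlimited". */
static void print_limit(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%-12s soft=unlimited", name);
    else
        printf("%-12s soft=%llu", name, (unsigned long long)rl.rlim_cur);
    if (rl.rlim_max == RLIM_INFINITY)
        printf(" hard=unlimited\n");
    else
        printf(" hard=%llu\n", (unsigned long long)rl.rlim_max);
}

/* Hypothetical helper: report the limits most likely to stop a large
   Valgrind run. Returns 1 if any of these soft limits is finite, else 0. */
int print_memory_limits(void)
{
    static const struct { const char *name; int res; } limits[] = {
        { "RLIMIT_AS",   RLIMIT_AS },   /* ulimit -v: address space */
        { "RLIMIT_DATA", RLIMIT_DATA }, /* ulimit -d: data segment  */
        { "RLIMIT_CPU",  RLIMIT_CPU },  /* ulimit -t: CPU seconds   */
    };
    int any_finite = 0;
    for (size_t i = 0; i < sizeof limits / sizeof limits[0]; i++) {
        struct rlimit rl;
        print_limit(limits[i].name, limits[i].res);
        if (getrlimit(limits[i].res, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
            any_finite = 1;
    }
    return any_finite;
}
```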

You could also try reducing the (data) size of the workload.

Massif and Memcheck are different tools and do largely different things.
Whether or not you can use one or the other depends a lot on the specifics
of what problem you're trying to solve.

J





Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread John Reiser

> Is there anything that can be done with memcheck to make it consume less memory?


No.


Well, you can use the command-line argument "--num-callers=" to reduce the
length of tracebacks that are stored in the "red zones" just before and after
an allocated block. This might help enough if you have zillions of "still
reachable" blocks. But you get shorter tracebacks, which might not give enough
information to find and fix the leak quickly.
If you do not have zillions of "still reachable" blocks, then --num-callers
will not help so much; but it probably would not be needed anyway.






Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread John Reiser

> Does heaptrack also show the 'still reachable' types of leaks that memcheck
> does?


Heaptrack intercepts malloc+free+etc., then logs the parameters, result, and
traceback, but otherwise lets the process's original malloc+free+etc. do the work.
Heaptrack does not notice, and does not care, what you do with the result of
malloc(), except whether or not the pointer returned by malloc() ever gets
passed as an argument to free().

When heaptrack performs analysis, then any result from malloc() that has not
been free()d is a "leak" as far as heaptrack is concerned.  So that includes
what memcheck calls "still reachable" but not (yet) a leak.
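The counting model described above can be sketched with a pair of wrappers. They record every allocation and forget it only when it is freed. Whatever is still recorded at analysis time counts as a leak, whether or not a live pointer to it still exists. This only illustrates the bookkeeping idea. tracked_malloc(), tracked_free(), outstanding_bytes() and the fixed-size table are all hypothetical. Heaptrack itself interposes on the real allocator via LD_PRELOAD and records tracebacks, which this sketch does not.

```c
#include <stdlib.h>

#define MAX_TRACKED 1024

/* Outstanding allocations (pointer and size), in no particular order.
   Allocations beyond MAX_TRACKED are simply not tracked in this sketch. */
static void  *g_ptrs[MAX_TRACKED];
static size_t g_sizes[MAX_TRACKED];
static size_t g_count;

/* Hypothetical wrapper: allocate and remember the block. */
void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL && g_count < MAX_TRACKED) {
        g_ptrs[g_count] = p;
        g_sizes[g_count] = n;
        g_count++;
    }
    return p;
}

/* Hypothetical wrapper: forget the block, then free it. */
void tracked_free(void *p)
{
    for (size_t i = 0; i < g_count; i++) {
        if (g_ptrs[i] == p) {
            g_count--;
            g_ptrs[i] = g_ptrs[g_count];    /* swap-remove */
            g_sizes[i] = g_sizes[g_count];
            break;
        }
    }
    free(p);
}

/* Bytes never passed to tracked_free(): the "leak" total in this model,
   which includes blocks memcheck would call "still reachable". */
size_t outstanding_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < g_count; i++)
        total += g_sizes[i];
    return total;
}
```

A block whose pointer is still held somewhere and a block whose pointer was dropped both stay in the total until tracked_free() is called on them.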


> Any chance that the 'massif' tool would survive the OOM killer? This may be
> easier for me to get going as I already have valgrind built.


Worth a try if you have a day or so to spend.  Like all valgrind tools,
massif relies on emulating the instruction stream, so the basic ~10X
run-time slowdown applies.


> Is there anything that can be done with memcheck to make it consume less memory?


No.




Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> > If you want to know for sure who killed it then strace it while it
> > runs and it should show you who sends the signal but my bet is that
> > it's the kernel.
> 

I tried strace -p  on my process before I triggered its exit. The strace
output ends with "+++ killed by SIGKILL +++", but I don't find anything
about who sent it.

> Or possibly watch `dmesg -w` running in another shell.
> 

I tried 'dmesg -w' but it didn't say anything about the SIGKILL. Is there 
something that has to be configured for dmesg to say the source of the SIGKILL?
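Part of the answer is that SIGKILL cannot be caught. A handler installed with SA_SIGINFO can see who sent a catchable signal (si_pid, si_code, si_uid), but sigaction() refuses to install any handler for SIGKILL. strace likewise only reports the death ("+++ killed by SIGKILL +++") rather than a signal line naming the sender. This is a minimal sketch, assuming Linux/POSIX. watch_sender(), sender_of_self_signal() and record_sender() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t g_sender_pid = -1;

/* Records which process sent the signal; assigning to a
   volatile sig_atomic_t is safe inside a signal handler. */
static void record_sender(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    g_sender_pid = (sig_atomic_t)info->si_pid;
}

/* Hypothetical helper: install record_sender() for `sig`.
   Returns 0 on success, or the errno value from sigaction()
   (EINVAL for SIGKILL and SIGSTOP, which cannot be caught). */
int watch_sender(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_sender;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0 ? 0 : errno;
}

/* Hypothetical helper: send `sig` to ourselves with kill() and return
   the PID the handler saw as the sender (-1 if the handler never ran). */
long sender_of_self_signal(int sig)
{
    g_sender_pid = -1;
    if (kill(getpid(), sig) != 0)
        return -1;
    return (long)g_sender_pid;
}
```

For a catchable signal such as SIGUSR1, the handler records the sender's PID. For SIGKILL, the installation itself fails, so there is nowhere in the victim to record the sender.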



Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
Thanks Tom.

Do you think I'd have better luck using the "massif" tool? Would "massif" be 
able to avoid the OOM killer?

Or is there a way to reduce the amount of memory that memcheck will use?

-Original Message-
From: Tom Hughes  
Sent: Friday, August 5, 2022 10:08 AM
To: Bresalier, Rob (Nokia - US/Murray Hill) ; 
valgrind-users@lists.sourceforge.net
Subject: Re: memcheck is getting SIGKILLed before leak report is output

On 05/08/2022 14:09, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> When running memcheck on a massive monolith embedded executable (237MB 
> stripped, 1.8GiB unstripped), after I stop the executable under 
> valgrind I see the “HEAP SUMMARY” but then valgrind dies before any 
> leak reports are printed. The parent process sees that the return 
> status of memcheck is that it was SIGKILLed (status returned in 
> waitpid call is ‘9’). I am 99.9% sure that the parent process is not the one 
> sending the SIGKILL.
> Is it possible that valgrind SIGKILLs itself? Is there a reason that 
> the linux kernel (Wind River Linux) could be sending a SIGKILL to 
> valgrind/memcheck? I do not see any messages about Out of Memory/OOM 
> killer killing valgrind. Previous experience with this executable is 
> that there are almost 3 million leak reports (most of them are “still 
> reachable”); could that be occupying too much memory? Any ideas/advice
> to figure out what is going on?

Almost certainly the kernel OOM killed it.

If you want to know for sure who killed it then strace it while it runs and it 
should show you who sends the signal but my bet is that it's the kernel.

> One thing I see in the logs is about “unhandled ioctl 0xa5 with no 
> size/direction hints”. Could this be a trigger for this crash/sigkill?

Not really, no.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
> If finding memory leaks is the only goal (for instance, if you are satisfied
> that memcheck has found all the overrun blocks, uninitialized reads, etc.)
> then https://github.com/KDE/heaptrack  is the best tool.

Thanks! I didn't know about heaptrack. I will definitely look into that. Does
heaptrack also show the 'still reachable' types of leaks that memcheck does?

Any chance that the 'massif' tool would survive the OOM killer? This may be 
easier for me to get going as I already have valgrind built.

Is there anything that can be done with memcheck to make it consume less memory?



Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Julian Seward

On 05/08/2022 16:08, Tom Hughes via Valgrind-users wrote:


> If you want to know for sure who killed it then strace it while
> it runs and it should show you who sends the signal but my bet is
> that it's the kernel.


Or possibly watch `dmesg -w` running in another shell.

J




Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread John Reiser

> When running memcheck on a massive monolith embedded executable (237MB
> stripped, 1.8GiB unstripped), after I stop the executable under valgrind I see
> the “HEAP SUMMARY” but then valgrind dies before any leak reports are printed.

If finding memory leaks is the only goal (for instance, if you are satisfied
that memcheck has found all the overrun blocks, uninitialized reads, etc.)
then https://github.com/KDE/heaptrack  is the best tool.  The data-gathering
phase runs in any Linux process using LD_PRELOAD and libunwind.  The analysis
phase runs a GUI under KDE, and/or generates *useful* text reports: leaks by
individual size, leaks by total size for a given traceback, allocations
(leaked or not) by frequency or total size, etc.  I like the text-only
analysis, which avoids the requirement for KDE.  Heaptrack CPU overhead
tends to be around 20% or less, so it does not take forever.  Heaptrack
does require disk space to record data (sequential access only),
so you may need several gigabytes (locally or via the network).




Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Tom Hughes via Valgrind-users

On 05/08/2022 14:09, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> When running memcheck on a massive monolith embedded executable (237MB
> stripped, 1.8GiB unstripped), after I stop the executable under valgrind
> I see the “HEAP SUMMARY” but then valgrind dies before any leak reports
> are printed. The parent process sees that the return status of memcheck
> is that it was SIGKILLed (status returned in waitpid call is ‘9’). I am
> 99.9% sure that the parent process is not the one sending the SIGKILL.
> Is it possible that valgrind SIGKILLs itself? Is there a reason that the
> Linux kernel (Wind River Linux) could be sending a SIGKILL to
> valgrind/memcheck? I do not see any messages about Out of Memory/OOM
> killer killing valgrind. Previous experience with this executable is
> that there are almost 3 million leak reports (most of them are “still
> reachable”); could that be occupying too much memory? Any ideas/advice
> to figure out what is going on?


Almost certainly the kernel OOM killed it.

If you want to know for sure who killed it then strace it while
it runs and it should show you who sends the signal but my bet is
that it's the kernel.

> One thing I see in the logs is about “unhandled ioctl 0xa5 with no
> size/direction hints”. Could this be a trigger for this crash/sigkill?


Not really, no.
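This answer fits how Linux encodes ioctl request numbers. The transfer direction and argument size are packed into the number itself, and 0xa5 has both fields set to zero. Valgrind therefore has nothing to go on when checking that ioctl's argument and can only warn about it. This is a minimal sketch, assuming the generic encoding from <linux/ioctl.h> used on x86 and Arm. A few architectures, such as PowerPC and MIPS, lay the bits out differently. has_size_dir_hints() and describe_ioctl() are hypothetical helpers.

```c
#include <stdio.h>
#include <linux/ioctl.h>

/* Hypothetical helper: print the fields the generic Linux encoding
   packs into an ioctl request number. */
static void describe_ioctl(unsigned long req)
{
    printf("req=0x%lx dir=%lu type=0x%lx nr=0x%lx size=%lu\n",
           req,
           (unsigned long)_IOC_DIR(req),
           (unsigned long)_IOC_TYPE(req),
           (unsigned long)_IOC_NR(req),
           (unsigned long)_IOC_SIZE(req));
}

/* Returns 1 if the request number carries both a transfer direction and
   an argument size, i.e. enough for a tool to guess what memory the
   ioctl reads or writes; 0 otherwise. */
int has_size_dir_hints(unsigned long req)
{
    describe_ioctl(req);
    return _IOC_DIR(req) != _IOC_NONE && _IOC_SIZE(req) != 0;
}
```

For 0xa5 this reports dir=0 and size=0 (no hints). A request built with _IOR('T', 1, int) carries a read direction and a 4-byte size on typical Linux targets.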

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/




[Valgrind-users] memcheck is getting SIGKILLed before leak report is output

2022-08-05 Thread Bresalier, Rob (Nokia - US/Murray Hill)
When running memcheck on a massive monolith embedded executable (237MB 
stripped, 1.8GiB unstripped), after I stop the executable under valgrind I see 
the "HEAP SUMMARY" but then valgrind dies before any leak reports are printed. 
The parent process sees that the return status of memcheck is that it was 
SIGKILLed (status returned in waitpid call is '9'). I am 99.9% sure that the 
parent process is not the one sending the SIGKILL. Is it possible that valgrind 
SIGKILLs itself? Is there a reason that the Linux kernel (Wind River Linux) 
could be sending a SIGKILL to valgrind/memcheck? I do not see any messages 
about Out of Memory/OOM killer killing valgrind. Previous experience with this 
executable is that there are almost 3 million leak reports (most of them are 
"still reachable"); could that be occupying too much memory? Any ideas/advice 
to figure out what is going on?
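The '9' seen by the parent is a raw wait status, best decoded with the standard macros rather than compared directly. On Linux, the raw value for a process killed by a signal without a core dump is just the signal number. This is a minimal sketch, assuming Linux/POSIX. It forks a child that sends SIGKILL to itself, showing that a process can in principle SIGKILL itself, and then decodes the status. child_killed_by() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that sends `sig` to itself, wait for
   it, store the raw status in *raw_status, and return the terminating
   signal number (or -1 if the child exited normally or on error). */
int child_killed_by(int sig, int *raw_status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        kill(getpid(), sig);   /* SIGKILL cannot be caught or ignored */
        _exit(0);              /* reached only if `sig` did not terminate us */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *raw_status = status;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);   /* 9 for SIGKILL */
    return -1;                     /* WIFEXITED: normal exit, see WEXITSTATUS */
}
```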

We don't seem to get the SIGKILL if valgrind/memcheck is stopped earlier in the 
life of this executable. But to find the leak I need it to run past that point.

I've tried many different versions of valgrind that have worked to find leaks 
on this executable in the past (3.16.1, 3.18.1, 3.19.0) but they all have this 
same issue of being sigkilled before any leaks get printed.

One thing I see in the logs is about "unhandled ioctl 0xa5 with no 
size/direction hints". Could this be a trigger for this crash/sigkill?

Would appreciate any ideas/advice.

Thanks,
Rob