Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-13 Thread Philippe Waroquiers
On Tue, 2019-08-13 at 16:54 +0300, Michael Widenius wrote:

> I can try that at once. Thanks.
> Testing...
> hm.. It would be good to describe in README_DEVELOPERS how to generate the
> autoconfigure scripts.  Right now it says to run 'make dist', which will
> not work out of the box.
README has a section "Building and installing it".
It looks like README_DEVELOPERS should start with the same title,
and reference the README section.

> Same problem with README.solaris. Please add that one should run first
> 'autogen.sh' and then configure.
README.solaris has a pointer to the README file in the "Compilation" section,
but again that can be made more precise.

I will improve both files to add the relevant reference to README.



> Currently the kill signal is never sent to the process and the process
> continues to run.
Yes, that is bug 409141 (fixed in the git version).

> 
> > You can avoid this interception by instead doing something like
> >     system("kill -9 me");
> > as no way valgrind will be able to intercept this.
> > (of course, me has to be the pid of the calling process).
> 
> In this case the process (here mysqld) needs to know who its
> 'father' process is.
> This is a bit hard to provide to mysqld, as we only know the pid after
> valgrind is started with mysqld as an option.  Another issue is that
> mysqld doesn't know if it's run under valgrind or not and it should
> only kill its parent if it's valgrind.
Not sure I understand, so I will explain what I understood:
You used to have in a process some lines of code sending SIGKILL
to itself, so as to commit suicide ("hard exit").
Due to bug 409141, when running under valgrind, this hangs.
You have worked around the problem by sending SIGFPE instead, but you would
like to go back to SIGKILL.
With the fix to 409141, sending SIGKILL again really kills
the process.

The remaining problem is that SIGKILL is still causing a leak search
to be done.

To really have a "hard exit", you can replace the lines of code sending SIGKILL
with the following 3 lines:
  char cmd[1000];
  snprintf(cmd, sizeof cmd, "kill -9 %d", (int) getpid());
  system(cmd);
With this, you really have a "hard exit".
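
For reference, here is a slightly fuller sketch of the same idea. The names
hard_exit() and signal_ending_child() are made up for illustration; they are
not part of valgrind or MariaDB. It assumes a POSIX system where system() can
start a shell that provides a kill command. The second function only exists
to show, in a forked child, which signal ends the process.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask an external "kill" to send SIGKILL to the
   calling process, so the signal is not sent by a kill() syscall made
   from inside this process. */
static void hard_exit(void)
{
    char cmd[64];
    int n = snprintf(cmd, sizeof cmd, "kill -9 %ld", (long) getpid());

    /* 64 bytes is always enough for "kill -9 " plus a pid. */
    assert(n > 0 && (size_t) n < sizeof cmd);

    if (system(cmd) == -1)
        perror("system");

    /* Only reached if the shell or the kill command could not be run. */
    _exit(1);
}

/* Illustration only: run hard_exit() in a forked child and return the
   number of the signal that terminated it, or -1 on any other outcome. */
static int signal_ending_child(void)
{
    int status = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        hard_exit();            /* never returns */

    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In the real program only hard_exit() would be needed, called at the place
where kill(getpid(), SIGKILL) is called today.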

> (Assuming I understood correctly what you meant with  calling process;
> I think you mean the 'valgrind' process in this case)
Note that when a process "runs under valgrind", there is still only one
single process, which contains both the valgrind code and the code of
the program being run "under valgrind".
So I am not sure I understand why there is anything
special to do regarding the parent process when running under valgrind.
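
On the point that mysqld does not know whether it runs under valgrind: the
RUNNING_ON_VALGRIND macro from valgrind/valgrind.h can tell it. Below is a
sketch of how the two could be combined, under some assumptions: the helper
names kill_self_hard() and signal_ending_child() are invented for this
example, the __has_include fallback is only there so the code also builds
without the valgrind headers, and a shell with a kill command is available.
Since there is only one process, getpid() is the only pid needed.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Use the real client request macro when the valgrind headers are
   installed; otherwise behave as "not running under valgrind". */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Hypothetical helper: under valgrind, let an external "kill" deliver
   SIGKILL; otherwise (or if that fails) send it directly. */
static void kill_self_hard(void)
{
    if (RUNNING_ON_VALGRIND) {
        char cmd[64];
        int n = snprintf(cmd, sizeof cmd, "kill -9 %ld", (long) getpid());

        assert(n > 0 && (size_t) n < sizeof cmd);
        if (system(cmd) == -1)
            perror("system");
    }

    kill(getpid(), SIGKILL);
    _exit(1);   /* not reached: SIGKILL cannot be caught or ignored */
}

/* Illustration only: run fn() in a forked child and return the number
   of the signal that terminated it, or -1 on any other outcome. */
static int signal_ending_child(void (*fn)(void))
{
    int status = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        fn();
        _exit(0);   /* only if fn() returned */
    }

    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Outside valgrind this is just the usual kill(getpid(), SIGKILL), so the
normal code path stays unchanged.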

> 
> > > Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> > > and attach a minimal test case if at all possible.
> 
> Will try to create a test case, but it's not that easy as for simple
> programs valgrind seems to pass KILL forwards.  Maybe this is only a
> problem with threaded programs.
> 
> https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg06862.html
> seems to highlight the same problem.
As far as I can see, both that message and your problem of hanging after sending
SIGKILL are bug 409141.
And you have double-checked that the git version also fixes the hanging bug.
So, IMO, there is no need for another reproducer.


> I have now tested with 10.3.16.GIT from today and ran it with
> --trace-signals=yes.
> In this case the SIGKILL is sent to the process and the process is
> killed so it looks
> like the bug is fixed. Great!
> However one problem still remains. After the kill, we still get a
> report of leaks which
> is not that relevant when a process is killed hard.
> Hope you will figure out a way to stop this report!
Well, the valgrind signal sending interception code has been explicitly
designed to do various "end of life actions" (such as running the leak search)
when a process terminates or dies, including the case where a process
sends a fatal signal to itself.

It is of course possible to implement something to really have a "hard exit"
and/or disable leak search.  Which feature/way is best for that is not
(yet) clear to me: a more general approach such as loading a supp file
or allowing valgrind parameters to be changed is more attractive than e.g.
a very specialised request such as VALGRIND_HARD_EXIT().

In the meantime, if my understanding is correct,
   system ("kill -9 ..."); 
as explained above gives a hard exit without allowing valgrind to
intercept the signal and do a leak search or any other closing actions.

If that does not work/is not usable, then I have missed something in
what you are trying to do.

Further comments welcome ...

Philippe




___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-13 Thread Michael Widenius
Hi!

On Mon, Aug 12, 2019 at 11:59 PM Philippe Waroquiers wrote:
>
> On Mon, 2019-08-12 at 13:49 +0100, Tom Hughes wrote:
> > On 12/08/2019 13:15, Michael Widenius wrote:
> >
> > > The request I would like you to consider is to do one of the following:
> > > - Ensure that sending a sigkill works and in this case there should
> > > not be any valgrind leak report.

> What version of Valgrind are you using ?

valgrind-3.15.0  (latest from download page)

> I recently fixed (10 of July) in the git repository a bug with an infinite
> loop or a hang when a process sends a signal to itself.
>
> See bugs 409141 and 409367.

Sending SIGFPE works, but not SIGKILL.  I would prefer to use SIGKILL
as this used, at least in the past, to kill the process without any
leak reporting.

> So, you might test with a recent git version of Valgrind.

I can try that at once. Thanks.
Testing...
hm.. It would be good to describe in README_DEVELOPERS how to generate the
autoconfigure scripts.  Right now it says to run 'make dist', which will
not work out of the box.
Same problem with README.solaris. Please add that one should run first
'autogen.sh' and then configure.

See more about this at end of email:

> > > - Add an api call where we could specify that we don't want any leak
> > > reports from now on.  If this would exist then I could call this when
> > > we are about to send the SIGFPE/SIGKILL signal to the server.
> A stackoverflow discussion led to a suggestion to have a
> monitor command/client request to load a new suppression file.
> This would allow to load a supp file suppressing all leaks.

Yes, could be useful but then the binary would need to know where the
suppression files are, which is harder for a dynamic test system that
starts the mysqld binary with different paths depending on configuration.

> Allowing to load a suppression file via a client request might be useful
> in other circumstances (the stackoverflow case was to avoid any leak
> search when a program fails between a fork and an exec, but it
> could be used to suppress flexibly whatever kind of errors and/or
> automatically load a supp file e.g. when loading a shared lib).

Yes, I agree that for many programs that would be useful. However for
any program that just needs a way to kill itself 'hard' and not get any
more reports from valgrind that is a bit of overkill.

> Alternatively, the only reason why valgrind can do a leak search
> when your process calls "kill (me, 9);"
> is that valgrind is intercepting the syscall and does a leak
> search before really self-killing.

Currently the kill signal is never sent to the process and the process
continues to run.

> You can avoid this interception by instead doing something like
>     system("kill -9 me");
> as no way valgrind will be able to intercept this.
> (of course, me has to be the pid of the calling process).

In this case the process (here mysqld) needs to know who its
'father' process is.
This is a bit hard to provide to mysqld, as we only know the pid after
valgrind is started with mysqld as an option.  Another issue is that
mysqld doesn't know if it's run under valgrind or not and it should
only kill its parent if it's valgrind.
(Assuming I understood correctly what you meant with  calling process;
I think you mean the 'valgrind' process in this case)

> > Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> > and attach a minimal test case if at all possible.

Will try to create a test case, but it's not that easy as for simple
programs valgrind seems to pass KILL forwards.  Maybe this is only a
problem with threaded programs.

https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg06862.html
seems to highlight the same problem.

Currently one can see this in MariaDB 10.4 from git by running:
mysql-test-run --valgrind innodb.alter_copy

which I agree is not a simple test case.  If I can't figure out a good
workaround for MariaDB
I will try to create a simpler test case and report it properly.

> > Also the output of running valgrind with --trace-signals=yes would be
> > a good thing to provide.

I have now tested with 10.3.16.GIT from today and ran it with
--trace-signals=yes.
In this case the SIGKILL is sent to the process and the process is
killed so it looks
like the bug is fixed. Great!
However one problem still remains. After the kill, we still get a
report of leaks which
is not that relevant when a process is killed hard.
Hope you will figure out a way to stop this report!

Regards,
Monty

