Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-31 Thread Philippe Waroquiers
On Wed, 2019-08-21 at 14:44 +0200, Philippe Waroquiers wrote:
> On Mon, 2019-08-12 at 15:15 +0300, Michael Widenius wrote:
> > Something like the following would be very useful:
> > VALGRIND_IGNORE_LEAKS(VALGRIND_LEAK_INDIRECT | VALGRIND_LEAK_DEFINITE...)
> See the patch attached to https://bugs.kde.org/show_bug.cgi?id=411134
> This patch allows changing various command line options after
> startup, including the options that control if/how a leak search is done.
Patch pushed today to the Valgrind git repository as 3a803036.
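
For readers who want to see what using this from the program side could look like, here is a minimal sketch. It assumes the patch surfaced as the VALGRIND_CLO_CHANGE client request in valgrind/valgrind.h (to my knowledge first released with Valgrind 3.16) and that --leak-check is among the dynamically changeable options; neither is stated in this thread, so please check your Valgrind build (I believe `valgrind --help-dyn-options` lists them). The helper name disable_leak_search is made up for illustration.

```c
/* Pull in the Valgrind client-request header only when it is installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif

/* Hypothetical helper: ask Valgrind to skip the end-of-run leak search.
 * Returns 1 if the client request was compiled in, 0 if this build has no
 * VALGRIND_CLO_CHANGE (older or missing header). When the program is not
 * running under Valgrind, the client request does nothing. */
int disable_leak_search(void)
{
#ifdef VALGRIND_CLO_CHANGE
    /* Assumption: --leak-check is a dynamically changeable option. */
    VALGRIND_CLO_CHANGE("--leak-check=no");
    return 1;
#else
    return 0;
#endif
}
```

The idea would be to call disable_leak_search() just before the server sends the fatal signal to itself.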

Philippe




___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-21 Thread Philippe Waroquiers
On Mon, 2019-08-12 at 15:15 +0300, Michael Widenius wrote:
> Something like the following would be very useful:
> VALGRIND_IGNORE_LEAKS(VALGRIND_LEAK_INDIRECT | VALGRIND_LEAK_DEFINITE...)
See the patch attached to https://bugs.kde.org/show_bug.cgi?id=411134
This patch allows changing various command line options after
startup, including the options that control if/how a leak search is done.

Feedback welcome ...

Philippe






Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-13 Thread Philippe Waroquiers
On Tue, 2019-08-13 at 16:54 +0300, Michael Widenius wrote:

> I can try that at once. Thanks.
> Testing...
> Hm. It would be good to describe in README_DEVELOPERS how to generate the
> autoconfigure scripts.  Now it says to run 'make dist', which will
> not work out of the box.
README has a section "Building and installing it".
It looks like README_DEVELOPERS should start with the same title,
and reference the README section.

> Same problem with README.solaris. Please add that one should run first
> 'autogen.sh' and then configure.
README.solaris has a pointer to the README file in the "Compilation" section,
but again that can be made more precise.

I will improve both files to add the relevant reference to README.



> Currently the kill signal is never sent to the process and the process
> continues to run.
Yes, that is bug 409141 (fixed in the git version).

> 
> > You can avoid this interception by instead doing something like
> >    system("kill -9 me");
> > as no way valgrind will be able to intercept this.
> > (of course, me has to be the pid of the calling process).
> 
> In this case the process, here mysqld, needs to know what its
> 'father' process is.
> This is a bit hard to provide to mysqld, as we only know the pid after
> valgrind is started with mysqld as an option.  Another issue is that
> mysqld doesn't know if it's run under valgrind or not and it should
> only kill its parent if it's valgrind.
Not sure I understand, so I will explain what I understood:
You used to have in a process some lines of code sending SIGKILL
to itself, so as to commit suicide ("hard exit").
Due to bug 409141, when running under valgrind, this hangs.
You have bypassed the problem by sending SIGFPE instead, but you would
like to go back to SIGKILL.
With the fix for 409141, sending SIGKILL really kills
the process again.

The remaining problem is that SIGKILL is still causing a leak search
to be done.

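To really have a "hard exit", you can replace the lines of code sending SIGKILL
with the following lines (needs <stdio.h>, <stdlib.h> and <unistd.h>):
  char cmd[64];
  /* getpid() returns a pid_t, so cast it for a portable format */
  snprintf(cmd, sizeof cmd, "kill -9 %ld", (long) getpid());
  system(cmd);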
With this, you really have a "hard exit".
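
A fuller sketch of the same idea, in case it helps to try it in isolation. The function names (hard_exit_via_shell, signal_that_killed_child) and the fork-based harness are only illustrative assumptions, not code from MariaDB or Valgrind. The harness runs the hard exit in a child process so that the caller can check which signal ended it.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill the calling process from an external `kill` command, so that the
 * signal is not sent by a kill() syscall made by this process itself. */
void hard_exit_via_shell(void)
{
    char cmd[64];
    snprintf(cmd, sizeof cmd, "kill -9 %ld", (long) getpid());
    int rc = system(cmd);   /* normally never returns: we are killed */
    (void) rc;
    _exit(1);               /* only reached if the kill command failed */
}

/* Illustrative harness: run the hard exit in a child process and return
 * the signal that terminated it, or -1 if it did not die from a signal. */
int signal_that_killed_child(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        hard_exit_via_shell();      /* does not return */

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The cast to long avoids relying on pid_t being an int, and snprintf keeps the command inside the buffer.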

> (Assuming I understood correctly what you meant with  calling process;
> I think you mean the 'valgrind' process in this case)
Note that when a process "runs under valgrind", there is still only one
single process, which contains both the valgrind code and the code of
the program being run "under valgrind".
So I am not sure I understand why anything special
needs to be done about the parent process when running under valgrind.
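
To make this concrete, here is a minimal sketch of how a program could pick its exit path without knowing anything about a parent: since there is only one process, getpid() is the pid to kill, and the RUNNING_ON_VALGRIND macro from valgrind/valgrind.h tells the program whether it runs under valgrind. The function names (under_valgrind, format_kill_command, die_hard) are hypothetical, and the fallback definition of RUNNING_ON_VALGRIND is only there so the sketch compiles when the header is not installed.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0   /* header missing: assume a native run */
#endif

/* Returns 1 when the program is executing under Valgrind, else 0. */
int under_valgrind(void)
{
    return RUNNING_ON_VALGRIND != 0;
}

/* Write the shell command that kills `pid` into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_kill_command(char *buf, size_t size, pid_t pid)
{
    int n = snprintf(buf, size, "kill -9 %ld", (long) pid);
    return (n > 0 && (size_t) n < size) ? 0 : -1;
}

/* Hypothetical hard exit: go through the shell only under Valgrind,
 * otherwise (or if the shell route fails) send SIGKILL directly. */
void die_hard(void)
{
    char cmd[64];
    if (under_valgrind() &&
        format_kill_command(cmd, sizeof cmd, getpid()) == 0) {
        int rc = system(cmd);
        (void) rc;
    }
    kill(getpid(), SIGKILL);
}
```

With this shape the native code path stays exactly what it was before (a plain kill), and the shell detour is taken only under valgrind.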

> 
> > > Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> > > and attach a minimal test case if at all possible.
> 
> Will try to create a test case, but it's not that easy, as for simple
> programs valgrind seems to pass KILL forwards.  Maybe this is only a
> problem with threaded programs.
> 
> https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg06862.html
> seems to highlight the same problem.
As far as I can see, this message and your problem of hanging after sending
SIGKILL are both bug 409141.
And you have double-checked that the git version also fixes the hang.
So, IMO, there is no need for another reproducer.


> I have now tested with 10.3.16.GIT from today and ran it with
> --trace-signals=yes.
> In this case the SIGKILL is sent to the process and the process is
> killed, so it looks like the bug is fixed. Great!
> However, one problem still remains. After the kill, we still get a
> leak report, which is not that relevant when a process is killed hard.
> Hope you will figure out a way to stop this report!
Well, the valgrind signal sending interception code has been explicitly
designed to do various "end of life actions" (such as running the leak search)
when a process terminates or dies, including the case where a process
sends a fatal signal to itself.

It is of course possible to implement something to really have a "hard exit"
and/or disable leak search.  Which feature/way is best for that is not
(yet) clear to me: a more general approach, such as loading a supp file
or allowing valgrind parameters to be changed, is more attractive than e.g.
a very specialised request such as VALGRIND_HARD_EXIT();

In the meantime, if my understanding is correct,
   system ("kill -9 ..."); 
as explained above gives a hard exit without allowing valgrind to
intercept the signal and do a leak search or any other closing actions.

If that does not work/is not usable, then I have missed something in
what you are trying to do.

Further comments welcome ...

Philippe






Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-13 Thread Michael Widenius
Hi!

On Mon, Aug 12, 2019 at 11:59 PM Philippe Waroquiers wrote:
>
> On Mon, 2019-08-12 at 13:49 +0100, Tom Hughes wrote:
> > On 12/08/2019 13:15, Michael Widenius wrote:
> >
> > > The request I would like you to consider is to do one of the following:
> > > - Ensure that sending a sigkill works and in this case there should
> > > not be any valgrind leak report.

> What version of Valgrind are you using ?

valgrind-3.15.0  (latest from download page)

> I recently fixed (on 10 July) in the git repository a bug with an infinite loop
> or a hang when a process sends a signal to itself.
>
> See bugs 409141 and 409367.

Sending SIGFPE works, but not SIGKILL.  I would prefer to use SIGKILL
as this used, at least in the past, to kill the process without any
leak reporting.

> So, you might test with a recent git version of Valgrind.

I can try that at once. Thanks.
Testing...
Hm. It would be good to describe in README_DEVELOPERS how to generate the
autoconfigure scripts.  Now it says to run 'make dist', which will
not work out of the box.
Same problem with README.solaris. Please add that one should first run
'autogen.sh' and then configure.

See more about this at end of email:

> > > - Add an api call where we could specify that we don't want any leak
> > > reports from now on.  If this would exist then I could call this when
> > > we are about to send the SIGFPE/SIGKILL signal to the server.
> A stackoverflow discussion led to a suggestion to have a
> monitor command/client request to load a new suppression file.
> This would allow loading a supp file suppressing all leaks.

Yes, could be useful, but then the binary would need to know where the
suppression files are, which is harder for a dynamic test system that starts
the mysqld binary with different paths depending on configuration.

> Loading a suppression file via a client request might be useful
> in other circumstances (the stackoverflow case was to avoid any leak
> search when a program fails between a fork and an exec, but it
> could be used to flexibly suppress whatever kind of errors and/or
> automatically load a supp file e.g. when loading a shared lib).

Yes, I agree that for many programs that would be useful. However, for
any program that just needs a way to kill itself 'hard' and not get any
more reports from valgrind, that is a bit of overkill.

> Alternatively, the only reason why valgrind can do a leak search
> when your process calls "kill (me, 9);"
> is that valgrind is intercepting the syscall and does a leak
> search before really self-killing.

Currently the kill signal is never sent to the process and the process
continues to run.

> You can avoid this interception by instead doing something like
>    system("kill -9 me");
> as no way valgrind will be able to intercept this.
> (of course, me has to be the pid of the calling process).

In this case the process, here mysqld, needs to know what its
'father' process is.
This is a bit hard to provide to mysqld, as we only know the pid after
valgrind is started with mysqld as an option.  Another issue is that
mysqld doesn't know if it's run under valgrind or not and it should
only kill its parent if it's valgrind.
(Assuming I understood correctly what you meant by calling process;
I think you mean the 'valgrind' process in this case)

> > Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> > and attach a minimal test case if at all possible.

Will try to create a test case, but it's not that easy, as for simple
programs valgrind seems to pass KILL forwards.  Maybe this is only a
problem with threaded programs.

https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg06862.html
seems to highlight the same problem.

Currently one can see this in MariaDB 10.4 from git by running:
mysql-test-run --valgrind innodb.alter_copy

which I agree is not a simple test case.  If I can't figure out a good
workaround for MariaDB,
I will try to create a simpler test case and report it properly.

> > Also the output of running valgrind with --trace-signals=yes would be
> > a good thing to provide.

I have now tested with 10.3.16.GIT from today and ran it with
--trace-signals=yes.
In this case the SIGKILL is sent to the process and the process is
killed, so it looks like the bug is fixed. Great!
However, one problem still remains. After the kill, we still get a
leak report, which is not that relevant when a process is killed hard.
Hope you will figure out a way to stop this report!

Regards,
Monty




Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-12 Thread Philippe Waroquiers
On Mon, 2019-08-12 at 13:49 +0100, Tom Hughes wrote:
> On 12/08/2019 13:15, Michael Widenius wrote:
> 
> > The request I would like you to consider is to do one of the following:
> > - Ensure that sending a sigkill works and in this case there should
> > not be any valgrind leak report.
What version of Valgrind are you using?

I recently fixed (on 10 July) in the git repository a bug with an infinite loop
or a hang when a process sends a signal to itself.

See bugs 409141 and 409367.

So, you might test with a recent git version of Valgrind.

> > - Add an api call where we could specify that we don't want any leak
> > reports from now on.  If this would exist then I could call this when
> > we are about to send the SIGFPE/SIGKILL signal to the server.
A stackoverflow discussion led to a suggestion to have a
monitor command/client request to load a new suppression file.
This would allow loading a supp file suppressing all leaks.

Loading a suppression file via a client request might be useful
in other circumstances (the stackoverflow case was to avoid any leak
search when a program fails between a fork and an exec, but it
could be used to flexibly suppress whatever kind of errors and/or
automatically load a supp file e.g. when loading a shared lib).

Alternatively, the only reason why valgrind can do a leak search
when your process calls "kill (me, 9);"
is that valgrind is intercepting the syscall and does a leak
search before really self-killing.

You can avoid this interception by instead doing something like
   system("kill -9 me");
as there is no way valgrind can intercept this.
(of course, me has to be the pid of the calling process).

Philippe



> 
> Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> and attach a minimal test case if at all possible.
> 
> Also the output of running valgrind with --trace-signals=yes would be
> a good thing to provide.
> 
> Tom
> 





Re: [Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-12 Thread Tom Hughes

On 12/08/2019 13:15, Michael Widenius wrote:


> The request I would like you to consider is to do one of the following:
> - Ensure that sending a sigkill works and in this case there should
> not be any valgrind leak report.
> - Add an api call where we could specify that we don't want any leak
> reports from now on.  If this would exist then I could call this when
> we are about to send the SIGFPE/SIGKILL signal to the server.


Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
and attach a minimal test case if at all possible.

Also the output of running valgrind with --trace-signals=yes would be
a good thing to provide.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/




[Valgrind-users] Getting SIGKILL to work in MariaDB

2019-08-12 Thread Michael Widenius
Hi!

I am the creator of MySQL and MariaDB and have been a valgrind user for a LONG
time!  valgrind has been used to verify the integrity of MySQL since
the very early days of valgrind!

I have a small request to solve a small problem with valgrind in MariaDB:

To be able to test recovery during testing, our test system takes down
the mysqld server process hard and schedules a restart of the server.

We used to do that by sending sigkill internally in the server to
itself; the test then waits for the server to restart and then
continues.
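
For readers unfamiliar with the pattern, a minimal sketch of it follows. It is not the MariaDB code: the names kill_self_hard and run_server_until_it_dies are made up, and a forked child stands in for the mysqld server while the parent plays the role of the test system that waits for the server to go down.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the "server" does to take itself down hard. */
void kill_self_hard(void)
{
    kill(getpid(), SIGKILL);
    _exit(1);   /* not expected to be reached */
}

/* Stand-in for the test system: start a "server", let it kill itself,
 * and return the signal that terminated it (-1 if none or on error). */
int run_server_until_it_dies(void)
{
    pid_t server = fork();
    if (server < 0)
        return -1;
    if (server == 0)
        kill_self_hard();           /* child: dies here */

    int status = 0;
    if (waitpid(server, &status, 0) != server)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In a native run the parent should observe SIGKILL as the terminating signal; the rest of this mail is about what happens when the server side runs under valgrind.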

I noticed recently that when running the server under valgrind, the
sigkill was ignored and thus the test stalled. (I think this worked
with some earlier version of valgrind, but I am not sure about this)

I have now fixed this by sending a SIGFPE signal instead and setting
an internal variable to mark that we shouldn't write to the log that
the server died (as the test system, which examines the logs, would think
something went wrong).

The problem is that when the server dies hard with SIGFPE, the memtool
writes out the leaks and the test system thinks something is wrong.

The request I would like you to consider is to do one of the following:
- Ensure that sending a sigkill works and in this case there should
not be any valgrind leak report.
- Add an api call where we could specify that we don't want any leak
reports from now on.  If this would exist then I could call this when
we are about to send the SIGFPE/SIGKILL signal to the server.

Something like the following would be very useful:
VALGRIND_IGNORE_LEAKS(VALGRIND_LEAK_INDIRECT | VALGRIND_LEAK_DEFINITE...)

Regards,
Monty

