Re: [Valgrind-users] Getting SIGKILL to work in MariaDB
On Wed, 2019-08-21 at 14:44 +0200, Philippe Waroquiers wrote:
> On Mon, 2019-08-12 at 15:15 +0300, Michael Widenius wrote:
> > Something like the following would be very useful:
> > VALGRIND_IGNORE_LEAKS(VALGRIND_LEAK_INDIRECT | VALGRIND_LEAK_DEFINITE...)
> See the patch attached to https://bugs.kde.org/show_bug.cgi?id=411134
> This patch allows changing various command-line options after
> startup, including the options that control whether and how a leak search is done.

Patch pushed today to the Valgrind git repository as 3a803036.

Philippe
___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
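For readers who want to try the pushed patch from inside a program, the following is a minimal sketch only. It assumes the patch is what later shipped in Valgrind 3.16 as the VALGRIND_CLO_CHANGE client request in valgrind.h, and that --leak-check is among the dynamically changeable options (valgrind --help-dyn-options should list them). Neither name appears in this thread, so check your own valgrind.h first. The function change_valgrind_option() and the stub macros are hypothetical and exist only so the sketch compiles without the Valgrind headers.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real header when it is installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif

/* Assumption: stubs for missing or older headers, so the sketch compiles. */
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif
#ifndef VALGRIND_CLO_CHANGE
#  define VALGRIND_CLO_CHANGE(option) ((void)(option))
#endif

/* Hypothetical helper: ask valgrind to change one command-line option at
 * run time. Returns 1 if the request was issued, 0 when not running under
 * valgrind (in which case nothing is done). */
int change_valgrind_option(const char *option)
{
    assert(option != NULL);
    if (!RUNNING_ON_VALGRIND)
        return 0;
    VALGRIND_CLO_CHANGE(option);
    return 1;
}
```

A server about to kill itself on purpose could then call change_valgrind_option("--leak-check=no") just before sending the signal, which is roughly the effect the proposed VALGRIND_IGNORE_LEAKS request was after.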
Re: [Valgrind-users] Getting SIGKILL to work in MariaDB
On Mon, 2019-08-12 at 15:15 +0300, Michael Widenius wrote:
> Something like the following would be very useful:
> VALGRIND_IGNORE_LEAKS(VALGRIND_LEAK_INDIRECT | VALGRIND_LEAK_DEFINITE...)

See the patch attached to https://bugs.kde.org/show_bug.cgi?id=411134
This patch allows changing various command-line options after startup, including the options that control whether and how a leak search is done.

Feedback welcome ...

Philippe
Re: [Valgrind-users] Getting SIGKILL to work in MariaDB
On Tue, 2019-08-13 at 16:54 +0300, Michael Widenius wrote:
> I can try that at once. Thanks.
> Testing...
> hm.. It would be good to describe in README_DEVELOPERS how to generate the
> autoconfigure scripts. Currently it says to run 'make dist', which will
> not work out of the box.

README has a section "Building and installing it". It looks like README_DEVELOPERS should start with the same title and reference the README section.

> Same problem with README.solaris. Please add that one should first run
> 'autogen.sh' and then configure.

README.solaris has a pointer to the README file in its "Compilation" section, but again that can be made more precise. I will improve both files to add the relevant reference to README.

> Currently the kill signal is never sent to the process and the process
> continues to run.

Yes, that is bug 409141 (fixed in the git version).

> > You can avoid this interception by instead doing something like
> >     system("kill -9 me");
> > as no way valgrind will be able to intercept this.
> > (of course, me has to be the pid of the calling process).
>
> In this case the process, in this case mysqld, needs to know who its
> 'father' process is.
> This is a bit hard to provide to mysqld, as we only know the pid after
> valgrind is started with mysqld as an option. Another issue is that
> mysqld doesn't know if it's run under valgrind or not and it should
> only kill its parent if it's valgrind.

I am not sure I understand, so I will explain what I understood:
You used to have in a process some lines of code sending SIGKILL to itself, so as to commit suicide ("hard exit"). Due to bug 409141, when running under valgrind, this hangs. You bypassed the problem by sending SIGFPE instead, but you would like to go back to SIGKILL.
With the fix for 409141, sending SIGKILL again really kills the process. The remaining problem is that SIGKILL still causes a leak search to be done.
To really have a "hard exit", you can replace the lines of code sending SIGKILL with the following 3 lines:
    char cmd[1000];
    sprintf(cmd, "kill -9 %d", getpid());
    system(cmd);
With this, you really have a "hard exit".

> (Assuming I understood correctly what you meant with calling process;
> I think you mean the 'valgrind' process in this case)

Note that when a process "runs under valgrind", there is still only one single process, which contains both the valgrind code and the code of the program being run "under valgrind". So I am not sure I understand why anything special would be needed regarding the parent process when running under valgrind.

> > Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> > and attach a minimal test case if at all possible.
>
> Will try to create a test case, but it's not that easy as for simple
> programs valgrind seems to pass KILL forwards. Maybe this is only a
> problem with threaded programs.
>
> https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg06862.html
> seems to highlight the same problem.

As far as I can see, that message and your problem of hanging after sending SIGKILL are both bug 409141. And you have double-checked that the git version also fixes the hanging bug. So, IMO, there is no need for another reproducer.

> I have now tested with 10.3.16.GIT from today and did run it with
> --trace-signals=yes.
> In this case the SIGKILL is sent to the process and the process is
> killed so it looks like the bug is fixed. Great!
> However one problem still remains. After the kill, we still get a
> report of leaks which is not that relevant when a process is killed hard.
> Hope you will figure out a way to stop this report!

Well, the valgrind signal-sending interception code has rather been explicitly designed to do various "end of life actions" (such as running the leak search) when a process terminates or dies, including the case where a process sends a fatal signal to itself.
It is of course possible to implement something to really have a "hard exit" and/or disable the leak search. Which feature or approach is best for that is not (yet) clear to me: a more general approach, such as loading a supp file or allowing valgrind parameters to be changed, is more attractive than e.g. a very specialised request such as VALGRIND_HARD_EXIT();

In the meantime, if my understanding is correct, system("kill -9 ..."); as explained above gives a hard exit without allowing valgrind to intercept the signal and do a leak search or any other closing actions.
If that does not work/is not usable, then I have missed something in what you are trying to do.

Further comments welcome ...

Philippe
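The three-line workaround in this message can be packaged as below. This is a sketch under assumptions, not MariaDB's actual code: hard_exit() and signal_that_killed_child() are hypothetical names, the buffer is bounded with snprintf, and the second function is only a small harness that forks a child so the effect can be observed from the outside.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the calling process by asking an external `kill` command to
 * send SIGKILL. The signal is sent by another process, so this process
 * makes no kill() syscall that valgrind could intercept. */
void hard_exit(void)
{
    char cmd[64];
    int n = snprintf(cmd, sizeof cmd, "kill -9 %ld", (long)getpid());
    assert(n > 0 && (size_t)n < sizeof cmd);

    int rc = system(cmd);
    (void)rc;
    /* SIGKILL cannot be caught or ignored, so this point is normally not
     * reached. Leave anyway if the shell command could not be run. */
    _exit(127);
}

/* Hypothetical harness: fork a child that calls hard_exit() and return the
 * number of the signal that terminated it (0 if it exited normally,
 * -1 on error). */
int signal_that_killed_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        hard_exit();                /* child: does not return */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In the server itself only hard_exit() would be needed, called at the point where kill(getpid(), SIGKILL) used to be.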
Re: [Valgrind-users] Getting SIGKILL to work in MariaDB
Hi!

On Mon, Aug 12, 2019 at 11:59 PM Philippe Waroquiers wrote:
> On Mon, 2019-08-12 at 13:49 +0100, Tom Hughes wrote:
> > On 12/08/2019 13:15, Michael Widenius wrote:
> > > The request I would like you to consider is to do one of the following:
> > > - Ensure that sending a sigkill works and in this case there should
> > >   not be any valgrind leak report.
>
> What version of Valgrind are you using ?

valgrind-3.15.0 (latest from download page)

> I recently fixed (10 of July) in the git repository a bug with an infinite
> loop or a hang when a process sends a signal to itself.
> See bugs 409141 and 409367.

Sending SIGFPE works, but not SIGKILL. I would prefer to use SIGKILL as this used, at least in the past, to kill the process without any leak reporting.

> So, you might test with a recent git version of Valgrind.

I can try that at once. Thanks.
Testing...
hm.. It would be good to describe in README_DEVELOPERS how to generate the autoconfigure scripts. Currently it says to run 'make dist', which will not work out of the box.
Same problem with README.solaris. Please add that one should first run 'autogen.sh' and then configure.
See more about this at end of email:

> > > - Add an api call where we could specify that we don't want any leak
> > >   reports from now on. If this would exist then I could call this when
> > >   we are about to send the SIGFPE/SIGKILL signal to the server.
>
> A stackoverflow discussion led to a suggestion to have a
> monitor command/client request to load a new suppression file.
> This would allow to load a supp file suppressing all leaks.

Yes, that could be useful, but then the binary would need to know where the suppression files are, which is harder for a dynamic test system that starts the mysqld binary with different paths depending on configuration.
> Allowing to load a suppression file via a client request might be useful
> in other circumstances (the stackoverflow case was to avoid any leak
> search when a program fails between a fork and an exec, but it
> could be used to suppress flexibly whatever kind of errors and/or
> automatically load a supp file e.g. when loading a shared lib).

Yes, I agree that for many programs that would be useful. However, for any program that just needs a way to kill itself 'hard' and not get any more reports from valgrind, that is a bit of overkill.

> Alternatively, the only reason why valgrind can do a leak search
> when your process calls "kill (me, 9);"
> is that valgrind is intercepting the syscall and does a leak
> search before really self-killing.

Currently the kill signal is never sent to the process and the process continues to run.

> You can avoid this interception by instead doing something like
>     system("kill -9 me");
> as no way valgrind will be able to intercept this.
> (of course, me has to be the pid of the calling process).

In this case the process, in this case mysqld, needs to know who its 'father' process is.
This is a bit hard to provide to mysqld, as we only know the pid after valgrind is started with mysqld as an option. Another issue is that mysqld doesn't know if it's run under valgrind or not, and it should only kill its parent if it's valgrind.
(Assuming I understood correctly what you meant with calling process; I think you mean the 'valgrind' process in this case)

> > Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> > and attach a minimal test case if at all possible.

Will try to create a test case, but it's not that easy as for simple programs valgrind seems to pass KILL forwards. Maybe this is only a problem with threaded programs.

https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg06862.html
seems to highlight the same problem.
Currently one can see this in MariaDB 10.4 from git by running:
    mysql-test-run --valgrind innodb.alter_copy
which I agree is not a simple test case.
If I can't figure out a good workaround for MariaDB I will try to create a simpler test case and report it properly.

> > Also the output of running valgrind with --trace-signals=yes would be
> > a good thing to provide.

I have now tested with 10.3.16.GIT from today and did run it with --trace-signals=yes.
In this case the SIGKILL is sent to the process and the process is killed, so it looks like the bug is fixed. Great!
However, one problem still remains. After the kill, we still get a report of leaks, which is not that relevant when a process is killed hard.
Hope you will figure out a way to stop this report!

Regards,
Monty
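On the concern raised in this message that mysqld does not know whether it runs under valgrind: valgrind.h provides the RUNNING_ON_VALGRIND macro, which evaluates to 0 in a native run and to a non-zero value under valgrind. The sketch below is illustrative only; running_under_valgrind() and hard_exit_target() are hypothetical names, and the stub macro is an assumption added so the code compiles when the Valgrind headers are not installed.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Use the real header when it is installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif

/* Assumption: stub for builds without the Valgrind headers. */
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Returns 1 when the process is executing under valgrind, 0 otherwise. */
int running_under_valgrind(void)
{
    return RUNNING_ON_VALGRIND != 0;
}

/* Pid that a hard exit should target. A program run under valgrind is
 * still one single process holding both valgrind and the program, so the
 * answer is the caller's own pid in both cases and no parent pid is needed. */
pid_t hard_exit_target(void)
{
    pid_t self = getpid();
    assert(self > 0);
    return self;
}
```

With this, the server could keep a plain kill(getpid(), SIGKILL) for native runs and switch to an external kill -9 on hard_exit_target() only when running_under_valgrind() returns 1.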
Re: [Valgrind-users] Getting SIGKILL to work in MariaDB
On Mon, 2019-08-12 at 13:49 +0100, Tom Hughes wrote:
> On 12/08/2019 13:15, Michael Widenius wrote:
> > The request I would like you to consider is to do one of the following:
> > - Ensure that sending a sigkill works and in this case there should
> >   not be any valgrind leak report.

What version of Valgrind are you using?
I recently fixed (10 July) in the git repository a bug with an infinite loop or a hang when a process sends a signal to itself.
See bugs 409141 and 409367.
So, you might test with a recent git version of Valgrind.

> > - Add an api call where we could specify that we don't want any leak
> >   reports from now on. If this would exist then I could call this when
> >   we are about to send the SIGFPE/SIGKILL signal to the server.

A stackoverflow discussion led to a suggestion to have a monitor command/client request to load a new suppression file. This would allow loading a supp file suppressing all leaks.
Allowing a suppression file to be loaded via a client request might be useful in other circumstances (the stackoverflow case was to avoid any leak search when a program fails between a fork and an exec, but it could be used to flexibly suppress whatever kind of errors and/or automatically load a supp file, e.g. when loading a shared lib).

Alternatively, the only reason why valgrind can do a leak search when your process calls "kill (me, 9);" is that valgrind intercepts the syscall and does a leak search before really self-killing.
You can avoid this interception by instead doing something like
    system("kill -9 me");
as there is no way valgrind will be able to intercept this.
(of course, me has to be the pid of the calling process).

Philippe

> Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind
> and attach a minimal test case if at all possible.
>
> Also the output of running valgrind with --trace-signals=yes would be
> a good thing to provide.
>
> Tom
Re: [Valgrind-users] Getting SIGKILL to work in MariaDB
On 12/08/2019 13:15, Michael Widenius wrote:
> The request I would like you to consider is to do one of the following:
> - Ensure that sending a sigkill works and in this case there should
>   not be any valgrind leak report.
> - Add an api call where we could specify that we don't want any leak
>   reports from now on. If this would exist then I could call this when
>   we are about to send the SIGFPE/SIGKILL signal to the server.

Please open a bug at https://bugs.kde.org/enter_bug.cgi?product=valgrind and attach a minimal test case if at all possible.

Also the output of running valgrind with --trace-signals=yes would be a good thing to provide.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
[Valgrind-users] Getting SIGKILL to work in MariaDB
Hi!

I am the creator of MySQL and MariaDB and have been a valgrind user for a LONG time! valgrind has been used to verify the integrity of MySQL since the very early times of valgrind!

I have a small request to solve a small problem with valgrind in MariaDB:

To be able to test recovery during testing, our test system takes down the mysqld server process hard and schedules a restart of the server. We used to do that by sending sigkill internally in the server to itself; the test then waits for the server to restart and continues.

I noticed recently that when running the server under valgrind, the sigkill was ignored and thus the test stalled. (I think this worked with some earlier version of valgrind, but I am not sure about this)

I have now fixed this by sending a SIGFPE signal instead and setting an internal variable to mark that we shouldn't write to the log that the server died (as the test system, which examines the logs, would think something went wrong).

The problem is that when the server dies hard with SIGFPE, the memcheck tool writes out the leaks and the test system thinks something is wrong.

The request I would like you to consider is to do one of the following:
- Ensure that sending a sigkill works and in this case there should not be any valgrind leak report.
- Add an api call where we could specify that we don't want any leak reports from now on. If this would exist then I could call this when we are about to send the SIGFPE/SIGKILL signal to the server.

Something like the following would be very useful:
VALGRIND_IGNORE_LEAKS(VALGRIND_LEAK_INDIRECT | VALGRIND_LEAK_DEFINITE...)

Regards,
Monty