Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits
> -Original Message- > From: Mathieu Desnoyers [mailto:mathieu.desnoy...@efficios.com] > Sent: Wednesday, March 14, 2018 6:25 PM > To: THEUNISSEN Rolf <rolf.theunis...@altran.com> > Cc: lttng-dev <lttng-dev@lists.lttng.org> > Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process > exits > > - On Mar 14, 2018, at 12:37 PM, THEUNISSEN Rolf > rolf.theunis...@altran.com wrote: > > >> -Original Message- > >> From: Jonathan Rajotte-Julien > >> [mailto:jonathan.rajotte-jul...@efficios.com] > >> Sent: Tuesday, March 13, 2018 4:35 PM > >> To: THEUNISSEN Rolf <rolf.theunis...@altran.com> > >> Cc: lttng-dev@lists.lttng.org > >> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before > >> a process exits > >> > >> Hi Rolf, > >> > >> On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote: > >> > Hi, > >> > > >> > I am currently tracing many processes with LTTng-UST on a system > >> > under > >> heavy load, with 2 CPUs. > >> > > >> > The traces seem to be missing ust-events near the end of a trace. > >> > After > >> doing some analysis, it seems that events are missing from the other > >> CPU then the process exits on. More concretely: > >> > > >> > During the lifetime of a process: > >> > - Process X executes on CPU 0,1: all events are in the trace Just > >> > before the process is about to exit: > >> > - Process X executes on CPU 0: missing events from the trace > >> > >> How are you validating this? > > > > I am also using kernel tracing, to trace the scheduler. In > > TraceCompass I can view the states of the processes and CPUs. > > > >> > >> > A few mili-seconds later: > >> > - Process X executes on CPU1 and exits: The last events are in the > >> > trace > >> > >> This is not expected. Do you have a reproducer so we can check what > >> might be happening? > > > > I am not allowed to provide details about my current traces. And it is > > hard to make a small example, but I got some more clues what is > happening, see below. > > > >> > >> We will need the version for lttng-ust, lttng-tools, babeltrace. > > > > I am using version 2.10, for analyzing the traces I am using > > TraceCompass Kernel version: 3.10.62 > > > >> > >> Note that you can run your application with lttng-ust in debug mode > >> using LTTNG_UST_DEBUG=y. Debug statements will be outputted on > stderr. > >> > >> e.g: > >> $ LTTNG_UST_DEBUG=y ./my_sample_app > >> > > > > Enableling this debugging flag gave me a clue what is happening. For > > all processes, the registration is going OK. But for some processes, > > the un-registration is not logged. On closer analysis (now with > > tracepoints from SystemTap), I see that the processes for which the > > events are missing and the un-registration is missing are killed by > > SIGKILL signal. This probably explains why some data can be lost. But > > maybe strange that only messages get lost that were traced on a different > CPU. > > > > I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM > > is sent to the process. Somehow the kernel decides to deliver an > > SIGKILL instead of a SIGTERM. > > When using lttng-ust with per-uid buffers, there is a known design limitation > regarding process terminated by non-handled kill signals: if this occurs while > the buffer is being written to (between reserve and commit), it will cause the > rest of the per-cpu buffer to be unreadable. This applies to all events > recorded by all applications of the same user id for that particular CPU. > Recovering from this involves destroying and re-creating the tracing session. > > If this kind of scenario is likely for you, you might want to consider using > per- > pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer from > this design limitation. The downside of per-pid buffers is that it allocates > more memory on your system for buffering, and adds extra overhead when > used with processes that have a short life-time. > > Thanks, > > Mathieu > Thanks for your reply, indeed I was using --buffers-uid when I first observed this issue. Then I switched back to --buffers-pid, as I suspected it to be more robust, however the issue did not resolve (though it might occur less often). Anyways, it is expected to l
Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits
- On Mar 14, 2018, at 12:37 PM, THEUNISSEN Rolf rolf.theunis...@altran.com wrote: >> -Original Message- >> From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-jul...@efficios.com] >> Sent: Tuesday, March 13, 2018 4:35 PM >> To: THEUNISSEN Rolf <rolf.theunis...@altran.com> >> Cc: lttng-dev@lists.lttng.org >> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a >> process >> exits >> >> Hi Rolf, >> >> On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote: >> > Hi, >> > >> > I am currently tracing many processes with LTTng-UST on a system under >> heavy load, with 2 CPUs. >> > >> > The traces seem to be missing ust-events near the end of a trace. After >> doing some analysis, it seems that events are missing from the other CPU >> then the process exits on. More concretely: >> > >> > During the lifetime of a process: >> > - Process X executes on CPU 0,1: all events are in the trace Just >> > before the process is about to exit: >> > - Process X executes on CPU 0: missing events from the trace >> >> How are you validating this? > > I am also using kernel tracing, to trace the scheduler. In TraceCompass I can > view the states of the processes and CPUs. > >> >> > A few mili-seconds later: >> > - Process X executes on CPU1 and exits: The last events are in the >> > trace >> >> This is not expected. Do you have a reproducer so we can check what might >> be happening? > > I am not allowed to provide details about my current traces. And it is hard to > make a small example, but I got some more clues what is happening, see below. > >> >> We will need the version for lttng-ust, lttng-tools, babeltrace. > > I am using version 2.10, for analyzing the traces I am using TraceCompass > Kernel version: 3.10.62 > >> >> Note that you can run your application with lttng-ust in debug mode using >> LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr. >> >> e.g: >> $ LTTNG_UST_DEBUG=y ./my_sample_app >> > > Enableling this debugging flag gave me a clue what is happening. For all > processes, the registration is going OK. But for some processes, the > un-registration is not logged. On closer analysis (now with tracepoints from > SystemTap), I see that the processes for which the events are missing and the > un-registration is missing are killed by SIGKILL signal. This probably > explains > why some data can be lost. But maybe strange that only messages get lost that > were traced on a different CPU. > > I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent > to > the process. Somehow the kernel decides to deliver an SIGKILL instead of a > SIGTERM. When using lttng-ust with per-uid buffers, there is a known design limitation regarding process terminated by non-handled kill signals: if this occurs while the buffer is being written to (between reserve and commit), it will cause the rest of the per-cpu buffer to be unreadable. This applies to all events recorded by all applications of the same user id for that particular CPU. Recovering from this involves destroying and re-creating the tracing session. If this kind of scenario is likely for you, you might want to consider using per-pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer from this design limitation. The downside of per-pid buffers is that it allocates more memory on your system for buffering, and adds extra overhead when used with processes that have a short life-time. Thanks, Mathieu > > Rolf > >> Cheers >> >> -- >> Jonathan Rajotte-Julien >> EfficiOS > ___ > lttng-dev mailing list > lttng-dev@lists.lttng.org > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits
> -Original Message- > From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-jul...@efficios.com] > Sent: Tuesday, March 13, 2018 4:35 PM > To: THEUNISSEN Rolf <rolf.theunis...@altran.com> > Cc: lttng-dev@lists.lttng.org > Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process > exits > > Hi Rolf, > > On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote: > > Hi, > > > > I am currently tracing many processes with LTTng-UST on a system under > heavy load, with 2 CPUs. > > > > The traces seem to be missing ust-events near the end of a trace. After > doing some analysis, it seems that events are missing from the other CPU > then the process exits on. More concretely: > > > > During the lifetime of a process: > > - Process X executes on CPU 0,1: all events are in the trace Just > > before the process is about to exit: > > - Process X executes on CPU 0: missing events from the trace > > How are you validating this? I am also using kernel tracing, to trace the scheduler. In TraceCompass I can view the states of the processes and CPUs. > > > A few mili-seconds later: > > - Process X executes on CPU1 and exits: The last events are in the > > trace > > This is not expected. Do you have a reproducer so we can check what might > be happening? I am not allowed to provide details about my current traces. And it is hard to make a small example, but I got some more clues what is happening, see below. > > We will need the version for lttng-ust, lttng-tools, babeltrace. I am using version 2.10, for analyzing the traces I am using TraceCompass Kernel version: 3.10.62 > > Note that you can run your application with lttng-ust in debug mode using > LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr. > > e.g: > $ LTTNG_UST_DEBUG=y ./my_sample_app > Enableling this debugging flag gave me a clue what is happening. For all processes, the registration is going OK. But for some processes, the un-registration is not logged. On closer analysis (now with tracepoints from SystemTap), I see that the processes for which the events are missing and the un-registration is missing are killed by SIGKILL signal. This probably explains why some data can be lost. But maybe strange that only messages get lost that were traced on a different CPU. I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent to the process. Somehow the kernel decides to deliver an SIGKILL instead of a SIGTERM. Rolf > Cheers > > -- > Jonathan Rajotte-Julien > EfficiOS ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits
- On Mar 13, 2018, at 6:48 AM, THEUNISSEN Rolf rolf.theunis...@altran.com wrote: > Hi, > > I am currently tracing many processes with LTTng-UST on a system under heavy > load, with 2 CPUs. > > The traces seem to be missing ust-events near the end of a trace. After doing > some analysis, it seems that events are missing from the other CPU then the > process exits on. More concretely: > > During the lifetime of a process: > - Process X executes on CPU 0,1: all events are in the trace > Just before the process is about to exit: > - Process X executes on CPU 0: missing events from the trace > A few mili-seconds later: > - Process X executes on CPU1 and exits: The last events are in the trace > > Is this behavior expected, or am I running into an LTTng-UST bug? And would > there be any workaround? Can you provide more information about the tracing configuration you are using ? How is the session created ? Which versions of lttng-{tools,ust} and babeltrace do you use ? Are there any discarded events in the trace reported by babeltrace ? If you could provide the resulting trace, it would also be helpful. Thanks, Mathieu > > Kind regards, > Rolf Theunissen > > > ___ > lttng-dev mailing list > lttng-dev@lists.lttng.org > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits
Hi Rolf, On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote: > Hi, > > I am currently tracing many processes with LTTng-UST on a system under heavy > load, with 2 CPUs. > > The traces seem to be missing ust-events near the end of a trace. After doing > some analysis, it seems that events are missing from the other CPU then the > process exits on. More concretely: > > During the lifetime of a process: > - Process X executes on CPU 0,1: all events are in the trace > Just before the process is about to exit: > - Process X executes on CPU 0: missing events from the trace How are you validating this? > A few mili-seconds later: > - Process X executes on CPU1 and exits: The last events are in the trace This is not expected. Do you have a reproducer so we can check what might be happening? We will need the version for lttng-ust, lttng-tools, babeltrace. Note that you can run your application with lttng-ust in debug mode using LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr. e.g: $ LTTNG_UST_DEBUG=y ./my_sample_app Cheers -- Jonathan Rajotte-Julien EfficiOS ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] [lttng-ust] Missing events when just before a process exits
Hi, I am currently tracing many processes with LTTng-UST on a system under heavy load, with 2 CPUs. The traces seem to be missing ust-events near the end of a trace. After doing some analysis, it seems that events are missing from the other CPU then the process exits on. More concretely: During the lifetime of a process: - Process X executes on CPU 0,1: all events are in the trace Just before the process is about to exit: - Process X executes on CPU 0: missing events from the trace A few mili-seconds later: - Process X executes on CPU1 and exits: The last events are in the trace Is this behavior expected, or am I running into an LTTng-UST bug? And would there be any workaround? Kind regards, Rolf Theunissen ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev