Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits

2018-03-16 Thread THEUNISSEN Rolf


> -Original Message-
> From: Mathieu Desnoyers [mailto:mathieu.desnoy...@efficios.com]
> Sent: Wednesday, March 14, 2018 6:25 PM
> To: THEUNISSEN Rolf <rolf.theunis...@altran.com>
> Cc: lttng-dev <lttng-dev@lists.lttng.org>
> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process
> exits
> 
> - On Mar 14, 2018, at 12:37 PM, THEUNISSEN Rolf
> rolf.theunis...@altran.com wrote:
> 
> >> -Original Message-
> >> From: Jonathan Rajotte-Julien
> >> [mailto:jonathan.rajotte-jul...@efficios.com]
> >> Sent: Tuesday, March 13, 2018 4:35 PM
> >> To: THEUNISSEN Rolf <rolf.theunis...@altran.com>
> >> Cc: lttng-dev@lists.lttng.org
> >> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before
> >> a process exits
> >>
> >> Hi Rolf,
> >>
> >> On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote:
> >> > Hi,
> >> >
> >> > I am currently tracing many processes with LTTng-UST on a system
> >> > under
> >> heavy load, with 2 CPUs.
> >> >
> >> > The traces seem to be missing ust-events near the end of a trace.
> >> > After
> >> doing some analysis, it seems that events are missing from the other
> >> CPU then the process exits on. More concretely:
> >> >
> >> > During the lifetime of a process:
> >> > - Process X executes on CPU 0,1: all events are in the trace Just
> >> > before the process is about to exit:
> >> > - Process X executes on CPU 0: missing events from the trace
> >>
> >> How are you validating this?
> >
> > I am also using kernel tracing, to trace the scheduler. In
> > TraceCompass I can view the states of the processes and CPUs.
> >
> >>
> >> > A few mili-seconds later:
> >> > - Process X executes on CPU1 and exits: The last events are in the
> >> > trace
> >>
> >> This is not expected. Do you have a reproducer so we can check what
> >> might be happening?
> >
> > I am not allowed to provide details about my current traces. And it is
> > hard to make a small example, but I got some more clues what is
> happening, see below.
> >
> >>
> >> We will need the version for lttng-ust, lttng-tools, babeltrace.
> >
> > I am using version 2.10, for analyzing the traces I am using
> > TraceCompass Kernel version: 3.10.62
> >
> >>
> >> Note that you can run your application with lttng-ust in debug mode
> >> using LTTNG_UST_DEBUG=y. Debug statements will be outputted on
> stderr.
> >>
> >> e.g:
> >> $ LTTNG_UST_DEBUG=y ./my_sample_app
> >>
> >
> > Enableling this debugging flag gave me a clue what is happening. For
> > all processes, the registration is going OK. But for some processes,
> > the un-registration is not logged. On closer analysis (now with
> > tracepoints from SystemTap), I see that the processes for which the
> > events are missing and the un-registration is missing are killed by
> > SIGKILL signal. This probably explains why some data can be lost. But
> > maybe strange that only messages get lost that were traced on a different
> CPU.
> >
> > I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM
> > is sent to the process. Somehow the kernel decides to deliver an
> > SIGKILL instead of a SIGTERM.
> 
> When using lttng-ust with per-uid buffers, there is a known design limitation
> regarding process terminated by non-handled kill signals: if this occurs while
> the buffer is being written to (between reserve and commit), it will cause the
> rest of the per-cpu buffer to be unreadable. This applies to all events
> recorded by all applications of the same user id for that particular CPU.
> Recovering from this involves destroying and re-creating the tracing session.
> 
> If this kind of scenario is likely for you, you might want to consider using 
> per-
> pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer from
> this design limitation. The downside of per-pid buffers is that it allocates
> more memory on your system for buffering, and adds extra overhead when
> used with processes that have a short life-time.
> 
> Thanks,
> 
> Mathieu
> 

Thanks for your reply, indeed I was using --buffers-uid when I first observed 
this issue. Then I switched back to --buffers-pid, as I suspected it to be more 
robust, however the issue did not resolve (though it might occur less often). 
Anyways, it is expected to l

Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits

2018-03-14 Thread Mathieu Desnoyers
- On Mar 14, 2018, at 12:37 PM, THEUNISSEN Rolf rolf.theunis...@altran.com 
wrote:

>> -Original Message-
>> From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-jul...@efficios.com]
>> Sent: Tuesday, March 13, 2018 4:35 PM
>> To: THEUNISSEN Rolf <rolf.theunis...@altran.com>
>> Cc: lttng-dev@lists.lttng.org
>> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a 
>> process
>> exits
>> 
>> Hi Rolf,
>> 
>> On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote:
>> > Hi,
>> >
>> > I am currently tracing many processes with LTTng-UST on a system under
>> heavy load, with 2 CPUs.
>> >
>> > The traces seem to be missing ust-events near the end of a trace. After
>> doing some analysis, it seems that events are missing from the other CPU
>> then the process exits on. More concretely:
>> >
>> > During the lifetime of a process:
>> > - Process X executes on CPU 0,1: all events are in the trace Just
>> > before the process is about to exit:
>> > - Process X executes on CPU 0: missing events from the trace
>> 
>> How are you validating this?
> 
> I am also using kernel tracing, to trace the scheduler. In TraceCompass I can
> view the states of the processes and CPUs.
> 
>> 
>> > A few mili-seconds later:
>> > - Process X executes on CPU1 and exits: The last events are in the
>> > trace
>> 
>> This is not expected. Do you have a reproducer so we can check what might
>> be happening?
> 
> I am not allowed to provide details about my current traces. And it is hard to
> make a small example, but I got some more clues what is happening, see below.
> 
>> 
>> We will need the version for lttng-ust, lttng-tools, babeltrace.
> 
> I am using version 2.10, for analyzing the traces I am using TraceCompass
> Kernel version: 3.10.62
> 
>> 
>> Note that you can run your application with lttng-ust in debug mode using
>> LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr.
>> 
>> e.g:
>> $ LTTNG_UST_DEBUG=y ./my_sample_app
>> 
> 
> Enableling this debugging flag gave me a clue what is happening. For all
> processes, the registration is going OK. But for some processes, the
> un-registration is not logged. On closer analysis (now with tracepoints from
> SystemTap), I see that the processes for which the events are missing and the
> un-registration is missing are killed by SIGKILL signal. This probably 
> explains
> why some data can be lost. But maybe strange that only messages get lost that
> were traced on a different CPU.
> 
> I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent 
> to
> the process. Somehow the kernel decides to deliver an SIGKILL instead of a
> SIGTERM.

When using lttng-ust with per-uid buffers, there is a known design
limitation regarding process terminated by non-handled kill signals: if
this occurs while the buffer is being written to (between reserve and commit),
it will cause the rest of the per-cpu buffer to be unreadable. This applies to 
all
events recorded by all applications of the same user id for that particular
CPU. Recovering from this involves destroying and re-creating the tracing 
session.

If this kind of scenario is likely for you, you might want to consider using
per-pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer
from this design limitation. The downside of per-pid buffers is that it 
allocates
more memory on your system for buffering, and adds extra overhead when used with
processes that have a short life-time.

Thanks,

Mathieu


> 
> Rolf
> 
>> Cheers
>> 
>> --
>> Jonathan Rajotte-Julien
>> EfficiOS
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits

2018-03-14 Thread THEUNISSEN Rolf
> -Original Message-
> From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-jul...@efficios.com]
> Sent: Tuesday, March 13, 2018 4:35 PM
> To: THEUNISSEN Rolf <rolf.theunis...@altran.com>
> Cc: lttng-dev@lists.lttng.org
> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process
> exits
> 
> Hi Rolf,
> 
> On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote:
> > Hi,
> >
> > I am currently tracing many processes with LTTng-UST on a system under
> heavy load, with 2 CPUs.
> >
> > The traces seem to be missing ust-events near the end of a trace. After
> doing some analysis, it seems that events are missing from the other CPU
> then the process exits on. More concretely:
> >
> > During the lifetime of a process:
> > - Process X executes on CPU 0,1: all events are in the trace Just
> > before the process is about to exit:
> > - Process X executes on CPU 0: missing events from the trace
> 
> How are you validating this?

I am also using kernel tracing, to trace the scheduler. In TraceCompass I can 
view the states of the processes and CPUs.

> 
> > A few mili-seconds later:
> > - Process X executes on CPU1 and exits: The last events are in the
> > trace
> 
> This is not expected. Do you have a reproducer so we can check what might
> be happening?

I am not allowed to provide details about my current traces. And it is hard to 
make a small example, but I got some more clues what is happening, see below.

> 
> We will need the version for lttng-ust, lttng-tools, babeltrace.

I am using version 2.10, for analyzing the traces I am using TraceCompass
Kernel version: 3.10.62

> 
> Note that you can run your application with lttng-ust in debug mode using
> LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr.
> 
> e.g:
> $ LTTNG_UST_DEBUG=y ./my_sample_app
> 

Enableling this debugging flag gave me a clue what is happening. For all 
processes, the registration is going OK. But for some processes, the 
un-registration is not logged. On closer analysis (now with tracepoints from 
SystemTap), I see that the processes for which the events are missing and the 
un-registration is missing are killed by SIGKILL signal. This probably explains 
why some data can be lost. But maybe strange that only messages get lost that 
were traced on a different CPU.

I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent 
to the process. Somehow the kernel decides to deliver an SIGKILL instead of a 
SIGTERM.

Rolf

> Cheers
> 
> --
> Jonathan Rajotte-Julien
> EfficiOS
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits

2018-03-13 Thread Mathieu Desnoyers
- On Mar 13, 2018, at 6:48 AM, THEUNISSEN Rolf rolf.theunis...@altran.com 
wrote:

> Hi,
> 
> I am currently tracing many processes with LTTng-UST on a system under heavy
> load, with 2 CPUs.
> 
> The traces seem to be missing ust-events near the end of a trace. After doing
> some analysis, it seems that events are missing from the other CPU then the
> process exits on. More concretely:
> 
> During the lifetime of a process:
> - Process X executes on CPU 0,1: all events are in the trace
> Just before the process is about to exit:
> - Process X executes on CPU 0: missing events from the trace
> A few mili-seconds later:
> - Process X executes on CPU1 and exits: The last events are in the trace
> 
> Is this behavior expected, or am I running into an LTTng-UST bug? And would
> there be any workaround?

Can you provide more information about the tracing configuration you are using ?
How is the session created ?

Which versions of lttng-{tools,ust} and babeltrace do you use ? Are there any
discarded events in the trace reported by babeltrace ?

If you could provide the resulting trace, it would also be helpful.

Thanks,

Mathieu


> 
> Kind regards,
> Rolf Theunissen
> 
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [lttng-ust] Missing events when just before a process exits

2018-03-13 Thread Jonathan Rajotte-Julien
Hi Rolf,

On Tue, Mar 13, 2018 at 10:48:17AM +, THEUNISSEN Rolf wrote:
> Hi,
> 
> I am currently tracing many processes with LTTng-UST on a system under heavy 
> load, with 2 CPUs.
> 
> The traces seem to be missing ust-events near the end of a trace. After doing 
> some analysis, it seems that events are missing from the other CPU then the 
> process exits on. More concretely:
> 
> During the lifetime of a process:
> - Process X executes on CPU 0,1: all events are in the trace
> Just before the process is about to exit:
> - Process X executes on CPU 0: missing events from the trace

How are you validating this?

> A few mili-seconds later:
> - Process X executes on CPU1 and exits: The last events are in the trace

This is not expected. Do you have a reproducer so we can check what might be
happening?

We will need the version for lttng-ust, lttng-tools, babeltrace.

Note that you can run your application with lttng-ust in debug mode using
LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr.

e.g:
$ LTTNG_UST_DEBUG=y ./my_sample_app

Cheers

-- 
Jonathan Rajotte-Julien
EfficiOS
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [lttng-ust] Missing events when just before a process exits

2018-03-13 Thread THEUNISSEN Rolf
Hi,

I am currently tracing many processes with LTTng-UST on a system under heavy 
load, with 2 CPUs.

The traces seem to be missing ust-events near the end of a trace. After doing 
some analysis, it seems that events are missing from the other CPU then the 
process exits on. More concretely:

During the lifetime of a process:
- Process X executes on CPU 0,1: all events are in the trace
Just before the process is about to exit:
- Process X executes on CPU 0: missing events from the trace
A few mili-seconds later:
- Process X executes on CPU1 and exits: The last events are in the trace

Is this behavior expected, or am I running into an LTTng-UST bug? And would 
there be any workaround?

Kind regards,
Rolf Theunissen


___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev