On 25.03.20 08:59, Bradley Valdenebro Peter (DC-AE/ESW52) wrote:
Thanks for your support Jan.
We have already tried ftrace, but we couldn't reproduce the issue with it enabled.
We think this is because of the overhead it introduces. We usually record the following events:
sudo trace-cmd start -e ipi -e irq -e rcu -e workqueue_execute_end -e
workqueue_execute_start -e mm_page_alloc -e mm_page_free
-e sched_migrate_task -e sched_switch -e
cobalt_irq_exit -e cobalt_irq_entry -e cobalt_thread_migrate -e
-e cobalt_head_sysexit -e cobalt_clock_entry -e
Would changing to trace-cmd record -e "cobalt*" -e sched -e signal make a difference?
If you already have trouble reproducing with your reduced event set,
this won't be easier with the extended cobalt events. I would rather try
to reduce the set further, maybe leaving out rcu, syscalls, allocations
and maybe even interrupts.
We have also tried the ipipe tracer, but we had no luck there either. We
didn't use xntrace_user_freeze(), but we monitored the trace IRQs-off times in
Using xntrace_user_freeze() might be worth a try.
If you can reproduce under enabled ipipe trace, then this makes sense.
From: Jan Kiszka <jan.kis...@siemens.com>
Sent: 24 March 2020 12:53
To: Bradley Valdenebro Peter (DC-AE/ESW52)
Subject: Re: RT thread seems blocked
On 24.03.20 12:43, Bradley Valdenebro Peter (DC-AE/ESW52) wrote:
We ran some tests over the weekend, and although we see fewer occurrences, they
still happen.
Besides the 1 ms stall, one interesting observation is that we sometimes see
over 100 us between the IRQ and the start of the ISR.
At this point we do not know how to proceed further. Any help or suggestion is
That is usually where you should start to look into tracing. Option A is
event-level tracing via standard ftrace. Use
trace-cmd record -e "cobalt*" -e sched -e signal
to record such a trace. Ideally, instrument the point in your application where
the latency is off via xnftrace_printf(); this leaves a mark in the recorded trace.
Then check via "trace-cmd report" whether the scheduling around that peak is what
you would expect.
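As a rough illustration of that instrumentation step, the sketch below compares
the expected and the actual wake-up time of a periodic thread and leaves a mark
when the thread is late by more than a limit. The helper names (ts_diff_ns,
mark_if_late, trace_mark) are hypothetical. It also assumes that
xnftrace_printf() takes printf-style arguments and is declared in
<cobalt/trace.h>; without Xenomai it falls back to stderr so that it still
builds.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#ifdef __COBALT__
#include <cobalt/trace.h>   /* assumption: declares xnftrace_printf() */
#define trace_mark(...) xnftrace_printf(__VA_ARGS__)
#else
/* Fallback so the sketch builds without Xenomai: print to stderr instead. */
#define trace_mark(...) fprintf(stderr, __VA_ARGS__)
#endif

/* Nanoseconds elapsed from 'earlier' to 'later'. */
static int64_t ts_diff_ns(const struct timespec *later,
                          const struct timespec *earlier)
{
    return (int64_t)(later->tv_sec - earlier->tv_sec) * 1000000000LL
         + (later->tv_nsec - earlier->tv_nsec);
}

/*
 * Leave a mark in the trace if the thread woke up more than limit_ns
 * after its expected wake-up time.
 * Returns 1 if a mark was emitted, 0 otherwise.
 */
static int mark_if_late(const struct timespec *expected,
                        const struct timespec *actual,
                        int64_t limit_ns)
{
    int64_t late_ns = ts_diff_ns(actual, expected);

    if (late_ns <= limit_ns)
        return 0;

    trace_mark("latency peak: %lld ns late\n", (long long)late_ns);
    return 1;
}
```

In a periodic loop, one would take a clock_gettime() timestamp right after the
wake-up and pass it together with the programmed wake-up time to
mark_if_late(). The mark text can then be searched for in the report to find
the interesting window.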
For digging even deeper, down to function level:
specifically look into xntrace_user_freeze(), calling that from your
application when you spot an excessive latency to stop the trace.
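A minimal sketch of such a freeze hook could look as follows. The threshold and
the helper name freeze_if_excessive are hypothetical. The signature
xntrace_user_freeze(unsigned long v, int once), the header <cobalt/trace.h> and
the meaning of "once" (keep the first frozen trace instead of overwriting it)
are assumptions that should be checked against your Xenomai version. Without
Xenomai, a stub stands in so that the sketch still builds.

```c
#include <stdint.h>

#ifdef __COBALT__
#include <cobalt/trace.h>   /* assumption: declares xntrace_user_freeze() */
#else
/* Stub so the sketch builds without Xenomai; it only records the call. */
static unsigned long stub_last_value;
static int stub_freeze_calls;

static int xntrace_user_freeze(unsigned long v, int once)
{
    (void)once;
    stub_last_value = v;
    stub_freeze_calls++;
    return 0;
}
#endif

#define LATENCY_LIMIT_NS 100000   /* hypothetical threshold: 100 us */

/*
 * Freeze the tracer when the measured latency exceeds the limit.
 * The latency in microseconds is passed as the user value so that it can
 * be found in the frozen trace. once=1 is assumed to keep the first
 * frozen trace rather than overwriting it on later calls.
 * Returns 1 if a freeze was requested, 0 otherwise.
 */
static int freeze_if_excessive(int64_t latency_ns)
{
    if (latency_ns <= LATENCY_LIMIT_NS)
        return 0;

    xntrace_user_freeze((unsigned long)(latency_ns / 1000), 1);
    return 1;
}
```

Calling this right after the latency measurement keeps the frozen back-trace as
close as possible to the event that caused the delay.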
If the result is not clear to you, share it so that the community can take a look.
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux