On 05.04.22 19:23, Richard Weinberger wrote:
> ----- Original Message -----
>>> How about additionally widening the suspected race window by adding a
>>> delay to lostage_task_wakeup?
>>
>> Excellent idea! :-)
> 
> Yeah, with a delay in lostage_task_wakeup() my WARN_ON_ONCE() triggers
> very quickly.
> 
> [  123.237698] ------------[ cut here ]------------
> [  123.238755] WARNING: CPU: 1 PID: 1411 at kernel/xenomai/thread.c:2158 xnthread_relax+0x5d4/0x680
> [  123.240698] Modules linked in: loader(OE) tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables xeno_can_peak_pci xeno_can_sja1000 xeno_can xeno_16550A libcrc32c nfnetlink xeno_rtipc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm snd_timer snd soundcore sunrpc pktcdvd rt_e1000 rt_e1000_new crc32_pclmul rtnet i2c_piix4 bochs drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm e1000 serio_raw crc32c_intel ata_generic pata_acpi floppy qemu_fw_cfg fuse
> [  123.252790] CPU: 1 PID: 1411 Comm: app Tainted: G           OE     5.15.9xeno3.2-x8664G-rw #6
> [  123.255001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [  123.257443] IRQ stage: Linux
> [  123.258090] RIP: 0010:xnthread_relax+0x5d4/0x680
> [  123.259136] Code: 18 05 00 00 e8 5d 8d 13 00 41 8b 97 18 05 00 00 48 8d b3 6c 02 00 00 48 c7 c7 a0 8f 6d 82 e8 5c 96 cc 00 0f 0b e9 9d fc ff ff <0f> 0b e9 af fd ff ff 65 44 8b 2d a5 2d ca 7e 41 83 fd 03 77 7a 45
> [  123.263163] RSP: 0018:ffff88811381fb60 EFLAGS: 00010202
> [  123.264330] RAX: 0000000000000022 RBX: ffffc90000bf6408 RCX: ffffffff8137e13b
> [  123.265893] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811381fc18
> [  123.267474] RBP: 1ffff11022703f6e R08: ffffed1022703f84 R09: ffffed1022703f84
> [  123.269095] R10: ffff88811381fc1b R11: ffffed1022703f83 R12: ffff88811381fc10
> [  123.270662] R13: ffff888104f03e00 R14: 0000000000000000 R15: ffffc90000bf6428
> [  123.272222] FS:  00007fb4d0aca700(0000) GS:ffff88811b080000(0000) knlGS:0000000000000000
> [  123.273983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  123.275262] CR2: 0000000000a74008 CR3: 0000000111938003 CR4: 0000000000170ea0
> [  123.276823] Call Trace:
> [  123.277401]  <TASK>
> [  123.277878]  ? xnthread_wait_period+0x4c0/0x4c0
> [  123.278899]  ? xnsynch_release+0x690/0x690
> [  123.279828]  ? __cobalt_sem_destroy+0x2dd/0x630
> [  123.280848]  ? recalibrate_cpu_khz+0x10/0x10
> [  123.281812]  ? xnthread_set_periodic+0x3a0/0x3a0
> [  123.282855]  ? recalibrate_cpu_khz+0x10/0x10
> [  123.283816]  ? ktime_get_mono_fast_ns+0xdb/0x120
> [  123.284852]  ? xnlock_dbg_release+0xd9/0x170
> [  123.285812]  prepare_for_signal+0x297/0x3a0
> [  123.286765]  ? CoBaLt_serialdbg+0x140/0x140
> [  123.287709]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
> [  123.288906]  handle_head_syscall+0x6e2/0x810
> [  123.289867]  ? __cobalt_cond_wait_prologue+0xf60/0xf60
> [  123.291017]  ? CoBaLt_trace+0x650/0x650
> [  123.291887]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
> [  123.293088]  pipeline_syscall+0x8e/0x140
> [  123.293979]  syscall_enter_from_user_mode+0x30/0x80
> [  123.295076]  do_syscall_64+0x1d/0xa0
> [  123.295895]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 

OK, the picture is getting clearer, but not yet complete. Could you find
out whether the work and the thread are running on different CPUs? Or
otherwise, via tracing, what led to this case?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
