Re: "switchtest" problem with recent 5.10 dovetail and distro-like kernel config

Jan Kiszka via Xenomai Thu, 07 Oct 2021 05:46:23 -0700

On 07.10.21 11:05, Bezdeka, Florian (T RDA IOT SES-DE) wrote:
> On Thu, 2021-10-07 at 08:55 +0000, Bezdeka, Florian via Xenomai wrote:
>> On Thu, 2021-10-07 at 08:53 +0000, Bezdeka, Florian via Xenomai wrote:
>>> On Thu, 2021-10-07 at 10:37 +0200, Jan Kiszka wrote:
>>>> On 05.10.21 17:18, Henning Schild wrote:
>>>>> Am Tue, 5 Oct 2021 16:38:12 +0200
>>>>> schrieb "Bezdeka, Florian (T RDA IOT SES-DE)"
>>>>> <florian.bezd...@siemens.com>:
>>>>>
>>>>>> On Tue, 2021-10-05 at 16:29 +0200, Jan Kiszka via Xenomai wrote:
>>>>>>> On 24.09.21 15:31, Henning Schild via Xenomai wrote:
>>>>>>>> Am Wed, 22 Sep 2021 16:21:54 +0200
>>>>>>>> schrieb Henning Schild via Xenomai <xenomai@xenomai.org>:
>>>>>>>>
>>>>>>>>> Am Wed, 22 Sep 2021 14:36:49 +0200
>>>>>>>>> schrieb Philippe Gerum <r...@xenomai.org>:
>>>>>>>>>
>>>>>>>>>> Henning Schild via Xenomai <xenomai@xenomai.org> writes:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> getting sick of maintaining a "small" kernel config i
>>>>>>>>>>> wanted to switch to a config derived from the debian11 5.10
>>>>>>>>>>> kernel. Basically that config plus the few switches coming
>>>>>>>>>>> in with dovetail.
>>>>>>>>>>>
>>>>>>>>>>> That passed all tests in qemu but on a real machine (big
>>>>>>>>>>> xeon with raid0) the "switchtest" has an issue.
>>>>>>>>>>>
>>>>>>>>>>> # /lib/xenomai/testsuite/switchtest
>>>>>>>>>>> == Testing FPU check routines...
>>>>>>>>>>> r0: 1 != 2
>>>>>>>>>>> r1: 1 != 2
>>>>>>>>>>> r2: 1 != 2
>>>>>>>>>>> r3: 1 != 2
>>>>>>>>>>> r4: 1 != 2
>>>>>>>>>>> r5: 1 != 2
>>>>>>>>>>> r6: 1 != 2
>>>>>>>>>>> r7: 1 != 2
>>>>>>>>>>> ymm0: 1/1 != 2/2
>>>>>>>>>>> ymm1: 1/1 != 2/2
>>>>>>>>>>> ymm2: 1/1 != 2/2
>>>>>>>>>>> ymm3: 1/1 != 2/2
>>>>>>>>>>> ymm4: 1/1 != 2/2
>>>>>>>>>>> ymm5: 1/1 != 2/2
>>>>>>>>>>> ymm6: 1/1 != 2/2
>>>>>>>>>>> ymm7: 1/1 != 2/2
>>>>>>>>>>> == FPU check routines: OK.
>>>>>>>>>>> == Threads: ... a lot of threads ...
>>>>>>>>>>> 30-16 rtuo_ufpp30-17 rtuo_ufpp30-18 rtuo_ufps30-19
>>>>>>>>>>> rtuo_ufps30-20 rtuo_ufpp_ufps30-21 rtuo_ufpp_ufps30-22
>>>>>>>>>>> sleeper_ufpthread_create: Resource temporarily unavailable
>>>>>>>>>>> ps31-0 rtk31-1 rtk31-2 rtk_fp31-3 rtk_fp31-4
>>>>>>>>>>> rtk_fp_ufpp31-5 rtk_fp_ufpp31-6 rtup31-7 rtup31-8
>>>>>>>>>>> rtup_ufpp31-9 rtup_ufpp31-10 rtus
>>>>>>>>>>>
>>>>>>>>>>> followed by only 0s
>>>>>>>>>>>
>>>>>>>>>>> RTH|---------cpu|ctx switches|-------total
>>>>>>>>>>> RTD|           0|           0|           0
>>>>>>>>>>> RTD|           1|           0|           0
>>>>>>>>>>> RTD|           2|           0|           0
>>>>>>>>>>> RTD|           3|           0|           0
>>>>>>>>>>> RTD|           4|           0|           0
>>>>>>>>>>> RTD|           5|           0|           0
>>>>>>>>>>> RTD|           6|           0|           0
>>>>>>>>>>>
>>>>>>>>>>> So it gets a
>>>>>>>>>>>
>>>>>>>>>>> sleeper_ufpthread_create: Resource temporarily unavailable
>>>>>>>>>>>
>>>>>>>>>>> but swallows that and keeps going. I think there might be a
>>>>>>>>>>> first issue with that error return not being dealt with.
>>>>>>>>>>>
>>>>>>>>>>> A second run get the "switchtest" stuck after crtl+c while
>>>>>>>>>>> the kernel starts complaining.
>>>>>>>>>>>
>>>>>>>>>>> [ 1229.072052] list_del corruption. prev->next should be
>>>>>>>>>>> ffffb24e46fb0548, but was ffffffff9a7356b0 [ 1229.072053]
>>>>>>>>>>> ------------[ cut here ]------------ [ 1229.072054] kernel
>>>>>>>>>>> BUG at lib/list_debug.c:51! [ 1229.072054] invalid opcode:
>>>>>>>>>>> 0000 [#1] SMP PTI IRQ_PIPELINE [ 1229.072054] CPU: 31 PID:
>>>>>>>>>>> 1857 Comm: switchtest Tainted: G            E
>>>>>>>>>>> 5.10.61-xenomai-1 #1 [ 1229.072055] Hardware name: secret [
>>>>>>>>>>> 1229.072055] IRQ stage: Linux [ 1229.072055] RIP:
>>>>>>>>>>> 0010:__list_del_entry_valid.cold+0x31/0x47 [ 1229.072056]
>>>>>>>>>>> Code: 99 32 9a e8 5f 0d ff ff 0f 0b 48 c7 c7 00 9a 32 9a e8
>>>>>>>>>>> 51 0d ff ff 0f 0b 48 89 b [ 1229.072057] RSP:
>>>>>>>>>>> 0018:ffffb24e4b5f7e30 EFLAGS: 00010046 [ 1229.072061] RAX:
>>>>>>>>>>> 0000000000000057 RBX: ffffb24e46fb04b0 RCX:
>>>>>>>>>>> 0000000000000000 [ 1229.072062] RDX: 0000000000000000 RSI:
>>>>>>>>>>> ffffa10e9fd9bba0 RDI: ffffa10e9fd97b68 [ 1229.072064] RBP:
>>>>>>>>>>> 0000000000000000 R08: 0000000000000001 R09:
>>>>>>>>>>> 000000000000000f [ 1229.072066] R10: 000000000000000f R11:
>>>>>>>>>>> ffffa10e9fd97bd6 R12: 0000000000000000 [ 1229.072069] R13:
>>>>>>>>>>> ffffa0ff488c7380 R14: 0000000000000000 R15:
>>>>>>>>>>> 0000000000000000 [ 1229.072072] FS:  00007f57e687bb80(0000)
>>>>>>>>>>> GS:ffffa10e9fd80000(0000) knlGS:0000000000000000 [
>>>>>>>>>>> 1229.072074] CS: 0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>>>> 0000000080050033 [ 1229.072076] CR2: 000055921bfa921c CR3:
>>>>>>>>>>> 00000001062fe002 CR4: 00000000001706e0 [ 1229.072078] Call
>>>>>>>>>>> Trace: [ 1229.072080] xntimer_destroy+0x81/0x150 [
>>>>>>>>>>> 1229.072082] __xnthread_cleanup+0x1e/0x2a0 [ 1229.072084]
>>>>>>>>>>> cobalt_handle_taskexit_event+0x1d/0x40 [ 1229.072086]
>>>>>>>>>>> do_exit+0xe1/0xab0 [ 1229.072088]  ?
>>>>>>>>>>> signal_wake_up_state+0x23/0x40 [ 1229.072090]
>>>>>>>>>>> do_group_exit+0x33/0xa0 [ 1229.072092]
>>>>>>>>>>> __x64_sys_exit_group+0x14/0x20 [ 1229.072111]
>>>>>>>>>>> do_syscall_64+0x3f/0x90 [ 1229.072113]
>>>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1229.072115]
>>>>>>>>>>> RIP: 0033:0x7f57e6948699 [ 1229.072117] Code: 00 4c 8b 05
>>>>>>>>>>> f9 27 0f 00 be e7 00 00 00 ba 3c 00 00 00 eb 12 0f 1f 44 00
>>>>>>>>>>> 00 89 0 [ 1229.072119] RSP: 002b:00007ffcfbda0078 EFLAGS:
>>>>>>>>>>> 00000246 ORIG_RAX: 00000000000000e7 [ 1229.072128] RAX:
>>>>>>>>>>> ffffffffffffffda RBX: 00007f57e6a3d610 RCX:
>>>>>>>>>>> 00007f57e6948699 [ 1229.072130] RDX: 000000000000003c RSI:
>>>>>>>>>>> 00000000000000e7 RDI: 0000000000000001 [ 1229.072132] RBP:
>>>>>>>>>>> 0000000000000001 R08: ffffffffffffff70 R09:
>>>>>>>>>>> 0000000000000001 [ 1229.072134] R10: 0000000000000005 R11:
>>>>>>>>>>> 0000000000000246 R12: 00007f57e6a3d610 [ 1229.072136] R13:
>>>>>>>>>>> 0000000000000002 R14: 00007f57e6a3dae8 R15:
>>>>>>>>>>> 0000000000000000 [ 1229.072138] Modules linked in: md4(E)
>>>>>>>>>>> sha512_ssse3(E) sha512_generic(E) cmac(E) nls_utf8(E) cifs)
>>>>>>>>>>> [ 1229.072169] mei(E) soundcore(E) sysimgblt(E) sg(E)
>>>>>>>>>>> evdev(E) ioatdma(E) joydev(E) watchdog(E) ac) [
>>>>>>>>>>> 1229.072185] ---[ end trace a1d1ac68468e74f0 ]--- [
>>>>>>>>>>> 1229.072186] RIP:
>>>>>>>>>>> 0010:__list_del_entry_valid.cold+0x31/0x47 [ 1229.072187]
>>>>>>>>>>> Code: 99 32 9a e8 5f 0d ff ff 0f 0b 48 c7 c7 00 9a 32 9a e8
>>>>>>>>>>> 51 0d ff ff 0f 0b 48 89 b [ 1229.072187] RSP:
>>>>>>>>>>> 0018:ffffb24e4b5f7e30 EFLAGS: 00010046 [ 1229.072188] RAX:
>>>>>>>>>>> 0000000000000057 RBX: ffffb24e46fb04b0 RCX:
>>>>>>>>>>> 0000000000000000 [ 1229.072188] RDX: 0000000000000000 RSI:
>>>>>>>>>>> ffffa10e9fd9bba0 RDI: ffffa10e9fd97b68 [ 1229.072189] RBP:
>>>>>>>>>>> 0000000000000000 R08: 0000000000000001 R09:
>>>>>>>>>>> 000000000000000f [ 1229.072189] R10: 000000000000000f R11:
>>>>>>>>>>> ffffa10e9fd97bd6 R12: 0000000000000000 [ 1229.072190] R13:
>>>>>>>>>>> ffffa0ff488c7380 R14: 0000000000000000 R15:
>>>>>>>>>>> 0000000000000000 [ 1229.072190] FS:  00007f57e687bb80(0000)
>>>>>>>>>>> GS:ffffa10e9fd80000(0000) knlGS:0000000000000000 [
>>>>>>>>>>> 1229.072190] CS: 0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>>>> 0000000080050033 [ 1229.072191] CR2: 000055921bfa921c CR3:
>>>>>>>>>>> 00000001062fe002 CR4: 00000000001706e0 [ 1229.072191]
>>>>>>>>>>> Fixing recursive fault but reboot is needed!
>>>>>>>>>>>
>>>>>>>>>>> I will look into that eventually. For now i just wanted to
>>>>>>>>>>> share what i have so far, maybe someone knows what is going
>>>>>>>>>>> on or can give a hint. function tracing was way too verbose
>>>>>>>>>>> to find the rootcause of that EBUSY.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> EAGAIN, switchtest triggers this when many CPUs are available
>>>>>>>>>> but the system heap is too small (not enough space there to
>>>>>>>>>> create thread TCBs and other core objects). Normally,
>>>>>>>>>> switchtest exits properly after a while.
>>>>>>>>>
>>>>>>>>> Ok maybe that changed kernel config did shrink the available
>>>>>>>>> heap, thanks for the hint.
>>>>>>>>>
>>>>>>>>> If it is an EAGAIN because of too little resources, maybe it
>>>>>>>>> should keep trying with less and less CPUs. And spit a warning
>>>>>>>>> like "could only run on 20 out of 32 cores".
>>>>>>>>> At least that would be my first idea for a patch in case it is
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> The test runs just fine when reducing the number of cores to run
>>>>>>>> on. In fact it was one too many ;)
>>>>>>>>
>>>>>>>> good
>>>>>>>> ./switchtest -T 5 --cpu-affinity=0-30
>>>>>>>> bad (where affinity includes all cpus)
>>>>>>>> ./switchtest -T 5 --cpu-affinity=0-31
>>>>>>>>
>>>>>>>>>> The kernel splat looks like some issue on the error path
>>>>>>>>>> which might not have been observed yet (maybe triggered by ^C
>>>>>>>>>> while switchtest was unwinding).
>>>>>>>>>
>>>>>>>>> I saw exactly that in probably three attempts. First switchtest
>>>>>>>>> killed with ctrl+c when the 0s come ... never any fun kernel
>>>>>>>>> message. But always fun kernel message when the second run was
>>>>>>>>> interrupted. So i would be tempted to call that reproducible
>>>>>>>>> and not a one-time bad timing.
>>>>>>>>
>>>>>>>> But when running the bad case two times in a row, it will always
>>>>>>>>
>>>>>>>> kernel BUG at lib/list_debug.c:51!
>>>>>>>>
>>>>>>>> Even when none of the two runs was interrupted by ctrl+c.
>>>>>>>>
>>>>>>>
>>>>>>> So, the main issue was solved by increasing resources, but we still
>>>>>>> have a problem with handling errors gracefully - is that the
>>>>>>> correct summary of this issue? Or where are we standing with this?
>>>>>
>>>>> By reducing the number of threads, but yes that issue was "worked
>>>>> around". The crash is something that was found as well and is still
>>>>> open, in fact pretty easy to reproduce.
>>>>>
>>>>
>>>> How can I reproduce that?
>>>
>>> I tried it again on the system that Henning was using when reporting
>>> the problem. We increased CONFIG_XENO_OPT_DEBUG from 512 to 4096 in the
>>> meantime.
> 
> s/CONFIG_XENO_OPT_DEBUG/CONFIG_XENO_OPT_REGISTRY_NRSLOTS
> 
>>>
>>> Taken while switchtest is running on that system:
>>>
>>> # cat /proc/xenomai/registry/usage
>>> 523/4096
>>>
>>> So with the old 512 slots we would run into a exhaustion on such a big
>>> system. IMHO that should be the root cause.
>>>
>>> Configuration problem, no bug.
>>
>> Correction: Maybe we have a problem in one of the error handling pathes
>> in switchtest?
> 
> Correction 2: I just overlooked the BUG() above. So I assume we have a
> problem, somewhere the registry exhaustion is not handled properly. The
> answer to "how can I reproduce that" would be "decrease
> CONFIG_XENO_OPT_REGISTRY_NRSLOTS and run switchtest".


Confirmed. I even just reproduced it with a simple pthread_create loop,
without any switchtest involved. Debugging...

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

Re: "switchtest" problem with recent 5.10 dovetail and distro-like kernel config

Reply via email to