Hello. I am running on a Xilinx ZCU102 (ARM64, Cortex A53) development board and I've written a lot of userspace software against the POSIX skin of Cobalt 3.1.
For context, I have set isolcpus=1,2,3 in the kernel boot args, and I only run my Xenomai tasks on CPUs 1, 2 and 3. I know that despite isolcpus, the Linux kernel will still run a few kthreads on those cores. I am worried that the cause of my issues might be my Xenomai tasks completely starving those Linux kthreads.

I can boot the board and then start and run my application software just fine. Sometimes, if I start and then restart my application software, I get a kernel oops for a paging request at an out-of-range virtual address (I am using a 4KB page size). FWIW, the stack trace on this oops is not always in RCU code.

[25721.667165] Unable to handle kernel paging request at virtual address ffffff484585c150
[25721.675303] Mem abort info:
[25721.678199] ESR = 0x96000004
[25721.681365] Exception class = DABT (current EL), IL = 32 bits
[25721.687445] SET = 0, FnV = 0
[25721.690610] EA = 0, S1PTW = 0
[25721.693854] Data abort info:
[25721.696836] ISV = 0, ISS = 0x00000004
[25721.700786] CM = 0, WnR = 0
[25721.703862] [ffffff484585c150] address between user and kernel address ranges
[25721.711184] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[25721.716791] Modules linked in: gpio_zynq xilinx_can xeno_can_ddc(O) xeno_can_sja1000_ddc(O) xeno_can gpio_xilinx
[25721.727087] Process bordline_dm_4 (pid: 24396, stack limit = 0x00000000b3644306)
[25721.734521] CPU: 2 PID: 24396 Comm: bordline_dm_4 Tainted: G W O 4.19.55 #1
[25721.742654] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
[25721.747645] I-pipe domain: Linux
[25721.750894] pstate: 00000085 (nzcv daIf -PAN -UAO)
[25721.755718] pc : ___xnsched_run+0xf8/0x640
[25721.759840] lr : ___xnsched_run+0xf8/0x640
[25721.763960] sp : ffffff8008013df0
[25721.767290] x29: ffffff800a0168b8 x28: ffffff800ad17410
[25721.772647] x27: ffffff800911f200 x26: ffffff8008ed4520
[25721.778003] x25: ffffff8008bfb9e8 x24: ffffff8009349000
[25721.783359] x23: ffffffc838478000 x22: ffffffc802216180
[25721.788715] x21: ffffff800ad13e10 x20: ffffff800817883c
[25721.794072] x19: ffffff8008013e70 x18: 0000000000000000
[25721.799427] x17: 0000000000000208 x16: 0000000000000000
[25721.804784] x15: 0000000000000000 x14: 0000000000000000
[25721.810140] x13: 0000000000000000 x12: 0000000000000000
[25721.815496] x11: 0000000000000000 x10: 0000000000001890
[25721.820852] x9 : ffffff8008013dd0 x8 : ffffffc802217a70
[25721.826208] x7 : ffffff80090d8f18 x6 : ffffff80090d8f08
[25721.831564] x5 : 000000000000374a x4 : 0000000000000000
[25721.836920] x3 : 0000000000000000 x2 : 276c5ce9e4fb7000
[25721.842276] x1 : ffffffc802216180 x0 : ffffffc83ab48340
[25721.847633] Call trace:
[25721.850096] ___xnsched_run+0xf8/0x640
[25721.853871] Code: 92400273 d53b4220 363818a0 97fe0ccb (f8606aa0)
[25721.859998] I-pipe tracer log (100 points):
[25721.864198] | #func 0 ipipe_trace_panic_freeze+0xc (oops_enter+0x18)
[25721.872415] | #func -1 oops_enter+0xc (die+0x40)
[25721.878792] | #func -2 preempt_count_add+0x18 (_raw_spin_lock_irqsave+0x28)
[25721.887531] | #func -2 ipipe_root_only+0x18 (ipipe_test_and_stall_root+0x1c)
[25721.896351] | #func -2 ipipe_test_and_stall_root+0x14 (_raw_spin_lock_irqsave+0x1c)
[25721.905781] | #func -3 _raw_spin_lock_irqsave+0x14 (die+0x38)
[25721.913298] | #func -3 die+0x2c (die_kernel_fault+0x68)
[25721.920290] | #func -4 preempt_count_sub+0x14 (wake_up_klogd+0xa0)
[25721.928240] | #func -5 preempt_count_add+0x18 (wake_up_klogd+0x18)
[25721.936191] | #func -5 wake_up_klogd+0xc (vprintk_emit+0xc4)
[25721.943620] | #func -6 preempt_count_sub+0x14 (vprintk_emit+0x1a8)
[25721.951565] | #func -6 preempt_count_sub+0x14 (__printk_safe_exit+0x44)
[25721.959953] | #func -7 preempt_count_add+0x18 (__printk_safe_exit+0x18)
[25721.968339] | #func -7 __printk_safe_exit+0xc (console_unlock.part.5+0x2d0)
[25721.977074] | #func -7 preempt_count_sub+0x14 (_raw_spin_unlock+0x28)
[25721.985283] | #func -8 _raw_spin_unlock+0x14 (console_unlock.part.5+0x2cc)
[25721.993931] | #func -8 preempt_count_add+0x18 (_raw_spin_lock+0x20)
[25722.001967] | #func -9 _raw_spin_lock+0x14 (console_unlock.part.5+0x2bc)
[25722.010442] | #func -9 preempt_count_sub+0x14 (__printk_safe_exit+0x44)
[25722.018827] | #func -9 preempt_count_add+0x18 (__printk_safe_exit+0x18)
[25722.027213] | #func -10 __printk_safe_exit+0xc (__up_console_sem.isra.2+0x34)
[25722.036035] | #func -10 preempt_count_sub+0x14 (_raw_spin_unlock_irqrestore+0x30)
[25722.045203] | #func -11 _raw_spin_unlock_irqrestore+0x18 (up+0x54)
[25722.053061] | #func -11 __ipipe_spin_unlock_debug+0x14 (up+0x48)
[25722.060752] | #func -12 preempt_count_add+0x18 (_raw_spin_lock_irqsave+0x28)
[25722.069490] | #func -12 ipipe_root_only+0x18 (ipipe_test_and_stall_root+0x1c)
[25722.078311] | #func -12 ipipe_test_and_stall_root+0x14 (_raw_spin_lock_irqsave+0x1c)
[25722.087741] | #func -13 _raw_spin_lock_irqsave+0x14 (up+0x20)
[25722.095170] | #func -13 up+0x14 (__up_console_sem.isra.2+0x30)
[25722.102680] | #func -14 preempt_count_sub+0x14 (__printk_safe_enter+0x40)
[25722.111148] | #func -14 preempt_count_add+0x18 (__printk_safe_enter+0x18)
[25722.119621] | #func -15 __printk_safe_enter+0xc (__up_console_sem.isra.2+0x20)
[25722.128530] | #func -15 ipipe_root_only+0x18 (ipipe_test_and_stall_root+0x1c)
[25722.137349] | #func -15 ipipe_test_and_stall_root+0x14 (__up_console_sem.isra.2+0x18)
[25722.146867] | #func -16 __up_console_sem.isra.2+0x10 (console_unlock.part.5+0x2b4)
[25722.156128] | #func -16 preempt_count_sub+0x14 (_raw_spin_unlock+0x28)
[25722.164344] | #func -17 _raw_spin_unlock+0x14 (console_unlock.part.5+0x2b0)
[25722.172992] | #func -17 preempt_count_add+0x18 (_raw_spin_lock+0x20)
[25722.181027] | #func -18 _raw_spin_lock+0x14 (console_unlock.part.5+0x98)
[25722.189415] | #func -18 preempt_count_sub+0x14 (__printk_safe_enter+0x40)
[25722.197888] | #func -18 preempt_count_add+0x18 (__printk_safe_enter+0x18)
[25722.206360] | #func -19 __printk_safe_enter+0xc (console_unlock.part.5+0x8c)
[25722.215094] | #func -19 ipipe_root_only+0x18 (ipipe_test_and_stall_root+0x1c)
[25722.223915] | #func -20 ipipe_test_and_stall_root+0x14 (console_unlock.part.5+0x84)
[25722.233258] | #func -20 preempt_count_sub+0x14 (__printk_safe_exit+0x44)
[25722.241647] | #func -21 preempt_count_add+0x18 (__printk_safe_exit+0x18)
[25722.250033] | #func -21 __printk_safe_exit+0xc (console_unlock.part.5+0x2f4)
[25722.258768] | #func -22 preempt_count_sub+0x14 (_raw_spin_unlock+0x28)
[25722.266977] | #func -22 _raw_spin_unlock+0x14 (console_unlock.part.5+0x248)
[25722.275626] | #func -22 preempt_count_add+0x18 (_raw_spin_lock+0x20)
[25722.283661] | #func -23 _raw_spin_lock+0x14 (console_unlock.part.5+0x234)
[25722.292136] | #func -23 preempt_count_sub+0x14 (_raw_spin_unlock+0x28)
[25722.300347] | #func -24 _raw_spin_unlock+0x14 (vt_console_print+0x118)
[25722.308559] | #func -24 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.317731] | #func -24 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.327072] | #func -25 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.335898] | #func -25 atomic_notifier_call_chain+0x24 (vt_console_print+0x2b4)
[25722.344979] | #func -26 dummycon_cursor+0xc (set_cursor+0x88)
[25722.352411] | #func -26 add_softcursor+0x14 (set_cursor+0x64)
[25722.359841] | #func -27 set_cursor+0x14 (vt_console_print+0x2a0)
[25722.367534] | #func -27 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.376708] | #func -27 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.386050] | #func -28 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.394876] | #func -28 atomic_notifier_call_chain+0x24 (vt_console_print+0x1dc)
[25722.403957] | #func -29 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.413125] | #func -29 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.422466] | #func -30 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.431292] | #func -30 atomic_notifier_call_chain+0x24 (lf+0x80)
[25722.439065] | #func -32 dummycon_scroll+0xc (con_scroll+0x210)
[25722.446581] | #func -32 con_scroll+0x30 (lf+0xc8)
[25722.452962] | #func -33 lf+0x1c (vt_console_print+0x2f0)
[25722.459953] | #func -33 dummycon_putcs+0xc (vt_console_print+0x340)
[25722.467903] | #func -34 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.477077] | #func -34 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.486418] | #func -35 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.495244] | #func -35 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.504325] | #func -36 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.513493] | #func -36 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.522834] | #func -36 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.531660] | #func -37 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.540742] | #func -37 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.549909] | #func -38 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.559251] | #func -38 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.568076] | #func -38 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.577158] | #func -39 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.586325] | #func -39 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.595667] | #func -40 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.604492] | #func -40 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.613574] | #func -41 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.622742] | #func -41 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.632083] | #func -41 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.640909] | #func -42 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.649990] | #func -42 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.659158] | #func -43 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.668499] | #func -43 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.677325] | #func -44 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.686407] | #func -44 __rcu_read_unlock+0x10 (atomic_notifier_call_chain+0x108)
[25722.695574] | #func -45 notifier_call_chain+0x2c (atomic_notifier_call_chain+0x100)
[25722.704915] | #func -45 __rcu_read_lock+0xc (atomic_notifier_call_chain+0xe8)
[25722.713741] | #func -45 atomic_notifier_call_chain+0x24 (vt_console_print+0x234)
[25722.722825] ---[ end trace 2297667c45d572b6 ]---

I also occasionally see these kernel logs:

[ 989.859879] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 989.865825] rcu: 0-...!: (0 ticks this GP) idle=e96/1/0x4000000000000002 softirq=64453/64453 fqs=0
[ 989.874984] rcu: 2-...!: (1 GPs behind) idle=5aa/1/0x4000000000000002 softirq=19889/19890 fqs=0
[ 989.883876] rcu: (detected by 1, t=21022 jiffies, g=144381, q=23)
[ 989.890073] Task dump for CPU 0:
[ 989.893302] fcm-dm R running task 0 2682 1 0x00000202
[ 989.900380] Call trace:
[ 989.902837] __switch_to+0x9c/0xf0
[ 989.906249] __FUNCTION__.48533+0x0/0x10
[ 989.910180] Task dump for CPU 2:
[ 989.913415] bordline_dm_1 R running task 0 2764 1 0x00000220
[ 989.920493] Call trace:
[ 989.922946] __switch_to+0x9c/0xf0
[ 989.926353] rcu: rcu_sched kthread starved for 21022 jiffies! g144381 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 989.936643] rcu: RCU grace-period kthread stack dump:
[ 989.941703] rcu_sched R running task 0 10 2 0x00000008
[ 989.948775] Call trace:
[ 989.951228] __switch_to+0x9c/0xf0
[ 989.954635] __schedule+0x2f4/0x990
[ 989.958133] schedule+0x58/0x90
[ 989.961284] schedule_timeout+0x1e4/0x440
[ 989.965307] rcu_gp_kthread+0x5d0/0xf50
[ 989.969149] kthread+0x130/0x140
[ 989.972386] ret_from_fork+0x14/0x1c

Any help is greatly appreciated.

Thanks
~ Sam