Public bug reported:

While validating FlexRAN on Ubuntu 20.04 with the low-latency kernel,
one of our long-run stability tests failed. We would appreciate your
comments and support in understanding the problem. At this point we
believe some additional platform/OS-level fine-tuning is required for
the FlexRAN application to run stably. (As you know, the FlexRAN L1
application is very sensitive to jitter and latency, so we tried to
make sure the FlexRAN worker threads get the highest priority, isolated
cores, IRQ isolation, etc.)


With the low-latency kernel, the FlexRAN short-term timer mode tests pass.
 

But when we run the oRAN mode test, the FlexRAN L1 application crashes
with an indication that one of the worker threads was not able to
complete its processing in the given time. This test is meant to run for
a long duration (continuously until interrupted); the goal is to check
for at least 12 hours of stability. The failure was observed at random
times: ~2hr, ~4hr, ~45min…

Our initial suspicion is that an interrupt was raised and preempted the
worker core for an extended period of time.
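
One way to test this suspicion is to compare two snapshots of
/proc/interrupts and list which interrupt sources incremented on the
worker cores in between. The following is only a minimal sketch: the
function names (parse_interrupts, irq_deltas, watch) are hypothetical
helpers, not part of FlexRAN or any Ubuntu tool, and the 10-second
interval is an arbitrary choice.

```python
import time

def parse_interrupts(text):
    """Parse /proc/interrupts content into {irq_label: {cpu_id: count}}."""
    lines = text.strip().splitlines()
    # Header row looks like "CPU0 CPU1 ..."; keep the numeric CPU ids.
    cpu_ids = [int(tok[3:]) for tok in lines[0].split()]
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for tok in rest.split()[:len(cpu_ids)]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        # Rows such as ERR/MIS carry a single total, not per-CPU counts.
        if len(counts) == len(cpu_ids):
            table[label.strip()] = dict(zip(cpu_ids, counts))
    return table

def irq_deltas(before, after, cpus):
    """Return {irq_label: {cpu: delta}} for sources that fired on `cpus`."""
    hits = {}
    for label, counts in after.items():
        old = before.get(label, {})
        delta = {c: counts[c] - old.get(c, 0) for c in cpus if c in counts}
        delta = {c: d for c, d in delta.items() if d > 0}
        if delta:
            hits[label] = delta
    return hits

def watch(cpus, interval_s=10.0, path="/proc/interrupts"):
    """Take two snapshots `interval_s` apart and report activity on `cpus`."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_deltas(before, after, cpus)
```

Calling watch([4, 5, 6, 7, 8, 24, 25, 26, 27, 28]) while the oRAN test
is running would show whether anything other than the expected local
timer or rescheduling entries is landing on the application cores.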

Attached is the dmesg output

Machine configuration:

uname -a

Linux flexran-ubuntu 5.4.0-77-lowlatency #86-Ubuntu SMP PREEMPT Thu Jun
17 03:26:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Kernel command line

# cat /proc/cmdline

BOOT_IMAGE=/vmlinuz-5.4.0-77-lowlatency
root=/dev/mapper/ubuntu--vg-ubuntu--lv ro maybe-ubiquity intel_iommu=on
iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0
softlockup_panic=0 audit=0 cgroup_disable=memory intel_pstate=disable
mce=off hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=0
default_hugepagesz=1G kthread_cpus=0,20 irqaffinity=0,20 nohz=on
nosoftlockup nohz_full=1-19,21-39 rcu_nocbs=1-19,21-39 skew_tick=1
isolcpus=1-19,21-39

root@flexran-ubuntu:flexran-21.03#
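
As a sanity check on the isolation parameters above, the CPU lists can
be cross-checked against each other. This is a rough sketch only:
parse_cpu_list and cmdline_params are hypothetical helpers, and the
parser assumes plain CPU lists like the ones used here (no flag
prefixes inside isolcpus=).

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '1-19,21-39' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def cmdline_params(cmdline):
    """Return {name: value} for name=value tokens (last occurrence wins)."""
    return dict(tok.split("=", 1) for tok in cmdline.split() if "=" in tok)

# Isolation-related part of the command line quoted in this report.
CMDLINE = (
    "kthread_cpus=0,20 irqaffinity=0,20 nohz=on nosoftlockup "
    "nohz_full=1-19,21-39 rcu_nocbs=1-19,21-39 skew_tick=1 "
    "isolcpus=1-19,21-39"
)

params = cmdline_params(CMDLINE)
isolated = parse_cpu_list(params["isolcpus"])

checks = {
    "nohz_full matches isolcpus":
        parse_cpu_list(params["nohz_full"]) == isolated,
    "rcu_nocbs matches isolcpus":
        parse_cpu_list(params["rcu_nocbs"]) == isolated,
    "irqaffinity avoids isolated CPUs":
        parse_cpu_list(params["irqaffinity"]).isdisjoint(isolated),
    "kthread_cpus avoids isolated CPUs":
        parse_cpu_list(params["kthread_cpus"]).isdisjoint(isolated),
}

for name, ok in checks.items():
    print(("OK   " if ok else "FAIL ") + name)
```

On a live system the same check could be fed the contents of
/proc/cmdline instead of the hard-coded string.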

Application core mask

0x1F0001F0

Cores: 4,5,6,7,8,24,25,26,27,28
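
For reference, the core list follows from the mask by reading off the
set bits. A small sketch (mask_to_cores and cores_to_mask are
hypothetical helper names) that also confirms the application cores
fall inside the isolcpus=1-19,21-39 range from the command line:

```python
def mask_to_cores(mask):
    """Return the sorted list of bit positions set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def cores_to_mask(cores):
    """Build an affinity mask with one bit set per core id."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

APP_MASK = 0x1F0001F0
app_cores = mask_to_cores(APP_MASK)
print(app_cores)  # [4, 5, 6, 7, 8, 24, 25, 26, 27, 28]

# isolcpus=1-19,21-39 as given on the kernel command line.
isolated = set(range(1, 20)) | set(range(21, 40))
print(set(app_cores) <= isolated)  # True
```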

CPU info

# lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          40
On-line CPU(s) list:             0-39
Thread(s) per core:              2
Core(s) per socket:              20
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
Stepping:                        7
CPU MHz:                         2500.301
BogoMIPS:                        3200.00
Virtualization:                  VT-x
L1d cache:                       640 KiB
L1i cache:                       640 KiB
L2 cache:                        20 MiB
L3 cache:                        27.5 MiB
NUMA node0 CPU(s):               0-39
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass
                                 disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers
                                 and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB
                                 conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic
                                 sep mtrr pge mca cmov pat pse36 clflush
                                 dts acpi mmx fxsr sse sse2 ss ht tm pbe
                                 syscall nx pdpe1gb rdtscp lm
                                 constant_tsc art arch_perfmon pebs bts
                                 rep_good nopl xtopology nonstop_tsc
                                 cpuid aperfmperf pni pclmulqdq dtes64
                                 monitor ds_cpl vmx smx est tm2 ssse3
                                 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
                                 sse4_2 x2apic movbe popcnt
                                 tsc_deadline_timer aes xsave avx f16c
                                 rdrand lahf_lm abm 3dnowprefetch
                                 cpuid_fault epb cat_l3 cdp_l3
                                 invpcid_single ssbd mba ibrs ibpb stibp
                                 ibrs_enhanced tpr_shadow vnmi
                                 flexpriority ept vpid ept_ad fsgsbase
                                 tsc_adjust bmi1 avx2 smep bmi2 erms
                                 invpcid cqm mpx rdt_a avx512f avx512dq
                                 rdseed adx smap clflushopt clwb
                                 intel_pt avx512cd avx512bw avx512vl
                                 xsaveopt xsavec xgetbv1 xsaves cqm_llc
                                 cqm_occup_llc cqm_mbm_total
                                 cqm_mbm_local dtherm ida arat pln pts
                                 pku ospke avx512_vnni md_clear
                                 flush_l1d arch_capabilities


The irqbalance service is off

sysv-rc-conf --level 12345 irqbalance off

The FlexRAN applications (L1 and testmac) run in a bare-metal
environment

No VM or container

** Affects: linux-lowlatency (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "dmesg output"
   
https://bugs.launchpad.net/bugs/1938580/+attachment/5514894/+files/dmesg_output.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1938580

Title:
  Ubuntu low-latency kernel 20.04 fails stability tests

