[Kernel-packages] [Bug 2059951] Re: mlxbf_gige: stop interface during shutdown
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2059951

Title:
  mlxbf_gige: stop interface during shutdown

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The mlxbf_gige driver's shutdown() callback is invoked during reboot processing, but the stop() logic is not guaranteed to execute on all distros. If stop() does not execute, NAPI remains enabled during system shutdown, and an exception can occur if NAPI is scheduled after the interface has been shut down but not stopped.

  [Fix]
  The networking interface managed by the mlxbf_gige driver must be stopped during reboot processing so that it is put into a clean state and driver callbacks won't be called.

  [Test Case]
  * Put BF platform into a reboot loop
  * Ensure that BF platform brings up the "oob_net0" interface on each reboot, and that no mlxbf_gige driver exceptions occur

  [Regression Potential]
  There is low potential for regression as this brings in upstream content.

  [Other]
  None

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2059951/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2059279] Re: mlxbf_gige: replace SAUCE patch for pause frame counters
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2059279

Title:
  mlxbf_gige: replace SAUCE patch for pause frame counters

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The support for mlxbf_gige pause frame counters was added to Jammy via a SAUCE patch. During upstream review a small issue was fixed, so the upstream commit now differs from the SAUCE patch.

  [Fix]
  The fix is to replace the SAUCE patch in the Jammy repo with the upstream equivalent.

  [Test Case]
  * Boot BF platform and bring up "oob_net0" interface
  * Execute the command "ethtool -I -a oob_net0" to get baseline stats
  * Send heavy traffic into "oob_net0" interface
  * Re-run the above ethtool command, noting the pause frame counters

  [Regression Potential]
  There is low potential for regression as this brings in upstream content.

  [Other]
  None
[Kernel-packages] [Bug 2055086] Re: mlxbf_gige: add support for pause frame counters
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2055086

Title:
  mlxbf_gige: add support for pause frame counters

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The BlueField-2 and BlueField-3 silicon designs include a provision for reporting pause frame counters, but the driver (mlxbf_gige) does not have software support yet.

  [Fix]
  The fix is to update the mlxbf_gige driver to support the "get_pause_stats()" callback, which enables display of pause frame counters via "ethtool -I -a oob_net0". The support of this callback was added to Linux kernel 5.9.0.

  [Test Case]
  * Reboot BF platform, and configure oob_net0 interface
  * Issue command "ethtool -I -a oob_net0" to display pause frame counters
  * Send in multiple streams of traffic to oob_net0 interface, e.g. from two or three simultaneous SCP operations
  * Re-issue "ethtool -I -a oob_net0" and note difference in output

  [Regression Potential]
  The changed logic extends the ethtool support, so low risk for any problem.
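The before/after comparison in the test case above can be scripted. The following is a minimal sketch and not part of the SRU: it assumes that "ethtool -I -a" prints a "Statistics:" section containing "tx_pause_frames" and "rx_pause_frames" lines, and the helper names parse_pause_stats and pause_delta are hypothetical.

```python
import re

# Assumed line shape inside the "Statistics:" section, e.g. "  tx_pause_frames: 12"
_COUNTER = re.compile(r"^[ \t]*(\w+_pause_frames):[ \t]*(\d+)[ \t]*$", re.M)

def parse_pause_stats(text):
    """Extract '<name>_pause_frames: <count>' pairs from captured ethtool output."""
    return {name: int(value) for name, value in _COUNTER.findall(text)}

def pause_delta(before, after):
    """Return the per-counter increase between two captured outputs."""
    b, a = parse_pause_stats(before), parse_pause_stats(after)
    return {name: a[name] - b.get(name, 0) for name in a}
```

The two inputs could be captured with subprocess.run(["ethtool", "-I", "-a", "oob_net0"], capture_output=True, text=True).stdout before and after sending traffic; a non-zero delta would indicate that the counters are being reported.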
[Kernel-packages] [Bug 2051871] Re: mlxbf-tmfifo: Drop Tx network packet when Tx TmFIFO is full
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2051871

Title:
  mlxbf-tmfifo: Drop Tx network packet when Tx TmFIFO is full

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  This is a cherry-picked upstream fix that drops a Tx network packet when the Tx TmFIFO is full.

  [Fix]
  Starting with the Linux 5.16 kernel, a Tx timeout mechanism was added to the virtio_net driver. It prints a "Tx timeout" warning when a packet stays in the Tx queue for too long. Below is an example of the reported message:

  "[494105.316739] virtio_net virtio1 tmfifo_net0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, usecs since last trans: 3079892256"

  This issue can happen when the external host driver that drains the FIFO is restarted, stopped or upgraded. To avoid such confusing "Tx timeout" messages, this commit adds logic to drop the outstanding Tx packet if it cannot be transmitted within two seconds because the Tx FIFO is full, which can be considered a congestion or out-of-resource drop.

  This commit also handles the special case in which the packet is half-transmitted into the Tx FIFO. In that case the packet is discarded and the remaining length is stored in vring->rem_padding, so that zero padding can be sent out when Tx space becomes available to maintain the integrity of the packet format. The padded packet will be dropped on the receiving side.

  [Test Case]
  Same functionality and testing as on BlueField-1/2/3. No functionality change.

  [Regression Potential]
  Same behavior from user perspective.
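The drop-and-pad behaviour described in [Fix] can be illustrated with a small simulation. This is a sketch under stated assumptions, not the driver's code: only the two-second limit and the name rem_padding come from the description; TxState, tx_step, last_progress and the choice to measure the timeout from the last successful write are hypothetical.

```python
from dataclasses import dataclass

TX_TIMEOUT_S = 2.0  # the "two seconds" named in the bug description

@dataclass
class TxState:
    pending: bytes = b""          # bytes of the current packet not yet written
    last_progress: float = 0.0    # time of the last successful write (assumed)
    partially_sent: bool = False  # True once part of the packet is in the FIFO
    rem_padding: int = 0          # mirrors vring->rem_padding from the description

def tx_step(state, fifo_space, now):
    """Advance transmission by one step and return the bytes written to the FIFO."""
    # Zero padding owed for a discarded half-sent packet goes out first.
    if state.rem_padding:
        n = min(state.rem_padding, fifo_space)
        state.rem_padding -= n
        return bytes(n)
    if not state.pending:
        return b""
    # Normal path: write as much of the packet as currently fits.
    if fifo_space > 0:
        n = min(len(state.pending), fifo_space)
        out, state.pending = state.pending[:n], state.pending[n:]
        state.partially_sent = bool(state.pending)
        state.last_progress = now
        return out
    # FIFO full: drop the packet once no progress was possible for the timeout.
    if now - state.last_progress >= TX_TIMEOUT_S:
        if state.partially_sent:
            state.rem_padding = len(state.pending)  # remainder is replaced by zeros
        state.pending = b""
        state.partially_sent = False
    return b""
```

In this model a packet that was never started is simply dropped, while a half-sent one is completed with zeros so that the receiver still sees a packet of the announced length and can discard it.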
[Kernel-packages] [Bug 2049930] Re: mlxbf_gige: disable RX_DMA during NAPI poll
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2049930

Title:
  mlxbf_gige: disable RX_DMA during NAPI poll

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  A regression has been introduced in the latest 22.04 kernel by including the following two commits:
  * revert of "UBUNTU: SAUCE: Fix OOB handling RX packets in heavy traffic"
  * addition of upstream "mlxbf_gige: fix receive packet race condition"

  The regression manifests as connectivity problems on the "oob_net0" interface.

  [Fix]
  Modify the GIGE driver to include the RX_DMA disable logic in its NAPI poll routine. This logic was part of "UBUNTU: SAUCE: Fix OOB handling RX packets in heavy traffic" but not part of upstream "mlxbf_gige: fix receive packet race condition".

  [Test Case]
  * Boot BF3 platform and ensure oob_net0 interface is up and has an IP address
  * Bring up many IP interfaces, including oob_net0, and make sure the OOB interface is continually accessible (e.g. via ping)

  [Regression Potential]
  Low risk change, adding back in prior logic.
[Kernel-packages] [Bug 2047853] Re: mlxbf_gige: replace SAUCE patch for RX race condition
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2047853

Title:
  mlxbf_gige: replace SAUCE patch for RX race condition

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  There is a SAUCE patch that addresses an mlxbf-gige RX race condition:
  "UBUNTU: SAUCE: Fix OOB handling RX packets in heavy traffic"
  This SAUCE patch should be reverted and replaced with upstream content.

  [Fix]
  Bring in the following upstream commit from linux-next:
  "mlxbf_gige: fix receive packet race condition"

  [Test Case]
  * Boot BF3 platform and ensure oob_net0 interface is up and has an IP address
  * Send a heavy level of traffic into the BF3 oob_net0 and make sure no stalls occur

  [Regression Potential]
  Low risk change, cherry-pick of an upstream approved commit.
[Kernel-packages] [Bug 2046657] Re: mmc: sdhci-of-dwcmshc: Replace sauce patches with upstream commits
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2046657

Title:
  mmc: sdhci-of-dwcmshc: Replace sauce patches with upstream commits

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The sdhci-of-dwcmshc driver in the Jammy repo consists of some SAUCE patches. These need to be replaced.

  [Fix]
  The fix is to revert the four SAUCE patches, replacing them with upstream commits for the same functionality.

  [Test Case]
  * Boot BF3 platform, verify no new errors

  [Regression Potential]
  The upstream commits are not exactly the same as the SAUCE patches, so technically there is a chance of regression, but the change has been well tested and the functionality is the same.

  [Other]
  n/a
[Kernel-packages] [Bug 2045919] Re: mlxbf-bootctl: fails to report info about cards using dev keys
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2045919

Title:
  mlxbf-bootctl: fails to report info about cards using dev keys

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  There is a gap in the mlxbf-bootctl driver logic, resulting in a failure to report when a secure-boot-enabled card is using development keys. The program instead reports the lifecycle state as "GA Secured", which is misleading.

  [Fix]
  Bring in the following upstream commit from linux-next:
  "mlxbf-bootctl: correctly identify secure boot with development keys"

  [Test Case]
  * Enable secure boot on a BF3 card that is using development keys
  * Log in as root on BF3
  * Run the command "mlxbf-bootctl"
  * Verify that the "lifecycle state" row shows "Secured (development)"

  [Regression Potential]
  Low risk change, cherry-pick of an upstream approved commit.
[Kernel-packages] [Bug 2041996] Re: pwr-mlxbf: Several bug fixes for focal
** Tags removed: verification-needed-focal-linux-bluefield

** Tags added: verification-done-focal-linux-bluefield

--
https://bugs.launchpad.net/bugs/2041996

Title:
  pwr-mlxbf: Several bug fixes for focal

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Focal:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  Several changes need to be made to pwr-mlxbf in focal:
  * There is a race condition between the gpio-mlxbf2.c driver being loaded and pwr-mlxbf.c being loaded
  * When the module is removed, there is a panic due to a NULL pointer access
  * Soft reset needs to be replaced by graceful reboot

  [Fix]
  * Fix race condition between gpio-mlxbf2.c driver being loaded and pwr-mlxbf.c being loaded
  * Fix panic due to access to NULL pointer when driver is removed via rmmod
  * Support graceful reboot instead of soft reset

  [Test Case]
  All test cases are for BF2:
  * Trigger the gpio toggling from the BMC:
      ipmitool raw 0x32 0xA1 0x02
    This should trigger a graceful reboot of the DPU.
  * rmmod/modprobe
  * Reboot test, making sure the driver is always loaded

  [Regression Potential]
  * Run the 100-reboot test and make sure that the driver is loaded with no issues and that the gpio graceful reboot works.
[Kernel-packages] [Bug 2039869] Re: Devlink reload hangs: fix race and lock issue
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2039869

Title:
  Devlink reload hangs: fix race and lock issue

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  Summary:
  Machine hangs when doing devlink reload

  How to reproduce:

  Host:
  [root@bu-lab24v ~]# echo '2' > /sys/class/net/ens2f0np0/device/sriov_numvfs

  Arm:
  root@bu-lab24v-oob:~# uname -r
  5.15.0-1027-bluefield
  root@bu-lab24v-oob:~# devlink dev eswitch set pci/:03:00.0 mode switchdev
  root@bu-lab24v-oob:~# devlink dev reload pci/:03:00.0
  *Hangs*

  Arm dmesg:
  [ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
  [ 1089.760560] Tainted: G OE 5.15.0-1027-bluefield #29-Ubuntu
  [ 1089.775086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1089.790829] task:devlink state:D stack:0 pid: 8753 ppid: 5090 flags:0x0004
  [ 1089.790838] Call trace:
  [ 1089.790840] __switch_to+0xf8/0x150
  [ 1089.790857] __schedule+0x2b8/0x790
  [ 1089.790865] schedule+0x64/0x140
  [ 1089.790870] schedule_preempt_disabled+0x18/0x24
  [ 1089.790874] __mutex_lock.constprop.0+0x1a0/0x680
  [ 1089.790878] __mutex_lock_slowpath+0x40/0x90
  [ 1089.790883] mutex_lock+0x64/0x70
  [ 1089.790887] devl_lock+0x1c/0x30
  [ 1089.790893] mlx5_detach_device+0x58/0x190 [mlx5_core]
  [ 1089.791055] mlx5_unload_one+0x40/0xe4 [mlx5_core]
  [ 1089.791177] mlx5_devlink_reload_down+0x184/0x270 [mlx5_core]
  [ 1089.791318] devlink_reload+0x214/0x290

  Fixes:
  Checking the OFED source code, we found that the missing devl trap group functions also need to be backported to avoid the deadlock.

  void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend)
  {
  ...
  #ifdef HAVE_DEVL_PORT_REGISTER
  #ifdef HAVE_DEVL_TRAP_GROUPS_REGISTER
          devl_assert_locked(priv_to_devlink(dev));
  #else
          devl_lock(devlink);
  #endif /* HAVE_DEVL_TRAP_GROUPS_REGISTER */
  #endif /* HAVE_DEVL_PORT_REGISTER */
          mutex_lock(&mlx5_intf_mutex);
  #ifdef HAVE_DEVL_PORT_REGISTER

  Related issue: #2032378 Devlink backport: fix race and lock issue

  So cherry-pick the patch below:

  commit 852e85a704c2e11c050bdea286bc438aba4f4a22
  Author: Jiri Pirko
  Date: Sat Jul 16 13:02:34 2022 +0200

      net: devlink: add unlocked variants of devling_trap*() functions

      Add unlocked variants of devl_trap*() functions to be used in
      drivers called-in with devlink->lock held.
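The hang above follows the usual pattern of a code path that takes a non-reentrant lock it already holds, and the cherry-picked commit addresses it by offering variants that expect the caller to hold the lock. The following is only a user-space analogy of that pattern, not kernel code: the Devlink class, its methods and would_deadlock are hypothetical stand-ins.

```python
import threading

class Devlink:
    """Toy model of an object guarded by one non-reentrant lock."""

    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a kernel mutex
        self.trap_groups = []

    def devl_trap_groups_register(self, groups):
        """Unlocked variant: the caller must already hold self.lock."""
        # locked() only says the lock is held by someone, not by whom.
        assert self.lock.locked(), "caller must hold the devlink lock"
        self.trap_groups.extend(groups)

    def devlink_trap_groups_register(self, groups):
        """Locking wrapper: takes the lock, then calls the unlocked variant."""
        with self.lock:
            self.devl_trap_groups_register(groups)

def would_deadlock(devlink):
    """Return True if taking the lock again from this context would block."""
    acquired = devlink.lock.acquire(blocking=False)
    if acquired:
        devlink.lock.release()
    return not acquired
```

A path that already holds the lock, such as a reload handler in this analogy, has to call devl_trap_groups_register(); calling the locking wrapper there would block forever, which corresponds to the mutex_lock frame at the top of the call trace.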
[Kernel-packages] [Bug 2041995] Re: pwr-mlxbf: support graceful reboot instead of soft reset
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2041995

Title:
  pwr-mlxbf: support graceful reboot instead of soft reset

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  There is a new feature request to replace the soft reset with a graceful reboot. We will use ACPI events triggered by the irq in the pwr-mlxbf file to trigger the graceful reboot.

  [Fix]
  * Change the handling of GPIO7 (BF2) and GPIO6 (BF3). This gpio will trigger a graceful reboot instead of a forced reboot (soft reset). This is an ACPI event.

  [Test Case]
  * Trigger the gpio toggling from the BMC:
      ipmitool raw 0x32 0xA1 0x02
    This should trigger a graceful reboot of the DPU.

  [Regression Potential]
  * Make sure it works on BF2 and BF3
[Kernel-packages] [Bug 2042455] Re: Devlink backport: Fix mlx5 driver hangs due to mlx5_sf_hw_table_init
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2042455

Title:
  Devlink backport: Fix mlx5 driver hangs due to mlx5_sf_hw_table_init

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  Summary:
  Machine hangs when loading the OFED 2310 mlx5 driver on BlueField

  How to reproduce:
  # load the OFED driver

  Reason:
  BF got stuck and the call trace "mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]" was observed

  dmesg from minicom:
  [ 726.569928] INFO: task systemd-udevd:297 blocked for more than 604 seconds.
  [ 726.576895] Tainted: G OE 5.15.0-1029-bluefield #31-Ubuntu
  [ 726.584101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 726.591913] task:systemd-udevd state:D stack:0 pid: 297 ppid: 280 flags:0x000d
  [ 726.600248] Call trace:
  [ 726.602680] __switch_to+0xf8/0x150
  [ 726.606159] __schedule+0x2b8/0x790
  [ 726.609634] schedule+0x64/0x140
  [ 726.612850] schedule_preempt_disabled+0x18/0x24
  [ 726.617453] __mutex_lock.constprop.0+0x1a0/0x680
  [ 726.622141] __mutex_lock_slowpath+0x40/0x90
  [ 726.626396] mutex_lock+0x64/0x70
  [ 726.629695] devlink_resource_register+0x50/0x1a0
  [ 726.634386] mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]
  [ 726.639882] mlx5_init_one_devl_locked+0x1c8/0x784 [mlx5_core]
  [ 726.645791] probe_one+0x300/0x5f0 [mlx5_core]
  [ 726.650307] local_pci_probe+0x48/0xb4
  [ 726.654043] pci_device_probe+0x18c/0x200
  [ 726.658039] really_probe+0xd0/0x490
  [ 726.661600] __driver_probe_device+0x148/0x190
  [ 726.666029] driver_probe_device+0x48/0x180
  [ 726.670198] __driver_attach+0x104/0x240
  [ 726.674106] bus_for_each_dev+0x78/0xdc
  [ 726.677927] driver_attach+0x2c/0x40
  [ 726.681486] bus_add_driver+0x154/0x270
  [ 726.685307] driver_register+0x80/0x13c
  [ 726.689129] __pci_register_driver+0x4c/0x60
  [ 726.693386] __init_backport+0xf0/0x1000 [mlx5_core]
  [ 726.698425] do_one_initcall+0x4c/0x250
  [ 726.702248] do_init_module+0x50/0x260
  [ 726.705983] load_module+0x9fc/0xbe0
  [ 726.709543] __do_sys_finit_module+0xa8/0x114
  [ 726.713885] __arm64_sys_finit_module+0x28/0x3c
  [ 726.718401] invoke_syscall+0x78/0x100
  [ 726.722137] el0_svc_common.constprop.0+0x54/0x184
  [ 726.726913] do_el0_svc+0x30/0xac
  [ 726.730215] el0_svc+0x48/0x160
  [ 726.733341] el0t_64_sync_handler+0xa4/0x130
  [ 726.737597] el0t_64_sync+0x1a4/0x1a8
  [ 847.401924] INFO: task systemd-udevd:297 blocked for more than 724 seconds.
  [ 847.408891] Tainted: G OE 5.15.0-1029-bluefield #31-Ubuntu

  How to fix:
  This is related to https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2039869, and we need to backport or cherry-pick more patches from the series.

  The patches are listed below:
  Backport: f655dacb59ac net: devlink: remove unused locked functions
  Backport: 012ec02ae441 netdevsim: convert driver to use unlocked devlink API during init/fini
  Cherry-pick: eb0e9fa2c635 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
  SKIP: 72a4c8c94efa mlxsw: convert driver to use unlocked devlink API during init/fini
  Backport: 70a2ff89369d net: devlink: add unlocked variants of devlink_dpipe*() functions
  Cherry-pick: 755cfa69c4ec net: devlink: add unlocked variants of devlink_sb*() functions
  Cherry-pick: c223d6a4bf6d net: devlink: add unlocked variants of devlink_resource*() functions
  Cherry-pick: 852e85a704c2 net: devlink: add unlocked variants of devling_trap*() functions
  Cherry-pick: e26fde2f5bef net: devlink: avoid false DEADLOCK warning reported by lock

  Thanks!
[Kernel-packages] [Bug 2038868] Re: gpio-mlxbf3: support valid mask
** Tags removed: verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy-linux-bluefield

--
https://bugs.launchpad.net/bugs/2038868

Title:
  gpio-mlxbf3: support valid mask

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  After syncing the gpio-mlxbf3.c driver with the upstreamed version, we dropped the use of the valid_mask variable, because in kernel versions >= 6.2.0 valid_mask is populated by the core gpio code. The 5.15 kernel doesn't support that feature, so we still need to define valid_mask explicitly as we did before.
  This doesn't impact the functionality of the GPIO driver, but it is a security issue because access is no longer restricted to the gpios defined by valid_mask in the ACPI table.

  [Fix]
  * Define valid_mask and init_valid_mask

  [Test Case]
  * Make sure that the user (libgpiod) cannot access any gpio other than 0->4 (gpiochip0) and 22-23 in gpiochip1.

  [Regression Potential]
  No known regression.
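To make the test case concrete, the allowed lines can be written as per-chip bitmasks. This is a minimal sketch, assuming that the numbers 0->4 and 22-23 are line offsets within gpiochip0 and gpiochip1 respectively; the helper names mask_from_lines and is_valid are hypothetical and do not come from the driver.

```python
def mask_from_lines(lines):
    """Build a bitmask with one bit set per allowed GPIO line offset."""
    mask = 0
    for offset in lines:
        mask |= 1 << offset
    return mask

def is_valid(mask, offset):
    """Return True if the line at `offset` is allowed by `mask`."""
    return bool((mask >> offset) & 1)

# Assumed per-chip offsets taken from the test case above.
GPIOCHIP0_MASK = mask_from_lines(range(0, 5))  # lines 0 through 4
GPIOCHIP1_MASK = mask_from_lines([22, 23])     # lines 22 and 23
```

Under that assumption the masks are 0x1f and 0xc00000, so a request for line 5 on gpiochip0 or line 21 on gpiochip1 should be refused once valid_mask is applied.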
[Kernel-packages] [Bug 2032378] Re: Devlink backport: fix race and lock issue
** Tags removed: verification-needed-jammy

** Tags added: verification-done-jammy

--
https://bugs.launchpad.net/bugs/2032378

Title:
  Devlink backport: fix race and lock issue

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  Summary:
  The machine gets stuck when switching to switchdev mode while toggling network namespaces on the BF kernel. This is due to the missing devlink and driver lock refactor series from kernel 6.0.

  How to test:
  1. Configure SRIOV
  2. Enable switchdev mode
  3. Devlink reload
  4. Add/Del network namespace
  Note: The test needs to be run multiple times to reproduce the issue.

  How to fix:
  Backport a series of devlink patches from the 6.0 upstream kernel into the BlueField 5.15 kernel.

  Patches needed for 5.4/5.15 kernel:
  First series: 367dfa121205^..f0680ef0f949
  Second series: 5502e8712c9b^..c90005b5f75c
  Also needed: 644a66c60f02
[Kernel-packages] [Bug 2037403] Re: PCI BARs larger than 128GB are disabled
** Tags removed: verification-needed-jammy

** Tags added: verification-done-jammy

--
https://bugs.launchpad.net/bugs/2037403

Title:
  PCI BARs larger than 128GB are disabled

Status in linux package in Ubuntu:
  Incomplete
Status in linux-bluefield package in Ubuntu:
  New
Status in linux source package in Jammy:
  Fix Committed
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  [SRU Justification]

  [Impact]
  Current linux-bluefield kernel reports that PCI BARs larger than 128GB are disabled. Increase the maximum PCI BAR size from 128GB to 8TB for future expansion.

  [Test Plan]
  Use updated kernel with PCI device with PCI BARs larger than 128GB.

  [Where problems could occur]
  No potential problems are expected.

  [Other Info]
  Testing has been provided by the customer. This patch is a cherry-pick from the upstream 5.18 kernel. It is intended for linux-bluefield kernel re-spin but it may be also suitable for the generic kernel (separate submission).
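The two limits are both powers of two, 2^37 and 2^43 bytes, so the change widens the range by a factor of 64. The sketch below only checks that arithmetic. It assumes the limit comes from a table of power-of-two alignments starting at 1MB, and the slot counts 18 and 24 are assumptions chosen so that the arithmetic reproduces the two limits; largest_alignment is a hypothetical helper.

```python
MIB = 1 << 20
GIB = 1 << 30
TIB = 1 << 40

def largest_alignment(slots, smallest=MIB):
    """Largest power-of-two size covered by `slots` entries starting at `smallest`."""
    return smallest << (slots - 1)

OLD_LIMIT = largest_alignment(18)  # assumed old table size, gives 128GB
NEW_LIMIT = largest_alignment(24)  # assumed new table size, gives 8TB
```

With these values, a BAR of, say, 256GB exceeds OLD_LIMIT but fits comfortably under NEW_LIMIT, which matches the behaviour change described in [Impact].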
[Kernel-packages] [Bug 2033439] Re: Cherry-pick gpio and pinctrl drivers from upstream
** Tags removed: verification-needed-jammy

** Tags added: verification-done-jammy

--
https://bugs.launchpad.net/bugs/2033439

Title:
  Cherry-pick gpio and pinctrl drivers from upstream

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  Cherry-pick gpio-mlxbf3.c and pinctrl-mlxbf3.c patches from linux-next.

  [Fix]
  * Revert existing pinctrl-mlxbf3.c driver changes
  * Revert existing gpio-mlxbf3.c driver changes
  * Cherry-pick all the following commits from linux-next:

  gpio-mlxbf3.c:
  cd33f216d241520385a5166ae73a0771197a9f0b
  38a700efc51080c7184f71edbf5e49561da9754f
  1d2a22fa6d2511d5871d87c15b4fe7a944fe3b2a

  pinctrl-mlxbf3.c:
  d11f932808dc689717e409bbc81b5093e7902fc9
  743d3336029ffe2bb38e982a3b572ced243c6d43
  c0f84760b01e8d8b59e9e186a4f7fa8f081a4488
  69657e60b8a7faf83b583c658ec7ce1f5ece9eb3

  [Test Case]
  * All test cases are for BF3 only
  * Check that the gpio-mlxbf3 driver gets loaded at boot time without issues
  * Check that the pinctrl-mlxbf3 driver gets loaded at boot time without issues
  * Use libgpiod to test read and write access to gpio pins 0 through 4
  * rmmod and modprobe of both drivers
  * Check that the pwr-mlxbf driver is loaded successfully
  * Check that the mlxbf-gige driver is loaded successfully and the irq is initialized properly (dmesg | grep PHY)

  [Regression Potential]
  * We introduced a dependency of the gpio-mlxbf3 driver on pinctrl-mlxbf3, so we need to make sure that this doesn't trigger any regressions when loading other dependent drivers
  * Make sure that removing or reloading the driver, or restarting the DPU, doesn't cause any panic related to these drivers.
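The "driver gets loaded" checks in the test case could be automated by inspecting /proc/modules after each boot. This is a sketch rather than a documented procedure: it assumes the drivers are built as loadable modules (built-in drivers do not appear in /proc/modules) and that hyphens in the driver names show up as underscores there; loaded_modules and missing_modules are hypothetical helpers.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}

def missing_modules(expected, proc_modules_text):
    """Return the expected drivers that are not loaded ('-' is compared as '_')."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in expected if name.replace("-", "_") not in loaded]
```

On the target, the text would come from reading /proc/modules, and an empty result from missing_modules(["gpio-mlxbf3", "pinctrl-mlxbf3", "pwr-mlxbf", "mlxbf-gige"], text) would mean all four drivers are present.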
[Kernel-packages] [Bug 2033551] Re: mlxbf_bootctl: replace SAUCE patches with upstream
** Tags removed: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy verification-needed-jammy-linux-bluefield

** Tags added: verification-done-jammy

--
https://bugs.launchpad.net/bugs/2033551

Title:
  mlxbf_bootctl: replace SAUCE patches with upstream

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The mlxbf_bootctl BlueField platform driver in the Jammy repo consists of some SAUCE patches. These need to be replaced.

  [Fix]
  The fix is to revert the four SAUCE patches, replacing them with upstream commits for the same functionality. One patch for the mlxbf_bootctl driver is not yet upstreamed, so that patch will remain as a SAUCE patch for now.

  [Test Case]
  * Boot BF2/BF3 platform, verify no new errors
  * Program MFG fields via "bfcfg" tool:
    1) With current image, use 'bfcfg -d' to display current values
    2) Reboot, stopping at UEFI menu, trigger 'Reset MFG Info'
    3) With new image, use 'bfcfg' to push new or same MFG fields
    4) Verify that the MFG fields are programmed properly via 'bfcfg -d'
    5) Repeat process to replace proper MFG fields
  * Display BlueField boot log
    1) cd /sys/bus/platform/devices/MLNXBF04:00
    2) cat rsh_log
    3) Should see boot log entries relevant to last boot

  [Regression Potential]
  The upstream commits are not exactly the same as the SAUCE patches, so technically there is a chance of regression, but the change has been well tested and the functionality is the same.

  [Other]
  n/a
[Kernel-packages] [Bug 2035374] Re: pwr-mlxbf: update Kconfig with upstream changes
** Tags removed: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy verification-needed-jammy-linux-bluefield
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2035374

Title:
  pwr-mlxbf: update Kconfig with upstream changes

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The Mellanox BlueField power driver (pwr-mlxbf) depends on either the BlueField-2 or the BlueField-3 GPIO driver. The current Kconfig does not have the BlueField-3 dependency.

  [Fix]
  There is an upstream commit that extends the Kconfig to include the GPIO_MLXBF3 dependency.

  [Test Case]
  Build the power driver with and without GPIO_MLXBF2 and GPIO_MLXBF3 present in the kernel configuration. Verify that the power driver is built.

  [Regression Potential]
  * none

  [Other]
  * none
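When preparing the build variations for the test case above, it can help to confirm which GPIO symbols a given kernel configuration actually enables. The following is only a minimal sketch: the helper name pwr_mlxbf_gpio_dep_met is hypothetical, and it assumes the usual .config syntax in which an enabled symbol appears as CONFIG_<NAME>=y or CONFIG_<NAME>=m.

```shell
#!/bin/sh
# Hypothetical helper (not part of the bug report): succeeds when the given
# kernel .config enables GPIO_MLXBF2 or GPIO_MLXBF3, built in (y) or as a
# module (m). Assumes the standard "CONFIG_<NAME>=y|m" .config syntax.
pwr_mlxbf_gpio_dep_met() {
    config_file=$1
    grep -Eq '^CONFIG_GPIO_MLXBF[23]=(y|m)$' "$config_file"
}
```

For example, `pwr_mlxbf_gpio_dep_met .config && echo "GPIO dependency met"` could be run in the kernel build directory before each build variation.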
[Kernel-packages] [Bug 2034578] Re: Support IPSEC full offload implementation
** Tags removed: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy verification-needed-jammy-linux-bluefield ** Tags added: verification-done-jammy -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-bluefield in Ubuntu. https://bugs.launchpad.net/bugs/2034578 Title: Support IPSEC full offload implementation Status in linux-bluefield package in Ubuntu: Invalid Status in linux-bluefield source package in Jammy: Fix Committed Bug description: Summary: Align Kernel IPsec Full offload implementation in the DPU to the upstream Full offload in all components: OFED, Strongswan, etc. This is in order for DPU Kernel IPsec to include policy offload and be fully aligned to what CX Kernel customers will use. How to test: Host 1 Enable sriov and set namespace. ip link set eth2 up echo '1' > /sys/class/net/eth2/device/sriov_numvfs ip netns add nt1 ip link set eth4 netns nt1 ip netns exec nt1 ifconfig eth4 11.11.11.1/24 up BF on host 1: Set steering mode to "dmfs". By default, it is "smfs" and not supported for now. 
/opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.0 mode legacy echo 'dmfs' > /sys/bus/pci/devices/:03:00.0/net/p0/compat/devlink/steering_mode echo 'full' > /sys/class/net/p0/compat/devlink/ipsec_mode /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.0 mode switchdev /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.1 mode legacy echo 'dmfs' > /sys/bus/pci/devices/:03:00.1/net/p1/compat/devlink/steering_mode echo 'full' > /sys/class/net/p1/compat/devlink/ipsec_mode /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.1 mode switchdev IPSec configure /opt/mellanox/iproute2/sbin/ip xfrm policy add src 2.2.2.2 dst 2.2.2.3 offload packet dev p0 dir out tmpl src 2.2.2.2/16 dst 2.2.2.3/16 proto esp reqid 0xb29ed314 mode transport priority 12 /opt/mellanox/iproute2/sbin/ip xfrm policy add src 2.2.2.3 dst 2.2.2.2 offload packet dev p0 dir in tmpl src 2.2.2.3/16 dst 2.2.2.2/16 proto esp reqid 0xc35aa26e mode transport priority 12 /opt/mellanox/iproute2/sbin/ip xfrm state add src 2.2.2.2/16 dst 2.2.2.3/16 proto esp spi 0xb29ed314 reqid 0xb29ed314 mode transport aead 'rfc4106(gcm(aes))' 0x20f01f80a26f633d85617465686c32552c92c42f 128 offload packet dev p0 dir out sel src 2.2.2.2/16 dst 2.2.2.3/16 flag esn replay-window 64 /opt/mellanox/iproute2/sbin/ip xfrm state add src 2.2.2.3/16 dst 2.2.2.2/16 proto esp spi 0xc35aa26e reqid 0xc35aa26e mode transport aead 'rfc4106(gcm(aes))' 0x6cb228189b4c6e82e66e46920a2cde39187de4ba 128 offload packet dev p0 dir in sel src 2.2.2.3/16 dst 2.2.2.2/16 flag esn replay-window 64 OVS configure. Clear all bridges before configure if there's already default bridges in BF. ovs-vsctl set Open_vSwitch . 
other_config:hw-offload=false # need to restart ovs after setting this command ovs-vsctl add-br br-int ovs-vsctl add-port br-int pf0vf0 -- set interface pf0vf0 options:representor=[0] ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:key=100 options:local_ip=2.2.2.2 options:remote_ip=2.2.2.3 options:dst_port=4789 Configure IP ifconfig p0 2.2.2.2/16 up Host2: Enable sriov and set namespace. ip link set eth2 up echo '1' > /sys/class/net/eth2/device/sriov_numvfs ip netns add nt1 ip link set eth4 netns nt1 ip netns exec nt1 ifconfig eth4 11.11.11.2/24 up BF on host 2 Set steering mode /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.0 mode legacy echo 'dmfs' > /sys/bus/pci/devices/:03:00.0/net/p0/compat/devlink/steering_mode echo 'full' > /sys/class/net/p0/compat/devlink/ipsec_mode /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.0 mode switchdev /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.1 mode legacy echo 'dmfs' > /sys/bus/pci/devices/:03:00.1/net/p1/compat/devlink/steering_mode echo 'full' > /sys/class/net/p1/compat/devlink/ipsec_mode /opt/mellanox/iproute2/sbin/devlink dev eswitch set pci/:03:00.1 mode switchdev IPSec configure /opt/mellanox/iproute2/sbin/ip xfrm policy add src 2.2.2.3 dst 2.2.2.2 offload packet dev p0 dir out tmpl src 2.2.2.3/16 dst 2.2.2.2/16 proto esp reqid 0xc35aa26e mode transport priority 12 /opt/mellanox/iproute2/sbin/ip xfrm policy add src 2.2.2.2 dst 2.2.2.3 offload packet dev p0 dir in tmpl src 2.2.2.2/16 dst 2.2.2.3/16 proto esp reqid 0xb29ed314 mode transport priority 12 /opt/mellanox/iproute2/sbin/ip xfrm state add src 2.2.2.3/16 dst 2.2.2.2/16 proto esp spi 0xc35aa26e reqid 0xc35aa26e mode transport aead 'rfc4106(gcm(aes))' 0x6cb228189b4c6e82e66e46920a2cde39187de4ba 128 offload packet dev p0 dir out sel src 2.2.2.3/16 dst 2.2.2.2/16 flag esn replay-window 64 /opt/mellanox/iproute2/sbin/ip xfrm state add src 2.2.2.2/16 dst 2.2.2.3/16
[Kernel-packages] [Bug 2034594] Re: mlxbf-ptm: add thermal envelope debugfs node
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2034594

Title:
  mlxbf-ptm: add thermal envelope debugfs node

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  mlxbf-ptm is a kernel driver that provides a debugfs interface for system software to monitor the power and thermal management parameters of BlueField devices.

  [Fix]
  * Add a debugfs node that reports the current thermal envelope set in the thermal state machine

  [Test Case]
  * After installing the kernel module, a debugfs entry at /sys/kernel/debug/mlxbf-ptm will be created
  * The monitors/status folder contains read-only parameters reported from the BlueField device
  * cat /sys/kernel/debug/mlxbf-ptm/temp_envelope shows the current thermal envelope

  [Regression Potential]
  * Minimal, as these are read-only parameters that report BlueField device information.

  [Other]
  * This code is likely to change depending on feedback we receive from maintainers.
[Kernel-packages] [Bug 2035128] Re: mlxbf-gige: Enable the OOB port in mlxbf_gige_open
** Tags removed: verification-needed-focal
** Tags added: verification-done-focal
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2035128

Title:
  mlxbf-gige: Enable the OOB port in mlxbf_gige_open

Status in linux-bluefield package in Ubuntu: New
Status in linux-bluefield source package in Focal: Fix Committed
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  At the moment, the OOB port is enabled in the mlxbf_gige_probe function. If mlxbf_gige_open is not executed, this could cause pause frames to increase when there is high background traffic. This results in clogging the BMC port as well.

  [Fix]
  * Move enabling the OOB port to mlxbf_gige_open.

  [Test Case]
  * Main test for this bug: check that the BMC is always pingable while pushing a BFB

  Other tests:
  * Check if the gige driver is loaded
  * Check that the oob_net0 interface is up and operational
  * Do the reboot test and powercycle test and check the oob_net0 interface again
  * Push BFB multiple times and make sure the OOB is up and running

  [Regression Potential]
  Since we are moving code that has not moved since BF2, it is important to make sure that there is no regression. Make sure that the OOB interface is always up and pingable after the reboot test, the powercycle test and after pushing a BFB.
[Kernel-packages] [Bug 2030765] Re: mlxbf-gige: Fix kernel panic after reboot
After performing 1000 reboots, the OOB interface has an IP address and no kernel panic has been observed.

** Tags removed: verification-needed-focal-linux-bluefield
** Tags added: verification-done-focal-linux-bluefield

https://bugs.launchpad.net/bugs/2030765

Title:
  mlxbf-gige: Fix kernel panic after reboot

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Committed
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  Previously we fixed 2022370 by modifying the shutdown() function, but that change introduced a bug and needs to be reverted. It does not deinitialize the oob_net0 interface properly after reboot and causes issues for the UEFI OOB driver.

  [Fix]
  * Revert 2022370 changes
  * Instead, just check that "priv" is not NULL.

  [Test Case]
  * Run the reboot test 1000 times
  * Check that there is no kernel panic after reboot
  * Check that the UEFI OOB interface gets an IP address

  [Regression Potential]
  * We could still see a kernel panic since it is very intermittent. QA reproduced it once every 350 reboots.
[Kernel-packages] [Bug 1978079] Re: EFI pstore not cleared on boot
root@bu-lab26v-oob:~# cat /etc/mlnx-release DOCA_2.2.0_BSP_4.2.1_Ubuntu_20.04-2.sru.5.4.0-1070 root@bu-lab26v-oob:~# uname -a Linux bu-lab26v-oob 5.4.0-1070-bluefield #76-Ubuntu SMP PREEMPT Wed Aug 30 16:56:35 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux root@bu-lab26v-oob:~# cat /sys/module/pstore/parameters/backend efi root@bu-lab26v-oob:~# systemctl status systemd-pstore.service ● systemd-pstore.service - Platform Persistent Storage Archival Loaded: loaded (/lib/systemd/system/systemd-pstore.service; enabled; vendor preset: enabled) Active: inactive (dead) Condition: start condition failed at Tue 2023-09-19 14:16:32 UTC; 1 day 1h ago └─ ConditionDirectoryNotEmpty=/sys/fs/pstore was not met Docs: man:systemd-pstore(8) Sep 19 14:16:28 localhost systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped. Sep 19 14:16:32 localhost systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped. root@bu-lab26v-oob:~# echo 1 > /proc/sys/kernel/sysrq root@bu-lab26v-oob:~# echo 1 > /proc/sys/kernel/panic root@bu-lab26v-oob:~# echo "c" > /proc/sysrq-trigger *system rebooted* root@bu-lab26v-oob:~# ls /sys/fs/pstore root@bu-lab26v-oob:~# ls /var/lib/systemd/pstore 169522441 169522442 root@bu-lab26v-oob:~# systemctl status systemd-pstore.service ● systemd-pstore.service - Platform Persistent Storage Archival Loaded: loaded (/lib/systemd/system/systemd-pstore.service; enabled; vendor preset: enabled) Active: active (exited) since Wed 2023-09-20 15:41:29 UTC; 56s ago Docs: man:systemd-pstore(8) Process: 485 ExecStart=/lib/systemd/systemd-pstore (code=exited, status=0/SUCCESS) Main PID: 485 (code=exited, status=0/SUCCESS) Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441409001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441409001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441408001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441408001 Sep 20 
15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441407001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441407001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441406001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441406001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441405001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441405001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441304001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441304001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441303001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441303001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441302001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441302001 Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441301001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441301001 Sep 20 15:41:29 localhost systemd[1]: Finished Platform Persistent Storage Archival. ** Tags removed: verification-needed-focal-linux-bluefield ** Tags added: verification-done-focal-linux-bluefield -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-bluefield in Ubuntu. 
https://bugs.launchpad.net/bugs/1978079 Title: EFI pstore not cleared on boot Status in linux-bluefield package in Ubuntu: New Status in systemd package in Ubuntu: Fix Released Status in linux-bluefield source package in Focal: Fix Committed Status in systemd source package in Focal: Fix Released Status in systemd source package in Impish: Won't Fix Status in linux-bluefield source package in Jammy: Fix Committed Status in systemd source package in Jammy: Fix Released Status in systemd source package in Kinetic: Fix Released Bug description: [Impact] Systemd has a systemd-pstore component that scans the pstore on boot and if non-empty, takes all previously created dumps, transfers them into its journal and removes the pstore elements. This is very important on UEFI systems, which only have a limited amount of space for variables. In Ubuntu, the kernel is configured with CONFIG_EFI_VARS_PSTORE=m which means the EFI pstore support gets loaded dynamically. In all of my boots, this dynamic module loading happened *after* systemd tried to check for pstore variables. So systemd-pstore never starts and never clears the UEFI variable store. I see this happening in AWS on Graviton instances, which eventually run out of space to store the dumps. On real hardware, this behavior may lead to unbootable systems. ``` $ systemctl status systemd-pstore ○ systemd-pstore.service - Platform Persistent Storage Archival Loaded: loaded (/lib/systemd/system/systemd-pstore.service;
[Kernel-packages] [Bug 2028197] Re: rshim console truncates dmesg output due to tmfifo issue
** Tags removed: verification-needed-focal-linux-bluefield
** Tags added: verification-done-focal-linux-bluefield

https://bugs.launchpad.net/bugs/2028197

Title:
  rshim console truncates dmesg output due to tmfifo issue

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Committed
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The mlxbf_tmfifo driver does not always schedule work for a VIRTIO CONSOLE notification. As a result, on the rshim console, output is cut off when running dmesg or when simply printing a big text file to stdout with cat.

  [Fix]
  * Make sure the TM_TX_LWM_IRQ bit is set when there is a notification of a virtio console event.

  [Test Case]
  * Compare the output of dmesg from the rshim console and from an ssh session. Make sure both are the same.
  * Reproduce the issue without the fix, using the same dmesg comparison as above.

  [Regression Potential]
  The changes affect only virtio console event handling and have been fully tested.

  [Others]
  This issue is the same on Focal and Jammy.
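The comparison in the test case above can be sketched as a small script, assuming the dmesg output from the rshim console and from the ssh session has already been saved to two files. The function name compare_dmesg_captures and the file names are hypothetical; how the console output is captured is left to the tester.

```shell
#!/bin/sh
# Hypothetical helper: compares two saved dmesg captures byte for byte.
# Prints "identical" and succeeds when they match; otherwise prints the
# line count of each capture (a truncated console capture is expected to
# be shorter) and returns 1.
compare_dmesg_captures() {
    console_log=$1
    ssh_log=$2
    if cmp -s "$console_log" "$ssh_log"; then
        echo "identical"
        return 0
    fi
    # Arithmetic expansion strips any padding that wc may add.
    console_lines=$(( $(wc -l < "$console_log") ))
    ssh_lines=$(( $(wc -l < "$ssh_log") ))
    echo "differ: console=$console_lines ssh=$ssh_lines"
    return 1
}
```

A possible invocation is `compare_dmesg_captures rshim_dmesg.txt ssh_dmesg.txt`, where both file names are placeholders.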
[Kernel-packages] [Bug 2030765] Re: mlxbf-gige: Fix kernel panic after reboot
After performing 1000 reboots, the OOB interface has an IP address and no kernel panic has been observed.

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2030765

Title:
  mlxbf-gige: Fix kernel panic after reboot

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Committed
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  Previously we fixed 2022370 by modifying the shutdown() function, but that change introduced a bug and needs to be reverted. It does not deinitialize the oob_net0 interface properly after reboot and causes issues for the UEFI OOB driver.

  [Fix]
  * Revert 2022370 changes
  * Instead, just check that "priv" is not NULL.

  [Test Case]
  * Run the reboot test 1000 times
  * Check that there is no kernel panic after reboot
  * Check that the UEFI OOB interface gets an IP address

  [Regression Potential]
  * We could still see a kernel panic since it is very intermittent. QA reproduced it once every 350 reboots.
[Kernel-packages] [Bug 2022370] Re: mlxbf-gige: Fix kernel panic at shutdown
Using 5.15.0-1019-bluefield, no kernel panic was observed after performing 1000 reboots.

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2022370

Title:
  mlxbf-gige: Fix kernel panic at shutdown

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Committed
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  We occasionally see a race condition (once every 350 reboots) where napi is still running (mlxbf_gige_poll) while a shutdown has been initiated through "reboot". Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a result causes a kernel panic.

  [Fix]
  The fix is to explicitly disable napi and dequeue it during shutdown. mlxbf_gige_remove already calls:
    unregister_netdev->unregister_netdevice->unregister_netdevice_queue->
    rollback_registered->rollback_registered_many->dev_close_many->
    __dev_close_many->ndo_stop->mlxbf_gige_stop
  which stops napi. So use mlxbf_gige_remove in place of the existing shutdown logic.

  [Test Case]
  * Issue at least 1000 reboots from Linux and make sure there is no panic caused by the mlxbf-gige driver.

  [Regression Potential]
  * Since this issue is hard to reproduce, the fix has not been tested thoroughly yet, so it needs several reboot loops to validate it.
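For a reboot loop of this length, it may be convenient to scan a saved console log for panic signatures rather than watch each boot. The following is a rough sketch under stated assumptions: the console output of the whole run has been captured to a file, the function name count_panic_lines is hypothetical, and the two search strings are common arm64 kernel messages chosen for illustration, not strings taken from this bug report.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of lines in a saved console log
# that look like a kernel panic or a NULL pointer dereference.
# The patterns are assumptions based on typical kernel messages.
# Note: grep -c prints 0 but exits non-zero when nothing matches.
count_panic_lines() {
    log_file=$1
    grep -Ec 'Kernel panic|Unable to handle kernel NULL pointer dereference' "$log_file"
}
```

A run would pass the check if `count_panic_lines console.log` prints 0, where console.log is a placeholder for the captured log.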
[Kernel-packages] [Bug 2021749] Re: mlxbf-tmfifo: robust fix to drop over-sized packet or no Rx descriptors
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2021749

Title:
  mlxbf-tmfifo: robust fix to drop over-sized packet or no Rx descriptors

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Committed
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  This change is needed to avoid a potential tmfifo console hang when the network interface is down or when receiving oversized packets.

  [Fix]
  Drop the Rx packets in the above cases. Since the tmfifo is a shared resource for both console and networking, dropping such network packets lets the FIFO continue to be processed and avoids a console hang.

  [Test Case]
  Same functionality and testing as on BlueField-1/2/3. No functionality change.
  Add negative tests:
  1. Run 'ifconfig tmfifo_net0 down' on the ARM side during traffic, then verify that the rshim console does not get stuck.
  2. Configure the MTU to 4000 on the rshim host and send an oversized packet with the command 'ping 192.168.100.2 -s 2000'. It should not cause any hang.

  [Regression Potential]
  Same behavior from user perspective.
[Kernel-packages] [Bug 2012571] Re: net/sched: cls_api: Support hardware miss to tc action
Hello, using 5.15.0-1019-bluefield, we see that the tuples were offloaded and we also see the offload entries. Thanks

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2012571

Title:
  net/sched: cls_api: Support hardware miss to tc action

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Released
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  * Explain the bug(s)

  Currently the tc miss interface only supports resuming from a specific tc chain. If a packet modification is done before a missable action such as CT, and there is a miss in CT after it, resuming by re-executing the same chain in software may cause a mismatch and a wrong packet count. An example of this use case is stateless (static) NAT.

  * Brief explanation of fixes

  Add support for missing to a specific action instance, and support for per-action hardware stats to update what was actually done in hardware.

  * How to test

  Create an OVS bridge with 2 mlx5 rep devices. Enable HW offload and configure regular connection tracking OpenFlow rules with packet modification before the CT action (such as stateless NAT), e.g.:

    ovs-ofctl del-flows br-ovs
    ovs-ofctl add-flow br-ovs arp,actions=normal
    ovs-ofctl add-flow br-ovs "in_port=1,table=0, ip,ct_state=-trk actions=mod_nw_dst=1.1.1.2,ct(table=1)"
    ovs-ofctl add-flow br-ovs "in_port=1,table=1, ip,ct_state=+trk+new actions=ct(commit),output:2"
    ovs-ofctl add-flow br-ovs "in_port=1,table=1, ip,ct_state=+trk+est, actions=output:2"
    ovs-ofctl add-flow br-ovs "in_port=2,table=0, ip,ct_state=-trk actions=ct(table=1)"
    ovs-ofctl add-flow br-ovs "in_port=2,table=1, ip,ct_state=+trk+est, actions=mod_nw_src=1.1.1.2,output:1"

  Configure VF1 with IP 1.1.1.1 and VF2 with IP 1.1.1.2. For VF2, add a route and a static neighbour to the floating (router) IP 5.5.5.5.

  Then run a TCP connection, e.g.:

    on mlx5 VF1:
      iperf -s  # (which will listen on 1.1.1.2)
    on mlx5 VF2:
      iperf -c 5.5.5.5 -t 10  # (this creates a packet from 1.1.1.1 -> 5.5.5.5, and NAT will change this to 1.1.1.1->1.1.1.2)

  Optional: in a different terminal, while traffic is running, check for offload:

    tcpdump -nnepi tcp

  and see no iperf TCP packets.

  Dump conntrack with the relevant IP:

    cat /proc/net/nf_conntrack | grep -i 1.1.1.1

  See that the tuples were offloaded:

    ipv4 2 tcp 6 src=1.1.1.1 dst=1.1.1.2 sport=56394 dport=5001 packets=2 bytes=112 src=1.1.1.2 dst=1.1.1.1 sport=5001 dport=56394 packets=1777 bytes=665340 [HW_OFFLOAD] mark=0 zone=0 use=3

  * What it could break.

  Offload for modifications + CT and tc packet count.
[Kernel-packages] [Bug 2015293] Re: netfilter: ctnetlink: Support offloaded conntrack entry deletion
Hello, using 5.15.0-1019-bluefield, conntrack -F properly flushes the tuples. Thanks,

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

https://bugs.launchpad.net/bugs/2015293

Title:
  netfilter: ctnetlink: Support offloaded conntrack entry deletion

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Released
Status in linux-bluefield source package in Jammy: Fix Committed

Bug description:
  * Explain the bug(s)

  conntrack -D or conntrack -F doesn't delete offloaded tuples.

  * Brief explanation of fixes

  Add support for deleting offloaded tuples via the netlink interface and the userspace conntrack utility.

  * How to test

  Create an OVS bridge with 2 mlx5 rep devices. Enable HW offload and configure regular connection tracking OpenFlow rules, e.g.:

    ovs-ofctl del-flows br-ovs
    ovs-ofctl add-flow br-ovs arp,actions=normal
    ovs-ofctl add-flow br-ovs "table=0, ip,ct_state=-trk actions=ct(table=1)"
    ovs-ofctl add-flow br-ovs "table=1, ip,ct_state=+trk+new actions=ct(commit),normal"
    ovs-ofctl add-flow br-ovs "table=1, ip,ct_state=+trk+est, actions=normal"

  Run a UDP connection, e.g.:

    on mlx5 VF1:
      iperf -s -u
    on mlx5 VF2:
      iperf -c -u -t 10

  Optional: in a different terminal, while traffic is running, check for offload:

    tcpdump -nnepi udp

  and see no iperf UDP packets.

  Dump conntrack with the relevant IP:

    cat /proc/net/nf_conntrack | grep -i

  See that the tuples were offloaded:

    ipv4 2 udp 17 src=1.1.1.2 dst=1.1.1.3 sport=56394 dport=5001 packets=2 bytes=112 src=1.1.1.3 dst=1.1.1.2 sport=5001 dport=56394 packets=1777 bytes=665340 [HW_OFFLOAD] mark=0 zone=0 use=3

  Flush the tuples:

    conntrack -F

  Verify that the tuples are deleted:

    cat /proc/net/nf_conntrack | grep -i

  Before the fix, the above tuple shows again; after the fix, it is deleted and nothing is shown.

  * What it could break.

  conntrack -F / -D not working on offloaded tuples.
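The flush verification described in this report can be sketched as a check on a saved copy of /proc/net/nf_conntrack. This is an illustration only: the function names count_offloaded_tuples and flush_verified are hypothetical, and the sketch assumes that offloaded entries carry the literal [HW_OFFLOAD] marker shown in the sample tuple.

```shell
#!/bin/sh
# Hypothetical helpers: work on a file holding a dump of
# /proc/net/nf_conntrack taken after running "conntrack -F".

# Prints how many entries in the dump are marked [HW_OFFLOAD].
count_offloaded_tuples() {
    dump_file=$1
    grep -cF '[HW_OFFLOAD]' "$dump_file"
}

# Succeeds only when no offloaded entries remain in the dump.
flush_verified() {
    [ "$(count_offloaded_tuples "$1")" -eq 0 ]
}
```

A possible sequence is `conntrack -F`, then `cat /proc/net/nf_conntrack > after_flush.txt`, then `flush_verified after_flush.txt && echo "offloaded tuples flushed"`; the file name is a placeholder.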
[Kernel-packages] [Bug 2012571] Re: net/sched: cls_api: Support hardware miss to tc action
Hello, using 5.4.0-1062-bluefield, we see that the tuples were offloaded and we also see the offload entries. Thanks

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

https://bugs.launchpad.net/bugs/2012571

Title:
  net/sched: cls_api: Support hardware miss to tc action

Status in linux-bluefield package in Ubuntu: Invalid
Status in linux-bluefield source package in Focal: Fix Committed

Bug description:
  * Explain the bug(s)

  Currently the tc miss interface only supports resuming from a specific tc chain. If a packet modification is done before a missable action such as CT, and there is a miss in CT after it, resuming by re-executing the same chain in software may cause a mismatch and a wrong packet count. An example of this use case is stateless (static) NAT.

  * Brief explanation of fixes

  Add support for missing to a specific action instance, and support for per-action hardware stats to update what was actually done in hardware.

  * How to test

  Create an OVS bridge with 2 mlx5 rep devices. Enable HW offload and configure regular connection tracking OpenFlow rules with packet modification before the CT action (such as stateless NAT), e.g.:

    ovs-ofctl del-flows br-ovs
    ovs-ofctl add-flow br-ovs arp,actions=normal
    ovs-ofctl add-flow br-ovs "in_port=1,table=0, ip,ct_state=-trk actions=mod_nw_dst=1.1.1.2,ct(table=1)"
    ovs-ofctl add-flow br-ovs "in_port=1,table=1, ip,ct_state=+trk+new actions=ct(commit),output:2"
    ovs-ofctl add-flow br-ovs "in_port=1,table=1, ip,ct_state=+trk+est, actions=output:2"
    ovs-ofctl add-flow br-ovs "in_port=2,table=0, ip,ct_state=-trk actions=ct(table=1)"
    ovs-ofctl add-flow br-ovs "in_port=2,table=1, ip,ct_state=+trk+est, actions=mod_nw_src=1.1.1.2,output:1"

  Configure VF1 with IP 1.1.1.1 and VF2 with IP 1.1.1.2. For VF2, add a route and a static neighbour to the floating (router) IP 5.5.5.5.

  Then run a TCP connection, e.g.:

    on mlx5 VF1:
      iperf -s  # (which will listen on 1.1.1.2)
    on mlx5 VF2:
      iperf -c 5.5.5.5 -t 10  # (this creates a packet from 1.1.1.1 -> 5.5.5.5, and NAT will change this to 1.1.1.1->1.1.1.2)

  Optional: in a different terminal, while traffic is running, check for offload:

    tcpdump -nnepi tcp

  and see no iperf TCP packets.

  Dump conntrack with the relevant IP:

    cat /proc/net/nf_conntrack | grep -i 1.1.1.1

  See that the tuples were offloaded:

    ipv4 2 tcp 6 src=1.1.1.1 dst=1.1.1.2 sport=56394 dport=5001 packets=2 bytes=112 src=1.1.1.2 dst=1.1.1.1 sport=5001 dport=56394 packets=1777 bytes=665340 [HW_OFFLOAD] mark=0 zone=0 use=3

  * What it could break.

  Offload for modifications + CT and tc packet count.
[Kernel-packages] [Bug 2015293] Re: netfilter: ctnetlink: Support offloaded conntrack entry deletion
Hello,

Using 5.4.0-1062-bluefield, conntrack -F properly flushes the tuples.

Thanks,

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

https://bugs.launchpad.net/bugs/2015293

Title:
  netfilter: ctnetlink: Support offloaded conntrack entry deletion

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Focal:
  Fix Committed

Bug description:
  * Explain the bug(s)

  conntrack -D or conntrack -F doesn't delete offloaded tuples.

  * Brief explanation of fixes

  Add support for deleting offloaded tuples via the netlink interface and the userspace conntrack utility.

  * How to test

  Create an OVS bridge with 2 mlx5 rep devices. Enable HW offload and configure regular connection tracking OpenFlow rules, e.g.:

    ovs-ofctl del-flows br-ovs
    ovs-ofctl add-flow br-ovs arp,actions=normal
    ovs-ofctl add-flow br-ovs "table=0, ip,ct_state=-trk actions=ct(table=1)"
    ovs-ofctl add-flow br-ovs "table=1, ip,ct_state=+trk+new actions=ct(commit),normal"
    ovs-ofctl add-flow br-ovs "table=1, ip,ct_state=+trk+est, actions=normal"

  Run a UDP connection, e.g.:

    on mlx5 VF1: iperf -s -u
    on mlx5 VF2: iperf -c -u -t 10

  Optional: in a different terminal, while traffic is running, check for offload:

    tcpdump -nnepi udp

  and see no iperf UDP packets.

  Dump conntrack with the relevant IP:

    cat /proc/net/nf_conntrack | grep -i

  See that the tuples were offloaded:

    ipv4 2 udp 17 src=1.1.1.2 dst=1.1.1.3 sport=56394 dport=5001 packets=2 bytes=112 src=1.1.1.3 dst=1.1.1.2 sport=5001 dport=56394 packets=1777 bytes=665340 [HW_OFFLOAD] mark=0 zone=0 use=3

  Flush the tuples:

    conntrack -F

  Verify that the tuples are deleted:

    cat /proc/net/nf_conntrack | grep -i

  Before the fix, the above tuple shows up again; after the fix, it is deleted and the command prints nothing.

  * What it could break

  conntrack -F / -D not working on offloaded tuples.
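The before/after comparison in the test steps can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the bug report: the helper names `offloaded_lines` and `flush_removed_offloads` are hypothetical, and the two text arguments are assumed to be the contents of /proc/net/nf_conntrack captured before and after running `conntrack -F`. Unlike `grep -i`, the match is anchored to `src=`/`dst=` fields so that 1.1.1.2 does not also match 1.1.1.20.

```python
import re


def offloaded_lines(conntrack_text, ip):
    """Return conntrack lines that mention `ip` as src/dst and carry [HW_OFFLOAD]."""
    # (?!\S) requires the address to end at whitespace or end of line.
    needle = re.compile(r"(?:src|dst)=" + re.escape(ip) + r"(?!\S)")
    return [
        line
        for line in conntrack_text.splitlines()
        if "[HW_OFFLOAD]" in line and needle.search(line)
    ]


def flush_removed_offloads(before_text, after_text, ip):
    """True if offloaded entries for `ip` existed before the flush and none remain after."""
    had_entries = bool(offloaded_lines(before_text, ip))
    still_there = bool(offloaded_lines(after_text, ip))
    return had_entries and not still_there


# Sample entry taken from the bug description.
BEFORE = (
    "ipv4 2 udp 17 src=1.1.1.2 dst=1.1.1.3 sport=56394 dport=5001 "
    "packets=2 bytes=112 src=1.1.1.3 dst=1.1.1.2 sport=5001 dport=56394 "
    "packets=1777 bytes=665340 [HW_OFFLOAD] mark=0 zone=0 use=3\n"
)

if __name__ == "__main__":
    # Fixed kernel: the dump after `conntrack -F` no longer lists the entry.
    print(flush_removed_offloads(BEFORE, "", "1.1.1.2"))
    # Unfixed kernel: the same offloaded entry is still listed after the flush.
    print(flush_removed_offloads(BEFORE, BEFORE, "1.1.1.2"))
```

Requiring that entries existed before the flush guards against a false pass when offload never happened in the first place.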