[Kernel-packages] [Bug 2059310] Re: mlxbf_gige: call request_irq() after NAPI initialized

2024-04-08 Thread Tien Do
Verification

#
root@bu-lab62v2-oob:~# bfver
--/dev/mmcblk0boot0
BlueField ATF version: v2.2(release):4.7.0-17-g55e782a
BlueField UEFI version: 4.7.0-30-g91cad60
BlueField BSP version: 4.7.0.13101

OS Release Version: bf-bundle-2.7.0-11_24.04_ubuntu-22.04_dev
root@bu-lab62v2-oob:~#   

#
root@bu-lab62v2-oob:~# ifconfig oob_net0
oob_net0: flags=4163  mtu 1500
inet 10.15.1.66  netmask 255.255.255.0  broadcast 10.15.1.255
inet6 fe80::a288:c2ff:fe0e:865a  prefixlen 64  scopeid 0x20
ether a0:88:c2:0e:86:5a  txqueuelen 1000  (Ethernet)
RX packets 202  bytes 22026 (22.0 KB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 65  bytes 5360 (5.3 KB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@bu-lab62v2-oob:~#



root@bu-lab62v2-oob:~# kdump-config show
DUMP_MODE:  kdump
USE_KDUMP:  1
KDUMP_COREDIR:  /var/crash
crashkernel addr: 0xdd00
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.15.0-1040-bluefield
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.15.0-1040-bluefield
current state:ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-1040-bluefield root=UUID=0790b0b1-2540-4819-aed6-0e26caf975f5 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x1301 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 isolcpus=6,7 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@bu-lab62v2-oob:~#   



root@bu-lab62v2-oob:~# echo c > /proc/sysrq-trigger
[  443.856844] sysrq: Trigger a crash
[  443.861063] Kernel panic - not syncing: sysrq triggered crash
[  443.866799] CPU: 0 PID: 10859 Comm: bash Kdump: loaded Tainted: G   OE 5.15.0-1040-bluefield #42-Ubuntu
[  443.877215] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.7.0.13101 Apr  5 2024
[  443.887109] Call trace:
[  443.889540]  dump_backtrace+0x0/0x200
[  443.893194]  show_stack+0x20/0x2c
[  443.896494]  dump_stack_lvl+0x68/0x84
[  443.900145]  dump_stack+0x18/0x34
[  443.903444]  panic+0x1b0/0x3a0
[  443.906486]  sysrq_reset_seq_param_set+0x0/0x9c
[  443.911003]  __handle_sysrq+0xc4/0x250
[  443.914737]  write_sysrq_trigger+0xbc/0x1a0
[  443.918905]  proc_reg_write+0xb0/0x10c
[  443.922640]  vfs_write+0xf8/0x2c4
[  443.925941]  ksys_write+0x70/0x100
[  443.929328]  __arm64_sys_write+0x24/0x30
[  443.933235]  invoke_syscall+0x78/0x100
[  443.936971]  el0_svc_common.constprop.0+0x54/0x184
[  443.941747]  do_el0_svc+0x30/0xac
[  443.945046]  el0_svc+0x48/0x160
[  443.948175]  el0t_64_sync_handler+0xa4/0x12c
[  443.952430]  el0t_64_sync+0x1a4/0x1a8
[  443.956082] SMP: stopping secondary CPUs
[  443.960307] Starting crashdump kernel...
[  443.964217] Bye!


** Changed in: linux-bluefield (Ubuntu Jammy)
   Status: Confirmed => Fix Committed

** Tags removed: verification-needed-jammy-linux-bluefield
** Tags added: verification-done-jammy-linux-bluefield

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2059310

Title:
  mlxbf_gige: call request_irq() after NAPI initialized

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The mlxbf_gige driver encounters a NULL pointer exception in
  mlxbf_gige_open() when kdump is enabled.  The exception happens
  because there is a pending RX interrupt before the call to
  request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
  immediately after this request_irq() completes. The RX IRQ handler
  runs "napi_schedule()" before NAPI is fully initialized via
  "netif_napi_add()" and "napi_enable()", both of which happen later
  in the open() logic.

  [Fix]
  The logic in mlxbf_gige_open() must fully initialize NAPI before
  any calls to request_irq() execute.

  [Test Case]
  * Boot BF platform and bring up "oob_net0" interface
  * Enable kdump completely
  * Trigger kdump via "echo c > /proc/sysrq-trigger"
  * There should be no exceptions from mlxbf_gige driver
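
The last check in the test case can be approximated by scanning a saved kernel log for stack frames that belong to the driver. The following is a minimal sketch, not the procedure used for verification: the function name gige_log_has_fault is hypothetical, and it assumes a driver fault would appear as a backtrace frame of the form "mlxbf_gige_<function>+0x...".

```shell
# Sketch: succeed (exit status 0) if a kernel log contains a backtrace
# frame from the mlxbf_gige driver.
# Assumption: a driver fault appears as "mlxbf_gige_<func>+0x<offset>".
gige_log_has_fault() {
    # Reads the files given as arguments, or stdin when none are given.
    grep -E -q 'mlxbf_gige[A-Za-z0-9_]*\+0x' "$@"
}

# Example use on a live system (not executed here):
#   dmesg | gige_log_has_fault && echo "FAIL: mlxbf_gige trace found"
```

Ordinary informational messages from the driver do not match, because they lack the "+0x" offset that only backtrace frames carry.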

  [Regression Potential]
  There is low potential for regression as this brings in upstream content.

  [Other]
  None

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2059310/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2059310] Re: mlxbf_gige: call request_irq() after NAPI initialized

2024-04-08 Thread Tien Do
** Changed in: linux-bluefield (Ubuntu Jammy)
   Status: Fix Committed => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2059310

Title:
  mlxbf_gige: call request_irq() after NAPI initialized

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Jammy:
  Fix Committed


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2059310/+subscriptions




[Kernel-packages] [Bug 2020606] Re: net: openvswitch: fix race on port output

2023-07-07 Thread Tien Do
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2020606

Title:
  net: openvswitch: fix race on port output

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Focal:
  Fix Committed
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  fix race on port output

  * Explain the bug(s)

  There is a race condition on port output.

  * Brief explanation of fixes

  Upstream Linux has the same issue, fixed by the following commit:

  commit 066b86787fa3d97b7aefb5ac0a99a22dad2d15f8
  Author: Felix Huettner 
  Date:   Wed Apr 5 07:53:41 2023 +

  net: openvswitch: fix race on port output

  assume the following setup on a single machine:
  1. An openvswitch instance with one bridge and default flows
  2. two network namespaces "server" and "client"
  3. two ovs interfaces "server" and "client" on the bridge
  4. for each ovs interface a veth pair with a matching name and 32 rx and
 tx queues
  5. move the ends of the veth pairs to the respective network namespaces
  6. assign ip addresses to each of the veth ends in the namespaces (needs
 to be the same subnet)
  7. start some http server on the server network namespace
  8. test if a client in the client namespace can reach the http server
  ...
  Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
  Co-developed-by: Luca Czesla 
  Signed-off-by: Luca Czesla 
  Signed-off-by: Felix Huettner 
  Reviewed-by: Eric Dumazet 
  Reviewed-by: Simon Horman 
  Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug
  Signed-off-by: Jakub Kicinski 

  * How to test
  After configuring QoS on BF-3:
  Disable all priorities
  Set priorities [0, 0, 0, 0, 0, 0, 0, 0]
  Remove all policies
  Remove all traffic classes
  Add policy (policy_name=ND_traffic, traffic_type=RDMA, priority=5, port=2, dscp=None)
  Add policy (policy_name=TCP_traffic, traffic_type=TCP, priority=4, port=26000, dscp=None)
  Add traffic class two prio TC with prio_list=[5, 4], min_bw=76, algorithm=ETS
  Enable QOS on adapter SLOT 5 Port 1
  Restart driver (detach VFs -> restart with devcon -> attach VFs)

  BF-3 sometimes stops responding; the test may need to be run multiple times to reproduce the hang.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2020606/+subscriptions




[Kernel-packages] [Bug 1989495] Re: mlxbf_gige: need to clear MDIO gateway lock after read

2023-02-17 Thread Tien Do
** Tags removed: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/1989495

Title:
  mlxbf_gige: need to clear MDIO gateway lock after read

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Focal:
  Fix Released
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  The BlueField-2 GIGE logic accesses the MDIO device via reads/writes
  to a gateway (GW) register. The MDIO GW lock is set after a read, so
  it must always be cleared to indicate that the GW register is no
  longer in use. If the lock is mistakenly interpreted as ACTIVE, then
  subsequent MDIO accesses will be blocked and the PHY device will be
  inaccessible.

  [Fix]
  For each MDIO read and write transaction, the last step should
  be to clear the MDIO GW lock.

  [Test Case]
  Boot the BlueField-2 platform
  Bring up the "oob_net0" interface via DHCP or static IP
  Ping and file transfer over "oob_net0" should work properly
  Bounce the "oob_net0" interface a few times and repeat tests
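
The bounce-and-retest step above could be scripted roughly as follows. This is a sketch under stated assumptions: the function name bounce_and_ping, the peer address argument, the round count, and the 5-second settle delay are all illustrative and not part of the original test procedure. Setting RUN=echo prints the commands instead of executing them.

```shell
# Sketch: bounce an interface several times and ping a peer after each bounce.
# Set RUN=echo for a dry run that only prints the commands.
# Arguments: interface name, peer address, number of rounds (default 3).
bounce_and_ping() {
    ifname=$1
    peer=$2
    rounds=${3:-3}
    i=1
    while [ "$i" -le "$rounds" ]; do
        ${RUN:-} ip link set dev "$ifname" down || return 1
        ${RUN:-} ip link set dev "$ifname" up   || return 1
        ${RUN:-} sleep 5                        # illustrative settle delay
        ${RUN:-} ping -c 3 -W 2 "$peer"         || return 1
        i=$((i + 1))
    done
}

# Example use as root on the target (not executed here):
#   bounce_and_ping oob_net0 <peer-address> 5
```

The function stops at the first failing command, so a nonzero exit status indicates the round in which the interface stopped working.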

  [Regression Potential]
  * Low risk for causing a regression, tested well in our lab.

  [Other]
  * None

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1989495/+subscriptions




[Kernel-packages] [Bug 1991551] Re: i2c-mlxbf.c: Sync up driver with upstreaming

2023-02-17 Thread Tien Do
** Tags removed: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/1991551

Title:
  i2c-mlxbf.c: Sync up driver with upstreaming

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Focal:
  Fix Released
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  Several i2c-mlxbf.c patches have been upstreamed recently and are in
  linux-next at the moment. So revert all changes in both Focal (5.4)
  and Jammy (5.15) and cherry-pick all changes from linux-next master.

  IMPORTANT NOTE: The new linux i2c support for BF3 also REQUIRES
  upgrading the UEFI firmware version. For BF1 and BF2 however, this
  driver should be backward compatible with older UEFI versions.

  [Fix]

  * There are 17 commits in total: 8 reverts, 8 cherry-picks from
  linux-next and one internal "UBUNTU: SAUCE:" to add the driver version.
  Following is the list of commits cherry-picked.

  * i2c: mlxbf: incorrect base address passed during io write
  From 


  * i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction()
  From 


  * i2c: mlxbf: remove IRQF_ONESHOT
  From 


  * i2c: mlxbf: Fix frequency calculation
  From 


  * i2c: mlxbf: support lock mechanism
  From 


  * i2c: mlxbf: add multi slave functionality
  From 


  * i2c: mlxbf: support BlueField-3 SoC
  From 


  * i2c: mlxbf: remove device tree support
  From 


  * Also add the i2c driver version to keep track of the changes
  internally.

  [Test Case]

  * Check that the i2c driver loads without errors on BF1 and BF2 (check dmesg, and check that the i2c1 sysfs entry is created)
  * Check that the i2c module can be removed and reloaded without errors.
  * Check that IPMB services work successfully on systems with a BMC. This is the best way to test the i2c driver. Run the ipmitool command from the BF: ipmitool mc info
  * Run ipmitool from the BMC as well: ipmitool -I ipmb mc info
  * Make sure this driver is backward compatible with older UEFI versions (only applicable for BF1 and BF2)
  * Make sure this driver works for BF1, BF2 and BF3 with the latest UEFI version.
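
One possible way to run the checks above as a pass/fail list is a small helper that executes a command and reports the result. This is only a sketch: check_step is a hypothetical helper, the module name i2c-mlxbf is assumed from the source file name i2c-mlxbf.c, and the sysfs path for i2c1 is assumed to follow the usual /sys/bus/i2c/devices/i2c-<N> layout.

```shell
# Sketch: run one test step and print PASS or FAIL with a label.
# Arguments: a label, then the command (with its arguments) to run.
check_step() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}

# Example use as root on the BF (not executed here; names are assumptions):
#   check_step "module unloads"  modprobe -r i2c-mlxbf
#   check_step "module reloads"  modprobe i2c-mlxbf
#   check_step "i2c1 in sysfs"   test -e /sys/bus/i2c/devices/i2c-1
#   check_step "IPMB from BF"    ipmitool mc info
```

Command output is discarded, so dmesg should still be reviewed by hand for driver errors after the run.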

  [Regression Potential]

  * The i2c driver could fail to load because the UEFI firmware hasn't been upgraded
  * The i2c driver could fail to load due to a bug
  * IPMB code which utilises the i2c driver could fail to work (ipmitool commands)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1991551/+subscriptions




[Kernel-packages] [Bug 1998863] Re: mlxbf-pmc: Fix event string typo

2023-02-17 Thread Tien Do
** Tags removed: kernel-spammed-focal-linux-bluefield kernel-spammed-jammy-linux-bluefield verification-needed-focal verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/1998863

Title:
  mlxbf-pmc: Fix event string typo

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Focal:
  Fix Committed
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  Due to a typo, duplicate events are shown

  [Fix]

  * Fix event name typo.

  [Test Case]

  * cd /sys/class/hwmon/hwmon0/tilenetX ("X" may be 0..7)
  root@localhost:/sys/class/hwmon/hwmon0/tilenet3# cat event_list | grep CRED
  15: CDN_DIAG_N_OUT_OF_CRED
  16: CDN_DIAG_S_OUT_OF_CRED
  17: CDN_DIAG_E_OUT_OF_CRED
  18: CDN_DIAG_W_OUT_OF_CRED
  19: CDN_DIAG_C_OUT_OF_CRED
  25: DDN_DIAG_N_OUT_OF_CRED
  26: DDN_DIAG_S_OUT_OF_CRED
  27: DDN_DIAG_E_OUT_OF_CRED
  28: DDN_DIAG_W_OUT_OF_CRED
  29: DDN_DIAG_C_OUT_OF_CRED
  35: NDN_DIAG_N_OUT_OF_CRED  <== Incorrectly was set to NDN_DIAG_S_OUT_OF_CRED
  36: NDN_DIAG_S_OUT_OF_CRED
  37: NDN_DIAG_E_OUT_OF_CRED
  38: NDN_DIAG_W_OUT_OF_CRED
  39: NDN_DIAG_C_OUT_OF_CRED
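
Rather than reading the list by eye, duplicate event names can be detected mechanically. The sketch below assumes each line of event_list has the form "<number>: <NAME>", as in the listing above; the function name dup_events is hypothetical.

```shell
# Sketch: print every event name that occurs more than once in an event list.
# Input lines are expected to look like "35: NDN_DIAG_N_OUT_OF_CRED".
dup_events() {
    # Field 2 is the event name; anything after it on the line is ignored.
    awk 'NF >= 2 { print $2 }' "$@" | sort | uniq -d
}

# Example use on the target (not executed here):
#   dup_events /sys/class/hwmon/hwmon0/tilenet3/event_list
```

With the typo present, the function would print NDN_DIAG_S_OUT_OF_CRED, since events 35 and 36 shared that name; with the fix applied it should print nothing for the events shown.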
   

  [Regression Potential]

  * Duplicate events still get shown

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1998863/+subscriptions




[Kernel-packages] [Bug 1995148] Re: mlxbf_gige: need to add BlueField-3 support

2023-02-17 Thread Tien Do
** Tags removed: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/1995148

Title:
  mlxbf_gige: need to add BlueField-3 support

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  The GIGE driver ("mlxbf_gige") in the Jammy repo does not yet
  support the BlueField-3 SoC.  The "mlxbf_gige" driver, which
  currently supports the BlueField-2 SoC, will be extended to
  include BlueField-3 support.

  [Fix]
  The BlueField-3 support will be provided in the form of a set
  of 4 patches, to be sent in as a pull request. 

  [Test Case]
  BlueField-2: Boot platform and bring up "oob_net0" interface properly
   Test that network traffic works properly
  BlueField-3: Tested in simulator platforms and early silicon

  [Regression Potential]
  Since this is initial support, there is a chance of regression
  on BlueField-2 platforms, but it has been tested there.

  [Other]
  n/a

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1995148/+subscriptions




[Kernel-packages] [Bug 1998857] Re: tick/rcu: Stop allowing RCU_SOFTIRQ in idle

2023-02-17 Thread Tien Do
** Tags removed: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/1998857

Title:
  tick/rcu: Stop allowing RCU_SOFTIRQ in idle

Status in linux-bluefield package in Ubuntu:
  New
Status in linux-bluefield source package in Focal:
  Fix Committed
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  * Explain the bug
  RCU_SOFTIRQ used to be special in that it could be raised on purpose
  within the idle path to prevent the tick from stopping. Some code still
  suppresses unnecessary warnings related to this specific behaviour
  while entering dynticks-idle mode.

  However the nohz layout has changed quite a bit in ten years, and the
  removal of CONFIG_RCU_FAST_NO_HZ has been the final straw to this
  safe-conduct. Now the RCU_SOFTIRQ vector is expected to be raised from
  sane places.

  * Brief explanation of fixes
   
  Stop allowing RCU_SOFTIRQ in idle
   
  * How to test
   
  In boot dmesg, the warning should be gone.
   
  * What it could break.
   
  N/A

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1998857/+subscriptions

