Public bug reported:
SRU Justification:
[Impact]
This is reproducible on systems which already have heavy background
traffic. On top of that, the user issues one of the 2 docker pulls below:
docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
OR
docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
The second one is a very large container (17GB)
When they run docker pull, the OOB interface stops being pingable,
the docker pull is interrupted for a very long time (>3mn) or
times out.
[Fix]
* Update the RX_CQE_CI before updating the RX_PI to avoid a race condition
where we wrongly inform HW that there is space for the WQE.
* disable the RX DMA while we are handling incoming packets to avoid overflow.
[Test Case]
* Created a script which loops 200 times and does a docker pull in each loop:
docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
OR
docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
[Regression Potential]
* This could result in slower handling since we are disabling/enabling the DMA
periodically.
* Although this fix has been tested by the people who opened the bug, QA needs
to thoroughly test it to make sure it is not reproducible.
** Affects: linux-bluefield (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1964984
Title:
Fix OOB handling RX packets in heavy traffic
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1964984/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs