Hi vpp-dev,

As many of you already know, we tried enabling unit tests in ARM VPP jobs during the last release cycle, but we only managed to fix all make test failures during release procedures, and we agreed that it would be better to enable them after 1810 is released.
Enabling the unit testing (i.e. running full make verify) when there are failures is, in my opinion, a fool's errand. If people see consistent failures that are not related to their patches, they're much less likely to investigate whether this time there's really a legitimate failure and more likely to just ignore the job since it's just not working. That means we'll really need to iron out all of the failures before doing anything else.

So let's talk about the failures. There were new failures in master almost immediately after the release and these are seemingly also reproducible on x86 (although I have no idea why they don't show up in CI):

* VPP-1475 <https://jira.fd.io/browse/VPP-1475> - IP4 random reassembly failure
* VPP-1476 <https://jira.fd.io/browse/VPP-1476> - L2fib missing packets

Not long after these issues, new issues cropped up. At one point even the sanity test didn't pass (that was addressed by https://gerrit.fd.io/r/#/c/15841/, thanks, Neale!) and there was an issue with sessions tests (fixed by https://gerrit.fd.io/r/#/c/15947/, thanks, Florin!). But there are still more issues that need our attention:

* VPP-1490 <https://jira.fd.io/browse/VPP-1490> - Looks like traffic isn't working on ARM on Ubuntu1604
* VPP-1491 <https://jira.fd.io/browse/VPP-1491> - GBD l2 endpoint learning. The tests actually pass with the debug build
* VPP-1497 <https://jira.fd.io/browse/VPP-1497> - Parallel test execution on ARM produced many more failures. I haven't investigated this much yet
* And there is a new failure in a CDP test, this is not in Jira yet (there are some problems with accessing stuff in lab, curses!)

This very much seems like a game of whack-a-mole - we fix a few issues and new ones appear right away. This might suggest that the current approach of me finding issues on an ARM server and then notifying vpp-dev might not be ideal if we want to enable unit testing in 1901 (and we really do! :)).
Or maybe this is not the right time to enable testing and we should focus on it more a few weeks before release? What's the best way to ensure that we'll get testing in as soon as possible?

In any case, we'll need a lot of help from you. I urge everyone (or at least a few key people) to get access to the FD.io lab (you'll need a GPG key that Ed Warnicke or some other trusted anchor will sign, and then request access using the fd.io helpdesk) so that you can use the hardware we've reserved for this purpose. We could also always debug via a call, but that's just not efficient and you'll need some ARM hardware for development anyway (or to just fix issues that show up in verify jobs).

When it comes to the individual issues, any feedback is appreciated, like just the author acknowledging the issue and maybe adding whether they have time to look at it or what more information they need.

Let's make VPP development much smoother for ARM ASAP, guys. :)

Thanks,
Juraj
