Hi vpp-dev,

As many of you already know, we tried enabling unit tests in the ARM VPP jobs 
during the last release cycle, but we only managed to fix all make test 
failures during the release procedures, so we agreed that it would be better to 
enable them after 1810 is released.

Enabling unit testing (i.e. running the full make verify) while there are 
failures is, in my opinion, a fool's errand. If people see consistent failures 
that are not related to their patches, they're much less likely to investigate 
whether this time there's really a legitimate failure and more likely to just 
ignore the job since it's just not working. That means we'll really need to 
iron out all of the failures before doing anything else.

So let's talk about the failures. There were new failures in master almost 
immediately after the release and these are seemingly also reproducible on x86 
(although I have no idea why they don't show up in CI):

* VPP-1475 <https://jira.fd.io/browse/VPP-1475> - IP4 random reassembly failure

* VPP-1476 <https://jira.fd.io/browse/VPP-1476> - L2fib missing packets

Not long after these issues, new issues cropped up. At one point even the 
sanity test didn't pass (that was addressed by 
https://gerrit.fd.io/r/#/c/15841/, thanks, Neale!) and there was an issue with 
sessions tests (fixed by https://gerrit.fd.io/r/#/c/15947/, thanks, Florin!). 
But there are still more issues that need our attention:

* VPP-1490 <https://jira.fd.io/browse/VPP-1490> - Traffic doesn't appear to be 
  working on ARM on Ubuntu 16.04

* VPP-1491 <https://jira.fd.io/browse/VPP-1491> - GBD l2 endpoint learning. 
  The tests actually pass with the debug build

* VPP-1497 <https://jira.fd.io/browse/VPP-1497> - Parallel test execution on 
  ARM produced many more failures. I haven't investigated this much yet

* There is also a new failure in a CDP test; this is not in Jira yet (there 
  are some problems with accessing things in the lab, curses!)

This very much seems like a game of whack-a-mole: we fix a few issues and new 
ones appear right away. This might suggest that the current approach of me finding 
issues on an ARM server and then notifying vpp-dev might not be ideal if we 
want to enable unit testing in 1901 (and we really do! :)). Or maybe this is 
not the right time to enable testing and we should focus on it more a few weeks 
before release? What's the best way to ensure that we'll get testing in as soon 
as possible?

In any case, we'll need a lot of help from you. I urge everyone (or at least a 
few key people) to get access to the FD.io lab (you'll need a GPG key that Ed 
Warnicke or some other trusted anchor will sign and then request access using 
the fd.io helpdesk) so that you can use the hardware we've reserved for this 
purpose. We could also always debug via a call, but that's just not efficient 
and you'll need some ARM hardware for development anyway (or to just fix issues 
that show up in verify jobs).

When it comes to the individual issues, any feedback is appreciated, even if it's 
just the author acknowledging the issue and perhaps saying whether they have time 
to look at it or what further information they need.

Let's make VPP development much smoother for ARM ASAP, guys. :)

Thanks,
Juraj