Re: FreeBSD CI builds fail

2019-07-29 Thread Jerome Magnin
On Tue, Jul 23, 2019 at 08:37:37PM +0200, Jerome Magnin wrote:
> On Tue, Jul 23, 2019 at 07:09:57PM +0200, Tim Düsterhus wrote:
> > Jérôme,
> > Ilya,
> > 
> > I noticed that FreeBSD CI fails since
> > https://github.com/haproxy/haproxy/commit/885f64fb6da0a349dd3182d21d337b528225c517.
> > 
> > 
> > One example is here: https://github.com/haproxy/haproxy/runs/169980019
> > 
> > It should be investigated whether the reg-test is valid for FreeBSD and
> > either be fixed or disabled.
> > 
> > Best regards
> > Tim Düsterhus
> > 
> Thanks Tim and Ilya,
> 
> This one fails because there's an L4 timeout. I can probably update the
> regex to take that into account; the interesting part is the failure and
> the step at which it fails, but for now we expect a connection failure
> and not a timeout.
> 
> I'm a bit more concerned about the other one reported by Ilya, where the
> backend server started by VTest won't accept connections. I'll look into
> this one further.

We have decided to exclude this test on non-Linux systems for the time being,
as it triggers a race condition in VTest.
https://github.com/haproxy/haproxy/commit/0d00b544c3bdc9dc1796aca28bad46b3c1867184
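For readers wondering what such an exclusion can look like, here is a minimal
shell sketch of a platform guard. The `uname` check is real, but wrapping it
around the test invocation this way is an assumption, not necessarily the
mechanism used in the commit above (a feature check inside the .vtc file
itself is another option).

```shell
# Sketch: skip a reg-test on non-Linux platforms. How the guard is hooked
# into the test runner is an assumption for illustration only.
case "$(uname -s)" in
    Linux) echo "running test" ;;
    *)     echo "skipping test on $(uname -s)" ;;
esac
```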

Jérôme



Re: FreeBSD CI builds fail

2019-07-24 Thread Willy Tarreau
On Wed, Jul 24, 2019 at 10:01:33AM +0200, Tim Düsterhus wrote:
> Am 24.07.19 um 05:55 schrieb Willy Tarreau:
> > I also noticed the build failure but couldn't find any link to the build
> > history to figure when it started to fail. How did you figure that the
> > commit above was the first one ?
> 
> While I did it as Ilya did by scrolling through GitHub's commit list,

That was the least natural way for me to do it. Thanks to Ilya for the
screenshot, by the way. I clicked on the red cross, then the freebsd
link reporting the failure, and searched the history there but couldn't
find it.

> there is also:
> 
> Travis: https://travis-ci.com/haproxy/haproxy/builds
> Cirrus: https://cirrus-ci.com/github/haproxy/haproxy

Ah yes, this one is more useful, that's what I was looking for. I just
cannot figure out how to reach it when I'm on the build status page :-/

> Keep in mind for both that only the current head after a push is being
> built, so larger pushes might hide issues from CI.

Of course! But the goal is not to build every single commit either, but
to detect early that something went wrong instead of discovering it after
a version is released, as we used to in the past.

> In this specific case
> the offending patch was pushed together with 7764a57d3292b6b4f1e488b
> ("BUG/MEDIUM: threads: cpu-map designating a single") and only the
> latter was tested.

Yep!

> > Ideally we'd need a level of failure in CI builds. Some should be just of
> > level "info" and not cause a build error because we'd know they are likely
> > to fail but are still interested in the logs. But I don't think we can do
> > this.
> > 
> 
> I'm not sure this is possible either, but I also don't think it's a good
> idea, because then you get used to this kind of issue and ignore it. For
> example this one would probably have been written off as "ah, it's just
> flaky" instead of actually investigating what's wrong:
> https://github.com/haproxy/haproxy/issues/118

It's true. But what is also true is that the tests are not meant to be
run in the CI build environment but on developers' machines first. Being
able to run in the CI env is a bonus. As a side effect of some technical
constraints imposed by such environments (slow VMs with flaky timings,
hosts enforcing at least a little bit of security, etc.) we do expect that
some tests will randomly fail. These ones could be tagged as such and
just report a counter of failures among the more or less expected ones.
When you're used to seeing that 4 to 6 tests usually fail and you suddenly
find 13 that have failed, you can be interested in having a look there,
even if it's possibly just to restart them to confirm. And these ones
should not fail at all in more controlled environments.
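The counter idea described above could be sketched as follows; the function
name, the message wording, and the baseline numbers are all hypothetical,
chosen only to illustrate the "alert when above the usual flaky count" logic.

```shell
# Sketch: only alert when the failure count exceeds the usual flaky
# baseline for this environment. check_failures and the numbers used in
# the sample calls below are hypothetical.
check_failures() {
    failed=$1
    baseline=$2
    if [ "$failed" -gt "$baseline" ]; then
        echo "ALERT: $failed tests failed (baseline $baseline), worth a look"
        return 1
    fi
    echo "OK: $failed failures, within the expected baseline of $baseline"
}

check_failures 5 6          # within the usual 4-6 flaky failures
check_failures 13 6 || true # suddenly 13: time to investigate
```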

There's nothing really problematic here in the end, this just constantly
reminds us that not all tests can be automated.

By the way, maybe we could have some form of exclusion by tags instead
of deciding that a test only belongs to one type. Because the reality
is that we do *not* want to run certain tests. The most common ones we
don't want to run locally are "slow" and "bug", which are already
mutually exclusive. But by tagging tests with multiple labels we
could then decide to exclude some labels during the build. And in this
case we could tag some tests as "flaky-on-cirrus", "flaky-on-travis",
"flaky-in-vm", "flaky-in-container", "flaky-firewall", etc. and ignore
them in such environments.
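A rough sketch of such label-based exclusion, assuming each test's labels are
available as a comma-separated string (the `should_skip` helper and the
"#LABELS:" storage convention hinted at in the trailing comment are
assumptions, not an existing haproxy mechanism):

```shell
# Sketch: decide whether a test should be skipped, given its labels and
# an exclusion list. Both arguments are comma-separated strings.
should_skip() {
    labels=",$1,"
    for l in $(printf '%s' "$2" | tr ',' ' '); do
        case "$labels" in
            *",$l,"*) return 0 ;;  # one excluded label matches: skip
        esac
    done
    return 1
}

# A CI wrapper could then do, for each .vtc file (hypothetical convention):
#   labels=$(sed -n 's/^#LABELS: *//p' "$t")
#   should_skip "$labels" "flaky-on-cirrus,flaky-in-vm" && continue
```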

Cheers,
Willy



Re: FreeBSD CI builds fail

2019-07-24 Thread Tim Düsterhus
Willy,

Am 24.07.19 um 05:55 schrieb Willy Tarreau:
> I also noticed the build failure but couldn't find any link to the build
> history to figure when it started to fail. How did you figure that the
> commit above was the first one ?

While I did it as Ilya did by scrolling through GitHub's commit list,
there is also:

Travis: https://travis-ci.com/haproxy/haproxy/builds
Cirrus: https://cirrus-ci.com/github/haproxy/haproxy

Keep in mind for both that only the current head after a push is being
built, so larger pushes might hide issues from CI. In this specific case
the offending patch was pushed together with 7764a57d3292b6b4f1e488b
("BUG/MEDIUM: threads: cpu-map designating a single") and only the
latter was tested.
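One way around head-only builds is to rebuild every commit of the pushed
range locally; a sketch, where `BUILD_CMD` and the range argument are
assumptions (CI would substitute the project's real build command and the
before/after SHAs of the push):

```shell
# Sketch: build every commit of a pushed range rather than only its head,
# so a breaking commit cannot hide behind a later one. BUILD_CMD defaults
# to a no-op here purely so the sketch is runnable.
BUILD_CMD="${BUILD_CMD:-true}"
check_range() {
    for sha in $(git rev-list --reverse "$1"); do
        git checkout -q "$sha" || return 1
        if ! $BUILD_CMD >/dev/null 2>&1; then
            echo "first failing commit: $sha"
            return 1
        fi
    done
    echo "all commits in $1 build"
}
```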

>> This one fails because there's an L4 timeout. I can probably update the
>> regex to take that into account; the interesting part is the failure and
>> the step at which it fails, but for now we expect a connection failure
>> and not a timeout.
> 
> There's always the possibility (especially in CI environments) that some
> rules are in place on the system to prevent connections to unexpected ports.
> 
> Ideally we'd need a level of failure in CI builds. Some should be just of
> level "info" and not cause a build error because we'd know they are likely
> to fail but are still interested in the logs. But I don't think we can do
> this.
> 

I'm not sure this is possible either, but I also don't think it's a good
idea, because then you get used to this kind of issue and ignore it. For
example this one would probably have been written off as "ah, it's just
flaky" instead of actually investigating what's wrong:
https://github.com/haproxy/haproxy/issues/118

Best regards
Tim Düsterhus



Re: FreeBSD CI builds fail

2019-07-24 Thread Илья Шипицин
On Wed, Jul 24, 2019 at 08:55, Willy Tarreau wrote:

> Hi guys,
>
> On Tue, Jul 23, 2019 at 08:37:37PM +0200, Jerome Magnin wrote:
> > On Tue, Jul 23, 2019 at 07:09:57PM +0200, Tim Düsterhus wrote:
> > > Jérôme,
> > > Ilya,
> > >
> > > I noticed that FreeBSD CI fails since
> > > https://github.com/haproxy/haproxy/commit/885f64fb6da0a349dd3182d21d337b528225c517.
> > >
> > > One example is here: https://github.com/haproxy/haproxy/runs/169980019
>
> I also noticed the build failure but couldn't find any link to the build
> history to figure when it started to fail. How did you figure that the
> commit above was the first one ?
>


[image: Screenshot from 2019-07-24 11-43-30.png]


>
> > This one fails because there's an L4 timeout. I can probably update the
> > regex to take that into account; the interesting part is the failure and
> > the step at which it fails, but for now we expect a connection failure
> > and not a timeout.
>
> There's always the possibility (especially in CI environments) that some
> rules are in place on the system to prevent connections to unexpected
> ports.
>
> Ideally we'd need a level of failure in CI builds. Some should be just of
> level "info" and not cause a build error because we'd know they are likely
> to fail but are still interested in the logs. But I don't think we can do
> this.
>
> Willy
>


Re: FreeBSD CI builds fail

2019-07-23 Thread Willy Tarreau
Hi guys,

On Tue, Jul 23, 2019 at 08:37:37PM +0200, Jerome Magnin wrote:
> On Tue, Jul 23, 2019 at 07:09:57PM +0200, Tim Düsterhus wrote:
> > Jérôme,
> > Ilya,
> > 
> > I noticed that FreeBSD CI fails since
> > https://github.com/haproxy/haproxy/commit/885f64fb6da0a349dd3182d21d337b528225c517.
> > 
> > 
> > One example is here: https://github.com/haproxy/haproxy/runs/169980019

I also noticed the build failure but couldn't find any link to the build
history to figure when it started to fail. How did you figure that the
commit above was the first one ?

> This one fails because there's an L4 timeout. I can probably update the
> regex to take that into account; the interesting part is the failure and
> the step at which it fails, but for now we expect a connection failure
> and not a timeout.

There's always the possibility (especially in CI environments) that some
rules are in place on the system to prevent connections to unexpected ports.

Ideally we'd need a level of failure in CI builds. Some should be just of
level "info" and not cause a build error because we'd know they are likely
to fail but are still interested in the logs. But I don't think we can do
this.

Willy



Re: FreeBSD CI builds fail

2019-07-23 Thread Jerome Magnin
On Tue, Jul 23, 2019 at 07:09:57PM +0200, Tim Düsterhus wrote:
> Jérôme,
> Ilya,
> 
> I noticed that FreeBSD CI fails since
> https://github.com/haproxy/haproxy/commit/885f64fb6da0a349dd3182d21d337b528225c517.
> 
> 
> One example is here: https://github.com/haproxy/haproxy/runs/169980019
> 
> It should be investigated whether the reg-test is valid for FreeBSD and
> either be fixed or disabled.
> 
> Best regards
> Tim Düsterhus
> 
Thanks Tim and Ilya,

This one fails because there's an L4 timeout. I can probably update the regex
to take that into account; the interesting part is the failure and the step at
which it fails, but for now we expect a connection failure and not a timeout.
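As a sketch of what widening that regex could look like, assuming the log
lines resemble the literal strings below (the exact messages VTest matches
against are not shown in this thread):

```shell
# Sketch: widen the expected-failure pattern so both an outright refusal
# and an L4 timeout match. The literal log strings are assumptions.
pattern='Connection (refused|timed out)'
printf 'Connection timed out\n' | grep -Eq "$pattern" && echo "timeout matches"
printf 'Connection refused\n'   | grep -Eq "$pattern" && echo "refusal matches"
```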

I'm a bit more concerned about the other one reported by Ilya, where the
backend server started by VTest won't accept connections. I'll look into
this one further.

Jérôme