Re: Tests timeout on my ARM64 test VM

2020-03-15 Илья Шипицин
The tests are easy to reproduce manually.

Each test consists of a few parts:

1) prepare the config
2) start haproxy
3) perform queries against it


It might be something like iptables (just a guess).
I would look inside one of the failing .vtc files and create a config from it
to start haproxy manually.
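The three steps above can be approximated outside varnishtest with a small Python script. This is a minimal sketch, not how the reg-test harness actually drives haproxy: it assumes a `haproxy` binary on PATH, and the config contents, the ports 8080/8000, and the helper names `build_config`, `start_haproxy` and `query` are all hypothetical rather than taken from any .vtc file.

```python
import shutil
import subprocess
import tempfile
import time
import urllib.error
import urllib.request
from pathlib import Path


def build_config(frontend_port: int, backend_port: int) -> str:
    """Step 1: a minimal HTTP proxy config (hypothetical, not from a .vtc)."""
    return "\n".join([
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 5s",
        "    timeout server 5s",
        "",
        "frontend fe",
        f"    bind 127.0.0.1:{frontend_port}",
        "    default_backend be",
        "",
        "backend be",
        f"    server s1 127.0.0.1:{backend_port}",
        "",
    ])


def start_haproxy(cfg_path: Path) -> subprocess.Popen:
    """Step 2: run haproxy in the foreground (-db) with the given config file."""
    return subprocess.Popen(["haproxy", "-db", "-f", str(cfg_path)])


def query(frontend_port: int, timeout: float = 5.0) -> int:
    """Step 3: send one HTTP GET through haproxy and return the status code."""
    url = f"http://127.0.0.1:{frontend_port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # e.g. 503 when nothing listens on the backend port
        return err.code


if __name__ == "__main__":
    if shutil.which("haproxy") is None:
        print("haproxy not found on PATH; only printing the config")
        print(build_config(8080, 8000))
    else:
        with tempfile.TemporaryDirectory() as tmp:
            cfg = Path(tmp) / "manual-test.cfg"
            cfg.write_text(build_config(8080, 8000))
            proc = start_haproxy(cfg)
            try:
                time.sleep(0.5)  # crude wait for the listener to come up
                print("status:", query(8080))
            except OSError as exc:  # URLError is an OSError subclass
                print("request failed:", exc)
            finally:
                proc.terminate()
                proc.wait()
```

If nothing listens on the backend port, a 503 from haproxy is expected; it still shows that haproxy started and answered. Running `haproxy -c -f <file>` first only checks the config without starting the proxy, which can help separate config errors from runtime timeouts.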

On Sun, Mar 15, 2020, 6:30 PM Martin Grigorov 
wrote:

> I just needed help with reading HAProxy's tests error logs. I didn't know
> how to approach debugging them.


Re: Tests timeout on my ARM64 test VM

2020-03-15 Martin Grigorov
Hi Willy,

On Fri, Mar 13, 2020 at 10:03 PM Willy Tarreau  wrote:

> So don't worry too much if it appeared and disappeared. The change
> that emphasized it was the increase in default maxconn (304e17eb8),
> apparently just due to a slightly longer startup time! And the ones
> expected to have fixed it are between bdb00c5d and 4b3f27b included.

Thank you for this explanation!

The problem here was that the tests were failing even with older commits.
I tried to git bisect the problem, but no matter how far back in Git
history I went, the problem was still there. Yet the logs from test
runs from just a few days earlier were OK, with no errors or timeouts.
Also, the ARM64 tests on TravisCI were OK, and Travis's ARM64 nodes are less
powerful than my VM.
Those are the reasons I believe the problem was with my VM.
I just needed help with reading HAProxy's test error logs. I didn't know
how to approach debugging them.
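Because the failures were intermittent, a single test run per commit can mislabel a commit during the bisect described above. A hedged sketch of a helper for `git bisect run`, which reads exit code 0 as good and 1 as bad (125 would mean "skip", e.g. for a commit that does not build; that case is not handled here). The script name `bisect_check.py`, the number of attempts and the per-run timeout are assumptions; the test command is whatever normally runs the failing .vtc. If every commit comes back bad, as happened here even with older commits, that points at the environment rather than the code.

```python
import subprocess
import sys

GOOD, BAD = 0, 1  # exit codes that `git bisect run` reads as good / bad


def classify(cmd: list[str], attempts: int, timeout: float = 300.0) -> int:
    """Run cmd up to `attempts` times; any failure or hang marks the commit bad."""
    for _ in range(attempts):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return BAD  # a hang counts as a failure, like the reported timeouts
        if result.returncode != 0:
            return BAD
    return GOOD


# Hypothetical usage: git bisect run python3 bisect_check.py 10 <test command...>
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(classify(sys.argv[2:], int(sys.argv[1])))
```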

Regards,
Martin




Re: Tests timeout on my ARM64 test VM

2020-03-13 Илья Шипицин
Once in a while I saw "reg-tests/compression/lua_validation.vtc" fail,
say 1 time out of 20.

It is slow and racy by nature.

Also, it seems 3 weeks of my Linaro cloud time have passed; 1 week is left.
I'll make more attempts.
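If the failure rate really is about 1 in 20 and runs are independent (an assumption; a timing-sensitive test on a shared VM may not behave that way), simple arithmetic shows how many runs it takes before a failure is likely to show up. A minimal sketch; the function names are illustrative only.

```python
import math


def p_all_pass(fail_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass: (1 - p) ** n."""
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate: float, confidence: float) -> int:
    """Smallest n such that at least one failure shows up with `confidence`."""
    # Solve 1 - (1 - p) ** n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    print(f"{p_all_pass(0.05, 20):.3f}")  # chance that 20 runs all pass
    print(runs_needed(0.05, 0.95))       # runs for a 95% chance of one failure
```

With p = 0.05, twenty runs all pass about 36% of the time, and 59 runs are needed for a 95% chance of seeing at least one failure, so a handful of green runs says little about a racy test.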

On Wed, Mar 11, 2020 at 19:14, Martin Grigorov wrote:

> On Wed, Mar 11, 2020 at 3:06 PM Илья Шипицин  wrote:
>
>> BTW, I've managed to get Linaro VM :)
>
> Congrats! :-)


Re: Tests timeout on my ARM64 test VM

2020-03-13 Willy Tarreau
Hi Martin,

On Fri, Mar 13, 2020 at 12:35:12PM +0200, Martin Grigorov wrote:
> Hi Илья,
> 
> Suddenly today the build is again green here!
> I didn't make any changes to my testing setup.
> It must be something on the OS level but I wasn't able to figure out what
> makes the HAProxy tests timeout in the last several days.

We've had issues with the abns test on other platforms in the past,
namely s390x and ppc64le. It used to occasionally break on x86_64 as
well, but far less frequently. It was affected by two bugs that were
fixed yesterday after a few days of investigation and testing. We've
seen yet another failure on ppc64 although it was expected not to
fail, so I'd be careful before claiming victory. However, the abns
test is extremely time-sensitive and uses short delays of around 15 ms to
try to trigger the issue, and in a VM it is possible to see this
happen from time to time due to noisy neighbors. That's why I'm
staying extremely cautious about the verdict. The PPC64 machine I tested
on is provided by Minicloud and is a VM running on a real CPU, so it's
much less affected by timing issues. I've run the test several hundred
times in a row and couldn't make it fail anymore.
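Running a timing-sensitive test hundreds of times in a row, as described above, is easy to script, and zero failures over n runs still only bounds the failure rate rather than proving it is zero. A hedged sketch: `stress` repeats an arbitrary command and counts failures (non-zero exit or timeout), and `zero_failure_upper_bound` computes the standard one-sided bound 1 - α^(1/n), roughly 3/n at 95% confidence (the "rule of three"). The function names and the 300-run figure are hypothetical (the message only says "several hundred"), and the bound assumes independent runs.

```python
import subprocess
import sys


def stress(cmd: list[str], runs: int, timeout: float = 60.0) -> int:
    """Run cmd `runs` times and return how many runs failed or timed out."""
    failures = 0
    for _ in range(runs):
        try:
            if subprocess.run(cmd, timeout=timeout).returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            failures += 1
    return failures


def zero_failure_upper_bound(runs: int, alpha: float = 0.05) -> float:
    """Upper bound (confidence 1 - alpha) on the failure rate after `runs` clean runs."""
    # (1 - p) ** runs >= alpha  <=>  p <= 1 - alpha ** (1 / runs)
    return 1.0 - alpha ** (1.0 / runs)


if __name__ == "__main__":
    # Hypothetical demo: a trivially passing command, then the bound for 300 runs.
    print(stress([sys.executable, "-c", "pass"], runs=3))
    print(f"{zero_failure_upper_bound(300):.4f}")  # about 0.01, i.e. roughly 3/300
```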

So don't worry too much if it appeared and disappeared. The change
that made it more visible was the increase in the default maxconn (304e17eb8),
apparently just due to a slightly longer startup time! The commits
expected to have fixed it are bdb00c5d through 4b3f27b, inclusive.

Note that I didn't manage to make it fail on arm64 (real machine,
SolidRun's Macchiatobin).

Hoping this clarifies the situation.

Regards,
Willy



Re: Tests timeout on my ARM64 test VM

2020-03-13 Martin Grigorov
Hi Илья,

Suddenly, today the build is green again here!
I didn't make any changes to my testing setup.
It must be something at the OS level, but I wasn't able to figure out what
made the HAProxy tests time out over the last several days.

Regards,
Martin



Re: Tests timeout on my ARM64 test VM

2020-03-11 Martin Grigorov
On Wed, Mar 11, 2020 at 3:06 PM Илья Шипицин  wrote:

> I will a look during next weekend
>

Thank you, Илья!


>
> BTW, I've managed to get Linaro VM :)
>

Congrats! :-)




Re: Tests timeout on my ARM64 test VM

2020-03-11 Илья Шипицин
I will take a look next weekend.

BTW, I've managed to get a Linaro VM :)

On Wed, Mar 11, 2020, 5:40 PM Martin Grigorov 
wrote:

> Anyone can help me here ?


Re: Tests timeout on my ARM64 test VM

2020-03-11 Martin Grigorov
Hi,

On Mon, Mar 9, 2020 at 10:22 PM Martin Grigorov 
wrote:

> Hi,
>
> I am not sure what has changed on my test ARM64 VM, but the reg tests
> started timing out.
> Everything is fine on my dev machine (x86_64) and on Travis (
> https://travis-ci.com/haproxy/haproxy).
> I don't think it is ARM64-related; most probably it is some OS setting or
> something similar.
> I've rebooted the system just to make sure it is not some busy port or
> open file descriptor, but it still fails the same way.
>
> Can anyone see in the attached logs what the problem could be?
>

Can anyone help me here?

Martin


> Thank you!
>
> Martin
>
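On the "busy port" suspicion in the original report: rather than rebooting, one can check whether a TCP port is already taken by trying to bind it. A minimal sketch, assuming Linux and IPv4 on 127.0.0.1 (SO_REUSEADDR behaves differently on Windows); `port_is_free` is a hypothetical helper, and which ports a given reg test uses is not stated in this thread.

```python
import errno
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP listener could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # SO_REUSEADDR so that leftover TIME_WAIT connections do not count as
        # "busy"; an active listener on the port still makes bind() fail.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True


if __name__ == "__main__":
    # Occupy an ephemeral port, then confirm the check reports it as busy.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as busy:
        busy.bind(("127.0.0.1", 0))
        busy.listen()
        print(port_is_free(busy.getsockname()[1]))  # False while listening
```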