Re: Raise test timeouts?

2022-07-11 Thread Berenguer Blasi
Hi All, if I am parsing this thread correctly, it seems we have a number of options to attack and some are already progressing: tmp misconfig, docker misconfig, unmatched resources in different CI envs, no definition of minimal HW requirements, etc. But so far nothing against merging

Re: Raise test timeouts?

2022-07-07 Thread Mick Semb Wever
> However, the docker space issue needs to be resolved first since we don't have the capacity to experiment with those nodes out of commission.
ETA on fixing the docker space issues is this/next week. Once that lands we can take a look at the abnormal CPU usage on some nodes.

Re: Raise test timeouts?

2022-07-06 Thread Josh McKenzie
> Having parity between CI systems is important, no matter how we approach it. How much does the hardware allocation (cpu, memory, disk throughput, network throughput) differ between ASF Jenkins and circle midres? How much does the container isolation differ? i.e. why are we seeing bugged tests

Re: Raise test timeouts?

2022-07-06 Thread Mick Semb Wever
> > What I mean by that specifically: if you under-provision a node with 2 cpus, 1.5 gigs of ram, slow disks, slow networking, and noisy neighbors, and the nodes take so long with GC pauses, compaction, streaming, etc. that they don't correctly complete certain operations in expected time,

Re: Raise test timeouts?

2022-07-06 Thread Ekaterina Dimitrova
Just wanted to bring up that we actually started seeing a trend pre-4.0 and it keeps showing up now on the way to 4.1: legit bugs are found more often in CircleCI while they do not pop up at all in Jenkins. So my appeal is to keep checking CircleCI thoroughly as well, even if some failures are not visible

Re: Raise test timeouts?

2022-07-06 Thread Josh McKenzie
Bringing discussion from JIRA (CASSANDRA-17729) to here. Mick said:
> Agree with the notion that Jenkins (lower resources/more contention) is better at exposing flakies, but that there's a trade-off between encouraging flakies and creating difficult-to-deal-with noise.
I come back to the

Re: Raise test timeouts?

2022-07-06 Thread Brandon Williams
I suspect there's another problem with some of the Jenkins nodes where the system CPU usage is high and drives the load much higher than other nodes, possibly causing timeouts. However, the docker space issue needs to be resolved first since we don't have the capacity to experiment with those

Re: Raise test timeouts?

2022-07-05 Thread Josh McKenzie
Another option would be to increase the resources dedicated to each agent container and run less in parallel. Or, best yet, do both (up timeouts and lower parallelization / up resources). As far as I can tell the failures on Jenkins aren't value-add compared to what we're seeing on circleci