[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083494#comment-17083494 ] David Capwell commented on CASSANDRA-15686: --- Ah so was used very recently. Interesting. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083479#comment-17083479 ] Michael Semb Wever commented on CASSANDRA-15686: bq. They talk about updating Jenkins but that didn't look to happen? Multiple runners were introduced in Jenkins CI with CASSANDRA-13078. We only [recently undid it|CASSANDRA-15651] when the code went from `ant test` (and `ant test-all`) to testclasslist. The former targets depend on `get-cores` and `get-mem, but the latter do not. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083462#comment-17083462 ] David Capwell commented on CASSANDRA-15686: --- [~mck] your link is jvm dtest (org.apache.cassandra.distributed.test.BootstrapTest.bootstrapTest) which would require more work to support concurrent runners (doable). That test looks to start a cluster, tear it down, then start another one; my guess is the second one fails since the first isn't fully dead yet? This looks like a bug to me. bq. This indicates that the unit tests were runner-safe at some point in the past, and have since been broken… https://issues.apache.org/jira/browse/CASSANDRA-13078?focusedCommentId=16039394=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16039394 bq. any thoughts on how this would interact with (for example) circleci or asf jenkins? -- Jeff bq. It's fine we would just override it for CircleCI so it's 1. For ASF Jenkins it really depends on how big those boxes, but I am pretty sure we can pick a number > 1. A quad-core box can handle 4 just fine. -- Ariel Looks like Circle CI has always been 1 runner. They talk about updating Jenkins but that didn't look to happen? If CI never ran with runners > 1 and it was only locally a few times, then I wouldn't say that this is a regression as failures are not consistent so plausible to not be seen the few times this was used? bq. It would be nice to return this functionality (automatic runner calculation) to build.xml in trunk. If builds get faster, then +1 from me. bq. Are the breakages just around a few code changes that has been introduced since the get-cores and get-mem ant targets were added? for example running multiple nodes on the one ipaddress but with different ports? org.apache.cassandra.net.ConnectionTest#doTestManual doesn't use random ports so would conflict with anything trying to use the default port (not sure what does that) but doesn't look bad to change that for that test, but we would also need to know which test also did that. Are there other tests? Not sure, but I wouldn't be opposed to changing to concurrent runners and mark the failures as blockers for alpha (like we do flaky tests) > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the >
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083259#comment-17083259 ] Michael Semb Wever commented on CASSANDRA-15686: bq. Who introduced the get-cores and get-mem ant targets? See CASSANDRA-13078 > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083255#comment-17083255 ] Michael Semb Wever commented on CASSANDRA-15686: Who introduced the {{get-cores}} and {{get-mem}} ant targets? This indicates that the unit tests were runner-safe at some point in the past, and have since been broken… It would be nice to return this functionality (automatic runner calculation) to build.xml in trunk. Are the breakages just around a few code changes that has been introduced since the {{get-cores}} and {{get-mem}} ant targets were added? for example running multiple nodes on the one ipaddress but with different ports? > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080641#comment-17080641 ] David Capwell commented on CASSANDRA-15686: --- bq. Do you think these could be fixed, or put in another bucket to run in another non-parallel suite? They could be fixed, but we would need to find all cases first. I don't assume its a small number but a lot of subtle cases (actual unit tests are safe, but there are a lot which touch disk/network and those are the main ones worry me). bq. You can see the runtime goes from 19min to 12. I ran with 8 cores with 8 runners. I ran twice and in both cases saw a latency regression for the builds, but also saw some cases where the JVM hangs because of memory issues. its possible that a lower value would be better, but didn't bother testing further. I "feel" that it really depends on what tests are in the same container, but this is just a guess. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080546#comment-17080546 ] Kevin Gallardo commented on CASSANDRA-15686: {quote}and saw more failures than normal (normally ~1-3 failures, had 9; most known flaky tests) {quote} Hm that's a bummer, I think I saw similar failures when testing out too, but did not investigate more and thought they were just flaky. Do you think these could be fixed, or put in another bucket to run in another non-parallel suite? For reference, this is the tests I had run a little while ago (about 2 weeks) for the *unit* tests only, on parallelizing the tests on current LOWRES config: - [Run of p=4 and i=medium with no change to the current build (which means it uses 1 runner by default)|https://app.circleci.com/pipelines/github/newkek/cassandra/4/workflows/16897387-cbf0-49e8-82bc-327d75885256] - [Same hardware, but hardcoding {{-Dtest.runners=2}}|https://app.circleci.com/pipelines/github/newkek/cassandra/5/workflows/56b341b5-8f74-414e-852e-52cba8acd28a] You can see the runtime goes from 19min to 12. I am looking at old experiments I ran, and it seems I was able to run with 8 runners on unit tests, and all that was reported seem like failures from flaky tests: [run|https://app.circleci.com/pipelines/github/newkek/cassandra/22/workflows/68bf0582-2b43-4b71-a610-e1e76c6ff418/jobs/90] and [corresponding config|https://github.com/newkek/cassandra/blob/00103c51d3c6b1fd0c534b649e146bbe45bcdf11/.circleci/config.yml#L975] > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078863#comment-17078863 ] David Capwell commented on CASSANDRA-15686: --- Here is the script I used to get the container CPUs {code} $ cat ci/get_cpu_count #!/usr/bin/env bash #set -o xtrace set -o errexit set -o pipefail set -o nounset if [ -d /sys/fs/cgroup/cpu ]; then quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us) period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us) shares=$(cat /sys/fs/cgroup/cpu/cpu.shares) if [ $quota -gt -1 ] && [ $period -gt 0 ]; then echo $(( $quota / $period )) exit elif [ $shares -gt -1 ] && [ $shares -ne 1024 ]; then # in docker shares default to 1024 so double check for that default awk " function ceil(x) { y=int(x); return ( x>y ? y+1 : y ) } BEGIN { print ceil($shares / 1024.0) } " exit fi fi nproc {code} > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078840#comment-17078840 ] David Capwell commented on CASSANDRA-15686: --- [~mck] [~newkek] doesn't look like concurrent runners are safe =( {code} org.apache.cassandra.exceptions.ConfigurationException: 127.0.0.1:7014 is in use by another process. Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services at org.apache.cassandra.net.InboundConnectionInitiator.bind(InboundConnectionInitiator.java:159) at org.apache.cassandra.net.InboundConnectionInitiator.bind(InboundConnectionInitiator.java:181) at org.apache.cassandra.net.InboundSockets$InboundSocket.open(InboundSockets.java:95) at org.apache.cassandra.net.InboundSockets$InboundSocket.open(InboundSockets.java:82) at org.apache.cassandra.net.InboundSockets.open(InboundSockets.java:209) at org.apache.cassandra.net.ConnectionTest.lambda$doTest$8(ConnectionTest.java:241) at org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:262) at org.apache.cassandra.net.ConnectionTest.doTest(ConnectionTest.java:240) at org.apache.cassandra.net.ConnectionTest.test(ConnectionTest.java:229) at org.apache.cassandra.net.ConnectionTest.testCRCCorruption(ConnectionTest.java:725) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) {code} I ran with 8 cores and 16gb memory (matches HIGHER in circle ci) and saw more failures than normal (normally ~1-3 failures, had 9; most known flaky tests). At least for me, running with 8 runners didn't seem to improve job latency; felt the same (though only ran once so could have been a outlier). > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078822#comment-17078822 ] David Capwell commented on CASSANDRA-15686: --- FYI I tested setting runner based off container cpu (4cpu, 8gb memory) limits and got the following {code} [junit-timeout] [36.344s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached. {code} > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078419#comment-17078419 ] Stefan Podkowinski commented on CASSANDRA-15686: "there is no guidance or document saying they are not reliable on low config" Can we update the README mentioned above on that? > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078414#comment-17078414 ] Stefan Podkowinski commented on CASSANDRA-15686: I understand that it can be confusing to have tests available to run that have literally no chance to complete successfully using the low-res settings. But if you look at the [README|https://github.com/apache/cassandra/tree/trunk/.circleci] you'll see that we use the same file `config-2_1.yml` and just patch it for high-res settings. The idea was not having to maintain two versions of circle config files. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078372#comment-17078372 ] Kevin Gallardo commented on CASSANDRA-15686: bq. Unit tests should be able to complete on Circle CI with medium instances. The unit tests are run automatically, so that's the default. JVM dtests also run by default FWIW, they are able to complete on medium instances, as opposed to the Python dtests. bq. Pretty much all others won't due to limited resources. You shouldn't try running them, but if you do, then we can't guarantee they may complete unless high resource settings are used. Right, that was my understanding initially. Since those tests are available to run on the lowres config though, and there is no guidance or document saying they are not reliable on low config, and people get confused as to why the Python dtests for example run with 200-300 failures when using the default config. I was suggesting here to either bump up the default config, so that they can be run successfully, or remove them from the config since they can't be run (and only keep them in the HIGHRES file?), or maybe document this better. I understood David recommended fixing the tests to make them run on medium instances too, that sounds reasonable too IMO if it makes things less confusing in the long run. Thanks for the input > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078223#comment-17078223 ] Stefan Podkowinski commented on CASSANDRA-15686: Unit tests should be able to complete on Circle CI with medium instances. The unit tests are run automatically, so that's the default. Pretty much all others won't due to limited resources. You shouldn't try running them, but if you do, then we can guarantee they may complete unless high resource settings are used. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077751#comment-17077751 ] David Capwell commented on CASSANDRA-15686: --- [~spod] can you also take a look at this JIRA? bq. Confirm the build is running as normal by using the config-2_1.yml directly renamed as config.yml, see build #42 here: https://app.circleci.com/pipelines/github/newkek/cassandra?branch=chg cool. That basically means we don't really need to generate LOWER as we could just use that. We may still want the script to do that just to validate the YAML is "correct", but wouldn't need to commit that. bq. the dtests don't build successfully either on medium I see this as a bug honestly. There are some tests which expect high resources but we don't run those in Circle CI (see [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L418], we explicitly disable the high resource tests with --skip-resource-intensive-tests). I personally feel that these should be fixed as part of the flaky test cleanup work many people are working on; everyone should be able to run the regression suite, if that isn't true it should be see as a bug IMO. bq. it would allow people with large instances access to build without having to switch their config. Does that make sense? IMO higher should only be faster, not the only correct config, so I personally feel that the default should reflect what new members are able to do and people who have special accounts have an extra step of switching to higher. bq. Hm afaict running with multiple runners is fine for the unit tests I would need to look closer. We spin up storage at least which I thought was shared (directory layout), so the system tables could conflict and cause race condition problems (even if every create table has no conflict). If I am wrong (happens a lot), then faster builds are awesome! =) > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077529#comment-17077529 ] Kevin Gallardo commented on CASSANDRA-15686: bq. I was told that if you have a build that requests > medium on a free tier, CircleCI would downgrade to medium automatically So that was incorrect. Instead you get an error saying: _This job has been blocked because resource-class:xlarge is not available on your plan. Please upgrade to continue building._ So it doesn't build, but as mentioned earlier, the dtests don't build successfully either on medium, so I would think it wouldn't make much difference for free tier users to bump to large or more for the dtests, and it would allow people with {{large}} instances access to build without having to switch their config. Does that make sense? > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076590#comment-17076590 ] Kevin Gallardo commented on CASSANDRA-15686: bq. I am only speculating but I thought there were tests that spin up network and there are many tests which share disk, and I don't know if we isolate the paths or not. If this is not true then it should be a simple change to bump the number of runners for unit tests (definitely not jvm dtests). Hm afaict running with multiple runners is fine for the unit tests, I have tested on different configurations and it didn't seem to cause problems as long as # runners <= # CPUs > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076459#comment-17076459 ] Kevin Gallardo commented on CASSANDRA-15686: Confirm the build is running as normal by using the {{config-2_1.yml}} directly renamed as {{config.yml}}, see build #42 here: https://app.circleci.com/pipelines/github/newkek/cassandra?branch=chg > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076386#comment-17076386 ] Kevin Gallardo commented on CASSANDRA-15686: bq. I was under the impression the lager instances were not in the free tier; Correct, it is not. I was told that if you have a build that requests > medium on a free tier, CircleCI would downgrade to medium automatically, but I have not confirmed that, need to double check. I have seen [other OSS projects|https://github.com/envoyproxy/envoy/blob/master/.circleci/config.yml#L9] using {{xlarge}} resources on their default build config 路♂️. The problem is that the tests do not run well at all on medium, contributors get confused by seeing the python dtest build available and launches it, and then gets confused by the 200-300+ failures resulting. If these don't build successfully at all by default, I am wondering if it is worth to keep it in the "lowres" build at all given the confusion it causes? bq. Sorry I don't follow. [...] So the only file we should be maintaining is config-2_1.yml. So as we discussed offline, it would be possible to, instead of using the generated version of the {{config-2_1.yml}} file, we use the {{config-2_1.yml}} by default. This would cause less confusion to contributors too, we could also keep a HIGHRES version as discussed so that it doesn't change people's workflows. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test runners in the > unit tests with `-Dtest.runners` in the config file to reflect the number of > CPUs on the instances, or change the build so that it is able to detect the > appropriate number of core available automatically? > Additionally, it seems this default config file (config.yml) is not as well > maintained as the > [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml] > (+its lowres/highres) version in the same folder (from CASSANDRA-14806). > What is the reasoning for maintaining these 2 versions of the build? Could > the better maintained version be used as the default? We could generate a > lowres version of the new config-2_1.yml, and rename it {{config.yml}} so > that it gets picked up by CircleCI automatically instead of the current > default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config
[ https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074125#comment-17074125 ] David Capwell commented on CASSANDRA-15686: --- bq. Python dtests do not run successfully (200-300 failures) on medium instances, they seem to only run with small flaky failures on large instances or higher This is correct and should be improved. bq. Unit tests do not seem to parallelize optimally, number of test runners do not reflect the available CPUs on the container. Ideally if # of runners == # of bq. CPUs, build time is improved, on any type of instances. bq. For instance when using the current configuration, running on medium instances, build will use 1 junit test runner, but 2 CPUs are available. If using 2 bq. runners, the build time is reduced from 19min (at the current main config of parallelism=4) to 12min. I am only speculating but I thought there were tests that spin up network and there are many tests which share disk, and I don't know if we isolate the paths or not. If this is not true then it should be a simple change to bump the number of runners for unit tests (definitely not jvm dtests). bq. There are some typos in the file, some dtests say "Run Unit Tests" but they are JVM dtests (see here, here) Glad to review a patch to fix it =) bq. Do the Python dtests run successfully for anyone on medium instances? If not, would it make sense to bump them to large so that they can be run successfully? I was under the impression the lager instances were not in the free tier; I know xlarge is not, never tried large. My only concern would be the impact to the free tier of circle; if no impact then +1. bq. Additionally, it seems this default config file (config.yml) is not as well maintained as the config-2_1.yml (+its lowres/highres) version in the same folder (from CASSANDRA-14806). What is the reasoning for maintaining these 2 versions of the build? Could the better maintained version be used as the default? We could generate a lowres version of the new config-2_1.yml, and rename it config.yml so that it gets picked up by CircleCI automatically instead of the current default. Sorry I don't follow. {code} $ git pull --rebase upstream trunk >From github.com:apache/cassandra * branch trunk -> FETCH_HEAD Already up to date. Current branch trunk is up to date. $ cd .circleci/ $ diff config.yml config.yml.LOWRES $ {code} config.yml == config.yml.LOWERS. We modify config-2_1.yml, call generate.sh, then copy config.yml.LOWERS to config.yml. So the only file we should be maintaining is config-2_1.yml. > Improvements in circle CI default config > > > Key: CASSANDRA-15686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15686 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Kevin Gallardo >Priority: Normal > > I have been looking at and played around with the [default CircleCI > config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], > a few comments/questions regarding the following topics: > * Python dtests do not run successfully (200-300 failures) on {{medium}} > instances, they seem to only run with small flaky failures on {{large}} > instances or higher > * Python Upgrade tests: > ** Do not seem to run without many failures on any instance types / any > parallelism setting > ** Do not seem to parallelize well, it seems each container is going to > download multiple C* versions > ** Additionally it seems the configuration is not up to date, as currently > we get errors because {{JAVA8_HOME}} is not set > * Unit tests do not seem to parallelize optimally, number of test runners do > not reflect the available CPUs on the container. Ideally if # of runners == # > of CPUs, build time is improved, on any type of instances. > ** For instance when using the current configuration, running on medium > instances, build will use 1 junit test runner, but 2 CPUs are available. If > using 2 runners, the build time is reduced from 19min (at the current main > config of parallelism=4) to 12min. > * There are some typos in the file, some dtests say "Run Unit Tests" but > they are JVM dtests (see > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077], > > [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386]) > So some ways to process these would be: > * Do the Python dtests run successfully for anyone on {{medium}} instances? > If not, would it make sense to bump them to {{large}} so that they can be run > successfully? > * Does anybody ever run the python upgrade tests on CircleCI and what is the > configuration that makes it work? > * Would it make sense to either hardcode the number of test