ekaterinadimitrova2 commented on code in PR #2554:
URL: https://github.com/apache/cassandra/pull/2554#discussion_r1298743261


##########
.build/config/README.md:
##########
@@ -0,0 +1,125 @@
+Declarative Test Suite Configuration
+-------------------------------------------
+
+Pipeline and test suite configurations are declarative so other CI 
implementations can build 
+durable, reactive systems based on changes to the upstream OSS C* CI. 
Additions to `jobs.cfg` and 
+`pipelines.cfg` can be picked up programmatically by CI implementations 
+without requiring human intervention.
+
+Concepts
+---------------------
+
+### Pipeline
+A [pipeline](cassandra_ci.yaml) is a collection of jobs. For a given pipeline 
to be considered 
+successful,
+all
+jobs listed in the pipeline must run to completion without error using the 
constraints, commands,
+and environment specified for the job in the config.
+
+### Job
+A [job](jobs.yaml) contains a collection of parameters that inform a CI system 
on both what needs to 
+run, how to run it, and the constraints of the environment in which it should 
execute. We 
+provide these limits to reflect what's available in our reference ASF CI 
implementation so other 
+CI environments are able to limit themselves to our resourcing upstream and 
thus not destabilize 
+ASF CI.
+
+Examples of jobs include unit tests, python dtests, in-jvm dtests, etc.
+
+Jobs include the following parameters:
+
+* `parent:` Another job defined in the file this job inherits parameters from, 
potentially 
+  overwriting any declared in duplication
+* `description:` Text based description of this job's purpose
+* `cmd:` The command a shell should run to execute the test job
+* `testlist:` A command that will create a text file listing all the test 
files to be run for 
+  this 
+  suite
+* `env:` Space delimited list of environment variables to be set for this 
suite. Duplicates for 
+  params are allowed and later declarations should supercede former.
+* `cpu:` Max cpu count allowed for a testing suite
+* `memory:` Max memory (in GB) allowable for a suite
+* `storage:` Max allowable storage (in GB) allowable for a suite to access
+
+Jobs can be split up and parallelized in whatever manner best suits the 
environment in which they're
+orchestraed.

Review Comment:
   ```suggestion
   orchestrated.
   ```



##########
.build/config/README.md:
##########
@@ -0,0 +1,125 @@
+Declarative Test Suite Configuration
+-------------------------------------------
+
+Pipeline and test suite configurations are declarative so other CI 
implementations can build 
+durable, reactive systems based on changes to the upstream OSS C* CI. 
Additions to `jobs.cfg` and 
+`pipelines.cfg` can be picked up programmatically by CI implementations 
+without requiring human intervention.
+
+Concepts
+---------------------
+
+### Pipeline
+A [pipeline](cassandra_ci.yaml) is a collection of jobs. For a given pipeline 
to be considered 
+successful,
+all
+jobs listed in the pipeline must run to completion without error using the 
constraints, commands,
+and environment specified for the job in the config.
+
+### Job
+A [job](jobs.yaml) contains a collection of parameters that inform a CI system 
on both what needs to 
+run, how to run it, and the constraints of the environment in which it should 
execute. We 
+provide these limits to reflect what's available in our reference ASF CI 
implementation so other 
+CI environments are able to limit themselves to our resourcing upstream and 
thus not destabilize 
+ASF CI.
+
+Examples of jobs include unit tests, python dtests, in-jvm dtests, etc.
+
+Jobs include the following parameters:
+
+* `parent:` Another job defined in the file this job inherits parameters from, 
potentially 
+  overwriting any declared in duplication
+* `description:` Text based description of this job's purpose
+* `cmd:` The command a shell should run to execute the test job
+* `testlist:` A command that will create a text file listing all the test 
files to be run for 
+  this 
+  suite
+* `env:` Space delimited list of environment variables to be set for this 
suite. Duplicates for 
+  params are allowed and later declarations should supercede former.
+* `cpu:` Max cpu count allowed for a testing suite
+* `memory:` Max memory (in GB) allowable for a suite
+* `storage:` Max allowable storage (in GB) allowable for a suite to access
+
+Jobs can be split up and parallelized in whatever manner best suits the 
environment in which they're
+orchestraed.
+
+Configuration Files
+---------------------
+
+[pipelines.cfg](./cassandra_ci.yaml): Contains pipelines for CI jobs for 
Apache Cassandra
+
+[jobs.cfg](./jobs.yaml): Contains reference CI jobs for Apache Cassandra
+
+Existing Pipelines
+---------------------
+
+As outlined in the `pipelines.cfg` file, we primarily have 3 pipelines:
+### pre-commit:
+* must run and pass on the lowest supported JDK before a committer merges any 
code
+### post-commit:
+* will run on the upstream ASF repo after a commit is merged, matrixed across 
more axes and including configurations expected to fail or diverge only rarely
+### nightly:
+* run nightly. Longer term, infra, very stable areas of code.
+
+Adding a new job to CI
+---------------------
+
+To add a new job to CI, you need to do 2 things:
+1. Determine which pipeline it will be a part of. Add the job name to that 
pipeline (or create a
+new pipeline with that job)
+
+2. Add a new entry to [jobs.cfg](./jobs.yaml). For example:
+```
+job:my-new-job
+    parent:base
+    description:new test suite that does important new things
+    cmd:ant new_job_name
+    testlist:find test/new_test_type -name '*Test.java' | sort
+    memory:12
+    cpu:4
+    storage:20
+    env:PARAM_ONE=val1 PARAM_TWO=val2 PARAM_THREE=val3
+    env:PARAM_FOUR=val4 PARAM_FIVE=val5
+```
+
+**NOTE**:
+
+You will also need to ensure the necessary values exist in 
[build.xml](../../build.xml) (timeouts, 
+etc).
+For now, there is duplication between the declarative declaration of test 
suites here and `build.
+xml`
+
+Building a Testing Environment
+-------------------------------------
+[ci_config_parser.sh](./ci_config_parser.sh) contains several methods to parse 
out pipelines, jobs, 
+and 
+job parameters:
+
+* `populate_pipelines`: populates a global array named `pipelines` with the 
names of all valid 
+  pipelines from the given input file
+* `populate_jobs`: populates all the required jobs for a given pipeline. 
Useful for determining 
+  / breaking down and iterating through jobs needed for a given pipeline
+* `parse_job_params`: populates some key global variables (see details in 
[ci_config_parser.sh](.
+  /ci_config_parser.sh) that can be used to build out constraints, commands, 
and details in a 
+  programmatic CI pipeline config builder.
+
+The workflow for building CI programmatically from the config might look 
something like this:
+* `populate_pipelines` to determine what pipelines you need to build out
+* For each pipeline:
+   1. `populate_jobs` to determine which jobs you need to write out config for
+   2. for each job:
+      1. `clear_job_params` to ensure nothing is left over from previous runs
+      2. `parse_job_params` to set up the params needed for the job
+      2. Write out the current job's params in whatever CI config format 
you're using in your 

Review Comment:
   ```suggestion
         3. Write out the current job's params in whatever CI config format you're using in your 
   ```



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.

Review Comment:
   Maybe add a big title so it is easy to see where the license stops and the docs start?
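   For example, something like this right under the license header (just a sketch):
   ```
   #=============================================================================
   # APACHE CASSANDRA CI - PIPELINE AND JOB (TEST SUITE) DEFINITIONS
   # (documentation starts here; everything above is the ASF license header)
   #=============================================================================
   ```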



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.
+
+# CI consists of:
+#   1. job: a set of commands to run against a list of files containing tests
+#   2. pipeline: a list of jobs that can be run in arbitrary order
+#       pipelines contain a list of JDK's they have to be run across to 
certify correctness
+
+#-----------------------------------------------------------------------------
+# IMPLEMENTATION REQUIRED PARAMETERS:
+#-----------------------------------------------------------------------------
+# We do not provide a mechanism to transform the contents of $TEST_LIST_FILE 
into $TEST_SPLIT_FILE. Implementations
+# must provide that mechanism and set that environment variable or "job->run:" 
operations will fail, unable to find a test split.
+#
+# EXPECTED FLOW ON AN AGENT:
+# 1. Populate contents of $TEST_LIST_FILE for a given job using 
"job->test_list_cmd:" piped through "job->TEST_FILTER:"
+# 2. Split up $TEST_LIST_FILE using "job->num_split_cmd:"
+# 3. Populate $TEST_SPLIT_FILE with a given split (CI implementation specific)
+# 3. Execute "job->run:" to run the given $TEST_SPLIT_FILE

Review Comment:
   ```suggestion
   # 4. Execute "job->run:" to run the given $TEST_SPLIT_FILE
   ```



##########
.build/config/assert.sh:
##########
@@ -0,0 +1,266 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Borrowed from https://github.com/torokmark/assert.sh/blob/main/assert.sh
+
+#####################################################################
+##
+## title: Assert Extension
+##
+## description:
+## Assert extension of shell (bash, ...)
+##   with the common assert functions
+## Function list based on:
+##   http://junit.sourceforge.net/javadoc/org/junit/Assert.html
+## Log methods : inspired by
+##     - https://natelandau.com/bash-scripting-utilities/
+## author: Mark Torok
+##
+## date: 07. Dec. 2016
+##
+## license: MIT
+##
+#####################################################################
+
+. functions.sh
+
+if command -v tput &>/dev/null && tty -s; then
+    RED=$(tput setaf 1)
+    GREEN=$(tput setaf 2)
+    MAGENTA=$(tput setaf 5)
+    NORMAL=$(tput sgr0)
+    BOLD=$(tput bold)
+else
+    RED=$(echo -en "\e[31m")
+    GREEN=$(echo -en "\e[32m")
+    MAGENTA=$(echo -en "\e[35m")
+    NORMAL=$(echo -en "\e[00m")
+    BOLD=$(echo -en "\e[01m")
+fi
+
+log_header() {

Review Comment:
   I haven't looked at the script in detail, but I like the idea. Seems neat.



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.
+
+# CI consists of:
+#   1. job: a set of commands to run against a list of files containing tests
+#   2. pipeline: a list of jobs that can be run in arbitrary order
+#       pipelines contain a list of JDK's they have to be run across to 
certify correctness
+
+#-----------------------------------------------------------------------------
+# IMPLEMENTATION REQUIRED PARAMETERS:
+#-----------------------------------------------------------------------------
+# We do not provide a mechanism to transform the contents of $TEST_LIST_FILE 
into $TEST_SPLIT_FILE. Implementations
+# must provide that mechanism and set that environment variable or "job->run:" 
operations will fail, unable to find a test split.
+#
+# EXPECTED FLOW ON AN AGENT:

Review Comment:
   ```suggestion
   #-----------------------------------------------------------------------------
   # EXPECTED FLOW ON AN AGENT:
   #-----------------------------------------------------------------------------
   ```



##########
.build/config/functions.sh:
##########
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+##############################################################################
+# Helper functions for use in our build scripting
+##############################################################################
+
+# Confirm that a given variable exists
+# $1: Message to print on error
+# $2: Variable to check for definition
+check_argument() {

Review Comment:
   Maybe I would change the script's name to something like validations.sh; `functions` doesn't ring a bell to me :-) 



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.
+
+# CI consists of:
+#   1. job: a set of commands to run against a list of files containing tests
+#   2. pipeline: a list of jobs that can be run in arbitrary order
+#       pipelines contain a list of JDK's they have to be run across to 
certify correctness
+
+#-----------------------------------------------------------------------------
+# IMPLEMENTATION REQUIRED PARAMETERS:
+#-----------------------------------------------------------------------------
+# We do not provide a mechanism to transform the contents of $TEST_LIST_FILE 
into $TEST_SPLIT_FILE. Implementations
+# must provide that mechanism and set that environment variable or "job->run:" 
operations will fail, unable to find a test split.
+#
+# EXPECTED FLOW ON AN AGENT:
+# 1. Populate contents of $TEST_LIST_FILE for a given job using 
"job->test_list_cmd:" piped through "job->TEST_FILTER:"
+# 2. Split up $TEST_LIST_FILE using "job->num_split_cmd:"
+# 3. Populate $TEST_SPLIT_FILE with a given split (CI implementation specific)
+# 3. Execute "job->run:" to run the given $TEST_SPLIT_FILE
+
+#-----------------------------------------------------------------------------
+# SOURCES
+#-----------------------------------------------------------------------------
+# You can configure the different sources you're using for your CI stack here; 
we default to HEAD on a given branch
+# and you should print out what SHA you checked out and built against for 
reproducibility in a subsequent  investigation.
+repos:
+  cassandra:
+    url: https://github.com/apache/cassandra
+    branch: trunk
+    sha: HEAD
+  python_dtest:
+    url: &python_dtest_url https://github.com/apache/cassandra-dtest
+    branch: &python_dtest_branch trunk
+    sha: HEAD
+  cassandra-harry:
+    url: https://github.com/apache/cassandra-harry
+    branch: trunk
+    sha: HEAD
+
+
+#-----------------------------------------------------------------------------
+# PIPELINES
+#-----------------------------------------------------------------------------
+pipelines:
+  # All jobs in the pre-commit pipeline must run within constraints and pass
+  # before a commit is merged upstream. Committers are expected to validate
+  # and sign off on this if using non-reference CI environments.
+  #
+  # Failure to do so can lead to commits being reverted.
+  - name: pre-commit
+    jdk:
+      - 11
+    jobs:
+      - unit
+      - jvm-dtest
+      - python-dtest
+      - dtest
+      - dtest-large
+      - dtest-upgrade
+      - dtest-upgrade-large
+      - long-test
+      - cqlsh-test
+
+  # The post-commit pipeline is a larger set of tests that include all 
supported JDKs.
+  # We expect different JDKs and variations on test suites to fail very rarely.
+  #
+  # Failures in these tests will be made visible on JIRA tickets shortly after
+  # test run on reference CI and committers are expected to prioritize
+  # rectifying any failures introduced by their work.
+  - name: post-commit
+    jdk:
+      - 11
+      - 17

Review Comment:
   We added `java.supported` only in 5.0+.
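   (On 5.0+ a parser could probably derive this list from build.xml instead of hard-coding it here; rough sketch only, assuming `java.supported` stays a simple comma-separated property value:)
   ```bash
   # Hypothetical: read the supported JDK list straight out of build.xml.
   SUPPORTED_JDKS=$(grep -oP '(?<=name="java.supported" value=")[^"]+' build.xml)
   echo "${SUPPORTED_JDKS}"   # e.g. 11,17
   ```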



##########
.build/config/README.md:
##########
@@ -0,0 +1,125 @@
+Declarative Test Suite Configuration
+-------------------------------------------
+
+Pipeline and test suite configurations are declarative so other CI 
implementations can build 
+durable, reactive systems based on changes to the upstream OSS C* CI. 
Additions to `jobs.cfg` and 
+`pipelines.cfg` can be picked up programmatically by CI implementations 
+without requiring human intervention.
+
+Concepts
+---------------------
+
+### Pipeline
+A [pipeline](cassandra_ci.yaml) is a collection of jobs. For a given pipeline 
to be considered 
+successful,
+all
+jobs listed in the pipeline must run to completion without error using the 
constraints, commands,
+and environment specified for the job in the config.
+
+### Job
+A [job](jobs.yaml) contains a collection of parameters that inform a CI system 
on both what needs to 
+run, how to run it, and the constraints of the environment in which it should 
execute. We 
+provide these limits to reflect what's available in our reference ASF CI 
implementation so other 
+CI environments are able to limit themselves to our resourcing upstream and 
thus not destabilize 
+ASF CI.
+
+Examples of jobs include unit tests, python dtests, in-jvm dtests, etc.
+
+Jobs include the following parameters:
+
+* `parent:` Another job defined in the file this job inherits parameters from, 
potentially 
+  overwriting any declared in duplication
+* `description:` Text based description of this job's purpose
+* `cmd:` The command a shell should run to execute the test job
+* `testlist:` A command that will create a text file listing all the test 
files to be run for 
+  this 
+  suite
+* `env:` Space delimited list of environment variables to be set for this 
suite. Duplicates for 
+  params are allowed and later declarations should supercede former.
+* `cpu:` Max cpu count allowed for a testing suite
+* `memory:` Max memory (in GB) allowable for a suite
+* `storage:` Max allowable storage (in GB) allowable for a suite to access
+
+Jobs can be split up and parallelized in whatever manner best suits the 
environment in which they're
+orchestraed.
+
+Configuration Files
+---------------------
+
+[pipelines.cfg](./cassandra_ci.yaml): Contains pipelines for CI jobs for 
Apache Cassandra
+
+[jobs.cfg](./jobs.yaml): Contains reference CI jobs for Apache Cassandra
+
+Existing Pipelines
+---------------------
+
+As outlined in the `pipelines.cfg` file, we primarily have 3 pipelines:
+### pre-commit:
+* must run and pass on the lowest supported JDK before a committer merges any 
code
+### post-commit:
+* will run on the upstream ASF repo after a commit is merged, matrixed across 
more axes and including configurations expected to fail or diverge only rarely
+### nightly:
+* run nightly. Longer term, infra, very stable areas of code.
+
+Adding a new job to CI
+---------------------
+
+To add a new job to CI, you need to do 2 things:
+1. Determine which pipeline it will be a part of. Add the job name to that 
pipeline (or create a
+new pipeline with that job)
+
+2. Add a new entry to [jobs.cfg](./jobs.yaml). For example:
+```
+job:my-new-job
+    parent:base
+    description:new test suite that does important new things
+    cmd:ant new_job_name
+    testlist:find test/new_test_type -name '*Test.java' | sort
+    memory:12
+    cpu:4
+    storage:20
+    env:PARAM_ONE=val1 PARAM_TWO=val2 PARAM_THREE=val3
+    env:PARAM_FOUR=val4 PARAM_FIVE=val5
+```
+
+**NOTE**:
+
+You will also need to ensure the necessary values exist in 
[build.xml](../../build.xml) (timeouts, 
+etc).
+For now, there is duplication between the declarative declaration of test 
suites here and `build.
+xml`
+
+Building a Testing Environment
+-------------------------------------
+[ci_config_parser.sh](./ci_config_parser.sh) contains several methods to parse 
out pipelines, jobs, 
+and 
+job parameters:
+
+* `populate_pipelines`: populates a global array named `pipelines` with the 
names of all valid 
+  pipelines from the given input file
+* `populate_jobs`: populates all the required jobs for a given pipeline. 
Useful for determining 
+  / breaking down and iterating through jobs needed for a given pipeline
+* `parse_job_params`: populates some key global variables (see details in 
[ci_config_parser.sh](.
+  /ci_config_parser.sh) that can be used to build out constraints, commands, 
and details in a 
+  programmatic CI pipeline config builder.
+
+The workflow for building CI programmatically from the config might look 
something like this:
+* `populate_pipelines` to determine what pipelines you need to build out
+* For each pipeline:
+   1. `populate_jobs` to determine which jobs you need to write out config for
+   2. for each job:
+      1. `clear_job_params` to ensure nothing is left over from previous runs
+      2. `parse_job_params` to set up the params needed for the job
+      2. Write out the current job's params in whatever CI config format 
you're using in your 
+         env (circle, jenkinsfile, etc)
+
+As new entries are added to [pipelines.cfg](./cassandra_ci.yaml) and 
[jobs.cfg](./jobs.yaml), your 
+scripts should pick those up and integrate them into your configuration 
environment.
+
+Testing the in-tree config parsing scripts
+---------------------------------------------
+Currently testing is manual on the first addition of this declarative 
structure. As we integrate 
+it into our reference CI, we will integrate testing in as a new target.
+
+To run tests, execute [test_config.sh](./test/test_config.sh) from a terminal 
and inspect the 
+output.

Review Comment:
   I am not sure I understand this section. The title is about parsing, but then we talk about a new target and provide a script name.



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.
+
+# CI consists of:
+#   1. job: a set of commands to run against a list of files containing tests
+#   2. pipeline: a list of jobs that can be run in arbitrary order
+#       pipelines contain a list of JDK's they have to be run across to 
certify correctness
+
+#-----------------------------------------------------------------------------
+# IMPLEMENTATION REQUIRED PARAMETERS:
+#-----------------------------------------------------------------------------
+# We do not provide a mechanism to transform the contents of $TEST_LIST_FILE 
into $TEST_SPLIT_FILE. Implementations
+# must provide that mechanism and set that environment variable or "job->run:" 
operations will fail, unable to find a test split.

Review Comment:
   I would expect there to be some default split mechanism that can be used if you do not have your own implementation?
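   For instance, a plain GNU coreutils default could look something like this (just a sketch; `SPLIT_INDEX` is a made-up per-agent variable, not something this config defines):
   ```bash
   # Hypothetical default: cut $TEST_LIST_FILE into $NUM_SPLITS line-aligned chunks,
   # then point $TEST_SPLIT_FILE at the chunk assigned to this agent.
   split -d -n "l/${NUM_SPLITS}" "${TEST_LIST_FILE}" "${TMPDIR}/test_split_"
   export TEST_SPLIT_FILE="${TMPDIR}/test_split_$(printf '%02d' "${SPLIT_INDEX}")"
   ```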



##########
.build/config/ci_config_parser.sh:
##########
@@ -0,0 +1,174 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+. functions.sh
+
+# This script relies on yq: https://github.com/mikefarah/yq
+# License: MIT: https://github.com/jmckenzie-dev/yq/blob/master/LICENSE
+# Plain binary install: wget 
https://github.com/mikefarah/yq/releases/download/${VERSION}/${BINARY} -O 
/usr/bin/yq &&\ chmod +x /usr/bin/yq
+# Brew install: "brew install yq"
+
+# Text array of all known pipelines found in processed config file
+export pipelines=()
+
+# Text array of all known jobs found in the jobs config
+export pipeline_jobs=()
+
+# The keys for various properties as defined in the test jobs.yaml file
+KEY_PARENT="parent"
+KEY_CMD="cmd"
+KEY_ENV="env"
+KEY_TESTLIST_CMD="testlist"
+KEY_MEM="memory"
+KEY_CPU="cpu"
+KEY_STORAGE="storage"

Review Comment:
   Probably we can add the unit as a suffix?
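   Something like this, for example (only a sketch; it would also mean renaming the corresponding keys in jobs.yaml):
   ```bash
   # Hypothetical key names that carry their unit, so readers don't have to guess GB vs. MB.
   KEY_MEM="memory_gb"
   KEY_STORAGE="storage_gb"
   ```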



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.
+
+# CI consists of:
+#   1. job: a set of commands to run against a list of files containing tests
+#   2. pipeline: a list of jobs that can be run in arbitrary order
+#       pipelines contain a list of JDK's they have to be run across to 
certify correctness
+
+#-----------------------------------------------------------------------------
+# IMPLEMENTATION REQUIRED PARAMETERS:
+#-----------------------------------------------------------------------------
+# We do not provide a mechanism to transform the contents of $TEST_LIST_FILE 
into $TEST_SPLIT_FILE. Implementations
+# must provide that mechanism and set that environment variable or "job->run:" 
operations will fail, unable to find a test split.
+#
+# EXPECTED FLOW ON AN AGENT:
+# 1. Populate contents of $TEST_LIST_FILE for a given job using 
"job->test_list_cmd:" piped through "job->TEST_FILTER:"
+# 2. Split up $TEST_LIST_FILE using "job->num_split_cmd:"
+# 3. Populate $TEST_SPLIT_FILE with a given split (CI implementation specific)
+# 3. Execute "job->run:" to run the given $TEST_SPLIT_FILE
+
+#-----------------------------------------------------------------------------
+# SOURCES
+#-----------------------------------------------------------------------------
+# You can configure the different sources you're using for your CI stack here; 
we default to HEAD on a given branch
+# and you should print out what SHA you checked out and built against for 
reproducibility in a subsequent  investigation.
+repos:
+  cassandra:
+    url: https://github.com/apache/cassandra
+    branch: trunk
+    sha: HEAD
+  python_dtest:
+    url: &python_dtest_url https://github.com/apache/cassandra-dtest
+    branch: &python_dtest_branch trunk
+    sha: HEAD
+  cassandra-harry:
+    url: https://github.com/apache/cassandra-harry
+    branch: trunk
+    sha: HEAD
+
+
+#-----------------------------------------------------------------------------
+# PIPELINES
+#-----------------------------------------------------------------------------
+pipelines:
+  # All jobs in the pre-commit pipeline must run within constraints and pass
+  # before a commit is merged upstream. Committers are expected to validate
+  # and sign off on this if using non-reference CI environments.
+  #
+  # Failure to do so can lead to commits being reverted.
+  - name: pre-commit
+    jdk:
+      - 11
+    jobs:
+      - unit
+      - jvm-dtest
+      - python-dtest
+      - dtest
+      - dtest-large
+      - dtest-upgrade
+      - dtest-upgrade-large
+      - long-test
+      - cqlsh-test
+
+  # The post-commit pipeline is a larger set of tests that include all 
supported JDKs.
+  # We expect different JDKs and variations on test suites to fail very rarely.
+  #
+  # Failures in these tests will be made visible on JIRA tickets shortly after
+  # test run on reference CI and committers are expected to prioritize
+  # rectifying any failures introduced by their work.
+  - name: post-commit
+    jdk:
+      - 11
+      - 17
+    jobs:
+      - unit
+      - unit-cdc
+      - compression
+      - test-oa
+      - test-system-keyspace-directory
+      - test-tries
+      - jvm-dtest
+      - jvm-dtest-upgrade
+      - dtest
+      - dtest-novnode
+      - dtest-offheap
+      - dtest-large
+      - dtest-large-novnode
+      - dtest-upgrade
+      - dtest-upgrade-large
+      - long-test
+      - cqlsh-test
+
+  # These are longer-term, much more rarely changing pieces of infrastructure 
or
+  # testing. We expect these to fail even more rarely than post-commit.
+  - name: nightly
+    jdk:
+      - 11
+      - 17
+    jobs:
+      - stress-test
+      - fqltool-test
+      - test-burn
+
+#-----------------------------------------------------------------------------
+# RESOURCE LIMITS, ALIASES, AND DEFAULT ENV VARS
+#-----------------------------------------------------------------------------
+# Downstream test orchestration needs to use <= the following values when 
running tests.
+# Increasing these values indicates  a change in resource allocation 
https://ci-cassandra.apache.org/ and should not be done downstream.
+small_executor: &small_executor {cpu: 4, memory: 1g, storage: 5g}
+medium_executor: &medium_executor {cpu: 4, memory: 6g, storage: 25g}
+large_executor: &large_executor {cpu: 4, memory: 16g, storage: 50g}
+
+# On test addition or change, we repeat the job many times to try and suss out 
flakes. Instead of having it be bespoke
+# per job, we want to provide some general guidelines for folks to default to 
and provide guidance on each test suite.
+repeat_default: &repeat_many 500
+repeat_less: &repeat_moderate 100
+repeat_tiny: &repeat_few 25
+
+# Default to at least one split
+default_split_num: &default_split_num let NUM_SPLITS=$(( $(wc -l < 
"$TEST_LIST_FILE") / $SPLIT_SIZE )); if [ "$NUM_SPLITS" -eq 0 ]; then 
NUM_SPLITS=1; fi

Review Comment:
   I do not understand... shouldn't we have this as the default and give people a chance to make `NUM_SPLITS` a variable instead of a constant? 1 by default, whatever you feel like otherwise... And you could plug in your own way of doing splits, as was mentioned earlier.
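   Roughly what I have in mind (a sketch only, not tied to any particular CI):
   ```bash
   # Hypothetical: honour an externally supplied NUM_SPLITS and fall back to the
   # SPLIT_SIZE-based calculation (minimum 1) only when it isn't already set.
   if [ -z "${NUM_SPLITS:-}" ]; then
       NUM_SPLITS=$(( $(wc -l < "${TEST_LIST_FILE}") / SPLIT_SIZE ))
       [ "${NUM_SPLITS}" -eq 0 ] && NUM_SPLITS=1
   fi
   ```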



##########
.build/config/cassandra_ci.yaml:
##########
@@ -0,0 +1,355 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Contains definitions of all pipelines and jobs (test suites) in Apache 
Cassandra's CI.
+
+# CI consists of:
+#   1. job: a set of commands to run against a list of files containing tests
+#   2. pipeline: a list of jobs that can be run in arbitrary order
+#       pipelines contain a list of JDK's they have to be run across to 
certify correctness
+
+#-----------------------------------------------------------------------------
+# IMPLEMENTATION REQUIRED PARAMETERS:
+#-----------------------------------------------------------------------------
+# We do not provide a mechanism to transform the contents of $TEST_LIST_FILE 
into $TEST_SPLIT_FILE. Implementations
+# must provide that mechanism and set that environment variable or "job->run:" 
operations will fail, unable to find a test split.
+#
+# EXPECTED FLOW ON AN AGENT:
+# 1. Populate contents of $TEST_LIST_FILE for a given job using 
"job->test_list_cmd:" piped through "job->TEST_FILTER:"
+# 2. Split up $TEST_LIST_FILE using "job->num_split_cmd:"
+# 3. Populate $TEST_SPLIT_FILE with a given split (CI implementation specific)
+# 3. Execute "job->run:" to run the given $TEST_SPLIT_FILE
+
+#-----------------------------------------------------------------------------
+# SOURCES
+#-----------------------------------------------------------------------------
+# You can configure the different sources you're using for your CI stack here; 
we default to HEAD on a given branch
+# and you should print out what SHA you checked out and built against for 
reproducibility in a subsequent  investigation.
+repos:
+  cassandra:
+    url: https://github.com/apache/cassandra
+    branch: trunk
+    sha: HEAD
+  python_dtest:
+    url: &python_dtest_url https://github.com/apache/cassandra-dtest
+    branch: &python_dtest_branch trunk
+    sha: HEAD
+  cassandra-harry:
+    url: https://github.com/apache/cassandra-harry
+    branch: trunk
+    sha: HEAD
+
+
+#-----------------------------------------------------------------------------
+# PIPELINES
+#-----------------------------------------------------------------------------
+pipelines:
+  # All jobs in the pre-commit pipeline must run within constraints and pass
+  # before a commit is merged upstream. Committers are expected to validate
+  # and sign off on this if using non-reference CI environments.
+  #
+  # Failure to do so can lead to commits being reverted.
+  - name: pre-commit
+    jdk:
+      - 11
+    jobs:
+      - unit
+      - jvm-dtest
+      - python-dtest
+      - dtest
+      - dtest-large
+      - dtest-upgrade
+      - dtest-upgrade-large
+      - long-test
+      - cqlsh-test
+
+  # The post-commit pipeline is a larger set of tests that include all 
supported JDKs.
+  # We expect different JDKs and variations on test suites to fail very rarely.
+  #
+  # Failures in these tests will be made visible on JIRA tickets shortly after
+  # test run on reference CI and committers are expected to prioritize
+  # rectifying any failures introduced by their work.
+  - name: post-commit
+    jdk:
+      - 11
+      - 17
+    jobs:
+      - unit
+      - unit-cdc
+      - compression
+      - test-oa
+      - test-system-keyspace-directory
+      - test-tries
+      - jvm-dtest
+      - jvm-dtest-upgrade
+      - dtest
+      - dtest-novnode
+      - dtest-offheap
+      - dtest-large
+      - dtest-large-novnode
+      - dtest-upgrade
+      - dtest-upgrade-large
+      - long-test
+      - cqlsh-test
+
+  # These are longer-term, much more rarely changing pieces of infrastructure 
or
+  # testing. We expect these to fail even more rarely than post-commit.
+  - name: nightly
+    jdk:
+      - 11
+      - 17
+    jobs:
+      - stress-test
+      - fqltool-test
+      - test-burn
+
+#-----------------------------------------------------------------------------
+# RESOURCE LIMITS, ALIASES, AND DEFAULT ENV VARS
+#-----------------------------------------------------------------------------
+# Downstream test orchestration needs to use <= the following values when 
running tests.
+# Increasing these values indicates  a change in resource allocation 
https://ci-cassandra.apache.org/ and should not be done downstream.
+small_executor: &small_executor {cpu: 4, memory: 1g, storage: 5g}
+medium_executor: &medium_executor {cpu: 4, memory: 6g, storage: 25g}
+large_executor: &large_executor {cpu: 4, memory: 16g, storage: 50g}
+
+# On test addition or change, we repeat the job many times to try and suss out 
flakes. Instead of having it be bespoke
+# per job, we want to provide some general guidelines for folks to default to 
and provide guidance on each test suite.
+repeat_default: &repeat_many 500
+repeat_less: &repeat_moderate 100
+repeat_tiny: &repeat_few 25
+
+# Default to at least one split
+default_split_num: &default_split_num let NUM_SPLITS=$(( $(wc -l < 
"$TEST_LIST_FILE") / $SPLIT_SIZE )); if [ "$NUM_SPLITS" -eq 0 ]; then 
NUM_SPLITS=1; fi
+
+# These env vars are required for tests to complete successfully given the 
run: commands, however downstream implementations
+# are welcome to change them as needed to setup their env
+default_env_vars: &default_env_vars
+  ANT_HOME: /usr/share/ant
+  KEEP_TEST_DIR: true
+  CASSANDRA_DIR: /home/cassandra/cassandra
+  CASSANDRA_DTEST_DIR: /home/cassandra/cassandra-dtest
+  CCM_CONFIG_DIR: ${DIST_DIR}/.ccm
+  TMPDIR: "$(mktemp -d)"
+  DIST_DIR: "${CASSANDRA_DIR}/build"
+  # Default to test.timeout as found in build.xml; should parse out of there 
in building local env and set this env var based on job
+  TEST_TIMEOUT: 480000
+  # Whether the repeated test iterations should stop on the first failure by 
default.
+  REPEATED_TESTS_STOP_ON_FAILURE: false
+
+# Anything specified in the required env vars SHOULD NOT BE CHANGED except for 
ASF CI; these are expected to have a
+# material impact on test correctness and changes to them on a downstream 
system will likely destabilize our reference
+# CI implementation
+required_env_vars: &required_env_vars
+  LANG: en_US.UTF-8
+  PYTHONIOENCODING: "utf-8"
+  PYTHONUNBUFFERED: true
+  CASS_DRIVER_NO_EXTENSIONS: true
+  CASS_DRIVER_NO_CYTHON: true
+  #Skip all syncing to disk to avoid performance issues in flaky CI 
environments
+  CASSANDRA_SKIP_SYNC: true
+  CCM_MAX_HEAP_SIZE: "1024M"
+  CCM_HEAP_NEWSIZE: "512M"
+  PYTEST_OPTS: "-vv --log-cli-level=DEBUG 
--junit-xml=${DIST_DIR}/test/output/nosetests.xml 
--junit-prefix=${DTEST_TARGET} -s"
+
+#-----------------------------------------------------------------------------
+# JOBS
+#
+# By convention, anything in caps in the .yaml should be exported to an env 
var during the test run cycle.
+#
+# Parameters:
+#   job: the name
+#   resources: cpu: memory: storage: max allowable for the suite.
+#   SPLIT_SIZE: This indicates how many tests to include in a given split. Can 
raise or lower as needed in your env.
+#   REPEAT_COUNT: Number of times to repeat a test when multiplexing. *Do not 
lower below upstream config default.*
+#   env:
+#     TYPE: The type of test; this should translate into tests found under 
${CASSANDRA_DIR}/test/${type} in $TEST_SPLIT_FILE
+#     TEST_FILTER: filter to run after test_list_cmd to narrow down tests 
(splits, upgrade vs. non, etc)
+#   test_list_cmd: command to run in shell to generate full list of tests to 
run. By default, randomizes test list by file name
+#   num_split_cmd: Calculation that populates NUM_SPLITS based on suite, 
count, weighting.
+#     *If you make changes to this value, they must be >= the default value*
+#   run: command to run in shell to execute tests
+#
+#-----------------------------------------------------------------------------
+
+#-----------------------------------------------------------------------------
+# Single node JVM tests
+#-----------------------------------------------------------------------------
+jobs:
+  - &job_unit
+    name: unit
+    resources: *medium_executor
+    REPEAT_COUNT: *repeat_many
+    SPLIT_SIZE: 20
+    env:
+      <<: *default_env_vars
+      <<: *required_env_vars
+      ANT_TEST_OPTS: -Dno-build-test=true
+      # type lines up with the various targets in build.xml for usage by the 
<testclasslist> target
+      TYPE: unit
+      TEST_FILTER: ""
+    TEST_LIST_FILE: ${DIST_DIR}/test_list.txt
+    test_list_cmd: find "test/${type}" -name "*Test.java" ${TEST_FILTER:-} | 
sed "s;^test/${type}/;;" | sort -R > ${TEST_LIST_FILE}
+    num_split_cmd: *default_split_num

Review Comment:
   And what do we do instead? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



