[GitHub] [spark] EnricoMi commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

GitBox Tue, 21 Jun 2022 07:17:30 -0700


EnricoMi commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r902668952



##########
.github/workflows/build_java11.yml:
##########
@@ -32,9 +32,16 @@ jobs:
       java: 11
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SKIP_MIMA": "true",
           "SKIP_UNIDOC": "true"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",

Review Comment:
   The `pyspark` job did not previously run for Java 11:
   
   ```
   inputs.type == 'scheduled' && inputs.java == '17'
   ```
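
The quoted clause can be checked by hand for each scheduled build. The following is a minimal shell sketch that mirrors only that clause; the full `if:` condition of the job is not quoted in the comment and may contain more terms. `old_clause` is a hypothetical helper, not part of the workflow.

```shell
# Hypothetical helper mirroring only the quoted clause:
#   inputs.type == 'scheduled' && inputs.java == '17'
old_clause() {
  [ "$1" = "scheduled" ] && [ "$2" = "17" ]
}

# Evaluate the clause for the Java versions mentioned in this review.
for java in 8 11 17; do
  if old_clause scheduled "$java"; then
    echo "java $java: clause is true"
  else
    echo "java $java: clause is false"
  fi
done
```

Under this reading, only the Java 17 scheduled build satisfies the clause, which is the behavior the comment points out.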



##########
.github/workflows/build_hadoop2.yml:
##########
@@ -32,4 +32,11 @@ jobs:
       java: 8
       branch: master
       hadoop: hadoop2
-      type: scheduled
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",

Review Comment:
   The `sparkr` job did not previously run for scheduled Java 8 builds:
   
   ```
   inputs.type == 'scheduled' && inputs.java == '17'
   ```



##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`
+          fi
+          # 'build', 'scala-213', and 'java-11-17' are always true for now.
+          # It dose not save significant time and most of PRs trigger the build.
+          precondition="
+            {
+              \"build\": \"true\",
+              \"pyspark\": \"$pyspark\",
+              \"sparkr\": \"$sparkr\",
+              \"tpcds-1g\": \"$tpcds\",
+              \"docker-integration-tests\": \"$docker\",
+              \"scala-213\": \"true\",
+              \"java-11-17\": \"true\",
+              \"lint\" : \"true\"
+            }"

Review Comment:
   This is bash, so you can replace the outer `"` with `'` and get rid of the escaped quotes:
   ```suggestion
             precondition='
               {
                 "build": "true",
                 "pyspark": "$pyspark",
                 "sparkr": "$sparkr",
                 "tpcds-1g": "$tpcds",
                 "docker-integration-tests": "$docker",
                 "scala-213": "true",
                 "java-11-17": "true",
                 "lint" : "true"
               }'
   ```
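
One caveat when switching to single quotes: bash does not expand variables inside them, so `"$pyspark"` would reach the JSON as the literal text `$pyspark`. The following is a minimal sketch of one way to avoid escaped quotes while keeping expansion, using `printf` with a single-quoted format string. The variable values and the two-key JSON are hypothetical stand-ins for the values computed by `./dev/is-changed.py` in the workflow.

```shell
# Hypothetical stand-ins for the values the workflow computes.
pyspark=true
sparkr=false

# Single quotes: no expansion, the JSON carries the literal text $pyspark.
literal='{"pyspark": "$pyspark", "sparkr": "$sparkr"}'

# printf: the format string is single-quoted (no escaping needed),
# and the values are substituted through %s placeholders.
expanded=$(printf '{"pyspark": "%s", "sparkr": "%s"}' "$pyspark" "$sparkr")

echo "$literal"
echo "$expanded"
```

An unquoted heredoc would also work, but `printf` does not depend on the indentation of the surrounding `run:` block, which is why it is used in this sketch.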



##########
.github/workflows/build_hadoop2.yml:
##########
@@ -32,4 +32,11 @@ jobs:
       java: 8
       branch: master
       hadoop: hadoop2
-      type: scheduled
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",

Review Comment:
   The `pyspark` job did not previously run for scheduled Java 8 builds:
   
   ```
   inputs.type == 'scheduled' && inputs.java == '17'
   ```



##########
.github/workflows/build_java11.yml:
##########
@@ -32,9 +32,16 @@ jobs:
       java: 11
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SKIP_MIMA": "true",
           "SKIP_UNIDOC": "true"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",

Review Comment:
   The `sparkr` job did not previously run for Java 11:
   
   ```
   (inputs.type == 'scheduled' && inputs.java == '17')
   ```



