HyukjinKwon commented on a change in pull request #35121:
URL: https://github.com/apache/spark/pull/35121#discussion_r780044211



##########
File path: .github/workflows/build_and_test.yml
##########
@@ -96,15 +96,39 @@ jobs:
           echo '::set-output name=hadoop::hadoop3'
         fi
 
+  build-precondition:
+    name: Check a code change
+    runs-on: ubuntu-20.04
+    outputs:
+      required: ${{ steps.set-outputs.outputs.required }}
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+      with:
+        fetch-depth: 0
+        repository: apache/spark
+        ref: master
+    - name: Sync the current branch with the latest in Apache Spark
+      if: github.repository != 'apache/spark'
+      run: |
+        echo "APACHE_SPARK_REF=$(git rev-parse HEAD)" >> $GITHUB_ENV
+        git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/}
+        git -c user.name='Apache Spark Test Account' -c user.email='[email protected]' merge --no-commit --progress --squash FETCH_HEAD
+        git -c user.name='Apache Spark Test Account' -c user.email='[email protected]' commit -m "Merged commit"
+    - name: Check all modules except 'docs'
+      id: set-outputs
+      run: |
        echo "::set-output name=required::$(./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn)"

Review comment:
      One way to deduplicate this precondition job might be to run the check multiple times, once for each job group (`build`, `pyspark`, `sparkr`, etc.), and save the output as a single JSON object, such as:
   
   ```
   echo '::set-output name=required::{"build": "yes", "pyspark": "no"}'
   ```
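
   A minimal shell sketch of how such a combined output could be assembled. The `is_changed` stub and the two hard-coded module groups here are illustrative stand-ins for `./dev/is-changed.py`, not the actual script:

   ```shell
   #!/bin/sh
   # Stub standing in for `./dev/is-changed.py -m <modules>`; the real script
   # reports whether any listed module changed. Here we hard-code the answers.
   is_changed() {
     case "$1" in
       build) echo "yes" ;;
       *) echo "no" ;;
     esac
   }

   # Assemble one JSON object covering every job group, then emit it as a
   # single step output for downstream jobs to consume with fromJson().
   required="{\"build\": \"$(is_changed build)\", \"pyspark\": \"$(is_changed pyspark)\"}"
   echo "::set-output name=required::$required"
   ```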
   
   And then, we can use the JSON in the downstream job (e.g., `build`) as below:
   
   ```yaml
   if: fromJson(needs.build-precondition.outputs.required).build == 'yes'
   ```
   
   e.g., 
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L83
 and 
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L226
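   
   Put together, the precondition job and one downstream consumer might look roughly like this (a sketch only; everything other than the `build-precondition` job and its `required` output is an assumed placeholder):
   
   ```yaml
   jobs:
     build-precondition:
       runs-on: ubuntu-20.04
       outputs:
         required: ${{ steps.set-outputs.outputs.required }}
       steps:
       - id: set-outputs
         run: |
           echo '::set-output name=required::{"build": "yes", "pyspark": "no"}'
   
     build:
       needs: build-precondition
       # Run only when the precondition job flagged the `build` group as changed.
       if: fromJson(needs.build-precondition.outputs.required).build == 'yes'
       runs-on: ubuntu-20.04
       steps:
       - run: echo "running build tests"
   ```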
   
   It's a bit sad that GitHub Actions doesn't have a feature like exit-early .. then the change would have been very simple .. 😢 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


